Category: Uncategorized
-
Search PDFs with AI and Python
With neural search seeing rapid adoption, more people are looking at using it for indexing and searching through their unstructured data. I know several folks already building PDF search engines powered by AI, so I figured I’d give it a stab too. How hard could it possibly be? https://jina.ai/news/search-pdfs-with-ai-and-python-part-1/
-
Active-Learning-as-a-Service: An Efficient MLOps System for Data-Centric AI
The success of today’s AI applications requires not only model training (model-centric) but also data engineering (data-centric). In data-centric AI, active learning (AL) plays a vital role, but current AL tools cannot perform AL tasks efficiently. To this end, this paper presents an efficient MLOps system for AL, named ALaaS (Active-Learning-as-a-Service). Specifically, ALaaS…
-
ALaaS: Active Learning as a Service.
Active Learning as a Service (ALaaS) is a fast and scalable framework for automatically selecting a subset to be labeled from a full dataset so as to reduce labeling cost. It provides an out-of-the-box, standalone experience that lets users quickly put active learning to work. https://github.com/MLSysOps/Active-Learning-as-a-Service MLSysOps/Active-Learning-as-a-Service: A scalable & efficient active learning/data selection system for everyone.…
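To make "selecting a subset to be labeled" concrete, here is a minimal sketch of one common active learning strategy, least-confidence sampling, in plain Python. It is a generic illustration and not the ALaaS API: the function name, the sample ids, and the probabilities are all hypothetical.

```python
from typing import Dict, List


def least_confidence_selection(
    probabilities: Dict[str, List[float]], budget: int
) -> List[str]:
    """Return the ids of the `budget` samples the model is least sure about."""
    # Confidence of a sample = probability of its most likely class.
    confidence = {
        sample_id: max(probs) for sample_id, probs in probabilities.items()
    }
    # Rank from least to most confident; ties are broken by id for determinism.
    ranked = sorted(
        confidence, key=lambda sample_id: (confidence[sample_id], sample_id)
    )
    return ranked[:budget]


# Hypothetical class probabilities predicted for four unlabeled samples.
unlabeled_pool = {
    "img_001": [0.98, 0.01, 0.01],
    "img_002": [0.40, 0.35, 0.25],
    "img_003": [0.70, 0.20, 0.10],
    "img_004": [0.34, 0.33, 0.33],
}

to_label = least_confidence_selection(unlabeled_pool, budget=2)
print(to_label)  # ['img_004', 'img_002']
```

Only the two samples the model is least certain about are sent to annotators, which is how this kind of selection can reduce labeling cost.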
-
Pandas DataFrame Tutorial – Beginner’s Guide to GPU Accelerated DataFrames in Python
This post is the first installment of the series of introductions to the RAPIDS ecosystem. The series explores and discusses various aspects of RAPIDS that allow its users to solve ETL (Extract, Transform, Load) problems, build ML (Machine Learning) and DL (Deep Learning) models, explore expansive graphs, process geospatial, signal, and system log data, or use…
-
Best NLP Papers — October 2022
If you work in NLP, it’s important to keep up to date with the latest research. In this post, we look at some of the best papers on NLP that were published in October 2022. https://txt.cohere.ai/best-nlp-papers-october-2022/ TL;DR: Use an invoice scanning application to scan invoices and extract data for…
-
GitHub – jupyter-naas/awesome-notebooks: Ready to use data science templates, organized by tools to jumpstart your projects and data products in minutes. 😎 published by the Naas community.
https://github.com/jupyter-naas/awesome-notebooks
-
GitHub – kathrinse/be_great: A novel approach for synthesizing tabular data using pretrained large language models
GReaT framework utilizes the capabilities of pretrained large language Transformer models to synthesize realistic tabular data. https://github.com/kathrinse/be_great
-
Outperform OpenAI GPT-3 with SetFit for text-classification
https://www.philschmid.de/getting-started-setfit 2. Create Dataset. We are going to use the ag_news dataset, which is a news article classification dataset with 4 classes: World (0), Sports (1), Business (2), Sci/Tech (3). The test split of the dataset contains 7600 examples, which will be used to evaluate our model. The…
-
How to Add a User Authentication Service in Streamlit
https://towardsdatascience.com/how-to-add-a-user-authentication-service-in-streamlit-a8b93bf02031 Streamlit has come an extremely long way since its inception back in October of 2019. It has empowered the software development community and has effectively democratized the way we develop and deploy apps to the cloud.
-
AI Automated dubbing : Building an end-to-end Youtube audio translation platform using AWS serverless architecture | by Ramsri Goutham | The Startup | Medium
https://medium.com/swlh/ai-automated-dubbing-building-and-end-to-end-youtube-audio-translation-platform-using-aws-ebe5b0f153dd
-
Lightly is One of the First Open Source Frameworks for Self-Supervised Learning | by Jesus Rodriguez | Oct, 2022 | Medium
The library implements some of the most common algorithms and pretrained models for SSL. https://jrodthoughts.medium.com/lightly-is-one-of-the-first-open-source-frameworks-for-self-supervised-learning-e1754f063f0f
-
Create a Maintainable Data Pipeline with Prefect and DVC
Make Your Pipelines Easier to Support and Maintain https://towardsdatascience.com/create-a-maintainable-data-pipeline-with-prefect-and-dvc-1d691ea5bcea https://github.com/khuyentran1401/prefect-dvc/ Let’s look at how you can leverage Prefect and DVC together to create a maintainable data pipeline. Easy to Understand. When the components of the pipeline are well-defined and named appropriately, reviewers can focus on the…
-
Hyperparameter Optimization of Machine Learning Algorithms
To fit a machine learning model into different problems, its hyper-parameters must be tuned. Selecting the best hyper-parameter configuration for machine learning models has a direct impact on the model’s performance. In this paper, optimizing the hyper-parameters of common machine learning models is studied. We introduce several state-of-the-art optimization techniques and discuss how to apply…
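As a rough illustration of what tuning a hyper-parameter configuration involves, the sketch below runs a plain random search over a small search space. The excerpt does not name the techniques the paper covers, so random search is used here only as a simple, widely known baseline. The objective function, the parameter names, and the ranges are hypothetical stand-ins for a real training-and-validation loop.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Config = Dict[str, float]


def random_search(
    objective: Callable[[Config], float],
    search_space: Dict[str, Tuple[float, float]],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Optional[Config], float]:
    """Sample configurations uniformly and keep the one with the lowest loss."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    best_config: Optional[Config] = None
    best_loss = float("inf")
    for _ in range(n_trials):
        # Draw each hyper-parameter independently from its (low, high) range.
        config = {
            name: rng.uniform(low, high)
            for name, (low, high) in search_space.items()
        }
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


def toy_validation_loss(config: Config) -> float:
    # Hypothetical stand-in for "train a model, return its validation loss".
    # By construction its minimum is at learning_rate=0.1, dropout=0.3.
    return (config["learning_rate"] - 0.1) ** 2 + (config["dropout"] - 0.3) ** 2


space = {"learning_rate": (0.0, 1.0), "dropout": (0.0, 0.9)}
best_config, best_loss = random_search(toy_validation_loss, space, n_trials=200)
print(best_config, best_loss)
```

In practice the objective would train and evaluate an actual model, which is why more sample-efficient strategies than uniform sampling are worth studying.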
-
ZenML 0.20.0: Our Biggest Release Yet!
https://blog.zenml.io/zenml-revamped/ The 0.20.0 release is a seminal release in the history of ZenML. Following ten months of continuous feedback and iteration, we bring you a whole new architecture and redesign of ZenML – and a new dashboard to boot! Collaboration among teams has also been taken…
-
🧨 Stable Diffusion in JAX / Flax !
🤗 Hugging Face Diffusers has supported Flax since version 0.5.1! This allows for super fast inference on Google TPUs, such as those available in Colab or through Google Cloud Platform. https://huggingface.co/blog/stable_diffusion_jax
-
Azure Kubernetes Service Ignite Announcements – Microsoft Community Hub
https://techcommunity.microsoft.com/t5/apps-on-azure-blog/azure-kubernetes-service-ignite-announcements/ba-p/3650443
-
GitHub – AutoViML/featurewiz: Use advanced feature engineering strategies and select best features from your data set with a single line of code.
https://github.com/AutoViML/featurewiz
-
React and Next.js is DEAD — Something New is (Finally) Replacing It (For Good)
Qwik offers the fastest possible page load times – regardless of the complexity of your website. Qwik is so fast because it allows fully interactive sites to load with almost no JavaScript and pick up from where the server left off. https://medium.com/javascript-in-plain-english/react-and-next-js-is-dead-something-new-is-finally-replacing-it-for-good-c792c48806f6
-
From table to text with Narratable
When we talk about Natural Language Processing (NLP), we’re used to the term data-to-text. But at Narrativa we’ve taken it to another level: we’ve created the first BLOOM-based open-source model that converts data from a table into text. Introducing Narratable. https://www.narrativa.com/our-new-open-source-model-is-here-from-table-to-text-with-narratable/ Our new open-source model is here: from table to text with Narratable In the…
-
Tree Mover’s Distance: Bridging Graph Metrics and Stability of Graph Neural Networks
Understanding generalization and robustness of machine learning models fundamentally relies on assuming an appropriate metric on the data space. Identifying such a metric is particularly challenging for non-Euclidean data such as graphs. Here, we propose a pseudometric for attributed graphs, the Tree Mover’s Distance (TMD), and study its relation to generalization. Via a hierarchical optimal…
-
Machine Capable of Creating Thinking
Create your own Mid-journey via VQGAN+CLIP https://bobrupakroy.medium.com/machine-capable-of-creating-thinking-5b1d3ace1f41
-
Synthetic Data Metrics
Researchers at MIT Startup ‘DataCebo,’ Introduce Synthetic Data Metrics: An Open-Source Python Library That Evaluates Synthetic Data By Comparing It To The Real Data That You’re Trying To Mimic https://github.com/sdv-dev/SDMetrics GitHub – sdv-dev/SDMetrics: Metrics to evaluate quality and efficacy of synthetic datasets. The Synthetic Data Vault Project was first created at MIT’s Data to AI…
-
Yellowbrick: Machine Learning Visualization
Yellowbrick extends the Scikit-Learn API to make model selection and hyperparameter tuning easier. Under the hood, it’s using Matplotlib. https://www.scikit-yb.org/en/latest/
-
Skyplane: 110x faster data transfers on any cloud
Skyplane is an open-source developer tool for transferring data across cloud object stores. Skyplane is 164x faster than rsync and 113x faster than AWS DataSync. What would have taken nearly half of a working day now takes less than two minutes. https://medium.com/@paras_jain/skyplane-110x-faster-data-transfers-on-any-cloud-8f0165c1d711
-
DeepMind unveils first AI to discover faster matrix multiplication algorithms
AlphaTensor discovered algorithms that are more efficient than the state of the art for many matrix sizes and outperform human-designed ones. https://venturebeat.com/ai/deepmind-unveils-first-ai-to-discover-faster-matrix-multiplication-algorithms/
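For a sense of what a "more efficient" matrix multiplication algorithm means, the sketch below shows Strassen's classic 1969 scheme, which multiplies two 2x2 matrices with 7 scalar multiplications instead of the usual 8. This is the well-known human-designed algorithm, not one of the algorithms AlphaTensor discovered; it is included only to illustrate the kind of saving being searched for.

```python
from typing import List

Matrix = List[List[int]]


def naive_2x2(a: Matrix, b: Matrix) -> Matrix:
    """Textbook product of two 2x2 matrices: 8 scalar multiplications."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def strassen_2x2(a: Matrix, b: Matrix) -> Matrix:
    """Strassen's product of two 2x2 matrices: 7 scalar multiplications."""
    m1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1])
    m2 = (a[1][0] + a[1][1]) * b[0][0]
    m3 = a[0][0] * (b[0][1] - b[1][1])
    m4 = a[1][1] * (b[1][0] - b[0][0])
    m5 = (a[0][0] + a[0][1]) * b[1][1]
    m6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1])
    m7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1])
    # Recombine the seven products using only additions and subtractions.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(strassen_2x2(a, b))  # [[19, 22], [43, 50]]
```

Applied recursively to matrix blocks, saving one multiplication per 2x2 step lowers the asymptotic cost below cubic, which is why shaving multiplications for specific matrix sizes matters.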
-
HoloViz Blog
Panel is an open-source library that lets you create custom interactive web apps and dashboards by connecting widgets to plots, images, tables, and text – all while writing only Python! https://blog.holoviz.org/panel_0.14.html
-
GitHub – axa-group/Parsr: Transforms PDF, Documents and Images into Enriched Structured Data
https://github.com/axa-group/Parsr
-
The Rise of the Semantic Layer: Metrics On-The-Fly
A semantic layer is something we use every day. We build dashboards with yearly and monthly aggregations. We design dimensions for drilling down reports by region, product, or whatever metrics we are interested in. What has changed is that we no longer use a singular business intelligence tool; different teams use different visualizations (BI, notebooks,…
-
Google AI Blog: View Synthesis with Transformers
A long-standing problem in the intersection of computer vision and computer graphics, view synthesis is the task of creating new views of a scene from multiple pictures of that scene. This has received increased attention, since the introduction of neural radiance fields (NeRF). The problem is challenging because to accurately synthesize new views of a…
-
Dockerize your SQL Server and use it in ASP.NET Core with Entity Framework Core
https://www.twilio.com/blog/containerize-your-sql-server-with-docker-and-aspnet-core-with-ef-core
-
Why You Should Learn Machine Learning Engineering and Not Data Science
According to Thomas H. Davenport and DJ Patil, "Data Scientist: The Sexiest Job of the 21st Century" (2012), Harvard Business Review, Data Science is one of the most sought-after jobs on the job market. But is this still the case? Or is there already a more desirable one? There is! Machine learning engineering is…
-
Machine Learning for Everyone
“Machine Learning is like sex in high school. Everyone is talking about it, a few know what to do, and only your teacher is doing it. If you ever tried to read articles about machine learning on the Internet, most likely you stumbled upon two types of them: thick academic trilogies filled with theorems (I…
-
Active Learning with AutoNLP and Prodigy
https://huggingface.co/blog/autonlp-prodigy
-
GitHub – AykutSarac/jsoncrack.com: 🔮 Seamlessly visualize your JSON data instantly into graphs; paste, import or fetch!
https://github.com/AykutSarac/jsoncrack.com
-
7 Machine Learning Portfolio Projects to Boost the Resume – KDnuggets
https://www.kdnuggets.com/2022/09/7-machine-learning-portfolio-projects-boost-resume.html
-
GitHub – Eventual-Inc/Daft: Python DataFrame for Complex Data
https://github.com/Eventual-Inc/Daft
-
3 Notion No-Code Tools and Resources
https://medium.com/wearenocode/3-notion-no-code-tools-and-resources-you-dont-know-2875ecf9af28
-
14 September, 2022 01:04
https://blog.n8n.io/hyperautomation-trends/
-
Pickle Scanning
https://huggingface.co/docs/hub/security-pickle
-
GitHub – MaartenGr/BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://github.com/MaartenGr/BERTopic
-
GitHub – amazon-research/alexa-teacher-models
https://github.com/amazon-research/alexa-teacher-models
-
11 September, 2022 21:08
https://hbr.org/2022/09/emotions-arent-the-enemy-of-good-decision-making
-
10X Faster Hugging Face with Modin
https://ponder.io/faster-hugging-face-with-modin/
-
Meta AIs shocking insight about Big Data and Deep Learning | by Devansh- Machine Learning Made Simple | Geek Culture | Sep, 2022 | Medium
https://medium.com/geekculture/meta-ais-shocking-insight-about-big-data-and-deep-learning-857f9f2b9ac5
-
🧨 Diffusers
https://huggingface.co/docs/diffusers/index
-
Unsupervised scene sketch to photo synthesis – Amazon Science
https://www.amazon.science/publications/unsupervised-scene-sketch-to-photo-synthesis
-
GitHub – salesforce/OmniXAI: OmniXAI: A Library for eXplainable AI
https://github.com/salesforce/OmniXAI
-
12 Most Popular NLP Projects of 2022 So Far | ODSC.com Blogs
https://odsc.com/blog/12-most-popular-nlp-projects-of-2022-so-far/?utm_campaign=Newsletters&utm_medium=email&_hsmi=2&_hsenc=p2ANqtz—uIUHBTu33N6O7sCbV1wl-mBl73GFaGCilasAwgR8B_QnGGuEocTcN4pcNMxCMEwK16tiOW_WqIBrM70ZZOU5O-PFig&utm_content=2&utm_source=hs_email