Category: Notes
-
IBM's Approach Towards Preserving Adversarial Robustness of Machine Learning Systems – MarkTechPost
https://www.marktechpost.com/2022/01/01/ibms-approach-towards-preserving-adversarial-robustness-of-machine-learning-systems/
-
Alternative Feature Selection Methods in Machine Learning – KDnuggets
https://www.kdnuggets.com/2021/12/alternative-feature-selection-methods-machine-learning.html
-
A Gentle Introduction to PySpark
https://medium.com/@gahogg/a-gentle-introduction-to-pyspark-b4e9a06199b3
Learn the analytics engine used by Facebook, Netflix, and other tech giants.
-
How We Serverlessly Migrated 1.58 Billion Elasticsearch Documents
https://blog.streammonkey.com/how-we-serverlessly-migrated-1-58-billion-elasticsearch-documents-33ad3d0d7c4f
-
Fine-Tuning Bert for Tweets Classification ft. Hugging Face | by Rajan Choudhary | Dec, 2021 | Medium
https://codistro.medium.com/fine-tuning-bert-for-tweets-classification-ft-hugging-face-8afebadd5dbf
-
New Serverless Transformers using Amazon SageMaker Serverless Inference and Hugging Face
https://www.philschmid.de/serverless-transformers-sagemaker-huggingface
Amazon SageMaker Serverless Inference is a fully managed serverless inference option that makes it easy to deploy and scale ML models. It is built on top of AWS Lambda and fully integrated into the Amazon SageMaker service.
-
Microsoft Introduces the Next Generation of the Conversational Language Understanding Client Library – MarkTechPost
https://www.marktechpost.com/2021/12/27/microsoft-introduces-the-next-generation-of-the-conversational-language-understanding-client-library/
-
TencentARC/GFPGAN: GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
https://github.com/TencentARC/GFPGAN
-
Advanced NLP with spaCy
https://spacy.io/universe/project/spacy-course
-
MaartenGr/BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://github.com/MaartenGr/BERTopic
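BERTopic's topic representations come from class-based TF-IDF (c-TF-IDF): all documents in a cluster are treated as one document, and each term is weighted by its in-class frequency times an IDF computed over classes. A minimal pure-Python sketch of that weighting (not BERTopic's actual implementation; the averaging constant follows the paper's description and is otherwise an assumption):

```python
import math
from collections import Counter

def c_tf_idf(class_docs):
    """class_docs: {class_label: all documents of that class joined into one string}.
    Returns {class_label: {term: weight}} using class-based TF-IDF:
    weight = tf(term, class) * log(1 + avg_words_per_class / total_tf(term))."""
    counts = {c: Counter(doc.split()) for c, doc in class_docs.items()}
    total = Counter()  # term frequency across all classes
    for cnt in counts.values():
        total.update(cnt)
    avg_words = sum(total.values()) / len(class_docs)
    return {
        c: {t: tf * math.log(1 + avg_words / total[t]) for t, tf in cnt.items()}
        for c, cnt in counts.items()
    }
```

Terms that concentrate in one cluster score higher than terms spread across clusters, which is what makes the resulting topics easy to interpret.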
-
ZenML
https://zenml.io/
ZenML is the open-source MLOps framework for reproducible ML pipelines and production-ready machine learning.
-
Hugging Face Transformers with Keras: Fine-tune a non-English BERT for Named Entity Recognition
https://www.philschmid.de/huggingface-transformers-keras-tf
-
End-to-End AutoML Pipeline with H2O AutoML, MLflow, FastAPI, and Streamlit | by Kenneth Leung | Dec, 2021 | Towards Data Science
https://towardsdatascience.com/end-to-end-automl-train-and-serve-with-h2o-mlflow-fastapi-and-streamlit-5d36eedfe606
-
explosion/spacy-streamlit: 👑 spaCy building blocks and visualizers for Streamlit apps
https://github.com/explosion/spacy-streamlit
-
Quickly build Explainable AI dashboards that show the inner workings of so-called "blackbox" machine learning models.
https://github.com/oegedijk/explainerdashboard
You can group onehot-encoded categorical variables together using the cats parameter, either by passing a dict specifying a list of onehot columns per categorical…
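The excerpt above mentions the cats parameter, which accepts a dict mapping each categorical to its onehot columns. A hedged helper (not part of explainerdashboard) that builds such a dict from a column-name prefix convention like Sex_male / Sex_female — the separator convention is an assumption:

```python
def onehot_groups(columns, sep="_"):
    """Group onehot column names by prefix into the {category: [cols]} shape
    that a cats-style parameter expects. Assumes names follow a
    '<category><sep><level>' convention; ungrouped columns are left out."""
    groups = {}
    for col in columns:
        if sep in col:
            groups.setdefault(col.split(sep, 1)[0], []).append(col)
    # keep only genuine groups (two or more onehot levels)
    return {k: v for k, v in groups.items() if len(v) > 1}
```

For example, Titanic-style columns ["Sex_male", "Sex_female", "Deck_A", "Deck_B", "Age"] would yield groups for Sex and Deck while leaving the numeric Age column alone.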
-
LePetit: A pre-training efficient and lightning fast French Language Model | by Micheli Vincent | Illuin | Medium
https://medium.com/illuin/lepetit-a-pre-training-efficient-and-lightning-fast-french-language-model-96495ad726b3
-
Interpretable_Text_Classification_And_Clustering – a Hugging Face Space by Hellisotherpeople
https://huggingface.co/spaces/Hellisotherpeople/Interpretable_Text_Classification_And_Clustering
-
The birth of an important discovery in deep clustering | by Giansalvo Cirrincione | Dec, 2021 | Towards Data Science
https://towardsdatascience.com/the-birth-of-an-important-discovery-in-deep-clustering-c2791f2f2d82
-
Google AI Blog: Training Machine Learning Models More Efficiently with Dataset Distillation
https://ai.googleblog.com/2021/12/training-machine-learning-models-more.html?m=1
-
d4data/bias-detection-model · Hugging Face
https://huggingface.co/d4data/bias-detection-model
-
Wisdom of Committees: An Overlooked Approach To Faster and More Accurate Models
Committee-based models (ensembles or cascades) construct models by combining existing pre-trained ones. While ensembles and cascades are well-known techniques that were proposed before deep learning, they are not considered a core building block of deep model architectures and are rarely compared to in recent literature on developing efficient models. In this work, we go back…
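A cascade in the paper's sense runs pre-trained models from cheapest to largest and stops as soon as one is confident enough. A toy sketch of that control flow (the model functions and confidence threshold are illustrative assumptions, not the paper's setup):

```python
def cascade_predict(x, models, threshold=0.9):
    """Run models ordered cheap -> expensive; each returns (label, confidence).
    Return the first prediction whose confidence clears the threshold,
    falling back to the last (largest) model's answer otherwise."""
    label, conf = None, 0.0
    for model in models:
        label, conf = model(x)
        if conf >= threshold:
            break
    return label, conf

def small_model(x):
    # cheap but always unsure parity "classifier" (illustrative)
    return ("even" if x % 2 == 0 else "odd", 0.6)

def large_model(x):
    # expensive but confident (illustrative)
    return ("even" if x % 2 == 0 else "odd", 0.99)
```

Most inputs exit at the cheap model, so average inference cost drops while the large model still handles the hard cases.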
-
vscode.dev Visual Studio Code for the Web
Back in 2019, when the .dev top-level domain opened, we picked up vscode.dev and quickly parked it, pointing at our website code.visualstudio.com (or, if you are from the Boston area like me, we "pahked it"). Like a lot of people who buy a .dev domain, we had no idea what we were going to do…
-
Clustering sentence embeddings to identify intents in short text
The unsupervised learning problem of clustering short-text messages can be turned into a constrained optimization problem to automatically tune UMAP + HDBSCAN hyperparameters. The chatintents package makes it easy to implement this tuning process. User dialogue interactions can be a tremendous source of information on how to improve products or services. Understanding why people…
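The constrained optimization the article describes needs an objective to minimize. A simplified cost in that spirit (not the chatintents API): the fraction of points HDBSCAN assigns with low confidence, plus a penalty when the cluster count falls outside an acceptable range — the cutoff and penalty weight here are assumptions:

```python
def clustering_cost(labels, probabilities, k_range=(2, 20), prob_cutoff=0.05):
    """Score one UMAP + HDBSCAN run. labels/probabilities mirror HDBSCAN's
    labels_ and probabilities_ outputs (-1 marks noise points).
    Lower is better; use as the objective in a hyperparameter search."""
    n_clusters = len({l for l in labels if l != -1})
    cost = sum(p < prob_cutoff for p in probabilities) / len(probabilities)
    if not (k_range[0] <= n_clusters <= k_range[1]):
        cost += 0.15  # assumed flat penalty for violating the constraint
    return cost
```

Minimizing this over n_neighbors, n_components, and min_cluster_size candidates is the tuning loop the article automates.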
-
7 Considerations Before Pushing Machine Learning Models to Production
Machine learning models are not deterministic: if you train a neural network twice using the same data and the same hyperparameters, you won’t get the same model. The outputs of these two models on the same test data may look very similar, but they are not identical. This difference is due to multiple reasons. A…
-
AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning | DeepMind
https://deepmind.com/blog/article/AlphaStar-Grandmaster-level-in-StarCraft-II-using-multi-agent-reinforcement-learning
-
TIL: 1.4 Million Jupyter Notebooks
https://koaning.io/til/2021-10-12-many-notebooks/
-
Deep Neural Networks and Tabular Data: A Survey
Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous data sets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their application to modeling tabular data (inference or generation) remains highly challenging. This work provides…
-
MIT’s Automatic Data-Driven Media Bias Measurement Method Achieves Human-Level Results
Today more than ever, people are voicing concerns regarding biases in news media. Especially in the political arena, there are accusations of favouritism or disfavour in reporting, often expressed through the emphasizing or ignoring of certain political actors, policies, events, or topics. Many regard this as a corruption of the fourth estate…
-
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model – Microsoft Research
Published October 11, 2021, by Ali Alvi, Group Program Manager (Microsoft Turing), and Paresh Kharya, Senior Director of Product Management, Accelerated Computing, NVIDIA. We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model trained to date, with…
-
Attention with linear biases: Code for our ALiBi method for transformer language models.
This repository contains the ALiBi code and models for our paper Train Short, Test Long. This file explains how to run our experiments on the WikiText-103 dataset. Read the paper here. Attention with Linear Biases (ALiBi) is very simple! Instead of adding position embeddings at the bottom of the transformer stack (which we don't), we…
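ALiBi skips position embeddings and instead subtracts a head-specific linear penalty, slope × distance, from each causal attention score, with the slopes forming a geometric sequence. A pure-Python sketch of those two ingredients (list-based for clarity; assumes the head count is a power of two, as in the paper's recipe):

```python
def alibi_slopes(n_heads):
    """Head-specific slopes from the paper: the geometric sequence
    2^(-8/n), 2^(-16/n), ... (assumes n_heads is a power of two)."""
    return [2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)]

def alibi_bias(seq_len, slope):
    """Lower-triangular bias added to one head's causal attention scores:
    bias[q][k] = -slope * (q - k), so more distant keys are penalized more."""
    return [[-slope * (q - k) for k in range(q + 1)] for q in range(seq_len)]
```

Because the bias depends only on query-key distance, the same matrix extends to sequence lengths never seen in training, which is what enables the train-short, test-long behaviour.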
-
A step-by-step guide in designing knowledge-driven models using Bayesian theorem.
Data is the fuel for models, but you may have witnessed situations where there is no data, only a domain expert who can very well describe or even predict "the situation" given the circumstances. I will summarize the concepts of knowledge-driven models in terms of Bayesian probability, followed by a hands-on tutorial to demonstrate…
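The knowledge-driven setup boils down to encoding expert belief as a prior and updating it with Bayes' rule when evidence arrives. A minimal discrete example with made-up numbers (the machine-failure scenario is purely illustrative):

```python
def posterior(prior, likelihoods):
    """Bayes' rule over discrete hypotheses.
    prior: {hypothesis: P(h)}; likelihoods: {hypothesis: P(evidence | h)}.
    Returns the normalized posterior P(h | evidence)."""
    unnorm = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnorm.values())  # P(evidence), the normalizing constant
    return {h: v / z for h, v in unnorm.items()}

# expert belief: failure is rare (5%), but an alarm fires in 90% of failures
# and only 10% of the time otherwise
p = posterior({"failure": 0.05, "ok": 0.95},
              {"failure": 0.9, "ok": 0.1})
```

Even with a strong alarm, the rare-event prior keeps the posterior probability of failure well below certainty, which is exactly the kind of reasoning an expert-specified model encodes.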
-
Another JupyterLab Extension You Should Know About
Mito is a free JupyterLab extension that enables exploring and transforming datasets with the ease of Excel. Mito is the missing pandas extension we have been waiting for. When you start Mito, it shows a spreadsheet view of a pandas DataFrame. With a few clicks, you can perform any CRUD operation. CRUD stands…
-
Google’s Zero-Label Language Learning Achieves Results Competitive With Supervised Learning
While contemporary deep learning models continue to achieve outstanding results across a wide range of tasks, these models are known to have huge data appetites. The emergence of large-scale pretrained language models such as OpenAI's GPT-3 has helped reduce the need for task-specific labelled data in natural language processing…
https://medium.com/syncedreview/googles-zero-label-language-learning-achieves-results-competitive-with-supervised-learning-e6dbd984d0e1
-
WIT (Wikipedia-based Image Text) Dataset
WIT (Wikipedia-based Image Text) is a large multimodal, multilingual dataset composed of a curated set of 37.6 million entity-rich image-text examples with 11.5 million unique images across 108 Wikipedia languages.
-
EfficientNetV2: Smaller Models and Faster Training
https://arxiv.org/abs/2104.00298
This paper introduces EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models. To develop this family of models, we use a combination of training-aware neural architecture search and scaling, to jointly optimize training speed and parameter efficiency…
-
Introduction to MLOps for Data Science
https://pub.towardsai.net/introduction-to-mlops-for-data-science-e2ca5a759f68
-
MIT Presents New Approach for Sequence-to-Sequence Learning with Latent Neural Grammars | Synced
https://syncedreview.com/2021/09/17/deepmind-podracer-tpu-based-rl-frameworks-deliver-exceptional-performance-at-low-cost-106/
-
Unsupervised Deep Video Denoising
https://sreyas-mohan.github.io/udvd/
Real microscopy videos denoised using UDVD, trained on real microscopy datasets. Deep convolutional neural networks (CNNs) currently achieve state-of-the-art performance in denoising videos.
-
UC Berkeley Uses a Causal Perspective to Formalise the Desiderata for Representation Learning | by Synced | SyncedReview | Sep, 2021 | Medium
https://medium.com/syncedreview/uc-berkeley-uses-a-causal-perspective-to-formalise-the-desiderata-for-representation-learning-f2a7cf9c96a3
-
An introduction to Emotional AI in Business | by Jair Ribeiro | CodeX | Sep, 2021 | Medium
https://medium.com/codex/an-introduction-to-emotional-ai-in-business-ccb72268923c
-
WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU
https://github.com/salesforce/warp-drive
WarpDrive is a flexible, lightweight, and easy-to-use open-source reinforcement learning (RL) framework that implements end-to-end multi-agent RL on a single GPU (Graphics Processing Unit). Using the extreme parallelization capability of GPUs, WarpDrive enables orders-of-magnitude faster RL compared…
-
H3: Uber’s Hexagonal Hierarchical Spatial Index
https://eng.uber.com/h3/
For map projection, we chose to use gnomonic projections centered on icosahedron faces. This projects from Earth as a sphere to an icosahedron, a twenty-sided platonic solid.
-
Accelerate Transformers on State of the Art Hardware
https://huggingface.co/hardware
Optimum: the ML Hardware Optimization Toolkit for Production.
-
TensorFlow Introduces ‘TensorFlow Similarity’, An Easy And Fast Python Package To Train Similarity Models Using TensorFlow – MarkTechPost
https://www.marktechpost.com/2021/09/13/tensorflow-introduces-tensorflow-similarity-an-easy-and-fast-python-package-to-train-similarity-models-using-tensorflow/
-
SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression
https://arxiv.org/abs/2007.08954
Obtaining training data for multi-document summarization (MDS) is time-consuming and resource-intensive, so recent neural models can only be trained for limited domains. In this paper, we propose SummPip: an unsupervised method for multi-document summarization, in which we convert the original documents to…
-
Top 10 roles in AI and data science
https://hackernoon.com/top-10-roles-for-your-data-science-team-e7f05d90d961
-
Weights & Biases – Developer tools for machine learning
https://wandb.ai/site
A central dashboard to keep track of your hyperparameters, system metrics, and predictions so you can compare models live and share your findings.
-
Inferrd – Deploy your AI models on a GPU 10x faster than any cloud for up to 90% cheaper.
https://inferrd.com/
Deploy an API on GPUs in less than a minute, without cold starts, starting at $10 for a 1GB model.