Category: Notes

4 February, 2021 19:59

https://github.com/huggingface/transformers/issues/9996

4 February 2021
How to create stunning visualizations using python from scratch – KDnuggets

https://www.kdnuggets.com/2021/02/stunning-visualizations-using-python.html

4 February 2021
3 February, 2021 10:06

https://github.com/explosion/spaCy/releases/tag/v3.0.0/

3 February 2021
Run VSCode (codeserver) on Google Colab or Kaggle Notebooks

https://github.com/abhishekkrthakur/colabcode

2 February 2021
Data Science in Production: Building Automated Data/ML pipelines in Apache Airflow

https://towardsdatascience.com/data-science-in-production-building-automated-data-ml-pipelines-in-apache-airflow-1fa6d434aeb8

1 February 2021
eDEX-UI – Coolest Linux Terminal

http://www.linuxandubuntu.com/home/edex-ui-coolest-linux-terminal Linux terminal may be the most boring application for new Linux users but it’s undoubtedly very useful. Once a user gets used to it, it’s more powerful than a GUI app. eDEX-UI, a fullscreen Linux terminal, system monitor, and network monitor is the coolest terminal application I have ever…

31 January 2021
Language Models are Open Knowledge Graphs .. but are hard to mine!

https://towardsdatascience.com/language-models-are-open-knowledge-graphs-but-are-hard-to-mine-13e128f3d64d Join me as I dive into the latest research on creating knowledge graphs using transformer-based language models.

28 January 2021
24 January, 2021 23:40

https://devblogs.microsoft.com/python/python-in-visual-studio-code-january-2021-release/ Python in Visual Studio Code – January 2021 Release. We are pleased to announce that the January 2021 release of the Python Extension for Visual Studio Code is now available. You can download the Python extension from the Marketplace, or install it directly from the extension gallery in Visual Studio Code.

24 January 2021
GitHub – jalammar/ecco: Visualize and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2).

https://github.com/jalammar/ecco

9 January 2021
Data Science Infographic

https://github.com/dataprofessor/infographic GitHub – dataprofessor/infographic. Infographic 01: Building the Machine Learning Model, with a Portuguese translation, "Construindo um Modelo Supervisionado de Machine Learning".

7 January 2021
SIRUS: Stable and Interpretable RUle Set

https://gitlab.com/drti/sirus SIRUS is a regression and classification algorithm based on random forests, which takes the form of a stable and short list of rules.

7 January 2021
CLIP: Connecting Text and Images

https://openai.com/blog/clip/

6 January 2021
Build Your own Recommendation Engine-Netflix Demystified: Demo+Code

https://towardsai.net/p/machine-learning/build-your-own-recommendation-engine-netflix-demystified-demo-code-550401d4885e User-Based Collaborative filtering-Similar reccos for different movies. 3. Item-based filtering: First thing first, it does not need any user-level data, and the recommendation engine can be up and running even in an isolated…

3 January 2021
MLOps Tooling Landscape v2

https://huyenchip.com/2020/12/30/mlops-v2.html MLOps Tooling Landscape v2 (+84 new tools) – Dec ’20. Last June, I published the post "What I learned from looking at 200 machine learning tools". The post got some attention and I got a lot of messages from people telling me about new tools. I updated the old list to now include 284 tools.…

2 January 2021
DynaSent: Dynamic Sentiment Analysis Dataset

https://github.com/cgpotts/dynasent Details: ‘hit_ids’: List of Amazon Mechanical Turk Human Intelligence Tasks (HITs) in which this example appeared during validation. The values are anonymized but used consistently throughout the dataset. ‘sentence’: The example text. ‘indices_into_review_text’: indices of ‘sentence’ into the original review in the Yelp Academic Dataset. ‘model_0_label’: prediction…

2 January 2021
Koan – A word2vec negative sampling implementation with correct CBOW update. koan only depends on Eigen.

https://github.com/bloomberg/koan Although continuous bag of words (CBOW) embeddings can be trained more quickly than skipgram (SG) embeddings, it is a common belief that SG embeddings tend to perform better in practice. This was observed by the original authors of Word2Vec [1] and also…

2 January 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling

https://pile.eleuther.ai/ The Pile is an 825 GiB diverse, open source language modelling data set that consists of 22 smaller, high-quality datasets combined together.

2 January 2021
Ludwig

Ludwig is a toolbox for training and evaluating deep learning models without the need to write code. https://github.com/ludwig-ai/ludwig

2 January 2021
Learn Python – Full Course for Beginners Tutorial

https://www.youtube.com/watch?v=rfscVS0vtbw&feature=youtu.be This course will give you a full introduction into all of the core concepts in Python. Follow along with the videos and you’ll be a Python programmer in no t…

2 January 2021
GitHub – jrieke/traingenerator: 🧙 A web app to generate template code for machine learning

https://github.com/jrieke/traingenerator

30 December 2020
Financial Times Visual Vocabulary

A poster (available in English, Japanese, traditional Chinese and simplified Chinese) and web site to assist designers and journalists to select the optimal symbology for data visualisations, by the Financial Times Visual Journalism Team. The FT Visual Vocabulary is at the core of a newsroom-wide training session aimed at improving chart literacy. This learning resource…

28 December 2020
CS 229 – Machine Learning Tips and Tricks Cheatsheet

https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks

28 December 2020
Deep Hybrid Learning — a fusion of conventional ML with state of the art DL

https://towardsdatascience.com/deep-hybrid-learning-a-fusion-of-conventional-ml-with-state-of-the-art-dl-cb43887fe14 By Aditya Bhattacharya, Towards Data Science. Considering state-of-the-art methods for unstructured data analysis, Deep Learning has been known to play an extremely vital role in coming up with sophisticated algorithms and model architectures, to auto-unwrap…

28 December 2020
What to expect from cloud data analytics in 2021

https://cloudblog-withgoogle-com.cdn.ampproject.org/c/s/cloudblog.withgoogle.com/products/data-analytics/what-to-expect-from-cloud-data-analytics-in-2021/amp/

25 December 2020
Machine Learning in Production

https://towardsdatascience.com/machine-learning-in-production-95e1999bba84

25 December 2020
Classifying Sentiment from Text Reviews

https://towardsdatascience.com/classifying-sentiment-from-text-reviews-a2c65ea468d6

25 December 2020
Build a Natural Language Classifier With Bert and Tensorflow

https://medium.com/better-programming/build-a-natural-language-classifier-with-bert-and-tensorflow-4770d4442d41 We cover how to build a natural language classifier using transformers (BERT) and TensorFlow 2 in Python. This is a simple, step-by-step tutorial.

25 December 2020
.NET Core with NGINX on Linux

https://irina.codes/net-core-with-nginx-on-linux/ By Irina Scurtu. Having .NET Core with NGINX on Linux is easier than you might imagine. In this article I will talk about my experience related to NGINX and what it takes to configure it for the first time.

25 December 2020
NumPy Illustrated: The Visual Guide to NumPy

https://medium.com/better-programming/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d By Lev Maximov, Better Programming, Dec 2020. NumPy is a fundamental library that most of the widely used Python data processing libraries are built upon (pandas), inspired by (PyTorch), or can efficiently share data with (TensorFlow, Keras…

25 December 2020
Haystack is an end-to-end framework for Question Answering & Neural search

https://github.com/deepset-ai/haystack Transformers at scale for question answering & neural search, using NLP via a modular Retriever-Reader-Pipeline. Supports DPR, Elasticsearch, HuggingFace’s Modelhub… Core features. Latest models: utilize all latest transformer-based models (e.g. BERT, RoBERTa, MiniLM) for extractive QA, generative QA and document retrieval. Modular: multiple choices to fit your tech stack and…

25 December 2020
Building A Faster & Accurate Search Engine with Transformers & Haystack

https://datamuni.com/@shivanandroy/Building-a-faster-accurate-search-engine-with-transformers DataMuni: no-nonsense, to-the-point, peer-reviewed articles and tutorials about Machine Learning, Deep Learning, Artificial Intelligence, Python and all things Data Science.

25 December 2020
Console

A list of the latest open source projects curated by an Amazon software engineer delivered to your email weekly. https://console.substack.com/

24 December 2020
Google, Apple, and others show large language models trained on public data expose personal information | VentureBeat

https://venturebeat.com/2020/12/16/google-apple-and-others-show-large-language-models-trained-on-public-data-expose-personal-information/

24 December 2020
Supporting content decision makers with machine learning

https://netflixtechblog.com/supporting-content-decision-makers-with-machine-learning-995b7b76006f

23 December 2020
Jupyter Notebook in Excel

https://towardsdatascience.com/python-jupyter-notebooks-in-excel-5ab34fc6439

23 December 2020
Spark and Docker: Your Spark development cycle just got 10x faster ! – Data Mechanics Blog

https://www.datamechanics.co/blog-post/spark-and-docker-your-spark-development-cycle-just-got-ten-times-faster

22 December 2020
How to Manage Python Dependencies in Spark – The Databricks Blog

https://databricks.com/blog/2020/12/22/how-to-manage-python-dependencies-in-pyspark.html

22 December 2020
Unsupervised synonym harvesting

https://towardsdatascience.com/unsupervised-synonym-harvesting-d592eaaf3c15 A hybrid approach combining symbolic… By Ajit Rajasekharan, Dec 2020, Towards Data Science. Extracting all the different ways a particular term can be referred to (synonym harvesting) is key for applications in the biomedical domain, where drugs, genes etc. have many synonyms. While there are…

22 December 2020
H2O Wave

H2O Wave is a software stack for building beautiful, low-latency, realtime, browser-based applications and dashboards entirely in Python without using HTML, Javascript, or CSS. https://h2oai.github.io/wave/docs/getting-started/

20 December 2020
GitHub – WillianFuks/tfcausalimpact: Google’s Causal Impact Algorithm Implemented on Top of TensorFlow Probability

https://github.com/WillianFuks/tfcausalimpact

20 December 2020
OpenBlender

Fuel your ML Engines with Relevant Data to Boost Performance https://www.openblender.io/#/welcome

19 December 2020
Azure Functions Kafka Trigger Performance Tips, Concept, and Architecture

https://tsuyoshiushio.medium.com/azure-functions-kafka-trigger-performance-tips-concept-and-architecture-ec94a31d8b93

19 December 2020
Main Types of Neural Networks and its Applications

A tutorial on the main types of neural networks and their applications to real-world challenges. https://medium.com/towards-artificial-intelligence/main-types-of-neural-networks-and-its-applications-tutorial-734480d7ec8e

19 December 2020
Codex Atlanticus

https://codex-atlanticus.it/

18 December 2020
GitHub – graphdeeplearning/graphtransformer: Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI’21.

We propose a generalization of transformer neural network architecture for arbitrary graphs: Graph Transformer. https://github.com/graphdeeplearning/graphtransformer

18 December 2020
Facebook ReAgent its a Reinforcement Learning Framework you Need to Know About | by Jesus Rodriguez | DataSeries | Dec, 2020 | Medium

ReAgent is a new framework that streamlines the implementation of reasoning systems. https://medium.com/dataseries/facebook-reagent-its-a-reinforcement-learning-framework-you-need-to-know-about-bf5e30dba77e

18 December 2020
Tiny four-bit computers are now all you need to train AI

Deep learning is an inefficient energy hog. It requires massive amounts of data and abundant computational resources, which explodes its electricity consumption. In the last few years, the overall research trend has made the problem worse. Models of gargantuan proportions—trained on billions of data points for several days—are in vogue, and likely won’t be going…

16 December 2020
GitHub – activeloopai/Hub: The fastest way to access and manage datasets for PyTorch and TensorFlow. Easily build scalable data pipelines. Leading Data 2.0 http://activeloop.ai

https://github.com/activeloopai/Hub

16 December 2020