Category: Notes
-
4 February, 2021 19:59
https://github.com/huggingface/transformers/issues/9996
-
How to create stunning visualizations using Python from scratch – KDnuggets
https://www.kdnuggets.com/2021/02/stunning-visualizations-using-python.html
-
3 February, 2021 10:06
https://github.com/explosion/spaCy/releases/tag/v3.0.0/
-
Run VSCode (code-server) on Google Colab or Kaggle Notebooks
https://github.com/abhishekkrthakur/colabcode
-
Data Science in Production: Building Automated Data/ML pipelines in Apache Airflow
https://towardsdatascience.com/data-science-in-production-building-automated-data-ml-pipelines-in-apache-airflow-1fa6d434aeb8
-
eDEX-UI – Coolest Linux Terminal
http://www.linuxandubuntu.com/home/edex-ui-coolest-linux-terminal The Linux terminal may be the most boring application for new Linux users, but it's undoubtedly very useful. Once a user gets used to it, it's more powerful than a GUI app. eDEX-UI, a fullscreen Linux terminal, system monitor, and network monitor, is the coolest terminal application I have ever…
-
Language Models are Open Knowledge Graphs .. but are hard to mine!
https://towardsdatascience.com/language-models-are-open-knowledge-graphs-but-are-hard-to-mine-13e128f3d64d Join me as I dive into the latest research on creating knowledge graphs using transformer-based language models.
-
24 January, 2021 23:40
Python in Visual Studio Code – January 2021 Release
https://devblogs.microsoft.com/python/python-in-visual-studio-code-january-2021-release/ We are pleased to announce that the January 2021 release of the Python Extension for Visual Studio Code is now available. You can download the Python extension from the Marketplace, or install it directly from the extension gallery in Visual Studio Code.
-
GitHub – jalammar/ecco: Visualize and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2).
https://github.com/jalammar/ecco
-
Data Science Infographic
https://github.com/dataprofessor/infographic Infographic 01: Building the Machine Learning Model, also available in a Portuguese translation ("Construindo um Modelo Supervisionado de Machine Learning", i.e. "Building a Supervised Machine Learning Model").
-
SIRUS: Stable and Interpretable RUle Set
https://gitlab.com/drti/sirus SIRUS is a regression and classification algorithm based on random forests that produces a stable and short list of rules.
-
CLIP: Connecting Text and Images
https://openai.com/blog/clip/
-
Build Your Own Recommendation Engine – Netflix Demystified: Demo+Code
https://towardsai.net/p/machine-learning/build-your-own-recommendation-engine-netflix-demystified-demo-code-550401d4885e User-based collaborative filtering: similar recommendations for different movies. Item-based filtering: first things first, it does not need any user-level data, and the recommendation engine can be up and running even in an isolated…
-
MLOps Tooling Landscape v2 (+84 new tools) – Dec '20
https://huyenchip.com/2020/12/30/mlops-v2.html Last June, I published the post "What I learned from looking at 200 machine learning tools." The post got some attention and I got a lot of messages from people telling me about new tools. I updated the old list to now include 284 tools.…
-
DynaSent: Dynamic Sentiment Analysis Dataset
https://github.com/cgpotts/dynasent Dataset fields: ‘hit_ids’: list of Amazon Mechanical Turk Human Intelligence Tasks (HITs) in which this example appeared during validation; the values are anonymized but used consistently throughout the dataset. ‘sentence’: the example text. ‘indices_into_review_text’: indices of ‘sentence’ into the original review in the Yelp Academic Dataset. ‘model_0_label’: prediction…
-
Koan – A word2vec negative sampling implementation with correct CBOW update. kōan only depends on Eigen.
https://github.com/bloomberg/koan Although continuous bag-of-words (CBOW) embeddings can be trained more quickly than skip-gram (SG) embeddings, it is a common belief that SG embeddings tend to perform better in practice. This was observed by the original authors of Word2Vec [1] and also…
-
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
https://pile.eleuther.ai/ The Pile is an 825 GiB diverse, open-source language modelling dataset that consists of 22 smaller, high-quality datasets combined together.
-
Ludwig
Ludwig is a toolbox that allows users to train and evaluate deep learning models without the need to write code. https://github.com/ludwig-ai/ludwig
-
Learn Python – Full Course for Beginners [Tutorial]
https://www.youtube.com/watch?v=rfscVS0vtbw&feature=youtu.be This course will give you a full introduction to all of the core concepts in Python. Follow along with the videos and you'll be a Python programmer in no t…
-
GitHub – jrieke/traingenerator: 🧙 A web app to generate template code for machine learning
https://github.com/jrieke/traingenerator
-
Financial Times Visual Vocabulary
A poster (available in English, Japanese, traditional Chinese and simplified Chinese) and website to assist designers and journalists to select the optimal symbology for data visualisations, by the Financial Times Visual Journalism Team. The FT Visual Vocabulary is at the core of a newsroom-wide training session aimed at improving chart literacy. This learning resource…
-
CS 229 – Machine Learning Tips and Tricks Cheatsheet
https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks
-
Deep Hybrid Learning — a fusion of conventional ML with state of the art DL
https://towardsdatascience.com/deep-hybrid-learning-a-fusion-of-conventional-ml-with-state-of-the-art-dl-cb43887fe14 By Aditya Bhattacharya. Considering state-of-the-art methods for unstructured data analysis, deep learning has been known to play an extremely vital role in coming up with sophisticated algorithms and model architectures, to auto-unwrap…
-
What to expect from cloud data analytics in 2021
https://cloudblog-withgoogle-com.cdn.ampproject.org/c/s/cloudblog.withgoogle.com/products/data-analytics/what-to-expect-from-cloud-data-analytics-in-2021/amp/
-
Machine Learning in Production
https://towardsdatascience.com/machine-learning-in-production-95e1999bba84
-
Classifying Sentiment from Text Reviews
https://towardsdatascience.com/classifying-sentiment-from-text-reviews-a2c65ea468d6
-
Build a Natural Language Classifier With BERT and TensorFlow
https://medium.com/better-programming/build-a-natural-language-classifier-with-bert-and-tensorflow-4770d4442d41 We cover how to build a natural language classifier using transformers (BERT) and TensorFlow 2 in Python. This is a simple, step-by-step tutorial.
-
.NET Core with NGINX on Linux
https://irina.codes/net-core-with-nginx-on-linux/ By Irina Scurtu. Having .NET Core with NGINX on Linux is easier than you might imagine. In this article I will talk about my experience related to NGINX and what it takes to configure it for the first time.
-
NumPy Illustrated: The Visual Guide to NumPy
https://medium.com/better-programming/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d By Lev Maximov, Better Programming, Dec 2020. NumPy is a fundamental library that most of the widely used Python data processing libraries are built upon (pandas), inspired by (PyTorch), or can efficiently share data with (TensorFlow, Keras…
-
Haystack is an end-to-end framework for Question Answering & Neural search
https://github.com/deepset-ai/haystack Transformers at scale for question answering & neural search, using NLP via a modular Retriever-Reader pipeline. Supports DPR, Elasticsearch, HuggingFace's Modelhub… Core features: Latest models: utilize all the latest transformer-based models (e.g. BERT, RoBERTa, MiniLM) for extractive QA, generative QA and document retrieval. Modular: multiple choices to fit your tech stack and…
-
Building A Faster & Accurate Search Engine with Transformers & Haystack
https://datamuni.com/@shivanandroy/Building-a-faster-accurate-search-engine-with-transformers
-
Console
A weekly email newsletter listing the latest open-source projects, curated by an Amazon software engineer. https://console.substack.com/
-
Google, Apple, and others show large language models trained on public data expose personal information | VentureBeat
https://venturebeat.com/2020/12/16/google-apple-and-others-show-large-language-models-trained-on-public-data-expose-personal-information/
-
Supporting content decision makers with machine learning
https://netflixtechblog.com/supporting-content-decision-makers-with-machine-learning-995b7b76006f
-
Jupyter Notebook in Excel
https://towardsdatascience.com/python-jupyter-notebooks-in-excel-5ab34fc6439
-
Spark and Docker: Your Spark development cycle just got 10x faster! – Data Mechanics Blog
https://www.datamechanics.co/blog-post/spark-and-docker-your-spark-development-cycle-just-got-ten-times-faster
-
How to Manage Python Dependencies in Spark – The Databricks Blog
https://databricks.com/blog/2020/12/22/how-to-manage-python-dependencies-in-pyspark.html
-
Unsupervised synonym harvesting
https://towardsdatascience.com/unsupervised-synonym-harvesting-d592eaaf3c15 A hybrid approach combining symbolic… (by Ajit Rajasekharan, Dec 2020). Extracting all the different ways a particular term can be referred to (synonym harvesting) is key for applications in the biomedical domain, where drugs, genes, etc. have many synonyms. While there are…
-
H2O Wave
H2O Wave is a software stack for building beautiful, low-latency, real-time, browser-based applications and dashboards entirely in Python without using HTML, JavaScript, or CSS. https://h2oai.github.io/wave/docs/getting-started/
-
GitHub – WillianFuks/tfcausalimpact: Google’s Causal Impact Algorithm Implemented on Top of TensorFlow Probability
https://github.com/WillianFuks/tfcausalimpact
-
OpenBlender
Fuel your ML Engines with Relevant Data to Boost Performance https://www.openblender.io/#/welcome
-
Azure Functions Kafka Trigger Performance Tips, Concept, and Architecture
https://tsuyoshiushio.medium.com/azure-functions-kafka-trigger-performance-tips-concept-and-architecture-ec94a31d8b93
-
Main Types of Neural Networks and its Applications
A tutorial on the main types of neural networks and their applications to real-world challenges. https://medium.com/towards-artificial-intelligence/main-types-of-neural-networks-and-its-applications-tutorial-734480d7ec8e
-
Codex Atlanticus
https://codex-atlanticus.it/
-
GitHub – graphdeeplearning/graphtransformer: Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.
We propose a generalization of transformer neural network architecture for arbitrary graphs: Graph Transformer. https://github.com/graphdeeplearning/graphtransformer
-
Facebook ReAgent Is a Reinforcement Learning Framework You Need to Know About | by Jesus Rodriguez | DataSeries | Dec 2020 | Medium
ReAgent is a new framework that streamlines the implementation of reasoning systems. https://medium.com/dataseries/facebook-reagent-its-a-reinforcement-learning-framework-you-need-to-know-about-bf5e30dba77e
-
Tiny four-bit computers are now all you need to train AI
Deep learning is an inefficient energy hog. It requires massive amounts of data and abundant computational resources, which explodes its electricity consumption. In the last few years, the overall research trend has made the problem worse. Models of gargantuan proportions—trained on billions of data points for several days—are in vogue, and likely won’t be going…
-
GitHub – activeloopai/Hub: The fastest way to access and manage datasets for PyTorch and TensorFlow. Easily build scalable data pipelines. Leading Data 2.0 http://activeloop.ai
https://github.com/activeloopai/Hub