Category: Notes
-
GitHub – eugeneyan/python-collab-template: 🛠 Python project template with unit tests, code coverage, linting, type checking, Makefile wrapper, and GitHub Actions.
https://github.com/eugeneyan/python-collab-template
-
General-purpose, long-context autoregressive modeling with Perceiver AR (Google & DeepMind, 2022)
https://arxiv.org/abs/2202.07765 Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture…
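The scaling cost mentioned in the abstract can be made concrete with a rough attention-cost comparison. This is a back-of-the-envelope sketch, not a result quoted from the paper. It assumes an input of N tokens and L attention layers. For Perceiver AR it assumes one cross-attention layer that maps the N inputs onto M latents (M much smaller than N), followed by roughly L self-attention layers over those latents. Feed-forward layers and constant factors are ignored.

```latex
\text{decoder-only Transformer: } \mathcal{O}\!\left(L\,N^{2}\right)
\qquad
\text{Perceiver AR: } \mathcal{O}\!\left(N M + L\,M^{2}\right), \quad M \ll N
```

Under these assumptions, only the single cross-attention term grows with N. That is why the context can be lengthened without paying the quadratic cost at every layer.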
-
29 June, 2022 21:32
https://dirty-cat.github.io/stable/
-
Five components of social design: A unified framework to support research and practice
https://www.tandfonline.com/doi/full/10.1080/14606925.2022.2088098
-
Researchers From INRIA France Propose ‘Pythae’: An Open-Source Python Library Unifying Common And State-of-the-Art Generative AutoEncoder (GAE) Implementations – MarkTechPost
https://www.marktechpost.com/2022/06/24/researchers-from-inria-france-propose-pythae-an-open-source-python-library-unifying-common-and-state-of-the-art-generative-autoencoder-gae-implementations/ https://github.com/clementchadebec/benchmark_VAE
-
GitHub – SelfExplainML/PiML-Toolbox: PiML (Python Interpretable Machine Learning) toolbox for model development and validation
https://github.com/SelfExplainML/PiML-Toolbox
-
Building Data Reliability Systems Is Hard
https://seattledataguy.substack.com/p/building-data-reliability-systems
-
GitHub – LineaLabs/lineapy: Data engineering, simplified. LineaPy creates a frictionless path for taking your data science artifact from development to production.
https://github.com/LineaLabs/lineapy
-
Tracking Progress in Natural Language Processing
https://github.com/sebastianruder/NLP-progress Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks. Wish list. These are tasks and datasets that are still missing: Bilingual dictionary induction; Discourse parsing; Keyphrase extraction; Knowledge base population (KBP)
-
23 June, 2022 10:54
https://github.com/facebookresearch/metaseq/tree/main/projects/OPT
-
Simple scalable graph neural networks | by Michael Bronstein | Towards Data Science
https://towardsdatascience.com/simple-scalable-graph-neural-networks-7eb04f366d07
-
Amazon AI Researchers Open-Source ‘Syne Tune’: A Novel Python Library For Distributed HPO With An Emphasis On Enabling Reproducible Machine Learning Research – MarkTechPost
Researchers at AWS introduced Syne Tune, a library for distributed, large-scale hyperparameter optimization (HPO). The modular design of Syne Tune makes it simple to add new optimization algorithms and swap between various execution backends to support experimentation. https://www.marktechpost.com/2022/06/22/amazon-ai-researchers-open-source-syne-tune-a-novel-python-library-for-distributed-hpo-with-an-emphasis-on-enabling-reproducible-machine-learning-research/
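The modular design described above (pluggable optimization algorithms, swappable execution backends) can be illustrated with a small sketch. All names here (`Scheduler`, `Backend`, `RandomSearch`, `LocalBackend`, `tune`) are hypothetical and are not Syne Tune's API. The point is only that the tuning loop depends on two narrow interfaces, so each side can be replaced independently.

```python
import random
from typing import Callable, Dict, List, Protocol, Tuple

Config = Dict[str, float]


class Scheduler(Protocol):
    """Hypothetical interface: proposes configs and is told how they scored."""

    def suggest(self) -> Config: ...

    def on_result(self, config: Config, score: float) -> None: ...


class Backend(Protocol):
    """Hypothetical interface: evaluates one trial (in-process, on a cluster, ...)."""

    def run(self, config: Config) -> float: ...


class RandomSearch:
    """Samples each hyperparameter uniformly from its (low, high) range."""

    def __init__(self, space: Dict[str, Tuple[float, float]], seed: int = 0) -> None:
        self.space = space
        self.rng = random.Random(seed)  # seeded for reproducible trials
        self.history: List[Tuple[Config, float]] = []

    def suggest(self) -> Config:
        return {name: self.rng.uniform(lo, hi) for name, (lo, hi) in self.space.items()}

    def on_result(self, config: Config, score: float) -> None:
        self.history.append((config, score))


class LocalBackend:
    """Runs the objective in-process; a remote backend would expose the same run()."""

    def __init__(self, objective: Callable[[Config], float]) -> None:
        self.objective = objective

    def run(self, config: Config) -> float:
        return self.objective(config)


def tune(scheduler: Scheduler, backend: Backend, n_trials: int) -> Tuple[Config, float]:
    """Sequential tuning loop; returns the config with the lowest score."""
    best: Tuple[Config, float] = ({}, float("inf"))
    for _ in range(n_trials):
        config = scheduler.suggest()
        score = backend.run(config)
        scheduler.on_result(config, score)
        if score < best[1]:
            best = (config, score)
    return best


if __name__ == "__main__":
    search = RandomSearch({"lr": (0.0, 1.0)}, seed=42)
    best_cfg, best_score = tune(search, LocalBackend(lambda c: (c["lr"] - 0.1) ** 2), n_trials=50)
    print(best_cfg, best_score)
```

Swapping in a different search strategy means writing another class with `suggest`/`on_result`. Running trials elsewhere means another class with `run`. `tune` stays untouched in both cases.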
-
Salesforce AI Open-Sources ‘OmniXAI’: A Python-based Machine Learning Library That Provides One-Stop Explainable AI (XAI) Solution To analyze, Debug, And Interprets AI Models – MarkTechPost
https://www.marktechpost.com/2022/06/20/salesforce-ai-open-sources-omnixai-a-python-based-machine-learning-library-that-provides-one-stop-explainable-ai-xai-solution-to-analyze-debug-and-interprets-ai-models/
-
GitHub – mingrammer/diagrams: Diagram as Code for prototyping cloud system architectures
https://github.com/mingrammer/diagrams
-
Synthetic Data Is About To Transform Artificial Intelligence
https://www.forbes.com/sites/robtoews/2022/06/12/synthetic-data-is-about-to-transform-artificial-intelligence/?sh=5569c0007523 Synthetic data is one of those ideas that seems almost too good to be true.
-
Spancat: a new approach for span labeling
https://explosion.ai/blog/spancat
-
GitHub – salesforce/TaiChi: Open source library for few shot NLP
https://github.com/salesforce/TaiChi
-
15 June, 2022 07:18
https://dvc.org/blog/DVC-VS-Code-extension
-
scikit-learn/sklearn-transformers · Hugging Face
https://huggingface.co/scikit-learn/sklearn-transformers
-
9 June, 2022 11:24
https://devblogs.microsoft.com/devops/devops-dojo-okrs-objectives-and-key-results/?utm_content=buffer0b5f4&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer
-
Google AI Blog: Vector-Quantized Image Modeling with Improved VQGAN
https://ai.googleblog.com/2022/05/vector-quantized-image-modeling-with.html?m=1
-
GitHub – asahi417/tner: Language model fine-tuning on NER with an easy interface and cross-domain evaluation.
https://github.com/asahi417/tner
-
GitHub – Nixtla/neuralforecast: Scalable and user-friendly neural forecasting algorithms for time series data.
https://github.com/Nixtla/neuralforecast
-
GitHub – astrojuanlu/talk-dataframes: Talk « Beyond pandas: The great Python dataframe showdown »
https://github.com/astrojuanlu/talk-dataframes
-
Stenography
https://stenography.dev/
-
GitHub – jboynyc/textnets: Text analysis with networks.
https://github.com/jboynyc/textnets
-
How to Load Any HuggingFace Model in spaCy
https://github.com/explosion/spaCy/discussions/10768 You can use a different model just by changing the name parameter. Note that if you have any other components that rely on your Transformer, you will need to re-train your pipeline after doing this – you can’t just change the name and…
-
GitHub – labmlai/neox: Simple Annotated implementation of GPT-NeoX in PyTorch
https://github.com/labmlai/neox
-
Python for Data Analysis, 3E
https://wesmckinney.com/book/
-
Data Mesh Architecture
https://www.datamesh-architecture.com/
-
Extract knowledge from text: End-to-end information extraction pipeline with spaCy and Neo4j
https://towardsdatascience.com/extract-knowledge-from-text-end-to-end-information-extraction-pipeline-with-spacy-and-neo4j-502b2b1e0754 By Tomaz Bratanic, May 2022, Towards Data Science. The goal of an information extraction pipeline is to extract structured information from unstructured text. While I have already implemented and written about an IE pipeline, I’ve noticed…
-
Creating the Whole Machine Learning Pipeline with PyCaret
https://www.datasource.ai/uploads/624e8836466a40923b64b901b5050c0f.html Recreating the entire experiment without PyCaret requires more than 100 lines of code in most libraries. The library also allows you to do more advanced things, such as advanced pre-processing, ensembling, generalized stacking, and other techniques that allow you to fully customize the ML pipeline and…
-
GitHub – NannyML
Detecting silent model failure. NannyML estimates performance with an algorithm called Confidence-based Performance Estimation (CBPE), developed by its core contributors. The project describes it as the only open-source algorithm capable of fully capturing the impact of data drift on performance. https://github.com/NannyML/nannyml
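The core idea behind confidence-based performance estimation can be sketched in a few lines. If a binary classifier's scores are well calibrated, a prediction made with score p is correct with probability p when it is predicted positive, and with probability 1 - p when it is predicted negative. Expected confusion-matrix counts, and metrics such as accuracy, can therefore be estimated without ground-truth labels. This is a simplified sketch that assumes already-calibrated probabilities and a fixed threshold. It is not NannyML's API, and it omits steps the library performs, such as calibrating scores on reference data.

```python
from typing import Dict, Sequence


def expected_confusion(probas: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Expected confusion-matrix counts from calibrated P(y=1) scores; no labels needed."""
    counts = {"tp": 0.0, "fp": 0.0, "tn": 0.0, "fn": 0.0}
    for p in probas:
        if p >= threshold:
            # Predicted positive: a true positive with probability p, else a false positive.
            counts["tp"] += p
            counts["fp"] += 1.0 - p
        else:
            # Predicted negative: a true negative with probability 1 - p, else a false negative.
            counts["tn"] += 1.0 - p
            counts["fn"] += p
    return counts


def estimated_accuracy(probas: Sequence[float], threshold: float = 0.5) -> float:
    """Expected fraction of correct predictions under the calibration assumption."""
    if len(probas) == 0:
        raise ValueError("need at least one prediction")
    c = expected_confusion(probas, threshold)
    return (c["tp"] + c["tn"]) / len(probas)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.6]
    print(expected_confusion(scores))
    print(estimated_accuracy(scores))  # ≈ 0.75
```

Comparing this label-free estimate across batches of production data is what lets a drop in performance show up before labels arrive. The estimate is only as good as the calibration assumption.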
-
Haystack
Haystack is an open-source framework for building search systems that work intelligently over large document collections. Recent advances in NLP have enabled the application of question answering, retrieval and summarization to real world settings and Haystack is designed to be the bridge between research and industry. https://haystack.deepset.ai/overview/intro
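Search systems of this kind usually start with a retrieval step that narrows a large document collection down to a few candidates before heavier components, such as a question-answering reader, run. The sketch below is a generic toy sparse retriever using TF-IDF-style scoring over a tiny in-memory corpus. The class and function names are hypothetical, and this is not Haystack's API or any of its built-in retrievers.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Toy sparse retriever: scores documents by TF-IDF overlap with the query."""

    def __init__(self, docs: List[str]) -> None:
        self.docs = docs
        self.term_freqs = [Counter(tokenize(d)) for d in docs]
        n = len(docs)
        doc_freq = Counter(term for tf in self.term_freqs for term in tf)
        # Smoothed IDF: rare terms weigh more, terms in every document stay slightly positive.
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    def retrieve(self, query: str, top_k: int = 3) -> List[Tuple[int, float]]:
        """Return up to top_k (doc_index, score) pairs with a positive score, best first."""
        q = Counter(tokenize(query))
        scored = []
        for i, tf in enumerate(self.term_freqs):
            # Query terms absent from a document contribute 0 because tf[t] is 0.
            score = sum(q[t] * tf[t] * self.idf.get(t, 0.0) ** 2 for t in q)
            if score > 0:
                scored.append((i, score))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    corpus = [
        "Haystack builds search systems over documents",
        "pandas is a dataframe library",
        "Question answering relies on retrieval over documents",
    ]
    print(TfidfRetriever(corpus).retrieve("question answering retrieval"))
```

Replacing the scoring with BM25 or dense embeddings would change only `retrieve`. A downstream reader would then run on the few returned documents instead of the whole collection.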
-
13 May, 2022 15:10
https://www.scikit-yb.org/en/latest/
-
GitHub – explosion/coreferee: Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages
https://github.com/explosion/coreferee
-
Run Python in Your HTML
https://pyscript.net/ Run Python code in your HTML.
-
Scalene: a high-performance CPU, GPU and memory profiler for Python
https://github.com/plasma-umass/scalene Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python. Scalene talk (PyCon US 2021): this talk presented at PyCon 2021 walks through Scalene’s advantages and how to use it to debug the performance of an application (and provides some technical details on its internals). We highly recommend…
-
Salesforce AI Introduces ‘AI Economist’: A Reinforcement Learning (RL) System That Learns Dynamic Tax Policies To Optimize Equality Along With Productivity In Simulated Economies, Outperforming Alternative Tax Systems – MarkTechPost
https://www.marktechpost.com/2022/05/10/salesforce-ai-introduces-ai-economist-a-reinforcement-learning-rl-system-that-learns-dynamic-tax-policies-to-optimize-equality-along-with-productivity-in-simulated-economies-outperforming-alte/
-
9 May, 2022 20:42
https://github.com/Nixtla/statsforecast/tree/main/experiments/arima_prophet_adapter
-
GitHub – jobergum/browser-ml-inference: Edge Inference in Browser with Transformer NLP model
https://github.com/jobergum/browser-ml-inference
-
GitHub – replicate/cog: Containers for machine learning
https://github.com/replicate/cog
-
Startup Starter Pack | Essentials to launch your startup
https://startupstarterpack.com/
-
Tackling multiple tasks with a single visual language model
https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
-
29 April, 2022 06:28
https://scikit-learn.org/1.1/whats_new/v1.1.html
-
GitHub – chriskiehl/Gooey: Turn (almost) any Python command line program into a full GUI application with one line
https://github.com/chriskiehl/Gooey
-
GitHub – ddangelov/Top2Vec: Top2Vec learns jointly embedded topic, document and word vectors.
https://github.com/ddangelov/Top2Vec