Notes – Page 5 – Notes de Francis

GitHub – eugeneyan/python-collab-template: 🛠 Python project template with unit tests, code coverage, linting, type checking, Makefile wrapper, and GitHub Actions.

https://github.com/eugeneyan/python-collab-template

6 July 2022

General-purpose, long-context autoregressive modeling with Perceiver AR (Google & DeepMind, 2022)

https://arxiv.org/abs/2202.07765

Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture…

1 July 2022

29 June, 2022 21:32

https://dirty-cat.github.io/stable/

29 June 2022

Full article: Five components of social design: A unified framework to support research and practice

https://www.tandfonline.com/doi/full/10.1080/14606925.2022.2088098

29 June 2022

Researchers From INRIA France Propose ‘Pythae’: An Open-Source Python Library Unifying Common And State-of-the-Art Generative AutoEncoder (GAE) Implementations – MarkTechPost

https://www.marktechpost.com/2022/06/24/researchers-from-inria-france-propose-pythae-an-open-source-python-library-unifying-common-and-state-of-the-art-generative-autoencoder-gae-implementations/ https://github.com/clementchadebec/benchmark_VAE

26 June 2022

GitHub – SelfExplainML/PiML-Toolbox: PiML (Python Interpretable Machine Learning) toolbox for model development and validation

https://github.com/SelfExplainML/PiML-Toolbox

26 June 2022

Building Data Reliability Systems Is Hard

https://seattledataguy.substack.com/p/building-data-reliability-systems

25 June 2022

GitHub – LineaLabs/lineapy: Data engineering, simplified. LineaPy creates a frictionless path for taking your data science artifact from development to production.

https://github.com/LineaLabs/lineapy

25 June 2022

Tracking Progress in Natural Language Processing

https://github.com/sebastianruder/NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks. Wish list of tasks and datasets that are still missing: bilingual dictionary induction; discourse parsing; keyphrase extraction; knowledge base population (KBP).

24 June 2022

23 June, 2022 10:54

https://github.com/facebookresearch/metaseq/tree/main/projects/OPT

23 June 2022

Simple scalable graph neural networks | by Michael Bronstein | Towards Data Science

https://towardsdatascience.com/simple-scalable-graph-neural-networks-7eb04f366d07

23 June 2022

Amazon AI Researchers Open-Source ‘Syne Tune’: A Novel Python Library For Distributed HPO With An Emphasis On Enabling Reproducible Machine Learning Research – MarkTechPost

https://www.marktechpost.com/2022/06/22/amazon-ai-researchers-open-source-syne-tune-a-novel-python-library-for-distributed-hpo-with-an-emphasis-on-enabling-reproducible-machine-learning-research/

Researchers at AWS introduced Syne Tune, a library for distributed, large-scale hyperparameter optimization (HPO). The modular design of Syne Tune makes it simple to add new optimization algorithms and swap between various execution backends to support experimentation.

23 June 2022

Salesforce AI Open-Sources ‘OmniXAI’: A Python-based Machine Learning Library That Provides One-Stop Explainable AI (XAI) Solution To analyze, Debug, And Interprets AI Models – MarkTechPost

https://www.marktechpost.com/2022/06/20/salesforce-ai-open-sources-omnixai-a-python-based-machine-learning-library-that-provides-one-stop-explainable-ai-xai-solution-to-analyze-debug-and-interprets-ai-models/

21 June 2022

GitHub – mingrammer/diagrams: Diagram as Code for prototyping cloud system architectures

https://github.com/mingrammer/diagrams

20 June 2022

Synthetic Data Is About To Transform Artificial Intelligence

https://www.forbes.com/sites/robtoews/2022/06/12/synthetic-data-is-about-to-transform-artificial-intelligence/?sh=5569c0007523

Synthetic data is one of those ideas that seems almost too good to be true.

20 June 2022

Spancat: a new approach for span labeling

https://explosion.ai/blog/spancat

16 June 2022

GitHub – salesforce/TaiChi: Open source library for few shot NLP

https://github.com/salesforce/TaiChi

16 June 2022

15 June, 2022 07:18

https://dvc.org/blog/DVC-VS-Code-extension

15 June 2022

scikit-learn/sklearn-transformers · Hugging Face

https://huggingface.co/scikit-learn/sklearn-transformers

15 June 2022

9 June, 2022 11:24

https://devblogs.microsoft.com/devops/devops-dojo-okrs-objectives-and-key-results/?utm_content=buffer0b5f4&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer

9 June 2022

Google AI Blog: Vector-Quantized Image Modeling with Improved VQGAN

https://ai.googleblog.com/2022/05/vector-quantized-image-modeling-with.html?m=1

6 June 2022

GitHub – asahi417/tner: Language model fine-tuning on NER with an easy interface and cross-domain evaluation.

https://github.com/asahi417/tner

6 June 2022

GitHub – Nixtla/neuralforecast: Scalable and user-friendly neural forecasting algorithms for time series data.

https://github.com/Nixtla/neuralforecast

6 June 2022

GitHub – astrojuanlu/talk-dataframes: Talk "Beyond pandas: The great Python dataframe showdown"

https://github.com/astrojuanlu/talk-dataframes

2 June 2022

Stenography

https://stenography.dev/

31 May 2022

GitHub – jboynyc/textnets: Text analysis with networks.

https://github.com/jboynyc/textnets

30 May 2022

How to Load Any HuggingFace Model in spaCy

https://github.com/explosion/spaCy/discussions/10768

You can use a different model just by changing the name parameter. Note that if you have any other components that rely on your Transformer, you will need to re-train your pipeline after doing this – you can’t just change the name and…
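
The point about the name parameter can be pictured with a small sketch. The dictionary below is a hypothetical, trimmed-down stand-in for the transformer section of a spaCy pipeline config, and the model names and the helper `with_transformer_name` are illustrative assumptions rather than anything taken from the discussion. It only shows that the swap touches a single field; as the thread warns, components that depend on the transformer would still need re-training.

```python
from copy import deepcopy

# Hypothetical, simplified stand-in for the transformer section of a
# spaCy config; a real config file contains many more settings.
BASE_CONFIG = {
    "components": {
        "transformer": {
            "factory": "transformer",
            "model": {"name": "roberta-base"},
        }
    }
}


def with_transformer_name(config, name):
    """Return a copy of `config` with only the transformer model name replaced."""
    new_config = deepcopy(config)  # leave the original config untouched
    new_config["components"]["transformer"]["model"]["name"] = name
    return new_config


swapped = with_transformer_name(BASE_CONFIG, "distilbert-base-uncased")
print(swapped["components"]["transformer"]["model"]["name"])
```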

23 May 2022

GitHub – labmlai/neox: Simple Annotated implementation of GPT-NeoX in PyTorch

https://github.com/labmlai/neox

22 May 2022

Python for Data Analysis, 3E

https://wesmckinney.com/book/

19 May 2022

Data Mesh Architecture

https://www.datamesh-architecture.com/

19 May 2022

Extract knowledge from text: End-to-end information extraction pipeline with spaCy and Neo4j

https://towardsdatascience.com/extract-knowledge-from-text-end-to-end-information-extraction-pipeline-with-spacy-and-neo4j-502b2b1e0754

By Tomaz Bratanic, May 2022, Towards Data Science. The goal of an information extraction pipeline is to extract structured information from unstructured text. While I have already implemented and written about an IE pipeline, I’ve noticed…

17 May 2022

Creating the Whole Machine Learning Pipeline with PyCaret

https://www.datasource.ai/uploads/624e8836466a40923b64b901b5050c0f.html

Recreating the entire experiment without PyCaret requires more than 100 lines of code in most libraries. The library also allows you to do more advanced things, such as advanced pre-processing, ensembling, generalized stacking, and other techniques that allow you to fully customize the ML pipeline and…

15 May 2022

GitHub – NannyML

https://github.com/NannyML/nannyml

Detecting silent model failure. NannyML estimates performance with an algorithm called Confidence-based Performance estimation (CBPE), developed by core contributors. It is the only open-source algorithm capable of fully capturing the impact of data drift on performance.
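
As a rough illustration of the idea behind confidence-based estimation, the sketch below estimates binary-classification accuracy without labels. It is not NannyML's implementation or API: the function name `estimate_accuracy`, the 0.5 threshold, and the sample scores are assumptions made for this example, and the estimate is only meaningful if the scores are well-calibrated probabilities of the positive class.

```python
def estimate_accuracy(calibrated_scores, threshold=0.5):
    """Estimate accuracy from calibrated positive-class probabilities alone.

    Assumption: each score p is a calibrated probability, so a positive
    prediction is correct with probability p and a negative prediction
    is correct with probability 1 - p.
    """
    if not calibrated_scores:
        raise ValueError("at least one score is required")
    expected_correct = 0.0
    for p in calibrated_scores:
        predicted_positive = p >= threshold
        expected_correct += p if predicted_positive else 1.0 - p
    return expected_correct / len(calibrated_scores)


# Illustrative scores only; no ground-truth labels are needed.
scores = [0.9, 0.8, 0.2, 0.6]
estimated = estimate_accuracy(scores)
print(f"estimated accuracy: {estimated:.3f}")
```

Confident scores (near 0 or 1) push the estimate up and scores near the threshold pull it down, which is how a shift in the score distribution shows up as an estimated performance change.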

15 May 2022

Haystack

https://haystack.deepset.ai/overview/intro

Haystack is an open-source framework for building search systems that work intelligently over large document collections. Recent advances in NLP have enabled the application of question answering, retrieval and summarization to real-world settings, and Haystack is designed to be the bridge between research and industry.

15 May 2022

GitHub – SelfExplainML/PiML-Toolbox: PiML (Python Interpretable Machine Learning) toolbox for model development and validation.

https://github.com/SelfExplainML/PiML-Toolbox

14 May 2022

13 May, 2022 15:10

https://www.scikit-yb.org/en/latest/

13 May 2022

GitHub – explosion/coreferee: Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages

https://github.com/explosion/coreferee

12 May 2022

Run Python in Your HTML

https://pyscript.net/

Run Python code in your HTML.

12 May 2022

Scalene: a high-performance CPU, GPU and memory profiler for Python

https://github.com/plasma-umass/scalene

Scalene talk (PyCon US 2021): This talk, presented at PyCon 2021, walks through Scalene’s advantages and how to use it to debug the performance of an application (and provides some technical details on its internals). We highly recommend…

12 May 2022

Salesforce AI Introduces ‘AI Economist’: A Reinforcement Learning (RL) System That Learns Dynamic Tax Policies To Optimize Equality Along With Productivity In Simulated Economies, Outperforming Alternative Tax Systems – MarkTechPost

https://www.marktechpost.com/2022/05/10/salesforce-ai-introduces-ai-economist-a-reinforcement-learning-rl-system-that-learns-dynamic-tax-policies-to-optimize-equality-along-with-productivity-in-simulated-economies-outperforming-alte/

11 May 2022

Category: Notes