Tag: NLP
-
Journalism AI – Quotes extraction for modular journalism
The aim of the project is to extract quotes from news articles using Named Entity Recognition, add coreference information, and format the results for an exploratory search tool. https://github.com/JournalismAI-2021-Quotes/quote-extraction
-
A brief timeline of NLP from Bag of Words to the Transformer family | by Fabio Chiusano | NLPlanet | Feb, 2022 | Medium
https://medium.com/nlplanet/a-brief-timeline-of-nlp-from-bag-of-words-to-the-transformer-family-7caad8bbba56
-
Improve high-value research with Hugging Face and Amazon SageMaker asynchronous inference endpoints | AWS Machine Learning Blog
https://aws.amazon.com/fr/blogs/machine-learning/improve-high-value-research-with-hugging-face-and-amazon-sagemaker-asynchronous-inference-endpoints/
-
yzpang/gold-off-policy-text-gen-iclr21
https://github.com/yzpang/gold-off-policy-text-gen-iclr21
-
Hugging Face Tasks
Hugging Face is the home for all Machine Learning tasks. Here you can find what you need to get started with a task: demos, use cases, models, datasets, and more! https://huggingface.co/tasks
-
Hugging Face Transformers with Keras: Fine-tune a non-English BERT for Named Entity Recognition
https://www.philschmid.de/huggingface-transformers-keras-tf
-
NorskRegnesentral/skweak: skweak: A software toolkit for weak supervision applied to NLP tasks
https://github.com/NorskRegnesentral/skweak
-
Open source NLP is fueling a new wave of startups
https://venturebeat-com.cdn.ampproject.org/c/s/venturebeat.com/2021/12/23/open-source-nlp-is-fueling-a-new-wave-of-startups/amp/ A growing number of startups are offering open source language models as a service, competing with heavyweights like OpenAI.
-
Fine-Tuning Bert for Tweets Classification ft. Hugging Face | by Rajan Choudhary | Dec, 2021 | Medium
https://codistro.medium.com/fine-tuning-bert-for-tweets-classification-ft-hugging-face-8afebadd5dbf
-
New Serverless Transformers using Amazon SageMaker Serverless Inference and Hugging Face
https://www.philschmid.de/serverless-transformers-sagemaker-huggingface Amazon SageMaker Serverless Inference is a fully managed serverless inference option that makes it easy to deploy and scale ML models, built on top of AWS Lambda and fully integrated into the Amazon SageMaker service.
-
Advanced NLP with spaCy
https://spacy.io/universe/project/spacy-course
-
Interpretable_Text_Classification_And_Clustering – a Hugging Face Space by Hellisotherpeople
https://huggingface.co/spaces/Hellisotherpeople/Interpretable_Text_Classification_And_Clustering
-
d4data/bias-detection-model · Hugging Face
https://huggingface.co/d4data/bias-detection-model
-
Clustering sentence embeddings to identify intents in short text
The unsupervised learning problem of clustering short text messages can be turned into a constrained optimization problem to automatically tune UMAP + HDBSCAN hyperparameters; the chatintents package makes this tuning process easy to implement. User dialogue interactions can be a tremendous source of information on how to improve products or services.
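The constrained-optimization idea can be sketched as a cost function over a candidate clustering: penalize unclustered "noise" points and cluster counts outside a desired range, then search hyperparameters for the lowest cost. This is an illustrative sketch, not the chatintents API; the function name and penalty value are made up.

```python
# Hypothetical cost for a candidate UMAP + HDBSCAN clustering:
# fraction of noise points, plus a penalty when the number of
# clusters falls outside a desired range. Lower is better.

def clustering_cost(labels, min_clusters=3, max_clusters=10, penalty=0.15):
    """`labels` follows HDBSCAN's convention: -1 marks a noise point."""
    n = len(labels)
    noise_fraction = sum(1 for l in labels if l == -1) / n
    n_clusters = len({l for l in labels if l != -1})
    out_of_range = not (min_clusters <= n_clusters <= max_clusters)
    return noise_fraction + (penalty if out_of_range else 0.0)

# A tuner would evaluate this cost over sampled hyperparameter settings
# (UMAP n_neighbors / n_components, HDBSCAN min_cluster_size, ...) and
# keep the configuration with the lowest cost.
good = [0, 0, 1, 1, 2, 2, 3, 3, -1, 4]          # 5 clusters, 1 noise point
bad = [0, 0, -1, -1, -1, -1, -1, -1, -1, -1]    # 1 cluster, mostly noise
```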
-
WIT (Wikipedia-based Image Text) Dataset
WIT (Wikipedia-based Image Text) is a large multimodal, multilingual dataset composed of a curated set of 37.6 million entity-rich image-text examples, with 11.5 million unique images across 108 Wikipedia languages.
-
Accelerate Transformers on State of the Art Hardware
https://huggingface.co/hardware Optimum: the ML Hardware Optimization Toolkit for Production
-
GitHub – Yale-LILY/SummerTime: An open-source text summarization toolkit for non-experts.
https://github.com/Yale-LILY/SummerTime
-
Small text: Active learning for text classification in Python
https://github.com/webis-de/small-text Requires Python 3.7 or newer; using the GPU requires CUDA 10.1 or newer. For a quick start, see the provided examples for binary classification, PyTorch multi-class classification, or transformer-based multi-class classification.
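The core of pool-based active learning, which small-text automates, is a query strategy that picks the unlabeled documents the model is least sure about. A minimal generic sketch of least-confidence sampling (this is an illustration, not the small-text API):

```python
# Least-confidence query strategy: select the k pool documents whose
# highest predicted class probability is lowest, i.e. where the model
# is least confident, and send those to a human annotator.

def least_confidence_query(probabilities, k=2):
    confidences = [(max(p), i) for i, p in enumerate(probabilities)]
    confidences.sort()  # least confident first
    return [i for _, i in confidences[:k]]

# Predicted class distributions for 4 unlabeled pool documents:
pool_probs = [
    [0.98, 0.02],  # confident
    [0.55, 0.45],  # uncertain -> worth labeling
    [0.90, 0.10],
    [0.51, 0.49],  # most uncertain
]
queried = least_confidence_query(pool_probs, k=2)
```

In a full loop, the queried documents are labeled, added to the training set, the classifier is retrained, and the query repeats until the labeling budget is spent.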
-
NLP needs to be open. 500+ researchers are trying to make it happen | VentureBeat
https://venturebeat-com.cdn.ampproject.org/c/s/venturebeat.com/2021/07/14/nlp-needs-to-be-open-500-researchers-are-trying-to-make-it-happen/amp/
-
artefactory/NLPretext: All the goto functions you need to handle NLP use-cases, integrated in NLPretext
https://github.com/artefactory/NLPretext
-
Deep learning on graph for nlp
https://drive.google.com/file/d/1A9Gtzyan4tqFTgmNsNfwOkO4ELR77iNh/view
-
Dataiku – Analyze text data with ontology tagging
https://www.dataiku.com/product/plugins/nlp-analysis/
-
Few-shot learning in practice: GPT-Neo and the 🤗 Accelerated Inference API
https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api
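Few-shot learning with a generative model like GPT-Neo works by packing labeled examples into the prompt itself and letting the model complete the label for a new input. A sketch of assembling such a prompt and the request payload (the `{"inputs": ..., "parameters": ...}` shape follows the Inference API's text-generation format; the example tweets, labels, and separator are made up):

```python
import json

# Build a few-shot prompt: labeled examples, then the query with the
# label left blank for the model to complete.
def build_few_shot_prompt(examples, query, task="Sentiment"):
    lines = [f"Tweet: {text}\n{task}: {label}" for text, label in examples]
    lines.append(f"Tweet: {query}\n{task}:")
    return "\n###\n".join(lines)

examples = [
    ("I loved the new update!", "positive"),
    ("This app keeps crashing.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Great support team, quick replies.")
payload = json.dumps({
    "inputs": prompt,
    "parameters": {"max_new_tokens": 5, "return_full_text": False},
})
# `payload` would be POSTed with an Authorization: Bearer <token> header
# to the Inference API endpoint for an EleutherAI/gpt-neo-* model.
```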
-
Entity-level Factual Consistency of Abstractive Text Summarization (arXiv:2102.09130v1)
https://arxiv.org/abs/2102.09130v1
-
GitHub – nyu-mll/jiant: jiant is an NLP toolkit
https://github.com/nyu-mll/jiant
-
Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker
https://huggingface.co/blog/sagemaker-distributed-training-seq2seq We will use the new Hugging Face DLCs and Amazon SageMaker extension to train a distributed Seq2Seq transformer model on the summarization task using the transformers and datasets libraries, then upload the model to huggingface.co and test it.
-
Summer of Language Models 21
https://bigscience.huggingface.co/en/#!index.md
-
Scaling up BERT-like model Inference on modern CPU – Part 1
https://huggingface.co/blog/bert-cpu-scaling-part-1
-
textflint/textflint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
https://github.com/textflint/textflint
-
The Super Duper NLP Repo
https://notebooks.quantumstat.com/
-
open-mmlab/mmocr: OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://github.com/open-mmlab/mmocr
-
Words in context: tracking context-processing during language comprehension using computational language models and MEG
https://www.biorxiv.org/content/10.1101/2020.06.19.161190v1.full
-
PAIR-code/lit: The Language Interpretability Tool: Interactively analyze NLP models for model understanding in an extensible and framework agnostic interface.
https://github.com/PAIR-code/lit/
-
Dodrio – An interactive visualization system designed to help NLP researchers and practitioners analyze and compare attention weights in transformer-based models with linguistic knowledge.
https://github.com/poloclub/dodrio
-
The NLP Cypher (04.04.21)
https://pub.towardsai.net/the-nlp-cypher-04-04-21-9964ab34df17?source=rss—-98111c9905da—4?source=social.tw
-
AllenNLP Project Gallery
https://gallery.allennlp.org/
-
Word Mover’s Distance for Text Similarity
https://towardsdatascience.com/word-movers-distance-for-text-similarity-7492aeca71b0 Word Mover's Distance for Text Similarity | by Nihit Saxena | Towards Data Science. NLP, the branch of AI that helps computers understand, interpret, and manipulate human language, has revolutionized many industries now that heaps of data are available.
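Word Mover's Distance is the minimum cumulative cost of "moving" one document's word mass onto another's, where cost is the distance between word embeddings. A small worked example solving the underlying transport linear program with SciPy; the 2-D toy word vectors are made up for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def word_movers_distance(d1, d2, vectors):
    """d1, d2: {word: weight} distributions with weights summing to 1."""
    w1, w2 = list(d1), list(d2)
    # Cost matrix: Euclidean distance between every pair of word vectors.
    C = np.array([[np.linalg.norm(vectors[a] - vectors[b]) for b in w2]
                  for a in w1])
    n, m = C.shape
    # Equality constraints on the flow matrix T (flattened row-major):
    # mass leaving each source word equals its weight, and mass arriving
    # at each target word equals its weight.
    A_eq, b_eq = [], []
    for i in range(n):
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1
        A_eq.append(row); b_eq.append(d1[w1[i]])
    for j in range(m):
        row = np.zeros(n * m); row[j::m] = 1
        A_eq.append(row); b_eq.append(d2[w2[j]])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun

vectors = {"happy": np.array([1.0, 0.0]),
           "glad": np.array([1.0, 0.1]),
           "sad": np.array([-1.0, 0.0])}
# Moving "happy" to the nearby "glad" is cheap; moving it to "sad" is not.
near = word_movers_distance({"happy": 1.0}, {"glad": 1.0}, vectors)
far = word_movers_distance({"happy": 1.0}, {"sad": 1.0}, vectors)
```

In practice the word vectors come from pretrained embeddings (e.g. word2vec) and the weights from normalized bag-of-words counts; libraries such as gensim provide an optimized implementation.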
-
The Best of NLP
https://cacm.acm.org/magazines/2021/4/251336-the-best-of-nlp/fulltext The Best of NLP | April 2021 | Communications of the ACM "Each time, the added scale gives us new capabilities to let us test new assumptions," Bosselut says. "As much as many people think we are going too far down this path, the truth is that the next iteration of language modeling could…
-
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
https://arxiv.org/abs/2005.11401 https://huggingface.co/transformers/master/model_doc/rag.html#tfragmodel An open-source collaboration with Hugging Face: TensorFlow's implementation of RAG (Retrieval-Augmented Generation) is now available on the Hugging Face master branch. RAG is an AI prototype that can read articles to give answers to questions; with appropriate training data such as ELI5, it can even…
-
NLP Text-Classification in Python: PyCaret Approach Vs The Traditional Approach
https://towardsdatascience.com/nlp-classification-in-python-pycaret-approach-vs-the-traditional-approach-602d38d29f06 By Prateek Baghel, Towards Data Science. In this post we'll see a demonstration of an NLP classification problem with two different approaches in Python: 1. The traditional approach, in which we preprocess the given text data using different…
-
Doccano – An open-source text annotation tool for humans.
An open-source text annotation tool for humans. Annotation features for text classification, sequence labeling, and sequence-to-sequence tasks. Label data for sentiment analysis, named entity recognition, text summarization, and more. Features: collaborative annotation, multi-language support, mobile support, emoji 😄 support, dark theme, RESTful API. pip install doccano https://github.com/doccano/doccano
-
GitHub – jalammar/ecco: Visualize and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2).
https://github.com/jalammar/ecco
-
CLIP: Connecting Text and Images
https://openai.com/blog/clip/
-
The Pile An 800GB Dataset of Diverse Text for Language Modeling
https://pile.eleuther.ai/ The Pile is an 825 GiB diverse, open-source language modelling dataset consisting of 22 smaller, high-quality datasets combined together.
-
Classifying Sentiment from Text Reviews
https://towardsdatascience.com/classifying-sentiment-from-text-reviews-a2c65ea468d6