Category: NLP
-
Journalism AI – Quotes extraction for modular journalism
The aim of the project is to extract quotes from news articles using Named Entity Recognition, add coreference information, and format the results for an exploratory search tool. https://github.com/JournalismAI-2021-Quotes/quote-extraction
-
A brief timeline of NLP from Bag of Words to the Transformer family | by Fabio Chiusano | NLPlanet | Feb, 2022 | Medium
https://medium.com/nlplanet/a-brief-timeline-of-nlp-from-bag-of-words-to-the-transformer-family-7caad8bbba56
-
Improve high-value research with Hugging Face and Amazon SageMaker asynchronous inference endpoints | AWS Machine Learning Blog
https://aws.amazon.com/fr/blogs/machine-learning/improve-high-value-research-with-hugging-face-and-amazon-sagemaker-asynchronous-inference-endpoints/
-
yzpang/gold-off-policy-text-gen-iclr21
https://github.com/yzpang/gold-off-policy-text-gen-iclr21
-
Hugging Face Tasks
Hugging Face is the home for all Machine Learning tasks. Here you can find what you need to get started with a task: demos, use cases, models, datasets, and more! https://huggingface.co/tasks
-
Hugging Face Transformers with Keras: Fine-tune a non-English BERT for Named Entity Recognition
https://www.philschmid.de/huggingface-transformers-keras-tf
-
NorskRegnesentral/skweak: skweak: A software toolkit for weak supervision applied to NLP tasks
https://github.com/NorskRegnesentral/skweak
-
Open source NLP is fueling a new wave of startups
https://venturebeat-com.cdn.ampproject.org/c/s/venturebeat.com/2021/12/23/open-source-nlp-is-fueling-a-new-wave-of-startups/amp/ A growing number of startups are offering open source language models as a service, competing with heavyweights like OpenAI.
-
Fine-Tuning Bert for Tweets Classification ft. Hugging Face | by Rajan Choudhary | Dec, 2021 | Medium
https://codistro.medium.com/fine-tuning-bert-for-tweets-classification-ft-hugging-face-8afebadd5dbf
-
New Serverless Transformers using Amazon SageMaker Serverless Inference and Hugging Face
https://www.philschmid.de/serverless-transformers-sagemaker-huggingface Amazon SageMaker Serverless Inference is a fully managed serverless inference option that makes it easy for you to deploy and scale ML models built on top of AWS Lambda and fully integrated into the Amazon SageMaker service.
-
Advanced NLP with spaCy
https://spacy.io/universe/project/spacy-course
-
Interpretable_Text_Classification_And_Clustering – a Hugging Face Space by Hellisotherpeople
https://huggingface.co/spaces/Hellisotherpeople/Interpretable_Text_Classification_And_Clustering
-
d4data/bias-detection-model · Hugging Face
https://huggingface.co/d4data/bias-detection-model
-
Clustering sentence embeddings to identify intents in short text
The unsupervised learning problem of clustering short-text messages can be turned into a constrained optimization problem to automatically tune UMAP + HDBSCAN hyperparameters. The chatintents package makes it easy to implement this tuning process. User dialogue interactions can be a tremendous source of information on how to improve products or services. Understanding why people…
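The idea of tuning clustering hyperparameters as a constrained optimization problem can be sketched roughly as below. This is only an illustration of the general pattern and not the chatintents API: the gap-based 1-D clustering function is a toy stand-in for UMAP + HDBSCAN, and the names gap_cluster, score and random_search, the search ranges, and the penalty value are all assumptions made for this example. The cost is the share of points left unclustered, with a penalty added whenever the number of clusters falls outside the allowed range.

```python
import random


def gap_cluster(values, max_gap, min_cluster_size):
    """Toy stand-in for UMAP + HDBSCAN (hypothetical helper).

    Sorts 1-D points and starts a new group whenever the gap to the previous
    point exceeds max_gap. Groups smaller than min_cluster_size are labelled
    -1 (noise). Expects a non-empty list.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups, current = [], [order[0]]
    for prev, idx in zip(order, order[1:]):
        if values[idx] - values[prev] <= max_gap:
            current.append(idx)
        else:
            groups.append(current)
            current = [idx]
    groups.append(current)

    labels = [-1] * len(values)
    next_label = 0
    for group in groups:
        if len(group) >= min_cluster_size:
            for idx in group:
                labels[idx] = next_label
            next_label += 1
    return labels


def score(labels, min_clusters, max_clusters, penalty=1.0):
    """Return (cost, n_clusters): noise share plus a constraint penalty."""
    n_clusters = len(set(labels) - {-1})
    cost = labels.count(-1) / len(labels)
    if not min_clusters <= n_clusters <= max_clusters:
        cost += penalty  # constraint violated: cluster count out of range
    return cost, n_clusters


def random_search(values, n_trials, min_clusters, max_clusters, seed=0):
    """Random search over the toy hyperparameters, keeping the lowest cost."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "max_gap": rng.uniform(0.1, 5.0),        # assumed search range
            "min_cluster_size": rng.randint(2, 5),   # assumed search range
        }
        labels = gap_cluster(values, **params)
        cost, n_clusters = score(labels, min_clusters, max_clusters)
        if best is None or cost < best["cost"]:
            best = {"cost": cost, "n_clusters": n_clusters, "params": params}
    return best


# Three tight groups plus one isolated point.
points = [0.0, 0.2, 0.4, 5.0, 5.1, 5.3, 10.0, 10.2, 10.4, 20.0]
best = random_search(points, n_trials=200, min_clusters=2, max_clusters=4)
print(best)
```

With real data, gap_cluster would be replaced by a UMAP projection followed by HDBSCAN, and the random search could be swapped for a Bayesian optimizer; the cost function and the cluster-count constraint would keep the same shape.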
-
WIT (Wikipedia-based Image Text) Dataset
WIT (Wikipedia-based Image Text) is a large multimodal multilingual dataset. It is composed of a curated set of 37.6 million entity-rich image-text examples with 11.5 million unique images across 108 Wikipedia…
-
Accelerate Transformers on State of the Art Hardware
https://huggingface.co/hardware Optimum: the ML Hardware Optimization Toolkit for Production.
-
GitHub – Yale-LILY/SummerTime: An open-source text summarization toolkit for non-experts.
https://github.com/Yale-LILY/SummerTime
-
Small text: Active learning for text classification in Python
https://github.com/webis-de/small-text Requires Python 3.7 or newer. For using the GPU, CUDA 10.1 or newer is required. For a quick start, see the provided examples for binary classification, PyTorch multi-class classification, or transformer-based multi-class classification.
-
NLP needs to be open. 500+ researchers are trying to make it happen | VentureBeat
https://venturebeat-com.cdn.ampproject.org/c/s/venturebeat.com/2021/07/14/nlp-needs-to-be-open-500-researchers-are-trying-to-make-it-happen/amp/
-
artefactory/NLPretext: All the goto functions you need to handle NLP use-cases, integrated in NLPretext
https://github.com/artefactory/NLPretext
-
Deep Learning on Graphs for NLP
https://drive.google.com/file/d/1A9Gtzyan4tqFTgmNsNfwOkO4ELR77iNh/view
-
Dataiku – Analyze text data with ontology tagging
https://www.dataiku.com/product/plugins/nlp-analysis/
-
Few-shot learning in practice: GPT-Neo and the 🤗 Accelerated Inference API
https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api
-
2102.09130v1 Entity-level Factual Consistency of Abstractive Text Summarization
https://arxiv.org/abs/2102.09130v1
-
GitHub – nyu-mll/jiant: jiant is an NLP toolkit
https://github.com/nyu-mll/jiant
-
Distributed Training: Train BART/T5 for Summarization using 🤗 Transformers and Amazon SageMaker
https://huggingface.co/blog/sagemaker-distributed-training-seq2seq Tutorial: we will use the new Hugging Face DLCs and Amazon SageMaker extension to train a distributed Seq2Seq-transformer model on the summarization task using the transformers and datasets libraries, and then upload the model to huggingface.co and test it. As distributed training…
-
Summer of Language Models 21
https://bigscience.huggingface.co/en/#!index.md
-
Scaling up BERT-like model Inference on modern CPU – Part 1 (22 April, 2021 07:23)
https://huggingface.co/blog/bert-cpu-scaling-part-1
-
textflint/textflint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
https://github.com/textflint/textflint
-
The Super Duper NLP Repo
https://notebooks.quantumstat.com/
-
open-mmlab/mmocr: OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://github.com/open-mmlab/mmocr
-
Words in context: tracking context-processing during language comprehension using computational language models and MEG
https://www.biorxiv.org/content/10.1101/2020.06.19.161190v1.full
-
PAIR-code/lit: The Language Interpretability Tool: Interactively analyze NLP models for model understanding in an extensible and framework agnostic interface.
https://github.com/PAIR-code/lit/
-
Dodrio – An interactive visualization system designed to help NLP researchers and practitioners analyze and compare attention weights in transformer-based models with linguistic knowledge.
https://github.com/poloclub/dodrio
-
NLP Cypher news
https://pub.towardsai.net/the-nlp-cypher-04-04-21-9964ab34df17?source=rss----98111c9905da---4?source=social.tw
-
AllenNLP Project Gallery
https://gallery.allennlp.org/
-
Word Mover’s Distance for Text Similarity
https://towardsdatascience.com/word-movers-distance-for-text-similarity-7492aeca71b0 By Nihit Saxena, Towards Data Science. The introduction of NLP (Natural Language Processing) has revolutionized industries. NLP is a branch of AI (Artificial Intelligence) that helps computers understand, interpret and manipulate human language. Now, with heaps of data available (thanks to big…
-
The Best of NLP
https://cacm.acm.org/magazines/2021/4/251336-the-best-of-nlp/fulltext Communications of the ACM, April 2021. "Each time, the added scale gives us new capabilities to let us test new assumptions," Bosselut says. "As much as many people think we are going too far down this path, the truth is that the next iteration of language modeling could…
-
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
https://arxiv.org/abs/2005.11401 https://huggingface.co/transformers/master/model_doc/rag.html#tfragmodel "My biggest-challenge open-source collaboration with Huggingface : Tensorflow’s implementation of RAG (Retrieval Augmented Generation) is now available on Huggingface master !!! https://lnkd.in/gGVmw4X RAG ( https://lnkd.in/gdKWqqZ ) is an AI prototype that can read articles to give answers to any questions! With appropriate training data like ELI5 ( https://lnkd.in/gB4S4wj ) , it can even…
-
NLP Text-Classification in Python: PyCaret Approach Vs The Traditional Approach
https://towardsdatascience.com/nlp-classification-in-python-pycaret-approach-vs-the-traditional-approach-602d38d29f06 By Prateek Baghel, Towards Data Science. In this post we’ll see a demonstration of an NLP classification problem with 2 different approaches in Python: 1. The traditional approach: in this approach, we will preprocess the given text data using different…
-
Doccano – An open-source text annotation tool for humans.
An open-source text annotation tool for humans. Annotation features for: text classification, sequence labeling, sequence-to-sequence tasks. Label data for: sentiment analysis, named entity recognition, text summarization and … Features: collaborative annotation, multi-language support, mobile support, emoji 😄 support, dark theme, RESTful API. Install with: pip install doccano. https://github.com/doccano/doccano
-
GitHub – jalammar/ecco: Visualize and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2).
https://github.com/jalammar/ecco
-
CLIP: Connecting Text and Images
https://openai.com/blog/clip/
-
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
https://pile.eleuther.ai/ The Pile is an 825 GiB diverse, open source language modelling data set that consists of 22 smaller, high-quality datasets combined together.
-
Classifying Sentiment from Text Reviews
https://towardsdatascience.com/classifying-sentiment-from-text-reviews-a2c65ea468d6