Applying Context-Aware Spell Checking in Spark NLP

Context-aware spell checking in Spark NLP is scalable, extensible, and highly accurate. You can also extend it with your own training to add support for more languages or specific domains.


JohnSnowLabs/spark-nlp: State of the Art Natural Language Processing – GitHub
Spark NLP: State of the Art Natural Language Processing. Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 200+ pretrained pipelines and models in 45+ languages.

Models for Spark NLP

Models – Spark NLP
High Performance NLP with Apache Spark. Offline: if you have any trouble using online pipelines or models in your environment (maybe it’s air-gapped), you can download them directly for offline use. After downloading and extracting the offline models or pipelines, you can load them inside your code from a path, which could be shared storage like HDFS in a cluster.
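As a rough illustration of the offline workflow described above, here is a minimal sketch that loads an extracted spell checker model from disk and wires it into a pipeline. It assumes the Spark NLP Python package is installed and a Spark session has already been started (for example with `sparknlp.start()`). The function names, the column names, and any model path you pass in are hypothetical choices for this example, not something the linked page prescribes.

```python
def build_offline_spellcheck_pipeline(model_dir):
    """Assemble a Spark ML pipeline around a spell checker loaded from disk.

    model_dir is a hypothetical path to an extracted model folder, local or
    on shared storage such as HDFS.
    """
    # Imported lazily so this helper can be defined on a machine
    # where Spark NLP is not installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import ContextSpellCheckerModel, Tokenizer

    # Turn raw text into Spark NLP documents, then into tokens.
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    token = Tokenizer().setInputCols(["document"]).setOutputCol("token")

    # load() reads the extracted folder instead of downloading the model.
    checker = (
        ContextSpellCheckerModel.load(model_dir)
        .setInputCols(["token"])
        .setOutputCol("checked")
    )
    return Pipeline(stages=[document, token, checker])


def correct_sentences(spark, pipeline, sentences):
    """Run the pipeline over a list of strings and return text plus corrections."""
    df = spark.createDataFrame([[s] for s in sentences]).toDF("text")
    return pipeline.fit(df).transform(df).select("text", "checked.result")
```

With a running session, something like `correct_sentences(spark, build_offline_spellcheck_pipeline(path), ["..."]).show(truncate=False)` would print each input next to its corrected tokens, where `path` points at wherever you extracted the model.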

Spark NLP in action

Spark NLP in Action | John Snow Labs
Recognize Persons, Locations, Organizations and Misc entities using out-of-the-box pretrained Deep Learning models based on GloVe (glove_100d) and BERT (ner_dl_bert) word embeddings.

On accuracy: the pretrained contextual spell checker model achieves a word error rate of 8.09% for fully automatic correction on the Holbrook benchmark, compared with the 20.24% error rate that JamSpell attains on the same benchmark.
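To make the metric concrete, here is a small sketch of how a word error rate for spell correction could be computed. It assumes the corrected text and the reference text split into the same number of words, so they can be compared position by position. The Holbrook benchmark's exact scoring procedure may differ, and the sentences below are invented for illustration.

```python
def word_error_rate(predicted, reference):
    """Fraction of word positions where the prediction differs from the reference."""
    pred_tokens = predicted.split()
    ref_tokens = reference.split()
    if len(pred_tokens) != len(ref_tokens):
        # This sketch assumes one-to-one word alignment.
        raise ValueError("token counts differ; cannot align word by word")
    if not ref_tokens:
        return 0.0
    errors = sum(p != r for p, r in zip(pred_tokens, ref_tokens))
    return errors / len(ref_tokens)


# Invented example: one wrong word out of nine.
reference = "please check your spelling before you send the letter"
predicted = "please check your spelling before you sent the letter"
rate = word_error_rate(predicted, reference)
print(f"{rate:.2%}")  # prints 11.11%
```

Read this way, a rate of 8.09% means roughly 8 out of every 100 words in the automatically corrected output still differ from the reference.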

By Philip Vollet