Applying context-aware spell checking in Spark NLP, which is scalable, extensible, and highly accurate! You can also extend it by training your own models to support more languages or specific domains.
Blog post: https://medium.com/spark-nlp/applying-context-aware-spell-checking-in-spark-nlp-3c29c46963bc
GitHub: https://github.com/JohnSnowLabs/spark-nlp
Models for Spark NLP: https://nlp.johnsnowlabs.com/docs/en/models
Spark NLP in action: https://www.johnsnowlabs.com/spark-nlp-in-action
On accuracy: on the Holbrook benchmark, the pretrained contextual spell checker model achieves a word error rate of 8.09% with fully automatic correction, compared with 20.24% for JamSpell.
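The word error rate figures are easier to read with the metric spelled out. Below is a minimal Python sketch, assuming the common definition of WER: word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of words in the reference. The Holbrook benchmark's exact scoring may differ, and `word_error_rate` and `relative_reduction` are hypothetical helpers for illustration, not part of Spark NLP.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline's errors that the improved system avoids."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # The reported figures: 20.24% (JamSpell) vs 8.09% (contextual spell checker).
    print(round(relative_reduction(20.24, 8.09), 2))
```

Applied to the reported numbers, going from 20.24% to 8.09% WER means roughly 60% fewer word errors relative to JamSpell on that benchmark.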
By Philip Vollet https://www.linkedin.com/posts/philipvollet_opensource-linkedin-artificialintelligence-activity-6695043270982139904-XruJ