Libra: Ergonomic machine learning

https://libradocs.github.io/

About Libra
Libra is the nexus of modern machine learning. We’ve combined technologies from the most popular platforms to create a complete experience. Keras: straightforward model-building techniques for improved modularity and ease of deployment. TensorFlow: core computational fundamentals and detailed functionality. PyTorch: scalable training for high-dimensional processes.
libradocs.github.io

The Roots of Data Science

https://towardsdatascience.com/the-roots-of-data-science-77c71115229

The Roots of Data Science. How it all began | by Favio Vázquez | Aug, 2020 | Towards Data Science
John Tukey is one of the most important statisticians in history. In the fantastic article “The Future of Data Analysis” he said this: For a long time I have thought I was a statistician, interested in inferences from the particular to the general.
towardsdatascience.com

https://miro.medium.com/max/3600/1*laggHT3j9o9Fip8sTrMkzg.png


Google breaks AI performance records in MLPerf with world’s fastest training supercomputer

https://cloud.google.com/blog/products/ai-machine-learning/google-breaks-ai-performance-records-in-mlperf-with-worlds-fastest-training-supercomputer

Google wins MLPerf benchmark contest with fastest ML training supercomputer | Google Cloud Blog
Table 1: All of these MLPerf submissions trained from scratch in 33 seconds or faster on Google’s new ML supercomputer. 2. Training at scale with TensorFlow, JAX, Lingvo, and XLA. Training complex ML models using thousands of TPU chips required a combination of algorithmic techniques and optimizations in TensorFlow, JAX, Lingvo, and XLA. To provide some background, XLA is the underlying …
cloud.google.com

Applying Context Aware Spell Checking in Spark NLP

Applying Context Aware Spell Checking in Spark NLP, which is scalable, extensible, and highly accurate! By the way, you can extend it with your own training to add support for more languages or specific domains.

Blogpost https://medium.com/spark-nlp/applying-context-aware-spell-checking-in-spark-nlp-3c29c46963bc
GitHub https://github.com/JohnSnowLabs/spark-nlp

JohnSnowLabs/spark-nlp: State of the Art Natural Language Processing – GitHub
Spark NLP: State of the Art Natural Language Processing. Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provides simple, performant & accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 200+ pretrained pipelines and models in 45+ languages.
github.com

Models for Spark NLP https://nlp.johnsnowlabs.com/docs/en/models

Models – Spark NLP
High Performance NLP with Apache Spark. Offline: If you have any trouble using online pipelines or models in your environment (maybe it’s air-gapped), you can directly download them for offline use. After downloading offline models/pipelines and extracting them, here is how you can use them inside your code (the path could be a shared storage like HDFS in a cluster).
nlp.johnsnowlabs.com

Spark NLP in action https://www.johnsnowlabs.com/spark-nlp-in-action

Spark NLP in Action | John Snow Labs
Recognize Persons, Locations, Organizations and Misc entities using out of the box pretrained Deep Learning models based on GloVe (glove_100d) and BERT (ner_dl_bert) word embeddings.
www.johnsnowlabs.com

On accuracy: The pre-trained contextual spell checker model delivers a word error rate of 8.09% for fully automatic correction on the Holbrook benchmark, compared to the 20.24% error rate that JamSpell attains on the same benchmark.
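For context on what those percentages measure: word error rate is the word-level Levenshtein distance between a corrected hypothesis and the reference, divided by the reference length. A minimal generic sketch of the metric (this is the standard definition, not the models or the Holbrook data):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    r, h = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[len(r)][len(h)] / len(r)
```

One substitution plus one deletion against a six-word reference, for example, yields a WER of 2/6 ≈ 33%, which is how figures like 8.09% vs. 20.24% are compared.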

By Philip Vollet https://www.linkedin.com/posts/philipvollet_opensource-linkedin-artificialintelligence-activity-6695043270982139904-XruJ

turbo_transformers: a fast and user-friendly runtime for transformer inference on CPU and GPU

Transformer is the most critical algorithm innovation in the NLP field in recent years. It brings higher model accuracy while introducing more computation, so the efficient deployment of online Transformer-based services faces enormous challenges. To make costly Transformer online services more efficient, the WeChat AI team open-sourced a Transformer inference acceleration tool called TurboTransformers.

https://github.com/Tencent/TurboTransformers
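To see why Transformer inference is costly enough to need a dedicated runtime, a back-of-the-envelope FLOP tally of an encoder forward pass helps. This is a generic textbook-style estimate, not TurboTransformers internals; the cost terms and the BERT-base shapes used below (12 layers, d_model = 768) are assumptions for illustration:

```python
def attention_flops(seq_len: int, d_model: int, n_layers: int) -> int:
    """Rough FLOP count for a Transformer encoder forward pass.

    Tallies multiply-accumulates (x2 for FLOPs) in the Q/K/V and output
    projections, the attention score/context matmuls, and the position-wise
    feed-forward network (inner dim = 4 * d_model). Softmax, layer norms,
    and embeddings are ignored as lower-order terms.
    """
    proj = 4 * seq_len * d_model * d_model       # Q, K, V, output projections
    scores = 2 * seq_len * seq_len * d_model     # Q @ K^T and attn @ V
    ffn = 8 * seq_len * d_model * d_model        # d_model -> 4*d_model -> d_model
    return 2 * n_layers * (proj + scores + ffn)  # x2: one multiply + one add
```

The seq_len² score term makes cost grow superlinearly with sequence length (doubling the sequence more than doubles the work), which is one reason serving latency for online Transformer services is hard to control and why inference-side optimization tools exist.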