turbo_transformers: a fast and user-friendly runtime for transformer inference on CPU and GPU

The Transformer is the most critical algorithmic innovation in the NLP field in recent years. It brings higher model accuracy but also introduces more computation, so deploying online Transformer-based services efficiently is a major challenge. To make these costly online services more efficient, WeChat AI open-sourced a Transformer inference acceleration tool called TurboTransformers, which has the following characteristics.

https://github.com/Tencent/TurboTransformers

Published July 20, 2020 in GPU, Machine Learning, Notes by Francis

Tags: GPU, Machine Learning
