The Pile is an 825 GiB diverse, open-source language modelling dataset that consists of 22 smaller, high-quality datasets combined together. pile.eleuther.ai
The Pile: An 800GB Dataset of Diverse Text for Language Modeling