This repository contains code to replicate our research described in "Cramming: Training a Language Model on a Single GPU in One Day". We experiment with pretraining a BERT-type language model with limited compute, wondering "how bad can it really be?"
https://github.com/JonasGeiping/cramming
GitHub – JonasGeiping/cramming: Cramming the training of a (BERT-type) language model into limited compute.