Cramming: Training a Language Model on a Single GPU in One Day – Jonas Geiping and Tom Goldstein, University of Maryland, 2022

This repository contains code to replicate the research described in "Cramming: Training a Language Model on a Single GPU in One Day". We experiment with pretraining a BERT-type language model under limited compute, wondering "how bad can it really be?"

https://github.com/JonasGeiping/cramming

GitHub – JonasGeiping/cramming: Cramming the training of a (BERT-type) language model into limited compute.
