https://arxiv.org/abs/2202.07765
[2202.07765] General-purpose, long-context autoregressive modeling with Perceiver AR
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality …