WIT (Wikipedia-based Image Text) Dataset

The Wikipedia-based Image Text (WIT) dataset is a large multimodal, multilingual dataset. It is composed of a curated set of 37.6 million entity-rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its size enables WIT to be used as a pretraining dataset for multimodal machine learning models.
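The counts above distinguish image-text examples from unique images: the same image can appear in several examples, for instance on pages in different languages. The following is a minimal sketch of how such counts could be computed over a tab-separated file. The column names `language` and `image_url`, the `summarize` helper, and the sample rows are assumptions made for illustration, not a description of the actual WIT release files.

```python
import csv
import io
from collections import Counter


def summarize(tsv_file):
    """Count examples, unique images, and examples per language in a TSV stream."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    examples = 0
    images = set()
    per_language = Counter()
    for row in reader:
        examples += 1
        images.add(row["image_url"])        # assumed column name
        per_language[row["language"]] += 1  # assumed column name
    return examples, len(images), per_language


# Tiny made-up sample: the same image is used in two language editions.
sample = io.StringIO(
    "language\timage_url\tcaption\n"
    "en\thttps://example.org/a.jpg\tA lighthouse\n"
    "fr\thttps://example.org/a.jpg\tUn phare\n"
    "en\thttps://example.org/b.jpg\tA bridge\n"
)
examples, unique_images, per_language = summarize(sample)
print(examples, unique_images, dict(per_language))
```

On this sample there are three examples but only two unique images, which is the same kind of gap as between 37.6 million examples and 11.5 million unique images.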

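A few unique advantages of WIT, as the figures above suggest, are its scale, its multilingual coverage, and its entity-rich examples.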

You can learn more about the WIT dataset from our arXiv paper.

https://github.com/google-research-datasets/wit

