
Open Images Extended

Open Images Extended is a collection of sets that complement the core Open Images Dataset with additional images and/or annotations.


Crowdsourced Extension

This dataset is composed of over 478,000 images across 6,000+ categories contributed by global users of the Google Crowdsource Android app. A large majority of these images are from India, with some representation from the Middle East, Africa and Latin America. The images focus on some key categories like household objects, plants and animals, food, and people in various professions (all faces are blurred to protect privacy). Detailed information about the composition of the dataset can be found here. On average these images are simpler than those in the core Open Images Dataset, and often feature a single centered object.

Class definitions

These classes are a subset of those in the core Open Images Dataset and are identified by MIDs (machine-generated IDs), as found in Freebase or the Google Knowledge Graph API. A short description of each class is available in class-descriptions.csv.
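As a minimal sketch of how the MID-to-description mapping might be loaded, the snippet below assumes class-descriptions.csv is a headerless two-column CSV (MID, then description), as in the core dataset; that layout is an assumption here, and the sample MIDs and the helper name `load_class_descriptions` are hypothetical.

```python
import csv
import io


def load_class_descriptions(lines):
    """Build a dict mapping MID -> description.

    Assumes headerless rows of the form `MID,description`
    (an assumption about the file layout, not confirmed here).
    """
    mapping = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        mapping[row[0]] = row[1]
    return mapping


# Hypothetical sample rows standing in for class-descriptions.csv.
sample = io.StringIO(
    "/m/hypothetical_a,Example class A\n"
    '/m/hypothetical_b,"Example, class B"\n'
)
classes = load_class_descriptions(sample)
print(classes["/m/hypothetical_a"])
```

To read the real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("class-descriptions.csv", newline="", encoding="utf-8")`.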

Data Organization


Image-Level Labels

Table 1 shows the split between donated-verified labels and human-verified labels in the dataset. Donated-verified labels are tags provided by external users via the Google Crowdsource app, which were directly translated into Open Images classes and then verified by human annotators at Google. Human-verified labels are additional labels generated by other means (i.e., running proprietary models on top of the images) and then also verified by human annotators at Google.

Table 1: Image-level labels.

            Donated-Verified Labels                          Human-Verified Labels
            (labels generated by tags suggested by users)    (labels generated by other means)
Positive    388,988                                          673,034
Negative    116,448                                          96,553
Total       505,436                                          769,587
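The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers from the table; the dictionary keys and variable names are illustrative.

```python
# Counts copied from Table 1; key names are illustrative.
table = {
    "donated_verified": {"positive": 388_988, "negative": 116_448, "total": 505_436},
    "human_verified": {"positive": 673_034, "negative": 96_553, "total": 769_587},
}

positive_share = {}
for name, row in table.items():
    # Each column's total should equal positives plus negatives.
    assert row["positive"] + row["negative"] == row["total"]
    positive_share[name] = row["positive"] / row["total"]
    print(f"{name}: {positive_share[name]:.1%} of labels are positive")

# Combined number of verified image-level labels across both columns.
combined_total = sum(row["total"] for row in table.values())
print(f"combined total: {combined_total:,}")
```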

Image labels in this dataset follow the same format as the core dataset. We do not currently have bounding box labels on these images but we intend to add these in the future.
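Since the labels follow the core dataset's format, a tally like the one in Table 1 could be reproduced from the label file. The sketch below assumes a CSV with the header `ImageID,Source,LabelName,Confidence`, where a confidence of 1 marks a positive label and 0 a negative one; that layout is recalled from the core dataset and should be checked against the actual file. The sample rows, source names, and the helper `count_labels` are hypothetical.

```python
import csv
import io
from collections import Counter


def count_labels(lines):
    """Count labels per (source, polarity).

    Assumes columns ImageID, Source, LabelName, Confidence,
    with Confidence > 0 meaning a positive label (an assumption).
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        polarity = "positive" if float(row["Confidence"]) > 0 else "negative"
        counts[(row["Source"], polarity)] += 1
    return counts


# Hypothetical rows; real image IDs, sources, and MIDs will differ.
sample = io.StringIO(
    "ImageID,Source,LabelName,Confidence\n"
    "img_a,source_x,/m/hypothetical_a,1\n"
    "img_a,source_y,/m/hypothetical_b,0\n"
    "img_b,source_x,/m/hypothetical_a,1\n"
)
counts = count_labels(sample)
for (source, polarity), n in sorted(counts.items()):
    print(source, polarity, n)
```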

Downloads

The 478,000 images total about 90 GB and are spread across 10 sets of about 8.8 GB each for easier downloading.

If you find any images that are inappropriate for this dataset, contact us with metadata about the image.
Image IDs
Image Labels