
Open Images Extended

Open Images Extended is a collection of sets that complement the core Open Images Dataset with additional images and/or annotations. There are currently three extensions:

The HierText Dataset (OCR Annotations)

HierText is the first dataset featuring hierarchical annotations of text in natural scenes and documents. The dataset contains 11,639 images selected from the Open Images dataset, providing high-quality word (~1.2M), line, and paragraph level annotations. Text lines are defined as connected sequences of words that are aligned in spatial proximity and are logically connected. Text lines that belong to the same semantic topic and are geometrically coherent form paragraphs. Images in HierText are rich in text, with an average of more than 100 words per image.

We hope this dataset can help researchers develop more robust OCR models and enable research into unified OCR and layout analysis.

Check the dataset website for more details and downloads.

MIAP (More Inclusive Annotations for People)

The MIAP dataset focuses on enabling ML Fairness research. We provide additional annotations for 100,000 (70k from training and 30k from validation/test) images that contain at least one person bounding box in the original annotations.

These additional annotations provide exhaustive bounding boxes for all people in an image. Person boxes are further annotated with attribute labels for fairness research. Annotated attributes include the human-perceived gender presentation (predominantly feminine, predominantly masculine, and unknown) and perceived age range (young, middle, older, and unknown) of the localized person. This procedure adds nearly 100,000 new boxes that were not annotated under the original labeling pipeline.

Annotations on the exhaustive set enable research into the fairness properties of models trained on partial annotations and the pipelines that produce these annotations.
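As a rough illustration of working with the attribute labels in aggregate, the sketch below tallies person boxes by perceived gender presentation and perceived age range. The column names (GenderPresentation, AgePresentation), the value spellings, and the sample rows are assumptions made for this example, not the documented file layout; check the actual annotation files before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt: column names, value spellings, and rows are
# illustrative assumptions, not the real MIAP annotation files.
SAMPLE_CSV = """ImageID,XMin,XMax,YMin,YMax,GenderPresentation,AgePresentation
img_a,0.10,0.40,0.20,0.90,Predominantly Feminine,Middle
img_a,0.55,0.80,0.25,0.95,Predominantly Masculine,Older
img_b,0.30,0.60,0.10,0.70,Unknown,Young
img_b,0.05,0.25,0.15,0.65,Predominantly Masculine,Unknown
"""


def count_by_attribute(csv_text, column):
    """Count person boxes for each value found in the given attribute column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


gender_counts = count_by_attribute(SAMPLE_CSV, "GenderPresentation")
age_counts = count_by_attribute(SAMPLE_CSV, "AgePresentation")

for value, count in sorted(gender_counts.items()):
    print(f"{value}: {count}")
```

Counts like these are only meaningful in aggregate, for example to check how evenly a training or evaluation subset covers each attribute value.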

A note on perceived gender presentation and perceived age presentation

In this subset of Open Images we have annotations for both perceived gender presentation (predominantly feminine, predominantly masculine, unknown) and perceived age presentation (young, middle, older). Note that gender is not binary, and an individual's gender identity may not match their perceived gender presentation. It is not possible to label gender identity from images. Additionally, norms around gender expression vary across cultures and have changed over time. No single aspect of a person's appearance "defines" their gender expression. For example, a person may still present as predominantly masculine while wearing jewelry. Another may present as predominantly feminine while having short hair. Similarly, a person's age presentation may not represent the age they were when the picture was taken. The intention of these labels is to capture gender and age presentation as assessed by a third party based on visual cues alone, rather than an individual's self-identified gender or actual age.

That being said, these labels are valuable because they allow researchers to assess the performance of models across gender presentation, which can ultimately lead to less biased models that work well for all users. While these gender annotations will sometimes be misaligned with each individual's self-identified gender, in aggregate the annotations are useful to give us a simplified overall sense of how model performance may differ for people who present gender differently.
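One way to assess performance across gender presentation is to disaggregate a metric such as recall by annotation group. The sketch below is a minimal illustration under stated assumptions: the records are hypothetical, and whether a ground-truth person box counts as "detected" is assumed to have been decided elsewhere (for instance by an IoU threshold against model output).

```python
from collections import defaultdict

# Hypothetical records: (perceived gender presentation of a ground-truth
# person box, whether some detector found that box). Values are made up.
records = [
    ("Predominantly Feminine", True),
    ("Predominantly Feminine", False),
    ("Predominantly Masculine", True),
    ("Predominantly Masculine", True),
    ("Unknown", True),
    ("Unknown", False),
]


def recall_by_group(records):
    """Return the fraction of detected ground-truth boxes for each group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, detected in records:
        totals[group] += 1
        hits[group] += int(detected)
    return {group: hits[group] / totals[group] for group in totals}


recalls = recall_by_group(records)
# The spread between the best and worst group is one simple disparity measure.
recall_gap = max(recalls.values()) - min(recalls.values())
```

A large gap suggests that the detector serves some groups worse than others; it does not by itself explain why.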

Note, we do not support or condone the building or deployment of gender and/or age classifiers from this dataset.


Bounding box information can be found below:

The images in the MIAP dataset are a subset of the full Open Images dataset, whose image keys are listed in these files:

To download the images, use the Open Images Downloader as follows (more info in the download page):

  1. Download the downloader (open and press Ctrl + S), or directly run:
  2. Run the following script, where $IMAGE_LIST_FILE is one of the files with image key lists above:
    python downloader.py $IMAGE_LIST_FILE --download_folder=$DOWNLOAD_FOLDER --num_processes=5
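
Before starting a long download, it can help to sanity-check the image list file. The sketch below assumes each line of the list has the form SPLIT/IMAGE_ID with SPLIT being train, validation, or test; that format, the function name, and the sample IDs are assumptions for illustration, so verify them against the download page.

```python
# Assumed split names; adjust if the list files use others.
VALID_SPLITS = {"train", "validation", "test"}


def parse_image_list(lines):
    """Parse lines assumed to look like 'SPLIT/IMAGE_ID' into (split, id) pairs."""
    entries = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        split, separator, image_id = line.partition("/")
        if not separator or split not in VALID_SPLITS or not image_id:
            raise ValueError(
                f"line {line_number}: expected SPLIT/IMAGE_ID, got {line!r}"
            )
        entries.append((split, image_id))
    return entries


# Hypothetical image IDs, standing in for the contents of $IMAGE_LIST_FILE.
sample_lines = ["train/0000000000000001\n", "validation/0000000000000002\n", "\n"]
entries = parse_image_list(sample_lines)
print(f"{len(entries)} valid entries")
```

With a real file, the same function can be called as parse_image_list(open(path)), since a file object iterates over its lines.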


The following paper describes the annotation process and detailed statistics about the data. If you use the MIAP dataset in your work, please cite this article.

C. Schumann, S. Ricco, U. Prabhu, V. Ferrari, C. Pantofaru
A Step Toward More Inclusive People Annotations for Fairness
AIES, 2021.
@inproceedings{miap_aies,
  title         = {A Step Toward More Inclusive People Annotations for Fairness},
  author        = {Candice Schumann and Susanna Ricco and Utsav Prabhu and Vittorio Ferrari and Caroline Rebecca Pantofaru},
  booktitle     = {Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES)},
  year          = {2021}
}

Crowdsourced Extension

You can visually explore the crowdsourced images here.

This dataset is composed of over 382,000 images across 6,000+ categories contributed by global users of the Google Crowdsource Android app. A large majority of these images are from India, with some representation from the Middle East, Africa and Latin America. The images focus on some key categories like household objects, plants and animals, food, and people in various professions (all faces are blurred to protect privacy). Detailed information about the composition of the dataset can be found here. On average these images are simpler than those in the core Open Images Dataset, and often feature a single centered object.

Class definitions

These classes are a subset of those within the core Open Images Dataset and are identified by MIDs (Machine-generated IDs), as found in Freebase or the Google Knowledge Graph API. A short description of each class is available in class-descriptions.csv.
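
As a sketch of how the class descriptions might be used, the code below builds a lookup from MID to description. It assumes class-descriptions.csv has two columns (MID, description) and no header row; that layout and the sample rows are assumptions for illustration.

```python
import csv
import io

# Illustrative stand-in for class-descriptions.csv. The two-column,
# header-less layout is an assumption about the file.
SAMPLE_CLASS_DESCRIPTIONS = "/m/01g317,Person\n/m/0k4j,Car\n"


def load_class_descriptions(csv_text):
    """Map each MID to its human-readable description."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}


mid_to_name = load_class_descriptions(SAMPLE_CLASS_DESCRIPTIONS)
print(mid_to_name.get("/m/01g317", "<unknown MID>"))
```

Using the csv module rather than splitting on commas keeps the lookup correct for descriptions that themselves contain quoted commas.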

Data Organization


Image-Level Labels

Table 1 shows the split between donated-verified labels and human-verified labels in the dataset. Donated-verified labels are tags provided by external users via the Google Crowdsource app, which were directly translated into Open Images classes and then verified by human annotators at Google. Human-verified labels are additional labels generated by other means (i.e., running proprietary models on top of the images) and then also verified by human annotators at Google.

Table 1: Image-level labels.

Donated-Verified Labels: labels generated by tags suggested by users.
Human-Verified Labels: labels generated by other means.

            Donated-Verified Labels    Human-Verified Labels
Positive    363,339                    598,223
Negative    65,361                     74,595
Total       428,700                    672,818
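
The totals in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below recomputes each total from the positive and negative counts reported in the table and derives the share of positive labels; the variable names are chosen here for illustration.

```python
# Counts copied from Table 1.
label_counts = {
    "donated_verified": {"positive": 363_339, "negative": 65_361},
    "human_verified": {"positive": 598_223, "negative": 74_595},
}

# Total per label source, which should match the "Total" row of the table.
totals = {
    source: counts["positive"] + counts["negative"]
    for source, counts in label_counts.items()
}

# Fraction of labels in each source that are positive.
positive_share = {
    source: counts["positive"] / totals[source]
    for source, counts in label_counts.items()
}

for source in label_counts:
    print(f"{source}: total={totals[source]:,} positive={positive_share[source]:.1%}")
```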

Image labels in this dataset follow the same format as the core dataset. We do not currently have bounding box labels on these images but we intend to add these in the future.
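
Since the labels follow the core dataset's format, a short reader can separate positive from negative labels. The sketch below assumes the columns ImageID, Source, LabelName, Confidence, with Confidence 1 marking a positive label and 0 a negative one; the column names, the Source value, and the sample rows are assumptions for illustration.

```python
import csv
import io

# Hypothetical rows; column names and the Source value are assumptions.
SAMPLE_LABELS = """ImageID,Source,LabelName,Confidence
img_a,verification,/m/01g317,1
img_a,verification,/m/0k4j,0
img_b,verification,/m/0k4j,1
"""


def split_labels(csv_text):
    """Return (positives, negatives) as lists of (image_id, mid) pairs."""
    positives, negatives = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed convention: Confidence > 0 is positive, 0 is negative.
        is_positive = float(row["Confidence"]) > 0
        target = positives if is_positive else negatives
        target.append((row["ImageID"], row["LabelName"]))
    return positives, negatives


positives, negatives = split_labels(SAMPLE_LABELS)
print(f"{len(positives)} positive, {len(negatives)} negative")
```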


~80GB of 382,000 images spread across 10 ~6.50GB sets for easier downloading.

Find any images that are inappropriate for this dataset? Contact us with metadata about the image.
Image IDs
Image Labels