Subset with Bounding Boxes (600 classes) and Visual Relationships
These annotation files cover the 600 boxable object classes and span the 1,743,042 training images where we annotated bounding boxes and visual relationships, as well as the full validation (41,620 images) and test (125,436 images) sets.
NOTE: we will provide visual relationship annotations on the test and validation sets soon - stay tuned!
Trouble downloading the pixels? Let us know.
Subset with Image-Level Labels (19,995 classes)
These annotation files cover all object classes. In the train set, the human-verified labels span 5,655,108 images, while the machine-generated labels span 8,853,429 images. The annotation files span the full validation (41,620 images) and test (125,436 images) sets.
Trouble downloading the pixels? Let us know.
Complete Open Images
The full set of 9,178,275 images.
Trouble downloading the pixels? Let us know.
Open Images Extended
Data Formats
Boxes
Each row defines one bounding box.
ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
000026e7ee790996,freeform,/m/07j7r,1,0.071905,0.145346,0.206591,0.391306,0,1,1,0,0
000026e7ee790996,freeform,/m/07j7r,1,0.439756,0.572466,0.264153,0.435122,0,1,1,0,0
000026e7ee790996,freeform,/m/07j7r,1,0.668455,1.000000,0.000000,0.552825,0,1,1,0,0
000062a39995e348,freeform,/m/015p6,1,0.205719,0.849912,0.154144,1.000000,0,0,0,0,0
000062a39995e348,freeform,/m/05s2s,1,0.137133,0.377634,0.000000,0.884185,1,1,0,0,0
0000c64e1253d68f,freeform,/m/07yv9,1,0.000000,0.973850,0.000000,0.043342,0,1,1,0,0
0000c64e1253d68f,freeform,/m/0k4j,1,0.000000,0.513534,0.321356,0.689661,0,1,0,0,0
0000c64e1253d68f,freeform,/m/0k4j,1,0.016515,0.268228,0.299368,0.462906,1,0,0,0,0
0000c64e1253d68f,freeform,/m/0k4j,1,0.481498,0.904376,0.232029,0.489017,1,0,0,0,0
...
ImageID: the image this box lives in.
Source: indicates how the box was made:
freeform and xclick are manually drawn boxes.
activemil are boxes produced using an enhanced version of the method [1]. These are human verified to be accurate at IoU>0.7.
LabelName: the MID of the object class this box belongs to.
Confidence: a dummy value, always 1.
XMin, XMax, YMin, YMax: coordinates of the box, in normalized image coordinates. XMin is in [0,1], where 0 is the leftmost pixel, and 1 is the rightmost pixel in the image. Y coordinates go from the top pixel (0) to the bottom pixel (1).
The attributes have the following definitions:
IsOccluded: Indicates that the object is occluded by another object in the image.
IsTruncated: Indicates that the object extends beyond the boundary of the image.
IsGroupOf: Indicates that the box spans a group of objects (e.g., a bed of flowers or a crowd of people). We asked annotators to use this tag for cases with more than 5 instances which are heavily occluding each other and are physically touching.
IsDepiction: Indicates that the object is a depiction (e.g., a cartoon or drawing of the object, not a real physical instance).
IsInside: Indicates a picture taken from the inside of the object (e.g., a car interior or inside of a building).
For each of them, value 1 indicates present, 0 not present, and -1 unknown.
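As a rough illustration of the box format above, the following sketch parses rows with Python's standard csv module and maps the normalized coordinates to pixel positions. The helper names parse_boxes and to_pixels are hypothetical, and the image width and height must come from elsewhere since the box file does not store them. Scaling by (width - 1) follows the literal description that 0 is the first pixel and 1 is the last; scaling by width is a common alternative.

```python
import csv
import io

# Two rows copied from the example above.
SAMPLE = """ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
000026e7ee790996,freeform,/m/07j7r,1,0.071905,0.145346,0.206591,0.391306,0,1,1,0,0
000062a39995e348,freeform,/m/05s2s,1,0.137133,0.377634,0.000000,0.884185,1,1,0,0,0
"""

COORDS = ("XMin", "XMax", "YMin", "YMax")
ATTRIBUTES = ("IsOccluded", "IsTruncated", "IsGroupOf", "IsDepiction", "IsInside")


def parse_boxes(text):
    """Yield one dict per box: floats for coordinates, ints for attributes."""
    for row in csv.DictReader(io.StringIO(text)):
        box = {
            "ImageID": row["ImageID"],
            "Source": row["Source"],
            "LabelName": row["LabelName"],
        }
        for key in COORDS:
            box[key] = float(row[key])
        for key in ATTRIBUTES:
            box[key] = int(row[key])  # 1 present, 0 not present, -1 unknown
        yield box


def to_pixels(box, width, height):
    """Return (left, top, right, bottom) in pixels.

    Assumption: 0 maps to pixel 0 and 1 maps to the last pixel index.
    """
    return (
        box["XMin"] * (width - 1),
        box["YMin"] * (height - 1),
        box["XMax"] * (width - 1),
        box["YMax"] * (height - 1),
    )
```

Calling list(parse_boxes(SAMPLE)) gives two boxes; passing one of them to to_pixels together with the image's real dimensions gives the corners to draw.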
Instance segmentations
Visual relationships
Each row in the file corresponds to a single annotation.
ImageID,LabelName1,LabelName2,XMin1,XMax1,YMin1,YMax1,XMin2,XMax2,YMin2,YMax2,RelationLabel
0009fde62ded08a6,/m/0342h,/m/01d380,0.2682927,0.78549093,0.4977778,0.8288889,0.2682927,0.78549093,0.4977778,0.8288889,is
00198353ef684011,/m/01mzpv,/m/04bcr3,0.23779725,0.30162704,0.6500938,0.7335835,0,0.5819775,0.6482176,0.99906194,at
001e341dd7456c72,/m/04yx4,/m/01mzpv,0.07009346,0.2859813,0.2332708,0.5203252,0.14018692,0.31588784,0.32082552,0.48405254,on
001e341dd7456c72,/m/04yx4,/m/01mzpv,0,0.28317758,0.26454034,0.5540963,0.2224299,0.3411215,0.3908693,0.4859287,on
001e341dd7456c72,/m/01599,/m/04bcr3,0.5551402,0.6084112,0.50343966,0.5490932,0.5411215,0.95981306,0.5090682,0.78361475,on
001e341dd7456c72,/m/04bcr3,/m/01d380,0.7392523,0.9990654,0.3889931,0.518449,0.7392523,0.9990654,0.3889931,0.518449,is
...
ImageID: the image this relationship instance lives in.
LabelName1: the label of the first object in the relationship triplet.
XMin1,XMax1,YMin1,YMax1: normalized bounding box coordinates of the bounding box of the first object.
LabelName2: the label of the second object in the relationship triplet, or an attribute.
XMin2,XMax2,YMin2,YMax2: If the relationship is between a pair of objects: normalized bounding box coordinates of the bounding box of the second object. For an object-attribute relationship (RelationLabel="is"): normalized bounding box of the first object (repeated). In this case, LabelName2 is an attribute.
RelationLabel: the label of the relationship ("is" in case of attributes).
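The distinction between object-object relationships and object-attribute relationships can be sketched as follows. The function name parse_relationships and the is_attribute key are hypothetical; the only rule taken from the description above is that RelationLabel "is" marks an attribute and repeats the first box.

```python
import csv
import io

# Two rows copied from the example above: one attribute, one object pair.
SAMPLE = """ImageID,LabelName1,LabelName2,XMin1,XMax1,YMin1,YMax1,XMin2,XMax2,YMin2,YMax2,RelationLabel
0009fde62ded08a6,/m/0342h,/m/01d380,0.2682927,0.78549093,0.4977778,0.8288889,0.2682927,0.78549093,0.4977778,0.8288889,is
001e341dd7456c72,/m/04yx4,/m/01mzpv,0.07009346,0.2859813,0.2332708,0.5203252,0.14018692,0.31588784,0.32082552,0.48405254,on
"""


def parse_relationships(text):
    """Yield one dict per relationship triplet."""
    for row in csv.DictReader(io.StringIO(text)):
        box1 = tuple(float(row[k]) for k in ("XMin1", "XMax1", "YMin1", "YMax1"))
        box2 = tuple(float(row[k]) for k in ("XMin2", "XMax2", "YMin2", "YMax2"))
        yield {
            "ImageID": row["ImageID"],
            "subject": row["LabelName1"],
            "object_or_attribute": row["LabelName2"],
            "relation": row["RelationLabel"],
            # "is" means LabelName2 is an attribute and box2 repeats box1.
            "is_attribute": row["RelationLabel"] == "is",
            "box1": box1,
            "box2": box2,
        }
```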
Image Labels
Human-verified and machine-generated image-level labels:
ImageID,Source,LabelName,Confidence
000026e7ee790996,verification,/m/04hgtk,0
000026e7ee790996,verification,/m/07j7r,1
000026e7ee790996,crowdsource-verification,/m/01bqvp,1
000026e7ee790996,crowdsource-verification,/m/0csby,1
000026e7ee790996,verification,/m/01_m7,0
000026e7ee790996,verification,/m/01cbzq,1
000026e7ee790996,verification,/m/01czv3,0
000026e7ee790996,verification,/m/01v4jb,0
000026e7ee790996,verification,/m/03d1rd,0
...
Source: indicates how the annotation was created:
verification are labels verified by in-house annotators at Google.
crowdsource-verification are labels verified from the Crowdsource app.
machine are machine-generated labels.
Confidence: Labels that are human-verified to be present in an image have confidence = 1 (positive labels). Labels that are human-verified to be absent from an image have confidence = 0 (negative labels). Machine-generated labels have fractional confidences, generally >= 0.5. The higher the confidence, the smaller the chance for the label to be a false positive.
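One possible way to separate positive, negative, and machine-generated labels based on the Source and Confidence columns is sketched below. The function name split_labels is hypothetical, and the last sample row (Source "machine", confidence 0.7) is an invented row for illustration only; it does not come from the dataset.

```python
import csv
import io

SAMPLE = """ImageID,Source,LabelName,Confidence
000026e7ee790996,verification,/m/04hgtk,0
000026e7ee790996,verification,/m/07j7r,1
000026e7ee790996,crowdsource-verification,/m/01bqvp,1
000026e7ee790996,machine,/m/07j7r,0.7
"""  # the final row is hypothetical, added only to exercise the machine branch


def split_labels(text):
    """Return (positives, negatives, machine) lists of (ImageID, LabelName, confidence)."""
    positives, negatives, machine = [], [], []
    for row in csv.DictReader(io.StringIO(text)):
        confidence = float(row["Confidence"])
        item = (row["ImageID"], row["LabelName"], confidence)
        if row["Source"] == "machine":
            machine.append(item)      # fractional confidence
        elif confidence == 1:
            positives.append(item)    # human-verified present
        else:
            negatives.append(item)    # human-verified absent
    return positives, negatives, machine
```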
Class Names
The class names in MID format can be converted to their short descriptions by looking them up in class-descriptions.csv:
...
/m/0pc9,Alphorn
/m/0pckp,Robin
/m/0pcm_,Larch
/m/0pcq81q,Soccer player
/m/0pcr,Alpaca
/m/0pcvyk2,Nem
/m/0pd7,Army
/m/0pdnd2t,Bengal clockvine
/m/0pdnpc9,Bushwacker
/m/0pdnsdx,Enduro
/m/0pdnymj,Gekkonidae
...
Note the presence of characters like commas and quotes. The file follows standard CSV escaping rules, e.g.:
/m/02wvth,"Fiat 500 ""topolino"""
/m/03gtp5,Lamb's quarters
/m/03hgsf0,"Lemon, lime and bitters"
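Because the file uses standard CSV escaping, a CSV parser is safer than splitting on commas. A minimal sketch with Python's csv module follows, assuming the file has two columns and no header row, as the excerpts above suggest. The function name load_class_names is hypothetical.

```python
import csv
import io

# The three escaped rows shown above.
SAMPLE = '''/m/02wvth,"Fiat 500 ""topolino"""
/m/03gtp5,Lamb's quarters
/m/03hgsf0,"Lemon, lime and bitters"
'''


def load_class_names(lines):
    """Map each MID to its short description.

    Assumption: every row has exactly two fields and there is no header.
    """
    return {mid: name for mid, name in csv.reader(lines)}


names = load_class_names(io.StringIO(SAMPLE))
```

For the real file, an open file handle can be passed in place of the io.StringIO object; opening it with newline="" is what the csv module documentation recommends.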
Image IDs
This file lists the image URLs, their Open Images IDs, rotation information, titles, authors, and license information:
ImageID,Subset,OriginalURL,OriginalLandingURL,License,AuthorProfileURL,Author,Title,
OriginalSize,OriginalMD5,Thumbnail300KURL,Rotation
...
000060e3121c7305,train,https://c1.staticflickr.com/5/4129/5215831864_46f356962f_o.jpg,\
https://www.flickr.com/photos/brokentaco/5215831864,\
https://creativecommons.org/licenses/by/2.0/,\
"https://www.flickr.com/people/brokentaco/","David","28 Nov 2010 Our new house.",\
211079,0Sad+xMj2ttXM1U8meEJ0A==,https://c1.staticflickr.com/5/4129/5215831864_ee4e8c6535_z.jpg,0
...
Each image has a unique 64-bit ID assigned. In the CSV files they appear as zero-padded hex integers, such as 000060e3121c7305.
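If the IDs are needed as integers, the conversion in both directions can be sketched as below. The function names are hypothetical, and lowercase hex padded to 16 digits is assumed, matching the examples in this document.

```python
def image_id_to_int(image_id: str) -> int:
    """Parse a zero-padded hex image ID into an integer."""
    return int(image_id, 16)


def int_to_image_id(value: int) -> str:
    """Format an integer as a 16-digit, zero-padded, lowercase hex ID."""
    return format(value, "016x")
```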
The data is as it appears on the destination websites.
OriginalSize is the download size of the original image.
OriginalMD5 is base64-encoded binary MD5, as described here.
Thumbnail300KURL is an optional URL to a thumbnail with ~300K pixels (~640x480). It is provided for the convenience of downloading the data in the absence of more convenient ways to get the images. If missing, OriginalURL must be used (and then resized to the same size, if needed). These thumbnails are generated on the fly and their contents and even resolution might be different every day.
Rotation is the number of degrees that the image should be rotated counterclockwise to match the Flickr user intended orientation (0, 90, 180, 270). nan means that this information is not available. Check this announcement for more information about the issue.
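A downloaded original can be checked against OriginalMD5 and OriginalSize, and the Rotation field can be read with nan handled explicitly. The sketch below uses only the Python standard library. The function names are hypothetical, and treating an empty Rotation field as unavailable is an assumption beyond what the description states.

```python
import base64
import hashlib
import math


def md5_matches(data: bytes, original_md5: str) -> bool:
    """Compare the base64-encoded binary MD5 of data with OriginalMD5."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes, not the hex string
    return base64.b64encode(digest).decode("ascii") == original_md5


def size_matches(data: bytes, original_size: str) -> bool:
    """Compare the byte length of data with the OriginalSize field."""
    return len(data) == int(original_size)


def parse_rotation(value: str):
    """Return the counterclockwise rotation in degrees, or None if unavailable."""
    if value == "":  # assumption: an empty field is treated like nan
        return None
    rotation = float(value)
    if math.isnan(rotation):
        return None
    return int(rotation)
```

A failed check most likely means the image at OriginalURL has changed or been replaced since the dataset was collected.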
Hierarchy for 600 boxable classes
View the set of boxable classes as a hierarchy here or download it as a JSON file:
Previous versions of the dataset
You can find information and annotations for the previous versions of the dataset in the pages for V3, V2, and V1.
References
[1] "We don't need no bounding-boxes: Training object class detectors using only human verification", Papadopoulos et al., CVPR 2016.