
Rotated images in Open Images

The images in the Open Images dataset may be rotated with respect to the orientation their authors intended, for several reasons explained below. We have now made the correct rotation information available in the ‘Image IDs’ CSV files, so that users can recover this orientation when training their models if they find it appropriate. Possible values are 0, 90, 180, and 270, which are the degrees (counterclockwise) by which the image and its boxes should be rotated; 2.6% of images have a non-zero rotation. nan indicates that the image was no longer available on Flickr, so no orientation was recovered.
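As a rough illustration of reading these values, here is a minimal Python sketch. The column names `ImageID` and `Rotation`, the exact spelling of the values (`90.0`, `nan`, or an empty cell), and the sample rows are assumptions made for the example; check them against the header of the file you actually downloaded.

```python
import csv
import io
import math

VALID_ROTATIONS = (0, 90, 180, 270)


def parse_rotation(raw):
    """Return the rotation in degrees as an int, or None when it is missing."""
    raw = raw.strip()
    if not raw:
        return None  # empty cell: treated as "no orientation recovered"
    value = float(raw)  # float("nan") parses to NaN
    if math.isnan(value):
        return None
    degrees = int(value)
    if degrees not in VALID_ROTATIONS:
        raise ValueError(f"unexpected rotation value: {raw!r}")
    return degrees


def load_rotations(csv_file):
    """Map each image ID to its rotation (column names are assumed)."""
    reader = csv.DictReader(csv_file)
    return {row["ImageID"]: parse_rotation(row["Rotation"]) for row in reader}


# Made-up rows, only to show the shape of the data.
SAMPLE = "ImageID,Rotation\nimg_a,0.0\nimg_b,90.0\nimg_c,nan\nimg_d,\n"
rotations = load_rotations(io.StringIO(SAMPLE))
```

Images whose value is `None` can be left untouched or excluded, depending on how much the orientation matters for the model being trained.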

For the curious reader

The 9 million images contained in the Open Images dataset were acquired from Flickr, selected among those that users shared under a CC BY 2.0 license. Flickr images are stored in the JPEG file format and are available in different sizes, including the original full-resolution image uploaded by the author. The original JPEG file (not the resizes) usually has EXIF metadata embedded in it, with information such as the camera used and the date and time the photo was taken. One such field is the orientation (rotation) of the photo, that is, how the camera was oriented with respect to the direction of gravity (detected by in-camera sensors such as inclinometers or accelerometers).

When a photo is taken, the set of pixels that the sensor captured is usually stored in a matrix, irrespective of the orientation of the camera in the real world. In other words, the top-left pixel in the matrix corresponds to the top-left pixel in the camera sensor. When the image is displayed, however, we would usually like to show the top-left corner "in the real world" in the top-left corner of the screen. This is where the metadata comes into play: we recover the information about how the camera was oriented when the photo was taken and rotate the image to display it correctly. Here you can find an illustrated explanation.
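To make the role of the metadata concrete, the sketch below maps the values of the standard EXIF Orientation tag that are pure rotations to the counterclockwise rotation that displays the stored pixels upright. The function name is hypothetical, and we do not claim that the rotation values published for Open Images were derived from this table; it only illustrates the mechanism described above.

```python
# EXIF Orientation values that are pure rotations, mapped to the
# counterclockwise rotation (in degrees) needed to display the stored
# pixel matrix upright. Values 2, 4, 5 and 7 also involve mirroring
# and are deliberately left out of this sketch.
EXIF_TO_CCW_DEGREES = {
    1: 0,    # stored as displayed
    8: 90,   # rotate 90 degrees counterclockwise
    3: 180,  # upside down
    6: 270,  # rotate 270 degrees counterclockwise (90 clockwise)
}


def display_rotation(exif_orientation):
    """Return the CCW rotation in degrees, or None for mirrored/unknown values."""
    return EXIF_TO_CCW_DEGREES.get(exif_orientation)
```

A viewer that ignores the tag shows the raw matrix, which is exactly what happens once the metadata has been stripped from the file.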

For the sake of privacy preservation, all metadata was removed when the JPEG files were stored in CVDF or Figure Eight. The orientation metadata was therefore removed as well, without the pixels being rotated accordingly. As a result, the bounding boxes were drawn on the image as recorded by the sensor, without taking rotation into account. This means that images with a non-default orientation (2.6% of Open Images) look different when downloaded from Open Images than the original on Flickr. In particular, since the Open Images visualizer loads the images from Flickr, these images would be displayed with mismatched boxes if the rotation were not compensated for (the visualizer actually had this bug during its first days).

On top of that, Flickr users can rotate the images when they upload them. This information, however, is not stored in the JPEG files and it is only applied to the resizes, not to the original file. This implies that the original file and the resizes can have different orientations (see this original JPEG and this resize from the same image). Since the orientation of the resized images is the one chosen by the user, we assume it is the correct one.

Since orientation is an inherent and important cue in images (the sky is usually at the top), and for the sake of coherence with Flickr, we decided to recover the orientation information for the images that are still available on Flickr among the 9 million images in Open Images (85.5%). Since the resized JPEG files have no metadata, we recovered the right orientation by comparing the four possible orientations (or two in the case of non-square images) and selecting the one with negligible mean squared error. As a side product, we discovered funny images for which the error is the same no matter the orientation :).
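The comparison can be sketched as follows. This is a simplified illustration using NumPy, not the code that was actually run: it assumes the stored image and the Flickr resize are already loaded as arrays at the same resolution, and the function name `recover_rotation` is hypothetical.

```python
import numpy as np


def recover_rotation(stored, reference):
    """Find the CCW rotation of `stored` that best matches `reference`.

    Returns (degrees, mean squared error). Orientations whose shape does
    not match the reference are skipped, which leaves two candidates for
    non-square images and four for square ones.
    """
    reference = reference.astype(np.float64)
    best_degrees, best_error = None, float("inf")
    for k in range(4):
        candidate = np.rot90(stored, k)  # k quarter turns counterclockwise
        if candidate.shape != reference.shape:
            continue
        error = float(np.mean((candidate.astype(np.float64) - reference) ** 2))
        if error < best_error:  # ties keep the smallest rotation
            best_degrees, best_error = 90 * k, error
    return best_degrees, best_error
```

With real JPEG files the error for the correct orientation is small rather than exactly zero, because of recompression. For images that look the same under several rotations the errors tie, and this sketch simply keeps the first candidate.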

We make the rotation information available in the ‘Image IDs’ CSV files, so that users can recover the correct orientation when training their models if they find it appropriate. Possible values are 0, 90, 180, and 270, which are the degrees (counterclockwise) by which the image and its boxes should be rotated. nan indicates that the image was not found on Flickr, so no orientation was recovered.
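For the boxes, a counterclockwise rotation can be applied directly to the coordinates. The sketch below assumes boxes given as normalized (xmin, xmax, ymin, ymax) values in [0, 1], with the origin at the top-left corner and y pointing down; the function name is hypothetical. The image itself must be rotated by the same angle.

```python
def rotate_box_ccw(xmin, xmax, ymin, ymax, degrees):
    """Rotate a normalized box counterclockwise by 0, 90, 180 or 270 degrees.

    A quarter turn counterclockwise sends the point (x, y) to (y, 1 - x),
    so the top-right corner of the image becomes the top-left corner.
    """
    if degrees not in (0, 90, 180, 270):
        raise ValueError(f"unsupported rotation: {degrees}")
    for _ in range(degrees // 90):
        # New x-range is the old y-range; new y-range is the flipped old x-range.
        xmin, xmax, ymin, ymax = ymin, ymax, 1.0 - xmax, 1.0 - xmin
    return xmin, xmax, ymin, ymax
```

For example, a wide box in the top-left corner, `(0.0, 0.5, 0.0, 0.25)`, becomes a tall box in the bottom-left corner after a 90 degree rotation.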



Published 17th May 2018