------------------------------- Details on model training: -------------------------------

The model was trained using the tf-slim image classification model library
available at https://github.com/tensorflow/models/tree/master/research/slim.
VGG input preprocessing was used with an image resolution of 299x299.

The classification layer is defined as:

  logits, end_points = resnet_v1.resnet_v1_101(images, num_classes=5000)
  logits = tf.squeeze(logits, name='SpatialSqueeze')
  end_points['multi_predictions'] = tf.nn.sigmoid(
      logits, name='multi_predictions')

The model was trained asynchronously with 50 GPU workers and a batch size of 32
for 61995903 steps. The RMSProp optimizer was used with the following settings:

  learning_rate = tf.train.exponential_decay(
      0.045,                              # learning_rate
      slim.get_or_create_global_step(),
      552345,                             # decay_steps
      0.94,                               # learning_rate_decay_factor
      staircase=True)

  opt = tf.train.RMSPropOptimizer(
      learning_rate,
      0.9,   # decay
      0.9,   # momentum
      1.0)   # rmsprop_epsilon

The training data was formed by merging the machine-generated and
human-verified annotations (filtered to the 5000 trainable classes):

  - https://storage.googleapis.com/openimages/2017_07/annotations_machine_2017_07.tar.gz
  - https://storage.googleapis.com/openimages/2017_07/annotations_human_2017_07.tar.gz

Human-verified annotations were used whenever both were present.
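
For reference, the merge step can be sketched as below. This is a minimal
illustration rather than the actual training pipeline; the CSV column names
(ImageID, LabelName, Confidence), the extracted file names, and the
trainable-class list are assumptions, not taken from the files above.

  import csv

  def load_annotations(path):
      """Read an annotations CSV into {(image_id, label): confidence}."""
      ann = {}
      with open(path, newline='') as f:
          for row in csv.DictReader(f):
              ann[(row['ImageID'], row['LabelName'])] = float(row['Confidence'])
      return ann

  def merge_annotations(machine_csv, human_csv, trainable_classes):
      """Merge machine-generated and human-verified annotations.

      Human-verified labels override machine-generated ones whenever both
      are present; the result is filtered to the trainable classes.
      """
      merged = load_annotations(machine_csv)      # machine-generated labels
      merged.update(load_annotations(human_csv))  # human labels take precedence
      return {key: conf for key, conf in merged.items()
              if key[1] in trainable_classes}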