[
  {
    "title": "BoolQ",
    "header": [
      {
        "value": "Model/adapter",
        "markdown": false,
        "metadata": {}
      },
      {
        "value": "EM",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nQuasi-exact match: Fraction of instances for which the predicted output matches a correct reference up to light processing.",
        "markdown": false,
        "lower_is_better": false,
        "metadata": {
          "metric": "EM",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "ECE (10-bin)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n10-bin expected calibration error: The average difference between the model's confidence and accuracy, averaged across 10 bins where each bin contains an equal number of points (only computed for classification tasks). Warning - not reliable for small datasets (e.g., with < 300 examples) because each bin will have very few examples.",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "ECE (10-bin)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "EM (Robustness)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nQuasi-exact match: Fraction of instances for which the predicted output matches a correct reference up to light processing.\n- Perturbation Robustness: Computes the worst case over different robustness perturbations (misspellings, formatting, contrast sets).",
        "markdown": false,
        "lower_is_better": false,
        "metadata": {
          "metric": "EM",
          "run_group": "BoolQ",
          "perturbation": "Robustness"
        }
      },
      {
        "value": "EM (Fairness)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nQuasi-exact match: Fraction of instances for which the predicted output matches a correct reference up to light processing.\n- Perturbation Fairness: Computes the worst case over different fairness perturbations (changing dialect, race of names, gender).",
        "markdown": false,
        "lower_is_better": false,
        "metadata": {
          "metric": "EM",
          "run_group": "BoolQ",
          "perturbation": "Fairness"
        }
      },
      {
        "value": "Stereotypes (race)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nStereotypical associations (race, profession): Measures uneven association of racial groups (Asian, Hispanic, White) with target professions. This measurement is based on co-occurrence statistics between the racially-associated names (word list from [Garg et al., 2018](https://www.pnas.org/doi/10.1073/pnas.1720347115); race associations based on US Census statistics) and the target professions (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Stereotypes (race)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Stereotypes (gender)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nStereotypical associations (gender, profession): Measures uneven association of gender groups (male, female) with target professions. This measurement is based on co-occurrence statistics between the gender terms (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)) and the target professions (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Stereotypes (gender)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Representation (race)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nDemographic representation (race): Measures uneven representation of racial groups (Asian, Hispanic, White). This measurement is based on disparities in the frequency statistics across racially-associated names (word list from [Garg et al., 2018](https://www.pnas.org/doi/10.1073/pnas.1720347115); race associations based on US Census statistics).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Representation (race)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Representation (gender)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nDemographic representation (gender): Measures uneven representation of gender groups (male, female). This measurement is based on disparities in the frequency statistics across gender terms (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Representation (gender)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Toxic fraction",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nToxic fraction: Fraction of model outputs that are toxic (based on the PerspectiveAPI toxicity classifier).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Toxic fraction",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Denoised inference time (s)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nDenoised inference runtime (s): Average time to process a request to the model, with performance contention removed by using profiled runtimes from multiple trials of SyntheticEfficiencyScenario.",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Denoised inference time (s)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# eval",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# eval: Number of evaluation instances.",
        "markdown": false,
        "metadata": {
          "metric": "# eval",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# train",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# train: Number of training instances (e.g., in-context examples).",
        "markdown": false,
        "metadata": {
          "metric": "# train",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "truncated",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\ntruncated: Fraction of instances where the prompt itself was truncated (implies that there were no in-context examples).",
        "markdown": false,
        "metadata": {
          "metric": "truncated",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# prompt tokens",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# prompt tokens: Number of tokens in the prompt.",
        "markdown": false,
        "metadata": {
          "metric": "# prompt tokens",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# output tokens",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# output tokens: Actual number of output tokens.",
        "markdown": false,
        "metadata": {
          "metric": "# output tokens",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# trials",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# trials: Number of trials, where in each trial we choose an independent, random set of training instances.",
        "markdown": false,
        "metadata": {
          "metric": "# trials",
          "run_group": "BoolQ"
        }
      }
    ],
    "rows": [
      [
        {
          "value": "J1-Jumbo v1 (178B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7756666666666666,
          "description": "min=0.766, mean=0.776, max=0.786, sum=2.327 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.21546167732589497,
          "description": "min=0.205, mean=0.215, max=0.223, sum=0.646 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6496666666666667,
          "description": "min=0.635, mean=0.65, max=0.659, sum=1.949 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7093333333333334,
          "description": "min=0.693, mean=0.709, max=0.73, sum=2.128 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.6195252891710069,
          "description": "min=0.55, mean=0.62, max=0.727, sum=1.859 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "J1-Large v1 (7.5B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.6833333333333332,
          "description": "min=0.652, mean=0.683, max=0.709, sum=2.05 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.10621693084730484,
          "description": "min=0.085, mean=0.106, max=0.133, sum=0.319 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5670000000000001,
          "description": "min=0.539, mean=0.567, max=0.603, sum=1.701 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6223333333333333,
          "description": "min=0.591, mean=0.622, max=0.651, sum=1.867 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.48513916883680525,
          "description": "min=0.43, mean=0.485, max=0.566, sum=1.455 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "J1-Grande v1 (17B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7216666666666667,
          "description": "min=0.712, mean=0.722, max=0.733, sum=2.165 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.15409092997354776,
          "description": "min=0.139, mean=0.154, max=0.169, sum=0.462 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6429999999999999,
          "description": "min=0.632, mean=0.643, max=0.658, sum=1.929 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6783333333333333,
          "description": "min=0.656, mean=0.678, max=0.695, sum=2.035 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.5352501416015627,
          "description": "min=0.47, mean=0.535, max=0.624, sum=1.606 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "J1-Grande v2 beta (17B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.8123333333333332,
          "description": "min=0.799, mean=0.812, max=0.823, sum=2.437 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.16655399552246586,
          "description": "min=0.155, mean=0.167, max=0.185, sum=0.5 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6923333333333334,
          "description": "min=0.669, mean=0.692, max=0.714, sum=2.077 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7636666666666668,
          "description": "min=0.751, mean=0.764, max=0.784, sum=2.291 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching run, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Jurassic-2 Jumbo (178B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.8290000000000001,
          "description": "min=0.818, mean=0.829, max=0.838, sum=2.487 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.17545319159294462,
          "description": "min=0.163, mean=0.175, max=0.198, sum=0.526 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7293333333333333,
          "description": "min=0.72, mean=0.729, max=0.736, sum=2.188 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7916666666666666,
          "description": "min=0.78, mean=0.792, max=0.798, sum=2.375 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching run, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching run, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0016666666666665,
          "description": "min=2, mean=2.002, max=2.003, sum=6.005 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Jurassic-2 Grande (17B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.826,
          "description": "min=0.816, mean=0.826, max=0.832, sum=2.478 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.20883844550071148,
          "description": "min=0.179, mean=0.209, max=0.243, sum=0.627 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.729,
          "description": "min=0.714, mean=0.729, max=0.743, sum=2.187 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7799999999999999,
          "description": "min=0.758, mean=0.78, max=0.791, sum=2.34 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching run, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching run, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.002,
          "description": "min=2.002, mean=2.002, max=2.002, sum=6.006 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Jurassic-2 Large (7.5B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7423333333333333,
          "description": "min=0.737, mean=0.742, max=0.747, sum=2.227 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.14720347227904834,
          "description": "min=0.126, mean=0.147, max=0.165, sum=0.442 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6073333333333334,
          "description": "min=0.602, mean=0.607, max=0.615, sum=1.822 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.685,
          "description": "min=0.675, mean=0.685, max=0.697, sum=2.055 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching run, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching run, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Luminous Base (13B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7186666666666666,
          "description": "min=0.7, mean=0.719, max=0.74, sum=2.156 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.06557915095556173,
          "description": "min=0.056, mean=0.066, max=0.084, sum=0.197 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.655,
          "description": "min=0.643, mean=0.655, max=0.673, sum=1.965 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6526666666666667,
          "description": "min=0.634, mean=0.653, max=0.682, sum=1.958 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching run, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.9913333333333,
          "description": "min=651.658, mean=908.991, max=1252.658, sum=2726.974 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.002,
          "description": "min=1, mean=1.002, max=1.003, sum=3.006 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Luminous Extended (30B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7666666666666666,
          "description": "min=0.752, mean=0.767, max=0.794, sum=2.3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.1289354797828563,
          "description": "min=0.11, mean=0.129, max=0.154, sum=0.387 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6586666666666666,
          "description": "min=0.637, mean=0.659, max=0.7, sum=1.976 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.711,
          "description": "min=0.692, mean=0.711, max=0.733, sum=2.133 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching run, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.9913333333333,
          "description": "min=651.658, mean=908.991, max=1252.658, sum=2726.974 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Luminous Supreme (70B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.775,
          "description": "min=0.748, mean=0.775, max=0.795, sum=2.325 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.08277086924611576,
          "description": "min=0.06, mean=0.083, max=0.111, sum=0.248 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6653333333333333,
          "description": "min=0.624, mean=0.665, max=0.693, sum=1.996 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6936666666666667,
          "description": "min=0.66, mean=0.694, max=0.713, sum=2.081 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.9913333333333,
          "description": "min=651.658, mean=908.991, max=1252.658, sum=2726.974 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Anthropic-LM v4-s3 (52B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.8153333333333332,
          "description": "min=0.814, mean=0.815, max=0.816, sum=2.446 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.035, mean=0.038, max=0.041, sum=0.114 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7563333333333334,
          "description": "min=0.751, mean=0.756, max=0.76, sum=2.269 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7816666666666667,
          "description": "min=0.778, mean=0.782, max=0.788, sum=2.345 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.6371923081597224,
          "description": "min=0.566, mean=0.637, max=0.75, sum=1.912 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.004,
          "description": "min=1.004, mean=1.004, max=1.004, sum=3.012 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "BLOOM (176B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7040000000000001,
          "description": "min=0.659, mean=0.704, max=0.728, sum=2.112 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.2086643852555177,
          "description": "min=0.153, mean=0.209, max=0.247, sum=0.626 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.642,
          "description": "min=0.595, mean=0.642, max=0.674, sum=1.926 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.656,
          "description": "min=0.601, mean=0.656, max=0.693, sum=1.968 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.852823399183769,
          "description": "min=0.665, mean=0.853, max=1.05, sum=2.558 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 897.1073333333333,
          "description": "min=636.774, mean=897.107, max=1242.774, sum=2691.322 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "T0pp (11B)\u2620",
          "description": "T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.32218942300251074,
          "description": "min=0.208, mean=0.322, max=0.435, sum=0.967 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "description": "(0)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "description": "(0)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "description": "(0)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.25,
          "description": "min=0, mean=0.25, max=0.5, sum=0.5 (2)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.3736038734018803,
          "description": "min=0.366, mean=0.374, max=0.385, sum=1.121 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 3.971666666666667,
          "description": "min=2.027, mean=3.972, max=4.988, sum=11.915 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 702.4380000000001,
          "description": "min=479.758, mean=702.438, max=905.932, sum=2107.314 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        }
      ],
      [
        {
          "value": "Cohere xlarge v20220609 (52.4B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7176666666666667,
          "description": "min=0.702, mean=0.718, max=0.74, sum=2.153 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.039674216829776156,
          "description": "min=0.037, mean=0.04, max=0.043, sum=0.119 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.614,
          "description": "min=0.601, mean=0.614, max=0.622, sum=1.842 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6666666666666666,
          "description": "min=0.657, mean=0.667, max=0.681, sum=2 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.5984045305989586,
          "description": "min=0.519, mean=0.598, max=0.705, sum=1.795 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0013333333333334,
          "description": "min=1, mean=1.001, max=1.004, sum=3.004 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere large v20220720 (13.1B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7253333333333334,
          "description": "min=0.705, mean=0.725, max=0.738, sum=2.176 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.08825401206422555,
          "description": "min=0.066, mean=0.088, max=0.106, sum=0.265 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.545,
          "description": "min=0.514, mean=0.545, max=0.566, sum=1.635 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6756666666666667,
          "description": "min=0.653, mean=0.676, max=0.695, sum=2.027 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.4208381308593749,
          "description": "min=0.359, mean=0.421, max=0.505, sum=1.263 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere medium v20220720 (6.1B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.659,
          "description": "min=0.65, mean=0.659, max=0.667, sum=1.977 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.08218351589951171,
          "description": "min=0.069, mean=0.082, max=0.093, sum=0.247 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5619999999999999,
          "description": "min=0.556, mean=0.562, max=0.573, sum=1.686 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5973333333333333,
          "description": "min=0.589, mean=0.597, max=0.61, sum=1.792 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.34952371158854173,
          "description": "min=0.308, mean=0.35, max=0.402, sum=1.049 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere small v20220720 (410M)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.45733333333333337,
          "description": "min=0.447, mean=0.457, max=0.464, sum=1.372 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.09496766959019069,
          "description": "min=0.072, mean=0.095, max=0.124, sum=0.285 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.361,
          "description": "min=0.352, mean=0.361, max=0.378, sum=1.083 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.37366666666666665,
          "description": "min=0.346, mean=0.374, max=0.396, sum=1.121 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.36694511328125,
          "description": "min=0.319, mean=0.367, max=0.436, sum=1.101 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0013333333333334,
          "description": "min=1, mean=1.001, max=1.004, sum=3.004 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere xlarge v20221108 (52.4B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7616666666666667,
          "description": "min=0.761, mean=0.762, max=0.763, sum=2.285 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.05127903463780418,
          "description": "min=0.037, mean=0.051, max=0.062, sum=0.154 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7176666666666667,
          "description": "min=0.712, mean=0.718, max=0.722, sum=2.153 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7079999999999999,
          "description": "min=0.702, mean=0.708, max=0.72, sum=2.124 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere medium v20221108 (6.1B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.6999999999999998,
          "description": "min=0.693, mean=0.7, max=0.704, sum=2.1 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.09459272512018041,
          "description": "min=0.088, mean=0.095, max=0.105, sum=0.284 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.54,
          "description": "min=0.508, mean=0.54, max=0.568, sum=1.62 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6416666666666667,
          "description": "min=0.626, mean=0.642, max=0.652, sum=1.925 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere Command beta (6.1B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.798,
          "description": "min=0.791, mean=0.798, max=0.809, sum=2.394 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0594622129465324,
          "description": "min=0.048, mean=0.059, max=0.069, sum=0.178 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7253333333333334,
          "description": "min=0.715, mean=0.725, max=0.743, sum=2.176 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7479999999999999,
          "description": "min=0.74, mean=0.748, max=0.76, sum=2.244 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere Command beta (52.4B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.8563333333333333,
          "description": "min=0.849, mean=0.856, max=0.86, sum=2.569 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.02302613493537822,
          "description": "min=0.018, mean=0.023, max=0.026, sum=0.069 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.8106666666666666,
          "description": "min=0.806, mean=0.811, max=0.816, sum=2.432 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.8216666666666667,
          "description": "min=0.812, mean=0.822, max=0.827, sum=2.465 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "GPT-J (6B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.6486666666666667,
          "description": "min=0.646, mean=0.649, max=0.65, sum=1.946 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.062432673938629946,
          "description": "min=0.043, mean=0.062, max=0.086, sum=0.187 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.621,
          "description": "min=0.608, mean=0.621, max=0.631, sum=1.863 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6386666666666666,
          "description": "min=0.638, mean=0.639, max=0.64, sum=1.916 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.49915384031836946,
          "description": "min=0.354, mean=0.499, max=0.575, sum=1.497 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "GPT-NeoX (20B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.6826666666666666,
          "description": "min=0.659, mean=0.683, max=0.714, sum=2.048 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.19500535688345313,
          "description": "min=0.168, mean=0.195, max=0.238, sum=0.585 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.551,
          "description": "min=0.548, mean=0.551, max=0.556, sum=1.653 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.609,
          "description": "min=0.594, mean=0.609, max=0.629, sum=1.827 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.772616056262233,
          "description": "min=0.515, mean=0.773, max=1.206, sum=2.318 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 913.8969999999999,
          "description": "min=656.897, mean=913.897, max=1251.897, sum=2741.691 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Pythia (6.9B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.631,
          "description": "min=0.631, mean=0.631, max=0.631, sum=0.631 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.10596147166386737,
          "description": "min=0.106, mean=0.106, max=0.106, sum=0.106 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.527,
          "description": "min=0.527, mean=0.527, max=0.527, sum=0.527 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.552,
          "description": "min=0.552, mean=0.552, max=0.552, sum=0.552 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Pythia (12B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.662,
          "description": "min=0.662, mean=0.662, max=0.662, sum=0.662 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.13986557582802048,
          "description": "min=0.14, mean=0.14, max=0.14, sum=0.14 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.51,
          "description": "min=0.51, mean=0.51, max=0.51, sum=0.51 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.547,
          "description": "min=0.547, mean=0.547, max=0.547, sum=0.547 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "T5 (11B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7610000000000001,
          "description": "min=0.732, mean=0.761, max=0.803, sum=2.283 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.43269382093398495,
          "description": "min=0.348, mean=0.433, max=0.512, sum=1.298 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6503333333333333,
          "description": "min=0.624, mean=0.65, max=0.688, sum=1.951 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7226666666666667,
          "description": "min=0.697, mean=0.723, max=0.766, sum=2.168 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6666666666666666,
          "description": "min=0.667, mean=0.667, max=0.667, sum=2 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.375,
          "description": "min=0.125, mean=0.375, max=0.5, sum=1.125 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.27128291567197677,
          "description": "min=0.27, mean=0.271, max=0.272, sum=0.814 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.5883333333333332,
          "description": "min=0.969, mean=1.588, max=2.006, sum=4.765 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.004,
          "description": "min=0.004, mean=0.004, max=0.004, sum=0.012 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 401.94433333333336,
          "description": "min=386.367, mean=401.944, max=422.649, sum=1205.833 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "UL2 (20B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7456666666666667,
          "description": "min=0.717, mean=0.746, max=0.762, sum=2.237 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.45980755585445926,
          "description": "min=0.416, mean=0.46, max=0.512, sum=1.379 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.646,
          "description": "min=0.638, mean=0.646, max=0.651, sum=1.938 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6983333333333334,
          "description": "min=0.672, mean=0.698, max=0.714, sum=2.095 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.23015873015873015,
          "description": "min=0.167, mean=0.23, max=0.357, sum=0.69 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.001,
          "description": "min=0.001, mean=0.001, max=0.001, sum=0.003 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.3127442524572212,
          "description": "min=0.292, mean=0.313, max=0.341, sum=0.938 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.5696666666666668,
          "description": "min=0.953, mean=1.57, max=1.978, sum=4.709 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.004,
          "description": "min=0.004, mean=0.004, max=0.004, sum=0.012 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 402.2846666666667,
          "description": "min=386.826, mean=402.285, max=424.449, sum=1206.854 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "OPT (175B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.793,
          "description": "min=0.777, mean=0.793, max=0.813, sum=2.379 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.19360710050007168,
          "description": "min=0.177, mean=0.194, max=0.218, sum=0.581 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.623,
          "description": "min=0.584, mean=0.623, max=0.662, sum=1.869 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.731,
          "description": "min=0.712, mean=0.731, max=0.746, sum=2.193 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.869335141547284,
          "description": "min=0.71, mean=0.869, max=0.954, sum=2.608 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "OPT (66B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7603333333333332,
          "description": "min=0.753, mean=0.76, max=0.764, sum=2.281 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.20047176103986394,
          "description": "min=0.193, mean=0.2, max=0.206, sum=0.601 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6829999999999999,
          "description": "min=0.666, mean=0.683, max=0.701, sum=2.049 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7103333333333333,
          "description": "min=0.696, mean=0.71, max=0.721, sum=2.131 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.8336340090708299,
          "description": "min=0.272, mean=0.834, max=1.907, sum=2.501 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "LLaMA (7B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.756,
          "description": "min=0.756, mean=0.756, max=0.756, sum=0.756 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.292, mean=0.292, max=0.292, sum=0.292 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.688,
          "description": "min=0.688, mean=0.688, max=0.688, sum=0.688 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.71,
          "description": "min=0.71, mean=0.71, max=0.71, sum=0.71 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "LLaMA (13B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.714,
          "description": "min=0.714, mean=0.714, max=0.714, sum=0.714 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.078, mean=0.078, max=0.078, sum=0.078 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.67,
          "description": "min=0.67, mean=0.67, max=0.67, sum=0.67 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.666,
          "description": "min=0.666, mean=0.666, max=0.666, sum=0.666 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "LLaMA (30B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.861,
          "description": "min=0.861, mean=0.861, max=0.861, sum=0.861 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.164, mean=0.164, max=0.164, sum=0.164 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.791,
          "description": "min=0.791, mean=0.791, max=0.791, sum=0.791 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.813,
          "description": "min=0.813, mean=0.813, max=0.813, sum=0.813 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "LLaMA (65B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.871,
          "description": "min=0.871, mean=0.871, max=0.871, sum=0.871 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.84,
          "description": "min=0.84, mean=0.84, max=0.84, sum=0.84 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.847,
          "description": "min=0.847, mean=0.847, max=0.847, sum=0.847 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Llama 2 (7B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.762,
          "description": "min=0.762, mean=0.762, max=0.762, sum=0.762 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.215, mean=0.215, max=0.215, sum=0.215 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.676,
          "description": "min=0.676, mean=0.676, max=0.676, sum=0.676 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.706,
          "description": "min=0.706, mean=0.706, max=0.706, sum=0.706 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.296,
          "description": "min=1.296, mean=1.296, max=1.296, sum=1.296 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Llama 2 (13B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.811,
          "description": "min=0.811, mean=0.811, max=0.811, sum=0.811 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.116, mean=0.116, max=0.116, sum=0.116 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.753,
          "description": "min=0.753, mean=0.753, max=0.753, sum=0.753 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.732,
          "description": "min=0.732, mean=0.732, max=0.732, sum=0.732 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Llama 2 (70B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.886,
          "description": "min=0.886, mean=0.886, max=0.886, sum=0.886 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.863,
          "description": "min=0.863, mean=0.863, max=0.863, sum=0.863 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.859,
          "description": "min=0.859, mean=0.859, max=0.859, sum=0.859 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Alpaca (7B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.778,
          "description": "min=0.778, mean=0.778, max=0.778, sum=0.778 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.3432802705941571,
          "description": "min=0.343, mean=0.343, max=0.343, sum=0.343 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.643,
          "description": "min=0.643, mean=0.643, max=0.643, sum=0.643 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.729,
          "description": "min=0.729, mean=0.729, max=0.729, sum=0.729 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5,
          "description": "min=0.5, mean=0.5, max=0.5, sum=0.5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 4.883,
          "description": "min=4.883, mean=4.883, max=4.883, sum=4.883 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Vicuna v1.3 (7B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.76,
          "description": "min=0.76, mean=0.76, max=0.76, sum=0.76 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.322404542566261,
          "description": "min=0.322, mean=0.322, max=0.322, sum=0.322 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.672,
          "description": "min=0.672, mean=0.672, max=0.672, sum=0.672 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.67,
          "description": "min=0.67, mean=0.67, max=0.67, sum=0.67 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 4.412,
          "description": "min=4.412, mean=4.412, max=4.412, sum=4.412 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Vicuna v1.3 (13B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.808,
          "description": "min=0.808, mean=0.808, max=0.808, sum=0.808 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.15912327464389103,
          "description": "min=0.159, mean=0.159, max=0.159, sum=0.159 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.757,
          "description": "min=0.757, mean=0.757, max=0.757, sum=0.757 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.748,
          "description": "min=0.748, mean=0.748, max=0.748, sum=0.748 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 4.996,
          "description": "min=4.996, mean=4.996, max=4.996, sum=4.996 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Mistral v0.1 (7B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.874,
          "description": "min=0.874, mean=0.874, max=0.874, sum=0.874 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.837,
          "description": "min=0.837, mean=0.837, max=0.837, sum=0.837 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.842,
          "description": "min=0.842, mean=0.842, max=0.842, sum=0.842 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1418.259,
          "description": "min=1418.259, mean=1418.259, max=1418.259, sum=1418.259 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "TNLG v2 (530B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.8093333333333333,
          "description": "min=0.798, mean=0.809, max=0.829, sum=2.428 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.04811928896988451,
          "description": "min=0.017, mean=0.048, max=0.088, sum=0.144 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7326666666666667,
          "description": "min=0.724, mean=0.733, max=0.747, sum=2.198 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7666666666666667,
          "description": "min=0.756, mean=0.767, max=0.777, sum=2.3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "TNLG v2 (6.7B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.6983333333333334,
          "description": "min=0.685, mean=0.698, max=0.709, sum=2.095 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.06514212406382298,
          "description": "min=0.063, mean=0.065, max=0.067, sum=0.195 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.638,
          "description": "min=0.623, mean=0.638, max=0.653, sum=1.914 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6653333333333333,
          "description": "min=0.649, mean=0.665, max=0.674, sum=1.996 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "davinci (175B)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "markdown": false
        },
        {
          "value": 0.7223333333333333,
          "description": "min=0.679, mean=0.722, max=0.77, sum=2.167 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.07164645838795872,
          "description": "min=0.047, mean=0.072, max=0.103, sum=0.215 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.6393333333333334,
          "description": "min=0.592, mean=0.639, max=0.677, sum=1.918 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.682,
          "description": "min=0.635, mean=0.682, max=0.729, sum=2.046 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.21022733463541673,
          "description": "min=0.204, mean=0.21, max=0.217, sum=0.631 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "curie (6.7B)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "markdown": false
        },
        {
          "value": 0.6563333333333333,
          "description": "min=0.597, mean=0.656, max=0.704, sum=1.969 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.07881150352718548,
          "description": "min=0.051, mean=0.079, max=0.115, sum=0.236 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.545,
          "description": "min=0.484, mean=0.545, max=0.599, sum=1.635 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.594,
          "description": "min=0.535, mean=0.594, max=0.631, sum=1.782 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.09988102712673615,
          "description": "min=0.096, mean=0.1, max=0.104, sum=0.3 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "babbage (1.3B)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "markdown": false
        },
        {
          "value": 0.5743333333333334,
          "description": "min=0.52, mean=0.574, max=0.623, sum=1.723 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.06758031979129187,
          "description": "min=0.036, mean=0.068, max=0.089, sum=0.203 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.47700000000000004,
          "description": "min=0.432, mean=0.477, max=0.522, sum=1.431 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.43566666666666665,
          "description": "min=0.404, mean=0.436, max=0.457, sum=1.307 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.12137238953993056,
          "description": "min=0.119, mean=0.121, max=0.125, sum=0.364 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "ada (350M)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "markdown": false
        },
        {
          "value": 0.5810000000000001,
          "description": "min=0.525, mean=0.581, max=0.627, sum=1.743 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.06655133808072823,
          "description": "min=0.049, mean=0.067, max=0.09, sum=0.2 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.461,
          "description": "min=0.349, mean=0.461, max=0.549, sum=1.383 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.5066666666666667,
          "description": "min=0.421, mean=0.507, max=0.575, sum=1.52 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.14052770182291666,
          "description": "min=0.14, mean=0.141, max=0.141, sum=0.422 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.004,
          "description": "min=1, mean=1.004, max=1.008, sum=3.012 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "text-davinci-003\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "markdown": false
        },
        {
          "value": 0.8813333333333334,
          "description": "min=0.879, mean=0.881, max=0.883, sum=2.644 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.09835218401604591,
          "description": "min=0.097, mean=0.098, max=0.099, sum=0.295 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.8576666666666667,
          "description": "min=0.851, mean=0.858, max=0.864, sum=2.573 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.858,
          "description": "min=0.854, mean=0.858, max=0.861, sum=2.574 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.5,
          "description": "min=0.5, mean=0.5, max=0.5, sum=1 (2)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0006666666666666666,
          "description": "min=0, mean=0.001, max=0.001, sum=0.002 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.0433333333333332,
          "description": "min=1.036, mean=1.043, max=1.058, sum=3.13 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "text-davinci-002\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "markdown": false
        },
        {
          "value": 0.8769999999999999,
          "description": "min=0.872, mean=0.877, max=0.883, sum=2.631 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.06391934132499137,
          "description": "min=0.057, mean=0.064, max=0.068, sum=0.192 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.8410000000000001,
          "description": "min=0.834, mean=0.841, max=0.854, sum=2.523 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.8366666666666666,
          "description": "min=0.829, mean=0.837, max=0.844, sum=2.51 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.1911954346788195,
          "description": "min=0.176, mean=0.191, max=0.216, sum=0.574 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.013,
          "description": "min=1.009, mean=1.013, max=1.018, sum=3.039 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "text-curie-001\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "markdown": false
        },
        {
          "value": 0.6203333333333334,
          "description": "min=0.591, mean=0.62, max=0.638, sum=1.861 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.252648729019218,
          "description": "min=0.239, mean=0.253, max=0.279, sum=0.758 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.5493333333333332,
          "description": "min=0.519, mean=0.549, max=0.566, sum=1.648 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.5756666666666667,
          "description": "min=0.543, mean=0.576, max=0.592, sum=1.727 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.14293199392361097,
          "description": "min=0.141, mean=0.143, max=0.146, sum=0.429 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.007,
          "description": "min=1.004, mean=1.007, max=1.012, sum=3.021 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "text-babbage-001\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "markdown": false
        },
        {
          "value": 0.451,
          "description": "min=0.414, mean=0.451, max=0.477, sum=1.353 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.34372183455656985,
          "description": "min=0.318, mean=0.344, max=0.371, sum=1.031 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.38366666666666666,
          "description": "min=0.339, mean=0.384, max=0.412, sum=1.151 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.41,
          "description": "min=0.388, mean=0.41, max=0.43, sum=1.23 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.14212787000868074,
          "description": "min=0.136, mean=0.142, max=0.15, sum=0.426 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.004,
          "description": "min=1, mean=1.004, max=1.008, sum=3.012 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "text-ada-001\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "markdown": false
        },
        {
          "value": 0.46399999999999997,
          "description": "min=0.405, mean=0.464, max=0.503, sum=1.392 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.34632807207915267,
          "description": "min=0.257, mean=0.346, max=0.483, sum=1.039 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.33233333333333337,
          "description": "min=0.316, mean=0.332, max=0.362, sum=0.997 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.37799999999999995,
          "description": "min=0.364, mean=0.378, max=0.397, sum=1.134 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.09557654231770833,
          "description": "min=0.09, mean=0.096, max=0.103, sum=0.287 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.003,
          "description": "min=0.995, mean=1.003, max=1.009, sum=3.009 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "gpt-3.5-turbo-0301",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.74,
          "description": "min=0.74, mean=0.74, max=0.74, sum=0.74 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.66,
          "description": "min=0.66, mean=0.66, max=0.66, sum=0.66 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.666,
          "description": "min=0.666, mean=0.666, max=0.666, sum=0.666 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5,
          "description": "min=0.5, mean=0.5, max=0.5, sum=0.5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1220.329,
          "description": "min=1220.329, mean=1220.329, max=1220.329, sum=1220.329 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.932,
          "description": "min=1.932, mean=1.932, max=1.932, sum=1.932 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "gpt-3.5-turbo-0613",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.87,
          "description": "min=0.87, mean=0.87, max=0.87, sum=0.87 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.845,
          "description": "min=0.845, mean=0.845, max=0.845, sum=0.845 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.817,
          "description": "min=0.817, mean=0.817, max=0.817, sum=0.817 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1220.329,
          "description": "min=1220.329, mean=1220.329, max=1220.329, sum=1220.329 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.057,
          "description": "min=1.057, mean=1.057, max=1.057, sum=1.057 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "RedPajama-INCITE-Base-v1 (3B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.685,
          "description": "min=0.685, mean=0.685, max=0.685, sum=0.685 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.1865846445420437,
          "description": "min=0.187, mean=0.187, max=0.187, sum=0.187 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.585,
          "description": "min=0.585, mean=0.585, max=0.585, sum=0.585 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.624,
          "description": "min=0.624, mean=0.624, max=0.624, sum=0.624 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "RedPajama-INCITE-Instruct-v1 (3B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.677,
          "description": "min=0.677, mean=0.677, max=0.677, sum=0.677 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.14082220350962116,
          "description": "min=0.141, mean=0.141, max=0.141, sum=0.141 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.629,
          "description": "min=0.629, mean=0.629, max=0.629, sum=0.629 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.648,
          "description": "min=0.648, mean=0.648, max=0.648, sum=0.648 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "RedPajama-INCITE-Base (7B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.713,
          "description": "min=0.713, mean=0.713, max=0.713, sum=0.713 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.1268200294718189,
          "description": "min=0.127, mean=0.127, max=0.127, sum=0.127 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.569,
          "description": "min=0.569, mean=0.569, max=0.569, sum=0.569 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.65,
          "description": "min=0.65, mean=0.65, max=0.65, sum=0.65 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "RedPajama-INCITE-Instruct (7B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.705,
          "description": "min=0.705, mean=0.705, max=0.705, sum=0.705 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.034644312737608846,
          "description": "min=0.035, mean=0.035, max=0.035, sum=0.035 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.599,
          "description": "min=0.599, mean=0.599, max=0.599, sum=0.599 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.616,
          "description": "min=0.616, mean=0.616, max=0.616, sum=0.616 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "MPT (30B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.704,
          "description": "min=0.704, mean=0.704, max=0.704, sum=0.704 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.656,
          "description": "min=0.656, mean=0.656, max=0.656, sum=0.656 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.631,
          "description": "min=0.631, mean=0.631, max=0.631, sum=0.631 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "MPT-Instruct (30B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.85,
          "description": "min=0.85, mean=0.85, max=0.85, sum=0.85 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.77,
          "description": "min=0.77, mean=0.77, max=0.77, sum=0.77 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.807,
          "description": "min=0.807, mean=0.807, max=0.807, sum=0.807 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Falcon (7B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.753,
          "description": "min=0.753, mean=0.753, max=0.753, sum=0.753 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.65,
          "description": "min=0.65, mean=0.65, max=0.65, sum=0.65 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.702,
          "description": "min=0.702, mean=0.702, max=0.702, sum=0.702 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1284.629,
          "description": "min=1284.629, mean=1284.629, max=1284.629, sum=1284.629 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Falcon-Instruct (7B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.72,
          "description": "min=0.72, mean=0.72, max=0.72, sum=0.72 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.593,
          "description": "min=0.593, mean=0.593, max=0.593, sum=0.593 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.637,
          "description": "min=0.637, mean=0.637, max=0.637, sum=0.637 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1284.629,
          "description": "min=1284.629, mean=1284.629, max=1284.629, sum=1284.629 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Falcon (40B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.819,
          "description": "min=0.819, mean=0.819, max=0.819, sum=0.819 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.763,
          "description": "min=0.763, mean=0.763, max=0.763, sum=0.763 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.783,
          "description": "min=0.783, mean=0.783, max=0.783, sum=0.783 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1284.629,
          "description": "min=1284.629, mean=1284.629, max=1284.629, sum=1284.629 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Falcon-Instruct (40B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.829,
          "description": "min=0.829, mean=0.829, max=0.829, sum=0.829 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.781,
          "description": "min=0.781, mean=0.781, max=0.781, sum=0.781 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.799,
          "description": "min=0.799, mean=0.799, max=0.799, sum=0.799 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1284.629,
          "description": "min=1284.629, mean=1284.629, max=1284.629, sum=1284.629 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "GLM (130B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7836666666666666,
          "description": "min=0.729, mean=0.784, max=0.819, sum=2.351 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.1710477879835662,
          "description": "min=0.111, mean=0.171, max=0.205, sum=0.513 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7276666666666668,
          "description": "min=0.68, mean=0.728, max=0.758, sum=2.183 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6896666666666667,
          "description": "min=0.625, mean=0.69, max=0.722, sum=2.069 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 1.1913305165274586,
          "description": "min=0.942, mean=1.191, max=1.332, sum=3.574 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 931.4243333333333,
          "description": "min=679.091, mean=931.424, max=1276.091, sum=2794.273 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "InstructPalmyra (30B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.7513333333333333,
          "description": "min=0.698, mean=0.751, max=0.798, sum=2.254 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.6556666666666667,
          "description": "min=0.564, mean=0.656, max=0.719, sum=1.967 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6996666666666668,
          "description": "min=0.636, mean=0.7, max=0.762, sum=2.099 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Palmyra X (43B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.8963333333333333,
          "description": "min=0.894, mean=0.896, max=0.898, sum=2.689 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.878,
          "description": "min=0.875, mean=0.878, max=0.88, sum=2.634 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.875,
          "description": "min=0.872, mean=0.875, max=0.878, sum=2.625 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.007,
          "description": "min=1.005, mean=1.007, max=1.01, sum=3.021 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "YaLM (100B)",
          "description": "",
          "markdown": false
        },
        {
          "value": 0.634,
          "description": "min=0.631, mean=0.634, max=0.64, sum=1.902 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.14717484078898194,
          "description": "min=0.114, mean=0.147, max=0.167, sum=0.442 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.566,
          "description": "min=0.437, mean=0.566, max=0.631, sum=1.698 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5826666666666667,
          "description": "min=0.486, mean=0.583, max=0.631, sum=1.748 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.8282727491158176,
          "description": "min=0.546, mean=0.828, max=1.136, sum=2.485 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 899.0056666666666,
          "description": "min=649.339, mean=899.006, max=1233.339, sum=2697.017 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ]
    ],
    "links": [
      {
        "text": "LaTeX",
        "href": "/nlp/scr4/nlp/crfm/yifanmai/helm-release/benchmark_output/releases/v0.4.0/groups/latex/boolq_boolq.tex"
      },
      {
        "text": "JSON",
        "href": "/nlp/scr4/nlp/crfm/yifanmai/helm-release/benchmark_output/releases/v0.4.0/groups/json/boolq_boolq.json"
      }
    ],
    "name": "boolq"
  },
  {
    "title": "only_contrast: False",
    "header": [
      {
        "value": "Model/adapter",
        "markdown": false,
        "metadata": {}
      },
      {
        "value": "EM",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nQuasi-exact match: Fraction of instances that the predicted output matches a correct reference up to light processing.",
        "markdown": false,
        "lower_is_better": false,
        "metadata": {
          "metric": "EM",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "ECE (10-bin)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n10-bin expected calibration error: The average difference between the model's confidence and accuracy, averaged across 10 bins where each bin contains an equal number of points (only computed for classification tasks). Warning - not reliable for small datasets (e.g., with < 300 examples) because each bin will have very few examples.",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "ECE (10-bin)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "EM (Robustness)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nQuasi-exact match: Fraction of instances that the predicted output matches a correct reference up to light processing.\n- Perturbation Robustness: Computes worst case over different robustness perturbations (misspellings, formatting, contrast sets).",
        "markdown": false,
        "lower_is_better": false,
        "metadata": {
          "metric": "EM",
          "run_group": "BoolQ",
          "perturbation": "Robustness"
        }
      },
      {
        "value": "EM (Fairness)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nQuasi-exact match: Fraction of instances that the predicted output matches a correct reference up to light processing.\n- Perturbation Fairness: Computes worst case over different fairness perturbations (changing dialect, race of names, gender).",
        "markdown": false,
        "lower_is_better": false,
        "metadata": {
          "metric": "EM",
          "run_group": "BoolQ",
          "perturbation": "Fairness"
        }
      },
      {
        "value": "Stereotypes (race)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nStereotypical associations (race, profession): Measures uneven association of racial groups (Asian, Hispanic, White) with target professions. This measurement is based on co-occurrence statistics between the racially-associated names (word list from [Garg et al., 2018](https://www.pnas.org/doi/10.1073/pnas.1720347115); race associations based on US Census statistics) and the target professions (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Stereotypes (race)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Stereotypes (gender)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nStereotypical associations (gender, profession): Measures uneven association of gender groups (male, female) with target professions. This measurement is based on co-occurrence statistics between the gender terms (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)) and the target professions (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Stereotypes (gender)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Representation (race)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nDemographic representation (race): Measures uneven representation of racial groups (Asian, Hispanic, White). This measurement is based on disparities in the frequency statistics across racially-associated names (word list from [Garg et al., 2018](https://www.pnas.org/doi/10.1073/pnas.1720347115); race associations based on US Census statistics).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Representation (race)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Representation (gender)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nDemographic representation (gender): Measures uneven representation of gender groups (male, female). This measurement is based on disparities in the frequency statistics across gender terms (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Representation (gender)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Toxic fraction",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nToxic fraction: Fraction of model outputs that are toxic (based on the PerspectiveAPI toxicity classifier).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Toxic fraction",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Denoised inference time (s)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nDenoised inference runtime (s): Average time to process a request to the model minus performance contention by using profiled runtimes from multiple trials of SyntheticEfficiencyScenario.",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Denoised inference time (s)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# eval",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# eval: Number of evaluation instances.",
        "markdown": false,
        "metadata": {
          "metric": "# eval",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# train",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# train: Number of training instances (e.g., in-context examples).",
        "markdown": false,
        "metadata": {
          "metric": "# train",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "truncated",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\ntruncated: Fraction of instances where the prompt itself was truncated (implies that there were no in-context examples).",
        "markdown": false,
        "metadata": {
          "metric": "truncated",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# prompt tokens",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# prompt tokens: Number of tokens in the prompt.",
        "markdown": false,
        "metadata": {
          "metric": "# prompt tokens",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# output tokens",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# output tokens: Actual number of output tokens.",
        "markdown": false,
        "metadata": {
          "metric": "# output tokens",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# trials",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# trials: Number of trials, where in each trial we choose an independent, random set of training instances.",
        "markdown": false,
        "metadata": {
          "metric": "# trials",
          "run_group": "BoolQ"
        }
      }
    ],
    "rows": [
      [
        {
          "value": "J1-Jumbo v1 (178B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dai21_j1-jumbo%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7756666666666666,
          "description": "min=0.766, mean=0.776, max=0.786, sum=2.327 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.21546167732589497,
          "description": "min=0.205, mean=0.215, max=0.223, sum=0.646 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6496666666666667,
          "description": "min=0.635, mean=0.65, max=0.659, sum=1.949 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7093333333333334,
          "description": "min=0.693, mean=0.709, max=0.73, sum=2.128 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.6195252891710069,
          "description": "min=0.55, mean=0.62, max=0.727, sum=1.859 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "J1-Large v1 (7.5B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dai21_j1-large%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.6833333333333332,
          "description": "min=0.652, mean=0.683, max=0.709, sum=2.05 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.10621693084730484,
          "description": "min=0.085, mean=0.106, max=0.133, sum=0.319 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5670000000000001,
          "description": "min=0.539, mean=0.567, max=0.603, sum=1.701 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6223333333333333,
          "description": "min=0.591, mean=0.622, max=0.651, sum=1.867 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.48513916883680525,
          "description": "min=0.43, mean=0.485, max=0.566, sum=1.455 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "J1-Grande v1 (17B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dai21_j1-grande%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7216666666666667,
          "description": "min=0.712, mean=0.722, max=0.733, sum=2.165 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.15409092997354776,
          "description": "min=0.139, mean=0.154, max=0.169, sum=0.462 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6429999999999999,
          "description": "min=0.632, mean=0.643, max=0.658, sum=1.929 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6783333333333333,
          "description": "min=0.656, mean=0.678, max=0.695, sum=2.035 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.5352501416015627,
          "description": "min=0.47, mean=0.535, max=0.624, sum=1.606 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "J1-Grande v2 beta (17B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dai21_j1-grande-v2-beta%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.8123333333333332,
          "description": "min=0.799, mean=0.812, max=0.823, sum=2.437 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.16655399552246586,
          "description": "min=0.155, mean=0.167, max=0.185, sum=0.5 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6923333333333334,
          "description": "min=0.669, mean=0.692, max=0.714, sum=2.077 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7636666666666668,
          "description": "min=0.751, mean=0.764, max=0.784, sum=2.291 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Jurassic-2 Jumbo (178B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dai21_j2-jumbo%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.8290000000000001,
          "description": "min=0.818, mean=0.829, max=0.838, sum=2.487 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.17545319159294462,
          "description": "min=0.163, mean=0.175, max=0.198, sum=0.526 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7293333333333333,
          "description": "min=0.72, mean=0.729, max=0.736, sum=2.188 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7916666666666666,
          "description": "min=0.78, mean=0.792, max=0.798, sum=2.375 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0016666666666665,
          "description": "min=2, mean=2.002, max=2.003, sum=6.005 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Jurassic-2 Grande (17B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dai21_j2-grande%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.826,
          "description": "min=0.816, mean=0.826, max=0.832, sum=2.478 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.20883844550071148,
          "description": "min=0.179, mean=0.209, max=0.243, sum=0.627 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.729,
          "description": "min=0.714, mean=0.729, max=0.743, sum=2.187 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7799999999999999,
          "description": "min=0.758, mean=0.78, max=0.791, sum=2.34 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.002,
          "description": "min=2.002, mean=2.002, max=2.002, sum=6.006 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Jurassic-2 Large (7.5B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dai21_j2-large%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7423333333333333,
          "description": "min=0.737, mean=0.742, max=0.747, sum=2.227 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.14720347227904834,
          "description": "min=0.126, mean=0.147, max=0.165, sum=0.442 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6073333333333334,
          "description": "min=0.602, mean=0.607, max=0.615, sum=1.822 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.685,
          "description": "min=0.675, mean=0.685, max=0.697, sum=2.055 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 694.6516666666666,
          "description": "min=506.985, mean=694.652, max=952.985, sum=2083.955 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Luminous Base (13B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3DAlephAlpha_luminous-base%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7186666666666666,
          "description": "min=0.7, mean=0.719, max=0.74, sum=2.156 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.06557915095556173,
          "description": "min=0.056, mean=0.066, max=0.084, sum=0.197 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.655,
          "description": "min=0.643, mean=0.655, max=0.673, sum=1.965 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6526666666666667,
          "description": "min=0.634, mean=0.653, max=0.682, sum=1.958 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.9913333333333,
          "description": "min=651.658, mean=908.991, max=1252.658, sum=2726.974 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.002,
          "description": "min=1, mean=1.002, max=1.003, sum=3.006 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Luminous Extended (30B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3DAlephAlpha_luminous-extended%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7666666666666666,
          "description": "min=0.752, mean=0.767, max=0.794, sum=2.3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.1289354797828563,
          "description": "min=0.11, mean=0.129, max=0.154, sum=0.387 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6586666666666666,
          "description": "min=0.637, mean=0.659, max=0.7, sum=1.976 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.711,
          "description": "min=0.692, mean=0.711, max=0.733, sum=2.133 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.9913333333333,
          "description": "min=651.658, mean=908.991, max=1252.658, sum=2726.974 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Luminous Supreme (70B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3DAlephAlpha_luminous-supreme%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.775,
          "description": "min=0.748, mean=0.775, max=0.795, sum=2.325 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.08277086924611576,
          "description": "min=0.06, mean=0.083, max=0.111, sum=0.248 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6653333333333333,
          "description": "min=0.624, mean=0.665, max=0.693, sum=1.996 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6936666666666667,
          "description": "min=0.66, mean=0.694, max=0.713, sum=2.081 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.9913333333333,
          "description": "min=651.658, mean=908.991, max=1252.658, sum=2726.974 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Anthropic-LM v4-s3 (52B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Danthropic_stanford-online-all-v4-s3%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.8153333333333332,
          "description": "min=0.814, mean=0.815, max=0.816, sum=2.446 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.035, mean=0.038, max=0.041, sum=0.114 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7563333333333334,
          "description": "min=0.751, mean=0.756, max=0.76, sum=2.269 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7816666666666667,
          "description": "min=0.778, mean=0.782, max=0.788, sum=2.345 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.6371923081597224,
          "description": "min=0.566, mean=0.637, max=0.75, sum=1.912 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.004,
          "description": "min=1.004, mean=1.004, max=1.004, sum=3.012 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "BLOOM (176B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_bloom%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7040000000000001,
          "description": "min=0.659, mean=0.704, max=0.728, sum=2.112 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.2086643852555177,
          "description": "min=0.153, mean=0.209, max=0.247, sum=0.626 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.642,
          "description": "min=0.595, mean=0.642, max=0.674, sum=1.926 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.656,
          "description": "min=0.601, mean=0.656, max=0.693, sum=1.968 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.852823399183769,
          "description": "min=0.665, mean=0.853, max=1.05, sum=2.558 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 897.1073333333333,
          "description": "min=636.774, mean=897.107, max=1242.774, sum=2691.322 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "T0pp (11B)\u2620",
          "description": "T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_t0pp%2Cdata_augmentation%3Dcanonical%2Cstop%3Dhash%22%5D",
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.32218942300251074,
          "description": "min=0.208, mean=0.322, max=0.435, sum=0.967 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "description": "(0)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "description": "(0)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "description": "(0)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.25,
          "description": "min=0, mean=0.25, max=0.5, sum=0.5 (2)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.3736038734018803,
          "description": "min=0.366, mean=0.374, max=0.385, sum=1.121 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 3.971666666666667,
          "description": "min=2.027, mean=3.972, max=4.988, sum=11.915 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 702.4380000000001,
          "description": "min=479.758, mean=702.438, max=905.932, sum=2107.314 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u2620 T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "style": {
            "color": "lightgray"
          },
          "markdown": false,
          "contamination_level": "strong"
        }
      ],
      [
        {
          "value": "Cohere xlarge v20220609 (52.4B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dcohere_xlarge-20220609%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7176666666666667,
          "description": "min=0.702, mean=0.718, max=0.74, sum=2.153 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.039674216829776156,
          "description": "min=0.037, mean=0.04, max=0.043, sum=0.119 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.614,
          "description": "min=0.601, mean=0.614, max=0.622, sum=1.842 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6666666666666666,
          "description": "min=0.657, mean=0.667, max=0.681, sum=2 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.5984045305989586,
          "description": "min=0.519, mean=0.598, max=0.705, sum=1.795 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0013333333333334,
          "description": "min=1, mean=1.001, max=1.004, sum=3.004 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere large v20220720 (13.1B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dcohere_large-20220720%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7253333333333334,
          "description": "min=0.705, mean=0.725, max=0.738, sum=2.176 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.08825401206422555,
          "description": "min=0.066, mean=0.088, max=0.106, sum=0.265 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.545,
          "description": "min=0.514, mean=0.545, max=0.566, sum=1.635 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6756666666666667,
          "description": "min=0.653, mean=0.676, max=0.695, sum=2.027 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.4208381308593749,
          "description": "min=0.359, mean=0.421, max=0.505, sum=1.263 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere medium v20220720 (6.1B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dcohere_medium-20220720%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.659,
          "description": "min=0.65, mean=0.659, max=0.667, sum=1.977 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.08218351589951171,
          "description": "min=0.069, mean=0.082, max=0.093, sum=0.247 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5619999999999999,
          "description": "min=0.556, mean=0.562, max=0.573, sum=1.686 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5973333333333333,
          "description": "min=0.589, mean=0.597, max=0.61, sum=1.792 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.34952371158854173,
          "description": "min=0.308, mean=0.35, max=0.402, sum=1.049 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere small v20220720 (410M)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dcohere_small-20220720%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.45733333333333337,
          "description": "min=0.447, mean=0.457, max=0.464, sum=1.372 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.09496766959019069,
          "description": "min=0.072, mean=0.095, max=0.124, sum=0.285 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.361,
          "description": "min=0.352, mean=0.361, max=0.378, sum=1.083 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.37366666666666665,
          "description": "min=0.346, mean=0.374, max=0.396, sum=1.121 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.36694511328125,
          "description": "min=0.319, mean=0.367, max=0.436, sum=1.101 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0013333333333334,
          "description": "min=1, mean=1.001, max=1.004, sum=3.004 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere xlarge v20221108 (52.4B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dcohere_xlarge-20221108%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7616666666666667,
          "description": "min=0.761, mean=0.762, max=0.763, sum=2.285 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.05127903463780418,
          "description": "min=0.037, mean=0.051, max=0.062, sum=0.154 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7176666666666667,
          "description": "min=0.712, mean=0.718, max=0.722, sum=2.153 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7079999999999999,
          "description": "min=0.702, mean=0.708, max=0.72, sum=2.124 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere medium v20221108 (6.1B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dcohere_medium-20221108%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.6999999999999998,
          "description": "min=0.693, mean=0.7, max=0.704, sum=2.1 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.09459272512018041,
          "description": "min=0.088, mean=0.095, max=0.105, sum=0.284 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.54,
          "description": "min=0.508, mean=0.54, max=0.568, sum=1.62 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6416666666666667,
          "description": "min=0.626, mean=0.642, max=0.652, sum=1.925 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere Command beta (6.1B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dcohere_command-medium-beta%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.798,
          "description": "min=0.791, mean=0.798, max=0.809, sum=2.394 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0594622129465324,
          "description": "min=0.048, mean=0.059, max=0.069, sum=0.178 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7253333333333334,
          "description": "min=0.715, mean=0.725, max=0.743, sum=2.176 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7479999999999999,
          "description": "min=0.74, mean=0.748, max=0.76, sum=2.244 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere Command beta (52.4B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dcohere_command-xlarge-beta%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.8563333333333333,
          "description": "min=0.849, mean=0.856, max=0.86, sum=2.569 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.02302613493537822,
          "description": "min=0.018, mean=0.023, max=0.026, sum=0.069 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.8106666666666666,
          "description": "min=0.806, mean=0.811, max=0.816, sum=2.432 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.8216666666666667,
          "description": "min=0.812, mean=0.822, max=0.827, sum=2.465 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 925.3070000000001,
          "description": "min=669.307, mean=925.307, max=1269.307, sum=2775.921 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "GPT-J (6B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_gpt-j-6b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.6486666666666667,
          "description": "min=0.646, mean=0.649, max=0.65, sum=1.946 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.062432673938629946,
          "description": "min=0.043, mean=0.062, max=0.086, sum=0.187 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.621,
          "description": "min=0.608, mean=0.621, max=0.631, sum=1.863 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6386666666666666,
          "description": "min=0.638, mean=0.639, max=0.64, sum=1.916 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.49915384031836946,
          "description": "min=0.354, mean=0.499, max=0.575, sum=1.497 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "GPT-NeoX (20B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_gpt-neox-20b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.6826666666666666,
          "description": "min=0.659, mean=0.683, max=0.714, sum=2.048 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.19500535688345313,
          "description": "min=0.168, mean=0.195, max=0.238, sum=0.585 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.551,
          "description": "min=0.548, mean=0.551, max=0.556, sum=1.653 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.609,
          "description": "min=0.594, mean=0.609, max=0.629, sum=1.827 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.772616056262233,
          "description": "min=0.515, mean=0.773, max=1.206, sum=2.318 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 913.8969999999999,
          "description": "min=656.897, mean=913.897, max=1251.897, sum=2741.691 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Pythia (6.9B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Deleutherai_pythia-6.9b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.631,
          "description": "min=0.631, mean=0.631, max=0.631, sum=0.631 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.10596147166386737,
          "description": "min=0.106, mean=0.106, max=0.106, sum=0.106 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.527,
          "description": "min=0.527, mean=0.527, max=0.527, sum=0.527 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.552,
          "description": "min=0.552, mean=0.552, max=0.552, sum=0.552 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Pythia (12B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Deleutherai_pythia-12b-v0%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.662,
          "description": "min=0.662, mean=0.662, max=0.662, sum=0.662 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.13986557582802048,
          "description": "min=0.14, mean=0.14, max=0.14, sum=0.14 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.51,
          "description": "min=0.51, mean=0.51, max=0.51, sum=0.51 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.547,
          "description": "min=0.547, mean=0.547, max=0.547, sum=0.547 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "T5 (11B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_t5-11b%2Cdata_augmentation%3Dcanonical%2Cstop%3Dhash%22%5D",
          "markdown": false
        },
        {
          "value": 0.7610000000000001,
          "description": "min=0.732, mean=0.761, max=0.803, sum=2.283 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.43269382093398495,
          "description": "min=0.348, mean=0.433, max=0.512, sum=1.298 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6503333333333333,
          "description": "min=0.624, mean=0.65, max=0.688, sum=1.951 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7226666666666667,
          "description": "min=0.697, mean=0.723, max=0.766, sum=2.168 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6666666666666666,
          "description": "min=0.667, mean=0.667, max=0.667, sum=2 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.375,
          "description": "min=0.125, mean=0.375, max=0.5, sum=1.125 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.27128291567197677,
          "description": "min=0.27, mean=0.271, max=0.272, sum=0.814 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.5883333333333332,
          "description": "min=0.969, mean=1.588, max=2.006, sum=4.765 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.004,
          "description": "min=0.004, mean=0.004, max=0.004, sum=0.012 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 401.94433333333336,
          "description": "min=386.367, mean=401.944, max=422.649, sum=1205.833 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "UL2 (20B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_ul2%2Cdata_augmentation%3Dcanonical%2Cstop%3Dhash%2Cglobal_prefix%3Dnlg%22%5D",
          "markdown": false
        },
        {
          "value": 0.7456666666666667,
          "description": "min=0.717, mean=0.746, max=0.762, sum=2.237 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.45980755585445926,
          "description": "min=0.416, mean=0.46, max=0.512, sum=1.379 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.646,
          "description": "min=0.638, mean=0.646, max=0.651, sum=1.938 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6983333333333334,
          "description": "min=0.672, mean=0.698, max=0.714, sum=2.095 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.23015873015873015,
          "description": "min=0.167, mean=0.23, max=0.357, sum=0.69 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.001,
          "description": "min=0.001, mean=0.001, max=0.001, sum=0.003 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.3127442524572212,
          "description": "min=0.292, mean=0.313, max=0.341, sum=0.938 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.5696666666666668,
          "description": "min=0.953, mean=1.57, max=1.978, sum=4.709 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.004,
          "description": "min=0.004, mean=0.004, max=0.004, sum=0.012 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 402.2846666666667,
          "description": "min=386.826, mean=402.285, max=424.449, sum=1206.854 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "OPT (175B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_opt-175b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.793,
          "description": "min=0.777, mean=0.793, max=0.813, sum=2.379 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.19360710050007168,
          "description": "min=0.177, mean=0.194, max=0.218, sum=0.581 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.623,
          "description": "min=0.584, mean=0.623, max=0.662, sum=1.869 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.731,
          "description": "min=0.712, mean=0.731, max=0.746, sum=2.193 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.869335141547284,
          "description": "min=0.71, mean=0.869, max=0.954, sum=2.608 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "OPT (66B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_opt-66b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7603333333333332,
          "description": "min=0.753, mean=0.76, max=0.764, sum=2.281 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.20047176103986394,
          "description": "min=0.193, mean=0.2, max=0.206, sum=0.601 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6829999999999999,
          "description": "min=0.666, mean=0.683, max=0.701, sum=2.049 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7103333333333333,
          "description": "min=0.696, mean=0.71, max=0.721, sum=2.131 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.8336340090708299,
          "description": "min=0.272, mean=0.834, max=1.907, sum=2.501 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "LLaMA (7B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmeta_llama-7b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.756,
          "description": "min=0.756, mean=0.756, max=0.756, sum=0.756 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.292, mean=0.292, max=0.292, sum=0.292 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.688,
          "description": "min=0.688, mean=0.688, max=0.688, sum=0.688 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.71,
          "description": "min=0.71, mean=0.71, max=0.71, sum=0.71 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "LLaMA (13B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmeta_llama-13b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.714,
          "description": "min=0.714, mean=0.714, max=0.714, sum=0.714 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.078, mean=0.078, max=0.078, sum=0.078 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.67,
          "description": "min=0.67, mean=0.67, max=0.67, sum=0.67 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.666,
          "description": "min=0.666, mean=0.666, max=0.666, sum=0.666 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "LLaMA (30B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmeta_llama-30b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.861,
          "description": "min=0.861, mean=0.861, max=0.861, sum=0.861 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.164, mean=0.164, max=0.164, sum=0.164 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.791,
          "description": "min=0.791, mean=0.791, max=0.791, sum=0.791 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.813,
          "description": "min=0.813, mean=0.813, max=0.813, sum=0.813 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "LLaMA (65B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmeta_llama-65b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.871,
          "description": "min=0.871, mean=0.871, max=0.871, sum=0.871 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.84,
          "description": "min=0.84, mean=0.84, max=0.84, sum=0.84 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.847,
          "description": "min=0.847, mean=0.847, max=0.847, sum=0.847 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Llama 2 (7B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmeta_llama-2-7b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.762,
          "description": "min=0.762, mean=0.762, max=0.762, sum=0.762 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.215, mean=0.215, max=0.215, sum=0.215 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.676,
          "description": "min=0.676, mean=0.676, max=0.676, sum=0.676 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.706,
          "description": "min=0.706, mean=0.706, max=0.706, sum=0.706 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.296,
          "description": "min=1.296, mean=1.296, max=1.296, sum=1.296 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Llama 2 (13B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmeta_llama-2-13b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.811,
          "description": "min=0.811, mean=0.811, max=0.811, sum=0.811 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "min=0.116, mean=0.116, max=0.116, sum=0.116 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.753,
          "description": "min=0.753, mean=0.753, max=0.753, sum=0.753 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.732,
          "description": "min=0.732, mean=0.732, max=0.732, sum=0.732 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Llama 2 (70B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmeta_llama-2-70b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.886,
          "description": "min=0.886, mean=0.886, max=0.886, sum=0.886 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.863,
          "description": "min=0.863, mean=0.863, max=0.863, sum=0.863 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.859,
          "description": "min=0.859, mean=0.859, max=0.859, sum=0.859 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Alpaca (7B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dstanford_alpaca-7b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.778,
          "description": "min=0.778, mean=0.778, max=0.778, sum=0.778 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.3432802705941571,
          "description": "min=0.343, mean=0.343, max=0.343, sum=0.343 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.643,
          "description": "min=0.643, mean=0.643, max=0.643, sum=0.643 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.729,
          "description": "min=0.729, mean=0.729, max=0.729, sum=0.729 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5,
          "description": "min=0.5, mean=0.5, max=0.5, sum=0.5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 4.883,
          "description": "min=4.883, mean=4.883, max=4.883, sum=4.883 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Vicuna v1.3 (7B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dlmsys_vicuna-7b-v1.3%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.76,
          "description": "min=0.76, mean=0.76, max=0.76, sum=0.76 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.322404542566261,
          "description": "min=0.322, mean=0.322, max=0.322, sum=0.322 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.672,
          "description": "min=0.672, mean=0.672, max=0.672, sum=0.672 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.67,
          "description": "min=0.67, mean=0.67, max=0.67, sum=0.67 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 4.412,
          "description": "min=4.412, mean=4.412, max=4.412, sum=4.412 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Vicuna v1.3 (13B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dlmsys_vicuna-13b-v1.3%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.808,
          "description": "min=0.808, mean=0.808, max=0.808, sum=0.808 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.15912327464389103,
          "description": "min=0.159, mean=0.159, max=0.159, sum=0.159 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.757,
          "description": "min=0.757, mean=0.757, max=0.757, sum=0.757 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.748,
          "description": "min=0.748, mean=0.748, max=0.748, sum=0.748 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1439.447,
          "description": "min=1439.447, mean=1439.447, max=1439.447, sum=1439.447 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 4.996,
          "description": "min=4.996, mean=4.996, max=4.996, sum=4.996 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Mistral v0.1 (7B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmistralai_mistral-7b-v0.1%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.874,
          "description": "min=0.874, mean=0.874, max=0.874, sum=0.874 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.837,
          "description": "min=0.837, mean=0.837, max=0.837, sum=0.837 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.842,
          "description": "min=0.842, mean=0.842, max=0.842, sum=0.842 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1418.259,
          "description": "min=1418.259, mean=1418.259, max=1418.259, sum=1418.259 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "TNLG v2 (530B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmicrosoft_TNLGv2_530B%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.8093333333333333,
          "description": "min=0.798, mean=0.809, max=0.829, sum=2.428 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.04811928896988451,
          "description": "min=0.017, mean=0.048, max=0.088, sum=0.144 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7326666666666667,
          "description": "min=0.724, mean=0.733, max=0.747, sum=2.198 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7666666666666667,
          "description": "min=0.756, mean=0.767, max=0.777, sum=2.3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "TNLG v2 (6.7B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmicrosoft_TNLGv2_7B%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.6983333333333334,
          "description": "min=0.685, mean=0.698, max=0.709, sum=2.095 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.06514212406382298,
          "description": "min=0.063, mean=0.065, max=0.067, sum=0.195 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.638,
          "description": "min=0.623, mean=0.638, max=0.653, sum=1.914 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6653333333333333,
          "description": "min=0.649, mean=0.665, max=0.674, sum=1.996 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "davinci (175B)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dopenai_davinci%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7223333333333333,
          "description": "min=0.679, mean=0.722, max=0.77, sum=2.167 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.07164645838795872,
          "description": "min=0.047, mean=0.072, max=0.103, sum=0.215 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.6393333333333334,
          "description": "min=0.592, mean=0.639, max=0.677, sum=1.918 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.682,
          "description": "min=0.635, mean=0.682, max=0.729, sum=2.046 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.21022733463541673,
          "description": "min=0.204, mean=0.21, max=0.217, sum=0.631 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "curie (6.7B)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dopenai_curie%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.6563333333333333,
          "description": "min=0.597, mean=0.656, max=0.704, sum=1.969 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.07881150352718548,
          "description": "min=0.051, mean=0.079, max=0.115, sum=0.236 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.545,
          "description": "min=0.484, mean=0.545, max=0.599, sum=1.635 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.594,
          "description": "min=0.535, mean=0.594, max=0.631, sum=1.782 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.09988102712673615,
          "description": "min=0.096, mean=0.1, max=0.104, sum=0.3 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "babbage (1.3B)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dopenai_babbage%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.5743333333333334,
          "description": "min=0.52, mean=0.574, max=0.623, sum=1.723 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.06758031979129187,
          "description": "min=0.036, mean=0.068, max=0.089, sum=0.203 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.47700000000000004,
          "description": "min=0.432, mean=0.477, max=0.522, sum=1.431 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.43566666666666665,
          "description": "min=0.404, mean=0.436, max=0.457, sum=1.307 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.12137238953993056,
          "description": "min=0.119, mean=0.121, max=0.125, sum=0.364 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "ada (350M)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dopenai_ada%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.5810000000000001,
          "description": "min=0.525, mean=0.581, max=0.627, sum=1.743 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.06655133808072823,
          "description": "min=0.049, mean=0.067, max=0.09, sum=0.2 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.461,
          "description": "min=0.349, mean=0.461, max=0.549, sum=1.383 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.5066666666666667,
          "description": "min=0.421, mean=0.507, max=0.575, sum=1.52 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.14052770182291666,
          "description": "min=0.14, mean=0.141, max=0.141, sum=0.422 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.004,
          "description": "min=1, mean=1.004, max=1.008, sum=3.012 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "text-davinci-003\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dopenai_text-davinci-003%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.8813333333333334,
          "description": "min=0.879, mean=0.881, max=0.883, sum=2.644 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.09835218401604591,
          "description": "min=0.097, mean=0.098, max=0.099, sum=0.295 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.8576666666666667,
          "description": "min=0.851, mean=0.858, max=0.864, sum=2.573 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.858,
          "description": "min=0.854, mean=0.858, max=0.861, sum=2.574 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.5,
          "description": "min=0.5, mean=0.5, max=0.5, sum=1 (2)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0006666666666666666,
          "description": "min=0, mean=0.001, max=0.001, sum=0.002 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.0433333333333332,
          "description": "min=1.036, mean=1.043, max=1.058, sum=3.13 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "text-davinci-002\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dopenai_text-davinci-002%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.8769999999999999,
          "description": "min=0.872, mean=0.877, max=0.883, sum=2.631 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.06391934132499137,
          "description": "min=0.057, mean=0.064, max=0.068, sum=0.192 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.8410000000000001,
          "description": "min=0.834, mean=0.841, max=0.854, sum=2.523 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.8366666666666666,
          "description": "min=0.829, mean=0.837, max=0.844, sum=2.51 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.1911954346788195,
          "description": "min=0.176, mean=0.191, max=0.216, sum=0.574 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.013,
          "description": "min=1.009, mean=1.013, max=1.018, sum=3.039 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "text-curie-001\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dopenai_text-curie-001%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.6203333333333334,
          "description": "min=0.591, mean=0.62, max=0.638, sum=1.861 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.252648729019218,
          "description": "min=0.239, mean=0.253, max=0.279, sum=0.758 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.5493333333333332,
          "description": "min=0.519, mean=0.549, max=0.566, sum=1.648 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.5756666666666667,
          "description": "min=0.543, mean=0.576, max=0.592, sum=1.727 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.14293199392361097,
          "description": "min=0.141, mean=0.143, max=0.146, sum=0.429 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.007,
          "description": "min=1.004, mean=1.007, max=1.012, sum=3.021 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "text-babbage-001\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dopenai_text-babbage-001%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.451,
          "description": "min=0.414, mean=0.451, max=0.477, sum=1.353 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.34372183455656985,
          "description": "min=0.318, mean=0.344, max=0.371, sum=1.031 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.38366666666666666,
          "description": "min=0.339, mean=0.384, max=0.412, sum=1.151 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.41,
          "description": "min=0.388, mean=0.41, max=0.43, sum=1.23 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.14212787000868074,
          "description": "min=0.136, mean=0.142, max=0.15, sum=0.426 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.004,
          "description": "min=1, mean=1.004, max=1.008, sum=3.012 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "text-ada-001\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dopenai_text-ada-001%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.46399999999999997,
          "description": "min=0.405, mean=0.464, max=0.503, sum=1.392 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.34632807207915267,
          "description": "min=0.257, mean=0.346, max=0.483, sum=1.039 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.33233333333333337,
          "description": "min=0.316, mean=0.332, max=0.362, sum=0.997 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.37799999999999995,
          "description": "min=0.364, mean=0.378, max=0.397, sum=1.134 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "description": "(0)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.09557654231770833,
          "description": "min=0.09, mean=0.096, max=0.103, sum=0.287 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray",
            "font-weight": "bold"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 1.003,
          "description": "min=0.995, mean=1.003, max=1.009, sum=3.009 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)\n\u26a0 Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "style": {
            "color": "gray"
          },
          "markdown": false,
          "contamination_level": "weak"
        }
      ],
      [
        {
          "value": "gpt-3.5-turbo-0301",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dopenai_gpt-3.5-turbo-0301%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.74,
          "description": "min=0.74, mean=0.74, max=0.74, sum=0.74 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.66,
          "description": "min=0.66, mean=0.66, max=0.66, sum=0.66 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.666,
          "description": "min=0.666, mean=0.666, max=0.666, sum=0.666 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5,
          "description": "min=0.5, mean=0.5, max=0.5, sum=0.5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1220.329,
          "description": "min=1220.329, mean=1220.329, max=1220.329, sum=1220.329 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.932,
          "description": "min=1.932, mean=1.932, max=1.932, sum=1.932 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "gpt-3.5-turbo-0613",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dopenai_gpt-3.5-turbo-0613%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.87,
          "description": "min=0.87, mean=0.87, max=0.87, sum=0.87 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.845,
          "description": "min=0.845, mean=0.845, max=0.845, sum=0.845 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.817,
          "description": "min=0.817, mean=0.817, max=0.817, sum=0.817 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1220.329,
          "description": "min=1220.329, mean=1220.329, max=1220.329, sum=1220.329 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.057,
          "description": "min=1.057, mean=1.057, max=1.057, sum=1.057 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "RedPajama-INCITE-Base-v1 (3B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_redpajama-incite-base-3b-v1%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.685,
          "description": "min=0.685, mean=0.685, max=0.685, sum=0.685 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.1865846445420437,
          "description": "min=0.187, mean=0.187, max=0.187, sum=0.187 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.585,
          "description": "min=0.585, mean=0.585, max=0.585, sum=0.585 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.624,
          "description": "min=0.624, mean=0.624, max=0.624, sum=0.624 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "RedPajama-INCITE-Instruct-v1 (3B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_redpajama-incite-instruct-3b-v1%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.677,
          "description": "min=0.677, mean=0.677, max=0.677, sum=0.677 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.14082220350962116,
          "description": "min=0.141, mean=0.141, max=0.141, sum=0.141 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.629,
          "description": "min=0.629, mean=0.629, max=0.629, sum=0.629 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.648,
          "description": "min=0.648, mean=0.648, max=0.648, sum=0.648 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "RedPajama-INCITE-Base (7B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_redpajama-incite-base-7b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.713,
          "description": "min=0.713, mean=0.713, max=0.713, sum=0.713 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.1268200294718189,
          "description": "min=0.127, mean=0.127, max=0.127, sum=0.127 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.569,
          "description": "min=0.569, mean=0.569, max=0.569, sum=0.569 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.65,
          "description": "min=0.65, mean=0.65, max=0.65, sum=0.65 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "RedPajama-INCITE-Instruct (7B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_redpajama-incite-instruct-7b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.705,
          "description": "min=0.705, mean=0.705, max=0.705, sum=0.705 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.034644312737608846,
          "description": "min=0.035, mean=0.035, max=0.035, sum=0.035 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.599,
          "description": "min=0.599, mean=0.599, max=0.599, sum=0.599 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.616,
          "description": "min=0.616, mean=0.616, max=0.616, sum=0.616 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "MPT (30B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmosaicml_mpt-30b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.704,
          "description": "min=0.704, mean=0.704, max=0.704, sum=0.704 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.656,
          "description": "min=0.656, mean=0.656, max=0.656, sum=0.656 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.631,
          "description": "min=0.631, mean=0.631, max=0.631, sum=0.631 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "MPT-Instruct (30B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dmosaicml_mpt-instruct-30b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.85,
          "description": "min=0.85, mean=0.85, max=0.85, sum=0.85 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.77,
          "description": "min=0.77, mean=0.77, max=0.77, sum=0.77 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.807,
          "description": "min=0.807, mean=0.807, max=0.807, sum=0.807 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1251.897,
          "description": "min=1251.897, mean=1251.897, max=1251.897, sum=1251.897 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Falcon (7B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtiiuae_falcon-7b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.753,
          "description": "min=0.753, mean=0.753, max=0.753, sum=0.753 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.65,
          "description": "min=0.65, mean=0.65, max=0.65, sum=0.65 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.702,
          "description": "min=0.702, mean=0.702, max=0.702, sum=0.702 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1284.629,
          "description": "min=1284.629, mean=1284.629, max=1284.629, sum=1284.629 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Falcon-Instruct (7B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtiiuae_falcon-7b-instruct%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.72,
          "description": "min=0.72, mean=0.72, max=0.72, sum=0.72 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.593,
          "description": "min=0.593, mean=0.593, max=0.593, sum=0.593 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.637,
          "description": "min=0.637, mean=0.637, max=0.637, sum=0.637 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1284.629,
          "description": "min=1284.629, mean=1284.629, max=1284.629, sum=1284.629 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Falcon (40B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtiiuae_falcon-40b%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.819,
          "description": "min=0.819, mean=0.819, max=0.819, sum=0.819 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.763,
          "description": "min=0.763, mean=0.763, max=0.763, sum=0.763 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.783,
          "description": "min=0.783, mean=0.783, max=0.783, sum=0.783 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1284.629,
          "description": "min=1284.629, mean=1284.629, max=1284.629, sum=1284.629 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Falcon-Instruct (40B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtiiuae_falcon-40b-instruct%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.829,
          "description": "min=0.829, mean=0.829, max=0.829, sum=0.829 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.781,
          "description": "min=0.781, mean=0.781, max=0.781, sum=0.781 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.799,
          "description": "min=0.799, mean=0.799, max=0.799, sum=0.799 (1)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=1000 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=5 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1284.629,
          "description": "min=1284.629, mean=1284.629, max=1284.629, sum=1284.629 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=1 (1)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "GLM (130B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_glm%2Cdata_augmentation%3Dcanonical%2Cstop%3Dhash%22%5D",
          "markdown": false
        },
        {
          "value": 0.7836666666666666,
          "description": "min=0.729, mean=0.784, max=0.819, sum=2.351 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.1710477879835662,
          "description": "min=0.111, mean=0.171, max=0.205, sum=0.513 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.7276666666666668,
          "description": "min=0.68, mean=0.728, max=0.758, sum=2.183 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6896666666666667,
          "description": "min=0.625, mean=0.69, max=0.722, sum=2.069 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 1.1913305165274586,
          "description": "min=0.942, mean=1.191, max=1.332, sum=3.574 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 931.4243333333333,
          "description": "min=679.091, mean=931.424, max=1276.091, sum=2794.273 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 2.0,
          "description": "min=2, mean=2, max=2, sum=6 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "InstructPalmyra (30B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dwriter_palmyra-instruct-30%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.7513333333333333,
          "description": "min=0.698, mean=0.751, max=0.798, sum=2.254 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.6556666666666667,
          "description": "min=0.564, mean=0.656, max=0.719, sum=1.967 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.6996666666666668,
          "description": "min=0.636, mean=0.7, max=0.762, sum=2.099 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.0,
          "description": "min=1, mean=1, max=1, sum=3 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "Palmyra X (43B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dwriter_palmyra-x%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.8963333333333333,
          "description": "min=0.894, mean=0.896, max=0.898, sum=2.689 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 0.878,
          "description": "min=0.875, mean=0.878, max=0.88, sum=2.634 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.875,
          "description": "min=0.872, mean=0.875, max=0.878, sum=2.625 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "description": "1 matching runs, but no matching metrics",
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 908.4063333333334,
          "description": "min=660.073, mean=908.406, max=1242.073, sum=2725.219 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1.007,
          "description": "min=1.005, mean=1.007, max=1.01, sum=3.021 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ],
      [
        {
          "value": "YaLM (100B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20False&runSpecs=%5B%22boolq%3Amodel%3Dtogether_yalm%2Cdata_augmentation%3Dcanonical%22%5D",
          "markdown": false
        },
        {
          "value": 0.634,
          "description": "min=0.631, mean=0.634, max=0.64, sum=1.902 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.14717484078898194,
          "description": "min=0.114, mean=0.147, max=0.167, sum=0.442 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.566,
          "description": "min=0.437, mean=0.566, max=0.631, sum=1.698 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.5826666666666667,
          "description": "min=0.486, mean=0.583, max=0.631, sum=1.748 (3)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "description": "(0)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {
            "font-weight": "bold"
          },
          "markdown": false
        },
        {
          "value": 0.8282727491158176,
          "description": "min=0.546, mean=0.828, max=1.136, sum=2.485 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 1000.0,
          "description": "min=1000, mean=1000, max=1000, sum=3000 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 0.0,
          "description": "min=0, mean=0, max=0, sum=0 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 899.0056666666666,
          "description": "min=649.339, mean=899.006, max=1233.339, sum=2697.017 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 5.0,
          "description": "min=5, mean=5, max=5, sum=15 (3)",
          "style": {},
          "markdown": false
        },
        {
          "value": 3.0,
          "description": "min=3, mean=3, max=3, sum=9 (3)",
          "style": {},
          "markdown": false
        }
      ]
    ],
    "links": [
      {
        "text": "LaTeX",
        "href": "/nlp/scr4/nlp/crfm/yifanmai/helm-release/benchmark_output/releases/v0.4.0/groups/latex/boolq_boolq_only_contrast:False.tex"
      },
      {
        "text": "JSON",
        "href": "/nlp/scr4/nlp/crfm/yifanmai/helm-release/benchmark_output/releases/v0.4.0/groups/json/boolq_boolq_only_contrast:False.json"
      }
    ],
    "name": "boolq_only_contrast:False"
  },
  {
    "title": "only_contrast: True",
    "header": [
      {
        "value": "Model/adapter",
        "markdown": false,
        "metadata": {}
      },
      {
        "value": "EM",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nQuasi-exact match: Fraction of instances that the predicted output matches a correct reference up to light processing.",
        "markdown": false,
        "lower_is_better": false,
        "metadata": {
          "metric": "EM",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "ECE (10-bin)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n10-bin expected calibration error: The average difference between the model's confidence and accuracy, averaged across 10 bins where each bin contains an equal number of points (only computed for classification tasks). Warning - not reliable for small datasets (e.g., with < 300 examples) because each bin will have very few examples.",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "ECE (10-bin)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "EM (Robustness)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nQuasi-exact match: Fraction of instances that the predicted output matches a correct reference up to light processing.\n- Perturbation Robustness: Computes worst case over different robustness perturbations (misspellings, formatting, contrast sets).",
        "markdown": false,
        "lower_is_better": false,
        "metadata": {
          "metric": "EM",
          "run_group": "BoolQ",
          "perturbation": "Robustness"
        }
      },
      {
        "value": "EM (Fairness)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nQuasi-exact match: Fraction of instances that the predicted output matches a correct reference up to light processing.\n- Perturbation Fairness: Computes worst case over different fairness perturbations (changing dialect, race of names, gender).",
        "markdown": false,
        "lower_is_better": false,
        "metadata": {
          "metric": "EM",
          "run_group": "BoolQ",
          "perturbation": "Fairness"
        }
      },
      {
        "value": "Stereotypes (race)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nStereotypical associations (race, profession): Measures uneven association of racial groups (Asian, Hispanic, White) with target professions. This measurement is based on co-occurrence statistics between the racially-associated names (word list from [Garg et al., 2018](https://www.pnas.org/doi/10.1073/pnas.1720347115); race associations based on US Census statistics) and the target professions (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Stereotypes (race)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Stereotypes (gender)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nStereotypical associations (gender, profession): Measures uneven association of gender groups (male, female) with target professions. This measurement is based on co-occurrence statistics between the gender terms (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)) and the target professions (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Stereotypes (gender)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Representation (race)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nDemographic representation (race): Measures uneven representation of racial groups (Asian, Hispanic, White). This measurement is based on disparities in the frequency statistics across racially-associated names (word list from [Garg et al., 2018](https://www.pnas.org/doi/10.1073/pnas.1720347115); race associations based on US Census statistics).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Representation (race)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Representation (gender)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nDemographic representation (gender): Measures uneven representation of gender groups (male, female). This measurement is based on disparities in the frequency statistics across gender terms (word list from [Bolukbasi et al., 2016](https://papers.nips.cc/paper/2016/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.html)).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Representation (gender)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Toxic fraction",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nToxic fraction: Fraction of model outputs that are toxic (based on the PerspectiveAPI toxicity classifier).",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Toxic fraction",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "Denoised inference time (s)",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\nDenoised inference runtime (s): Average time to process a request to the model minus performance contention by using profiled runtimes from multiple trials of SyntheticEfficiencyScenario.",
        "markdown": false,
        "lower_is_better": true,
        "metadata": {
          "metric": "Denoised inference time (s)",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# eval",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# eval: Number of evaluation instances.",
        "markdown": false,
        "metadata": {
          "metric": "# eval",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# train",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# train: Number of training instances (e.g., in-context examples).",
        "markdown": false,
        "metadata": {
          "metric": "# train",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "truncated",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\ntruncated: Fraction of instances where the prompt itself was truncated (implies that there were no in-context examples).",
        "markdown": false,
        "metadata": {
          "metric": "truncated",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# prompt tokens",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# prompt tokens: Number of tokens in the prompt.",
        "markdown": false,
        "metadata": {
          "metric": "# prompt tokens",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# output tokens",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# output tokens: Actual number of output tokens.",
        "markdown": false,
        "metadata": {
          "metric": "# output tokens",
          "run_group": "BoolQ"
        }
      },
      {
        "value": "# trials",
        "description": "The BoolQ benchmark for binary (yes/no) question answering [(Clark et al., 2019)](https://aclanthology.org/N19-1300/).\n\n# trials: Number of trials, where in each trial we choose an independent, random set of training instances.",
        "markdown": false,
        "metadata": {
          "metric": "# trials",
          "run_group": "BoolQ"
        }
      }
    ],
    "rows": [
      [
        {
          "value": "J1-Jumbo v1 (178B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "J1-Large v1 (7.5B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "J1-Grande v1 (17B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "J1-Grande v2 beta (17B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Jurassic-2 Jumbo (178B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Jurassic-2 Grande (17B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Jurassic-2 Large (7.5B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Luminous Base (13B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Luminous Extended (30B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Luminous Supreme (70B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Anthropic-LM v4-s3 (52B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "BLOOM (176B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "T0pp (11B)\u2620",
          "description": "T0++ is explicitly trained on these datasets, i.e. data from the same distribution as the test set. See Table 5 on page 24 of https://arxiv.org/pdf/2110.08207.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere xlarge v20220609 (52.4B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere large v20220720 (13.1B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere medium v20220720 (6.1B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere small v20220720 (410M)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere xlarge v20221108 (52.4B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere medium v20221108 (6.1B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere Command beta (6.1B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Cohere Command beta (52.4B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "GPT-J (6B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "GPT-NeoX (20B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "T5 (11B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "UL2 (20B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "OPT (175B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "OPT (66B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "TNLG v2 (530B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "TNLG v2 (6.7B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "davinci (175B)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "curie (6.7B)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "babbage (1.3B)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "ada (350M)\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "text-davinci-003\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "text-davinci-002\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "text-curie-001\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "text-babbage-001\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "text-ada-001\u26a0",
          "description": "Brown et al. perform an analysis of the contamination for GPT-3 and its known derivatives. For these datasets, they find that 1% - 6% of the datasets' test instances are contaminated based on N-gram overlap, and model performance does not substantially change for these datasets. See Table C.1 on page 45 of https://arxiv.org/pdf/2005.14165.pdf.",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "GLM (130B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "InstructPalmyra (30B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "Palmyra X (43B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ],
      [
        {
          "value": "YaLM (100B)",
          "description": "",
          "href": "?group=boolq&subgroup=only_contrast%3A%20True&runSpecs=%5B%5D",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        },
        {
          "description": "No matching runs",
          "markdown": false
        }
      ]
    ],
    "links": [
      {
        "text": "LaTeX",
        "href": "/nlp/scr4/nlp/crfm/yifanmai/helm-release/benchmark_output/releases/v0.4.0/groups/latex/boolq_boolq_only_contrast:True.tex"
      },
      {
        "text": "JSON",
        "href": "/nlp/scr4/nlp/crfm/yifanmai/helm-release/benchmark_output/releases/v0.4.0/groups/json/boolq_boolq_only_contrast:True.json"
      }
    ],
    "name": "boolq_only_contrast:True"
  }
]