FuzzBench: sbft-standard-cov-01-16 report

experiment summary

We show two different aggregate (cross-benchmark) rankings of fuzzers. The first is based on the average of per-benchmark scores, where the score is the percentage of the highest median code coverage reached on a given benchmark (higher is better). The second shows the average rank of each fuzzer, after ranking the fuzzers on each benchmark by their median reached code coverage (lower is better).
By avg. score
fuzzer average normalized score
libafl 97.05
mystique 96.85
fox 95.31
libfuzzer 94.11
bandfuzz 88.40
tunefuzz 86.61
pastis 60.83
By avg. rank
fuzzer average rank
bandfuzz 2.89
tunefuzz 3.44
fox 3.56
libfuzzer 3.89
libafl 4.22
pastis 4.67
mystique 4.89
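
As an illustration of how the two rankings are derived and why they can disagree, the sketch below recomputes both for three fuzzers on three benchmarks, using median coverage values copied from the per-benchmark sample statistics in this report. The helper names (`normalized_scores`, `ranks`, `aggregate`) are illustrative and not part of FuzzBench, and the tie handling (tied fuzzers share the average of their ranks) is an assumption.

```python
from statistics import mean

# Median reached coverage, copied from the per-benchmark sample statistics
# tables in this report (three fuzzers and three benchmarks only).
medians = {
    "freetype2_ftfuzzer": {"bandfuzz": 13032.0, "fox": 10759.5, "libfuzzer": 9567.0},
    "re2_fuzzer": {"bandfuzz": 2867.5, "fox": 2872.0, "libfuzzer": 2884.0},
    "zlib_zlib_uncompress_fuzzer": {"bandfuzz": 459.5, "fox": 459.0, "libfuzzer": 464.5},
}


def normalized_scores(per_fuzzer):
    """Score = percentage of the highest median coverage on this benchmark."""
    best = max(per_fuzzer.values())
    return {fuzzer: 100.0 * value / best for fuzzer, value in per_fuzzer.items()}


def ranks(per_fuzzer):
    """Rank 1 = highest median. Assumption: ties share the average rank."""
    ordered = sorted(per_fuzzer, key=per_fuzzer.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and per_fuzzer[ordered[j + 1]] == per_fuzzer[ordered[i]]:
            j += 1
        shared_rank = ((i + 1) + (j + 1)) / 2.0
        for fuzzer in ordered[i:j + 1]:
            result[fuzzer] = shared_rank
        i = j + 1
    return result


def aggregate(per_benchmark, transform):
    """Average a per-benchmark quantity over all benchmarks for each fuzzer."""
    collected = {}
    for per_fuzzer in per_benchmark.values():
        for fuzzer, value in transform(per_fuzzer).items():
            collected.setdefault(fuzzer, []).append(value)
    return {fuzzer: mean(values) for fuzzer, values in collected.items()}


avg_score = aggregate(medians, normalized_scores)  # higher is better
avg_rank = aggregate(medians, ranks)               # lower is better
```

On this subset, bandfuzz has the best average score while libfuzzer has the best average rank: a large lead on one benchmark moves the score-based ranking, but counts as a single first place in the rank-based one.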
  • Critical difference diagram
    The diagram visualizes the average rank of fuzzers (second ranking above) while showing the significance of the differences as well. What is considered a "critical difference" (CD) is based on the Friedman/Nemenyi post-hoc test. See more in the documentation.
    Note: If a fuzzer does not support all benchmarks, its rank in this diagram can be lower than it should be, so check the list of supported benchmarks for the fuzzers you are interested in. The list may be specified in the fuzzer's README.md.
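
For orientation, the Nemenyi critical difference can be computed by hand as CD = q_alpha * sqrt(k(k+1) / (6N)) for k fuzzers and N benchmarks. The sketch below is not the code that produced the diagram. It assumes a significance level of 0.05, uses the critical values tabulated by Demsar (2006), and treats all 7 fuzzers as ranked on all 9 benchmarks, which is only approximately true here because some fuzzers have no result on some benchmarks.

```python
import math

# Critical values q_0.05 for the Nemenyi test, indexed by the number of
# compared methods k (as tabulated by Demsar, JMLR 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def critical_difference(k, n):
    """CD = q_alpha * sqrt(k * (k + 1) / (6 * n)) at alpha = 0.05."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6.0 * n))


# Average ranks from the "By avg. rank" table of this report.
avg_rank = {"bandfuzz": 2.89, "tunefuzz": 3.44, "fox": 3.56, "libfuzzer": 3.89,
            "libafl": 4.22, "pastis": 4.67, "mystique": 4.89}

# Assumption: every fuzzer was ranked on every one of the 9 benchmarks.
cd = critical_difference(k=len(avg_rank), n=9)


def significantly_different(a, b):
    """Two fuzzers differ if their average ranks are more than CD apart."""
    return abs(avg_rank[a] - avg_rank[b]) > cd
```

Under these assumptions the critical difference is about 3.0 ranks, larger than the gap of 2.0 between the best and worst average rank above, so this approximation would not separate any pair of fuzzers. The diagram itself remains the authoritative result.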
  • Median relative code-coverages on each benchmark

    Note: The relative coverage summary table shows the median relative performance of each fuzzer to the experiment maximum. Thus the highest relative performance may not be 100%.
    trial_relative_coverage = trial_coverage / experiment_max_coverage

      bandfuzz libafl tunefuzz mystique fox libfuzzer pastis
    FuzzerMedian 97.50 95.00 95.00 95.00 97.00 94.00 95.00
    FuzzerMean 97.00 94.44 94.38 94.22 92.89 91.44 88.00
    freetype2_ftfuzzer 97.00 84.00 86.00 85.00 80.00 71.00 nan
    jsoncpp_jsoncpp_fuzzer 99.00 98.00 nan 98.00 98.00 100.00 nan
    lcms_cms_transform_fuzzer 93.00 93.00 93.00 91.00 75.00 88.00 91.00
    libxml2_xml 98.00 98.00 98.00 98.00 99.00 96.00 nan
    libxslt_xpath 98.00 96.00 98.00 96.00 99.00 90.00 51.00
    re2_fuzzer 99.00 98.00 99.00 98.00 99.00 99.00 99.00
    stb_stbi_read_fuzzer nan 95.00 91.00 94.00 95.00 87.00 96.00
    vorbis_decode_fuzzer 95.00 93.00 94.00 93.00 94.00 94.00 94.00
    zlib_zlib_uncompress_fuzzer 97.00 95.00 96.00 95.00 97.00 98.00 97.00
    • Fuzzers are sorted by "FuzzerMean" (average median relative coverage), highest on the left.
    • Green background = highest relative median coverage.
    • Blue gradient background = greater than 95% relative median coverage.
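
The rows of this table can be reproduced approximately from the per-benchmark statistics. The sketch below applies the relative coverage formula to the freetype2_ftfuzzer medians and then recomputes FuzzerMean and FuzzerMedian for the bandfuzz column. Two points are assumptions inferred from the numbers rather than stated in the report: the experiment maximum is taken to be the largest `max` value in the freetype2_ftfuzzer sample statistics (13341), and the percentages are truncated to whole numbers. `nan` entries are skipped when averaging.

```python
import math
from statistics import mean, median


def relative_coverage(trial_coverage, experiment_max_coverage):
    """trial_relative_coverage = trial_coverage / experiment_max_coverage (in %)."""
    return 100.0 * trial_coverage / experiment_max_coverage


# Median reached coverage on freetype2_ftfuzzer, from its sample statistics.
freetype2_medians = {
    "bandfuzz": 13032.0, "tunefuzz": 11573.0, "mystique": 11376.0,
    "libafl": 11208.5, "fox": 10759.5, "libfuzzer": 9567.0,
}
# Assumption: the experiment maximum is the largest "max" in that table.
FREETYPE2_MAX = 13341.0

# Assumption: the table truncates percentages to whole numbers.
freetype2_row = {
    fuzzer: math.floor(relative_coverage(coverage, FREETYPE2_MAX))
    for fuzzer, coverage in freetype2_medians.items()
}

# The bandfuzz column of the table above; NaN marks a missing benchmark.
bandfuzz_column = [97.0, 99.0, 93.0, 98.0, 98.0, 99.0, math.nan, 95.0, 97.0]


def summarize(column):
    """Mean and median over the benchmarks that have a value."""
    present = [value for value in column if not math.isnan(value)]
    return mean(present), median(present)


fuzzer_mean, fuzzer_median = summarize(bandfuzz_column)
```

With these inputs, `freetype2_row` matches the freetype2_ftfuzzer row above and `summarize` returns the 97.00 mean and 97.50 median shown for bandfuzz.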

freetype2_ftfuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
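
The report does not say how the error bands are computed (plotting libraries often bootstrap them). As a rough guide only, a 95% interval around a mean can be approximated from the summary statistics with a normal approximation. The function name below is illustrative, and the inputs are the bandfuzz values from the sample statistics table of this benchmark.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_std, count, level=0.95):
    """Normal-approximation interval: mean +/- z * std / sqrt(count)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    half_width = z * sample_std / math.sqrt(count)
    return sample_mean - half_width, sample_mean + half_width


# bandfuzz on freetype2_ftfuzzer at the final measurement (17 trials).
low, high = mean_confidence_interval(12860.764706, 438.742454, 17)
```

With only 17 trials, a Student t interval would be somewhat wider than this approximation.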
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    bandfuzz 82800 17.0 12860.764706 438.742454 11722.0 12737.00 13032.0 13128.00 13341.0
    tunefuzz 82800 19.0 11537.263158 226.400050 10862.0 11441.00 11573.0 11698.00 11820.0
    mystique 82800 17.0 11292.000000 500.419449 10320.0 10950.00 11376.0 11499.00 12186.0
    libafl 82800 16.0 11325.187500 491.332639 10607.0 11027.75 11208.5 11544.00 12358.0
    fox 82800 18.0 10660.944444 353.284333 9986.0 10420.25 10759.5 10870.50 11180.0
    libfuzzer 82800 18.0 9517.722222 482.977141 8573.0 9305.00 9567.0 9861.75 10293.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
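
Both tables are computed from the per-trial coverage samples of each fuzzer pair, which are not included in this text. The sketch below shows the two measures on hypothetical samples, in pure Python. The p value uses the normal approximation with tie correction and no continuity correction, which is an assumption, since the report does not state which variant or library options were used. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
import math


def u_statistic(xs, ys):
    """Number of (x, y) pairs with x > y, counting ties as one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


def a12(xs, ys):
    """Vargha-Delaney A: probability that a value from xs exceeds one from ys."""
    return u_statistic(xs, ys) / (len(xs) * len(ys))


def mann_whitney_p(xs, ys):
    """Two-sided p value, normal approximation with tie correction."""
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    counts = {}
    for value in list(xs) + list(ys):
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return 1.0  # all values identical: no evidence of a difference
    z = (u_statistic(xs, ys) - n1 * n2 / 2.0) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical final coverage values for two fuzzers, five trials each.
fuzzer_a = [10, 12, 14, 16, 18]
fuzzer_b = [9, 11, 13, 15, 17]

effect = a12(fuzzer_a, fuzzer_b)            # 0.5 means no difference
p_value = mann_whitney_p(fuzzer_a, fuzzer_b)
```

An A12 value near 0.5 together with a large p value, as for these overlapping samples, indicates that neither fuzzer reliably outperforms the other.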
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row.
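
Both plots reduce to set operations on the branches each fuzzer covered. The sketch below uses hypothetical fuzzer names and branch IDs, and the function names are illustrative rather than FuzzBench's own.

```python
# Hypothetical sets of covered branch IDs per fuzzer.
covered = {
    "fuzzer_a": {1, 2, 3, 4},
    "fuzzer_b": {2, 3, 5},
    "fuzzer_c": {3, 4, 6, 7},
}


def unique_branches(covered):
    """Branches covered by one fuzzer and by no other fuzzer."""
    result = {}
    for name, branches in covered.items():
        others = set().union(*(b for n, b in covered.items() if n != name))
        result[name] = branches - others
    return result


def pairwise_unique(covered):
    """cell[row][col] = branches covered by col but not by row."""
    return {
        row: {col: len(covered[col] - covered[row]) for col in covered}
        for row in covered
    }


unique = unique_branches(covered)
pairwise = pairwise_unique(covered)
```

Here `len(covered[name])` corresponds to the bar height in the ranking plot, `len(unique[name])` to its colored area, and `pairwise[row][col]` to one cell of the pairwise plot.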

jsoncpp_jsoncpp_fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
The following fuzzers do not have enough samples: libfuzzer.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    libfuzzer 82800 14.0 525.000000 0.000000 525.0 525.0 525.0 525.0 525.0
    bandfuzz 82800 17.0 523.235294 1.300452 522.0 522.0 523.0 525.0 525.0
    libafl 82800 19.0 516.736842 0.561951 516.0 516.0 517.0 517.0 518.0
    mystique 82800 19.0 517.526316 1.172292 515.0 517.0 517.0 519.0 519.0
    fox 82800 18.0 516.055556 1.392088 514.0 515.0 516.0 517.0 519.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row.

lcms_cms_transform_fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
The following fuzzers do not have enough samples: libafl.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    libafl 82800 11.0 2093.636364 35.370249 2017.0 2076.00 2101.0 2118.00 2139.0
    tunefuzz 82800 16.0 1955.062500 222.618199 1497.0 1854.50 2097.5 2104.50 2136.0
    bandfuzz 82800 18.0 1994.444444 259.902294 1477.0 1954.50 2086.5 2188.25 2228.0
    pastis 82800 16.0 2003.687500 174.877374 1613.0 2004.00 2048.0 2119.25 2200.0
    mystique 82800 18.0 2028.111111 153.651588 1613.0 1977.25 2047.0 2144.25 2243.0
    libfuzzer 82800 17.0 1975.647059 94.924932 1740.0 1927.00 1977.0 2041.00 2119.0
    fox 82800 19.0 1673.210526 122.238646 1506.0 1543.50 1689.0 1763.00 1882.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row.

libxml2_xml summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
The following fuzzers do not have enough samples: bandfuzz.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    fox 82800 18.0 15791.111111 56.204342 15683.0 15764.25 15802.0 15820.75 15895.0
    tunefuzz 82800 19.0 15715.789474 53.856372 15644.0 15668.50 15708.0 15762.50 15801.0
    bandfuzz 82800 14.0 15706.000000 31.879581 15664.0 15687.25 15691.5 15728.50 15770.0
    libafl 82800 18.0 15625.444444 40.568348 15557.0 15598.25 15621.5 15658.50 15690.0
    mystique 82800 19.0 15609.157895 40.340582 15524.0 15586.50 15614.0 15637.50 15700.0
    libfuzzer 82800 17.0 15391.764706 91.580927 15250.0 15328.00 15386.0 15444.00 15538.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row.

libxslt_xpath summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
The following fuzzers do not have enough samples: mystique, fox, libafl.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    fox 82800 14.0 11236.500000 171.399107 10858.0 11138.00 11306.0 11368.00 11385.0
    bandfuzz 82800 16.0 11201.125000 46.831435 11130.0 11169.50 11203.5 11226.50 11324.0
    tunefuzz 82800 19.0 11150.789474 142.346439 10858.0 11084.50 11160.0 11279.00 11323.0
    mystique 82800 15.0 10992.600000 51.597065 10899.0 10964.00 10999.0 11035.00 11076.0
    libafl 82800 14.0 10957.071429 74.353071 10811.0 10930.00 10968.0 11008.75 11078.0
    libfuzzer 82800 17.0 10229.764706 545.090306 9120.0 10188.00 10359.0 10649.00 10860.0
    pastis 82800 18.0 5892.888889 9.448865 5885.0 5887.25 5891.0 5893.00 5924.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row.

re2_fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    libfuzzer 82800 19.0 2883.894737 2.258188 2879.0 2882.00 2884.0 2885.50 2888.0
    tunefuzz 82800 16.0 2877.062500 4.073798 2866.0 2876.00 2877.0 2880.25 2882.0
    fox 82800 19.0 2868.526316 8.335789 2852.0 2863.50 2872.0 2874.00 2876.0
    bandfuzz 82800 16.0 2866.437500 8.406099 2846.0 2863.75 2867.5 2871.75 2877.0
    pastis 82800 18.0 2863.666667 5.201810 2856.0 2860.25 2862.0 2866.50 2874.0
    libafl 82800 19.0 2858.105263 6.270697 2843.0 2855.00 2859.0 2862.50 2868.0
    mystique 82800 20.0 2851.500000 6.947169 2841.0 2846.75 2850.5 2858.50 2863.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row.

stb_stbi_read_fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
The following fuzzers do not have enough samples: tunefuzz.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    pastis 82800 17.0 2223.352941 54.026546 2122.0 2217.0 2221.0 2226.00 2308.0
    fox 82800 19.0 2163.578947 47.354826 2088.0 2116.5 2197.0 2200.00 2216.0
    libafl 82800 18.0 2193.777778 46.609208 2094.0 2191.0 2194.0 2199.00 2287.0
    mystique 82800 18.0 2179.944444 50.041715 2100.0 2148.5 2191.5 2198.25 2265.0
    tunefuzz 82800 15.0 2131.200000 52.300232 2016.0 2112.5 2116.0 2149.00 2212.0
    libfuzzer 82800 18.0 2026.555556 28.203509 1985.0 2011.0 2019.0 2032.25 2102.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row.

vorbis_decode_fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
The following fuzzers do not have enough samples: bandfuzz, pastis.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    bandfuzz 82800 14.0 1281.785714 24.268303 1267.0 1270.25 1273.5 1275.50 1339.0
    libfuzzer 82800 16.0 1268.187500 1.869715 1265.0 1267.00 1268.0 1269.25 1271.0
    pastis 82800 5.0 1264.400000 2.607681 1262.0 1262.00 1264.0 1266.00 1268.0
    tunefuzz 82800 20.0 1256.800000 22.056387 1193.0 1260.00 1263.0 1265.00 1271.0
    fox 82800 19.0 1261.473684 3.220911 1256.0 1259.50 1262.0 1264.00 1266.0
    libafl 82800 16.0 1253.687500 3.219084 1246.0 1252.75 1253.5 1254.50 1260.0
    mystique 82800 16.0 1250.812500 3.370831 1247.0 1248.75 1249.5 1252.25 1259.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row.

zlib_zlib_uncompress_fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
The following fuzzers do not have enough samples: pastis.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    libfuzzer 82800 18.0 466.888889 4.561848 462.0 463.0 464.5 472.00 472.0
    pastis 82800 14.0 460.714286 1.898525 457.0 460.0 461.0 462.00 463.0
    bandfuzz 82800 18.0 459.277778 2.244092 456.0 457.5 459.5 461.00 462.0
    fox 82800 18.0 460.611111 4.803661 453.0 457.5 459.0 461.00 469.0
    tunefuzz 82800 19.0 450.894737 17.002924 411.0 455.5 457.0 459.00 462.0
    libafl 82800 16.0 449.125000 4.702836 442.0 446.5 449.0 450.25 460.0
    mystique 82800 18.0 450.777778 4.845320 444.0 448.0 449.0 452.00 461.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row.

experiment data

You can download the raw data for this report here.

Check out the documentation on how to create customized reports using this data. Also see some example Colab notebooks for doing custom analysis on the data here.

Experiment Description:

(None,)