FuzzBench: 2024-05-13-new-bug report

(experiment incomplete/still running...)

experiment summary

We show two aggregate (cross-benchmark) rankings of fuzzers. The first is based on the average of per-benchmark scores, where a score is the fuzzer's median bug coverage expressed as a percentage of the highest median bug coverage reached on that benchmark (higher is better). The second shows the average rank of each fuzzer after ranking the fuzzers on each benchmark by their median bug coverage (lower is better).
By avg. score
fuzzer average normalized score
libfuzzer 70.0
aflsmart 60.0
honggfuzz 60.0
centipede 40.0
mopt 40.0
afl 20.0
libafl 20.0
By avg. rank
fuzzer average rank
libfuzzer 1.14
aflsmart 1.21
honggfuzz 1.21
mopt 1.36
afl 1.71
centipede 1.71
libafl 1.71
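As a rough illustration of how the two rankings above differ, the sketch below computes both from a table of per-benchmark medians. The fuzzer names, benchmark names, and all numbers are hypothetical. The tie handling (tied fuzzers share the mean of their positions) and the zero score on a benchmark where no fuzzer covers a bug are assumptions, not details confirmed by this report.

```python
from statistics import mean

# Hypothetical median bug-coverage values: benchmark -> {fuzzer: median}.
# These numbers are made up for illustration and are not taken from the report.
medians = {
    "bench_a": {"fuzz_x": 2.0, "fuzz_y": 1.0, "fuzz_z": 0.0},
    "bench_b": {"fuzz_x": 1.0, "fuzz_y": 1.0, "fuzz_z": 0.5},
}

def average_scores(medians):
    """Average of per-benchmark scores (percent of the best median)."""
    scores = {}
    for per_fuzzer in medians.values():
        best = max(per_fuzzer.values())
        for fuzzer, value in per_fuzzer.items():
            # Assumption: score 0 for everyone when the best median is 0.
            pct = 100.0 * value / best if best > 0 else 0.0
            scores.setdefault(fuzzer, []).append(pct)
    return {fuzzer: mean(values) for fuzzer, values in scores.items()}

def average_ranks(medians):
    """Average of per-benchmark ranks; tied fuzzers share the mean position."""
    ranks = {}
    for per_fuzzer in medians.values():
        ordered = sorted(per_fuzzer.values(), reverse=True)
        for fuzzer, value in per_fuzzer.items():
            first = ordered.index(value) + 1
            last = first + ordered.count(value) - 1
            ranks.setdefault(fuzzer, []).append((first + last) / 2)
    return {fuzzer: mean(values) for fuzzer, values in ranks.items()}

print(average_scores(medians))  # higher is better
print(average_ranks(medians))   # lower is better
```

The score ranking rewards the size of the lead on each benchmark, while the rank ranking only records the ordering, so the two can disagree.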
  • Critical difference diagram
    The diagram visualizes the average rank of fuzzers (second ranking above) while showing the significance of the differences as well. What is considered a "critical difference" (CD) is based on the Friedman/Nemenyi post-hoc test. See more in the documentation.
    Note: If a fuzzer does not support all benchmarks, its ranking as shown in this diagram can be lower than it should be. Check the list of supported benchmarks for the fuzzers you are interested in; the list may be specified in the fuzzer's README.md.
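For orientation, the critical difference used with the Nemenyi post-hoc test is commonly computed as CD = q_alpha * sqrt(k(k+1) / (6N)) for k compared methods and N datasets. The sketch below evaluates that formula; the significance level of 0.05 and the assumption that every fuzzer ran on every benchmark are mine, and the report's own tooling may compute the value differently.

```python
import math

# Critical values q_alpha of the two-tailed Nemenyi test at alpha = 0.05,
# indexed by the number of compared methods k (standard published table values).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850, 7: 2.949}

def critical_difference(num_fuzzers, num_benchmarks):
    """Smallest average-rank gap considered significant under the Nemenyi test."""
    q_alpha = Q_ALPHA_005[num_fuzzers]
    return q_alpha * math.sqrt(
        num_fuzzers * (num_fuzzers + 1) / (6.0 * num_benchmarks)
    )

# Assumed setting: 7 fuzzers compared on 14 benchmarks.
cd = critical_difference(7, 14)
print(round(cd, 3))
```

Two fuzzers whose average ranks differ by less than this value would not be considered significantly different under these assumptions.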
  • Median relative code-coverages on each benchmark

    Note: The relative coverage summary table shows the median performance of each fuzzer relative to the experiment maximum. Thus the highest relative performance may not be 100%.
    trial_relative_coverage = trial_coverage / experiment_max_coverage

      aflsmart honggfuzz libfuzzer mopt libafl afl centipede
    FuzzerMedian 89.00 82.00 80.00 86.00 85.00 92.00 81.00
    FuzzerMean 80.00 78.45 75.86 74.29 73.82 69.00 68.27
    arrow_arrow-ipc-stream-fuzz_1a34a0 nan nan 81.00 91.00 90.00 92.00 90.00
    aspell_aspell_fuzzer_e8eb74 nan 88.00 84.00 86.00 86.00 nan 86.00
    assimp_assimp_fuzzer_4d451f nan 32.00 49.00 18.00 0.00 nan 15.00
    bloaty_fuzz_target_52948c nan 82.00 79.00 82.00 83.00 nan nan
    ffmpeg_ffmpeg_demuxer_fuzzer_7adeef 36.00 65.00 35.00 34.00 nan 37.00 32.00
    file_magic_fuzzer_2d5f85 92.00 nan 92.00 91.00 93.00 nan 89.00
    grok_grk_decompress_fuzzer_9cd001 94.00 95.00 95.00 95.00 89.00 94.00 nan
    harfbuzz_hb-shape-fuzzer_17863b 88.00 90.00 85.00 86.00 nan nan 81.00
    lcms_cms_transform_all_fuzzer_97d37d nan nan 55.00 29.00 69.00 28.00 20.00
    libaom_av1_dec_fuzzer_6e1848 nan 96.00 92.00 93.00 92.00 nan 92.00
    libpcap_fuzz_filter_98b0a2 80.00 76.00 78.00 78.00 77.00 nan 72.00
    libxml2_xml_e85b9b nan 79.00 71.00 75.00 85.00 nan 76.00
    php_php-fuzz-parser_0dbedb nan 95.00 94.00 94.00 nan 94.00 98.00
    systemd_fuzz-network-parser_288baf 90.00 65.00 72.00 88.00 48.00 nan nan
    • Fuzzers are sorted by "FuzzerMean" (average median relative coverage), highest on the left.
    • Green background = highest relative median coverage.
    • Blue gradient background = greater than 95% relative median coverage.
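The note on relative coverage can be made concrete with a small sketch. It assumes that experiment_max_coverage is the largest coverage reached by any single trial of any fuzzer on the benchmark; the fuzzer names and trial values are hypothetical.

```python
from statistics import median

def median_relative_coverage(trials_by_fuzzer):
    """Median of trial_coverage / experiment_max_coverage per fuzzer, in percent."""
    # Assumption: the experiment maximum is taken over all trials of all fuzzers.
    experiment_max = max(
        coverage for trials in trials_by_fuzzer.values() for coverage in trials
    )
    return {
        fuzzer: median(100.0 * coverage / experiment_max for coverage in trials)
        for fuzzer, trials in trials_by_fuzzer.items()
    }

# Hypothetical final coverage of three trials per fuzzer.
example = {"fuzz_x": [80, 90, 100], "fuzz_y": [40, 50, 60]}
relative = median_relative_coverage(example)
print(relative)
```

Here the best fuzzer's median is 90%, not 100%, because the maximum comes from its single best trial rather than from its median trial.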
  • Median relative bug-coverages on each benchmark

    Note: The relative coverage summary table shows the median performance of each fuzzer relative to the experiment maximum. Thus the highest relative performance may not be 100%.
    trial_relative_coverage = trial_coverage / experiment_max_coverage

      aflsmart honggfuzz centipede libfuzzer afl mopt libafl
    FuzzerMedian 25.00 0.00 0.00 0.00 0.00 0.00 0.00
    FuzzerMean 25.00 17.73 13.64 11.43 10.00 7.14 4.55
    arrow_arrow-ipc-stream-fuzz_1a34a0 nan nan 0.00 0.00 0.00 0.00 0.00
    aspell_aspell_fuzzer_e8eb74 nan 0.00 0.00 0.00 nan 0.00 0.00
    assimp_assimp_fuzzer_4d451f nan 20.00 0.00 10.00 nan 0.00 0.00
    bloaty_fuzz_target_52948c nan 0.00 nan 0.00 nan 0.00 0.00
    ffmpeg_ffmpeg_demuxer_fuzzer_7adeef 0.00 50.00 0.00 0.00 0.00 0.00 nan
    file_magic_fuzzer_2d5f85 50.00 nan 0.00 50.00 nan 0.00 0.00
    grok_grk_decompress_fuzzer_9cd001 50.00 50.00 nan 50.00 50.00 50.00 50.00
    harfbuzz_hb-shape-fuzzer_17863b 50.00 50.00 50.00 50.00 nan 50.00 nan
    lcms_cms_transform_all_fuzzer_97d37d nan nan 0.00 0.00 0.00 0.00 0.00
    libaom_av1_dec_fuzzer_6e1848 nan 25.00 0.00 0.00 nan 0.00 0.00
    libpcap_fuzz_filter_98b0a2 0.00 0.00 0.00 0.00 nan 0.00 0.00
    libxml2_xml_e85b9b nan 0.00 0.00 0.00 nan 0.00 0.00
    php_php-fuzz-parser_0dbedb nan 0.00 100.00 0.00 0.00 0.00 nan
    systemd_fuzz-network-parser_288baf 0.00 0.00 nan 0.00 nan 0.00 0.00
    • Fuzzers are sorted by "FuzzerMean" (average median relative coverage), highest on the left.
    • Green background = highest relative median coverage.
    • Blue gradient background = greater than 95% relative median coverage.
  • Total unique bugs found on each benchmark
      Total honggfuzz libfuzzer mopt libafl centipede aflsmart afl aflplusplus
    FuzzerSum 167 99 76 17 12 10 7 4 2
    arrow_arrow-ipc-stream-fuzz_1a34a0 0 nan 0 0 0 0 0 0 0
    aspell_aspell_fuzzer_e8eb74 2 1 1 2 2 2 0 0 0
    assimp_assimp_fuzzer_4d451f 133 80 61 0 0 0 0 0 0
    bloaty_fuzz_target_52948c 1 1 1 1 1 0 0 0 0
    ffmpeg_ffmpeg_demuxer_fuzzer_7adeef 2 2 0 0 nan 0 0 0 0
    file_magic_fuzzer_2d5f85 2 nan 2 1 1 1 1 0 0
    grok_grk_decompress_fuzzer_9cd001 3 2 2 2 2 nan 2 3 2
    harfbuzz_hb-shape-fuzzer_17863b 6 4 6 4 nan 3 4 0 0
    lcms_cms_transform_all_fuzzer_97d37d 5 0 0 0 5 0 0 0 0
    libaom_av1_dec_fuzzer_6e1848 8 8 0 3 0 0 0 0 0
    libpcap_fuzz_filter_98b0a2 0 0 0 0 0 0 0 0 0
    libxml2_xml_e85b9b 3 1 2 3 1 2 0 0 0
    php_php-fuzz-parser_0dbedb 2 0 1 1 nan 2 0 1 0
    systemd_fuzz-network-parser_288baf 0 0 0 0 0 nan 0 0 0
    • Fuzzers are sorted by "FuzzerSum", highest on the left.
    • Green background = most unique bugs found.
    • *note: This table represents unique bugs found across all trials.
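Counting unique bugs across all trials amounts to taking a set union. The sketch below assumes each trial reports a set of bug identifiers. The identifiers and fuzzer names are hypothetical, and how the report deduplicates crashes into bugs is not specified here.

```python
def unique_bugs(trials):
    """Union of the bug identifiers found in any trial."""
    found = set()
    for bug_ids in trials:
        found |= set(bug_ids)
    return found

# Hypothetical trials on one benchmark.
fuzz_x_trials = [{"bug_1"}, {"bug_1", "bug_2"}, set()]
fuzz_y_trials = [{"bug_2", "bug_3"}]

per_fuzzer = {
    "fuzz_x": unique_bugs(fuzz_x_trials),
    "fuzz_y": unique_bugs(fuzz_y_trials),
}
# A bug found by several fuzzers is counted once in the total.
total = set().union(*per_fuzzer.values())
print({name: len(bugs) for name, bugs in per_fuzzer.items()}, len(total))
```

Under this reading, a Total entry can be smaller than the sum of the per-fuzzer entries in its row, which would be consistent with the assimp row (133 total against 80 and 61).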

arrow_arrow-ipc-stream-fuzz_1a34a0 summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
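A 95% confidence interval around a mean can be approximated from summary statistics alone. The sketch below uses the normal approximation (z = 1.96), which is an assumption: the plotting code behind the error bands may use a different method, such as bootstrapping. The input values are the ones this report lists for afl code coverage on this benchmark (20 trials, mean 2213.20, std 80.282790).

```python
import math

def mean_confidence_interval(mean, std, count, z=1.96):
    """Normal-approximation confidence interval for a sample mean."""
    half_width = z * std / math.sqrt(count)
    return mean - half_width, mean + half_width

low, high = mean_confidence_interval(mean=2213.20, std=80.282790, count=20)
print(round(low, 2), round(high, 2))
```

With only 20 trials a t-based interval would be slightly wider, and a fuzzer with a single trial has no defined interval at all.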
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    afl 2700 20.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    centipede 2700 1.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
    libafl 2700 10.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    libfuzzer 2700 20.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    mopt 2700 20.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    afl 2700 20.0 2213.20 80.282790 1931.0 2198.50 2221.5 2245.00 2316.0
    mopt 2700 20.0 2167.25 106.337039 1890.0 2103.25 2210.0 2234.25 2325.0
    libafl 2700 10.0 2162.90 75.637219 2033.0 2140.00 2182.5 2216.00 2246.0
    centipede 2700 1.0 2167.00 NaN 2167.0 2167.00 2167.0 2167.00 2167.0
    libfuzzer 2700 20.0 2000.85 80.029781 1905.0 1954.50 1963.0 2063.00 2152.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
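The two measures are closely related: the Vargha-Delaney A12 value equals the Mann-Whitney U statistic divided by the number of trial pairs. The sketch below computes both by brute force over all pairs. It does not compute the p value, which would normally come from a statistics library such as scipy.stats.mannwhitneyu. The sample values are hypothetical.

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs versus ys: each win counts 1, each tie counts 0.5."""
    wins = sum(1.0 for x in xs for y in ys if x > y)
    ties = sum(0.5 for x in xs for y in ys if x == y)
    return wins + ties

def vargha_delaney_a12(xs, ys):
    """Probability that a random value from xs beats one from ys (ties count half)."""
    return mann_whitney_u(xs, ys) / (len(xs) * len(ys))

# Hypothetical per-trial coverage values for two fuzzers.
row_fuzzer = [3, 4, 5]
column_fuzzer = [1, 2, 3]
print(vargha_delaney_a12(row_fuzzer, column_fuzzer))

# Identical samples, such as all trials covering zero bugs, give exactly 0.5.
print(vargha_delaney_a12([0, 0, 0], [0, 0]))
```

An A12 of 0.5 means neither fuzzer tends to outperform the other; values near 1 favor the row fuzzer and values near 0 favor the column fuzzer.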

aspell_aspell_fuzzer_e8eb74 summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    centipede 900 15.0 0.333333 0.487950 0.0 0.0 0.0 1.00 1.0
    honggfuzz 900 18.0 0.055556 0.235702 0.0 0.0 0.0 0.00 1.0
    libafl 900 20.0 0.100000 0.307794 0.0 0.0 0.0 0.00 1.0
    libfuzzer 900 20.0 0.100000 0.307794 0.0 0.0 0.0 0.00 1.0
    mopt 900 20.0 0.250000 0.444262 0.0 0.0 0.0 0.25 1.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    honggfuzz 900 18.0 3043.111111 15.354206 3023.0 3031.25 3039.0 3051.75 3074.0
    centipede 900 15.0 2910.800000 183.481879 2624.0 2781.00 2971.0 3047.50 3072.0
    mopt 900 20.0 2939.050000 128.946939 2624.0 2895.75 2962.5 2989.50 3113.0
    libafl 900 20.0 2946.000000 121.225757 2624.0 2941.00 2958.5 2985.25 3125.0
    libfuzzer 900 20.0 2867.500000 87.047779 2624.0 2868.50 2886.0 2906.25 2947.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
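The columns of the sample statistics tables (count, mean, std, quartiles) can be reproduced with the standard library. The sketch below assumes the sample standard deviation and linearly interpolated quartiles. The input is a reconstruction, not data stated in the report: 20 trials of which 5 cover one bug, which is consistent with the mopt bug coverage row for this benchmark (mean 0.25, std 0.444262, 75% 0.25).

```python
from statistics import mean, quantiles, stdev

def describe(values):
    """Summary statistics in the same layout as the report's tables."""
    if len(values) > 1:
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    else:
        # A single trial has no spread, which the tables show as NaN.
        q1 = q2 = q3 = float(values[0])
        spread = float("nan")
    return {
        "count": float(len(values)),
        "mean": mean(values),
        "std": spread,
        "min": float(min(values)),
        "25%": q1,
        "median": q2,
        "75%": q3,
        "max": float(max(values)),
    }

# Reconstructed sample: 15 trials covering no bug, 5 trials covering one bug.
stats = describe([0] * 15 + [1] * 5)
print(stats)
```

The 75% value of 0.25 arises from interpolating between the 15th and 16th sorted values, which explains why a quartile of an integer bug count can be fractional.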

assimp_assimp_fuzzer_4d451f summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    honggfuzz 900 20.0 1.70 0.732695 1.0 1.0 2.0 2.00 3.0
    libfuzzer 900 20.0 1.35 0.670820 1.0 1.0 1.0 1.25 3.0
    centipede 900 11.0 0.00 0.000000 0.0 0.0 0.0 0.00 0.0
    libafl 900 20.0 0.00 0.000000 0.0 0.0 0.0 0.00 0.0
    mopt 900 20.0 0.00 0.000000 0.0 0.0 0.0 0.00 0.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    libfuzzer 900 20.0 2196.100000 51.647489 2119.0 2158.75 2196.0 2226.0 2314.0
    honggfuzz 900 20.0 1415.600000 109.257494 1188.0 1348.25 1419.5 1483.5 1589.0
    mopt 900 20.0 796.250000 31.222251 748.0 774.25 802.0 819.0 858.0
    centipede 900 11.0 668.454545 22.805103 640.0 649.00 669.0 685.5 710.0
    libafl 900 20.0 0.000000 0.000000 0.0 0.00 0.0 0.0 0.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.

bloaty_fuzz_target_52948c summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    honggfuzz 900 20.0 0.00 0.000000 0.0 0.0 0.0 0.0 0.0
    libafl 900 20.0 0.00 0.000000 0.0 0.0 0.0 0.0 0.0
    libfuzzer 900 20.0 0.00 0.000000 0.0 0.0 0.0 0.0 0.0
    mopt 900 20.0 0.05 0.223607 0.0 0.0 0.0 0.0 1.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    libafl 900 20.0 4952.00 75.432089 4730.0 4936.50 4961.0 5000.75 5066.0
    honggfuzz 900 20.0 4908.45 39.901820 4822.0 4879.00 4913.0 4938.25 4964.0
    mopt 900 20.0 4848.20 153.489448 4364.0 4821.25 4906.0 4934.75 5005.0
    libfuzzer 900 20.0 4733.10 65.846632 4583.0 4701.00 4727.5 4760.25 4851.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.

ffmpeg_ffmpeg_demuxer_fuzzer_7adeef summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    honggfuzz 900 20.0 0.5 0.512989 0.0 0.0 0.5 1.0 1.0
    afl 900 1.0 0.0 NaN 0.0 0.0 0.0 0.0 0.0
    aflsmart 900 10.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0
    centipede 900 14.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0
    libfuzzer 900 20.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0
    mopt 900 20.0 0.0 0.000000 0.0 0.0 0.0 0.0 0.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    honggfuzz 900 20.0 9635.400000 2303.693200 0.0 10010.25 10178.0 10343.75 10703.0
    afl 900 1.0 5786.000000 NaN 5786.0 5786.00 5786.0 5786.00 5786.0
    aflsmart 900 10.0 5750.100000 313.999098 5340.0 5534.25 5691.0 5904.50 6315.0
    libfuzzer 900 20.0 5589.100000 215.565084 5280.0 5461.00 5576.5 5629.75 6191.0
    mopt 900 20.0 5428.750000 310.256409 4917.0 5247.50 5389.0 5585.25 6371.0
    centipede 900 14.0 5205.357143 346.870760 4745.0 4988.00 5096.5 5278.00 6044.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.

file_magic_fuzzer_2d5f85 summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    aflsmart 900 6.0 0.666667 0.516398 0.0 0.25 1.0 1.0 1.0
    libfuzzer 900 20.0 1.100000 0.307794 1.0 1.00 1.0 1.0 2.0
    centipede 900 12.0 0.000000 0.000000 0.0 0.00 0.0 0.0 0.0
    libafl 900 20.0 0.150000 0.366348 0.0 0.00 0.0 0.0 1.0
    mopt 900 20.0 0.350000 0.489360 0.0 0.00 0.0 1.0 1.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    libafl 900 20.0 2277.050000 21.777523 2230.0 2264.25 2276.0 2288.25 2338.0
    aflsmart 900 6.0 2190.333333 180.354835 1823.0 2247.50 2262.5 2270.00 2279.0
    libfuzzer 900 20.0 2250.400000 10.738029 2230.0 2247.50 2251.0 2257.00 2268.0
    mopt 900 20.0 2223.350000 95.247752 1823.0 2229.75 2245.0 2253.25 2268.0
    centipede 900 12.0 2185.500000 23.887996 2148.0 2165.00 2187.5 2204.75 2217.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.

grok_grk_decompress_fuzzer_9cd001 summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    afl 29700 6.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0
    aflsmart 29700 8.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0
    honggfuzz 29700 7.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0
    libafl 29700 15.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0
    libfuzzer 29700 17.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0
    mopt 29700 16.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    mopt 29700 16.0 5790.125000 47.788248 5666.0 5770.00 5803.5 5824.00 5844.0
    libfuzzer 29700 17.0 5764.647059 66.976620 5532.0 5764.00 5779.0 5791.00 5845.0
    honggfuzz 29700 7.0 5762.428571 30.983098 5713.0 5744.00 5767.0 5785.00 5799.0
    aflsmart 29700 8.0 5757.375000 55.659520 5671.0 5730.75 5763.5 5791.25 5831.0
    afl 29700 6.0 5779.166667 65.743187 5711.0 5736.75 5759.5 5811.50 5887.0
    libafl 29700 15.0 5472.466667 361.768052 4910.0 5164.50 5452.0 5837.00 5914.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.

harfbuzz_hb-shape-fuzzer_17863b summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    aflsmart 3600 5.0 1.000000 0.707107 0.0 1.0 1.0 1.0 2.0
    centipede 3600 11.0 0.545455 0.522233 0.0 0.0 1.0 1.0 1.0
    honggfuzz 3600 14.0 1.000000 0.679366 0.0 1.0 1.0 1.0 2.0
    libfuzzer 3600 20.0 1.200000 0.410391 1.0 1.0 1.0 1.0 2.0
    mopt 3600 20.0 0.850000 0.745160 0.0 0.0 1.0 1.0 2.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    honggfuzz 3600 14.0 9294.285714 61.748159 9204.0 9237.00 9302.5 9336.75 9398.0
    aflsmart 3600 5.0 9111.200000 164.695780 8884.0 9056.00 9071.0 9263.00 9282.0
    mopt 3600 20.0 8797.400000 383.086001 8025.0 8611.50 8881.0 9060.25 9363.0
    libfuzzer 3600 20.0 8751.150000 96.931487 8573.0 8703.25 8733.5 8813.75 8944.0
    centipede 3600 11.0 8381.545455 60.793690 8295.0 8357.50 8376.0 8380.00 8533.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.

lcms_cms_transform_all_fuzzer_97d37d summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    afl 3600 20.0 0.00 0.00000 0.0 0.0 0.0 0.0 0.0
    centipede 3600 4.0 0.00 0.00000 0.0 0.0 0.0 0.0 0.0
    libafl 3600 16.0 0.25 0.57735 0.0 0.0 0.0 0.0 2.0
    libfuzzer 3600 20.0 0.00 0.00000 0.0 0.0 0.0 0.0 0.0
    mopt 3600 20.0 0.00 0.00000 0.0 0.0 0.0 0.0 0.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    libafl 3600 16.0 1868.5625 233.672411 1505.0 1625.25 1872.5 2063.75 2203.0
    libfuzzer 3600 20.0 1481.5500 150.911084 1208.0 1351.75 1491.5 1579.00 1763.0
    mopt 3600 20.0 843.7000 128.518727 668.0 776.50 788.0 881.75 1134.0
    afl 3600 20.0 821.1000 173.624186 443.0 769.25 776.5 966.00 1096.0
    centipede 3600 4.0 546.5000 9.255629 539.0 542.00 543.5 548.00 560.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.

libaom_av1_dec_fuzzer_6e1848 summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    honggfuzz 2700 12.0 0.666667 0.778499 0.0 0.0 0.5 1.0 2.0
    centipede 2700 10.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
    libafl 2700 20.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
    libfuzzer 2700 20.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0
    mopt 2700 20.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    honggfuzz 2700 12.0 9925.416667 256.780578 9491.0 9823.50 9945.5 10094.50 10277.0
    mopt 2700 20.0 9543.100000 86.034815 9325.0 9504.00 9573.0 9580.75 9658.0
    libafl 2700 20.0 9576.800000 70.573292 9479.0 9521.50 9555.5 9616.25 9725.0
    centipede 2700 10.0 9520.400000 28.550929 9449.0 9517.00 9525.5 9534.75 9557.0
    libfuzzer 2700 20.0 9509.700000 67.363195 9376.0 9459.75 9524.0 9554.00 9640.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.

libpcap_fuzz_filter_98b0a2 summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    aflsmart 900 10.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    centipede 900 12.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    honggfuzz 900 20.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    libafl 900 20.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    libfuzzer 900 20.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    mopt 900 20.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    aflsmart 900 10.0 2741.300000 81.704141 2631.0 2674.50 2747.5 2787.00 2871.0
    mopt 900 20.0 2599.800000 228.965086 1644.0 2616.00 2654.5 2669.25 2736.0
    libfuzzer 900 20.0 2657.450000 92.731347 2533.0 2571.50 2649.0 2727.00 2850.0
    libafl 900 20.0 2645.900000 64.868206 2506.0 2603.75 2643.0 2683.75 2805.0
    honggfuzz 900 20.0 2635.100000 73.854694 2561.0 2593.25 2607.5 2652.75 2817.0
    centipede 900 12.0 2467.666667 40.806268 2396.0 2434.25 2477.0 2507.00 2515.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.

libxml2_xml_e85b9b summary

Discovered bug coverage distribution
Reached code coverage distribution
Mean code coverage growth over time
Mean bug coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    centipede 1800 14.0 0.285714 0.468807 0.0 0.0 0.0 0.75 1.0
    honggfuzz 1800 20.0 0.000000 0.000000 0.0 0.0 0.0 0.00 0.0
    libafl 1800 20.0 0.000000 0.000000 0.0 0.0 0.0 0.00 0.0
    libfuzzer 1800 20.0 0.050000 0.223607 0.0 0.0 0.0 0.00 1.0
    mopt 1800 20.0 0.000000 0.000000 0.0 0.0 0.0 0.00 0.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    libafl 1800 20.0 16794.700000 340.352000 16095.0 16610.00 16789.0 17051.50 17462.0
    honggfuzz 1800 20.0 15673.000000 162.753608 15302.0 15540.00 15686.0 15734.75 16000.0
    centipede 1800 14.0 15205.357143 256.819783 14902.0 14996.00 15122.0 15469.50 15592.0
    mopt 1800 20.0 14982.750000 547.410735 13380.0 14816.00 14891.0 15170.75 16096.0
    libfuzzer 1800 20.0 14096.800000 516.095728 13441.0 13739.75 14016.0 14355.25 15371.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
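To make the A12 description concrete, here is a minimal sketch of the Vargha-Delaney measure applied to this benchmark's bug coverage. The per-trial values are not listed in the report; they are reconstructed under the assumption that bug counts are integers, in which case the centipede row (14 trials, mean 0.285714, max 1) implies 4 trials with one bug, and the libfuzzer row (20 trials, mean 0.05, max 1) implies 1 such trial. The function name `vargha_delaney_a12` is illustrative.

```python
def vargha_delaney_a12(xs, ys):
    """Probability that a value drawn from xs exceeds one drawn from ys,
    counting ties as half a win."""
    wins = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (wins + 0.5 * ties) / (len(xs) * len(ys))


# Assumed per-trial bug counts, reconstructed from the summary rows:
# centipede: 14 trials, mean 0.285714 -> 4 trials with one bug
# libfuzzer: 20 trials, mean 0.05     -> 1 trial with one bug
centipede = [1] * 4 + [0] * 10
libfuzzer = [1] * 1 + [0] * 19

a12 = vargha_delaney_a12(centipede, libfuzzer)
print(f"A12(centipede, libfuzzer) = {a12:.3f}")
```

A value of 0.5 means neither fuzzer tends to outperform the other; swapping the arguments yields one minus the original value.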

php_php-fuzz-parser_0dbedb summary

Figures: Discovered bug coverage distribution; Reached code coverage distribution; Mean code coverage growth over time; Mean bug coverage growth over time.
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer     time  count      mean       std  min  25%  median  75%  max
    centipede  8100    7.0  0.571429  0.534522  0.0  0.0     1.0  1.0  1.0
    afl        8100   20.0  0.000000  0.000000  0.0  0.0     0.0  0.0  0.0
    honggfuzz  8100    5.0  0.000000  0.000000  0.0  0.0     0.0  0.0  0.0
    libfuzzer  8100   20.0  0.050000  0.223607  0.0  0.0     0.0  0.0  1.0
    mopt       8100   20.0  0.000000  0.000000  0.0  0.0     0.0  0.0  0.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the bug coverage distributions of a given fuzzer pair are significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer     time  count          mean        std      min      25%   median       75%      max
    centipede  8100    7.0  17002.857143  72.753989  16864.0  16979.5  17019.0  17049.50  17079.0
    honggfuzz  8100    5.0  16458.000000  82.413591  16329.0  16447.0  16465.0  16497.00  16552.0
    afl        8100   20.0  16330.300000  81.458482  16093.0  16314.0  16351.0  16370.75  16463.0
    mopt       8100   20.0  16306.500000  62.101360  16175.0  16269.0  16303.5  16347.25  16414.0
    libfuzzer  8100   20.0  16295.050000  67.703281  16114.0  16282.0  16298.0  16328.00  16398.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
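The Mann-Whitney U test described above can be sketched with the standard library alone. The version below uses the normal approximation with tie and continuity corrections, which is an assumption: the report's own p values may come from a different variant (for example an exact test). The input data are also assumed, reconstructed from this benchmark's bug coverage rows on the premise that bug counts are integers: centipede (7 trials, mean 0.571429) would have 4 trials with one bug, and libfuzzer (20 trials, mean 0.05) would have 1. `mann_whitney_u` is a hypothetical helper name.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test via the normal approximation,
    with tie and continuity corrections. Returns (U, p_value)."""
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    # U statistic for xs: pairs where x beats y, ties counted as half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)
    # Tie correction: sum of t^3 - t over each group of equal values.
    tie_term = sum(t**3 - t for t in Counter(list(xs) + list(ys)).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u, 1.0  # all values identical: no evidence of a difference
    # Continuity correction of 0.5, clamped so z never goes negative.
    z = max(abs(u - n1 * n2 / 2) - 0.5, 0.0) / sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(z))
    return u, min(p_value, 1.0)


# Assumed per-trial bug counts, reconstructed from the summary rows.
centipede = [1] * 4 + [0] * 3
libfuzzer = [1] * 1 + [0] * 19

u_stat, p = mann_whitney_u(centipede, libfuzzer)
print(f"U = {u_stat}, p = {p:.4f}")
```

With so many tied zeros the tie correction matters: omitting it would inflate the variance and make the test more conservative.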

systemd_fuzz-network-parser_288baf summary

Figures: Discovered bug coverage distribution; Reached code coverage distribution; Mean code coverage growth over time; Mean bug coverage growth over time.
* The error bands show the 95% confidence interval around the mean code coverage.
  • Sample statistics and statistical significance (bugs covered)
    Bug coverage sample statistics
    fuzzer     time  count  mean  std  min  25%  median  75%  max
    aflsmart   2700    1.0   0.0  NaN  0.0  0.0     0.0  0.0  0.0
    honggfuzz  2700   14.0   0.0  0.0  0.0  0.0     0.0  0.0  0.0
    libafl     2700   20.0   0.0  0.0  0.0  0.0     0.0  0.0  0.0
    libfuzzer  2700   20.0   0.0  0.0  0.0  0.0     0.0  0.0  0.0
    mopt       2700   20.0   0.0  0.0  0.0  0.0     0.0  0.0  0.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the bug coverage distributions of a given fuzzer pair are significantly different.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer     time  count         mean         std     min      25%  median      75%     max
    aflsmart   2700    1.0  3023.000000         NaN  3023.0  3023.00  3023.0  3023.00  3023.0
    mopt       2700   20.0  2977.500000   15.122308  2951.0  2966.00  2977.0  2988.75  3001.0
    libfuzzer  2700   20.0  2395.900000  591.625940  1719.0  1803.50  2426.0  2994.75  3065.0
    honggfuzz  2700   14.0  2451.571429  400.738686  2123.0  2161.75  2187.5  2945.25  3012.0
    libafl     2700   20.0  1631.450000    6.435714  1624.0  1625.50  1631.5  1637.00  1641.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
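The aflsmart rows above report a count of 1 and a standard deviation of NaN. The sketch below shows one way such a summary row could be produced, and why a single trial leaves the sample standard deviation undefined. It assumes the columns follow the usual convention of a sample standard deviation (dividing by count minus one) and linearly interpolated quartiles; `describe` and `quantile` are hypothetical helpers written for this illustration, not the report's actual code.

```python
from math import isnan, nan, sqrt


def quantile(sorted_values, q):
    """Linear-interpolation quantile of already sorted data."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower])


def describe(values):
    """Summary row with the same columns as the tables above."""
    data = sorted(values)
    count = len(data)
    mean = sum(data) / count
    # The sample standard deviation divides by count - 1,
    # so it is undefined (NaN) when there is only one trial.
    if count > 1:
        std = sqrt(sum((v - mean) ** 2 for v in data) / (count - 1))
    else:
        std = nan
    return {
        "count": count, "mean": mean, "std": std, "min": data[0],
        "25%": quantile(data, 0.25), "median": quantile(data, 0.5),
        "75%": quantile(data, 0.75), "max": data[-1],
    }


# The single aflsmart trial from the code coverage table above.
row = describe([3023.0])
print(row)
```

Because a single trial carries no information about variance, comparisons involving aflsmart on this benchmark should be read with caution.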

experiment data

You can download the raw data for this report here.

Check out the documentation on how to create customized reports using this data. Also see some example Colab notebooks for doing custom analysis on the data here.

Experiment Description:

None provided.