FuzzBench: 2025-02-10-path-afl-1 report

experiment summary

We show two aggregate (cross-benchmark) rankings of fuzzers. The first is based on the average of per-benchmark scores, where a score is the fuzzer's median reached code coverage expressed as a percentage of the highest median code coverage reached on that benchmark (higher is better). The second shows the average rank of each fuzzer, after ranking the fuzzers on each benchmark by their median reached code coverage (lower is better).
By avg. score
fuzzer       average normalized score
aflplusplus  99.63
aflsmart     97.86
eclipser     97.68
aflfast      96.50
libfuzzer    96.05
path_afl     93.22
fairfuzz     90.17
centipede    49.93
libafl       49.53
By avg. rank
fuzzer       average rank
aflplusplus  2.5
libfuzzer    3.0
aflsmart     4.5
eclipser     4.5
path_afl     4.5
centipede    5.0
aflfast      6.5
libafl       6.5
fairfuzz     7.5
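
The two aggregate rankings above can be approximated from the per-benchmark medians listed in the sample statistics tables of this report. The following is a minimal sketch, not the report generator's own code. It rests on two assumptions that are inferred from the reported numbers rather than stated in the report: a fuzzer with no result on a benchmark gets a score of 0 there, and such fuzzers share the first rank after all ranked fuzzers. The function names are illustrative.

```python
from statistics import mean

# Median reached coverage per benchmark, copied from the sample statistics
# tables of this report. A missing key means the fuzzer has no result there.
MEDIANS = {
    "bloaty_fuzz_target": {
        "aflplusplus": 6292.0, "aflsmart": 6104.0, "eclipser": 6078.0,
        "aflfast": 6076.0, "libfuzzer": 5795.5, "path_afl": 5479.0,
        "fairfuzz": 5170.0,
    },
    "libpng_libpng_read_fuzzer": {
        "libfuzzer": 2018.0, "centipede": 2015.0, "path_afl": 2005.0,
        "aflplusplus": 2003.0, "libafl": 1999.0, "eclipser": 1993.5,
        "aflsmart": 1992.5, "fairfuzz": 1981.0, "aflfast": 1946.5,
    },
}

FUZZERS = sorted({f for medians in MEDIANS.values() for f in medians})


def average_normalized_score(medians_by_benchmark, fuzzers):
    """Average, over benchmarks, of median coverage as a % of the best median."""
    scores = {f: [] for f in fuzzers}
    for medians in medians_by_benchmark.values():
        best = max(medians.values())
        for f in fuzzers:
            # Assumption: a benchmark without a result contributes a score of 0.
            scores[f].append(100.0 * medians.get(f, 0.0) / best)
    return {f: mean(s) for f, s in scores.items()}


def average_rank(medians_by_benchmark, fuzzers):
    """Average, over benchmarks, of the fuzzer's rank by median coverage."""
    ranks = {f: [] for f in fuzzers}
    for medians in medians_by_benchmark.values():
        for f in fuzzers:
            if f in medians:
                # Competition ranking: 1 + number of strictly better fuzzers.
                rank = 1 + sum(v > medians[f] for v in medians.values())
            else:
                # Assumption: fuzzers without a result share the next rank
                # after every ranked fuzzer.
                rank = len(medians) + 1
            ranks[f].append(rank)
    return {f: mean(r) for f, r in ranks.items()}


if __name__ == "__main__":
    scores = average_normalized_score(MEDIANS, FUZZERS)
    ranks = average_rank(MEDIANS, FUZZERS)
    for f in sorted(FUZZERS, key=lambda name: -scores[name]):
        print(f"{f:12s} score={scores[f]:6.2f} rank={ranks[f]:.1f}")
```

Under these assumptions the sketch reproduces the average ranks shown above. Some scores can differ from the table in the last digit, presumably because the report derives its medians from data that is rounded differently.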
  • Critical difference diagram
    The diagram visualizes the average rank of fuzzers (second ranking above) while showing the significance of the differences as well. What is considered a "critical difference" (CD) is based on the Friedman/Nemenyi post-hoc test. See more in the documentation.
    Note: If a fuzzer does not support all benchmarks, its position in this diagram can be worse than it should be, so check the list of supported benchmarks for the fuzzers you are interested in. The list can be specified in the fuzzer's README.md.
  • Median relative code-coverages on each benchmark

    Note: The relative coverage summary table shows the median relative performance of each fuzzer to the experiment maximum. Thus the highest relative performance may not be 100%.
    trial_relative_coverage = trial_coverage / experiment_max_coverage

                               aflplusplus  centipede  aflsmart  eclipser  libafl  aflfast  libfuzzer  path_afl  fairfuzz
    FuzzerMedian               96.50        96.00      95.00     95.00     95.00   94.00    93.00      90.00     87.50
    FuzzerMean                 96.50        96.00      95.00     95.00     95.00   94.00    93.00      90.00     87.50
    bloaty_fuzz_target         98.00        nan        95.00     95.00     nan     95.00    90.00      85.00     81.00
    libpng_libpng_read_fuzzer  95.00        96.00      95.00     95.00     95.00   93.00    96.00      95.00     94.00
    • Fuzzers are sorted by "FuzzerMean" (average median relative coverage), highest on the left.
    • Green background = highest relative median coverage.
    • Blue gradient background = greater than 95% relative median coverage.
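
The relative coverage formula above can be illustrated with numbers from this report. The sketch below applies it to two bloaty_fuzz_target medians, taking the experiment maximum to be the largest "max" value in that benchmark's sample statistics table (6381, reached by aflsmart). Rounding down to a whole percent is an assumption inferred from the table values, not something the report states, and the function name is illustrative.

```python
import math


def relative_coverage_percent(coverage, experiment_max_coverage):
    """trial_relative_coverage = trial_coverage / experiment_max_coverage, in %."""
    return 100.0 * coverage / experiment_max_coverage


# Values copied from the bloaty_fuzz_target sample statistics table.
BLOATY_EXPERIMENT_MAX = 6381.0
BLOATY_MEDIANS = {"aflplusplus": 6292.0, "fairfuzz": 5170.0}

for fuzzer, median_coverage in BLOATY_MEDIANS.items():
    percent = relative_coverage_percent(median_coverage, BLOATY_EXPERIMENT_MAX)
    # Assumption: the table shows the value rounded down to a whole percent.
    print(fuzzer, math.floor(percent))
```

Because the denominator is the maximum over all trials of all fuzzers, even the best fuzzer's median stays below 100% unless its median trial is also the best trial of the experiment.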

bloaty_fuzz_target summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
The following fuzzers do not have enough samples: libfuzzer, path_afl, eclipser.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer       time   count  mean         std         min     25%      median  75%      max
    aflplusplus  82800  20.0   6271.050000  51.508277   6130.0  6249.00  6292.0  6296.75  6340.0
    aflsmart     82800  20.0   6146.000000  146.262561  5784.0  6070.75  6104.0  6271.00  6381.0
    eclipser     82800  11.0   6053.181818  68.930136   5914.0  6022.50  6078.0  6095.50  6135.0
    aflfast      82800  20.0   6077.150000  116.111233  5846.0  6016.00  6076.0  6140.25  6335.0
    libfuzzer    82800  14.0   5831.857143  105.791418  5627.0  5772.00  5795.5  5929.50  5981.0
    path_afl     82800  13.0   5451.538462  55.699514   5335.0  5411.00  5479.0  5484.00  5521.0
    fairfuzz     82800  16.0   5189.937500  67.159977   5111.0  5135.25  5170.0  5226.00  5323.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row.
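
The Vargha-Delaney A12 measure mentioned above has a simple definition: the fraction of trial pairs in which the first fuzzer's coverage exceeds the second's, with ties counted as half. A minimal sketch follows. The trial values are hypothetical, since this report lists only summary statistics and not per-trial coverage.

```python
def a12(xs, ys):
    """Vargha-Delaney A12: estimated probability that a value drawn from xs
    is larger than a value drawn from ys, counting ties as one half."""
    greater = sum(x > y for x in xs for y in ys)
    equal = sum(x == y for x in xs for y in ys)
    return (greater + 0.5 * equal) / (len(xs) * len(ys))


# Hypothetical per-trial coverage values, for illustration only.
fuzzer_a = [6290, 6300, 6250, 6310]
fuzzer_b = [6100, 6270, 6080, 6120]

print(a12(fuzzer_a, fuzzer_b))  # above 0.5: fuzzer_a tends to reach more coverage
print(a12(fuzzer_b, fuzzer_a))  # the two directions always sum to 1
```

A value of 0.5 means neither fuzzer tends to outperform the other. The effect size complements the Mann-Whitney U test, which only says whether the difference is statistically significant.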

libpng_libpng_read_fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
The following fuzzers do not have enough samples: libfuzzer, path_afl.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer       time   count  mean         std        min     25%      median  75%      max
    libfuzzer    82800  10.0   2017.900000  1.100505   2015.0  2018.00  2018.0  2018.00  2019.0
    centipede    82800  17.0   2013.529412  3.710082   2006.0  2013.00  2015.0  2016.00  2018.0
    path_afl     82800  10.0   2001.300000  11.719405  1981.0  1991.50  2005.0  2010.25  2016.0
    aflplusplus  82800  18.0   2007.777778  20.377483  1999.0  2002.00  2003.0  2005.00  2089.0
    libafl       82800  17.0   1997.705882  24.881632  1973.0  1980.00  1999.0  2001.00  2084.0
    eclipser     82800  18.0   1983.000000  27.563830  1900.0  1988.25  1993.5  1996.25  1999.0
    aflsmart     82800  20.0   1964.400000  41.639303  1888.0  1926.50  1992.5  1995.25  1998.0
    fairfuzz     82800  18.0   1972.166667  30.020091  1889.0  1955.75  1981.0  1996.00  1999.0
    aflfast      82800  20.0   1940.050000  40.045336  1856.0  1926.25  1946.5  1973.50  1987.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row.
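
The unique and pairwise unique coverage counts described above are set differences over the branches each fuzzer covered. The sketch below shows both computations on hypothetical branch IDs. The fuzzer names, branch sets, and function names are made up for illustration and are not taken from this experiment's data.

```python
def unique_branch_counts(covered):
    """For each fuzzer, count the branches that no other fuzzer covered."""
    counts = {}
    for fuzzer, branches in covered.items():
        others = set().union(
            *(b for name, b in covered.items() if name != fuzzer)
        )
        counts[fuzzer] = len(branches - others)
    return counts


def pairwise_unique_counts(covered):
    """counts[row][col] = branches covered by `col` but not by `row`."""
    return {
        row: {col: len(covered[col] - covered[row]) for col in covered}
        for row in covered
    }


# Hypothetical branch IDs covered by three hypothetical fuzzers.
covered = {
    "fuzzer_a": {1, 2, 3, 4},
    "fuzzer_b": {3, 4, 5},
    "fuzzer_c": {4, 6},
}

print(unique_branch_counts(covered))
print(pairwise_unique_counts(covered)["fuzzer_a"]["fuzzer_b"])
```

The pairwise table is not symmetric: the cell for row fuzzer_a and column fuzzer_b counts what fuzzer_b found that fuzzer_a missed, while the mirrored cell counts the opposite.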

experiment data

You can download the raw data for this report here.

Check out the documentation on how to create customized reports using this data. Also see some example Colab notebooks for doing custom analysis on the data here.

Experiment Description:

(None,)