FuzzBench: 2024-09-12-dgfuzz report

experiment summary

We show two different aggregate (cross-benchmark) rankings of fuzzers. The first is based on the average of per-benchmark scores, where a score is the fuzzer's median reached code coverage expressed as a percentage of the highest median code coverage reached on that benchmark (higher is better). The second shows the average rank of each fuzzer after ranking the fuzzers on each benchmark by their median reached code coverage (lower is better).
By avg. score
fuzzer average normalized score
aflplusplus 94.06
eclipser 88.40
dgfuzz_32d973 85.74
dgfuzz_1d7283 82.59
mopt 74.26
aflsmart 72.99
fairfuzz 72.93
aflfast 70.78
centipede 69.89
libafl 48.68
honggfuzz 11.90
libfuzzer 11.43
afl 11.28
By avg. rank
fuzzer average rank
aflplusplus 2.38
dgfuzz_32d973 3.00
dgfuzz_1d7283 4.12
eclipser 5.12
aflsmart 5.75
mopt 5.75
fairfuzz 6.50
libafl 7.12
aflfast 7.88
centipede 8.00
honggfuzz 9.38
libfuzzer 9.62
afl 9.88
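
As a rough illustration of how the two aggregate rankings above are derived, the sketch below computes both from per-benchmark median coverage. The function names are hypothetical. The input uses only three fuzzers and three benchmarks (medians copied from the sample statistics in this report) and assumes every fuzzer ran on every benchmark, so its output will not match the full tables above.

```python
from statistics import mean

# Median reached coverage per benchmark and fuzzer, copied from the sample
# statistics in this report. Only a small subset is included, so the results
# differ from the full ranking tables.
MEDIANS = {
    "harfbuzz_hb-shape-fuzzer": {"aflplusplus": 10878.0, "mopt": 10789.5, "eclipser": 10777.0},
    "stb_stbi_read_fuzzer": {"aflplusplus": 2207.0, "mopt": 1992.0, "eclipser": 2106.0},
    "vorbis_decode_fuzzer": {"aflplusplus": 1263.0, "mopt": 1253.0, "eclipser": 1248.0},
}


def average_scores(medians):
    """Average, over benchmarks, of median coverage as a percentage of the best median."""
    per_fuzzer = {}
    for by_fuzzer in medians.values():
        best = max(by_fuzzer.values())
        for fuzzer, coverage in by_fuzzer.items():
            per_fuzzer.setdefault(fuzzer, []).append(100.0 * coverage / best)
    return {fuzzer: mean(values) for fuzzer, values in per_fuzzer.items()}


def average_ranks(medians):
    """Average, over benchmarks, of each fuzzer's rank (1 = highest median)."""
    per_fuzzer = {}
    for by_fuzzer in medians.values():
        values = list(by_fuzzer.values())
        for fuzzer, coverage in by_fuzzer.items():
            higher = sum(v > coverage for v in values)
            tied = sum(v == coverage for v in values)  # includes the fuzzer itself
            # Tied fuzzers share the average of the ranks they occupy.
            per_fuzzer.setdefault(fuzzer, []).append(higher + (tied + 1) / 2)
    return {fuzzer: mean(values) for fuzzer, values in per_fuzzer.items()}


scores = average_scores(MEDIANS)
ranks = average_ranks(MEDIANS)
for fuzzer in sorted(ranks, key=ranks.get):
    print(f"{fuzzer:12s} score={scores[fuzzer]:6.2f} rank={ranks[fuzzer]:.2f}")
```

Averaging tied ranks is a design choice of this sketch; how the report itself treats ties and unsupported benchmarks is not stated here.
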
  • Critical difference diagram
    The diagram visualizes the average rank of fuzzers (the second ranking above) and shows which differences are statistically significant. The "critical difference" (CD) is determined by the Friedman test with the Nemenyi post-hoc test. See the documentation for details.
    Note: If a fuzzer does not support all benchmarks, its rank in this diagram can be lower than it should be, so check the list of supported benchmarks for the fuzzers you are interested in. The list can be specified in the fuzzer's README.md.
  • Median relative code-coverages on each benchmark

    Note: The relative coverage summary table shows the median relative performance of each fuzzer to the experiment maximum. Thus the highest relative performance may not be 100%.
    trial_relative_coverage = trial_coverage / experiment_max_coverage

      honggfuzz libafl dgfuzz_32d973 aflplusplus dgfuzz_1d7283 libfuzzer afl eclipser centipede mopt aflsmart fairfuzz aflfast
    FuzzerMedian 92.00 91.50 94.00 92.50 94.00 88.00 87.00 82.50 81.00 78.50 79.00 77.50 78.50
    FuzzerMean 92.00 92.00 91.71 88.75 88.43 88.00 87.00 83.50 74.57 70.62 69.62 69.50 67.50
    harfbuzz_hb-shape-fuzzer nan nan nan 98.00 nan nan nan 97.00 nan 97.00 97.00 85.00 96.00
    lcms_cms_transform_fuzzer nan nan 94.00 89.00 95.00 nan nan 74.00 35.00 50.00 40.00 54.00 28.00
    libpcap_fuzz_both nan nan 78.00 86.00 78.00 nan nan 71.00 81.00 0.00 0.00 0.00 1.00
    mbedtls_fuzz_dtlsclient nan 87.00 94.00 69.00 71.00 nan nan 67.00 66.00 67.00 68.00 70.00 65.00
    openthread_ot-ip6-send-fuzzer nan 88.00 86.00 76.00 87.00 nan nan 74.00 70.00 71.00 71.00 68.00 71.00
    stb_stbi_read_fuzzer 92.00 nan 96.00 96.00 95.00 88.00 87.00 91.00 85.00 86.00 87.00 86.00 86.00
    vorbis_decode_fuzzer nan 98.00 99.00 99.00 99.00 nan nan 98.00 90.00 98.00 98.00 97.00 98.00
    zlib_zlib_uncompress_fuzzer nan 95.00 95.00 97.00 94.00 nan nan 96.00 95.00 96.00 96.00 96.00 95.00
    • Fuzzers are sorted by "FuzzerMean" (average median relative coverage), highest on the left.
    • Green background = highest relative median coverage.
    • Blue gradient background = greater than 95% relative median coverage.
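
The relative coverage formula above can be checked against the lcms_cms_transform_fuzzer row of the table. The sketch below is an illustration, not FuzzBench's own code: the function name is hypothetical, the medians and the experiment maximum (2241) are taken from that benchmark's sample statistics in this report, and truncating to a whole percent is an assumption that happens to reproduce the values shown in the table.

```python
import math

# Median coverage at the end of the experiment for lcms_cms_transform_fuzzer,
# copied from the sample statistics in this report.
LCMS_MEDIANS = {
    "dgfuzz_1d7283": 2144.0,
    "dgfuzz_32d973": 2119.5,
    "aflplusplus": 1996.0,
    "eclipser": 1671.0,
    "fairfuzz": 1221.0,
    "mopt": 1133.0,
    "aflsmart": 900.0,
    "centipede": 796.0,
    "aflfast": 640.5,
}

# Highest coverage reached by any single trial on this benchmark.
EXPERIMENT_MAX = 2241.0


def relative_coverage_percent(coverage, experiment_max):
    """coverage / experiment_max as a whole percent (truncation is an assumption)."""
    return math.floor(100.0 * coverage / experiment_max)


# Dividing every trial by the same constant does not change which trial is the
# median, so the median relative coverage equals median / experiment_max.
relative = {
    fuzzer: relative_coverage_percent(median, EXPERIMENT_MAX)
    for fuzzer, median in LCMS_MEDIANS.items()
}
for fuzzer, percent in relative.items():
    print(f"{fuzzer:14s} {percent}")
```

Because the divisor is the best single trial rather than the best median, even the top fuzzer on this benchmark ends up below 100.
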

harfbuzz_hb-shape-fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
Error: The following fuzzers do not have enough samples: fairfuzz, aflplusplus.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    aflplusplus 82800 9.0 10874.000000 26.748832 10814.0 10867.00 10878.0 10892.00 10903.0
    mopt 82800 20.0 10778.950000 52.249880 10651.0 10757.50 10789.5 10796.25 10863.0
    eclipser 82800 20.0 10769.400000 51.530676 10623.0 10744.25 10777.0 10789.50 10868.0
    aflsmart 82800 20.0 10763.700000 42.521945 10652.0 10753.00 10764.5 10791.00 10839.0
    aflfast 82800 20.0 10677.750000 59.487172 10566.0 10633.00 10675.5 10714.50 10791.0
    fairfuzz 82800 14.0 9509.571429 347.646530 9071.0 9259.50 9399.0 9715.75 10155.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row
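
For readers unfamiliar with the two statistics named above, here is a minimal sketch of the Vargha-Delaney A12 measure computed by direct pairwise comparison. The function names and the sample values are hypothetical. The same pair count, before dividing by the number of pairs, is the Mann-Whitney U statistic for the first sample; turning U into a p value needs a statistics library and is not shown.

```python
def pair_wins(xs, ys):
    """Count pairs (x, y) with x > y, counting ties as half a win.

    This is the Mann-Whitney U statistic for the sample xs.
    """
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


def a12(xs, ys):
    """Estimated probability that a value drawn from xs exceeds one drawn from ys."""
    return pair_wins(xs, ys) / (len(xs) * len(ys))


# Hypothetical final coverage values for two fuzzers (three trials each).
fuzzer_a = [2100, 2150, 2200]
fuzzer_b = [2000, 2050, 2100]

print(f"U   = {pair_wins(fuzzer_a, fuzzer_b)}")
print(f"A12 = {a12(fuzzer_a, fuzzer_b):.3f}")
```

A value of 0.5 means neither fuzzer tends to reach more coverage than the other; values near 1 or 0 mean one of them almost always does.
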

lcms_cms_transform_fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
Error: The following fuzzers do not have enough samples: aflplusplus, dgfuzz_32d973, dgfuzz_1d7283.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    dgfuzz_1d7283 82800 7.0 2140.857143 59.965069 2086.0 2086.50 2144.0 2171.00 2241.0
    dgfuzz_32d973 82800 10.0 2133.300000 35.571680 2077.0 2110.25 2119.5 2171.50 2177.0
    aflplusplus 82800 14.0 1885.785714 218.852665 1583.0 1629.75 1996.0 2055.50 2098.0
    eclipser 82800 17.0 1656.823529 137.802955 1398.0 1537.00 1671.0 1753.00 1893.0
    fairfuzz 82800 18.0 1274.277778 397.407600 800.0 901.00 1221.0 1636.25 1940.0
    mopt 82800 20.0 1154.600000 428.242602 650.0 697.50 1133.0 1559.00 1736.0
    aflsmart 82800 20.0 1074.750000 469.254714 650.0 652.00 900.0 1561.00 1791.0
    centipede 82800 20.0 938.450000 266.317277 732.0 778.50 796.0 949.50 1469.0
    aflfast 82800 20.0 655.900000 152.566086 476.0 629.50 640.5 644.25 1276.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row
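
The two unique-coverage views described above reduce to set differences. The sketch below assumes each fuzzer's coverage is available as a set of branch identifiers; the function names, fuzzer names, and branch IDs are hypothetical.

```python
def unique_branches(covered):
    """Branches covered by each fuzzer and by no other fuzzer."""
    result = {}
    for fuzzer, branches in covered.items():
        others = set().union(*(b for f, b in covered.items() if f != fuzzer))
        result[fuzzer] = branches - others
    return result


def pairwise_unique(covered):
    """cell[row][col] = number of branches covered by col but not by row."""
    return {
        row: {col: len(covered[col] - covered[row]) for col in covered}
        for row in covered
    }


# Hypothetical branch IDs covered by three fuzzers.
covered = {
    "fuzzer_a": {1, 2, 3},
    "fuzzer_b": {2, 3, 4},
    "fuzzer_c": {3, 5},
}

unique = unique_branches(covered)
pairwise = pairwise_unique(covered)
print(unique)
print(pairwise["fuzzer_c"]["fuzzer_a"])
```
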

libpcap_fuzz_both summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
Error: The following fuzzers do not have enough samples: mopt, dgfuzz_32d973, aflfast, dgfuzz_1d7283, aflplusplus.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    aflplusplus 82800 2.0 2979.500000 36.062446 2954.0 2966.75 2979.5 2992.25 3005.0
    centipede 82800 20.0 2413.350000 1012.985960 100.0 2591.50 2824.0 2926.50 3133.0
    dgfuzz_32d973 82800 9.0 2744.000000 159.184641 2544.0 2669.00 2707.0 2761.00 3100.0
    dgfuzz_1d7283 82800 6.0 2742.833333 177.265244 2513.0 2654.25 2696.0 2866.75 2988.0
    eclipser 82800 19.0 2461.052632 196.678947 1977.0 2403.00 2446.0 2609.00 2706.0
    aflfast 82800 6.0 38.500000 4.929503 34.0 34.00 38.5 43.00 43.0
    aflsmart 82800 20.0 34.000000 0.000000 34.0 34.00 34.0 34.00 34.0
    fairfuzz 82800 16.0 37.375000 4.500000 34.0 34.00 34.0 43.00 43.0
    mopt 82800 14.0 36.214286 3.826599 34.0 34.00 34.0 37.00 43.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row
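
The "not enough samples" message in this section can be related to the count column of the sample statistics table above. The sketch below assumes a rule inferred from the tables in this report rather than taken from FuzzBench's source: a fuzzer is flagged when it has fewer than 80% of the largest trial count on the benchmark. The function name and the 0.8 threshold are assumptions; with the libpcap_fuzz_both counts the sketch flags the same five fuzzers as the message.

```python
# Trial counts for libpcap_fuzz_both, copied from the count column above.
LIBPCAP_COUNTS = {
    "aflplusplus": 2,
    "centipede": 20,
    "dgfuzz_32d973": 9,
    "dgfuzz_1d7283": 6,
    "eclipser": 19,
    "aflfast": 6,
    "aflsmart": 20,
    "fairfuzz": 16,
    "mopt": 14,
}


def fuzzers_without_enough_samples(counts, threshold=0.8):
    """Fuzzers whose trial count is below `threshold` of the largest count.

    The 0.8 default is an assumption inferred from this report's tables.
    """
    max_count = max(counts.values())
    return {fuzzer for fuzzer, count in counts.items() if count / max_count < threshold}


flagged = fuzzers_without_enough_samples(LIBPCAP_COUNTS)
print(sorted(flagged))
```
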

mbedtls_fuzz_dtlsclient summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
Error: The following fuzzers do not have enough samples: fairfuzz, dgfuzz_32d973, aflplusplus, dgfuzz_1d7283.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    dgfuzz_32d973 82800 13.0 3521.846154 485.515679 2826.0 2851.00 3756.0 3866.00 3972.0
    libafl 82800 17.0 3318.235294 399.724675 2714.0 2769.00 3481.0 3610.00 3811.0
    dgfuzz_1d7283 82800 9.0 3174.888889 415.069405 2809.0 2829.00 2845.0 3577.00 3726.0
    fairfuzz 82800 15.0 2849.466667 159.218119 2754.0 2779.00 2801.0 2807.50 3296.0
    aflplusplus 82800 11.0 2764.181818 25.929976 2723.0 2750.00 2761.0 2784.50 2802.0
    aflsmart 82800 20.0 2709.000000 26.942434 2665.0 2701.75 2705.5 2713.00 2792.0
    eclipser 82800 16.0 2726.562500 280.074030 2496.0 2674.00 2692.5 2715.00 3730.0
    mopt 82800 20.0 2685.000000 34.546688 2562.0 2672.50 2692.5 2701.25 2726.0
    centipede 82800 20.0 2646.050000 22.818448 2613.0 2625.50 2641.0 2668.50 2677.0
    aflfast 82800 20.0 2561.850000 115.700737 2312.0 2580.00 2611.0 2629.75 2658.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row

openthread_ot-ip6-send-fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
Error: The following fuzzers do not have enough samples: dgfuzz_1d7283, aflplusplus, dgfuzz_32d973.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    libafl 82800 16.0 3608.250000 237.543400 3044.0 3547.50 3564.5 3637.00 4049.0
    dgfuzz_1d7283 82800 11.0 3512.454545 49.301853 3412.0 3497.00 3529.0 3549.00 3558.0
    dgfuzz_32d973 82800 7.0 3351.000000 324.034463 3005.0 3037.50 3500.0 3522.00 3833.0
    aflplusplus 82800 8.0 3198.125000 217.796391 3049.0 3057.75 3086.0 3247.50 3585.0
    eclipser 82800 16.0 2984.937500 66.946216 2895.0 2918.25 2999.5 3036.75 3073.0
    mopt 82800 20.0 2889.450000 41.043590 2826.0 2832.75 2912.5 2916.00 2936.0
    aflsmart 82800 20.0 2896.650000 46.006035 2828.0 2886.25 2907.0 2912.25 3025.0
    aflfast 82800 20.0 2888.250000 43.931616 2810.0 2865.75 2906.0 2911.75 2974.0
    centipede 82800 20.0 2842.250000 60.918302 2742.0 2786.50 2868.0 2887.00 2956.0
    fairfuzz 82800 19.0 2779.105263 65.278799 2676.0 2745.50 2764.0 2801.50 2912.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row

stb_stbi_read_fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
Error: The following fuzzers do not have enough samples: dgfuzz_1d7283, dgfuzz_32d973.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    dgfuzz_32d973 82800 7.0 2216.428571 15.447299 2189.0 2211.50 2221.0 2222.50 2237.0
    aflplusplus 82800 17.0 2174.176471 47.454235 2107.0 2116.00 2207.0 2211.00 2215.0
    dgfuzz_1d7283 82800 10.0 2226.900000 47.310910 2188.0 2193.00 2196.5 2275.00 2297.0
    honggfuzz 82800 20.0 2127.500000 30.768234 2111.0 2113.00 2115.5 2118.25 2199.0
    eclipser 82800 19.0 2102.210526 10.141110 2081.0 2099.50 2106.0 2108.00 2115.0
    libfuzzer 82800 20.0 2042.750000 46.827652 1985.0 2002.25 2031.5 2084.75 2118.0
    aflsmart 82800 20.0 2025.700000 45.962342 1941.0 2000.50 2007.5 2083.75 2093.0
    afl 82800 20.0 2010.300000 38.377762 1975.0 1986.00 2004.0 2006.25 2105.0
    fairfuzz 82800 16.0 1999.312500 43.870596 1942.0 1976.50 1993.5 1999.00 2084.0
    mopt 82800 20.0 1997.000000 28.771331 1963.0 1980.75 1992.0 2004.00 2072.0
    aflfast 82800 19.0 1987.578947 15.568186 1961.0 1978.00 1989.0 2002.50 2008.0
    centipede 82800 20.0 1961.850000 7.768187 1953.0 1956.00 1960.0 1963.25 1985.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row

vorbis_decode_fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
Error: The following fuzzers do not have enough samples: eclipser, aflplusplus, dgfuzz_1d7283, dgfuzz_32d973.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    dgfuzz_32d973 82800 7.0 1255.571429 22.374518 1205.0 1262.00 1264.0 1264.50 1267.0
    aflplusplus 82800 9.0 1263.666667 1.802776 1262.0 1262.00 1263.0 1264.00 1267.0
    dgfuzz_1d7283 82800 8.0 1263.625000 2.973094 1260.0 1262.00 1262.5 1264.75 1269.0
    mopt 82800 20.0 1250.400000 9.626717 1218.0 1252.00 1253.0 1254.00 1256.0
    aflsmart 82800 20.0 1244.100000 19.558011 1199.0 1247.00 1251.5 1254.00 1259.0
    aflfast 82800 20.0 1245.400000 19.491969 1183.0 1246.75 1251.0 1253.75 1258.0
    libafl 82800 17.0 1251.470588 3.659195 1245.0 1249.00 1250.0 1255.00 1259.0
    eclipser 82800 15.0 1248.133333 5.514483 1236.0 1245.00 1248.0 1252.50 1255.0
    fairfuzz 82800 18.0 1228.111111 28.060625 1175.0 1208.25 1238.0 1250.75 1257.0
    centipede 82800 20.0 1146.000000 17.161769 1117.0 1134.00 1142.5 1159.00 1177.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row

zlib_zlib_uncompress_fuzzer summary

Ranking by median reached code coverage
Reached code coverage distribution
Mean code coverage growth over time
* The error bands show the 95% confidence interval around the mean code coverage.
Error: The following fuzzers do not have enough samples: aflsmart, dgfuzz_32d973, aflplusplus, dgfuzz_1d7283.
  • Sample statistics and statistical significance (code coverage)
    Code coverage sample statistics
    fuzzer time count mean std min 25% median 75% max
    aflplusplus 82800 8.0 462.625000 3.739270 460.0 460.00 461.0 463.50 469.0
    fairfuzz 82800 16.0 457.875000 3.685557 455.0 455.75 457.0 459.00 470.0
    aflsmart 82800 12.0 455.750000 13.948770 416.0 455.75 456.0 462.00 470.0
    mopt 82800 20.0 456.100000 4.024922 449.0 455.00 455.5 458.50 464.0
    eclipser 82800 16.0 451.687500 13.189484 423.0 450.25 455.0 458.00 471.0
    dgfuzz_32d973 82800 12.0 450.916667 6.156421 440.0 446.00 452.0 456.50 458.0
    centipede 82800 20.0 454.650000 6.499190 451.0 451.00 451.0 455.25 471.0
    libafl 82800 17.0 452.529412 5.038820 446.0 449.00 451.0 458.00 462.0
    aflfast 82800 20.0 448.100000 15.109774 386.0 449.00 449.0 454.00 460.0
    dgfuzz_1d7283 82800 8.0 446.875000 5.488625 440.0 444.50 446.0 449.75 456.0

    Vargha-Delaney A12 measure
    The table summarizes the A12 values from the pairwise Vargha-Delaney A measure of effect size. Green cells indicate the probability the fuzzer in the row will outperform the fuzzer in the column.
    Mann-Whitney U test
    The table summarizes the p values of pairwise Mann-Whitney U tests. Green cells indicate that the reached coverage distribution of a given fuzzer pair is significantly different.
  • Unique code coverage plots
    Ranking by unique code branches covered
    Each bar shows the total number of code branches found by a given fuzzer. The colored area shows the number of unique code branches (i.e., branches that were not covered by any other fuzzers).
    Pairwise unique code coverage
    Each cell represents the number of code branches covered by the fuzzer of the column but not by the fuzzer of the row

experiment data

The raw data for this report is available for download.

See the documentation for how to create customized reports from this data, and the example Colab notebooks for custom analysis of it.

Experiment Description: None