Hugo Chávez dominated the Venezuelan electoral landscape from his first presidential victory in 1998 until his death in 2013. Nobody doubts that he always received considerable voter support in the numerous elections held during his mandate. However, the integrity of the electoral system has come into question since the 2004 Presidential Recall Referendum. From then on, different sectors of society have systematically alleged electoral irregularities or biases in favor of the incumbent party. We have carried out a thorough forensic analysis of the national-level Venezuelan electoral processes held during the 1998–2012 period to assess these complaints. The second-digit Benford's law and two statistical models of vote distributions, recently introduced in the literature, are reviewed and used in our case study. In addition, we discuss a new method to detect irregular variations in the electoral roll. The outputs obtained from these election forensic tools are examined taking into account the substantive context of the elections and referenda under study. We reach two main conclusions. Firstly, all the tools uncover anomalous statistical patterns, which are consistent with election fraud from 2004 onwards. Although our results are not conclusive proof of fraud, they signal the Recall Referendum as a turning point in the integrity of the Venezuelan elections. Secondly, our analysis calls into question the reliability of the electoral register since 2004. In particular, we found irregular variations in the electoral roll that were decisive in winning the 50% majority in the 2004 Referendum and in the 2012 Presidential Elections.

Hugo Chávez was elected President of Venezuela in 1998 and ruled the country until his death in 2013. He won four consecutive presidential elections (1998, 2000, 2006 and 2012) and a recall referendum (2004), convened against him by opposition forces. He also proposed several major reforms that were approved in national referenda (two held in 1999, one in 2000 and another in 2009). In addition, his party won an overall majority in the National Assembly in three parliamentary elections that took place during his presidency (2000, 2005 and 2010), and in all regional and local elections. His sole election defeat came in the 2007 constitutional referendum, when he attempted a radical socio-political reform. This electoral record could be overshadowed, however, by the allegations of fraud made by opposition sectors since the 2004 Recall Referendum

The electoral law, approved in Venezuela in 1997, established the automation of the vote count. In the period between 1998 and 2000, the vote count was carried out both manually and automatically. Since 2004, however, the results have come exclusively from a computer center, where the data from the voting machines distributed throughout the country are centralized. Another important characteristic that differentiates the electoral processes before and after 2004 is the composition of the governing body of the elections, the National Electoral Council (CNE in Spanish). The National Assembly, which was controlled by the ruling coalition, appointed an openly pro-government management body. Four of the five current CNE rectors lean strongly towards the ruling party and only one towards the opposition forces. Although the CNE has improved the transparency and reliability of the electoral system, particularly since 2006, the Venezuelan electoral authority has taken controversial decisions that have only ever favored the government and never the opposition.

Despite the frequent use of the term, there is ambiguity regarding what is and what is not electoral fraud. What may constitute fraud in one country, or at a particular moment, may not be considered as such in another. Nonetheless, any irregular action that is performed with the intention of altering the development of an election or election-related materials, with the aim of affecting its results, may be considered a fraud

Some electoral irregularities may leave traces in the form of numerical anomalies. If this is the case, they can be detected by appropriate statistical methods. The main idea underlying these methods is the comparison between observed values of statistics based on the vote count and their expected values. When we say expected value, we usually mean the regular value in a free and fair election. Therefore, large discrepancies between observed values and expected ones (

This paper proceeds as follows. In the next section we describe the election data under study. Then, we apply a battery of election fraud forensic tests, which provide consistent and complementary results. Thereafter, we turn to a discussion on the integrity of Venezuelan elections and present some final conclusions.

In our study we considered the following Venezuelan elections:

Presidential elections 1998, 2000, 2006 and 2012

Referenda 1999, 2004, 2007, and 2009

Parliamentary elections 2005 and 2010

Therefore, we took into account every year of national-level elections since Chávez first won the presidency of Venezuela until his death. However, for the 2000 general elections, known as ‘Mega-elections’ because every single official was re-elected, we only considered data from the presidential elections. In 1999 there were two referenda, one in April and one in December, and one election in July for the seats of the National Constituent Assembly (NCA). During the April referendum, two queries were made: about the convening of the NCA to draft a new constitution and about the approval of the basis for this constituent process. In December, the new constitution was adopted by national referendum. We only considered the April referendum due to the lack of available data for the July elections and the December referendum at the level of breakdown we require for our analysis. The official data (available at

For our analysis, we used data at the lowest level of aggregation. The polling cluster that collects these data has been named differently from one election to another: voting table, electoral notebook, voting machine, etc. To avoid confusion, we will refer to it as an electoral unit.

Unlike in an earlier version of this paper

Number of votes for Chávez. That is, votes for him (in presidential elections), for his proposals (in referenda), and for the candidates endorsed by the ruling party (in parliamentary elections)

Number of valid votes

Number of registered voters

Polling center to which the electoral unit belongs
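The four fields listed above can be pictured as one record per electoral unit. The following is a minimal sketch only: the class name, the field names and the sample values are hypothetical, and measuring turnout as valid votes over registered voters is an assumption made for illustration, not a definition taken from the data source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElectoralUnit:
    """One record at the lowest aggregation level (hypothetical layout)."""
    center_code: str   # polling center to which the electoral unit belongs
    registered: int    # number of registered voters
    valid_votes: int   # number of valid votes
    chavez_votes: int  # votes for Chavez, his proposal, or endorsed candidates

    def turnout(self) -> float:
        # Assumption: turnout is measured as valid votes over registered voters.
        return self.valid_votes / self.registered if self.registered else 0.0

    def chavez_share(self) -> float:
        # Share of the valid votes that favored Chavez.
        return self.chavez_votes / self.valid_votes if self.valid_votes else 0.0


# Illustrative values only; they do not come from any official data set.
unit = ElectoralUnit("C001", registered=500, valid_votes=350, chavez_votes=210)
print(unit.turnout(), unit.chavez_share())
```

Keeping the polling center code on every record makes it straightforward to aggregate units by center later on.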

For each election, we consolidated these data in one set, labeled with the year of the election, except for the 1999 and 2007 referenda and the 2010 Parliamentary elections, for which there are two data sets. The labels 1999a, 1999b, 2007a and 2007b refer to the data associated with the two questions considered in the referenda of 1999 and 2007. The 2010 Parliamentary elections were preceded by an electoral reform. Under the approved system, 70% of the 165 deputies of the National Assembly were elected on a first-past-the-post system and 30% on a party list. The results are considered in two separate sets, labeled 2010a and 2010b, respectively. Each polling center is identified by a code. These codes were re-labeled; we used the old labels for elections and referenda before 2005 and the new ones for elections and referenda from 2005 onwards. The conversion table and the election data under consideration are available at

Election | 1998 | 1999a | 1999b | 2000 | 2004 | 2005 | 2006 | 2007a | 2007b | 2009 | 2010a | 2010b | 2012 |

[%] Chávez | 56.20 | 87.75 | 81.74 | 59.76 | 59.09 | 85.50 | 62.84 | 49.29 | 48.94 | 54.85 | 48.20 | 48.72 | 55.07 |

[%] Turnout | 63.45 | 37.65 | 37.65 | 56.63 | 69.92 | 25.26 | 74.69 | 55.90 | 55.90 | 69.92 | 66.45 | 66.45 | 80.56 |

The Benford test for the second significant digit is one of the most commonly-used tools in election forensics. It has been previously used to analyze the 2000 Presidential Elections and the 2004 Recall Referendum

From polling places that collect election data with 10 or more votes favoring Chávez, consider the proportion

Discrepancies between the frequency distribution and the law may be interpreted as evidence of fraud of various kinds.

The most widely accepted measure of discrepancy between the frequency distribution and the law is Pearson's chi-square statistic

The statistic is the basis of an uncritical practice to test the null hypothesis $H_0$
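As an illustration of the test described above, the sketch below computes the expected second-digit frequencies under Benford's law, $P(d)=\sum_{k=1}^{9}\log_{10}\left(1+\frac{1}{10k+d}\right)$ for $d=0,\dots,9$, and the Pearson chi-square statistic of a list of vote counts against them. The function names and the toy counts are hypothetical. With ten digit categories the statistic is usually referred to a chi-square distribution with 9 degrees of freedom, whose 5% critical value is approximately 16.92.

```python
import math
from collections import Counter


def second_digit_benford():
    """Expected frequencies of the second significant digit (0-9)."""
    return [sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
            for d in range(10)]


def second_digit(n):
    """Second significant digit of an integer n >= 10."""
    return int(str(n)[1])


def chi_square_2bl(vote_counts, min_votes=10):
    """Pearson chi-square of observed second digits against Benford's law."""
    kept = [v for v in vote_counts if v >= min_votes]
    if not kept:
        raise ValueError("no vote counts at or above the threshold")
    observed = Counter(second_digit(v) for v in kept)
    n = len(kept)
    return sum((observed.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in enumerate(second_digit_benford()))


# Toy vote counts, for illustration only (not real election data).
toy_counts = [112, 87, 240, 305, 19, 451, 133, 98, 276, 64, 7, 310]
print(chi_square_2bl(toy_counts))
```

Raising `min_votes` to 100 restricts the computation to counts with three or more significant digits.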

Despite its widespread use, the application of Benford's test has been severely criticized

Firstly, let us consider the electoral units with 10 or more votes favoring Chávez.

Top panels: Electoral units with 10 or more votes for Chávez. Bottom panels: Electoral units with 100 or more votes for Chávez. Left panels: Presidential elections and referenda previous to 2004. Right panels: Elections and referenda between 2004 and 2012. The proportions of the 2005 Parliamentary Elections are partially out of the

Election | 1998 | 1999a | 1999b | 2000 | 2004 | 2005 | 2006 | 2007a | 2007b | 2009 | 2010a | 2010b | 2012 |

95.17 | 94.16 | 101.1 | 49.47 | 203.93 | 1977.0 | 151.74 | 176.88 | 195.77 | 197.68 | 157.25 | 143.33 | 192.58 | |

0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |

0.000 | 0.000 | 0.000 | 0.9993 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |

Statistical tests examining the fit of Benford's law have more power on data with several significant digits. In fact, in accounting fraud detection, among other fields, it is common practice to restrict the analysis to data with three or more significant digits.

Election | 1998 | 1999a | 1999b | 2000 | 2004 | 2005 | 2006 | 2007a | 2007b | 2009 | 2010a | 2010b | 2012 |

53.04 | 56.57 | 69.19 | 35.26 | 189.78 | 55178. | 114.97 | 203.76 | 232.33 | 135.08 | 157.89 | 122.13 | 132.68 | |

0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |

1.000 | 0.997 | 0.388 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |

It is well known that Benford's test is applicable to data that are distributed across multiple orders of magnitude. The votes per electoral unit certainly are not: they number fewer than 600 in almost every Venezuelan election, and only in 2000 is the bound larger (about twice as large). The natural way to extend these data to higher orders of magnitude is to consider outcomes per polling center. A polling center may combine multiple electoral units in the same voting place, and its number of votes for Chávez can exceed 8000.
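The aggregation step can be sketched as follows. This is a minimal illustration with hypothetical names and toy numbers: votes for Chávez are summed over the electoral units that share a polling center code, and the spread of the counts is summarized as the base-10 logarithm of the ratio between the largest and the smallest positive count.

```python
import math
from collections import defaultdict


def votes_by_center(units):
    """Sum votes over electoral units sharing the same polling center code."""
    totals = defaultdict(int)
    for center_code, chavez_votes in units:
        totals[center_code] += chavez_votes
    return dict(totals)


def orders_of_magnitude(counts):
    """log10 of the ratio between the largest and smallest positive count."""
    positive = [c for c in counts if c > 0]
    return math.log10(max(positive) / min(positive))


# Toy (center code, votes) pairs; not real election data.
units = [("A", 350), ("A", 420), ("A", 510), ("B", 12), ("C", 95), ("C", 130)]
centers = votes_by_center(units)
print(orders_of_magnitude([v for _, v in units]))
print(orders_of_magnitude(centers.values()))
```

In this toy example the per-center totals span a wider range than the per-unit counts, which is the property the aggregation is meant to provide.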

Left panel: Presidential elections and referenda previous to 2004. Right panel: Elections and referenda between 2004 and 2012.

Election | 1998 | 1999a | 1999b | 2000 | 2004 | 2005 | 2006 | 2007a | 2007b | 2009 | 2010a | 2010b | 2012 |

9.27 | 10.99 | 10.34 | 6.20 | 14.90 | 11.72 | 16.15 | 16.95 | 12.09 | 20.52 | 18.08 | 12.02 | 9.86 | |

.4127 | .2764 | .3237 | .7197 | .0937 | .2296 | .0638 | .0495 | .2083 | .0150 | .0343 | .2122 | .3619 | |

1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |

Beyond the controversies concerning the application of the second-digit Benford's law for fraud detection, we can extract at least one conclusion from our analyses: 2004 appears to be an inflection point at which the Venezuelan elections begin to move away from the law. A recent study based on authentic and synthetic election data reports that non-compliance with the law is associated with fraud in at least 50% of cases

The second-digit Benford's law and other tests based on the frequency of digits

Left panel: 1998 Presidential elections. Right panel: 2004 Recall Referendum. Color represents the number of electoral units with corresponding (

According to Klimek et al., fit models for fingerprints of fair elections should correspond to bivariate Gaussian distributions. They test this hypothesis on many countries, including Austria, the Czech Republic, Finland, France, Poland, Romania, Spain, and Switzerland. They also consider non-fraudulent mechanisms that can explain discrepancies from the bivariate Gaussian distribution, e.g. the heterogeneity of the Canadian population. In addition, they discuss fraudulent processes that may contribute to deviations from their fair election model, such as ballot stuffing and coercion to obtain complete turnout and votes for the winner. The 1998 Venezuelan Presidential Elections are very close to their model of fair elections, while the 2004 Recall Referendum is farther from it. Leaving aside whether or not there was fraud in 2004, these two electoral processes provide two different fingerprint models for the same electoral population, corresponding to two crucial moments. We are interested in classifying the elections according to the election fingerprint model that best fits the data. To that end, we experimented with several classification methods and obtained similar results. Below, we show the outputs of a quadratic classifier that fits multivariate normal densities with covariance estimates stratified by group (1998 and 2004). We selected this method because it relies on the Gaussianity hypothesis of Klimek et al. The classifier provides a simple rule to determine when an electoral unit is more likely to be an observation from the 1998 model than from the 2004 model. The results allow the elections to be grouped into four categories, according to the shape of their fingerprints and the percentage of electoral units classified into the 1998 model, which we will denote by [%] Mod.98.
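The classification rule can be sketched as follows. This is a simplified illustration, not the exact procedure used for the results reported here: it fits one bivariate Gaussian per reference election, assumes equal prior weights for the two groups, and assigns each (turnout, vote share) point to the model with the higher log-density. All names and the toy reference points are hypothetical.

```python
import math


def fit_gaussian(points):
    """Sample mean and covariance of a list of (turnout, share) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_density(point, model):
    """Log-density of a bivariate normal with its own covariance matrix."""
    (mx, my), (sxx, sxy, syy) = model
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (quad + math.log(det)) - math.log(2 * math.pi)


def percent_mod98(units, model_98, model_04):
    """Percentage of units closer to the 1998 model (equal priors assumed)."""
    hits = sum(log_density(u, model_98) >= log_density(u, model_04)
               for u in units)
    return 100.0 * hits / len(units)


# Toy reference fingerprints (turnout, share); not real election data.
ref_98 = [(0.60, 0.55), (0.65, 0.50), (0.62, 0.60), (0.58, 0.52), (0.66, 0.58)]
ref_04 = [(0.85, 0.80), (0.90, 0.88), (0.80, 0.75), (0.88, 0.79), (0.92, 0.90)]
model_98, model_04 = fit_gaussian(ref_98), fit_gaussian(ref_04)
print(percent_mod98([(0.62, 0.55), (0.87, 0.82)], model_98, model_04))
```

Because each group keeps its own covariance matrix, the decision boundary between the two models is quadratic rather than linear.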

Each electoral unit represented by a blue circle has been classified as an observation of the Gaussian fit model based on 1998 data. Otherwise, it is represented by one red x. In both elections, the units are clustered around their respective averages of turnout and votes for Chávez. By excluding some units with turnout between 60% and 80%, and low support for Chávez (less than 20%), the scatterplots appear to be normally distributed.

The scatterplots have many units with high turnout and high support for Chávez.

However, the set of electoral units close to the top right corner is less dense (2010) or negligible (2007). Additionally, their [%] Mod.98 values are considerably high, as well as the percentage of electoral units classified into the 2004 model. These elections seem to fit a true mixture model.

This shape is a consequence of the low opposition turnout in 1999 and its almost total absence in 2005.

Election | 1998 | 1999a | 1999b | 2000 | 2004 | 2005 | 2006 | 2007a | 2007b | 2009 | 2010a | 2010b | 2012 |

[%] Mod.98 | 72.03 | 98.21 | 99.05 | 86.82 | 18.25 | 99.50 | 06.69 | 69.30 | 69.54 | 23.14 | 54.38 | 56.22 | 02.83 |

A simple way to summarize the outputs that we have discussed is by plotting the cumulative number of voters favoring the winner as a function of the turnout

The shape of every referendum/election is a sigmoid that reaches a plateau at the maximal vote count for Chávez. The curves of 2004, 2006, and 2012 increase close to complete turnout.
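The cumulative curve discussed above can be sketched in a few lines. The snippet assumes each electoral unit is given as a hypothetical (turnout, votes for the winner) pair; it sorts the units by turnout and accumulates the votes.

```python
def cumulative_votes_by_turnout(units):
    """Return (turnout, cumulative winner votes) pairs ordered by turnout."""
    curve, running = [], 0
    for turnout, votes in sorted(units):
        running += votes
        curve.append((turnout, running))
    return curve


# Toy (turnout, winner votes) pairs; not real election data.
toy_units = [(0.9, 50), (0.5, 100), (0.7, 30)]
print(cumulative_votes_by_turnout(toy_units))
```

A curve that is still rising steeply near complete turnout means that a sizeable share of the winner's votes comes from units where almost every registered voter is recorded as voting.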

The analysis carried out suggests that 2004 is a breakpoint in the voting behavior of Venezuelans. The election fingerprints of the presidential elections before 2004 fit the model of fair elections proposed by Klimek et al. well. The low opposition turnout in the 1999 referendum and the 2005 parliamentary elections can explain the deviations of these processes from the Gaussian model. The recall referendum showed a new Venezuelan election fingerprint that was farther from the Gaussian fair election model. Its shape is shared by the referenda and elections held between 2004 and 2012, in particular by the 2006 and 2012 presidential elections and the 2009 referendum, processes that were characterized by many electoral units with high voter turnout and strong support for Chávez. Many factors can explain the presence of units with these characteristics. Certainly, as Klimek et al. argue, one may be ballot stuffing and/or coercion in some electoral units. But there are also non-fraudulent mechanisms that can explain these results. As Mebane concludes

As we have already discussed

Denote by

Let

Denote

Then, given

Only the 2000 elections show slightly heavier tails.

If an election is fair, including that election resources are distributed with equity among the polling centers,

The set of the _{k}

The null hypothesis _{1}_{k}

We propose a test for $H_1$

Let _{k}_{i}_{i}

Denote by

Then, if _{1}_{1}_{1}_{1}_{i}


Standardized differences of fair elections computed from a hierarchical bootstrap model (thin blue lines) also verify the expected behavior under $H_1$

They are well above the 99% normal confidence interval. These elections provide strong evidence against $H_1$

The alternative hypothesis to $H_1$

Although we cannot rule out that some of the anomalous patterns observed in elections since 2004 result from non-fraudulent mechanisms that could affect the vote distribution, we have not found a convincing explanation of why the irregularities in the vote distribution were mainly observed in electoral units that favored Chávez. Moreover, if there were election irregularities in those units, we have not yet estimated how they could affect the overall results. In light of the data collected over time, we tried a new approach to address this problem, which requires some preliminaries.

Electoral units in polling centers can be different from one election to the next. The number is determined by the electoral referee, who considers the number of registered voters in polling centers, among other control variables. Nevertheless, many polling centers are common in two consecutive elections. Thus, we will henceforth consider only results by polling center.

As mentioned earlier, one concern of the opposition is the possible fraudulent manipulation of the electoral register, which has grown dramatically over the last few years. In the twelve years between the presidential elections of 2000 and 2012, it increased by roughly 60%, whereas the Venezuelan population grew significantly less between 2001 and 2011 (around 16% according to the projections of the Venezuelan census bureau). This is a controversial demographic problem
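To put the two figures on a comparable footing, they can be converted to annualized rates. The sketch below assumes constant compound growth, which is a simplification introduced here, and the two periods (2000 to 2012 and 2001 to 2011) do not coincide exactly. The function name is hypothetical.

```python
def annual_rate(total_growth, years):
    """Constant compound annual rate that yields total_growth over years."""
    return (1 + total_growth) ** (1 / years) - 1


# Electoral roll: roughly 60% growth over twelve years.
roll_rate = annual_rate(0.60, 12)
# Population: around 16% growth over ten years.
population_rate = annual_rate(0.16, 10)
print(f"{roll_rate:.2%} per year versus {population_rate:.2%} per year")
```

Under these assumptions the electoral roll grew at roughly 4% per year, against about 1.5% per year for the population.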

Denote by

This indicator corresponds to the standard measure of inter-annual population growth. To get an idea of what the range of

A way to visualize the effect of irregular variations in centers is by computing the proportion of votes favoring Chávez as a function of |

The curves are centered by subtracting the overall percentage of votes obtained by Chávez in each case.

The official results of these elections are reached only at extremely large values of the inter-annual variation. At small values, covering up to 69% of the total valid votes, the results are tight for the 2004 Recall Referendum. At moderate values, covering up to 79% of the total valid votes, the results are adverse for Chávez in the 2012 Presidential Elections.

For the 2004 Recall Referendum, tight results in centers with variation of less than 1%. We note that these centers represent up to 69% of the total valid votes.

For the 2012 Presidential Election, adverse results in centers with variation of less than 4%. These centers represent up to 79% of total valid votes.
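The two observations above rest on a simple computation, sketched below. Each polling center is assumed to be given as a hypothetical triple (variation of the electoral roll, valid votes, votes for Chávez), with the variation already computed. The function returns Chávez's share among the centers whose absolute variation is below a threshold, together with the fraction of all valid votes those centers represent.

```python
def share_below(centers, threshold):
    """Chavez share and vote coverage for centers with |variation| < threshold."""
    selected = [(valid, chavez) for variation, valid, chavez in centers
                if abs(variation) < threshold]
    valid_selected = sum(valid for valid, _ in selected)
    chavez_selected = sum(chavez for _, chavez in selected)
    total_valid = sum(valid for _, valid, _ in centers)
    share = chavez_selected / valid_selected if valid_selected else 0.0
    coverage = valid_selected / total_valid if total_valid else 0.0
    return share, coverage


# Toy (variation, valid votes, Chavez votes) triples; not real election data.
toy_centers = [(0.005, 1000, 480), (-0.008, 1000, 510), (0.20, 500, 400)]
share, coverage = share_below(toy_centers, 0.01)
print(f"share {share:.1%} over {coverage:.0%} of valid votes")
```

Sweeping the threshold from small to large values traces how the result changes as centers with increasingly irregular roll variations are included.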

A strong increase close to extreme values of |

Until the 2000 Presidential Elections, the historical growth rate of the electoral roll was 11% every five years. The growth rate for the 2004 Recall Referendum was more than twice as large. This notable increase was a direct consequence of

Since the establishment of democracy in 1958, Venezuelan elections were considered free and fair until the 1993 Presidential Elections. Although the electoral outcome was accepted, and the winner did not face any legitimacy problem

The 1999 and 2000 electoral processes were carried out with roughly the same voting system used in 1998

The recall referendum of 2004 has been widely analyzed

From 2006, the CNE introduced important improvements. These included better infrastructure, more guarantees for the secrecy of the ballot, and an increase in the number of audits carried out on the system. In particular, the post-election audits now involve more than 54% of the electoral units. During this period, the electoral council also continued the campaign initiated in 2004, aimed at the inclusion of new voters. The improvements have led to a growing confidence in elections as a tool for political change

The 2010 Parliamentary Elections were preceded by a new electoral reform. Under the approved system, the percentage of deputies of the National Assembly elected by nominal election increased from 60% to 70%. Furthermore, the reform legalized a practice through which the government's party had been clearly overrepresented since the 2004 regional elections (colloquially called

With the above discussion, we wish to emphasize that irregularities and problems were common in all electoral processes during the Chávez period. In elections before the recall referendum, these irregularities were not enough to call the results into question, and the results were accepted by the main political actors. By contrast, the nature and range of the irregularities after 2004 have deeply concerned the opposition, although in the end the results were also accepted for practical reasons. We must remember that the opposition denounced fraud in 2004 and, with this decision, embarked on a political strategy that proved costly for two years, in particular giving up a fundamental space in the parliament elected in 2005. Our election forensics is consistent with this analysis:

We detected no signs of fraud for elections and referenda previous to 2004.

We found anomalous statistical patterns, which may be traces of election irregularities, in electoral processes between 2006 and 2010.

We cannot rule out outcome-determinative fraud in the 2004 referendum, as has already been reported

Our analysis for the 2012 presidential elections offers a controversial finding. We find statistical evidence, which may be interpreted as signals of systematic election irregularities, similar to that observed for 2004. Contrary to the opinion of radical sectors, which did not accept the results of the elections, the opposition candidate conceded defeat

Firstly, both international observers and the opposition have recommended for years a full audit of the electoral register. This has undergone tremendous changes with the voter inclusion program (

Secondly, while the opposition made a great effort to be present at the post-election audits, they failed to meet their goal. The most basic element in the post-election audits is the manual verification of the total number of votes in all the electoral units. This simple procedure was carried out in only approximately 6% of the electoral units

Thirdly, these elections were marked by a large number of electoral complaints

In this paper, we have applied four different forensic analyses to the Venezuelan national elections held during the Chávez mandate. In particular, we discussed the use of the second-digit Benford's law, two different approaches for the statistical detection of systematic election irregularities, and a tool based on the evolution of the electoral register. To reach a better understanding of the results obtained, we have placed them in their political context. We thus provide a thorough evaluation of the integrity of the electoral processes under study. Our results agree with the results of previous studies on the referenda of 2004

In sum, we have found anomalous statistical patterns consistent with a hypothetical electoral fraud in the 2004 recall referendum and in all elections and referenda held between 2006 and 2012. Although we do not provide conclusive evidence of fraud, and specifically not of outcome-determinative fraud, these findings raise serious doubts regarding the impartiality of the current electoral authority and support the allegations of fraud made by important sectors of Venezuelan society. Our study calls into question the reliability of the electoral register, a major concern since 2004. In particular, we detected irregular variations in the electoral roll that could have overturned the results of the 2004 referendum and the 2012 elections. As a corollary to our analysis, we recommend monitoring polling centers where atypical support (extreme

The authors thank the Academic Editor and the three anonymous reviewers for valuable comments. They also thank Anxo Sánchez and Francisco Seijo for their careful readings of the manuscript.