Tuesday, April 14, 2009

Interrogating the Data

My good friend and colleague, Valentin Mikhailov, has reminded me that the main issue with election fraud is not its presence, but rather its scale. Systematic error, whether of sinister or benign origin, is a common feature of elections. Several political scientists, notably Walter Mebane, Misha Myagkov, and Peter Ordeshook, have proposed methods for identifying fraud, but at best researchers can uncover data anomalies that are most plausibly explained by vote manipulation. As Mark Nigrini noted, forensic accountants tend to rely on a series of "data interrogation tests" to uncover improprieties. Even then, the existence of an anomaly does not reveal its cause.

When I evaluate precinct-level election data, I pay close attention to the performance of pro-regime parties, turnout, invalid ballots, and, if available, other features like mobile ballot box use. In addition, I compare the distribution of digits to a Benford-type distribution. In 1938, Frank Benford rediscovered an interesting property of digits in naturally occurring datasets: one is the most common first digit, and the probability of a digit appearing first declines logarithmically as the digit increases. His work has informed the accounting literature, as well as political science. Unfortunately, some properties of election data undermine the application of Benford's Law, such as the presence of zeros (Benford's Law does not account for a count of zero, whose first and only digit is zero) and precinct size (precinct size varies, and it determines which digits are "available"). Despite these complications, I have compared data from other post-Soviet states to the Benford distribution and found interesting results, especially in Ukraine.
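For readers who want the numbers behind the comparison: Benford's Law gives the probability of a leading digit d as log10(1 + 1/d), and the probability of a second digit d as the sum of log10(1 + 1/(10k + d)) over first digits k = 1..9. The short Python sketch below computes both distributions and shows one way to handle the zero problem mentioned above, by simply skipping counts that have no first (or second) digit. The function names are my own, for illustration; this is a minimal sketch, not the exact procedure I use.

```python
import math


def leading_digit(count):
    """First digit of a vote count, or None for zero (Benford's Law has no first digit of 0)."""
    if count <= 0:
        return None
    return int(str(count)[0])


def second_digit(count):
    """Second digit of a vote count, or None for counts below 10 (they have no second digit)."""
    if count < 10:
        return None
    return int(str(count)[1])


def benford_first_digit_probs():
    # P(D1 = d) = log10(1 + 1/d) for d = 1..9
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_second_digit_probs():
    # P(D2 = d) = sum over k = 1..9 of log10(1 + 1/(10k + d)) for d = 0..9
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


if __name__ == "__main__":
    for d, p in benford_first_digit_probs().items():
        print(f"first digit {d}: {p:.3f}")
```

Note that the second-digit distribution is much flatter than the first-digit one, which is part of why Mebane and others prefer second-digit tests for vote counts: small precincts constrain the leading digit far more than the second.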

While I have not performed a full and systematic analysis of the data (having only acquired it last night), an initial scan suggests that there is no "smoking gun." The first- and second-digit Benford tests on the results for the PCRM reveal no major issues. Although the number of ones is low (noticeably lower than Benford's Law predicts), the distribution of PCRM first and second digits is not statistically different from the Benford-type distribution. The PCRM performed exceptionally well in some precincts, receiving more than 90% of the vote in eleven of them. However, no precinct reports 100% for the PCRM (in many questionable elections, I have found precincts with regime support at 100%). Some precincts report extremely high turnout, driven largely by voters added through a supplemental list (some of these are polling places outside the country, in embassies or consulates). In these precincts, the PCRM's average vote share was 35%, below what it received nationally. The rate of invalid ballots is not high (the mean is just above 1%). The highest invalidation rate was 9%, and the PCRM received 43% of the vote in that precinct.
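The post does not say which statistic underlies "not statistically different," so as an illustration I assume a Pearson chi-square goodness-of-fit test, one common choice for digit tests. The sketch below takes a list of precinct-level vote counts (for example, a hypothetical `pcrm_counts` list), drops counts with no usable digit, and compares the observed digit frequencies with Benford's expectations at the 5% level. The critical values (15.507 for 8 degrees of freedom, 16.919 for 9) are hard-coded to keep the example dependency-free; the function names are hypothetical.

```python
import math
from collections import Counter

# Upper 5% critical values of the chi-square distribution:
# df = 8 for first digits (1..9), df = 9 for second digits (0..9).
CRITICAL_05 = {8: 15.507, 9: 16.919}


def extract_digit(count, position):
    """Return the first (position=1) or second (position=2) digit, or None if it does not exist."""
    text = str(count)
    if count <= 0 or len(text) < position:
        return None
    return int(text[position - 1])


def benford_expected(position):
    """Benford probabilities for first digits (1..9) or second digits (0..9)."""
    if position == 1:
        return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


def benford_chi_square(vote_counts, position=1):
    """Pearson chi-square test of observed digits against Benford's Law.

    Returns (statistic, degrees_of_freedom, rejects_at_5_percent).
    """
    digits = [extract_digit(c, position) for c in vote_counts]
    digits = [d for d in digits if d is not None]  # zeros and too-short counts are dropped
    n = len(digits)
    if n == 0:
        raise ValueError("no usable digits in vote_counts")
    observed = Counter(digits)  # missing digits count as 0
    expected = benford_expected(position)
    stat = sum((observed[d] - n * p) ** 2 / (n * p) for d, p in expected.items())
    df = len(expected) - 1
    return stat, df, stat > CRITICAL_05[df]
```

Two cautions apply. The chi-square approximation is usually considered reliable only when every expected digit count is at least about 5, so a handful of precincts is not enough. And, as noted above, a failed test says nothing by itself about the cause: precinct size alone can push vote counts away from Benford.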

Tomorrow's recount will be an interesting, and unprecedented, exercise in the post-Soviet world. The opposition has decided to boycott the recount, instead hoping to have voter lists re-evaluated. The opposition claims that "dead souls" and other illegitimate persons were on the voter lists, allowing the PCRM to inflate its results. Based on the precinct-level data that has been released, evidence of large-scale ballot box stuffing is not strong.
