Analysis of Georgia's Presidential Election
Last Sunday, Georgia held presidential elections that international observers generally praised as competitive, free, and fair (see, for example, the OSCE preliminary report, PACE statement, IRI press release, and NDI preliminary statement).* Analysis of the data was delayed by the CEC's initial decision to post only images of protocols rather than the raw results in an easily readable electronic form. (It appears that full results are now posted on the official site along with the protocol images.) JumpStartGE set up a crowdsourcing effort to convert the images to usable data, and the dataset was completed a few days after the election.** The data include results from 3,655 polling stations in Georgia and 52 outside of the country.
Not only is it valuable to assess Georgia's data on their own, but it is also instructive to compare them with other elections. Several indicators in Georgia differ from those in elections held in the South Caucasus region this year that were assessed less favorably by the observer community.
Turnout
The election protocols provide three time points for turnout: noon, 5pm, and final turnout.*** The distribution of the polling station data is displayed below. Turnout is near-normal in its distribution at all three time points, with the mean shifting right and the variance increasing over time. Notably absent is the "hump" in the right tail (which I pointed out as suspiciously present in Azerbaijan's data).
[Figure: Distribution of Turnout, Georgia PECs]
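The turnout summary described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the station records are hypothetical rather than the CEC data, the function name `turnout_summary` is invented here, and the 90% cutoff for the right tail is an arbitrary choice, not a threshold taken from the analysis.

```python
from statistics import mean, pvariance

# Hypothetical station records, not the actual CEC data:
# (registered voters, voters by noon, voters by 5pm, final voters)
stations = [
    (1000, 150, 380, 480),
    (800, 100, 290, 400),
    (1200, 200, 500, 540),
    (500, 90, 210, 300),
    (40, 10, 30, 44),  # final turnout above 100%, so it is set aside
]

def turnout_summary(stations, column, tail_cutoff=0.9):
    """Return (mean, variance, right-tail share) of turnout at one time point.

    Stations whose final turnout exceeds 100% are excluded first.
    The tail cutoff is an assumed value chosen for illustration.
    """
    kept = [row for row in stations if row[3] <= row[0]]
    rates = [row[column] / row[0] for row in kept]
    tail_share = sum(r >= tail_cutoff for r in rates) / len(rates)
    return mean(rates), pvariance(rates), tail_share

summary = {
    label: turnout_summary(stations, column)
    for label, column in [("noon", 1), ("5pm", 2), ("final", 3)]
}
for label, (m, v, tail) in summary.items():
    print(f"{label}: mean={m:.3f} variance={v:.4f} share at or above 90%={tail:.2f}")
```

In a pattern like the one reported for Georgia, the mean and variance would both rise from noon to the final count, and the share of stations in the extreme right tail would stay close to zero.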
Comparing turnout to candidate performance suggests a moderate tendency for Margvelashvili to perform better at higher turnout levels, and the opposite for Bakradze (Burjanadze's data suggest no trend). As I noted in the post about Azerbaijan, this outcome could be produced by legitimate or illegitimate mobilization, or by other means. However, the data do not reveal other markers of engineering, notably polling stations with perfect attendance and complete, or near-complete, support for a single candidate. Variation in performance is also reasonably wide, whereas it was much more limited for the leading candidate in Azerbaijan.**** In the Armenian case, the slope was more pronounced, and outcomes favored the pro-regime candidate. While Margvelashvili won a substantial victory, his performance varied across polling stations.
[Figure: Proportion of the Vote by Turnout]
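Two of the checks mentioned above, the slope of vote share on turnout and the search for stations with perfect attendance and near-unanimous support, could be sketched as below. The data are hypothetical, the helper names `slope` and `flag_suspicious` are invented for this example, and the 99% and 95% thresholds are assumptions rather than values used in the analysis.

```python
from statistics import mean

# Hypothetical (turnout, vote share) pairs for one candidate, as proportions.
stations = [
    (0.40, 0.55), (0.45, 0.60), (0.50, 0.58),
    (0.55, 0.66), (0.60, 0.64), (1.00, 0.99),
]

def slope(points):
    """Least-squares slope of vote share on turnout."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx

def flag_suspicious(points, turnout_min=0.99, share_min=0.95):
    """Stations where near-perfect turnout and near-unanimous support coincide.

    Both thresholds are assumed values chosen for illustration.
    """
    return [p for p in points if p[0] >= turnout_min and p[1] >= share_min]

print(f"slope: {slope(stations):.3f}")
print("flagged stations:", flag_suspicious(stations))
```

A positive slope on its own is ambiguous, as noted above. It is the combination of a steep slope with flagged stations that would point toward engineering.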
Effects of Turnout, Invalid Ballots, and Polling Station Features
Regressing the results for the three main candidates on polling station-level explanatory variables can show how multiple features are associated with performance. I used the proportion of the vote received by Margvelashvili, Bakradze, and Burjanadze as dependent variables, and turnout, the proportion of invalid ballots, the natural log of polling station size, and participation by "special voters" as explanatory variables. As noted elsewhere, turnout could affect performance because of legitimate mobilization or other factors. Ballot invalidation should not be associated with candidate performance, as one would expect it to reflect random errors by voters rather than to be systematically related to a candidate.***** Polling station size could matter in a couple of ways. Smaller polling stations are more amenable to pressure on voters because officials are more likely to know individuals, and such stations are also more likely to be in village settings (or special precincts). However, polling station size could also serve as a proxy for rural/urban location, which could be connected to legitimate variation in candidate support. Special voter participation is indicative of mobile voting, and these voters are potentially vulnerable to coercion. As with station size, an alternate explanation is that special voters serve as a proxy for the elderly and disabled, who are more likely to request special conditions (and these features could also be related to legitimate candidate support). In short, each variable has both a more benign and a less benign interpretation.
Because the data include outliers, I assessed the results in several ways (standard OLS with and without the outliers, robust regression, and tobit). The significance of the coefficients and their signs did not vary across the models (except in one case noted below). For Margvelashvili, final turnout and the log of polling station size are positively associated with performance; the proportion of invalid ballots is negatively associated with performance. Special voting is not statistically significant.
For Bakradze, turnout is negatively associated with performance and the proportion of invalid ballots is positively associated with performance. Polling station size is negatively associated with performance, but is not statistically significant in two of the models. In the Burjanadze models, only polling station size had a statistically significant coefficient, and it was negatively associated with performance.
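For readers who want to see the mechanics, here is a minimal sketch of the plain OLS variant of such a model, solved through the normal equations in pure Python. It is not the estimation code behind the results above: the station data are hypothetical, the response is generated from made-up coefficients purely so that the fit can be checked, and the robust and tobit variants are not shown.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Ordinary least squares via the normal equations; an intercept is added."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical stations:
# (turnout, invalid-ballot share, registered voters, special-voter share)
stations = [
    (0.42, 0.010, 300, 0.03),
    (0.55, 0.025, 1200, 0.00),
    (0.48, 0.018, 150, 0.01),
    (0.61, 0.005, 850, 0.00),
    (0.39, 0.030, 600, 0.02),
    (0.52, 0.012, 90, 0.05),
    (0.66, 0.020, 1000, 0.01),
    (0.45, 0.008, 1500, 0.04),
]
rows = [(t, inv, math.log(size), sp) for t, inv, size, sp in stations]

# Made-up coefficients, used only to build a response the fit can be checked
# against; they are not the estimates from the analysis.
true_beta = [0.30, 0.40, -0.80, 0.02, 0.0]
y = [true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], r)) for r in rows]

beta = ols(rows, y)
names = ["intercept", "turnout", "invalid", "log_size", "special"]
for name, b in zip(names, beta):
    print(f"{name}: {b:+.4f}")
```

Because the response here contains no noise, the fit simply recovers the made-up coefficients. With real polling station data one would also need standard errors to judge significance, which this sketch leaves out.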
The substantive effect is not particularly large for any of the coefficients in the assessment of Margvelashvili's vote (and the model included outliers). The first figure below shows the predicted effects of invalid ballots on the expected vote for Margvelashvili. While an increase in invalid ballots is associated with lower levels of performance, the upper end of the range is unlikely to occur. The mean proportion of invalid ballots at the polling station level was 1.8% (s.d. 1.7), with a range of 0 to 49.6% (the high end is an outlier worthy of further investigation).
[Figure: Predicted Outcome for Margvelashvili, Varying Invalid Ballots]
The second figure shows the predicted outcome for Margvelashvili as the natural log of polling station size varies. Logging flattens the results; even so, polling station size has only a small effect on outcomes. I did not include the figure showing the effects of turnout on outcomes in this post because those effects are equally unimpressive substantively.
[Figure: Predicted Outcome for Margvelashvili, Varying Polling Station Size]
In short, while some features that could raise eyebrows are statistically related to candidate performance, their substantive effects are small.
Distribution of Digits
I have noted in previous posts that assessing the distribution of digits can be instructive in uncovering anomalous results. The frequency of final two-digit pairs exceeds expectations from 1-1 upward, declining until around 4-0, where it returns to the expected range. The magnitude of the discrepancy (3% or so of the results at 1-1, whereas we would anticipate around 1%) is smaller than in Azerbaijan (where 1-0 accounted for over 7% of the results).
[Figure: Distribution of the Last Two Digits]
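A last-two-digit tally of this kind can be sketched as below, comparing each pair's share with the roughly 1% expected if all 100 pairs were equally likely. The vote counts are hypothetical, the function names are invented for this example, and dropping counts below 10 is an assumption; the post does not say how single-digit counts were handled or which counts were tallied.

```python
from collections import Counter

def last_two_digit_shares(counts):
    """Share of each final two-digit pair among counts with at least two digits."""
    pairs = [f"{c % 100:02d}" for c in counts if c >= 10]
    tally = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in tally.items()}

def overrepresented(shares, expected=0.01, ratio=2.0):
    """Pairs whose share is at least `ratio` times the expected uniform share.

    The ratio is an assumed cutoff chosen for illustration.
    """
    return sorted(pair for pair, share in shares.items() if share >= ratio * expected)

# Hypothetical vote counts from a handful of stations.
counts = [111, 211, 311, 45, 7, 1200]
shares = last_two_digit_shares(counts)
print(shares)
print(overrepresented(shares))
```

With only a few counts, every observed pair exceeds the uniform benchmark, so a comparison like this is only meaningful across thousands of polling station results.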
Summary
The international community has praised the election process while noting areas for improvement. The initial assessment of the data aligns with these observations. Many of the troubling signals found in polling station data from other elections seem to be absent, or present at a lower magnitude, in Georgia.
- Turnout data approach a normal distribution, with the mean and variance changing over time in ways that reasonably conform with "normal" election processes.
- While some candidates perform better or worse at higher levels of turnout, the data are widely dispersed and do not show suspicious results where perfect (or near-perfect) turnout and vote outcomes for a single candidate converge. Moreover, in the multivariate analysis, the substantive effect of turnout was small.
- Other features, such as the proportion of invalid ballots and polling station size, are associated with performance for some candidates, but the effects are substantively small.
- The distribution of digits shows some evidence of anomalous outcomes, but the scale of the anomalies is not large.
===================================================================
* The organizations made recommendations for improvements and expressed concerns as well, but the overall sentiment was positive.
** I participated in the effort, entering data for around thirty polling stations.
*** I restricted this part of the analysis to polling stations reporting 100% or less turnout. Ten polling stations reported higher than 100% turnout. Two were overseas stations and two were small (under 50 voters). The remaining six deserve additional scrutiny. It is likely that absentee certificate use elevated turnout in at least some cases.
**** Recall that Ilham Aliyev received over 50% of the vote in every polling station. Margvelashvili's results are more widely dispersed.
***** We could come up with a causal story relating voters who are more likely to cast invalid ballots (less educated and/or older) to a specific candidate. But this causal chain requires a bit more evidence than can be mustered with polling station data.