This post was jointly written by a guest contributor to ScIU, Rashid CJ Marcano Rivera @Rashido
The election is almost here and the election forecasters are in full swing. As of October 23rd, the Economist gives Biden a 92% chance of winning, and FiveThirtyEight has him winning 88 out of 100 “simulated” elections. How should we interpret these claims?
If you flip a coin a thousand times and it lands on heads 500 times and tails 500 times, you may infer that it has a 50% probability of landing on heads and a 50% probability of landing on tails. That sounds simple, except that we’re not going to run this election thousands of times; we’re only going to run it once.
Ultimately, the election has a determinate outcome. Either Biden will win, or Trump will win. Hypothetically, with perfect information, we should be able to predict exactly what happens. However, we do not have that information, so instead we have to develop our best guess using the available information. Or rather, our best guesses: there are many forecasting models, and they don’t all make the same predictions.
Election forecasting makes predictions given past trends and available data. It often begins by developing a general expectation of what we think will happen based on factors such as economic conditions and incumbency status (incumbents have an advantage). Some forecasting models stop at this step and attempt to predict the election using only these fundamental factors. However, models like the ones cited above refine their initial expectation by factoring in current information about the election, such as polling data.
Imagine the polls show that Trump is ahead in Florida 51% to 49%. However, polls only question a small proportion of the population of likely voters, so the true result if every likely voter were interviewed may be 50/50, or 52/48. Forecasters can average together multiple polls or correct for known biases to enhance the information provided by the polls. For example, if you know from the past five elections that Republican candidates in Florida tend to slightly outperform their polling, you might build that into your model. Forecast models can be enormously complex, including many, many factors, connections between them, and adjustments to those factors.
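To make the polling step concrete, here is a minimal sketch of averaging several polls, estimating how much a single poll can be off by chance, and applying a bias correction. Every number in it is made up for illustration: the three polls, their sample sizes, and the one-point Republican adjustment are assumptions, not real Florida data, and real forecasters use far more elaborate weighting.

```python
import math

# Hypothetical polls: (Trump's share of two-party respondents, sample size).
polls = [(0.51, 800), (0.49, 1000), (0.52, 600)]

def margin_of_error(share, n, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(share * (1 - share) / n)

def weighted_average(polls):
    """Average the poll shares, weighting each poll by its sample size."""
    total_n = sum(n for _, n in polls)
    return sum(share * n for share, n in polls) / total_n

# Assumed for illustration: polls have understated the Republican share by 1 point.
ASSUMED_BIAS = 0.01

average = weighted_average(polls)
adjusted = average + ASSUMED_BIAS

print(f"Margin of error of the first poll: {margin_of_error(0.51, 800):.3f}")
print(f"Weighted poll average: {average:.3f}")
print(f"Average after the assumed bias correction: {adjusted:.3f}")
```

With 800 respondents, a single poll showing 51% has a margin of error of roughly three and a half points, which is why a 51/49 poll is consistent with a 50/50 or 52/48 electorate.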
After being aggregated and corrected by the forecaster, polling gives insight into what would happen if the election were to occur today, but how can forecasters account for what happens in the future? Expected or predictable changes may be built into the forecaster’s model. For example, the forecaster may split current undecided voters evenly between the candidates for their prediction. The forecaster also builds in uncertainty. What if Hispanic support for Trump is higher than expected? What if the country swings further towards Biden? What if Trump gets unusually high support in Pennsylvania? This creates a pool of possibilities that reflects possible errors in measurement and different ways the situation could develop in the weeks to come.
Finally, the forecaster uses computing to repeatedly “guess” from this pool of possibilities and simulate what might happen in the election if different events and conditions were to occur. The computer might guess that Trump overperforms among young Hispanics, Biden overperforms among religious older men, Trump overperforms among Nebraskans, and Biden does as expected among Floridians. Or it might guess that Biden underperforms among Black religious women, Trump does as expected among Midwestern White men, and Biden overperforms in California. This occurs thousands of times with thousands of randomly chosen permutations, where a permutation is just a different way the knobs and dials in the model can be set. Essentially, the forecasting model says: here’s a bunch of ways we think the world might be, and here’s what happens in each of them.
To say that Biden wins 70% of the time is to say that he wins in 70% of the simulations (and that Trump wins in 30%). Oftentimes, this is assessed by just taking a sample, maybe 100, of the thousands of simulations. If Trump wins, it does not mean that the model was wrong. After all, Trump had a chance of winning as well. A better interpretation is that this model provides evidence, but by no means conclusive evidence, that Biden is favored to win the election.
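The simulate-and-count idea can be sketched in a few lines of Python. This is a toy model under loudly stated assumptions, not how the Economist or FiveThirtyEight actually build their forecasts: the three contested states, their expected margins, the 235 "safe" electoral votes, and the sizes of the random errors are all hypothetical.

```python
import random

# Hypothetical contested states: (expected Biden margin in points, electoral votes).
STATES = {
    "State A": (2.0, 29),
    "State B": (-1.0, 20),
    "State C": (4.0, 16),
}
SAFE_BIDEN_VOTES = 235  # assumed electoral votes that are not in play
VOTES_TO_WIN = 270

def simulate_once(rng, national_sd=3.0, state_sd=3.0):
    """Simulate one election; return True if Biden reaches 270 electoral votes."""
    # One shared error moves every state together (for example, a national polling miss).
    national_error = rng.gauss(0, national_sd)
    votes = SAFE_BIDEN_VOTES
    for expected_margin, electoral_votes in STATES.values():
        # Each state also gets its own independent error.
        simulated_margin = expected_margin + national_error + rng.gauss(0, state_sd)
        if simulated_margin > 0:
            votes += electoral_votes
    return votes >= VOTES_TO_WIN

def estimate_win_probability(n_sims=10_000, seed=0):
    """Return the fraction of simulated elections that Biden wins."""
    rng = random.Random(seed)
    wins = sum(simulate_once(rng) for _ in range(n_sims))
    return wins / n_sims

print(f"Biden wins in {estimate_win_probability():.0%} of simulations")
```

The printed percentage is exactly the kind of number the headlines report: the share of simulated worlds in which one candidate comes out ahead. Changing the assumed margins or error sizes changes the answer, which is one reason different forecasters disagree.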
Forecasts like these come with several caveats. First, as with all modelling, the outputs are only as good as the inputs. If you have very messy data, or systematic bias in the way your model is constructed, or if you miss important variables, such as courts throwing out mail-in ballots without the second privacy envelope, then that can undermine your prediction. Second, a model can only adjust for sources of error we know about. COVID-19, for instance, is a novel variable, and thus it is very hard to predict exactly how it is going to impact the election. Third, close elections are simply hard to predict because they are easily influenced by small errors and changes.
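The third point can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the forecast error in a candidate's margin is normally distributed with a standard deviation of 3 points; the function name and all the margins are hypothetical.

```python
import math

def win_probability(expected_margin, error_sd):
    """P(true margin > 0) when the forecast error is normal with standard deviation error_sd."""
    z = expected_margin / error_sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Compare a one-point shift in a close race with the same shift in a lopsided one.
for margin in (0.5, 1.5, 8.0, 9.0):
    print(f"Expected margin {margin:+.1f} points: win probability {win_probability(margin, 3.0):.3f}")
```

Under these assumptions, moving the expected margin from 0.5 to 1.5 points shifts the win probability from roughly 57% to roughly 69%, while the same one-point move from 8 to 9 points barely changes it. A small polling error matters a great deal when the race is close.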
Overall, election forecasting is extremely cool, but it has limitations. The simplest way to push the election toward your desired outcome is to vote! You can find resources about Indiana voting here and here.
Edited by Clara Boothby and Benjamin Greulich