Tuesday, October 4, 2016

#PredictiveCOL - Forecasting Colombia's peace plebiscite (final update)

For sure, this is the most exciting forecast I have ever done. On one hand, I am a Colombian guy, and I really want to live in a peaceful country and a better place for raising my children. On the other hand, I am very serious when it comes to forecasting.

Maybe you have read this blog before and realized that I love predictions because, as a data scientist, you can use your own set of statistical methodologies relating to voting intention and turnout. If you think about it, everything counts when forecasting this kind of process. For example, you may track week by week people's perception of security, how much the president is disliked, how many minutes newscasts spend on the peace process, how many articles or opinion columns are written, and how many tweets are sent about FARC. Everything counts and matters. Let us call these contextual variables.

Of course, polls and pollsters are also important. They are the only proxy for voting intention. However, polls get old and should be updated with new data. Also, some pollsters are not as reliable as others (I really do not believe everything some pollsters claim), and sample size matters too, because a larger sample reduces the sampling error.
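To make the point about poll age and sample size concrete, here is a minimal sketch. It is not the method behind the forecasts in this post, just an illustration: the function names, the 14-day half-life, and the three polls are all made up, and the margin of error assumes a simple random sample.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a share p estimated from a
    simple random sample of size n (an assumption; real polls use
    more complex designs)."""
    return z * math.sqrt(p * (1 - p) / n)


def pooled_share(polls, half_life_days=14.0):
    """Weighted average of poll shares.

    Each poll is (share, sample_size, age_in_days). The weight is the
    sample size times an exponential decay in the poll's age, so big
    and recent polls count more. The half-life is a hypothetical choice.
    """
    num = 0.0
    den = 0.0
    for share, n, age_days in polls:
        weight = n * 0.5 ** (age_days / half_life_days)
        num += weight * share
        den += weight
    return num / den


# Made-up polls, for illustration only: (Yes share, sample size, age in days)
polls = [(0.62, 1200, 2), (0.55, 800, 20), (0.66, 2000, 9)]

print("Margin of error, n=1000:", round(margin_of_error(0.5, 1000), 3))
print("Pooled Yes share:", round(pooled_share(polls), 3))
```

Quadrupling the sample size halves the margin of error, which is why large polls deserve more weight, while the decay term lets old polls fade out as new ones arrive.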

Remember that this plebiscite gives Colombians two options. The first is Yes, meaning "Yes, I approve the agreement between FARC and the government." The second is No, meaning "No, I don't support that agreement." With all this in mind, let me introduce my (Bayesian) predictions for the Colombian plebiscite. First, let me show the trend that the two options have followed over time:
As you can see, we have deflated the data by undecided voters. Note that we extracted the signal (bold lines) from the noise generated by the polls. The following graph shows the less robust prediction, based only on what the polls have estimated.

Next comes a more reasonable forecast. It is based on prior information (the 2014 presidential and legislative elections) and tries to explain the poll outcomes by modeling the extracted signal (see the trends in the previous graphic) with contextual variables. After the model is fitted, we use a Bayesian setup that combines the prior information with the estimated response from this model. The following chart shows this forecast.
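As a rough illustration of what combining prior information with a model estimate can look like, here is a minimal sketch using a normal prior and a normal estimate. This is only one possible Bayesian setup and not necessarily the one used for the chart. The function name and all the numbers are hypothetical.

```python
def combine_normal(prior_mean, prior_sd, est_mean, est_sd):
    """Posterior mean and standard deviation when a normal prior is
    combined with a normal estimate. Each source is weighted by its
    precision (1 / variance), so the more certain one pulls harder."""
    prior_prec = 1.0 / prior_sd ** 2
    est_prec = 1.0 / est_sd ** 2
    post_var = 1.0 / (prior_prec + est_prec)
    post_mean = post_var * (prior_prec * prior_mean + est_prec * est_mean)
    return post_mean, post_var ** 0.5


# Made-up numbers: a vague prior around 50% and a sharper model estimate at 60%
post_mean, post_sd = combine_normal(0.50, 0.10, 0.60, 0.05)
print("Posterior Yes share:", round(post_mean, 3), "+/-", round(post_sd, 3))
```

The posterior lands between the prior and the estimate, closer to whichever is more precise, and it is always less uncertain than either source alone.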
OK, it is clear that the Colombian people will support this peace process. However, the result is valid if and only if the number of Yes votes is greater than 4.4 million. Using a similar Bayesian methodology, the next graph shows the predicted turnout.
We also used small area modeling to forecast the response of every Colombian department (equivalent to a state in the US). The following map shows that the majority of departments support the agreement between FARC and the government. However, some of them will vote No. Dark areas do not support the deal, while light-gray areas do.
Finally, the posterior probability that Yes defeats No is 98.8%.
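For readers curious about how a number like this can be obtained, here is a minimal sketch that estimates the probability that Yes beats No by simulation. It assumes the posterior for the Yes share is a Beta distribution, which is my simplification for this example. The function name and parameters are hypothetical, and the inputs are made up, so it does not reproduce the figure above.

```python
import random


def prob_yes_wins(alpha, beta, draws=20000, seed=0):
    """Monte Carlo estimate of P(Yes share > 0.5) when the Yes share
    follows a Beta(alpha, beta) posterior (an assumed distribution)."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    wins = sum(rng.betavariate(alpha, beta) > 0.5 for _ in range(draws))
    return wins / draws


# Made-up posterior centered on a 60% Yes share
print("P(Yes defeats No):", prob_yes_wins(60, 40))
```

With undecided voters removed there are only two options, so Yes defeating No is the same event as the Yes share exceeding 50%.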