Inspired by a friend’s rant on Facebook about the way the media presents poll results, and by a public lecture by Nate Silver at LSE where similar problems were discussed, I decided to investigate the news coverage of the popularity of the largest parties in Finland. I wondered what the polls actually say about that popularity, and whether this is reflected in the language used in the news.

I limit my mini-investigation to the four largest parties: the *Social Democrats* (SDP), the *National Coalition Party* a.k.a. Kokoomus (KOK), the *Centre Party* a.k.a. Keskusta (KESK) and the *True Finns Party* a.k.a. Perussuomalaiset (PS), because the polls usually give the margin of error only for the largest parties. Therefore, I can only assess the accuracy of the popularity estimates for these parties.

I searched for the most recent polls on the websites of the two biggest news institutions in Finland: Yle.fi (the national broadcasting company) and HS.fi of Helsingin Sanomat (the broadsheet with the largest readership). In the end I chose seven polls, conducted between December 2012 and April 2013. Yle’s polls are conducted by Taloustutkimus and HS’s by TNS Gallup.

The figure clearly shows that, if we consider the margins of error reported by Yle and HS, most of these polls cannot distinguish which one of these four parties is the most popular. However, arriving at this conclusion is not easy, for four reasons:

- The margin of error is only reported for “the biggest” parties, although I couldn’t find a definition of what “the biggest parties” means. I assume it means these four, since in all polls there is a clear gap between their support and that of the fifth-largest party. In the latest HS poll (24 April 2013) the margin was not reported at all.
- Yle reports that the results of its polls are “more accurate than what the margin of error would imply” due to the calibration methods they use, but this is not explained further. Therefore, I chose to ignore this information here.
- To understand what the margin of error means, one has to know that it is actually a 95% confidence interval, and that this interval (in simple terms) means that “we are 95% confident that the value in the total population is within this range”. In other words, if we drew a very large number of random samples from the Finnish population, asked which party people support, and computed the interval from each sample, about 95 out of 100 of those intervals would contain the true level of support; the remaining 5 out of 100 would miss it. The interval is reported because, as long as we don’t actually ask everyone, we cannot be entirely certain that our estimate is correct. However, we can provide information on how certain we are: a wide range implies more uncertainty than a narrow one.
- Most importantly: **the way these polls are reported in the media is almost entirely inaccurate**. I will elaborate on this claim below.
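To make the 95% idea from the list above concrete, here is a minimal simulation sketch. The numbers are my own assumptions, not taken from the polls: a true support of 20% and 1,000 respondents per poll, drawn as a simple random sample (the real polls use calibration methods that this ignores). The function names `margin_of_error` and `coverage` are made up for this illustration.

```python
import math
import random

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def margin_of_error(p_hat: float, n: int) -> float:
    """Half-width of the 95% interval for a sample proportion (simple random sample)."""
    return Z_95 * math.sqrt(p_hat * (1.0 - p_hat) / n)


def coverage(true_p: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated polls whose 95% interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated poll: each respondent supports the party with probability true_p.
        supporters = sum(rng.random() < true_p for _ in range(n))
        p_hat = supporters / n
        moe = margin_of_error(p_hat, n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical numbers: 20% true support, 1,000 respondents per poll.
    print(f"margin of error: {100 * margin_of_error(0.20, 1000):.1f} points")
    print(f"share of intervals containing the truth: {coverage(0.20, 1000, 1000):.3f}")
```

The share printed on the last line should land close to 0.95, which is all that the "95%" in a 95% confidence interval promises.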

The first poll (Yle 2 Dec 2012) was titled “Support for Centre Party in rapid increase”, which is not far from the truth: their support was 2.5 percentage points higher than in a previous poll, and the margin of error was 1.4 percentage points. The 95% confidence intervals therefore overlap only marginally: the upper limit of the old poll’s estimate is 16.9% and the lower limit of this poll’s estimate is 16.6%. So maybe their support actually has increased. Later the article states that KOK is clearly the largest party, which is not the case, or at least not clearly: looking at the figure, we can see that the confidence intervals of KOK and SDP overlap. The difference in their popularity might therefore be only a coincidence (noise) due to sampling. In another sample we might have obtained similar estimates for the two, or even a higher percentage for SDP. The correct interpretation is thus: **KOK is more popular than PS or KESK, but not significantly different from SDP. SDP seems to be more popular than PS, but we’re not confident that there is a significant difference between PS and KESK or between SDP and KESK**.
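The overlap check for KESK can be written out as a few lines of arithmetic. The point estimates below (15.5% and 18.0%) are back-calculated from the limits quoted above (16.9% and 16.6%), assuming the same ±1.4 margin applies to both polls; the helper names are invented for this sketch.

```python
def interval(estimate: float, margin: float) -> tuple[float, float]:
    """Lower and upper limit of the reported range, rounded to one decimal."""
    return (round(estimate - margin, 1), round(estimate + margin, 1))


def overlap(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Length of the overlap between two ranges, or 0.0 if they are disjoint."""
    return round(max(0.0, min(a[1], b[1]) - max(a[0], b[0])), 1)


# Back-calculated KESK estimates (assumption: both polls have a ±1.4 margin).
previous_poll = interval(15.5, 1.4)
current_poll = interval(18.0, 1.4)

print(previous_poll, current_poll, overlap(previous_poll, current_poll))
```

The two ranges share only a sliver, which is why the headline about a rapid increase is not far off.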

The reasoning is similar for all of the polls, so I won’t present it in as much detail for the rest. Instead I’ll focus on the most inaccurate reports I found. If you’re interested in comparing these further, please take a look at the list of links below. As long as you read Finnish fluently, you should be able to do it based on this blog post.

“The Centre Party is the second most popular” (HS 24 April 2013)*. Really? Looking at the figure, we quickly observe that the popularity of all four parties is very similar. We cannot even distinguish between the most and the least popular among these four: all the confidence intervals overlap. Thus, one should say: **according to the latest poll, all four of the biggest parties have similar levels of support**.

“The Centre Party is the most popular, SDP crashed” (Yle 29 April 2013). Guess what? I don’t agree with the statement. It does look like KESK is more popular than SDP, but based on these results it is impossible to say whether KESK, KOK or PS is currently the most popular. The estimates differ only a tiny bit and the confidence intervals clearly overlap. The statement is not just an exaggeration. **It is false**. What about SDP, then: did it crash? Well, I don’t think so. The point estimate of their popularity is indeed lower than in the previous Yle polls in the figure. However, we have no way of knowing whether that is due to chance (remember the sampling thing!) or whether there actually is a difference, since the confidence intervals overlap **in all but the earliest poll, of 2 December 2012**. Therefore, my interpretation of this poll is: **there is little difference in popularity between KESK, KOK and PS, whereas SDP may be less popular than the three other parties and also less popular than it was at the beginning of December 2012**.

Okay, the rest of the interpretations you will have to do yourself. I’m left wondering why false information is so commonly presented in the media. Is it due to a lack of understanding of statistics? Or maybe due to a belief that people are not interested in news that does not rank the parties? Would it be so bad to say truthfully that we don’t know which of several parties is currently the most popular? I cannot think of a reason that justifies false statements when the journalist clearly should know better.

* *If anything is lost in translation, blame me: I was the one who translated these from Finnish to English.*

### Links to original news articles

- Yle 2 December 2012
- Yle 30 December 2012
- Yle 8 March 2013
- Yle 29 April 2013
- HS 23 January 2013
- HS 27 March 2013
- HS 24 April 2013

### Margins of error

- Yle 2 Dec ±1.4 percentage points
- Yle 30 Dec ±2 percentage points
- Yle 8 Mar ±1.3 percentage points
- Yle 29 Apr ±1.4 percentage points
- HS 23 Jan ± “less than 2” percentage points → I assumed 1.9 percentage points
- HS 27 Mar ± “less than 2” percentage points → I assumed 1.9 percentage points
- HS 24 Apr not reported → I assumed 1.9 percentage points

This interpretation is actually erroneous as well, though in the opposite direction. What is forgotten is that the probability of the real value follows a Gaussian distribution, with the quoted error being the 2-sigma value. In other words, the likelihood that the real value is near the point estimate is much greater than the likelihood that it is near the ends of the quoted error range. And it’s possible that it lies outside the range as well. So, when the poll quotes ±1.4% as the error, there is a 68% chance that the real value is within ±0.7%. For example, with the HS 24 April poll, it is very likely that KOK is the biggest party and SDP the fourth biggest. But KESK and PS are indeed too close to tell.
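The 68% figure can be checked with the normal distribution directly. This is a minimal sketch that follows the framing above, i.e. it treats the quoted margin as exactly two standard deviations (the conventional 95% multiplier is 1.96, so the result is approximate); `prob_within` is a name made up for the illustration.

```python
import math


def prob_within(half_width: float, margin: float, sigmas_per_margin: float = 2.0) -> float:
    """Probability that a Gaussian error is within ±half_width,
    when the quoted margin corresponds to `sigmas_per_margin` standard deviations."""
    sigma = margin / sigmas_per_margin
    return math.erf(half_width / (sigma * math.sqrt(2.0)))


print(f"within ±0.7 of a ±1.4 poll: {prob_within(0.7, 1.4):.3f}")
print(f"within ±1.4 of a ±1.4 poll: {prob_within(1.4, 1.4):.3f}")
```

Half the quoted margin captures roughly two thirds of the probability, and the full margin roughly 95%.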

And with Yle’s 2 December poll, there is about a 70% chance that SDP is indeed bigger than KESK (their difference is a bit over the 1-sigma value, so it’s easy to estimate). The differences between the other parties are bigger, so the likelihood that their order is as reported is even greater. The moral here is that you can’t just look at whether the reported error ranges overlap; you need to keep in mind that the distribution described by the range is Gaussian and that the ends of the error range are set at an arbitrary cut-off: the distribution continues outside the ends of the reported range.
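One way to turn "is A really ahead of B?" into a number is sketched below. It rests on simplifying assumptions that are not in the poll reports: each party's error is Gaussian, the quoted margin is two standard deviations, and the two errors are independent (within a single poll the shares are in fact somewhat negatively correlated, so this is only a rough guide). The function names and the example shares are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lead_probability(share_a: float, share_b: float,
                     margin_a: float, margin_b: float,
                     sigmas_per_margin: float = 2.0) -> float:
    """Approximate probability that A's real support exceeds B's."""
    sigma_a = margin_a / sigmas_per_margin
    sigma_b = margin_b / sigmas_per_margin
    # Independent errors: the variances of the two estimates add up.
    sigma_diff = math.hypot(sigma_a, sigma_b)
    return normal_cdf((share_a - share_b) / sigma_diff)


# Hypothetical shares, one percentage point apart, both with a ±1.4 margin.
print(f"{lead_probability(19.0, 18.0, 1.4, 1.4):.2f}")
```

Note that the standard deviation of a difference is larger than that of either estimate alone, so a gap of "one sigma" for a single party is less than one sigma for the difference.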

Another point worth making is about changes in support. The errors in two different polls made by the same organization are not independent of each other. In other words, if one poll gives too high a value for some party, the next poll by the same organization is also likely to give too high a value. This matters when looking at the change in support between two polls: it makes the margin of error far smaller for the change in support than for the level of support itself. So, for example, the change reported in Yle’s 2 December poll for KESK is likely to be around the 2.5% that was reported, though the actual levels between which the change happened might be above or below the reported point values. What is extremely unlikely is that KESK would poll significantly below its real support in one poll and then significantly above it in the next: the errors tend to point in the same direction.
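How much the margin for a change shrinks depends on how strongly the errors of the two polls are correlated, which the pollsters do not report. The sketch below uses the standard formula for the variance of a difference of two correlated quantities; the correlation `rho` is a purely hypothetical input, and `margin_of_change` is a name invented here.

```python
import math


def margin_of_change(margin_old: float, margin_new: float, rho: float) -> float:
    """Margin of error for (new - old) when the two poll errors have correlation rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be between -1 and 1")
    variance = margin_old**2 + margin_new**2 - 2.0 * rho * margin_old * margin_new
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding errors


# Two polls with a ±1.4 margin each, under different hypothetical correlations.
for rho in (0.0, 0.5, 0.9):
    print(f"rho={rho}: ±{margin_of_change(1.4, 1.4, rho):.2f}")
```

With uncorrelated errors the margin for the change is actually wider than for a single poll; under this formula it only drops below the single-poll margin once the correlation exceeds 0.5, so the argument above depends on the errors being strongly correlated.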

Statistics can easily mislead, and polls are a prime example. The media tends to ignore the errors completely, but if you only look at the reported error ranges without considering the Gaussian distribution of the error behind them, you will err in the opposite direction.

I agree, Antero, thanks for clarifying this here. Usually, the error of claiming a difference when there actually is none is considered more severe than the error in the opposite direction. Therefore, interpretation in academic research tends to be conservative, whereas it seems to me that in the media the opposite happens. However, I posted this in order to stress that any statistical estimate has a certain level of uncertainty, which is rarely reported in the media. Therefore, it is hard to understand what the results of the polls actually mean unless one has some sort of statistical training (which most people probably lack). Thus, for most readers the results of the polls probably seem more reliable than they actually are. In addition, the sampling and calibration methods used are rarely reported, which makes it hard to know how reliable these results are even if one is familiar with statistics. Given the importance of polls, especially before elections, I wouldn’t mind seeing more careful interpretation of poll results in the media in the future.
