Summary and Conclusions
The EU referendum on 23rd June is attracting a great deal of media attention. With it has come a continuous flurry of polls from the mainstream polling companies, as well as many informal polls in online newspapers. There is a very large discrepancy between the formal polls and the informal newspaper polls. The former show Leave and Remain very close, with Leave usually slightly ahead, whilst the latter show results in the order of 70% – 30% in favour of Leave. This post specifically examines the IpsosMORI poll of 18th May 2016.
The IpsosMORI poll of 18th May 2016 was checked for demographic characteristics and these were compared to standard or known proportions in the population. The errors involving over- or under-representation in the sample were found to have a mean of 22.4% above the population mean (P<0.05), with a 95% confidence interval for the mean ratio of 1.025 to 1.419. These errors are significant and are heavily biased in favour of Remain.
The balancing of the many demographic characteristics to achieve a truly representative sample will be very difficult in the very small sample sizes which are common in the usual kind of phone and online polling for a referendum such as the one on 23rd June.
As a result of these demographic biases, both formal phone and online polling will have in-built biases in favour of Left, Centre-Left and ‘Liberal’ social and political attitudes. These biases will skew the headline result of such polls towards Remain. The only known way of compensating for many such variables in a survey is to increase the sample size. This post does not calculate the minimum sample size necessary, but from casual polls in online newspapers and the like, it would seem that a sample size of 10,000 is likely to reduce sampling error to an acceptable degree.
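As a rough guide to how sample size relates to sampling error, the following is a minimal sketch using the textbook margin-of-error formula for a proportion. The formula and the function name `margin_of_error` are not taken from this post. It assumes a simple random sample, which neither phone polls nor online panels strictly are, so it describes random sampling error only.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p.

    Assumes a simple random sample of size n (an assumption of this
    sketch, not a property of any poll discussed in the post).
    """
    return z * math.sqrt(p * (1 - p) / n)


# Sample sizes of the order mentioned in this post.
for n in (1000, 2000, 10000):
    moe = margin_of_error(n)
    print(f"n = {n:>6}: +/- {100 * moe:.1f} percentage points")
```

Under those assumptions, moving from 1,000 to 10,000 respondents cuts the margin from roughly 3 percentage points to roughly 1.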
One month on from my last post on the subject of the polling for the referendum, many more polls have been published by the official polling companies such as YouGov, IpsosMORI, ComRes and others. In general, these are still showing Leave vs Remain as being neck and neck, but with a tendency for the gap to be narrowed and some even pushing in favour of Brexit. In the last couple of days, more polls seem to be swinging back again towards Remain. These official polls are a mixture of phone interviews and online questionnaires, dominated by the latter. The online polls usually have a sample size of about 1000 – 2000, and the phone polls of 1000 or less.
This poll has just been published by IpsosMORI on 18th May 2016 and headlined in the Daily Mail. It is based upon fieldwork carried out between the 14th and 16th May by telephone with 1002 respondents. The weighted results are 48% for Remain and 35% for Leave. Their unweighted sample showed a response of 53% for Remain and 32% for Leave.
There is a lot of debate amongst the polling companies and political geeks about whether telephone or online surveys are the more accurate. Sturgis et al. said there was a small difference, but that it was not significant. YouGov argue that they have adjusted their methodology to bring greater accuracy to their online polls. They have some reason for confidence as they were spot on with their forecasts for the London Mayoral election. Their latest poll shows a lead for Remain of 4%.
During this last month, it has become increasingly apparent that the official polls are showing a considerable disagreement with the casual polls in the online newspapers. Yesterday, I checked one of the polls conducted by the Daily Telegraph by clicking onto “leave”. It said that 72% also wanted to Leave and that no fewer than 355,291 people agreed with me. This implies a total sample size of 493,460 – a whisker short of half a million. This pattern of 70 – 30 in favour of Leave is repeated across most of the national newspapers as well as a small number of local newspapers I have looked at. The sample sizes in these informal online polls are often huge – in excess of 10,000.
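The implied total quoted above follows from dividing the number of Leave clicks by the Leave share. A minimal sketch of that arithmetic is given below; the variable names are illustrative, and the rounding bounds assume that the displayed 72% was rounded to the nearest whole percent, which the newspaper's page did not state.

```python
leave_votes = 355_291  # Leave clicks reported by the newspaper poll
leave_share = 0.72     # displayed Leave share (assumed rounded to whole percent)

# Total respondents implied by the two displayed figures.
implied_total = leave_votes / leave_share
print(round(implied_total))  # 493460

# If the true share lay anywhere between 71.5% and 72.5%, the implied
# total would lie between these bounds.
lowest_total = leave_votes / 0.725
highest_total = leave_votes / 0.715
print(round(lowest_total), round(highest_total))
```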
So what is going on?
This post examines the intrinsic sampling biases within the IpsosMORI poll above to see if it is possible to separate the wood from the trees.
Method and Results
In 2011 the BBC conducted an online survey into class structure in Britain. It attracted huge interest, with 161,400 respondents. This was known as the Great British Class Survey or GBCS and has been initially analysed by Savage et al.
“The web survey was launched on 26 January 2011, with extensive publicity across BBC television and radio, and newspaper coverage. Responses were enthusiastic and, by July 2011, 161,400 complete surveys had been submitted. However, examination of the data revealed that the GBCS web survey suffered from a strong selection bias, with respondents being predominantly drawn from the well-educated social groups. As one BBC journalist told us when we reported this problem, ‘yes, you seem to have got a typical BBC news audience there’. To address this problem, the BBC therefore agreed to conduct a separate, nationally representative face-to-face survey using identical questions. This survey, with 1026 respondents, was conducted using quota sampling methods by the well-known survey firm GfK in April 2011. Tests from its field division and by ourselves indicate that its demographics are nationally representative. In this article we refer to the nationally representative survey as GfK and the web survey as GBCS.”
Table 1. Reproduced from Savage et al.
Putting this table into graph form gives Figure 1, where the descriptive categories have been converted into numbers by Excel:
Here, the biases mentioned by Savage et al. can be seen clearly, where the professional classes are heavily over-represented and the lower classes are under-represented. If the GBCS had been exactly representative of social structure in Britain as a whole, the graph would have been a horizontal straight line of value 1. Anything above 1 means that that class is over-represented; anything below means that it is under-represented in the survey. In other words, this graph shows the biases of the GBCS sample.
IpsosMORI classify the respondents to their surveys into four demographic categories: AB, C1, C2 and DE. These are compressed categories derived from the following standard demographic categories:
Table 2: Standard demographic classifications.
By combining the nine categories in Table 1 into the six standard demographic classifications and then further compressing them into the four categories used by IpsosMORI, the likelihood of responding to the GBCS survey becomes:
Figure 2: Likelihood of responding to GBCS (compressed social classes).
Note that the R² value in Figure 1 is 0.696 indicating a good fit for the polynomial curve; whilst the R² value in Figure 2 is 0.984 indicating a near perfect fit. This suggests that the compression of the social classes has been conducted correctly.
In this curve, social classes C2 and DE are under-represented in the GBCS survey, C1 is about right because it is close to a value of 1; and AB are over-represented by 1.8 times.
This method of deriving ratios, by comparing known proportions in the real population with the proportions in a survey such as the GBCS, suggests that a similar approach could be used to compare real or known data with poll data to determine demographic sampling biases by the polling companies.
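The ratio method described above can be sketched in a few lines. The function name `representation_ratios` and the group shares below are invented purely for illustration; they are not the GBCS, GfK or IpsosMORI figures.

```python
def representation_ratios(sample_shares, population_shares):
    """Return sample share divided by population share for each group.

    A ratio above 1 means the group is over-represented in the sample;
    a ratio below 1 means it is under-represented.
    """
    return {
        group: sample_shares[group] / population_shares[group]
        for group in population_shares
    }


# HYPOTHETICAL shares, chosen only to show the calculation.
sample = {"group_1": 0.50, "group_2": 0.30, "group_3": 0.20}
population = {"group_1": 0.40, "group_2": 0.30, "group_3": 0.30}

ratios = representation_ratios(sample, population)
for group, ratio in ratios.items():
    print(f"{group}: ratio {ratio:.2f}")
```

With these made-up shares, the first group comes out over-represented (ratio above 1), the second about right, and the third under-represented.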
When this is done for the IpsosMORI poll of 18th May 2016, by social class, the following curve is the result:
This shows that the IpsosMORI sample was correct for the ABs, as it falls exactly on a ratio value of 1. The sample over-represented C1s and C2s, with ratios of 1.5 and 1.8 (50% and 80% too many) respectively, and under-represented DEs, with a ratio of 0.8 (20% too few).
When this same principle is applied to age groups, the following graph is produced:
This shows that age classes 35 – 44 and older are all over-represented, whilst 18 – 24 year olds are under-represented. The methodology of pollsters ringing people at home during working hours on a weekday will inevitably yield a preponderance of retired and part time or unemployed workers.
Using this technique for the other demographic categories used by IpsosMORI gives a series of ratios, which are summarised in the following:
Table 3. All Ratios for all demographic categories.
This sample of 18 ratios allows the calculation of a 95% confidence interval for the mean ratio, which runs from 1.0253 to 1.4194. In other words, the true mean of the biases in the IpsosMORI sample could fall in the range of approximately 3% to 42%. The calculated mean is 22.4%.
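For readers who want to see how an interval of this kind is obtained, here is a minimal sketch. The post does not state which procedure was used, so the Student's t interval below is an assumption, as are the function name `mean_confidence_interval` and the hard-coded critical value. The 18 input values are placeholders, not the ratios from Table 3, so the printed interval will not match the one quoted above.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 17 degrees of freedom
# (18 ratios minus 1). Hard-coded because the standard library has no
# t-distribution.
T_CRIT_DF17 = 2.110


def mean_confidence_interval(values, t_crit):
    """Return (lower, mean, upper) for the mean of `values`."""
    mean = statistics.mean(values)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean - t_crit * sem, mean, mean + t_crit * sem


# PLACEHOLDER ratios, invented for illustration only.
placeholder_ratios = [
    1.0, 1.1, 0.9, 1.3, 1.2, 0.8, 1.4, 1.0, 1.5,
    0.7, 1.6, 1.1, 0.9, 1.2, 1.3, 1.0, 1.8, 1.2,
]

lower, mean, upper = mean_confidence_interval(placeholder_ratios, T_CRIT_DF17)
print(f"mean ratio {mean:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

If the lower bound of such an interval sits above 1, the mean ratio differs from 1 at the 5% level, which is the sense in which the biases are described as significant.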
It is important to stress that the 95% confidence interval, with its potential bias or error of up to about 40%, applies only to the demographic categories which have been tested. It is not a direct measure of the accuracy of the headline outcome of the survey (in this case a 13% lead for Remain).
From this table, several statistics stand out as having a large potential impact upon the headline conclusions of this IpsosMORI poll:
- Labour and Liberal Democrats are over-represented by 30% and 50% respectively.
- UKIP are under-represented by 30%.
- Workers in the Public sector are over-represented by 120%; whilst private sector workers are under-represented by 20%.
- Full time workers are under-represented by 20% and part time, retired or unemployed are over-represented by 20%.
Whilst it is not possible to gauge how much each of these demographic characteristics will separately affect the headline percentages of the IpsosMORI poll, it is reasonable to say that the severe under-representation of UKIP voters; the over-representation of Labour and Lib Dem voters; and the very severe over-representation of public sector voters will skew the results considerably in favour of Remain.
If it is assumed that demographic characteristics of the electorate influence the way they will vote, then the accuracy with which a polling company profiles its respondents is vital. If the characteristics of the sample do not match those of the underlying population, then the results of the poll are unlikely to reflect voting intention accurately. We know, for example, that age affects the likelihood of turning out to vote, with the 18 – 24 age group least likely to vote and those aged over 55 most likely to do so:
Figure 5. Voting turnout by age class (Source: IpsosMORI)
Likewise, younger people are known to be more likely to vote for Labour than their elders. For a post-election discussion as to where votes for particular parties lie, see this IpsosMORI article. However, although demographic characteristics may influence the way people vote, they do not determine it. It cannot be said, for example, that someone in social class D will necessarily vote for Labour (although they may be more likely to do so).
YouGov have recently conducted research which shows that the more highly educated people are, the more likely they are to vote Remain. So they have adjusted their samples to reflect the weighting of educated people within the general population and claim that this correction is sufficient to overcome the sampling biases. They may be right and maybe this is all that is needed. Their online polls prior to the London Mayoral elections were absolutely spot on, so this adjustment gives YouGov good reason to be cheerful. Furthermore, biases towards the AB social classes can be weeded out very easily using online polls, because the pollsters know their panelists very well.
But there is still the problem, pointed out by the Savage et al. paper, that online panels are self selected. Even if a polling company weights its respondents, there is still a hidden problem, as Savage et al. explain:
“Furthermore, comparisons with the Culture, Class, Distinction (Bennett et al., 2008) study showed that it was not possible to simply weight the GBCS to deal with these skews because the GBCS respondents from routine classes turn out to be highly unrepresentative of their peers in these classes. They possess relatively more cultural and social capital than their peers, and indeed the very act of participating in the GBCS was a ‘performative’ way of claiming cultural stakes [my emphasis] (as discussed by Bourdieu, 1984; Skeggs, 2004). Let us be clear, therefore, that GBCS alone cannot be used to derive a representative model of class.”
The parallels between the respondents to the GBCS and polling panels seem to be clear. From all social classes, polling panelists will be confident of their own beliefs (after all, they are regularly consulted about what they think of a brand or a political party). This means that they are less diffident about their lives and the society in which they live. All this points towards a largely liberal perspective on life.
There may even be two further problems with self selected polling panels, which are not discussed or even acknowledged by the polling companies. The first is that hard left wingers are taught to ‘stack’ an audience, a problem known to occur in the BBC Question Time audience. This means that even though they may know each other, they spread out in the audience and make as much noise as possible. This gives the impression of the whole audience being much more united in an opinion than they really are. This technique is used for public meetings and will mean that an otherwise minority view will often prevail over the majority. The Left are extremely good at making it sound as if they are more numerous than they really are.
The second problem is the possibility that some people will like doing polls so much that they will join more than one polling panel. It is completely unknown how big this problem is, but if it is widespread (and I suspect it is because of the ‘stacking’ phenomenon) then online polls could potentially be all polling essentially the same panel. That is, not just people with similar views, but the same individuals. This is one possible reason for the similarity between polling results by the different companies. To my knowledge, online polling companies do not test or correct for this problem.
So this leaves the phone polling which, if conducted properly, is theoretically the only way of getting a truly random and representative sample. Except that it isn’t. Apart from the difficulties of getting a representative sample during weekdays and working hours, when many people are at work (some polls may correct for this by concentrating their calls in the evenings), there is still the issue of a different type of self selection, which is similar to the kind of self selection that happens in the online polls. This takes the form of a willingness to answer a lot of highly intrusive questions about one’s life and finances to a total stranger over the phone. With increasing wariness over phone based selling and downright scams, many people are immediately suspicious of plausible sounding patter. Most will refuse to answer questions and put the phone down. Against this background of the politically reticent refusing to answer questions, the phone poll becomes a sample of those who are politically motivated. This is probably the root cause of many of the biases detected in the IpsosMORI poll analysed above.
Add to this the normal reticence that many people have about sharing their political views with strangers (or even relatives, friends or colleagues) and it becomes apparent that there is a big problem with formal, properly scientific polling methods.
Despite all of these problems, the polling companies think that they can get within two or three percent in a General Election. However, the forthcoming referendum is not a General Election. In British politics, referenda are very rare events. Furthermore, the subject goes right to the heart of how we see ourselves as a sovereign country and independent people. The issues of economic, demographic and democratic concern cut right across party politics, but tend to do so unevenly. For example, most members of the Conservative Party are known to be highly Eurosceptic. Meanwhile, the intellectual Left are more inclined to be Europhiles. The working classes, normally thought of as ‘belonging’ to the Left, are more likely to be patriotic and suspicious of foreigners. A referendum on Brexit is therefore much more difficult to call.
Add to this the fact that the casual online polls in the newspapers and other places are, almost without exception, heavily in favour of Leave, with sample sizes in the tens of thousands. It will be argued by the polling companies that these are all ‘unscientific’. Furthermore, they all suffer from the criticism above, in that they are all self selected. Some of them can be criticised because they do not prevent people from making multiple votes by logging in and out of the poll. Others do actively weed out multiple voting by identifying e-mail addresses, but even so, they are still self selecting. Nevertheless, the huge sample sizes of these online polls should cancel out any demographic or educational bias. But what is absolutely clear is that they are polling a very different sector of society; and that that sector is in favour of Leave by a margin of approximately 40%.
The principal difference between the casual online polls and the formal polls is that people do not have to give away lots of information about themselves before they get a chance to vote. Furthermore, they get an instantaneous result.
People I have spoken to are adamant that most of their friends and colleagues are for Leave. I have even spoken to one Remain campaigner who thinks Leave will win, because of the greater passion of the ‘outers’. And therein lies a clue as to what seems to be going on below the radar. There seems to be a seething discontent with the status quo in British politics today. Almost everywhere you look, there is contempt for many politicians who are seen to be careerist and self-serving. The more arrogant of the media élite will dismiss this phenomenon as ‘populism’. They will continue to regard the rest of us as vulgar, unwashed, uncircumcised, knuckle dragging proles whose views are rarely sought and needed even less. But I feel that things are changing.
This referendum is talked about in pubs and discussed at work (places where people would never normally discuss politics). There are debates all over the country. Twitter is alive with those on both sides commenting about it. I have never before seen this level of political engagement amongst ordinary people. Something is happening that is not being picked up by the pollsters and certainly not David Cameron and his team. The reticent are beginning to speak. And I think that, in the quietest and most unassuming way characteristic of the British way of doing things, they are angry – and they are going to vote Leave in very large numbers.
And the only place where this is being picked up is in the casual online newspaper polls with huge sample sizes.