This is labeled as "Part 1" because if more email correspondence takes place between me and various pollsters, I will add it in further posts. Of course, if pollsters are afraid to respond, I will write about that, too. Sorry, Mr. Cooper... but this is my version of "Keeping Them Honest." :-)
There has been much written about the skewing or "oversampling" of democrat respondents in recent polls. I wrote about how I believe this is a form of psychological voter suppression being employed by the media: http://loudmouthelephant.blogspot.com/2012/09/voter-suppression-look-no-further-than.html. In short, the oversampling of democrats in polls allows the results to be skewed in favor of Obama, promoting a "Romney can't win this race" narrative in an effort to get republican voters to stay home since their efforts would be futile. CNN wrote a "debunk" piece of my theory here: http://politicalticker.blogs.cnn.com/2012/09/28/analysis-polling-criticism-unfounded/. And yes, I was skeptical of these polls before the usual conservative pundits got on board.
The interesting thing about CNN's article: they didn't actually debunk the theory. They simply said, "well, there is no 'conspiracy' (though I would never call this a conspiracy - more like an effort); look at the results of these polls," and they simply cited MORE skewed polls! Ugh. If they had given some examples of how the polling sample best reflects the electorate or recent party affiliation trends, their case would have held more water. They didn't, and in fact, they then released another ridiculously over-sampled pro-democrat poll: http://i2.cdn.turner.com/cnn/2012/images/10/01/rel11a.pdf (37% D - 29% R). There is no information anywhere that shows this is how the country's party affiliation is spread. Shame on CNN for skewing another poll. Shame on CNN and their attempt at debunking my theory :-) Using the very thing being questioned as the rebuttal of the question is purely circular logic, and it's quite silly. I digress...
Moving on, I have kept an updated database of every poll I can analyze: http://loudmouthelephant.blogspot.com/2012/09/lme-presidential-poll-tracker-database.html
Today, while doing my latest round of poll deboggling, I came across this poll: http://weaskamerica.com/2012/10/02/horse-races/
It shows that though Mitt Romney is leading 50.7% to 35.3% among independents, somehow Obama has a 10.5 point lead in Nevada?! What?! I had to question these results, so I emailed the polling company We Ask America. The following email chain is the whole reason I am writing this post (personal information has been removed). Keep in mind, as you will see by my tone, I'm cordial and respectful. This was not a "gotcha" attack; I just wanted to understand the numbers:
Good morning,
My name is Michael XXXXXXX. I'm the chief contributor for the blog The Elephant in the Room (www.loudmouthelephant.com). I'm writing to learn of the sample you used for your Nevada poll this morning. The poll's results claim a 10.5 percentage point lead for Obama (52.5% - 42%). Though you did not release the information, my estimates for the voter demographic breakdown show the poll sampled approximately: 42% D - 28% R - 30% I. Is this true? Could you please email me the voting demographic breakdown? I can be reached at loudmouthelephant@gmail.com Thank you.
Respectfully,
Michael XXXXXXX
The First Response:
Mr. XXXXXXX:
Our breakdown of Party ID in this poll was actually: 38% D - 37% R - 26% Ind. (rounding errors result in 99%). The real difference in Nevada is the relatively high number of Republican who are supporting Barack Obama. That group is heavily concentrated in women from 45-64, a key voting demographic and move numbers despite Romney having a sizable lead among Independents. As we've written before, we find those self-assigned party affiliation numbers to be a bit unreliable and use 60+ other criteria in our weighting.
Regards,
Gregg XXXXXXX
We Ask America™ Polls
Naturally, I wrote back:
Gregg,
Thank you for getting back to me. I truly appreciate it.
I'm having a bit of trouble understanding the results of your poll. You stated that the poll's breakdown was 38% D, 37% R, and 26% I. I understand you said you weight things differently, but regardless, on sheer party affiliation, I'm not getting this to line up closely.
Below is my math; please let me know where I'm wrong (I've rounded properly):
Out of the poll's reported 1,078 likely voters:
38% D = 410
37% R = 399
26% I = 280 (I know this doesn't total 1,078, but that creates only a small issue).
Based on the "which party voted for whom" part of your results seen here: http://weaskamerica.com/2012/10/02/horse-races/ - the race should look like:
OBAMA:
D = 410 x 86.4% = 354 votes
R = 399 x 18.7% = 75 votes
I = 280 x 35.3% = 99 votes
Total Obama votes = 528 votes / 1,078 Sample = 48.98%
ROMNEY:
D = 410 x 11.8% = 48 votes
R = 399 x 78.2% = 312 votes
I = 280 x 50.7% = 142 votes
Total Romney votes = 502 votes / 1,078 Sample = 46.57%
This is nowhere close to the 52.5% O - 42% R results your poll shows. In fact, if there was an error, and it was really 38% D, 26% R and 37% I that was polled, the results would still be 50.8% Obama to 43.6% Romney. I cannot figure out how you got to the 10.5 point Obama lead you guys are claiming. Perhaps I'm wrong, and I'm open to hearing how. Please let me know.
Respectfully,
Michael XXXXXXX
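The arithmetic in the email above is simple enough to re-run as a script. Below is a minimal Python sketch, assuming the 38/37/26 party split and the 1,078 sample as first reported, plus the "which party voted for whom" percentages quoted in the email. The function names `implied_votes` and `implied_percent` are made up for illustration. The script mirrors the email's step-by-step rounding and knows nothing about the pollster's weighting.

```python
# Party-ID shares from the pollster's first reply (they sum to 1.01 because of rounding).
PARTY_SHARE = {"D": 0.38, "R": 0.37, "I": 0.26}

# "Which party voted for whom" percentages quoted in the email.
SUPPORT = {
    "Obama": {"D": 0.864, "R": 0.187, "I": 0.353},
    "Romney": {"D": 0.118, "R": 0.782, "I": 0.507},
}

SAMPLE = 1078  # likely voters as first reported by the poll


def implied_votes(candidate, party_share=PARTY_SHARE, sample=SAMPLE):
    """Votes per party for one candidate, rounding at each step as the email does."""
    votes = {}
    for party, share in party_share.items():
        respondents = round(sample * share)  # e.g. 38% of 1,078 -> 410
        votes[party] = round(respondents * SUPPORT[candidate][party])
    return votes


def implied_percent(candidate, party_share=PARTY_SHARE, sample=SAMPLE):
    """Candidate's total implied votes as a percentage of the reported sample."""
    total = sum(implied_votes(candidate, party_share, sample).values())
    return round(100 * total / sample, 2)


print(implied_votes("Obama"), implied_percent("Obama"))    # {'D': 354, 'R': 75, 'I': 99} 48.98
print(implied_votes("Romney"), implied_percent("Romney"))  # {'D': 48, 'R': 312, 'I': 142} 46.57
```

Passing a different `party_share` dictionary makes it easy to test other party mixes against the same crosstabs.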
His Response:
Here's the problem: our weighting formula is extraordinarily complex and involves more than 5,500 fields of data that we feel help us hone in on the real picture. You're basing your analysis only on the numbers you've seen, while we're basing ours on a slug of data that has proven correct more times than not in the past. One of the problems all pollsters are having now is dealing with Party Affiliation in an atmosphere where people are shifting their loyalties like cars change lanes on a freeway. As I've written before, we don't use these self-described affiliations in our weighting. You're trying to line up numbers without having the full array of data. Plus, I've asked my data guy to look into the number of responses. We conducted 24 polls last night…this wouldn't be the first time we messed up the response numbers on our public polls. I double checked the result percentages, but not the responses. When you're a two-main operation, this stuff happens.
I'll get back to you.
Gregg XXXXXXX
Which he did while I was at lunch:
Here's an update:
I did indeed mess up the number of responses: it was actually 1,151 (I'll fix it online shortly). That doesn't help you much, though. This might:
Before weighting, the results showed the presidential race with a five point spread for Obama, and a four point spread in favor of Heller the Senate race. Our weighting moved the numbers as we reported.
Could it be wrong? Absolutely. We'll go back into the field soon to test it again. But we're not giving up on our proprietary weighting system. With it, we were recognized as the most accurate pollster in the nation during the primaries by two independent groups.
Keep in touch; your ideas and review are refreshing and honest. (end of email, no sig)
My Response:
Gregg,
First, thank you again for corresponding with me. It’s nice to know that you’re willing to reply to an outsider. Also, I hope my email didn’t come off as facetious. I have no ill will, and I am not trying to be nasty or demeaning. I hope I didn’t come off like that. I simply keep track of the polls on my blog, and I breakdown the numbers as they’re reported and run them through my calculator.
It’s interesting to know that you use 5,500 fields for your polls. I’m really curious as to how your weighting algorithms work, but, as I’m sure it’s proprietary information, you probably keep your secret formula close to the chest. Either way, and again, I’m not trying to be “snippy,” I just find it hard to believe that with a 38/37/26 D/R/I spread, somehow the numbers come out to 52.5%-42% given the breakdown of how each party voted. I do understand that people’s affiliations change, but, for the sake of the “who voted for whom,” wouldn’t that be static for this poll? I mean, wouldn’t you report the “who voted for whom” as they are listed? Why the additional weights? For example, in your poll, 86.4% of democrats voted for Obama while 35.3% of independents did. How can it be weighted (with the 2% of republicans that voted for the President), that with the 38/37/26 spread, the total Obama vote comes out to 52.5%? Using purely “how they voted weights,” I came up with the aforementioned 48.98%. A result of 52.5% jumps up the “weighting” of your poll by 7.4% for Obama. Conversely, with Mitt Romney’s voting demographic, your “weighting” drops Mitt Romney’s results by 9.9% (your 42.0% result vs my 46.57% result). How can this be? What exists that, on top of the “how they voted part,” some other information shows that Romney’s results would drop by 10% while Obama’s would jump by about 7.5%?
I would never question the proprietary knowledge of your polls, and I’m not a pollster myself. I’m just curious as to what exists that causes you to use these weights. I also find it interesting that you’re a two-man operation and you’ve made it to RCP (that’s where I found your poll). That’s really pretty cool, and I had no idea about the size and scope of your operation. I own the blog The Elephant in the Room, but I essentially run a two-man show, too. I’m just an economist who typically, though not always, writes about economic/political issues. I get thousands of visitors a day, and many people recently have been flocking to my un-skewed polls database.
Thank you again for getting back to me. If you write back, that would be great. If not, I understand.
Very Respectfully,
Michael XXXXXXX
He wrote back:
Michael:
There is no way for me to give you the answer you want without providing you our weighting matrix (and you've been a gentleman about not asking for it). I realize that you'll not find that to be a satisfactory answer and that it has resulted in you trying to put a jigsaw puzzle together without all the pieces.
While we've rechecked our processes on this poll, my data guy reminded me that this one may be an outlier, and that I questioned him about it when the results came in. Therefore, we're going to look at the possibility of getting back into Nevada soon to test it out. In the meantime, the upcoming debates could throw all these results to the wind.
Gregg XXXXXX
We Ask America™ Polls
My Final Response:
Is there anything you can give or show that explains how your company reaches these results without showing your proprietary algorithms? I think you can understand my concern: I use your 38/37/26 DRI spread and come up with nothing close to your results. Your poll doesn't explain how, and it essentially says "trust us." What prevents a polling company from doing a 33/33/33 DRI spread with equal "who voted for whom" results and publishing them as "Obama trounces Romney - trust us?" Again, I'm not being facetious, and that has never been my attempt; but I hope you can see where I'm coming from. I "numbers check" every poll, and I'm not sure what kind of weighting possibly exists to create the results you have. Is there anything you can share at all?
Very Respectfully,
Michael XXXXXXX
I'm sure he will write back. So what did I learn? First, I truly appreciate Gregg getting back to me. He was polite and respectful, and he took nothing personally. He certainly didn't have to be. I've confronted many reporters and bloggers, and many times I'm met with nastiness. This was not the case. In this tense, hyper-partisan world, he took no cheap shots, and he accused me of taking none.
Secondly, it appears there is some sort of weighting system that is involved in polls. I don't know if this truly makes sense, though. Why does a weighting system exist? If you sample people as they are polled and find a large skew in the sample (say, a plus-10 democrat sample), that's what you found. Does it represent the population as a whole? Probably not. I've checked my math over and over. I used We Ask America's numbers. I used their "who voted for whom" information. I used simple math, and what I found is a much different story. Will I be able to get at the heart of their weighting? No. My "weighting" thesis showed, as you can see in my original email to Gregg, a sampling mix of 42% D - 28% R - 30% I. You can check my math, too. Please tell me where I'm wrong.
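One way to check that estimate is to work backwards: ask what party mix would make the simple "share x support" arithmetic reproduce the published 52.5% - 42% result. Here is a minimal Python sketch, assuming the only inputs are the published crosstabs and toplines and that the three party shares add up to 100%. `det3` and `solve_party_mix` are hypothetical helper names, and the solve uses plain Cramer's rule on three linear equations. None of this comes from We Ask America.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve_party_mix(obama_topline, romney_topline, obama_by_party, romney_by_party):
    """Solve for (D, R, I) shares that reproduce both toplines and sum to 1.

    Each *_by_party argument is that candidate's support among (D, R, I).
    """
    rows = [list(obama_by_party), list(romney_by_party), [1.0, 1.0, 1.0]]
    rhs = [obama_topline, romney_topline, 1.0]
    base = det3(rows)
    shares = []
    for col in range(3):
        # Cramer's rule: swap one column for the right-hand side.
        swapped = [row[:] for row in rows]
        for r in range(3):
            swapped[r][col] = rhs[r]
        shares.append(det3(swapped) / base)
    return shares


d, r, i = solve_party_mix(
    0.525, 0.42,
    obama_by_party=(0.864, 0.187, 0.353),
    romney_by_party=(0.118, 0.782, 0.507),
)
print(f"{d:.1%} D - {r:.1%} R - {i:.1%} I")
```

Under these assumptions the solve lands at roughly 43% D - 30% R - 27% I, in the same neighborhood as the 42% D - 28% R - 30% I estimate. It says nothing about how the pollster's actual weighting works; it only shows what party mix would make the simple arithmetic match the published result.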
This is nothing against Gregg personally, but I can't sit and accept "the poll is correct, our numbers work, so just believe us." I never have, and I never will. I will always investigate. Any further correspondence I receive from pollsters will be posted as well. Please share your thoughts below. Thank you.
Hi LME... What's left out of the scenario is WHERE in NV the people were polled. The difference between MOST of the northern and central parts of the state - basically everything north of Vegas (sans Reno) and the rest - is night and day.
There's a sign entering Hiko (about 90 miles nw. of Vegas): 'Leaving Harry Reid Country'.
Unfortunately - while we conservatives have many more square miles of territory - the other part has MANY more people - a VERY heavy union influence and TONS of immigrants both legal and illegal.
There is another explanation... after years of never being polled, I (finally) had a pollster call my cell last week. They asked for my zip code - and when I told them, they promptly dismissed me without asking any more questions.
If they're polling by zip code and dismissing those outside the 'chosen area', they could definitely produce a skewed sample that way.
Interesting post, and I admire your respectfulness and demeanor in looking into the issue further. Qualities all but absent in most of the partisan world of politics lately.
As far as how these are weighted, I can't say with any kind of certainty what kind of factors are taken into account here, but I've had a fair amount of experience in statistical probabilities, algorithms and analysis.
While on the surface it may seem implied and even accurate to simply take a response from Person A of "Yes, I support Obama/Romney" and weigh it with the same equivalence of the same response for Person B... the inclusion of complex algorithms to assess the difference of a "Yes" from A vs. the difference of a "Yes" from B, is what separates the little from the big guys.
What is Person A's sex? Religion? Race? Voting history? State? City? District? Affiliation? Certainty? Age? Education? What time of day were they talked to? Are they employed? Etc... All of which can be used in countless ways to assess a 'weight' to the meaningfulness and certainty of that person's "Yes" resulting in a vote for the aforementioned politician come election time.
Ex: Is a "yes" from a 20y/o really as valuable as a "yes" from a 60y/o? Voting demographic trends would say no, and by a rather significant margin. The same would apply for female vs male, rich vs poor, etc.
Of course, polls don't typically go into that level of detail in the questions themselves. But you can very easily and rather accurately estimate many of the above demographic factors just from an approximate location, or that combined with certain answers.
I'm sure some if not many polls simply tally the amount of yes's vs. no's, and compare them plainly as such. But for the ones that don't, there is a near-limitless amount of complexity you could delve into to get the results as 'statistically accurate' as possible. And I'd wager that most of the bigger (certainly the more accurate ones) do this, which yeah, can make the reported results confusing (as you found).
I hope maybe that sheds a little light on it, at least from a statistical point of view. Though, I'm not too convinced I told you anything you didn't already know. :)
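To make the idea of a "weight" concrete, here is a minimal Python sketch of one common textbook approach: post-stratification on a single demographic, where each respondent is weighted by (target population share) / (share in the sample). This is not We Ask America's method, which is proprietary and reportedly uses many more factors. The respondents, the two age groups, the target shares, and the candidate labels "A" and "B" are all made up for illustration.

```python
from collections import Counter


def poststratify(respondents, target_shares, key="group"):
    """Attach a weight to each respondent so group shares match target_shares."""
    counts = Counter(r[key] for r in respondents)
    n = len(respondents)
    weights = {g: target_shares[g] / (counts[g] / n) for g in counts}
    return [dict(r, weight=weights[r[key]]) for r in respondents]


def weighted_share(respondents, candidate):
    """Weighted fraction of respondents choosing the given candidate."""
    total = sum(r["weight"] for r in respondents)
    chosen = sum(r["weight"] for r in respondents if r["choice"] == candidate)
    return chosen / total


# Hypothetical sample: younger respondents are under-represented (2 of 10).
sample = (
    [{"group": "under_45", "choice": "A"}] * 2
    + [{"group": "over_45", "choice": "A"}] * 3
    + [{"group": "over_45", "choice": "B"}] * 5
)

# Hypothetical target: assume the electorate is 40% under 45 and 60% over 45.
weighted = poststratify(sample, {"under_45": 0.40, "over_45": 0.60})

raw_a = weighted_share([dict(r, weight=1.0) for r in sample], "A")
adj_a = weighted_share(weighted, "A")
print(f"raw A share: {raw_a:.1%}, weighted A share: {adj_a:.1%}")
```

In this toy data the unweighted result is a tie, and the weighted result moves in favor of A purely because the under-sampled group leans that way. Whether a real poll's weighting targets are the right ones is exactly the question that can't be answered from outside.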
I too admire the professionalism. It's refreshing.
But I must say, I agree with the author here. No matter what the polling agency says with their "we weight things, trust us but we won't prove it to you" stuff, I say "weights schmates."
The author points this out very well. YOU said it's 37-36-26. Using YOUR numbers for how each party voted, here is how the results should be. They're a lot closer, and yet you say they're 11 points apart. That takes some serious weighting.
Yes, I do believe this is part of the left wing's plan. They can report every poll as a big Obama lead and then say "trust us." I checked the author's math. He was right. If the poll is 37-36-26, and the parties voted how they did, the results are what the author said. What kind of spaghetti math is the agency doing? I hope America doesn't buy their crap. We are smarter than this.
Oh, I don't necessarily disagree with LME's point; more was attempting to paint the picture of how this is likely viewed and justified from a statistical aspect.
But I can't really agree that this is all some big conspiracy though.... Particularly for smaller, lesser-known polls whose only chance of gaining traction and popularity is reporting the most accurate prediction possible.
The whole premise and success of a given polling website and whatever algorithms they use, is being the closest to predicting the actual outcome. If election time comes and they’re a significant margin off in their predictions, the site is likely toast and/or will lose a significant following (and thereby, revenue/profits/etc). I think it’d be a bit silly to say that every single polling website that isn’t Rasmussen/an ‘unskewing the polls’ blog/showing a GOP lead is part of some giant conspiracy at potential cost of their revenues/success, all so they can elect Obama? And why? Is he pioneering some giant government handout toward all polling websites? And aren’t many of these websites ran by entrepreneurs/small business owners in the first place?
And that still doesn’t answer how polling websites that have been accurate to within 1-5% of predicting the results in past elections, are suddenly deciding to abandon statistical accuracy and attempt to ‘win an election for Obama’ (again, for what reason?) in sacrifice of their record and credibility.
All of that said, I sure won’t try to deny that the potential for lies and corruption exists. I’m sure there are some polls that purposefully do mislead (for whatever reasons). Particularly ones sponsored by news websites that likely couldn’t care less or even be damaged much by it (since polling isn’t their main purpose). But to say that every poll does? A bit too extreme to me.
Rken - Good morning. I came back from a small vacation on just the right day: debate day!
I haven't made up my mind as to the purpose of the skew yet, but, I can certainly see why the thoughts exist. As is correctly pointed out time and time again, polls are over sampling democrats. In CNN's latest poll, it shows a 28% skew towards democrats. The question I ask is "why?" Why is it like this? Why not sample a group that resembles the US? I do lean towards the intentional effort of the media to skew the election. Can you provide another reason? The media IS liberal leaning. There is no way around it. Do a wikipedia search on media bias. Visit newsbusters.org who give countless examples of this. Just turn on the nightly news and listen to all the positive Obama stuff trailed by all the negative Romney stuff. To deny it is to deny reality. The main point is, why would they over sample like they are doing? Why why why? There are roots in LME's theory. It's as clear as day. Until a determining answer is given that explains something to the contrary, the more skewed polls that come out without rhyme or reason as to why they would skew so heavily for democrats shows me that LME is actually right.
http://battlegroundwatch.com/2012/10/01/hiding-the-decline-what-polls-over-sampling-democrats-reveal/
http://news.yahoo.com/video/obama-leads-romney-polls-does-231300212.html