In the article entitled “On Polling Hispanics Part I”, I discussed why using Hispanic surname sampling introduces bias into polls of Latino voters. Based on work I had done for clients like the Obama presidential campaign and the Annenberg Policy Center, I found that later generation Latinos were less likely to be identified on political voter files as having Hispanic surnames and, in states with Mexican-American Latino populations, more likely to be Republican.

This finding made sense to me, since it sounded a whole lot like assimilation. All immigrant populations, I believe, move to the mean. They intermarry, often losing their ethnic surname in the process. Some Anglicize their names. And they don’t vote like their grandparents. Vietnamese-Americans become less Republican, Mexican-Americans become less Democratic. It was in my 2006 poll in California that I first saw this American story play out, and then it repeated itself in my 2008 polls in Arizona, Nevada and New Mexico. In each case, later generations of Latino voters were less likely to be tagged as Hispanic (i.e., identified as Hispanic on a voter file) and more likely to vote Republican, a finding of tremendous importance for Democratic political professionals such as myself. In my circles, one often hears a discussion about whether Latinos are base voters or not. My polls suggest that while Mexican-American immigrant voters may be base voters (that is, strong Democrats), later generation Mexican-Americans are in need of persuasion. My polls also suggest that because of the surname issue, our voter files are not up to the task of identifying later generation Mexican-Americans.

The most frequent over-reliance on Hispanic surname that I encounter is when my fellow political pollsters want to interview more Latinos than they would get in a normal sample so they do a Hispanic oversample. The polls we do for candidates running for office are generally private, though, so instead I made an example of the public polls conducted by Latino Decisions. Based on the following text on their methodology page, I thought they were only calling voters with Hispanic surnames:

Our samples are typically drawn randomly from the most recent publicly available list of registered voters in the given state, screened for Hispanic surnames using the Census Bureau list of commonly occurring Spanish surnames, and merged with third party data to secure telephone numbers. Voter registration status and Hispanic identification are verified upon contact with respondents, who confirm if they are registered to vote and of Hispanic/Latino descent.

Turns out I was wrong. In their response to my memo, Latino Decisions offered a “clarification” to correct the above representation of samples that were “simplistically identified as ‘surname.’” Now, they admit, “the samples we purchase do contain an appropriate percentage of respondents with non-Spanish surnames.” Their newly-posted methodology page reflects that change:

Our samples are typically drawn randomly from the most recent publicly available list of registered voters in the given state, and based on Hispanic households, as identified by different commercial vendors, and merged with third party data to secure telephone numbers; both landline and cellphone numbers are included. One important starting point for identifying Hispanic households is to screen for Hispanic surnames using the Census Bureau list of 12,000 commonly occurring Spanish surnames. Beyond the surname list, additional non-Spanish surname Hispanic households are identified by commercial market data and included in the Hispanic household sample. Voter registration status and Hispanic identification are verified upon contact with respondents, who confirm if they are registered to vote and of Hispanic/Latino descent.

In other words, we now seem to both be in agreement that polls of Latino voters should include Latinos that do not have Hispanic surnames. Which, of course, was my main point. Now the issue is how many Latino voters have Hispanic surnames and in what way are they different than Latino voters who do not have Hispanic surnames?

Admission: I would not presume to put my finger on a precise figure for how many Hispanic voters have Hispanic surnames. Our voter files don’t have the answer, commercial modeling is not the answer and neither is the Census. Latino Decisions often points to Census statements about the Hispanic population as evidence to support its own statements about the Hispanic electorate (such as Hispanic surname, generation, socio-economic status, etc.). But as Roberto Suro, then of the Pew Hispanic Center, wrote in 2005, “The differences between the Hispanic population and the Hispanic electorate are more than just a matter of size. Latinos who are eligible to vote and those who actually do vote have distinctly different characteristics than the Latino population as a whole.” One example that Suro highlighted then was that “a far greater share of the Hispanic adult population (56%) is foreign-born than among those who voted in 2004 (28%).” In other words, the share of Hispanic adults that were immigrants was twice as large as the share of 2004-voting Hispanics that were immigrants. Latino Decisions concedes that Hispanic immigrants are more likely than later generations to retain their surname. It is interesting, then, that the same distribution of Hispanic surnames that is found in the population seems to be imposed by Latino Decisions on the electorate despite the difference in generational distribution. In my view, everything from the voter file to the Census simply allows for an estimate.

As a political pollster, what matters most to me is what is actionable from a campaign point of view. By that standard, the percentage of the Hispanic electorate that has a Hispanic surname is almost a philosophical question. Nice to know, but it doesn’t give me anything I can use in a campaign. In campaigns, we almost always use voter lists for both polls and, perhaps more importantly, for targeting voters. As a result, what matters more than the actual percentage of Latino voters with Hispanic surnames is the percentage of Hispanic voters that are identified as Hispanic on the voter list (“tagged”). The reason we use voter lists despite their many shortcomings (such as how quickly the phone numbers within are outdated) is that they have a key benefit: they overcome the issue of poll respondents lying about whether they actually vote or not. Voter lists from political vendors not only show whether an individual has voted, but also in which elections they have voted. That last point is particularly important from a campaign polling point of view, because it helps us make a model (i.e., educated guess) about who will vote in an upcoming election. In both their methodological statements, Latino Decisions acknowledges using voter lists.

So let’s discuss one of the voter lists I’ve used: the 2008 Catalist New Mexico file. No question Catalist is one of the best and most innovative national voter list vendors out there. In 2008, their file had a Hispanic tag that was derived primarily from a model supplied to them by a vendor. Catalist had done their own testing and readily acknowledged that the model wouldn’t get 100 percent of Latinos (80 percent was their estimate) but insisted that it would do better than surname. Since I wasn’t modeling, my curiosity was only academic. Thanks to the commitment of the Obama campaign to spend the extra money, a completely bilingual phone bank (Eastern Research Services) was going to call a statewide (i.e., not just Hispanic) sample and screen on ethnicity. So we called voters regardless of their tag, spoke Spanish or English as they desired, asked if they came from “a Hispanic or Spanish-speaking ethnic background” and then asked about the chance of their voting in the November election for president, keeping only those who said their chances were 50-50 or better. Of the 400 voters who made it through that screen, only 197 were tagged as Hispanic.
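The tag coverage implied by those two counts can be worked out directly. The following is a minimal sketch in Python that uses only the figures reported above (400 screened respondents, 197 of them tagged); the function name `tag_coverage` is hypothetical and chosen for illustration.

```python
def tag_coverage(tagged: int, total: int) -> float:
    """Share of self-identified Hispanic respondents who carry the file's Hispanic tag."""
    if total <= 0:
        raise ValueError("total must be positive")
    return tagged / total


# Counts reported for the 2008 New Mexico poll.
coverage = tag_coverage(tagged=197, total=400)
print(f"tagged: {coverage:.0%}, untagged: {1 - coverage:.0%}")
```

On these counts, a bit under half of the self-identified Hispanic respondents carried the tag, and roughly 51 percent did not.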

My original memo includes a list of the surnames of each voter in my sample that was not tagged as Hispanic. If a name shows up twice, it means I interviewed two voters with that name. Many names seem pretty conspicuously Hispanic to me, such as Apodaca and Ontiveros. But for whatever reason, these voters were not tagged as Hispanic on the Catalist file in September of 2008. In fact, despite the fact that Hispanics made up 38 percent of eligible voters in New Mexico in 2008, only 27 percent of the statewide file Catalist sent me was tagged as Hispanic.

The tag on the 2008 Nevada file was much worse. Just 27 percent of the sample was tagged as Hispanic by the Nevada Democratic Party. Going through the sample by hand, I identified about 25 percent of self-identified Hispanic voters who had names that couldn’t be tagged as Hispanic no matter how good the Hispanic surname dictionary. Thirty-six percent of the overall Nevada sample were immigrants, and 27 percent did the interview in Spanish. As was the case with the New Mexico survey, a statewide sample was called by only bilingual callers and asked if they were Hispanic.

So am I saying that every voter in New Mexico with the name Apodaca or Ontiveros is untagged by Catalist? No, I’m not. I’m just saying the specific Apodacas and Ontiveroses I called weren’t tagged in 2008.

Am I saying that 51 percent of Hispanic voters in New Mexico do not have Hispanic surnames? No, definitely not. It’s undoubtedly lower than 51 percent, since many obvious Hispanic surnames are untagged on the 2008 Catalist file. Because I looked at the Nevada sample record by record, I have every reason to believe that the number is higher than that which Latino Decisions assigns to the Latino population (10 percent). And, as mentioned, there is nothing inherently contradictory about Hispanic voters having different characteristics than the Hispanic population at large, since only 1 in every 5 Latinos in the United States is a voter. In all 4 polls, I went in with no set idea of what percentage of Latino voters would be identified on the file as Hispanic, let alone what percentage would have Hispanic surnames. In the case of Nevada, because the Annenberg Policy Center wanted to do this right, we made 126,921 calls into a statewide sample with bilingual callers to get 999 interviews of self-identified Latino voters who said that they had voted in the just-concluded November election. No callbacks, no modeling, no concern that the Nevada Democratic Party’s file was so bad at tagging Hispanics and no expectation of a census-derived set number of Hispanic surnames. We called them whether they were tagged as Hispanic or not. And, to reiterate, more than 25 percent had names like Ching, Gibson and Smith.

In the end, what I care about is that which we Democrats have at our disposal to help us win elections. In September of 2008, as I did Hispanic polls for the Obama campaign in Colorado, Florida, Nevada and New Mexico as quickly as possible so that the campaign could make data-driven decisions, I had at my disposal the Catalist files. In all 4 states I used a methodology to ensure that I would reach Latino voters that were not tagged as Hispanic, but only in New Mexico (the state with the highest proportion of Hispanics – 45 percent) was I able to do a poll of all voters and ask the voters themselves whether they self-identified as Hispanic. If all I had used was the Catalist Hispanic tag, I would have reported to the campaign that Obama was doing 8 points better against McCain than my poll ended up showing. If I had shown Obama’s vote share as being 8 points higher, perhaps the campaign wouldn’t have allocated the same resources in the last two months that it did. My interest is not really in getting a definitive answer to the question of how many Latino voters have Hispanic surnames. I want to win. Just relying on the Catalist tag in New Mexico or imposing a fixed number of non-Hispanic surnames could very well have gotten in the way.

I do not believe that Hispanic surname or any other sort of modeling gets us to where we need to be in order to maximize the Latino vote. Even Latino Decisions now says that their sampling includes 10 percent of non-Hispanic surnames that they get from “commercially available markers of hispanicity.” A Latina friend of mine who lives in Beverly Hills emailed me earlier today about her Hispanic in-laws named Brown, Kasansky and Nelson. Especially if they live in Beverly Hills too, I would have a hard time imagining that either a voter file or a commercial file would have them marked as Latino. And if Brown, Kasansky and Nelson are English-dominant Hispanics, then even Latino Decisions admits they have “higher incomes, higher levels of education, [and] are more likely to be Republican.”

Let’s take a moment to recap. Latino Decisions now admits it calls voters with non-Hispanic surnames. Latino Decisions also recognizes “the prevalence of non-Spanish surnamed individuals rises over generation.” Finally, Latino Decisions says that English-dominant voters are different (including more Republican) than Spanish-dominant voters.

I, of course, agree with all of that.

So where is our difference? Latino Decisions asserts that “the data show almost no difference in the voting habits of Spanish surnamed and non-Spanish surnamed voters.” But they say non-Spanish surnamed voters are more likely to be later generation than Spanish surnamed voters. They would agree that later generation voters are more likely to be English-dominant than immigrants. And they say that English-dominant Latino voters are more likely to be Republican than non-English dominant voters. Yet for there to be almost no difference in the voting habits of Spanish-surnamed and non-Spanish surnamed voters, there would have to be almost no difference between English-dominant later generation Latino voters who are more likely to be Republican and non-English dominant immigrant voters that are less likely to be Republican. That, of course, makes no sense. One of our differences is in the number of Latino voters that do not have Hispanic surnames. They say it’s about 10 percent because that’s the percentage in the Hispanic population. Since I know that the Hispanic electorate is often very different than the Hispanic population, I don’t know exactly what it is, but I would guess it’s at least twice that big. The even bigger difference, though, is that I think that there is a significant difference in voting habits between English-dominant later generation Latino voters who are more likely to be Republican and non-English dominant immigrant voters that are less likely to be Republican.

Let’s look at my post-election poll of Latino 2006 general election voters in California, a chart of which is in the memo which set this debate off. Of all the file vendors I have worked with, it strikes me that Political Data has the best match in terms of Hispanic voters who are tagged as Hispanic on the file. Overall, 62 percent of the voters in the sample were tagged as Hispanic on the file. Seventy-five percent of immigrants were tagged (32 percent of the sample), 69 percent of second generation (31 percent of the sample), 52 percent of third generation (18 percent of the sample) and 41 percent of fourth generation and beyond (16 percent of the sample). The tag, then, exhibits the trend expected by Latino Decisions even if the number tagged isn’t as high as they would like: “the prevalence of non-Spanish surnamed individuals rises over generation.” As measured by the language they speak at home, 34 percent of mostly English, 6 percent of mostly Spanish and 22 percent of both languages equally were registered Republicans. This, too, is in keeping with Latino Decisions’ expectation: English-dominant Latinos are more likely to be Republican. The differences, again, are two: the number of Latino voters with no Hispanic surnames and the voting habits of Hispanic surnamed voters relative to non-Hispanic surnamed voters.
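As a consistency check, the generation-level tag rates can be combined into a weighted average and compared with the 62 percent overall figure. This is a minimal Python sketch using only the percentages quoted above. Note that the four generation shares add up to 97 percent; treating the remaining 3 percent as respondents with no recorded generation is an assumption made here, not something the poll states.

```python
# (generation, share of sample, share of that generation tagged as Hispanic)
generations = [
    ("immigrant", 0.32, 0.75),
    ("second generation", 0.31, 0.69),
    ("third generation", 0.18, 0.52),
    ("fourth generation and beyond", 0.16, 0.41),
]

# Share of the sample accounted for by the four groups.
covered = sum(share for _, share, _ in generations)

# Tagged respondents as a share of the whole sample, from these groups only.
tagged = sum(share * rate for _, share, rate in generations)

# Weighted-average tag rate among the respondents the four groups cover.
implied_overall = tagged / covered

print(f"sample covered by the four groups: {covered:.0%}")
print(f"implied overall tag rate: {implied_overall:.0%}")
```

The weighted average comes out at about 63 percent among the covered respondents, which is in line with the 62 percent reported for the whole sample.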

Forty-three percent of the voters not tagged as Hispanic voted for the Republican Arnold Schwarzenegger, while only 29 percent of voters tagged as Hispanics voted for Arnold. The difference is even greater if you move non-tagged voters like Mrs. Pineda-Ramos over to the Hispanic side (i.e., if you apply a very good Hispanic surname dictionary), because you are moving over mostly immigrant and second generation voters and leaving only later-generation voters with names like Brown. In the California poll, as with the rest, voters with Hispanic surnames (who we all agree are more likely to be immigrants and Spanish-speakers and Democrats) are less likely to vote Republican than voters who do not have Hispanic surnames (who we all agree are more likely to be later generation and English-dominant and Republicans). Why would we expect it to be any other way?

Putting this once again in a campaign context, if Democratic nominee Phil Angelides had done a poll of California Latino voters based only on the Political Data Hispanic tag, he would have thought he was doing better with Latinos than he really was. It would have been more or less as methodologically unsound as Arnold Schwarzenegger doing a poll of Latino voters only in English. By interviewing too many English-dominant later generation Latinos and almost no Spanish-dominant immigrants, Arnold would have overstated his Latino support. Hispanic-surname sampling and English-only polling both introduce bias right from the start.
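The size of that bias can be roughed out from the California figures reported above: 62 percent of the sample tagged, with Schwarzenegger at 29 percent among tagged voters and 43 percent among untagged voters. The Python sketch below is a back-of-the-envelope illustration, not the poll's actual weighting; it assumes the 62 percent tag rate applies to the same respondents as the vote figures, and the function name `blended_support` is hypothetical.

```python
def blended_support(tagged_share: float,
                    support_tagged: float,
                    support_untagged: float) -> float:
    """Support across all respondents, mixing tagged and untagged groups by their shares."""
    return tagged_share * support_tagged + (1 - tagged_share) * support_untagged


tagged_share = 0.62   # share of the sample tagged as Hispanic
rep_tagged = 0.29     # Schwarzenegger vote among tagged voters
rep_untagged = 0.43   # Schwarzenegger vote among untagged voters

full_sample = blended_support(tagged_share, rep_tagged, rep_untagged)
gap_points = (full_sample - rep_tagged) * 100

print(f"tag-only estimate of Republican support: {rep_tagged:.0%}")
print(f"full-sample estimate of Republican support: {full_sample:.0%}")
print(f"understatement from tag-only sampling: {gap_points:.1f} points")
```

Under those assumptions, a tag-only poll would have put Schwarzenegger's Latino support at 29 percent when the full sample implies roughly 34 percent, an understatement of about 5 points.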

Later generation Latinos with names like Brown, Kasansky and Nelson are the Latinos we need to persuade. Immigrant and second-generation Latinos (especially young ones) are turnout targets. Neither modeling nor over-reliance on Hispanic surname will give us voter files that have all the Browns tagged, that include generation and that have vote history for each voter. If we don’t poll later-generation Latinos correctly, we won’t know the messages that work with them. If we don’t identify them and target them as Latinos, we won’t maximize the Latino vote.

One last point of contention came from the Latino Decisions assumption that I must have agreed with the CNN exit poll regarding Nevada. A demographer friend of mine and I were talking about it a couple of weeks ago; I said I would guess that Latino Decisions was closer than CNN. Being a demographer, he did the math: “Halfway between them is 79 percent. So are you saying that Reid got at least 80 percent?”

“Yes,” I replied. “I’m not sure, of course. But even Nevada’s regular exit polls had Obama getting 76 percent in 2008.”

All to say that I don’t think that Latino Decisions is way off, I just think they are off. I’m reassured a bit by today’s disclosure that they do actually poll Latinos with non-Hispanic surnames, but modeling and surnames are always going to leave out some Latinos. I accept that their Hispanic surname dictionaries are state of the art and clearly better than what I was seeing in New Mexico, Arizona and Nevada, but a better dictionary, as helpful as it is, doesn’t address what I see as the key problem. Even Latino Decisions recognizes that later generation Latinos are more likely than immigrants to have non-Hispanic surnames though they think the percentage is so small as not to make much of a difference. But if you have a really good dictionary that captures all the tough names, like Borraez, Camino and Eseamilla, you are still leaving out all the voters with names like Brown, Kasansky and Nelson. Maybe a commercial file will capture some of them, but it won’t capture all of them. I want to capture all of them. Their number is only going to grow. We need to.

In short, I’m glad to have learned only yesterday that Latino Decisions interviews Latino voters with non-Hispanic surnames. They have their reasons for thinking that the percentage of Latinos that don’t have Hispanic surnames is small; my polls have shown that the way that Hispanics are identified on our voter files leaves a lot to be desired and neither modeling nor a good Hispanic surname dictionary completely addresses the problem. Over the next year and a half, we Democrats need to fix our voter files in the jurisdictions where Latino voters can make the difference between winning and losing. If we operate on the assumption that Hispanic surname identifies 90 percent of Latino voters, that commercial modeling gets the other 10 percent and that there is almost no difference in the voting habits of voters with and without Hispanic surnames, we will be ignoring the later-generation Latino voters most in need of persuasion.

On Tuesday, Sept. 27, 2011, Andre Pineda passed away. His passing came as a shock to everyone who came in contact with him. I am comforted by the thought that he is busy in heaven polling the angels.

