Last week University of Chicago economist Matthew Gentzkow received the John Bates Clark Medal for the best American economist under 40, returning the award to Hyde Park for the first time in a decade. (Prior winners: Milton Friedman, Gary Becker, James Heckman, Paul Krugman, Larry Summers, Steven Levitt, Emmanuel Saez…)

He'd been on the unofficial short list for the award and had certainly received a lot of media attention, not least because the economics of media is Gentzkow's primary focus. In particular, Gentzkow and frequent co-author Jesse Shapiro (himself a University of Chicago economist in his mid-30s) tackled one of the knottiest industry problems of the past decade: does the proliferation of news online segregate people by ideology?

It was a difficult problem not just because of its ramifications for the public sphere (considered significant enough that Cass Sunstein famously, or infamously, proposed an Internet Fairness Doctrine) but because of the quantitative challenge. How do you transform something as fraught as the ideology of publications, a notorious rhetorical battleground, into data that can be tested? Gentzkow and Shapiro were fortunate enough to come of age as economists just as large amounts of text had become available as data, along with many technologies for analyzing it. Like teaching your email to look for spam, the authors taught computer programs to look for words that signal ideological bias, as expressed by professional ideologues, aka politicians.

They concluded that the concerns of people like Sunstein (and, in fairness, lots of other people at the time) were a bit overblown. Audiences generally visited a balanced mix of not-very-slanted major outlets, and those outlets in turn drew a relatively balanced audience. The paper got a lot of attention.

Farther below the radar, for obvious reasons, has been Gentzkow's research on the dawn of the modern media age: newspapers near the turn of the 20th century and through its first half. Which is unfortunate. As Gentzkow says, the contemporary online media landscape he was investigating with that paper looks a lot like the American media landscape a century ago (and like the more ideological newspapers found in other major world media markets). The relatively narrow American media market of the mid-to-late 20th century, from Walter Cronkite through CNN, is something of a historical and cultural exception.

Gentzkow and Shapiro have published multiple papers on the subject: how economic competition drove the ideology of papers; how the ideology of papers drove (and didn't drive) voting patterns; how economic competition and government influence drove news coverage. As the national media landscape broadens to a scope not seen since the days of corner newsboys, Gentzkow's research offers clues to what that shift means for journalists and the public.

What about the topic of media is fun for you?

Studying media is studying firms. The part of economics I work in is called industrial organization, basically the study of firms. In that area you could find yourself working on lots of different industries. You could study the airline industry, the petroleum industry, the supermarket industry. You end up spending a lot of your life thinking about the details and nuts and bolts of whatever industry you're studying. Just for me personally, newspapers and TV, the way they work, and their content are something I've always found fascinating. I find them fascinating as a consumer.

A less subjective and more concrete answer is that the economics involved turn out to be somewhat similar in media to other industries, but it's interesting because there are these broader political and social implications. It's a market we care about not only for the usual reasons, but also because we think it has a big impact on society more broadly and a big impact on the political process.

Those political and social dimensions make it especially interesting and exciting.

You work with a lot of data that's difficult to quantify. How do you go about it?

I've been very lucky in walking into this field at a time when the availability of data, particularly text and content data, has really exploded in this area. These are questions people have thought about and worked on for a long time, and I was lucky to turn up at a time when the scope for answering them was really changing in a positive way.

The particular kind of thing that I've done is… if we step back, there's a huge amount of work in computer science, linguistics, and other fields, on automated analysis of text and content. You think about the algorithms that are designed to run in your spam filter—you get a bunch of e-mails, your computer has to decide which ones are spam, which are not spam—a lot of work has gone into designing algorithms that are adaptive, that will learn, and then be able to look at new emails that are coming in and classify them.

The work that I've done with Jesse Shapiro is really just applying those methods developed in those other fields to looking at news text. Just as you might classify email as spam or not spam, we wanted to try to classify news text as conservative or liberal, or put it on a more continuous spectrum of slant from right to left.

How might you do that? How might you teach a computer to look at thousands and thousands of pages of newspaper text and assign it political positions? The way all these algorithms basically work is that you need some body of text to train the algorithm: you show it some text and say, "here's text where we know what's conservative and what's liberal." And then you look at that and ask, "what are the words and phrases that seem to appear especially often in a conservative text? In a liberal text?"

We map from the occurrence of those phrases into ideology. The computer can learn that mapping on some set of training text, and then, when you show it text it hasn't seen before, it can do the classification. The way we implemented that was to take speeches by politicians, because for politicians we know their ideology. So we can ask what words or phrases, when a politician says them, tell you that this is very likely to be somebody conservative, like "death tax," or very likely to be somebody liberal, like "undocumented worker."

Then you can apply that algorithm to newspapers and assign each newspaper a political position on that same scale.
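To make the spam-filter analogy concrete, here is a minimal sketch of phrase-based slant scoring. It is not the authors' estimator, whose details this interview does not spell out. It assumes two-word phrases, a smoothed log ratio of each phrase's frequency in conservative ("R") versus liberal ("D") speeches as that phrase's weight, and a handful of invented training speeches. The function names `train_phrase_weights` and `slant_score` are hypothetical.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text, keep only word characters, and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(pair) for pair in zip(words, words[1:])]


def train_phrase_weights(speeches, smoothing=1.0):
    """Learn one weight per phrase from labeled speeches.

    speeches: list of (text, label) pairs, label "R" (conservative) or "D" (liberal).
    Returns {phrase: weight}; positive leans conservative, negative leans liberal.
    Assumption: weight = log ratio of add-one-smoothed phrase frequencies.
    """
    counts = {"R": Counter(), "D": Counter()}
    for text, label in speeches:
        counts[label].update(bigrams(text))
    vocab = set(counts["R"]) | set(counts["D"])
    totals = {side: sum(c.values()) for side, c in counts.items()}
    weights = {}
    for phrase in vocab:
        p_r = (counts["R"][phrase] + smoothing) / (totals["R"] + smoothing * len(vocab))
        p_d = (counts["D"][phrase] + smoothing) / (totals["D"] + smoothing * len(vocab))
        weights[phrase] = math.log(p_r / p_d)
    return weights


def slant_score(text, weights):
    """Average weight of the known phrases in unseen text; 0.0 if none appear."""
    known = [weights[p] for p in bigrams(text) if p in weights]
    return sum(known) / len(known) if known else 0.0


# Hypothetical training data standing in for politicians' speeches.
speeches = [
    ("We must repeal the death tax now.", "R"),
    ("The death tax hurts family farms.", "R"),
    ("Protect every undocumented worker.", "D"),
    ("An undocumented worker deserves fair pay.", "D"),
]
weights = train_phrase_weights(speeches)
print(slant_score("Congress debated the death tax.", weights))        # positive: leans conservative
print(slant_score("A story about an undocumented worker.", weights))  # negative: leans liberal
```

Scoring unseen text by averaging the weights of phrases learned from labeled speeches is the "train on known text, then classify new text" step in its simplest form; a real study would need far more training text and a more careful way of choosing which phrases count.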

Did your findings about on- and offline ideological segregation surprise you at all?

Yes and no. We went into that project having seen, in other data, a hint of the kind of conclusion we eventually reached. The origin of that project was that Jesse and I had looked at some simpler survey data, and we had consistently been struck by how much overlap there seemed to be in the political views of different media outlets' audiences.

In that survey data, it was true, as you would expect, that more conservative people are more likely to watch Fox News, but to a really surprising degree the shares were not all that different. Quite a number of liberal people reported watching Fox News, and quite a number of conservative people reported watching CNN. You saw the same kind of thing if you looked at overlap between the viewership of Fox News and reading the New York Times, or other pairs of media outlets.

We had the suspicion based on those data that maybe in general the amount of fragmentation along political lines is not as big as everybody seems to think. Now, when we talked to people about those results, the reaction was always "those are just self-reported survey data. People are just reporting their recollections. Those aren't really accurate, and they may be systematically biased, because somebody who's super-conservative might not want to admit that they watch nothing but Fox News. They claim to be looking at a diverse set of things." 

The way the echo-chambers paper came about was saying, let's see if we can find data that doesn't have that self-report problem: data on internet browsing from ComScore, which measures it directly. They put software on your computer that watches everything you do, including which websites you go to and when. Again, just as you'd expect, if you're conservative, you're more likely to go to Fox News and less likely to read the New York Times. But all of those differences are smaller than many people had suggested.

We went into the project, based on that other data, with a suspicion, I think, that the whole echo-chambers phenomenon had been overstated. But we didn't know what we were going to find. Maybe the non-self-reported data would look different from the self-reported data; maybe the internet would look very different from cable TV.

My anecdotal observation is that people get straight and breaking news from non-ideological sources, and then look to ideological sources for analysis.

One limitation of the paper is that our ability to look at how this differs for different types of content, breaking news versus opinion or analysis, is pretty limited. We do try to address that, by looking at how these segregation patterns shift on different days, when the content of the news is different.

One possibility we were particularly worried about was that when people are looking for political information, they're actually very segregated, and we're seeing lower segregation overall because sports news, movie reviews, weather, and other kinds of content are where liberals and conservatives go to see the same thing.

We look across days, and in particular at days when some big political event means that an unusually large share of consumption is probably people looking for up-to-the-minute news about politics. An example of that was the 2008 election and the run-up to it. If the hypothesis is that political consumption is segregated but sports news is pulling the overall number down, then you should have seen segregation go way up on the days around the election, as the share of news that was political went up.

And we don't see that at all. On the days around the 2008 election, consumption of news and opinion online was, if anything, less segregated than on other days. And you see similar things for other breaking-news events. The shooting at Virginia Tech happened during the period of data we were looking at; on that day, you don't see any unusual rise or fall in segregation. Across days, segregation doesn't seem to vary much with what's going on in any systematic way, which I think argues against big differences for different kinds of content.

My guess as to the right way to understand these data is that a lot of people naturally go to the New York Times by default. Even more people by default go to CNN.com or Yahoo, especially five or six years ago, when we were looking at it. And they do that for everything. Whether it's the day of the shooting at Virginia Tech or the day before the election, people have a favorite news source, or a couple of favorite news sources.

The truth is that most people are looking at the same favorite news sources, they're not that different depending on your ideology. Most people are looking at stuff like CNN, whose audience is representative of the country. Some people are looking at the New York Times, the big site a little bit to the left in terms of its audience. Some people are looking at Fox, which is the big site a bit to the right in terms of its audience.

But that's not enough to make that much of a difference in the overall segregation. Most people who look at the New York Times are already looking at something like CNN. Most people who look at Fox are also looking at something like CNN. Overall, most Americans are looking at the same stuff.
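The arithmetic behind "not enough to make much of a difference" can be sketched with an isolation index, a standard segregation measure: the average conservative audience share of the outlets conservatives visit, minus the same average for liberals. The interview does not give the paper's exact formula, so treat this as an assumed, simplified version; the function name `isolation_index`, the outlet names, and the visit counts are all invented for illustration.

```python
def isolation_index(visits):
    """Isolation index from outlet-level visit counts (simplified, assumed form).

    visits: {outlet: (conservative_visits, liberal_visits)}.
    Returns (conservatives' average exposure to conservatives)
          - (liberals' average exposure to conservatives),
    where an outlet's "exposure" is its conservative audience share.
    0 means both groups see the same average audience mix; 1 means complete separation.
    """
    total_cons = sum(cons for cons, _ in visits.values())
    total_lib = sum(lib for _, lib in visits.values())
    if total_cons == 0 or total_lib == 0:
        raise ValueError("need visits from both groups")
    cons_exposure = lib_exposure = 0.0
    for cons, lib in visits.values():
        if cons + lib == 0:
            continue  # an outlet nobody visited contributes nothing
        share_cons = cons / (cons + lib)
        # Weight each outlet's conservative share by how much of each group's browsing it gets.
        cons_exposure += (cons / total_cons) * share_cons
        lib_exposure += (lib / total_lib) * share_cons
    return cons_exposure - lib_exposure


# Invented visit counts: one big shared site plus one site leaning each way.
mixed = {
    "big_shared_site": (600, 600),
    "left_leaning_site": (100, 200),
    "right_leaning_site": (200, 100),
}
# Invented extreme case: each group reads only its own site.
separate = {"left_only_site": (0, 900), "right_only_site": (900, 0)}

print(isolation_index(mixed))     # small, about 0.037
print(isolation_index(separate))  # 1.0, complete separation
```

Because the shared site dominates both groups' visits in the made-up example, the two leaning sites barely move the index, which mirrors the pattern Gentzkow describes, though with toy numbers rather than his data.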

A lot of your research has been focused on the early years of the modern media, and it seems to inform how these markets are created and why people consume ideologically oriented media.

This has been one of the things I've most enjoyed in terms of research in the last four or five years. Jesse Shapiro and I have done a series of studies of newspapers in the late 19th and early 20th century. One way to think about why that's exciting is, in many ways, the media landscape in the U.S. today looks more and more like newspapers in the 19th and early 20th century. We're moving increasingly toward media markets where there are lots of different choices, things are very explicitly partisan and ideological. That's not hidden, or not hidden particularly well. There's a lot of competition.

Those are characteristics that also describe historical newspaper markets in this country, and they really are kind of the norm, historically in the U.S. and across countries today. We're coming out of what is really an unusual period, if you think back to the 70s and 80s, the world of the big three broadcast networks, where everybody was getting their news from basically the same small set of sources. That's what was really unusual. The internet today looks much more like this fiercely competitive, partisan market, which has really been the norm. Going back and looking at it historically is a neat way to try to get insights into how such markets work: how can we understand them, how does competition play out, and what are the impacts of that kind of partisanship on voting, electoral outcomes, and other social effects we might care about.

We collected new data that cover all of the daily newspapers in the U.S. from 1869 to the present, and we use those data in several studies. One uses the opening and closing of newspapers to look at the impact of newspapers on voting, and what you see there is that having a newspaper increases political participation quite dramatically. When a new newspaper opens up, more people vote. That's consistent with a long history of arguing that media are really important to political participation and democracy.

But at the same time, you don't see any effect of, say, a Republican newspaper opening on the share of people voting Republican, or a Democratic newspaper on the share of people voting Democrat. Newspapers seem to be increasing participation but not leading to polarization. It's not the case that, just because the media are really partisan, they're pulling people to the right or left. That finding we found really surprising. That's one we wouldn't have guessed we would see going into that project.

I think one way to understand it is that when partisanship is really explicit, readers are able to take it into account when interpreting what they read. Something that wears on its sleeve the fact that it's a Republican newspaper isn't necessarily going to persuade people on average to be more Republican, because everyone knows what their agenda is. They can filter that out in interpreting it. That's one story that would be consistent with those facts, not the only one.

A second paper looks at how market forces and how competition between newspapers impact their political positions. A third looks at to what extent newspapers in that period were manipulated by government; that was a time when the institutional restraints on government influence over media were pretty weak. It was common for state legislatures, say, to hand out all kinds of favors to friendly newspapers, or to try to induce newspapers to be friendly. There wasn't much legal constraint on the ability of the government to effectively buy off the media.

However, there was still a lot of market competition that you might think could discipline newspapers' willingness to be bought off, because being bought off would hurt their credibility and therefore their profit. That last paper looked at how things played out in that environment. Governments had plenty of scope to buy off the media, but the media firms were trying to make money in a competitive environment, and we find that, on net, there doesn't seem to be much government influence at all. The market forces sort of win in that case: who controlled the government had very little impact on the composition of the media market.

I recall that, when Cass Sunstein was arguing that the internet was segregating people ideologically, one response was that the resulting ideological diversity was a necessary correction to what had been a very short period in which the ideological range of the media was very narrow.

The connection between those two things isn't something we draw out in that paper as much as perhaps we should, but in the background of Sunstein's argument, and I think in the background of a lot of this internet debate, is the idea that economic forces online are going to produce a lot of diversity. We're going to end up with lots of variety online because firms are trying to cater to consumers and trying to differentiate from their competitors, trying to find a niche that somebody else isn't filling, and that's going to lead to lots of diversity.

Whether you think diversity is good or bad, competition, the argument goes, is going to tend to produce a lot of it.

That paper, going back to the historical period, is mostly focused on the extent to which competition actually does produce diversity. We're trying to get inside the black box of that a little bit, and understand how the different incentives at play together affect the amount of diversity the market produces: a newspaper's desire to find a different niche than its competitors; the way price competition works, so that you might want to be different from your competitors to soften price competition and be able to charge higher prices; and the way advertising competition works, so that you might get higher or lower advertising prices because you're doing something different or doing the same thing.

Then, Sunstein's argument would be, if the market produces a lot of diversity, that could be a bad thing, because of fragmentation. There's a long history of people arguing the opposite—a lot of both legal tradition and media regulation in the U.S. is based on the opposite argument, which is that ideological diversity is a good thing, that we want to have different points of view represented, because that's the only way people are going to figure out the truth.

So you could then go on and argue whether that's a good thing or a bad thing. In that paper we kind of take the perspective that, most of the time, we think of diversity as good, and we have a lot of policies to increase diversity, so let's try to understand what produces it. But I think it's absolutely right that the diversity produced could be good or could be bad, and the right answer is probably that it's both. It has benefits, and it has the potential to segregate people and potentially increase polarization.

Where do you see your research going in the near future?

I'd love to keep working on these questions related to media. All of these things happening online are exciting—understanding social media, understanding advertising online, understanding the way these things play out in other countries. I hope to keep working on those questions.

I've also been turning my attention a little further afield from media. I've been doing some work on the formation of brand preference and the role that brands play in the market. I've also been working on a health-care project recently, motivated by all of the policy debate around health care, which I found really interesting.

One of the nice perks of this job is that you wake up in the morning, and whatever captures your eye and strikes you as interesting, you can spend some time thinking about and potentially do some research on.