Big data, small data, thick data, no data: A global perspective on the EPIC debate

There were two kinds of people at EPIC this year: those who felt that the big-small-thick data debate is necessary, and those who didn't. Okay, so maybe it's a bit more complex than that, but people are easier to understand when categorized and quantified, right?

Big data was a big focus at EPIC. The debate centered on what big data is, and how ethnographers can show companies that context still matters. Tricia Wang began the discussion in her keynote, arguing that big data really only generates lots of data points, and that we need the thick data that ethnography provides to make sense of it.

Wang's keynote led nicely into the first session of the conference. Abby Margolis argued that we should shift our focus from big data's computational possibilities and instead take a people-centric approach, looking at how people use data to improve their lives. A critical point about data, she said, is that it is of most value to the individual, and that for the individual it is not an interchangeable commodity.

The question of what big data actually is was raised repeatedly, with speakers noting that the language we use to describe it matters for how we situate our research as ethnographers. In the town hall debate, Todd Harple tweeted, "I think we might be confounding data, big data, personal data and sensing...," implying that we should get more specific about what we're trying to say.

One speaker, John Curran, argued that we should be reframing the conversation by shifting to a discussion of "big ethnographic data." This, he claimed, would help us to move away from seeing data as some mass thing, and to get back to the self. His viewpoint aligns with the approach that Horst and I took in our paper, where we argue that ethnographic methods have never excluded quantitative analysis. There is no reason why big data analysis can't be part of an ethnographic toolkit.

From my perspective, a big problem with the big data debate is its assumption that big data is ubiquitous. Not everyone is constantly using computers and mobile phones. Some people are connected far more than others, which skews the picture that this data portrays.

This is even more true when we look at the developing world. In Haiti, the problem isn't the existence of mountains of data, but a distinct lack of it. Many people do not have personal identification, which makes things like census data collection problematic (although there is currently a push to digitize identities).

In a country where 90 percent of people work in the informal economy, few commercial transactions are recorded. With only 10 percent of people holding bank accounts, we can't see where money is flowing. Smartphones are rare, and few people are on the Internet with any regularity.

In Haiti, some of the biggest data sets in the country are generated by ordinary mobile phones, which the vast majority of people own. Mobile money is particularly exciting for dataphiles because it promises to collect financial information that was never recorded before. But it will be some time yet before big data of the kind discussed at this conference materializes.

While the big data debate raises valid problems that need discussion, it's worth keeping in mind that there are some contexts in which big data cannot dominate our approaches, because the data does not yet exist. These are usually developing countries, which of course are the sites where our discipline developed the ethnographic method in the first place.

Along with questions of ethics and privacy, we should also be asking who is excluded from data sets, and what the implications of this are. Otherwise, we risk skewing the picture that big data is generating on a global scale. After all, it's not the size of your data sets that matters, nor solely what you do with them: our primary global challenge is to generate good data in the first place.