Order of Multitudes

More Isn’t Always Better: Thinking About Big Data with Bill Rankin

Bill Rankin is an Associate Professor in the History of Science at Yale University. He is the author of After the Map: Cartography, Navigation, and the Transformation of Territory in the Twentieth Century (2016), a history of the mapping sciences in the twentieth century. In addition to his research and teaching at Yale, Professor Rankin is an award-winning cartographer whose work has been published in a wide variety of venues in Europe, Asia, and the United States. Most of his maps can be viewed on his website, Radical Cartography, and he is working on a book of the same name based on his cartographic projects. We recently discussed some of this work, as well as his teaching on the history of catastrophes. We also explored what makes a good infographic.

Sarah Pickman: Can you speak briefly about your academic trajectory and research interests? What projects are you currently working on?

Bill Rankin: I took a somewhat circuitous path to where I am now. I started in architecture school, doing a double degree in architecture and civil engineering. After that, I worked as an architect and in experimental physics labs for a couple of years before pursuing a Ph.D. in history of science and architecture. My interests have always been at the intersection of science, technology, and space—broadly conceived—but it took me a while to figure out what that meant in terms of academic disciplines and institutions. These interests have taken me everywhere from the history of laboratory design to the history of engineering drawing to After the Map, the book I wrote on the history of mapping and navigation technology. Right now I’m working on a book based on my own mapping work called Radical Cartography.

SP: Your first book, After the Map, examines the history of three major international mapping projects, each of which required a huge amount of global cooperation across multiple institutions and individuals. Can you talk a bit about how your historical actors worked to manage large, unwieldy bodies of geographic information?

BR: In the late nineteenth and early twentieth centuries, solutions to unwieldy amounts of data often relied on a distributed model of international collaboration. The idea was that every country—or even every individual scientist—would collect data according to some agreed-upon template, so that the results from every nation, agency, or person would, in theory, be automatically legible across the world. In the book I argue that this scientific ideal was also a political ideal, and it mixed global cooperation and global competition in a finely calibrated way. But in the mid-twentieth century there was a big shift, when the U.S. military essentially said, “Now we just need to get this done.” The American efforts still required close collaboration with other countries, but the U.S. structured these exchanges very strategically, with generous assistance always offered in exchange for the U.S. keeping a copy of the resulting data. So information management is always about political structures and relationships.

SP: In addition to writing and teaching, you’re also a cartographer with an extensive list of maps to your name, many of which will appear in your forthcoming book. How have you seen the recent interest in “big data” impact the domains of mapmaking, GIS, or spatial history?

BR: “Big data” is often presented as being good because there is so much of it—the more data, the better! But there are a few issues to think about. One is the difference between data being “numerically big” versus “visually big.” I find that the threshold for being visually big is actually much smaller than the threshold for being numerically big. For example, if you’re putting a hundred thousand points or shapes or lines on a map or in a visualization, that’s a huge amount of information in a single visual field, and the graphics will have to grapple with bigness. But in the world of “big data,” a hundred thousand points is not very big at all, and it’s not uncommon to come across datasets with many millions of entries, or more. So at a basic level I’d say that there are different kinds of bigness. Questions of visual bigness are often subordinated to numerical bigness in ways that I find unfortunate, both as someone who studies graphics and as someone who makes them. I get nervous when numerical bigness becomes a tool for radically abstracting away from individual lives. The source and meaning of the data can easily get lost in a kind of performative hyper-aggregation.
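As a minimal sketch of this gap between the two kinds of bigness (the synthetic dataset and plotting choices below are hypothetical, not drawn from the interview): a hundred thousand points, modest by big-data standards, already saturate a single plot when drawn point by point, while a simple aggregation of the same data, here hexagonal binning, stays legible.

```python
# Hypothetical illustration of "numerically big" vs. "visually big."
# 100,000 points is small for big data, but huge for a single visual field.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 100_000  # modest numerically, overwhelming visually
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Drawing every point individually smears into an unreadable blob.
ax1.scatter(x, y, s=1, alpha=0.1)
ax1.set_title("100,000 raw points")

# Binning trades individual points for a legible density structure.
ax2.hexbin(x, y, gridsize=40, cmap="viridis")
ax2.set_title("Same data, hexbin-aggregated")

plt.tight_layout()
plt.show()
```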

SP: What are some things that you think make for effective visual communication of data, whether geographic or otherwise? What makes a good infographic, for example?

BR: One problem with the “more is more” philosophy is that visual complexity in itself isn’t necessarily analytic. For example, you might have GPS tracks of every taxicab ride in New York City, which might translate into millions or even billions of data points on a map. The map might be very compelling, even beautiful, but data always requires interpretation, and it might not be immediately clear what the map actually tells us about the geography of the city. Does it tell us about the flows and chokepoints of infrastructure? Does it tell us about who can afford to take a taxi? Does it tell us something about public transit access? It can be thrilling to see so much data at once, but all too often visualization isn’t explicit enough about the questions, assumptions, and arguments that drive it.

At a practical level, this often comes down to what designers call “visual hierarchy.” What does a reader see from ten feet away? What do they see from three feet away? What do they see from six inches away? This kind of visual structure is what makes a graphic analytic and argumentative. I’ve seen too many big-data visualizations that are only legible when zoomed all the way in; it’s all complexity, no payoff. Instead, I think a good visualization should be structured like good writing; it’ll have some big headline conclusions that are then backed up by detailed evidence. I actually think that this is one of the most important things that the humanities can bring to conversations about visualization. Data never “speaks for itself,” and there’s no such thing as a “neutral” or “objective” form of representation. Thinking about visual hierarchy forces us—whether as makers or as readers—to focus on interpretation, narrative, analysis, and argument.

SP: One of the undergraduate courses you teach is “Global Catastrophe Since 1750.” In the U.S., for example, what are some of the ways in which people have tried to use or make sense of large amounts of information to cope with a crisis situation? Do you see any historical antecedents for how data is being mobilized during the current pandemic?

BR: In the course we look at seven historical catastrophes, from the Biblical flood to climate change. With most of these catastrophes—everything from overfishing to the hole in the ozone layer—I try to emphasize how often decision-making is driven not by the sober assessment of scientific data but by assumptions or worldviews that are already baked into the data that’s available, or that dictate what kinds of data are collected in the first place. For example, we look at the famous collapse of the cod stocks off the coast of Newfoundland in the early 1990s, which led not just to huge ecosystem changes but to massive social changes as well. In that situation, the problem wasn’t a lack of data or a battle between “data” and “politics.” The problem was that the data used to set fishing quotas relied on models that made certain inappropriate assumptions about the changing technologies and competitive pressures of the fishing industry, and the result was that fisheries managers—and fishers—were totally blind to what was actually happening with the cod. So for me, it’s not so much about how data is “used” or “mobilized” as it is about how data is constructed and how it guides decision making. And I think we can definitely see a similar dynamic with the pandemic, where a lot of the controversy really comes down to the assumptions behind different forms of modeling—assumptions about how the “raw” data (of cases, testing, or deaths) should be properly cooked.

The other thing I’ve been thinking about in the course this spring is how the temporality of an environmental catastrophe compares with the temporality of the COVID-19 pandemic. With things like climate change, or even overfishing, we’re talking about gradual change over decades or centuries, which is remarkably hard to reconcile with the year-to-year temporality of political or economic life. The pandemic is obviously unfolding much faster—it’s months, maybe a year or two (hopefully)—but there’s still a striking mismatch between the temporality of the phenomenon and the temporality of the cultural and political debates about it, which are focused more on the scale of weeks. In both cases I’d say that we’re pretty good at confronting short-term, acute issues, but our political system, or even our lived experience as human beings, has a very hard time managing long-term uncertainty. For example, many models that were made in March or April of this year to track COVID infections or deaths didn’t look any further out than August, and it’s still not uncommon to hear people talk about the COVID crisis as if the end of the time-series graph in front of them will mark the end of the pandemic itself. And data is right at the center of this mismatch. Trying to make data-based decisions—as we rightly should—means that our temporality will always be tethered to the limitations of our data, even if we acknowledge those limitations openly and critically.

SP: As a historian, do you see this mismatch between short- and long-term temporal thinking as a contemporary phenomenon?

BR: I don’t think it’s a contemporary issue, at least not in the sense of the last few decades. There’s a book we read in my Catastrophe course called Making Salmon, by Joseph Taylor, that’s about the collapse of salmon stocks in the Pacific Northwest. It starts with a series of quotes running from the 1870s to the 1990s, each of which essentially says, “We have a major problem here with declining salmon populations, and we really need to act before it’s too late.” And it’s powerful, even heartbreaking, to see a similar level of alarm repeated every ten years or so for more than a century, without anything really changing. So no, I don’t think that people a hundred and fifty years ago were able to reconcile the rhythms of the earth with the rhythms of society any better than we do today. Our temporality is very deeply rooted, both culturally and institutionally.