Order of Multitudes

Reading Stories in Data: A Conversation with Dan Bouk

Dan Bouk is Associate Professor of History at Colgate University, where he researches the history of bureaucracies, quantification, and, as he describes them, “other modern things shrouded in cloaks of boringness.” His first book, How Our Days Became Numbered: Risk and the Rise of the Statistical Individual (Chicago, 2015), traced how the U.S. life insurance industry’s methods for quantifying people, discriminating by race, justifying inequality, and thinking statistically spread into ordinary Americans’ lives. His new book, a narrative history of the 1940 U.S. Census entitled Democracy’s Data, will be published in 2022. Bouk has been documenting intriguing pieces of his research for this book on his blog, Census Stories. In this interview, we talked about the human labor that creates data, the dangers of aggregation, and why censuses are more fascinating than they’re given credit for.

Sarah Pickman: With your first book, How Our Days Became Numbered, you investigated how the life insurance industry has shaped American lives over the past two centuries, turning risk into an everyday commodity and creating, or reifying, categories of human life and risk around variables like race, gender, and age. How did you first become interested in this topic, and why is the history of American life insurance so instructive for thinking about the categories we live inside today?

Dan Bouk: To be honest, I was originally trying to do a project on the history of mosquito eradication! While researching that topic, I found a book about the elimination of New Jersey mosquitoes that discussed a mosquito eradication committee in the early 1920s headed by Frederick Hoffman. Hoffman was a statistician for the Prudential Insurance Company of America. Later I came across a book by the famous twentieth-century American mathematician and statistician Alfred Lotka, just sitting on a library shelf, called The Money Value of a Man, which he’d co-written with the insurance statistician Louis Dublin. My interest in these figures was piqued. And then, while attending a science, technology, and environmental policy workshop at Princeton University, I overheard folks making comments like, “Well, given the internationally accepted value of a human life…” And I just remember my jaw dropping. What could it mean to have an “accepted value” of a human life? How could we possibly have gotten to this place? And why were all these mid-twentieth-century public intellectuals, like Hoffman, Lotka, and Dublin, employed by insurance companies? How could they be involved in pressing public health and legal efforts, like eradicating mosquitoes or developing a way to measure the value of a human life, while at the same time working for life insurance companies? It occurred to me that finance and life insurance might be places where industrial research is done, and long had been, but that historians of science had overlooked these sites.

Where the project ultimately led me was to see that insurance companies, by virtue of the kinds of science and medicine they employ, create and propel certain categories into our lives. Importantly, these categories aren’t just categories of identity; they are also used to decide how to allocate resources. As a result, they have a marked impact on people’s life chances. These categories, which are built around race, gender, and other kinds of difference, gain power and are amplified by something like an insurance company. A person might never take these categories on themselves, or even know that the categories were being applied to them. All the same, they might feel the consequences of a category being applied to them very directly. It’s like the way one encounters an ad on Facebook. You don’t know how Facebook fits you into its categories of users, and even if you did, you probably wouldn’t define yourself according to those categories. But you might understand that you get certain job ads, ones that wouldn’t be shown to other people, because of the way Facebook thinks of you, even if you don’t know what specifically it was about you that made Facebook show you that ad. As a result, you may get certain opportunities other people don’t.

Sarah Pickman: One of the tensions in the book is between the aggregate and the individual: insurance companies using population aggregates to assign risk and set the terms of policies, but also individuals being encouraged to make choices for themselves, often about their health, based on aggregated data. Can you talk briefly about how some of your historical actors saw this tension, and do you see any interesting contemporary parallels, say, in how the COVID-19 pandemic has played out in the United States?

Dan Bouk: Early on in this research, I realized that statistical things are supposed to be aggregates: statistics are supposed to be about groups. The history of statistics is very closely tied to the idea of the “law of large numbers,” which in turn is premised on the existence of large numbers of something. Early insurance companies also drew on this principle. The idea was that an insurance company would work because it had enough people in its pool that the statistical regularities it used to determine how much money it needed to collect, and how much it would have to pay out, would hold across that large group. But from the very beginning, there was a tension. Insurance companies couldn’t compel people to buy policies, and they were always suspicious of anybody who actually came and wanted a policy. They were afraid that anyone who voluntarily wanted to buy health insurance or life insurance must know that they were sick or about to die. Otherwise, why would they want to buy a policy? As a result, life insurance companies employed doctors and other people whose job it was to very deeply individualize the entire process. Life insurance medicine became its own field of medicine, in which doctors were trained to think differently about a patient than they did in their private practice. In private practice they might be more concerned with what was wrong with a patient and how to fix it. As insurance doctors, they were trained to think about a person in terms of how normal they were; to think about them in a statistical sense, so they could be aggregated into a group.

Today, that doesn’t sound so weird to us. You might go to your primary care doctor and be told your height and weight are in a certain percentile, for example. It’s now a pretty normal way to understand ourselves as individuals, as statistical individuals. This wasn’t always the case. Life insurance is an important part of how we arrived here.

To the second part of your question, related to COVID-19, I would recommend the work of Caley Horan, whose book Insurance Era was just published. She focuses on the role of insurance companies from the postwar era onward. One of the points she makes is that insurance was always a social technology premised on the idea that we are all better off if we stick together and share risks. This is a very old idea that undergirds human societies, and insurance companies are just capitalist attempts to make this bonding happen. But since the middle of the twentieth century, these companies have lobbied for decreased regulation of their activities and against welfare states, and have emphasized personal responsibility and individuation in the way they sell their products. They have pushed the idea that insurance will make you independent. This is a powerful message, and a pernicious one if you live in a society that still sometimes requires people to band together for the common good. COVID-19 is a great example of this. Caley and I have often talked about this in terms of mask-wearing or vaccines: many people perceive these as acts meant to protect themselves individually and might be hesitant to do them for that reason, but it is really society as a whole that benefits from these actions.

Sarah Pickman: Your new book project is a narrative history of the 1940 U.S. Census, which you’re documenting at the censusstories.us website. Why the 1940 census, specifically? How has looking at census data differed from other large sets of data you’ve examined for your work?

Dan Bouk: Why the 1940 census? One of the things I wanted to do with the book was get people who don’t think they deal with data to realize that they’re constantly dealing with data. I wanted people who work with census records to construct their families’ genealogies to think about their work as data work, as data reading. So while the book is a narrative history of the census, it’s also an argument about all of the different layers and people beneath or behind the numbers. When we look at a table of numbers, we’re actually seeing only a very small part of the story. The 1940 census is ideal for this because it is the most recent manuscript census available for public access in its entirety; all of the censuses after it are still embargoed, aside from their final published results. The 1940 census is where most people who do genealogy in the U.S. will start.

The 1940 census is a massive dataset, but unlike most big data projects, anyone can access it in full. It was also a very extensive census. Each person had, potentially, thirty different columns of information that might be filled in about them, as opposed to the 2020 census, where individuals answered about ten questions. In the 1940 census, people were asked where they had lived five years earlier, because the 1930s had seen the Great Depression, the Dust Bowl, and the Great Migration, so migration was a very salient category for the U.S. government. For the first time, there were questions about people’s income; this was a time when the government, big corporations, and labor unions were all interested in having information about people’s income. The census also ended up playing an important role in World War II. Here we can see what happens when a census is weaponized: the 1940 census was used to guide the incarceration of Japanese Americans during the war.

I thought that in the 1940 census I could find and hear the voices of many different kinds of people. The census in general is a much more inclusive and democratic institution than almost any other. As a result, it provides the chance to listen to, and affirm the dignity of, all individuals.

Sarah Pickman: What unites this project and your earlier book is an interest in the human labor that creates data—in your work, you’ve looked at insurance executives and customers, doctors, door-to-door census takers, tabulators, and more. As a scholar, how have you thought about grounding or centering this human labor in your writing?

Dan Bouk: We have some great studies from people like Sarah Igo or Ian Hacking about how certain categories spread, and how people come to accept and use them, but relatively little work that tries to understand how categories actually get made from the ground up. Who are the workers in these vast networks? Who does the labor of making these categories? I was most fascinated by the labor of the census enumerator, standing at a person’s or family’s doorstep and asking them questions. But the people answering the questions are also doing work, because they’re trying to figure out how to fit themselves into a fixed set of choices in response to an enumerator’s question. And at every step, from the enumerator to the census editor to the publisher, humans are making decisions about how to collect, aggregate, and present data. So you can think of the census as the result of a series of negotiations. I think the “data” here is not just the final set of numbers, but every step of the process.

I actually found my way into this project because I was at the National Archives looking for information for a different project on a woman named Elbertie Foudray. She was an actuary who worked for the U.S. Census Bureau and produced a series of “U.S. life tables” that were then cited by important population researchers, like Louis Dublin, who appeared in my first book. So I wondered: who was this person who was being cited all over the place? I started looking through the records of the 1940 census, trying to find out more about her. While digging around for information on Foudray, I found all of these letters from census enumerators complaining that they weren’t being paid properly. I realized that the census was a massive operation that must have been constantly breaking down, and that this might make for some compelling human stories! That changed everything for me.

Sarah Pickman: You’ve described part of this new project as an argument for, or demonstration of, reading data as a liberal art. Could you talk a bit about this way of understanding big data, and why it is important?

Dan Bouk: When we think about the verbs that we attach to data these days, like mining, exploiting, aggregating, selling, or cleaning it, these are instrumental terms. Data feels like a resource that’s meant to be chopped up and processed in an industrial way. And sometimes you do want to deal with data in those ways. But those are not the only ways. I’m trying to convince people that they miss a lot of what data can tell them when they simplify and reduce it with a computational approach. There’s nothing wrong with a computational approach, of course, but what can you learn by actually trying to read data closely and applying broadly humanistic methods to digital problems? This is digital humanities work, but instead of bringing the digital to the humanities, I’m trying to bring the humanities to the digital.