Order of Multitudes

What is “data diversity”? Is it a problem or an opportunity?

“Data diversity” carries remarkably different meanings in different domains of research today. In one sense, the phrase responds to the racism perpetuated by common algorithms, such as Google’s search engine (see, for instance, Safiya Noble’s Algorithms of Oppression).

Researchers who apply machine learning to improve user-oriented algorithms use “data diversity” to refer to data sets that are robust and representative enough to avoid biasing the algorithms’ outcomes. In this context, data diversity represents both an ethical and an epistemic imperative, though researchers also identify a potential trade-off between diversity and the “utility” of output (much as evolutionary biologists and cognitive psychologists point to the “utility” or “efficiency” of stereotypes).

Meanwhile, in the observational sciences, discussions of data diversity center on ethical questions of a different kind. In an era of shrinking research budgets, scientists often find themselves sharing observational platforms with those in quite different fields, such that ecologists, hydrologists, and atmospheric scientists might depend on a single set of sensors to collect data for each group.

Similarly, the “integrated” modeling of environmental impacts (integrating many different variables) demands the standardization of data across different domains of the natural and social sciences. This means that information about inorganic, organic, and social processes that unfold on quite different scales of space and time must be uniformly packaged: from the growth of a leaf and the drops of rainwater that transpire from it, to the progressive destruction of a rainforest and its impact on the concentration of carbon dioxide in the earth’s atmosphere. 

Here too ethical questions arise, as Information Studies scholar Christine Borgman observes:

“Recognizing the diversity of data, their representations, and the competing perspectives of stakeholders on the matters of value, rights, and ethics is essential to the design of effective knowledge infrastructures”

– Christine Borgman, Big Data, Little Data, No Data, p. 80.

As Borgman notes, standardizing the production of data is a long-term process that depends on tightly coordinated international organization. Standardization may also remove much of the scope for spontaneity and creativity on the part of individual researchers in the lab or field. How should a resolution be reached when one discipline’s data practices conflict with another’s, as when a descriptive discipline confronts a predictive one? Are data always translatable across domains, or might some forms of data be fundamentally incommensurable with others?

Disparate as these two discourses about “data diversity” may sound, they share an overlooked common ground, namely the perspective of data users. The utility of data can only be judged by those who use them. Whose needs are data being designed to meet? How might the community of potential users be expanded, and how might data be made more usable for previously neglected users?

Author(s)

Deborah Coen

Professor of History and History of Science and Medicine, Yale University

Deborah R. Coen is a Professor of History and Chair of the Program in History of Science and Medicine at Yale University. Her first book, Vienna in the Age of Uncertainty: Science, Liberalism, and Private Life (University of Chicago Press, 2007), focused on an extraordinary scientific dynasty, the Exner-Frisch family. Vienna in the Age of Uncertainty won the Susan Abrams Prize from the University of Chicago Press, the Barbara Jelavich Prize from the Association for Slavic, East European, and Eurasian Studies, and the Austrian Cultural Forum Book Prize. Her latest book, Climate in Motion: Science, Empire, and the Problem of Scale (University of Chicago Press, 2018), won the 2019 Pfizer Award from the History of Science Society in recognition of an outstanding book dealing with the history of science. Climate in Motion is the first study of the science of climate dynamics before the computer age. It argues that essential elements of the modern understanding of climate arose as a means of thinking across scales of space and time. More recently, Coen has become interested in the physical and social science of climate change. Her goal is to acquire the knowledge and skills necessary to bring history to bear on some of the implicitly historical questions that anthropogenic climate change raises.
