“Data diversity” carries remarkably different meanings in different domains of research today. In one sense, the phrase responds to the racism perpetuated by common algorithms, such as Google’s search engine (see, for instance, Safiya Noble’s Algorithms of Oppression).
Researchers who apply the tools of machine learning to improve user-oriented algorithms use “data diversity” to refer to robust data sets that are representative enough to avoid biasing the outcomes of the algorithms. In this context, data diversity represents both an ethical and an epistemic imperative, though researchers also identify a potential trade-off between diversity and “utility” of output (much as evolutionary biologists and cognitive psychologists point to the “utility” or “efficiency” of stereotypes).
Meanwhile, in the observational sciences, discussions of data diversity center on ethical questions of a different kind. In an era of shrinking research budgets, scientists often find themselves sharing observational platforms with those in quite different fields, such that ecologists, hydrologists, and atmospheric scientists might depend on a single set of sensors to collect data for each group.
Similarly, the “integrated” modeling of environmental impacts (integrating many different variables) demands the standardization of data across different domains of the natural and social sciences. This means that information about inorganic, organic, and social processes that unfold on quite different scales of space and time must be uniformly packaged: from the growth of a leaf and the drops of rainwater that transpire from it, to the progressive destruction of a rainforest and its impact on the concentration of carbon dioxide in the earth’s atmosphere.
Here too ethical questions arise, as Information Studies scholar Christine Borgman observes:
“Recognizing the diversity of data, their representations, and the competing perspectives of stakeholders on the matters of value, rights, and ethics is essential to the design of effective knowledge infrastructures” (Christine Borgman, Big Data, Little Data, No Data, p. 80).
As Borgman notes, standardizing the production of data is a long-term process that depends on tightly coordinated international organization. It may remove much of the scope for spontaneity and creativity on the part of individual researchers in the lab or field. How should a resolution be reached when one discipline’s data practices conflict with another’s—when, for instance, a descriptive discipline confronts a predictive one? Are data always translatable across domains, or might some forms of data be fundamentally incommensurable with others?
Disparate as these two discourses about “data diversity” may sound, they share an overlooked common ground, namely the perspective of data users. The utility of data can only be judged by those who use them. Whose needs are data being designed to meet? How might the community of potential users be expanded, and how might data be made more usable for previously neglected users?
Professor of History and History of Science and Medicine, Yale University
Deborah R. Coen is a Professor of History and Chair of the Program in History of Science and Medicine at Yale University. Her first book, Vienna in the Age of Uncertainty: Science, Liberalism, and Private Life (University of Chicago Press, 2007), focused on an extraordinary scientific dynasty, the Exner-Frisch family. Vienna in the Age of Uncertainty won the Susan Abrams Prize from the University of Chicago Press, the Barbara Jelavich Prize from the Association for Slavic, East European, and Eurasian Studies, and the Austrian Cultural Forum Book Prize. Her latest book, Climate in Motion: Science, Empire, and the Problem of Scale (University of Chicago Press, 2018), won the 2019 Pfizer Award from the History of Science Society in recognition of an outstanding book dealing with the history of science. Climate in Motion is the first study of the science of climate dynamics before the computer age. It argues that essential elements of the modern understanding of climate arose as a means of thinking across scales of space and time. More recently, Coen has become interested in the physical and social science of climate change. Her goal is to acquire the knowledge and skills necessary to bring history to bear on some of the implicitly historical questions that anthropogenic climate change raises.