Order of Multitudes

Examining Data and Categories with Hussein Mohsen

Ph.D. candidate Hussein Mohsen has been busy at work in the world of algorithms, network modeling, and data categorizations. Hussein is particularly interested in the ways in which science and society interact through the field of genomics. He works to bridge the gap between qualitative and quantitative approaches to big data. We spoke about his passion for computer science and how he discerns data bias in his work.

Allison Chu: Tell me a little about yourself and your background. How did you get to Yale?

Hussein Mohsen: It’s been a long journey. I’m from Beirut, Lebanon. I was born and raised there, and I went to college there, at the Lebanese American University (LAU). I wasn’t sure if I should do computer science or architecture before I chose to pursue the former. Then I went to England for a year on the Erasmus Scholarship, and I started studying bioinformatics, which is an intersection of computational methods, algorithms, and problems in biology and medicine. One of my mentors from back home was working on protein structure predictions, but he was also a computer scientist. I thought it was exciting, so I went to Indiana University to do my master’s in bioinformatics.

Afterwards I went to Silicon Valley for one year and worked as a machine learning engineer there. After my second or third month in Silicon Valley, I started applying to grad school. One of the main reasons I’m doing the research that I’m doing now is that in Silicon Valley, I was doing machine learning on predictive analytics of business data. I realized I could do the same exact work but on different sources and data sets, so I went back to my bioinformatics roots, and now I’m applying those methods to genomics data.

Allison Chu: The faculty who work on the Sawyer Seminar identified you as doing research that parallels some of the main ideas that they are thinking about. Could you give me an overview of what you are interested in?

Hussein Mohsen: There is an obsession with big data as a term now, and the broad idea that “objective truths lie in big data.” I’ve grown skeptical of this as I started working on data-intensive projects. I realized that you could often tell different stories with the same data, and all of them would sound logical and great. In my eyes, big data sets are of high value, but they must be mere means to an end. I’m interested in how scientific categories are shaped by political and historical forces that were picked up by scientists, especially in genomics. I began studying the history of these categorizations, particularly with a focus on the history of eugenics. It was eye-opening. One of the projects that I’m working on now, not my thesis project, has to do with the history of human categorizations. I’m taking a quantitative and qualitative approach, and I’m using genomics data for this purpose. Many companies and research groups are leveraging these data sets without questioning their histories. I think there should be more engagement by scientists with the history of what they do, because especially when it comes to the forced entanglements between genomics, ancestry, and race, these histories are heavily troubled: what kind of data were generated, and what kind of data were not? How were the data collected and organized?

My own research is at the intersection of machine learning, network models, and cancer genomics. I build network and machine learning models to understand certain questions in cancer genomics.

Allison Chu: How did you get into your research area?

Hussein Mohsen: Well, it was always computer science or architecture, and I still like both, but computer science was always about problem-solving, however clichéd that may sound. I was always drawn to math in high school, and the idea of problem-solving was always more fun to me than other subjects that were often taught by memorization. Algorithms are enormously fun, and the algorithmic thinking you develop as you work on algorithms both shapes your understanding of the world and is shaped by it. That’s also why I think it’s so important to incorporate the humanities in algorithmic design. And then there’s the human genome itself; we know quite a bit about the genome now, but there’s still so much we don’t know.

Allison Chu: Where do you see the future of these projects, or where do you see yourself going with this work?

Hussein Mohsen: Well, I’m planning to do a post-doc, and I’m still torn between the two areas. It’s either artificial intelligence (AI) and the interpretability of AI and how those algorithms work, or population genetics through a heavily qualitative and quantitative perspective. I’m still in my fourth year, so I have time to decide. For the projects themselves, it’s interesting because when you look in hindsight at the projects overall, you realize how fascinating they are, but on a daily basis, the work is sometimes very tedious! I think that’s common among researchers. So the thing that really keeps me moving is the questions themselves and how important they are.

Allison Chu: One last question that is not related to work: do you have some fun hobbies you’d like to share that are not related to the worlds of machine learning and cancer genomics?

Hussein Mohsen: I bike and play racquet games, and I do like libraries. What else… I used to like independent films way more when I had time, up until my first year of the Ph.D. I still like independent cinema, but recently I haven’t had much time to follow the films. I also like reading, but I think that’s become heavily related to grad school. I hope I’ll still have the time to read more non-academic stuff.