- Evaluation of transparency, reproducibility, biases, and social impact of data-intensive research, with a current focus on evaluating applications of data science and artificial intelligence in social and health research
- Responsible data science workflows grounded in open reproducible research and data ethics
- Data governance and incentives for responsible sharing and reuse of research data and code
- Computational social science and network science applied to scientific communities, health, and migration
I am a Lecturer in Computational Social Science in the Department of Sociology at the University of Essex. I obtained my DPhil from the University of Oxford and held postdoctoral positions at the University of Chicago and the Stanford University School of Medicine. My research combines computational methods from social data science and network science with approaches from reproducible research and metascience to study the transparency, reproducibility, bias, and social impact of data-intensive research. My current focus is on evaluating and improving the transparency and reproducibility of applications of data science and artificial intelligence in social and health research. Published studies examine how biases arising from the way scientific communities are networked can affect the robustness and replicability of biomedical findings, and how the implementation of data sharing policies impacts the availability of clinical trial data. In another stream of research, I use computational social science and network science to examine health-related misinformation, digital health interventions, and inequality in the network structure of global migration.
As a data science educator, I am passionate about democratising data literacy and broadening the adoption of open practices and reproducible data science research. I have developed open learning materials for Reproducible Data Science that provide an accessible introduction to open-source research software, reproducible workflows (with Jupyter Notebook), hands-on coding (with Python and Markdown), and the data science techniques and skills needed to perform open, reproducible, and ethical data analysis. Using real-world data on the COVID-19 pandemic and on policy-relevant problems at the intersection of society and health, the materials introduce students to the principles of open and reproducible science, data wrangling, exploratory data analysis and visualisation, machine learning, causal inference, network analysis, and data ethics. In 2021, I served as invited faculty at the Research Transparency and Reproducibility Training (RT2), organised by the Berkeley Initiative for Transparency in the Social Sciences (BITSS), where I gave a hands-on tutorial on Dynamic Documents with Jupyter Notebook for Reproducible Workflows. In response to current concerns about the ethical and social implications of data science, artificial intelligence, and machine learning research, I am working with data science communities to investigate the challenges and opportunities for Responsible Data Science Workflows.