Training replicable predictors in multiple studies
Velodromo - room N15
Abstract
This lecture considers the replicability of predictor performance across studies. We suggest a general approach to investigating this issue, based on ensembles of prediction models trained on different studies. We quantify how the common practice of training on a single study accounts in part for the observed challenges in replicating prediction performance. We also investigate whether ensembles of predictors trained on multiple studies can be combined, using unique criteria, to design robust ensemble learners that are trained upfront to be replicable across different contexts and populations.
Speaker’s Bio
Giovanni Parmigiani is an academic statistician. He trained at Bocconi’s DES and at Carnegie Mellon, and held faculty positions at Duke, Johns Hopkins and Harvard. His work investigates statistical principles and tools, often with a focus on understanding cancer data. For example, he is currently interested in addressing the challenges of cross-study replication of predictions, by constructing predictors that learn replicability from being trained on multiple studies at once. He also has a long-term interest in helping families who are particularly susceptible to inherited cancer understand their risk and make informed decisions. He uses Bayesian modeling and machine learning concepts to predict who is at risk of carrying genetic variants, and to integrate literature-based and other information about the effects of mutations. Throughout his research activities, his broad goals are to find innovative ways to use data science and data technologies to fuel cancer prevention and early detection and, methodologically, to increase the rigor and efficiency with which we leverage the vast and complex information generated in today’s cancer research. He strives to foster the use of data sciences as a common thread to facilitate interactions between fields and academic cultures, and has a passion for mentoring and training young(er) scientists in interdisciplinary settings. Since joining Harvard in 2009, he has taken on several leadership roles: he is the Associate Director for Population Sciences of the multi-institutional Dana-Farber / Harvard Cancer Center (DF/HCC), and is the director of the postdoctoral training grant in Quantitative Sciences for Cancer Research at the Harvard T.H. Chan School of Public Health, where he is a Professor.
He was the Chairman of the Department of Biostatistics & Computational Biology at Dana-Farber Cancer Institute from 2009 to 2018, and the Leader of the DF/HCC Biostatistics and Computational Biology Program (now the Cancer Data Sciences Program) from 2009 to 2015. He is the recipient of the Savage and DeGroot prizes.