Variable selection with a biased sample using tilted knockoffs
Presented by Qian Zhao, University of Massachusetts, Amherst
Wednesday, April 3, 2024
4:00 PM-5:00 PM ET
AUST 163
Webex Meeting Link
Coffee will be served at 3:30 PM in the Noether Lounge (AUST 326)
Researchers in biomedical studies often work with biased samples that are not selected uniformly at random from the population of interest. One example is a case-control study, where cases are over-sampled to study risk factors of rare diseases. While these designs are motivated by specific scientific questions, it is often of interest to use them to pursue secondary lines of investigation. In such secondary analyses, the biased sample can lead to a spurious association between an exposure and an outcome when both affect case-control status. This phenomenon is known in the causal inference literature as collider bias. While tests of independence under biased sampling are available, these methods typically do not apply when the number of variables is large.
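As a rough illustration of the collider bias described above (not the speaker's method), the following minimal sketch simulates an exposure x and an outcome y that are independent in the population, then keeps only units selected with a probability that increases in both. The selection model, the function names pearson and simulate, and all parameter values are assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=100_000, seed=0):
    """Return (population correlation, selected-sample correlation, sample size)."""
    rng = random.Random(seed)
    pop_x, pop_y, sel_x, sel_y = [], [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)  # exposure
        y = rng.gauss(0, 1)  # outcome, drawn independently of x
        pop_x.append(x)
        pop_y.append(y)
        # Assumed (hypothetical) selection model: the chance of entering the
        # sample rises with both x and y, so selection acts as a collider.
        p_select = 1 / (1 + math.exp(-(x + y)))
        if rng.random() < p_select:
            sel_x.append(x)
            sel_y.append(y)
    return pearson(pop_x, pop_y), pearson(sel_x, sel_y), len(sel_x)


if __name__ == "__main__":
    pop_r, sel_r, n_sel = simulate()
    print(f"population correlation:      {pop_r:+.3f}")
    print(f"selected-sample correlation: {sel_r:+.3f} (n = {n_sel})")
```

Under these assumptions the population correlation is close to zero, while the correlation within the selected sample is clearly negative, even though x has no effect on y.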
In this work, we use the knockoff framework to select important variables among very many with replicability guarantees. We show that standard model-X knockoffs fail to control the false discovery rate (FDR) in the presence of biased sampling. We then show that tilting the population distribution by the selection probability, and constructing knockoff variables according to this tilted distribution, allows the knockoff filter to control the FDR. We apply the tilted knockoff method to identify the genetic underpinnings of endophenotypes in a case-control study.
Speaker Bio:
Qian Zhao is an Assistant Professor in Statistics at the University of Massachusetts, Amherst. Prior to joining UMass, she was a postdoctoral researcher in the Department of Biomedical Data Science at Stanford University. Her research focuses on developing statistical theory and methods to achieve valid inference for high-dimensional problems, where the number of variables is large or comparable to the number of observations. She is passionate about data science education and about using data science to achieve positive social impact.