Harnessing Extra Randomness: Replicability, Flexibility and Causality
Presented by Richard Guo, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge
Wednesday, February 1, 2023
3:30 PM-4:30 PM ET
AUST 105
Webex Meeting Link
Many modern statistical procedures are randomized in the sense that the output is a random function of the data. For example, many procedures employ data splitting, which randomly divides the dataset into disjoint parts for separate purposes. Despite their flexibility and popularity, data splitting and other constructions of randomized procedures have obvious drawbacks. First, two analyses of the same dataset may lead to different results due to the extra randomness introduced. Second, randomized procedures typically lose statistical power because the sample is not fully utilized.
To address these drawbacks, in this talk, I will study how to properly combine the results from multiple realizations (such as through multiple data splits) of a randomized procedure. I will introduce rank-transformed subsampling as a general method for delivering large-sample inference of the combined result under minimal assumptions. I will illustrate the method with three applications: (1) a “hunt-and-test” procedure for detecting cancer subtypes using high-dimensional gene expression data, (2) calibrating confidence intervals for causal effects estimated with cross-fit “double machine learning”, and (3) testing the hypothesis of no direct effect in a sequentially randomized trial. For these problems, our method is able to de-randomize and improve power or coverage. Moreover, in contrast to existing approaches for combining p-values, our method enjoys a type-I error rate that asymptotically approaches the nominal level. This new development opens up the possibility of designing procedures that explicitly randomize and de-randomize: extra randomness is introduced to make the problem easier before being removed. I will also discuss the broader application of the method to causal inference and causal discovery. This talk is based on joint work with Rajen Shah.
Speaker Bio:
Richard Guo is a research associate in the Statistical Laboratory at the University of Cambridge, mentored by Rajen Shah. In Spring 2022, he was the Richard M. Karp Research Fellow in the causality program at the Simons Institute for the Theory of Computing. He received his PhD in Statistics from the University of Washington in 2021, advised by Thomas Richardson, for which he received the Z. W. Birnbaum Award. His research interests include causal inference, graphical models and replicability of data analysis.