This event is part of the Fall 2022 Statistics Colloquium.
Valid inference after clustering, with application to single-cell RNA-sequencing data
Presented by Lucy Gao, Assistant Professor, Department of Statistics, University of British Columbia
Wednesday, November 2, 2022
4:00 p.m. ET
Virtual meeting - Webex meeting room
Testing for a difference in means between two groups is fundamental to answering research questions across virtually every scientific area. Standard hypothesis tests (e.g., the t-test) control the type I error rate when the groups to be tested are defined before looking at the data. However, if the groups are instead defined by applying a clustering algorithm to the data, then applying a standard test for a difference in group means to those same data yields an extremely inflated selective type I error rate. This two-step "double-dipping" procedure is common in the analysis of single-cell RNA-sequencing data.
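The inflation described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the speaker's method: the data are drawn from a single normal population (so there are no true groups), splitting at the sample median serves as a crude stand-in for a clustering algorithm, and the function name and parameters are hypothetical.

```python
import numpy as np

def rejection_rate(double_dip, n_sims=500, n=100, seed=0):
    """Fraction of null datasets where a two-sample test rejects at the 5% level."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)  # one homogeneous population: the null is true
        if double_dip:
            # Assumption: a median split stands in for a clustering algorithm.
            labels = x > np.median(x)
        else:
            # Groups fixed before looking at the data.
            labels = np.arange(n) < n // 2
        a, b = x[labels], x[~labels]
        # Welch t statistic, computed on the same data used to form the groups.
        se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
        t = (a.mean() - b.mean()) / se
        # Normal approximation to the two-sided 5% critical value.
        rejections += int(abs(t) > 1.96)
    return rejections / n_sims

rate_predefined = rejection_rate(double_dip=False)
rate_double_dip = rejection_rate(double_dip=True)
print(rate_predefined, rate_double_dip)
```

With groups fixed in advance, the rejection rate should stay near the nominal 5%. With groups chosen from the data, the test rejects almost every time even though no real difference exists.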
In my talk, I will apply ideas from selective inference to enable valid inference after hierarchical clustering. If time permits, I will also introduce count splitting: a flexible framework that enables valid inference after latent variable estimation in count-valued data, for virtually any latent variable estimation technique and inference approach.
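As a rough illustration of the count splitting idea, the sketch below assumes Poisson-distributed counts and uses binomial thinning: each count is divided at random into a training part and a test part, which are independent under the Poisson assumption. Whether the talk uses exactly this construction is an assumption, and the function name and the parameter eps are hypothetical.

```python
import numpy as np

def count_split(counts, eps=0.5, seed=0):
    """Split each count into train and test parts by binomial thinning.

    Assumption: if a count is Poisson(lam), then train ~ Poisson(eps * lam)
    and test ~ Poisson((1 - eps) * lam), and the two parts are independent.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    train = rng.binomial(counts, eps)  # each unit goes to train with prob. eps
    test = counts - train              # the remainder goes to test
    return train, test

# Simulated Poisson counts (not real sequencing data).
rng = np.random.default_rng(1)
counts = rng.poisson(lam=5.0, size=20000)
train, test = count_split(counts, eps=0.5)

# Empirical correlation between the two parts; near zero for Poisson counts.
corr = np.corrcoef(train, test)[0, 1]
print(corr)
```

In such a workflow, the latent variables (for example, clusters) would be estimated on the training part and the hypothesis test carried out on the test part, so the same data are not used twice.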
This talk is based on joint work with Jacob Bien (University of Southern California), Daniela Witten and Anna Neufeld (University of Washington), as well as Alexis Battle and Joshua Popp (Johns Hopkins University).
Lucy is an assistant professor in the Department of Statistics at the University of British Columbia. Prior to UBC, she was an assistant professor at the University of Waterloo.