Synthetic control removes spurious discoveries from double dipping in scRNA-seq post-clustering analyses
Presented by Dongyuan Song, Assistant Professor, Department of Genetics and Genome Sciences, University of Connecticut School of Medicine
Wednesday, December 4, 2024, 4:00 PM, AUST 202
Coffee will be served at 3:30 PM in the Noether Lounge (AUST 326)
Webex Meeting Link
Abstract: In single-cell RNA-seq (scRNA-seq) analyses, it is common to cluster cells into putative types and then apply statistical differential expression (DE) tests to identify genes that are differentially expressed across these clusters. This process, which uses the same data twice, is known as “double dipping” and can inflate the false discovery rate (FDR) when identifying cell-type marker genes. Although methods have been proposed to address this issue, they assume that all genes are independent, which is unrealistic for real scRNA-seq datasets, and they therefore fail to control the FDR effectively in real data analyses. Here, we introduce ClusterDE, a novel post-clustering DE method designed to control the FDR of identified DE genes, even under a gene-gene dependency structure. The core of ClusterDE is the generation of synthetic null data as an in silico negative control that contains only one cell type. Through extensive simulations and real data analyses, we demonstrate that existing methods fail to control the FDR because of their unrealistic assumptions. In contrast, ClusterDE consistently controls the FDR when its assumptions are satisfied, and these assumptions hold for most real datasets.
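For readers new to the double-dipping problem described in the abstract, the following is a minimal illustrative sketch, not the ClusterDE method itself. It simulates a single homogeneous cell population with correlated genes, clusters the cells, and then tests each gene between the resulting clusters. All names (`two_means`, `welch_pvalues`), the Gaussian toy data, and the normal-approximation test are assumptions made for illustration. Real scRNA-seq data are counts, and ClusterDE's synthetic null generation is more involved than the random-label comparison used here.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50

# Toy data (assumption): ONE cell type. A continuous latent factor shared by
# all genes induces gene-gene correlation, but there are no true clusters.
latent = rng.normal(size=(n_cells, 1))
X = latent + rng.normal(size=(n_cells, n_genes))


def two_means(X, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with k=2; returns a 0/1 label per cell."""
    init_rng = np.random.default_rng(seed)
    centers = X[init_rng.choice(len(X), size=2, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every cell to each of the two centers.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels


def welch_pvalues(X, labels):
    """Per-gene two-sided p-values from a Welch statistic, using a normal
    approximation (reasonable here because each group has many cells)."""
    a, b = X[labels == 0], X[labels == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    z = (a.mean(axis=0) - b.mean(axis=0)) / se
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])


alpha = 0.05

# Double dipping: the labels are derived from the same data that is tested.
cluster_labels = two_means(X)
frac_double_dip = float(np.mean(welch_pvalues(X, cluster_labels) < alpha))

# Reference: labels assigned independently of the expression data.
random_labels = rng.permutation(n_cells) % 2
frac_random = float(np.mean(welch_pvalues(X, random_labels) < alpha))

print(f"Genes called DE with data-derived clusters: {frac_double_dip:.2f}")
print(f"Genes called DE with random labels:         {frac_random:.2f}")
```

Because every gene in this toy dataset comes from a single population, any gene called DE is a false discovery. The data-derived clusters should flag far more genes than the random labels do, which is the inflation the abstract refers to.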
Speaker Bio:
Dr. Dongyuan Song is a new assistant professor of Genetics and Genome Sciences at UConn Health. He received his Ph.D. in Bioinformatics, advised by Professor Jingyi Jessica Li in the Department of Statistics and Data Science, UCLA. Before that, he obtained an M.S. in Computational Biology from the Department of Biostatistics at Harvard School of Public Health, advised by Professor Rafael Irizarry at Dana-Farber Cancer Institute. His research focuses on developing novel statistical methods for analyzing single-cell and spatial omics data.