2024 Greenberg Lecture Series: “Selective Inference for Clustering”

May 20, 2024
This year, Daniela M. Witten, PhD, Professor of Statistics and Biostatistics and the Dorothy Gilford Endowed Chair in Mathematical Statistics at the University of Washington, will present the 2024 Greenberg Lectures.

Lecture #1
In contemporary applications, it is common to collect very large data sets with the vaguely defined goal of hypothesis generation. Once a data set has been used to generate a hypothesis, we might wish to test that hypothesis on the same data. However, this type of “double dipping” violates a cardinal rule of statistical hypothesis testing: we must decide what hypothesis to test before looking at the data. When this rule is violated, standard statistical hypothesis tests (such as t-tests and z-tests) fail to control the selective Type 1 error, that is, the probability of rejecting the null hypothesis given that the null hypothesis holds and that we decided to test it. While double dipping is pervasive across many application areas, in this talk Dr. Witten will focus on the analysis of single-cell RNA-sequencing data, in which it is common to cluster a set of observations (corresponding to cells) and then to test for “statistical significance” of the resulting clusters. A naive double-dipping approach to this task is not valid, but she will show that the framework of conditional selective inference can be applied to conduct valid inference in this setting. In particular, she will consider settings in which the clusters are estimated via hierarchical or k-means clustering. This work was conducted in collaboration with UW PhD students Lucy Gao (Biostat PhD 2020) and Yiqun Chen (Biostat PhD 2022), as well as Jacob Bien (USC).
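
To make the double-dipping problem concrete, here is a minimal simulation sketch. It is an illustration only and is not taken from the lecture. It draws one-dimensional data from a single normal population (so there are no true clusters), splits the data with a simple two-cluster k-means, and then applies a naive z-test to the difference in cluster means. The function names, sample sizes, and the normal approximation for the p-value are all assumptions chosen for this example.

```python
import math
import random
import statistics


def two_means_1d(x, iters=50):
    """Lloyd's algorithm with k=2 on 1-D data; returns the two clusters."""
    c0, c1 = min(x), max(x)  # start the centers at the extremes
    for _ in range(iters):
        left = [v for v in x if abs(v - c0) <= abs(v - c1)]
        right = [v for v in x if abs(v - c0) > abs(v - c1)]
        new_c0, new_c1 = statistics.fmean(left), statistics.fmean(right)
        if new_c0 == c0 and new_c1 == c1:
            break  # assignments have stopped changing
        c0, c1 = new_c0, new_c1
    return left, right


def naive_z_test_pvalue(a, b):
    """Two-sided p-value for a difference in means (normal approximation).

    This is the naive test: it ignores that the groups a and b were
    chosen by looking at the same data.
    """
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def naive_rejection_rate(n_trials=200, n=100, alpha=0.05, seed=0):
    """Fraction of null data sets in which the naive test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # One population, no true clusters: the null hypothesis holds.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        a, b = two_means_1d(x)
        if min(len(a), len(b)) < 2:
            continue  # degenerate split; counted as no rejection
        if naive_z_test_pvalue(a, b) < alpha:
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    print("naive rejection rate under the null:", naive_rejection_rate())
```

Because the clusters are constructed to be far apart, the naive test rejects in nearly every simulated data set even though the null hypothesis is true, so its rejection rate is far above the nominal level of 0.05. The conditional selective inference approach described in the abstract is designed to correct this; the sketch shows only the problem, not that correction.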
