2024 Greenberg Lecture Series: “Data Thinning and its Applications”

May 20, 2024
This year, Daniela M. Witten, PhD, Professor of Statistics and Biostatistics and the Dorothy Gilford Endowed Chair in Mathematical Statistics at the University of Washington, will present the 2024 Greenberg Lectures.

Lecture #2
We propose data thinning, a new approach for splitting an observation from a known distributional family with unknown parameter(s) into two or more independent parts that sum to yield the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a parameter. This proposal is very general and can be applied to a broad class of distributions within the natural exponential family, including the Gaussian, Poisson, negative binomial, Gamma, and binomial distributions, among others. Furthermore, we generalize data thinning to enable splitting an observation into two or more parts that can be combined to yield the original observation using an operation other than addition; this enables the application of data thinning far beyond the natural exponential family. Data thinning has a number of applications to model selection, evaluation, and inference. For instance, cross-validation via data thinning provides an attractive alternative to the “usual” approach of cross-validation via sample splitting, especially in unsupervised settings in which the latter is not applicable. We will present an application of data thinning to single-cell RNA-sequencing data, in a setting where sample splitting is not applicable. This is joint work with Anna Neufeld (Fred Hutch), Ameer Dharamshi (University of Washington), Lucy Gao (University of British Columbia), and Jacob Bien (University of Southern California).
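As a rough illustration of the splitting idea in the abstract, the sketch below uses the Poisson case: if X ~ Poisson(λ) and X1 given X is Binomial(X, ε), then X1 ~ Poisson(ελ) and X2 = X − X1 ~ Poisson((1 − ε)λ), the two parts are independent, and they sum to X. The function name `poisson_thin`, the parameter `eps`, and the simulated values are assumptions made for this example only; this is not the speakers' software.

```python
import numpy as np


def poisson_thin(x, eps, rng):
    """Split Poisson counts x into two parts that sum to x.

    Hypothetical helper for illustration: draws x1 | x ~ Binomial(x, eps)
    and sets x2 = x - x1.
    """
    x = np.asarray(x)
    x1 = rng.binomial(x, eps)  # elementwise binomial draw; handles x == 0
    x2 = x - x1                # remainder, so that x1 + x2 == x exactly
    return x1, x2


# Simulated data (assumed values, chosen only for the demonstration).
rng = np.random.default_rng(0)
lam, eps = 5.0, 0.3
x = rng.poisson(lam, size=100_000)

x1, x2 = poisson_thin(x, eps, rng)

# Empirical checks: means near eps*lam and (1-eps)*lam, correlation near 0.
print("mean of x1:", x1.mean(), "target:", eps * lam)
print("mean of x2:", x2.mean(), "target:", (1 - eps) * lam)
print("corr(x1, x2):", np.corrcoef(x1, x2)[0, 1])
```

In a cross-validation setting of the kind the abstract mentions, one part (say `x1`) could be used to fit a model and the other (`x2`) to evaluate it, since both are computed from the same observations yet are independent of one another.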
