【The 6th Seminar】
Date and Time: December 15th, 6:00pm – 7:00pm (JST)
Speaker: Hidetoshi Shimodaira, RIKEN AIP
Title: Selection bias may be adjusted when the sample size is negative in hierarchical clustering, phylogeny, and variable selection
To compute valid p-values, you should specify hypotheses before looking at the data. However, people tend to use the same dataset twice, once for hypothesis selection and once for evaluation, leading to inflated statistical significance and more false positives than expected. Recently, a new statistical method, called selective inference or post-selection inference, has been developed for adjusting this selection bias. Biased p-values also arise in multiple testing, although this is a different type of selection bias. In this talk, I present a bootstrap resampling method with a “negative sample size” for adjusting these two types of selection bias. The theory is based on a geometric idea in the data space, which bridges the Bayesian posterior probability and the frequentist p-value. Examples are shown for confidence intervals of regression coefficients after model selection and for significance levels of trees and edges in hierarchical clustering and phylogenetic inference.
Hidetoshi Shimodaira is a professor at Kyoto University and a team leader at RIKEN AIP. He has been working on the theory and methods of statistics and machine learning. His multiscale bootstrap method is used in genomics for evaluating the statistical significance of trees and clusters. His “covariate shift” setting for transfer learning is popular in machine learning.