Jin Tian

Associate Professor

Computer Science

Iowa State University

Ames, Iowa

Inference with Selection Bias in Causal Graphical Models

Selection bias is induced by preferential selection of units for analysis, such that the sample obtained is not representative of the population. For instance, in a typical study of the effect of a training program on earnings, subjects with higher incomes tend to report their earnings more often than those who earn less. Because the sample is no longer a faithful representation of the population, estimates may be biased no matter how many samples are collected.
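As a rough illustration of why more data does not help, the sketch below simulates the earnings example. Everything in it is an assumption made for illustration rather than part of the talk: the function name `simulate`, the normal earnings distribution (mean 50, standard deviation 10, in arbitrary units), and the logistic rule under which higher earners are more likely to report.

```python
import math
import random


def simulate(n, seed=0):
    """Return (population mean, mean among units that reported earnings).

    Hypothetical setup: earnings ~ Normal(50, 10); the chance that a unit
    reports its earnings rises with the earnings themselves (logistic rule).
    """
    rng = random.Random(seed)
    population = [rng.gauss(50.0, 10.0) for _ in range(n)]

    # Preferential selection: higher earners are more likely to be observed.
    reported = [
        y for y in population
        if rng.random() < 1.0 / (1.0 + math.exp(-(y - 50.0) / 5.0))
    ]

    pop_mean = sum(population) / len(population)
    sel_mean = sum(reported) / len(reported)
    return pop_mean, sel_mean


# The gap between the two means does not shrink as n grows.
for n in (1_000, 100_000):
    pop_mean, sel_mean = simulate(n)
    print(f"n={n}: population mean={pop_mean:.2f}, selected-sample mean={sel_mean:.2f}")
```

Under these assumed settings the mean of the selected sample stays above the population mean at every sample size, because the gap comes from the selection mechanism rather than from sampling noise.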

Selection bias poses a major obstacle to valid statistical and causal inferences. The current literature on treating selection bias often makes qualitative assumptions about the selection mechanism, parametric (e.g. linear) assumptions about the data-generating model, or quantitative assumptions about the selection process. In this talk, we assume that the data-generating model is a causal graphical model, possibly containing unobserved confounders. We present graphical and algorithmic conditions under which conditional probabilities and causal effects, respectively, can be estimated without bias from selection-biased data. We further present conditions for recoverability of these quantities when additional unbiased data are available.

Refreshments at 3:45pm in Snedecor 2101.