# Notes on The Consistency of Inverse Probability Weighting

#### 2016-11-22

Goal: Prove that under conditional exchangeability, the inverse probability weighting (IPW) estimand recovers the mean potential outcome $$E[Y_i(z)]$$, so that the IPW estimate is unbiased for the population average treatment effect.

Let $$(Y_i(1), Y_i(0), Z_i, X_i)$$ collect the potential outcomes under treatment ($$z=1$$) and control ($$z=0$$), the observed binary treatment assignment ($$1$$ or $$0$$), and the vector of measured covariates $$X_i$$. Let $$Y^{obs}_i=Y_i(1)Z_i+Y_i(0)(1-Z_i)$$ denote the observed outcome for individual $$i$$. Assume conditional exchangeability holds, i.e., $$Y_i(z)\perp Z_i \mid X_i$$ for all $$z$$. Because this means $$P(Y_i(z)=1\mid Z_i=z', X_i)$$ is the same for every $$z'$$, regardless of the assignment actually received, individuals with identical $$X_i$$ share the same distribution of $$Y_i(z)\mid X_i$$. Let us prove that

$$E\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i^{obs}\right\} \Big/ E\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}\right\}=E[Y_i(z)],$$

where $$I(A)$$ equals one if the statement $$A$$ is true and zero otherwise, and $$f(Z_i\mid X_i)$$ denotes the conditional probability density function, regarded here as a random variable. (We write $$f(Z_i\mid X_i)$$ rather than $$P(Z_i\mid X_i)$$ because with random $$Z_i$$ and $$X_i$$ we would otherwise face the awkward notation $$P(Z_i=Z_i\mid X_i=X_i)$$.)
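To make the setup concrete, here is a minimal simulation sketch (all distributions and parameter values below are hypothetical choices for illustration, not part of the notes): a binary covariate $$X_i$$ confounds treatment and outcome, but conditional on $$X_i$$ the treatment is randomized, so conditional exchangeability holds by construction.

```python
import numpy as np

# Hypothetical data-generating process in which conditional exchangeability
# holds by construction: given X, treatment Z is randomized, so
# Y(z) is independent of Z given X for z in {0, 1}.
rng = np.random.default_rng(0)
n = 200_000

X = rng.binomial(1, 0.5, size=n)                  # binary measured covariate
p_treat = np.where(X == 1, 0.8, 0.2)              # f(Z = 1 | X), bounded away from 0 and 1
Z = rng.binomial(1, p_treat)                      # treatment depends on X only
Y1 = rng.binomial(1, np.where(X == 1, 0.7, 0.3))  # potential outcome under z = 1
Y0 = rng.binomial(1, np.where(X == 1, 0.4, 0.1))  # potential outcome under z = 0
Y_obs = Y1 * Z + Y0 * (1 - Z)                     # observed outcome, as defined above
```

Under this assumed design, $$E[Y_i(1)] = 0.5 \cdot 0.7 + 0.5 \cdot 0.3 = 0.5$$, a value the IPW ratio should recover.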

[Proof]

\begin{aligned} \text{Numerator of Left-Hand Side} & = E\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i^{obs}\right\} \\ & = E\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}[Y_i(1)Z_i+Y_i(0)(1-Z_i)]\right\} \leftarrow (\text{by definition of observed values})\\ & = E\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i(z)\right\} \leftarrow (\text{the indicator forces } Z_i=z)\\ & = E_{X_i}E_{Y_i(z), Z_i\mid X_i}\left[\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i(z)\mid X_i\right] \leftarrow (\text{by iterated expectations})\\ & = E_{X_i}\left\{E_{ Z_i\mid X_i}\left[\frac{I(Z_i=z)}{f(Z_i\mid X_i)}\mid X_i\right]E_{Y_i(z)\mid X_i}\left[Y_i(z)\mid X_i\right] \right\}\leftarrow (\text{by conditional exchangeability: }Y_i(z)\perp Z_i\mid X_i)\\ & = E_{X_i}\left\{1\cdot E_{Y_i(z)\mid X_i}\left[Y_i(z)\mid X_i\right] \right\} \leftarrow \left(E_{Z_i\mid X_i}\left[\frac{I(Z_i=z)}{f(Z_i\mid X_i)}\mid X_i\right]=\frac{f(z\mid X_i)}{f(z\mid X_i)}=1\right)\\ &=E[Y_i(z)] \end{aligned}

\begin{aligned} \text{Denominator of Left-Hand Side} & = E_{X_i}\left[E_{Z_i\mid X_i}\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}\mid X_i\right\}\right] \leftarrow (\text{by iterated expectations})\\ & = E_{X_i}\left[\frac{f(z\mid X_i)}{f(z\mid X_i)}\right]\\ &= 1, \end{aligned} which completes the proof.
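The identity just proved can be checked numerically. The sketch below uses an assumed simulated design (binary confounder, treatment randomized given the confounder, propensities known exactly rather than estimated): the empirical numerator approaches $$E[Y_i(z)]$$ and the empirical denominator approaches $$1$$.

```python
import numpy as np

# Monte Carlo check of the IPW identity, with the true propensity f(Z | X)
# known. The design below is an assumed example, not from the notes.
rng = np.random.default_rng(1)
n = 500_000

X = rng.binomial(1, 0.5, size=n)
p_treat = np.where(X == 1, 0.8, 0.2)              # f(Z = 1 | X)
Z = rng.binomial(1, p_treat)
Y1 = rng.binomial(1, np.where(X == 1, 0.7, 0.3))
Y0 = rng.binomial(1, np.where(X == 1, 0.4, 0.1))
Y_obs = Y1 * Z + Y0 * (1 - Z)

# f(Z_i | X_i): density of the treatment actually received, a random variable
f_Z_given_X = np.where(Z == 1, p_treat, 1 - p_treat)

z = 1
w = (Z == z) / f_Z_given_X       # I(Z_i = z) / f(Z_i | X_i)
numerator = np.mean(w * Y_obs)   # should be close to E[Y(1)] = 0.5 in this design
denominator = np.mean(w)         # should be close to 1
print(numerator, denominator)
```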

# Remarks

• Although the denominator has unit expectation and may look redundant, in practice the sample mean of the inverse probability weights will not be exactly $$1$$. The ratio $$\hat{E}\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i^{obs}\right\} \Big/ \hat{E}\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}\right\}$$ is generally used for estimating parameters in marginal structural models, where weighted regressions are common. We call it the modified Horvitz-Thompson estimator; compare it with the original Horvitz-Thompson estimator $$\hat{E}\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i^{obs}\right\}$$.
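The two estimators differ only in the normalizing constant, which a short sketch makes explicit (the simulated design and its parameter values are assumptions for illustration):

```python
import numpy as np

def horvitz_thompson(Y_obs, Z, f_Z_given_X, z):
    """Original and modified Horvitz-Thompson estimates of E[Y(z)].

    The original divides the weighted outcome sum by the sample size n;
    the modified version divides by the realized sum of the weights,
    which is close to n but not exactly equal to it.
    """
    w = (Z == z) / f_Z_given_X
    original = np.mean(w * Y_obs)
    modified = np.sum(w * Y_obs) / np.sum(w)
    return original, modified

# Assumed simulated design with known propensities, for illustration only.
rng = np.random.default_rng(2)
n = 100_000
X = rng.binomial(1, 0.5, size=n)
p_treat = np.where(X == 1, 0.8, 0.2)
Z = rng.binomial(1, p_treat)
Y1 = rng.binomial(1, np.where(X == 1, 0.7, 0.3))
Y0 = rng.binomial(1, np.where(X == 1, 0.4, 0.1))
Y_obs = Y1 * Z + Y0 * (1 - Z)
f_Z_given_X = np.where(Z == 1, p_treat, 1 - p_treat)

original, modified = horvitz_thompson(Y_obs, Z, f_Z_given_X, z=1)
print(original, modified)  # both near E[Y(1)] = 0.5 here, but not identical
```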

• Because $$f(Z_i\mid X_i)$$ is regarded as a random variable, the expression on the left-hand side is well defined. But when $$f(Z_i=z\mid X_i)=0$$ with positive probability for some $$z$$ (a violation of positivity), the left-hand side no longer has a meaningful causal interpretation.

• The weight $$1/f(Z_i\mid X_i)$$ used here is referred to as the unstabilized weight. A more complete account of inverse probability weighting with stabilized weights $$f(Z_i)/f(Z_i\mid X_i)$$, which may yield smaller variance for the causal effect estimates, can be found in Hernán and Robins (2006) (Hernán, Miguel A., and James M. Robins. 2006. "Estimating Causal Effects from Epidemiological Data." *Journal of Epidemiology and Community Health* 60 (7): 578-86.).
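A brief sketch of the two weight constructions (again on an assumed simulated design; the marginal $$f(Z_i)$$ is estimated by the empirical treatment frequency). Each stabilized weight equals $$f(z)/f(z\mid X_i)$$ rather than $$1/f(z\mid X_i)$$, so the stabilized weights are typically less variable:

```python
import numpy as np

# Assumed design: binary X, treatment randomized given X with known propensity.
rng = np.random.default_rng(3)
n = 100_000
X = rng.binomial(1, 0.5, size=n)
p_treat = np.where(X == 1, 0.8, 0.2)          # f(Z = 1 | X), assumed known
Z = rng.binomial(1, p_treat)

f_Z_given_X = np.where(Z == 1, p_treat, 1 - p_treat)

# Unstabilized weight: 1 / f(Z | X).
w_unstab = 1.0 / f_Z_given_X

# Stabilized weight: f(Z) / f(Z | X), with the marginal f(Z) estimated
# by the empirical treatment frequency.
p_marginal = Z.mean()
f_Z = np.where(Z == 1, p_marginal, 1 - p_marginal)
w_stab = f_Z / f_Z_given_X

print(w_unstab.var(), w_stab.var())  # stabilized weights vary less here
```

Note that the stabilized weights average to roughly $$1$$, while the unstabilized weights average to roughly $$2$$ (one for each treatment level), consistent with the unit-expectation argument in the proof.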