# Notes on The Consistency of Inverse Probability Weighting

#### 2016-11-22

Goal: Prove that under conditional exchangeability, the inverse probability weighting (IPW) estimand recovers the mean potential outcome $$E[Y_i(z)]$$, so that the IPW estimate is unbiased for the population average treatment effect.

Let $$(Y_i(1), Y_i(0), Z_i, X_i)$$ collect the potential outcomes under treatment ($$z=1$$) and control ($$z=0$$), the observed binary treatment assignment ($$1$$ or $$0$$), and the vector of measured covariates $$X_i$$. Let $$Y^{obs}_i=Y_i(1)Z_i+Y_i(0)(1-Z_i)$$ denote the observed outcome for individual $$i$$. Assume conditional exchangeability holds, i.e., $$Y_i(z)\perp Z_i \mid X_i$$ for all $$z$$. Because this means $$P(Y_i(z)=1\mid Z_i=z', X_i)$$ is the same for every $$z'$$, regardless of the assignment actually received, individuals with identical $$X_i$$ share the same distribution of $$Y_i(z)\mid X_i$$. Let us prove that

$$E\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i^{obs}\right\} \Big/ E\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}\right\}=E[Y_i(z)],$$

where $$I(A)$$ equals one if the statement $$A$$ is true and zero otherwise, and $$f(Z_i\mid X_i)$$ denotes the conditional probability density function, regarded here as a random variable. (We write $$f(Z_i\mid X_i)$$ rather than $$P(Z_i\mid X_i)$$ because with random $$Z_i$$ and $$X_i$$ we would otherwise face the awkward notation $$P(Z_i=Z_i\mid X_i=X_i)$$.)
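To make the setup concrete, here is a minimal simulation sketch (all distributions and parameter values below are hypothetical choices for illustration, not part of the notes): a binary covariate $$X_i$$ confounds treatment and outcome, but conditional on $$X_i$$ the treatment is randomized, so conditional exchangeability holds by construction.

```python
import numpy as np

# Hypothetical data-generating process in which conditional exchangeability
# holds by construction: given X, treatment Z is randomized, so
# Y(z) is independent of Z given X for z in {0, 1}.
rng = np.random.default_rng(0)
n = 200_000

X = rng.binomial(1, 0.5, size=n)                  # binary measured covariate
p_treat = np.where(X == 1, 0.8, 0.2)              # f(Z = 1 | X), bounded away from 0 and 1
Z = rng.binomial(1, p_treat)                      # treatment depends on X only
Y1 = rng.binomial(1, np.where(X == 1, 0.7, 0.3))  # potential outcome under z = 1
Y0 = rng.binomial(1, np.where(X == 1, 0.4, 0.1))  # potential outcome under z = 0
Y_obs = Y1 * Z + Y0 * (1 - Z)                     # observed outcome, as defined above
```

Under this assumed design, $$E[Y_i(1)] = 0.5 \cdot 0.7 + 0.5 \cdot 0.3 = 0.5$$, a value the IPW ratio should recover.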

[Proof]

\begin{aligned} \text{Numerator of Left-Hand Side} & = E\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i^{obs}\right\} \\ & = E\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}[Y_i(1)Z_i+Y_i(0)(1-Z_i)]\right\} \leftarrow (\text{by definition of observed values})\\ & = E\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i(z)\right\} \leftarrow (\text{the indicator forces } Z_i=z)\\ & = E_{X_i}E_{Y_i(z), Z_i\mid X_i}\left[\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i(z)\mid X_i\right] \leftarrow (\text{by iterated expectations})\\ & = E_{X_i}\left\{E_{ Z_i\mid X_i}\left[\frac{I(Z_i=z)}{f(Z_i\mid X_i)}\mid X_i\right]E_{Y_i(z)\mid X_i}\left[Y_i(z)\mid X_i\right] \right\}\leftarrow (\text{by conditional exchangeability: }Y_i(z)\perp Z_i\mid X_i)\\ & = E_{X_i}\left\{1\cdot E_{Y_i(z)\mid X_i}\left[Y_i(z)\mid X_i\right] \right\} \leftarrow \left(E_{Z_i\mid X_i}\left[\frac{I(Z_i=z)}{f(Z_i\mid X_i)}\mid X_i\right]=\frac{f(z\mid X_i)}{f(z\mid X_i)}=1\right)\\ &=E[Y_i(z)] \end{aligned}

\begin{aligned} \text{Denominator of Left-Hand Side} & = E_{X_i}\left[E_{Z_i\mid X_i}\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}\mid X_i\right\}\right] \leftarrow (\text{by iterated expectations})\\ & = E_{X_i}\left[\frac{f(z\mid X_i)}{f(z\mid X_i)}\right]\\ &= 1, \end{aligned} which completes the proof.
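The identity just proved can be checked numerically. The sketch below uses an assumed simulated design (binary confounder, treatment randomized given the confounder, propensities known exactly rather than estimated): the empirical numerator approaches $$E[Y_i(z)]$$ and the empirical denominator approaches $$1$$.

```python
import numpy as np

# Monte Carlo check of the IPW identity, with the true propensity f(Z | X)
# known. The design below is an assumed example, not from the notes.
rng = np.random.default_rng(1)
n = 500_000

X = rng.binomial(1, 0.5, size=n)
p_treat = np.where(X == 1, 0.8, 0.2)              # f(Z = 1 | X)
Z = rng.binomial(1, p_treat)
Y1 = rng.binomial(1, np.where(X == 1, 0.7, 0.3))
Y0 = rng.binomial(1, np.where(X == 1, 0.4, 0.1))
Y_obs = Y1 * Z + Y0 * (1 - Z)

# f(Z_i | X_i): density of the treatment actually received, a random variable
f_Z_given_X = np.where(Z == 1, p_treat, 1 - p_treat)

z = 1
w = (Z == z) / f_Z_given_X       # I(Z_i = z) / f(Z_i | X_i)
numerator = np.mean(w * Y_obs)   # should be close to E[Y(1)] = 0.5 in this design
denominator = np.mean(w)         # should be close to 1
print(numerator, denominator)
```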

# Remarks

• Although the denominator has unit expectation and may look redundant, in practice the sample mean of the inverse probability weights will not be exactly $$1$$. The ratio $$\hat{E}\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i^{obs}\right\} \Big/ \hat{E}\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}\right\}$$ is generally used for estimating parameters in marginal structural models, where weighted regressions are common. We call it the modified Horvitz-Thompson estimator; compare it with the original Horvitz-Thompson estimator $$\hat{E}\left\{\frac{I(Z_i=z)}{f(Z_i\mid X_i)}Y_i^{obs}\right\}$$.
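The two estimators differ only in the normalizing constant, which a short sketch makes explicit (the simulated design and its parameter values are assumptions for illustration):

```python
import numpy as np

def horvitz_thompson(Y_obs, Z, f_Z_given_X, z):
    """Original and modified Horvitz-Thompson estimates of E[Y(z)].

    The original divides the weighted outcome sum by the sample size n;
    the modified version divides by the realized sum of the weights,
    which is close to n but not exactly equal to it.
    """
    w = (Z == z) / f_Z_given_X
    original = np.mean(w * Y_obs)
    modified = np.sum(w * Y_obs) / np.sum(w)
    return original, modified

# Assumed simulated design with known propensities, for illustration only.
rng = np.random.default_rng(2)
n = 100_000
X = rng.binomial(1, 0.5, size=n)
p_treat = np.where(X == 1, 0.8, 0.2)
Z = rng.binomial(1, p_treat)
Y1 = rng.binomial(1, np.where(X == 1, 0.7, 0.3))
Y0 = rng.binomial(1, np.where(X == 1, 0.4, 0.1))
Y_obs = Y1 * Z + Y0 * (1 - Z)
f_Z_given_X = np.where(Z == 1, p_treat, 1 - p_treat)

original, modified = horvitz_thompson(Y_obs, Z, f_Z_given_X, z=1)
print(original, modified)  # both near E[Y(1)] = 0.5 here, but not identical
```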

• Because $$f(Z_i\mid X_i)$$ is regarded as a random variable, the expression on the left-hand side is well defined. But when $$f(Z_i=z\mid X_i)=0$$ with positive probability for some $$z$$ (a violation of positivity), the left-hand side no longer has a meaningful causal interpretation.

• The weight $$1/f(Z_i\mid X_i)$$ used here is referred to as the unstabilized weight. A more complete account of inverse probability weighting with stabilized weights $$f(Z_i)/f(Z_i\mid X_i)$$, which may yield smaller variance for the causal effect estimates, can be found in Hernán and Robins (2006) (Hernán, Miguel A., and James M. Robins. 2006. "Estimating Causal Effects from Epidemiological Data." *Journal of Epidemiology and Community Health* 60 (7): 578-86.).
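A brief sketch of the two weight constructions (again on an assumed simulated design; the marginal $$f(Z_i)$$ is estimated by the empirical treatment frequency). Each stabilized weight equals $$f(z)/f(z\mid X_i)$$ rather than $$1/f(z\mid X_i)$$, so the stabilized weights are typically less variable:

```python
import numpy as np

# Assumed design: binary X, treatment randomized given X with known propensity.
rng = np.random.default_rng(3)
n = 100_000
X = rng.binomial(1, 0.5, size=n)
p_treat = np.where(X == 1, 0.8, 0.2)          # f(Z = 1 | X), assumed known
Z = rng.binomial(1, p_treat)

f_Z_given_X = np.where(Z == 1, p_treat, 1 - p_treat)

# Unstabilized weight: 1 / f(Z | X).
w_unstab = 1.0 / f_Z_given_X

# Stabilized weight: f(Z) / f(Z | X), with the marginal f(Z) estimated
# by the empirical treatment frequency.
p_marginal = Z.mean()
f_Z = np.where(Z == 1, p_marginal, 1 - p_marginal)
w_stab = f_Z / f_Z_given_X

print(w_unstab.var(), w_stab.var())  # stabilized weights vary less here
```

Note that the stabilized weights average to roughly $$1$$, while the unstabilized weights average to roughly $$2$$ (one for each treatment level), consistent with the unit-expectation argument in the proof.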