Abstract
Optimal prevention and treatment strategies for a disease of multiple causes, such as pneumonia, must be informed by the population distribution of causes among cases, or cause-specific case fractions (CSCFs). CSCFs may further depend on additional explanatory variables. Existing methodological literature in disease etiology research does not fully address the regression problem, particularly under a case-control design. Based on multivariate binary non-gold-standard diagnostic data and additional covariate information, this paper proposes a novel and unified regression modeling framework for estimating covariate-dependent CSCF functions in case-control disease etiology studies. The model leverages critical control data for valid probabilistic cause assignment for cases. We derive an efficient Markov chain Monte Carlo algorithm for flexible posterior inference. We illustrate the inference of CSCF functions using extensive simulations and show that the proposed model produces less biased estimates and more valid inference of the overall CSCFs than analyses that omit covariates. A regression analysis of pediatric pneumonia data reveals the dependence of CSCFs upon season, age, HIV status and disease severity. The paper concludes with a brief discussion of model extensions that may further enhance the utility of the regression model in disease etiology research.
Keywords: Bayesian methods; Case-control studies; Disease etiology; Latent class regression analysis; Measurement errors; Pneumonia; Semi-supervised learning.