Abstract
This paper is concerned with using multivariate binary observations to estimate the proportions of unobserved classes with scientific meanings. We focus on the setting where additional information about sample similarities is available and represented by a rooted weighted tree. Every leaf in the given tree contains multiple independent samples. Shorter distances over the tree between the leaves indicate higher similarity. We propose a novel data-integrative extension to classical latent class models (LCMs) with tree-structured shrinkage. The proposed approach enables 1) borrowing of information across leaves, 2) estimating data-driven leaf groups with distinct vectors of class proportions, and 3) individual-level probabilistic class assignment given the observed multivariate binary measurements. We derive and implement a scalable posterior inference algorithm in a variational Bayes framework. Extensive simulations show more accurate estimation of class proportions than alternatives that suboptimally use the additional sample similarity information. A zoonotic infectious disease application is used to illustrate the proposed approach. The paper concludes with a brief discussion of model limitations and extensions.
Keywords Gaussian Diffusion; Latent Class Models; Phylogenetic Tree; Zoonotic Infectious Diseases; Spike-and-Slab Prior; Variational Bayes.