October 6, 2016

## Examples (Whiteboard)

• Belief propagation on a factor tree
• Max-product (max-sum if taking logarithm) algorithm for predicting weather sequence from ice cream consumption
• Maximum A Posteriori (MAP) computation for Hidden Markov Models
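The whiteboard example of max-product decoding (max-sum in the log domain) for the weather-from-ice-cream HMM can be sketched as follows; all numeric parameter values below are illustrative assumptions, not the numbers used in lecture.

```python
import numpy as np

# Max-product (Viterbi) decoding: hidden weather states, observed cone counts.
# Working in log space turns max-product into max-sum and avoids underflow.
states = ["Hot", "Cold"]                # hidden weather
pi = np.array([0.5, 0.5])               # initial distribution (assumed)
A = np.array([[0.7, 0.3],               # P(next weather | Hot)   (assumed)
              [0.4, 0.6]])              # P(next weather | Cold)  (assumed)
B = np.array([[0.1, 0.3, 0.6],          # P(1, 2, 3 cones | Hot)  (assumed)
              [0.6, 0.3, 0.1]])         # P(1, 2, 3 cones | Cold) (assumed)

def viterbi(obs):
    """Most likely state sequence for obs (observation indices, 0 = one cone)."""
    T, S = len(obs), len(states)
    logdelta = np.log(pi) + np.log(B[:, obs[0]])
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = logdelta[:, None] + np.log(A)          # scores[prev, cur]
        back[t] = np.argmax(scores, axis=0)             # best predecessor
        logdelta = scores.max(axis=0) + np.log(B[:, obs[t]])
    # Backtrack from the best final state.
    path = [int(np.argmax(logdelta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [states[s] for s in reversed(path)]
```

For example, `viterbi([2, 2, 0])` (three, three, then one cone) decodes to `["Hot", "Hot", "Cold"]` under the assumed parameters.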

## Belief Propagation (BP) on Factor Trees

• Goal: Compute marginal probabilities or conditional probabilities given observed evidence (posterior)
• Approach: Decompose global calculation on a large joint probability into linked local message computations.
• Property:
• BP message updating and passing rules give the exact marginals when the underlying undirected graph is a tree.
• More generally, BP is correct if and only if the graph is triangulated, i.e., there are no chordless cycles of length $$\geq 4$$
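The decomposition into local messages can be sketched on a tiny chain (a tree), where sum-product messages recover the exact marginal that a brute-force sum over the joint would give; the potentials below are illustrative assumptions.

```python
import numpy as np

# Sum-product message passing on the chain x1 - x2 - x3.
psi1 = np.array([0.6, 0.4])                    # unary potential on x1 (assumed)
psi12 = np.array([[0.9, 0.1], [0.2, 0.8]])     # pairwise potential (x1, x2)
psi23 = np.array([[0.7, 0.3], [0.3, 0.7]])     # pairwise potential (x2, x3)

# Forward messages: linked local sums replace one global sum over (x1, x2, x3).
m1_to_2 = psi1 @ psi12                 # sum_{x1} psi1(x1) psi12(x1, x2)
m2_to_3 = m1_to_2 @ psi23              # sum_{x2} m(x2) psi23(x2, x3)
marg_x3 = m2_to_3 / m2_to_3.sum()      # normalized marginal of x3

# Brute-force check: enumerate the full joint and marginalize directly.
joint = psi1[:, None, None] * psi12[:, :, None] * psi23[None, :, :]
brute = joint.sum(axis=(0, 1))
brute /= brute.sum()
```

Since the underlying graph is a tree, `marg_x3` and `brute` agree exactly, as the BP property above guarantees.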

## Junction Tree Algorithm

• In many applications, we have to deal with graphs with loops
• Need exact and efficient marginal probability calculations for arbitrary graph topology
• Key idea: Work with a tree-like data structure, in which the nodes are cliques rather than single nodes, to organize efficient probabilistic inference.

## Clique Tree

• Clique Tree: A clique tree for a graph $$\mathcal{G}=(V,E)$$ is a tree $$T=(V_T,E_T)$$ where $$V_T$$ is a set of cliques of $$\mathcal{G}$$ that contains all maximal cliques of $$\mathcal{G}$$. The edge connecting two cliques $$C_1, C_2\in V_T$$ is labeled with the intersection set $$C_1\cap C_2$$.

## Clique Tree (continued)

• D appears in both cliques AD and DE, but not in every clique on the path from AD to DE
• Because propagations are local (shown later), there is no guarantee that the marginal probability of D computed in these two cliques is the same

## Junction Tree

• Junction Tree: A junction tree for a graph $$\mathcal{G}$$ is a clique tree that satisfies the running intersection property, i.e., for any cliques $$C_1$$ and $$C_2$$ in the tree, every clique on the (unique) path connecting $$C_1$$ and $$C_2$$ contains $$C_1\cap C_2$$
• Running intersection property avoids the previous problematic feature
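The running intersection property can be checked mechanically on a candidate clique tree. The sketch below, with cliques as frozensets and an adjacency-dict tree, reproduces the earlier problematic feature: a tree where D sits in cliques AD and DE but not in the clique between them. The example trees are illustrative assumptions.

```python
from itertools import combinations

def path(tree, a, b, seen=None):
    """Unique path from clique a to clique b in the tree (DFS)."""
    seen = seen or {a}
    if a == b:
        return [a]
    for nxt in tree[a]:
        if nxt not in seen:
            rest = path(tree, nxt, b, seen | {nxt})
            if rest:
                return [a] + rest
    return None

def running_intersection(tree):
    """True iff every clique on the path between C1, C2 contains C1 ∩ C2."""
    for c1, c2 in combinations(list(tree), 2):
        shared = c1 & c2
        if any(not shared <= c for c in path(tree, c1, c2)):
            return False
    return True

AD, DE, BE = frozenset("AD"), frozenset("DE"), frozenset("BE")
bad = {AD: [BE], BE: [AD, DE], DE: [BE]}    # AD - BE - DE: BE lacks D
good = {AD: [DE], DE: [AD, BE], BE: [DE]}   # AD - DE - BE: property holds
```

Here `running_intersection(bad)` is `False` (the path from AD to DE passes through BE, which does not contain D), while `running_intersection(good)` is `True`.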

## Junction Tree Algorithm

1. Moralization (if starting from a Bayesian network)
2. Triangulation: select the order of variable elimination
3. Construct the junction tree: maximum cardinality search (Tarjan and Yannakakis, 1984)
• A triangulated graph must have a junction tree representation
4. Transfer the potentials: initialize the potentials on the junction tree from those of the underlying graph
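Step 3's maximum cardinality search can be sketched in a few lines: repeatedly number the vertex adjacent to the most already-numbered vertices. On a triangulated graph this yields a perfect elimination ordering from which the junction tree's cliques can be read off. The example graph below is an illustrative assumption.

```python
def max_cardinality_search(adj):
    """Return a vertex ordering; adj maps each vertex to its neighbor set."""
    numbered, order = set(), []
    weight = {v: 0 for v in adj}       # count of numbered neighbors
    while len(order) < len(adj):
        # Pick the unnumbered vertex with the most numbered neighbors.
        v = max((u for u in adj if u not in numbered), key=lambda u: weight[u])
        order.append(v)
        numbered.add(v)
        for w in adj[v]:
            if w not in numbered:
                weight[w] += 1
    return order

# A 4-cycle A-B-C-D triangulated with chord A-C (assumed example).
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"},
       "C": {"A", "B", "D"}, "D": {"A", "C"}}
order = max_cardinality_search(adj)
```

Ties are broken arbitrarily (here, by dictionary order), which is fine: any maximum cardinality search ordering works.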

## Junction Tree Algorithm (continued)

Finally, 5) Propagate: suppose cliques $$V$$ and $$W$$ have a non-empty intersection (separator) $$S$$, with potentials $$\psi_V$$ and $$\psi_W$$ respectively. Also initialize the separator potential $$\phi_S$$ to unity. Then update as follows (Hugin): $\phi^*_S=\sum_{V\setminus S}\psi_V,\qquad \psi^*_W=\frac{\phi^*_S}{\phi_S}\,\psi_W$

• Once the algorithm terminates, the clique potentials and separator potentials are proportional to the marginal probabilities.
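A single Hugin update can be written out concretely for cliques $$V=\{x_1,x_2\}$$ and $$W=\{x_2,x_3\}$$ with separator $$S=\{x_2\}$$; the potential tables below are illustrative assumptions.

```python
import numpy as np

# One Hugin message pass from clique V to clique W through separator S.
psi_V = np.array([[0.5, 0.2], [0.1, 0.7]])   # psi_V[x1, x2] (assumed)
psi_W = np.array([[0.3, 0.3], [0.6, 0.2]])   # psi_W[x2, x3] (assumed)
phi_S = np.ones(2)                           # separator initialized to unity

# phi*_S = sum over V \ S of psi_V: marginalize x1 out of psi_V.
phi_S_new = psi_V.sum(axis=0)
# psi*_W = (phi*_S / phi_S) * psi_W, broadcasting the ratio over x3.
psi_W_new = (phi_S_new / phi_S)[:, None] * psi_W
```

Running the pass in both directions along every edge (collect then distribute) leaves each clique and separator potential proportional to the corresponding marginal, as stated above.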

## Properties of Junction Tree Algorithm

• Exact marginal probability calculation for arbitrary graphs
• Efficient: for a given graph, there does not in general exist a computationally cheaper approach
• Must work with the joint distributions within each node (each of which corresponds to a clique of the triangulated graph)
• Computational cost: determined by the number of variables in the largest clique and will grow exponentially with this number (discrete case)
• Treewidth (Bodlaender, 1993): for each possible junction tree representation of a graph, record the number of variables in its largest clique; the treewidth is the minimum of these counts over all representations, minus 1.

## Approximate Inference

• Treewidth can be very large ($$d-1$$ for a complete graph on $$d$$ nodes; at least $$k$$ for a $$k\times k$$ grid)!
• Variational algorithms: Characterize a probability distribution as the solution to an optimization problem.
• Sampling algorithms: Draw sample from desired marginal distributions or posterior distributions. For example, Markov chain Monte Carlo (MCMC), Importance sampling, etc.
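As a minimal instance of the sampling approach, self-normalized importance sampling estimates an expectation under an unnormalized target by reweighting draws from a tractable proposal. The target, proposal, and sample size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    # Unnormalized target: proportional to a N(2, 1) density (assumed).
    return np.exp(-0.5 * (x - 2.0) ** 2)

def q_tilde(x):
    # Proposal up to a constant: proportional to a N(0, 9) density (assumed).
    return np.exp(-0.5 * (x / 3.0) ** 2)

n = 100_000
x = rng.normal(0.0, 3.0, size=n)        # draw from the proposal
w = p_tilde(x) / q_tilde(x)             # importance weights; constants cancel
est_mean = np.sum(w * x) / np.sum(w)    # self-normalized estimate of E[x]
```

Because the estimate is a ratio of weighted sums, neither the target nor the proposal needs its normalizing constant; here `est_mean` converges to the target mean of 2.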