- Belief propagation on a factor **tree**
- Max-product (max-sum if taking logarithms) algorithm for predicting a weather sequence from ice cream consumption
- Maximum a posteriori (MAP) computation for Hidden Markov Models

October 6, 2016

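The max-product (Viterbi) computation for HMMs mentioned in the outline can be sketched as follows, in log space (max-sum). All probabilities below are illustrative assumptions, not values from the lecture; the observation at each day is the number of ice creams eaten (1, 2, or 3).

```python
import numpy as np

states = ["Hot", "Cold"]          # hidden weather states
log = np.log
pi = log(np.array([0.8, 0.2]))                 # initial distribution (assumed)
A = log(np.array([[0.7, 0.3],                  # transition: rows = from state
                  [0.4, 0.6]]))                #             cols = to state
B = log(np.array([[0.2, 0.4, 0.4],             # emission: P(obs | Hot)
                  [0.5, 0.4, 0.1]]))           #           P(obs | Cold)

def viterbi(obs):
    """MAP state sequence via max-sum dynamic programming."""
    T, S = len(obs), len(states)
    delta = np.zeros((T, S))       # best log-score of any path ending in state s
    psi = np.zeros((T, S), int)    # backpointers
    delta[0] = pi + B[:, obs[0] - 1]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + A     # scores[i, j]: path i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + B[:, obs[t] - 1]
    # Backtrack the MAP state sequence
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([3, 1, 3]))   # ['Hot', 'Hot', 'Hot'] under these assumed tables
```

Taking logarithms turns products of probabilities into sums, which avoids numerical underflow on long sequences; the argmax path is unchanged.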

**Goal**: Compute marginal probabilities, or conditional probabilities given observed evidence (posteriors).

**Approach**: Decompose a global calculation on a large joint probability into linked *local* message computations.

**Properties**:
- BP message updating and passing rules give the exact marginals when the underlying undirected graph is a **tree**.
- More generally, BP is correct if and only if the graph is *triangulated*, i.e., there are no chordless cycles of length \(\geq 4\).
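The tree-exactness property can be illustrated with a minimal sum-product pass on a three-node chain \(A - B - C\), checking the local message computation against the brute-force marginal. The pairwise potential tables are arbitrary illustrative assumptions.

```python
import numpy as np

psi_AB = np.array([[1.0, 0.5],
                   [0.5, 2.0]])        # potential psi(A, B), assumed values
psi_BC = np.array([[2.0, 1.0],
                   [1.0, 3.0]])        # potential psi(B, C), assumed values

# Messages into B: m_{A->B}(b) = sum_a psi(a, b); m_{C->B}(b) = sum_c psi(b, c)
m_A_to_B = psi_AB.sum(axis=0)
m_C_to_B = psi_BC.sum(axis=1)
belief_B = m_A_to_B * m_C_to_B
p_B = belief_B / belief_B.sum()        # marginal of B from local messages

# Brute-force marginal from the full joint, for comparison
joint = np.einsum("ab,bc->abc", psi_AB, psi_BC)
p_B_brute = joint.sum(axis=(0, 2)) / joint.sum()

print(p_B, p_B_brute)                  # identical on a tree
```

On a tree the sum over the full joint factorizes into a product of incoming messages, which is exactly why the local computation is exact.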


- In many applications, we have to deal with graphs with loops
- Need exact and efficient marginal probability calculations for arbitrary graph topology
**Key idea**: Work with a tree-like data structure, in which the nodes are cliques rather than single variables, to organize *efficient probabilistic inference*.

**Clique Tree**: A clique tree for a graph \(\mathcal{G}=(V,E)\) is a tree \(T=(V_T,E_T)\) where \(V_T\) is a set of cliques of \(\mathcal{G}\) that contains all maximal cliques of \(\mathcal{G}\). The edge connecting two cliques \(C_1, C_2\in V_T\) is labeled with the intersection set \(C_1\cap C_2\).
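As a small concrete illustration of the edge-labeling rule, the snippet below computes the separator label for each edge of an assumed clique tree (cliques and tree structure are made up for illustration).

```python
# Cliques of a hypothetical graph and the tree edges connecting them
cliques = {"C1": {"A", "B"}, "C2": {"B", "C", "D"}, "C3": {"C", "E"}}
tree_edges = [("C1", "C2"), ("C2", "C3")]

# Each edge is labeled with the intersection (separator) of its endpoints
separators = {e: cliques[e[0]] & cliques[e[1]] for e in tree_edges}
print(separators)   # {('C1', 'C2'): {'B'}, ('C2', 'C3'): {'C'}}
```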

- D is not on the path from AD to DE
- Because propagations are local (shown later), there is no guarantee that the marginal probability of D is the same for both sets of cliques

**Junction Tree**: A junction tree for a graph \(\mathcal{G}\) is a clique tree that satisfies the *running intersection property*: for any cliques \(C_1\) and \(C_2\) in the tree, every clique on the (unique) path connecting \(C_1\) and \(C_2\) contains \(C_1\cap C_2\).

- The running intersection property avoids the previous problematic feature
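The running intersection property can be checked mechanically. The sketch below tests every pair of cliques against the cliques on the path between them; the first tree is a valid junction tree, while the second mirrors the problematic AD/DE configuration (D is missing from the path). Both example trees are illustrative assumptions.

```python
from itertools import combinations

def path(tree, s, t):
    """Unique path between two nodes of a tree given as an adjacency dict."""
    stack = [(s, [s])]
    while stack:
        v, p = stack.pop()
        if v == t:
            return p
        stack.extend((w, p + [w]) for w in tree[v] if w not in p)

def has_running_intersection(cliques, tree):
    """True iff every clique on the path between any two cliques
    contains their intersection."""
    return all(cliques[c1] & cliques[c2] <= cliques[c]
               for c1, c2 in combinations(cliques, 2)
               for c in path(tree, c1, c2))

good = {"ABD": {"A", "B", "D"}, "BCD": {"B", "C", "D"}, "CDE": {"C", "D", "E"}}
good_tree = {"ABD": ["BCD"], "BCD": ["ABD", "CDE"], "CDE": ["BCD"]}
bad = {"AD": {"A", "D"}, "AB": {"A", "B"}, "DE": {"D", "E"}}
bad_tree = {"AD": ["AB"], "AB": ["AD", "DE"], "DE": ["AB"]}

print(has_running_intersection(good, good_tree))  # True
print(has_running_intersection(bad, bad_tree))    # False: AB lacks D
```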

1) *Moralization* (if starting from a Bayesian network)

2) *Triangulation*: select the order of variable elimination

3) *Construct the junction tree*: maximum cardinality search (Tarjan and Yannakakis, 1984)

- A triangulated graph must have a junction tree representation
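The maximum cardinality search used in the construction step can be sketched as follows: repeatedly number the vertex with the most already-numbered neighbours. The adjacency dict below is an illustrative assumption (ties broken by dictionary order).

```python
def max_cardinality_search(adj):
    """Vertex ordering by maximum cardinality search (Tarjan & Yannakakis)."""
    order, numbered = [], set()
    while len(order) < len(adj):
        # pick the unnumbered vertex with the most numbered neighbours
        v = max((u for u in adj if u not in numbered),
                key=lambda u: len(adj[u] & numbered))
        order.append(v)
        numbered.add(v)
    return order

adj = {"A": {"B", "D"}, "B": {"A", "C", "D"},
       "C": {"B", "D"}, "D": {"A", "B", "C"}}
print(max_cardinality_search(adj))   # ['A', 'B', 'D', 'C']
```

On a triangulated graph, this ordering can be used both to verify chordality and to identify the cliques that become the nodes of the junction tree.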

4) *Transfer the potentials*: initialize the potentials on the junction tree from those of the underlying graph

Finally, 5) *Propagate*: Suppose cliques \(V\) and \(W\) have a non-empty intersection (separator) \(S\), with potentials \(\psi_V\) and \(\psi_W\), and initialize the separator potential \(\phi_S\) to unity. Then update as follows (Hugin): \[\phi^*_S=\sum_{V\setminus S}\psi_V,\qquad \psi^*_W=\frac{\phi^*_S}{\phi_S}\,\psi_W\]

- Once the algorithm terminates, the clique potentials and separator potentials are proportional to the marginal probabilities.
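A minimal sketch of the Hugin update for two cliques \(V=\{A,B\}\) and \(W=\{B,C\}\) with separator \(S=\{B\}\): one pass in each direction calibrates the tree, after which cliques and separator agree on \(B\) and \(\phi_S\) is proportional to its marginal. The potential tables are arbitrary illustrative assumptions; axes are `psi_V[a, b]` and `psi_W[b, c]`.

```python
import numpy as np

psi_V = np.array([[1.0, 2.0],
                  [0.5, 1.5]])         # psi(A, B), assumed values
psi_W = np.array([[1.0, 1.0],
                  [2.0, 0.5]])         # psi(B, C), assumed values
phi_S = np.ones(2)                     # separator initialized to unity

# Forward pass V -> W: phi*_S = sum_{V \ S} psi_V, psi*_W = (phi*_S / phi_S) psi_W
phi_new = psi_V.sum(axis=0)            # marginalize A out of psi_V
psi_W = (phi_new / phi_S)[:, None] * psi_W
phi_S = phi_new

# Backward pass W -> V: the same rule in the other direction
phi_new = psi_W.sum(axis=1)            # marginalize C out of psi_W
psi_V = psi_V * (phi_new / phi_S)[None, :]
phi_S = phi_new

# Calibrated: both cliques and the separator agree on B
print(psi_V.sum(axis=0), psi_W.sum(axis=1), phi_S)
```

Note that the joint \(\psi_V\psi_W/\phi_S\) is invariant under each update, which is why normalizing the final potentials recovers the exact marginals.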

- Exact marginal probability calculation for arbitrary graphs
- Efficient: for a given graph, there does not in general exist a computationally cheaper approach
- Must work with the joint distributions within each node (each of which corresponds to a clique of the triangulated graph)
- Computational cost: determined by the number of variables in the largest clique, and grows exponentially with this number (discrete case)
*Treewidth (Bodlaender, 1993)*: For each possible junction tree representation of a graph, record the number of variables in its largest clique. The treewidth is the smallest such number minus 1.

- Treewidth can be very large (\(d-1\) for a complete graph on \(d\) nodes; at least \(k\) for a \(k\times k\) grid)!
- Variational algorithms: Characterize a probability distribution as the solution to an optimization problem.
- Sampling algorithms: Draw samples from the desired marginal or posterior distributions. For example, Markov chain Monte Carlo (MCMC), importance sampling, etc.
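For small graphs, treewidth can be computed by brute force over elimination orderings: eliminate vertices in every order, tracking the size of the largest clique formed (a vertex plus its current neighbours), and take the minimum over orders of that size minus 1. The 4-cycle below is an illustrative example (its treewidth is 2).

```python
from itertools import permutations

def elimination_width(adj, order):
    """Largest clique size minus 1 induced by eliminating in this order."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    width = 0
    for v in order:
        nbrs = adj[v]
        width = max(width, len(nbrs))             # clique = v + its neighbours
        for a in nbrs:                            # fill-in: connect neighbours
            adj[a] |= nbrs - {a}
            adj[a].discard(v)
        del adj[v]
    return width

def treewidth(adj):
    """Exact treewidth by exhaustive search (exponential; small graphs only)."""
    return min(elimination_width(adj, order) for order in permutations(adj))

cycle4 = {"A": {"B", "D"}, "B": {"A", "C"},
          "C": {"B", "D"}, "D": {"A", "C"}}
print(treewidth(cycle4))   # 2
```

This exhaustive search is itself exponential, which is consistent with the point above: for graphs of large treewidth, one turns to variational or sampling approximations instead.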

**Required reading**:

- Lauritzen, S.L. and Spiegelhalter, D.J. (1988). Local computations with probabilities on graphical structures and their application to expert systems. *Journal of the Royal Statistical Society, Series B (Methodological)*, pp. 157-224.
  - Has a numerical example of the junction tree algorithm (an early version)
