October 6, 2016

Examples (Whiteboard)

  • Belief propagation on a factor tree
  • Max-product (max-sum if taking logarithms) algorithm for predicting a weather sequence from ice cream consumption
    • Maximum a posteriori (MAP) computation for Hidden Markov Models
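The max-product recursion on the ice cream example can be sketched as a small Viterbi decoder for a two-state weather HMM. All transition and emission probabilities below are illustrative placeholders, not the numbers used in lecture.

```python
# Viterbi (max-product) decoding for a toy 2-state weather HMM.
# States and all probability values below are illustrative.
states = ["Hot", "Cold"]
start = {"Hot": 0.5, "Cold": 0.5}
trans = {"Hot": {"Hot": 0.7, "Cold": 0.3},
         "Cold": {"Hot": 0.4, "Cold": 0.6}}
# Emission: number of ice creams eaten (1, 2, or 3) given the weather.
emit = {"Hot": {1: 0.2, 2: 0.4, 3: 0.4},
        "Cold": {1: 0.5, 2: 0.4, 3: 0.1}}

def viterbi(obs):
    """Return the MAP state sequence via max-product dynamic programming."""
    # delta[s] = max probability over paths ending in state s;
    # the backpointer tables record the argmaxes for backtracking.
    delta = {s: start[s] * emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        new_delta, back = {}, {}
        for s in states:
            prev, score = max(((p, delta[p] * trans[p][s]) for p in states),
                              key=lambda t: t[1])
            new_delta[s] = score * emit[s][o]
            back[s] = prev
        delta = new_delta
        backpointers.append(back)
    # Backtrack from the best final state.
    best = max(states, key=lambda s: delta[s])
    path = [best]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return list(reversed(path))
```

In log space the products become sums (max-sum), which avoids underflow on long sequences.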

Belief Propagation (BP) on Factor Trees

  • Goal: Compute marginal probabilities or conditional probabilities given observed evidence (posterior)
  • Approach: Decompose global calculation on a large joint probability into linked local message computations.
  • Property:
    • BP message updating and passing rules give the exact marginals when the underlying undirected graph is a tree.
    • More generally, BP is correct if and only if the graph is triangulated, i.e., there are no chordless cycles of length \(\geq 4\)
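The decomposition into local messages can be illustrated on a tiny chain-structured model. The pairwise potentials below are arbitrary illustrative values; the point is that the product of incoming messages at a node reproduces the exact marginal computed by brute force.

```python
import numpy as np

# Sum-product message passing on a 3-node chain A - B - C (binary variables).
# The pairwise potentials are arbitrary illustrative numbers.
psi_AB = np.array([[1.0, 0.5],
                   [0.5, 2.0]])   # psi_AB[a, b]
psi_BC = np.array([[1.5, 1.0],
                   [0.2, 1.0]])   # psi_BC[b, c]

# Messages from the leaves A and C into B.
m_A_to_B = psi_AB.sum(axis=0)     # sum over a
m_C_to_B = psi_BC.sum(axis=1)     # sum over c

# Belief at B is the product of incoming messages, normalized.
belief_B = m_A_to_B * m_C_to_B
belief_B /= belief_B.sum()

# Brute-force check: enumerate all joint configurations.
joint = np.einsum("ab,bc->abc", psi_AB, psi_BC)
marg_B = joint.sum(axis=(0, 2))
marg_B /= marg_B.sum()

assert np.allclose(belief_B, marg_B)
```

On a tree, the same local computations yield every marginal exactly; on a graph with loops they would not, which motivates the junction tree construction below.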

Junction Tree Algorithm

  • In many applications, we have to deal with graphs with loops
  • Need exact and efficient marginal probability calculations for arbitrary graph topology
  • Key idea: Work with a tree-like data structure, in which the nodes are cliques rather than single variables, to organize efficient probabilistic inference.

Clique Tree

  • Clique Tree: A clique tree for a graph \(\mathcal{G}=(V,E)\) is a tree \(T=(V_T,E_T)\) where \(V_T\) is a set of cliques of \(\mathcal{G}\) that contains all maximal cliques of \(\mathcal{G}\). The edge connecting two cliques \(C_1, C_2\in V_T\) is labeled with the intersection set \(C_1\cap C_2\).

Clique Tree (continued)

  • D is not contained in the cliques on the path from AD to DE
  • Because propagations are local (shown later), there is no guarantee that the marginal probability of D is the same in both cliques

Junction Tree

  • Junction Tree: A junction tree for a graph \(\mathcal{G}\) is a clique tree that satisfies the running intersection property, i.e., for any cliques \(C_1\) and \(C_2\) in the tree, every clique on the (unique) path connecting \(C_1\) and \(C_2\) contains \(C_1\cap C_2\)
    • Running intersection property avoids the previous problematic feature
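The running intersection property is easy to check mechanically. Below is a minimal sketch that represents a clique tree as a dict of adjacency lists over frozensets; the example cliques are illustrative, with the failing case mirroring the AD/DE situation above.

```python
# Check the running intersection property of a clique tree, given as a dict
# of adjacency lists over frozenset cliques. The example cliques are illustrative.

def tree_path(tree, src, dst, prev=None):
    """Unique path between two cliques in a tree (depth-first search)."""
    if src == dst:
        return [src]
    for nbr in tree[src]:
        if nbr != prev:
            rest = tree_path(tree, nbr, dst, src)
            if rest is not None:
                return [src] + rest
    return None

def is_junction_tree(tree):
    """True iff every clique on the path between any pair contains their intersection."""
    cliques = list(tree)
    for i, c1 in enumerate(cliques):
        for c2 in cliques[i + 1:]:
            sep = c1 & c2
            if any(not sep <= c for c in tree_path(tree, c1, c2)):
                return False
    return True

AD, AB, BD = frozenset("AD"), frozenset("AB"), frozenset("BD")
# AD - AB - BD: D is shared by AD and BD but missing from AB on the path.
bad_tree = {AD: [AB], AB: [AD, BD], BD: [AB]}

ABD, BDE = frozenset("ABD"), frozenset("BDE")
good_tree = {ABD: [BDE], BDE: [ABD]}

assert not is_junction_tree(bad_tree)
assert is_junction_tree(good_tree)
```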

Clique Tree vs Junction Tree

Junction Tree Algorithm

  1. Moralization (if starting from a Bayesian network)
  2. Triangulation: select an order of variable elimination; the fill-in edges introduced by elimination triangulate the graph
  3. Construct the junction tree: maximum cardinality search (Tarjan and Yannakakis, 1984)
    • A triangulated graph must have a junction tree representation
  4. Transfer the potentials: initialize the potentials on the junction tree from those of the underlying graph
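Steps 1 and 2 can be sketched in a few lines. The example network (a v-structure) and the elimination order on a 4-cycle are illustrative choices, not from the lecture.

```python
from itertools import combinations

# Sketch of steps 1-2: moralize a DAG (marry co-parents, drop directions) and
# triangulate the result by eliminating variables in a chosen order.

def moralize(parents):
    """Return the undirected moral graph of a DAG given as {node: parent list}."""
    adj = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:                          # keep parent-child edges
            adj[v].add(p); adj[p].add(v)
        for p, q in combinations(ps, 2):      # marry co-parents
            adj[p].add(q); adj[q].add(p)
    return adj

def triangulate(adj, order):
    """Add the fill-in edges created by eliminating variables in the given order."""
    filled = {v: set(n) for v, n in adj.items()}
    remaining = {v: set(n) for v, n in adj.items()}
    for v in order:
        for a, b in combinations(remaining[v], 2):  # connect v's neighbours
            filled[a].add(b); filled[b].add(a)
            remaining[a].add(b); remaining[b].add(a)
        for n in remaining[v]:
            remaining[n].discard(v)
        del remaining[v]
    return filled

# v-structure A -> C <- B: moralization adds the edge A - B.
moral = moralize({"A": [], "B": [], "C": ["A", "B"]})
assert "B" in moral["A"]

# Eliminating A first on the 4-cycle A-B-C-D adds the chord B - D.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
filled = triangulate(cycle, ["A", "B", "C", "D"])
assert "D" in filled["B"]
```

Different elimination orders yield different fill-ins, and hence different clique sizes; finding the order that minimizes the largest clique is itself NP-hard.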

Junction Tree Algorithm (continued)

Finally, 5) Propagate: suppose cliques \(V\) and \(W\) have a non-empty intersection \(S\), with potentials \(\psi_V\) and \(\psi_W\), and initialize the separator potential \(\phi_S\) to unity. Then update as follows (Hugin): \[\phi^*_S=\sum_{V\setminus S}\psi_V,\qquad \psi^*_W=\frac{\phi^*_S}{\phi_S}\psi_W\]

  • Once the algorithm terminates, the clique potentials and separator potentials are proportional to the marginal probabilities.
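The Hugin update can be sketched on the smallest possible junction tree: two cliques \(V=\{A,B\}\) and \(W=\{B,C\}\) with separator \(S=\{B\}\). The potential values are illustrative; after one pass in each direction, the clique potentials equal the unnormalized marginals of the joint and agree on the separator.

```python
import numpy as np

# Hugin propagation on a two-clique junction tree V = {A,B}, W = {B,C},
# separator S = {B}. Binary variables; the potential values are illustrative.
psi_V0 = np.array([[2.0, 1.0],
                   [1.0, 3.0]])     # psi_V[a, b]
psi_W0 = np.array([[1.0, 4.0],
                   [2.0, 1.0]])     # psi_W[b, c]

psi_V, psi_W = psi_V0.copy(), psi_W0.copy()
phi_S = np.ones(2)                  # separator potential initialized to unity

# Pass V -> W: phi*_S = sum over V \ S of psi_V, then psi*_W = (phi*_S / phi_S) psi_W.
phi_new = psi_V.sum(axis=0)
psi_W *= (phi_new / phi_S)[:, None]
phi_S = phi_new

# Pass W -> V: the same update in the opposite direction.
phi_new = psi_W.sum(axis=1)
psi_V *= (phi_new / phi_S)[None, :]
phi_S = phi_new

# Clique potentials now equal the (unnormalized) marginals of the joint
# psi_V0 * psi_W0, and both cliques agree on the separator.
joint = np.einsum("ab,bc->abc", psi_V0, psi_W0)
assert np.allclose(psi_V, joint.sum(axis=2))
assert np.allclose(psi_W, joint.sum(axis=0))
assert np.allclose(psi_V.sum(axis=0), phi_S)
```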

Properties of Junction Tree Algorithm

  • Exact marginal probability calculation for arbitrary graphs
  • Efficient: for a given graph, there does not in general exist a computationally cheaper exact approach
  • Must work with the joint distributions within each node (each of which corresponds to a clique of the triangulated graph)
  • Computational cost: determined by the number of variables in the largest clique and will grow exponentially with this number (discrete case)
  • Treewidth (Bodlaender, 1993): for each possible junction tree representation of a graph, record the number of variables in its largest clique; the treewidth is the smallest such number minus 1

Approximate Inference

  • Treewidth can be very large (\(d-1\) for a complete graph on \(d\) nodes; at least \(k\) for a \(k\times k\) grid)!
  • Variational algorithms: Characterize a probability distribution as the solution to an optimization problem.
  • Sampling algorithms: Draw samples from the desired marginal or posterior distributions, e.g., Markov chain Monte Carlo (MCMC), importance sampling, etc.
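As a minimal sampling sketch, here is a Gibbs sampler for two coupled binary spins; the model, coupling strength, and sample count are illustrative. The empirical agreement frequency approaches the exact value computable in closed form for this tiny model.

```python
import math
import random

# Gibbs sampling sketch for two coupled binary spins x1, x2 in {-1, +1}
# with joint p(x1, x2) proportional to exp(J * x1 * x2). Illustrative only.
random.seed(0)
J = 1.0

def gibbs_step(x1, x2):
    # Resample x1 | x2, then x2 | x1, from the exact conditionals.
    p1 = math.exp(J * x2) / (math.exp(J * x2) + math.exp(-J * x2))
    x1 = 1 if random.random() < p1 else -1
    p2 = math.exp(J * x1) / (math.exp(J * x1) + math.exp(-J * x1))
    x2 = 1 if random.random() < p2 else -1
    return x1, x2

x1, x2 = 1, 1
agree = 0
n = 20000
for _ in range(n):
    x1, x2 = gibbs_step(x1, x2)
    agree += (x1 == x2)

# Exact P(x1 == x2) for this model, for comparison.
exact = math.exp(J) / (math.exp(J) + math.exp(-J))
assert abs(agree / n - exact) < 0.03
```

Unlike the junction tree algorithm, the sampler gives only an approximation, but its cost does not blow up with treewidth.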


  • Required reading:
    • Lauritzen, S.L. and Spiegelhalter, D.J., (1988). Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society. Series B (Methodological), pp.157-224.
    • Has a numerical example of the junction tree algorithm (an early version)