- Office hour: 2-3pm on Tuesdays or by appointment
- New office: 4623 SPH-I (within Suite 4605)
- Homework 1 due 11:59pm on October 10, 2016 (Questions?)

September 27, 2016


**Marginal probabilities**. Compute marginals of variables (given model parameters \(\mathbf{\theta}\)): \(p(x_i\mid \mathbf{\theta})=\sum_{\mathbf{x}': x_i'=x_i}p(\mathbf{x}'\mid \mathbf{\theta})\) (or conditional probabilities: the **posterior distribution**).

**Partition function**. For a Gibbs distribution representable as a normalized product of factors, \(p(\mathbf{x})=\frac{1}{Z}\prod_{C\in\mathcal{C}}\psi_{C}(\mathbf{x}_C)\), with respect to a graph \(\mathcal{G}\) with cliques \(\mathcal{C}\), compute \[Z(\mathbf{\theta})=\sum_{\mathbf{x}}\prod_{C\in\mathcal{C}}\psi_{C}(\mathbf{x}_C).\]

**Maximum A Posteriori (MAP) inference**. Compute the variable configuration with the highest probability: \(\hat{\mathbf{x}}=\arg \max_{\mathbf{x}}p(\mathbf{x}\mid \mathbf{\theta}),\) where \(p\) is generic notation for a joint or conditional distribution over the unobserved variables \(\mathbf{X}\).

- Let \(\mathbf{X}=\{X_1,\ldots,X_d\}\) denote the set of variable nodes, each of which can range from \(1\) to \(k\); \(F=\{\psi_{\alpha_1},\ldots, \psi_{\alpha_s}\}\) denote the set of factor nodes, where \(\alpha_1, \ldots, \alpha_s \subset \{1,\ldots, d\}\).
To compute the marginal probability naively, we sum over all other variables: \[p(X_1=x_1)=\frac{1}{Z}\sum_{x_2,\ldots, x_d}\prod_{i=1}^s \psi_{\alpha_i}(\mathbf{x}_{\alpha_i})\]

Computational complexity: \(\mathcal{O}(k^{d-1})\) additions of values read from the probability table, one row for each configuration \(X_2=x_2,\ldots,X_d=x_d\).
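The naive sum can be sketched directly. Below is a minimal illustration on a hypothetical chain-structured MRF with random positive pairwise potentials (a toy model, not one from the lecture); it enumerates all \(k^{d-1}\) configurations of the remaining variables:

```python
import itertools
import numpy as np

# Hypothetical toy chain MRF over d = 4 variables, each in {0, ..., k-1};
# psi[i] is a pairwise factor on (X_{i+1}, X_{i+2}) with arbitrary positive entries.
k, d = 3, 4
rng = np.random.default_rng(0)
psi = [rng.random((k, k)) + 0.1 for _ in range(d - 1)]

def unnormalized(x):
    """Product of all pairwise factors for a full configuration x."""
    p = 1.0
    for i in range(d - 1):
        p *= psi[i][x[i], x[i + 1]]
    return p

# Naive marginal p(X_1 = x1): sum over all k^(d-1) settings of X_2, ..., X_d.
Z = sum(unnormalized(x) for x in itertools.product(range(k), repeat=d))
marg = [sum(unnormalized((x1,) + rest)
            for rest in itertools.product(range(k), repeat=d - 1)) / Z
        for x1 in range(k)]
```

Each term of the inner sum reads one "row" of the joint table, matching the \(\mathcal{O}(k^{d-1})\) count above.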

Instead, we choose an ordering \(\mathcal{I}\) among \(\{X_1,\ldots,X_d\}\) and utilize the factorization of \(p(\mathbf{x})\). (Variable Elimination)

- Explicitly encodes the relations among a subset of variables, which is sometimes not possible with the clique potential parameterization (see the example on the next slide).
- Supports inference and computational techniques
- Unifies directed and undirected graphical models
- Example (whiteboard)

- Example:
- \(\mathcal{H}=(V,E)\): complete (fully connected) undirected graph with vertices \(V\) and edges \(E\). \(|V|=d\)
- \(P\): positive distribution over \(d\) binary nodes; has only pairwise potentials (pairwise dependencies cannot be conditioned away).
- \(P\) is Markov to \(\mathcal{H}\), but not Markov to any graph missing even one edge.
- The graph \(\mathcal{H}\) can only guide us to use the maximal clique potential parameterization. For a complete graph, this is a single, large potential \(\psi(X_1,\ldots,X_d)\), which requires \(2^d\) parameters.
- However, \(P\) by definition can be represented as a Gibbs distribution with only pairwise potentials \(\psi_{{j}{j'}}(X_j,X_{j'})\), which requires \(4{d \choose 2}\) parameters.
- When \(d=10\), \(2^d=1,024\), much bigger than \(4{d \choose 2}=180\).
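The two parameter counts are easy to verify directly; a one-line check of the arithmetic above:

```python
from math import comb

d = 10
full_clique_params = 2 ** d        # one entry per joint configuration of d binary nodes
pairwise_params = 4 * comb(d, 2)   # a 2x2 table for each of the C(d,2) edges

print(full_clique_params, pairwise_params)  # 1024 180
```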

- Initialize an active list of potentials as read off from the factor graph.
- Choose an ordering \(\mathcal{I}\) for the variables in which \(x_1\) appears last
- For each \(i=1,\ldots,d-1\), let \(j=\mathcal{I}_i\) and do the following:
- Find all potentials in the currently active list that reference \(x_j\) and remove them from the active list
- Let \(\phi_{j}(\mathbf{x}_{T_j})\) denote the product of these potentials, with \(T_j\) being the union of the nodes appearing in the removed potentials
- Calculate \(\tau_i(\mathbf{x}_{T_j\setminus\{j\}})=\sum_{x_j}\phi_{j}(\mathbf{x}_{T_j})\)
- Place \(\tau_i(\mathbf{x}_{T_j\setminus\{j\}})\) on the active list as a new potential function

- Finally, we obtain \(p(x_1)=\frac{1}{Z}\tau_{d-1}(x_1)\)
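The elimination steps above can be sketched on a hypothetical chain-structured MRF (a toy model with random positive pairwise potentials, not from the lecture), eliminating \(X_d, X_{d-1}, \ldots, X_2\) so that \(X_1\) appears last; for a chain, each summation is a matrix-vector product:

```python
import itertools
import numpy as np

k, d = 3, 5
rng = np.random.default_rng(1)
# psi[i] is a pairwise factor coupling (X_{i+1}, X_{i+2}), 1-indexed as in the slides.
psi = [rng.random((k, k)) + 0.1 for _ in range(d - 1)]

# Variable elimination with ordering X_d, X_{d-1}, ..., X_2:
# tau carries the message from the eliminated tail of the chain.
tau = np.ones(k)
for i in reversed(range(d - 1)):
    # tau_new(x_{i+1}) = sum_{x_{i+2}} psi[i][x_{i+1}, x_{i+2}] * tau(x_{i+2})
    tau = psi[i] @ tau

p_x1 = tau / tau.sum()  # p(x_1) = tau_{d-1}(x_1) / Z

# Sanity check against the naive O(k^(d-1)) summation.
def joint(x):
    out = 1.0
    for i in range(d - 1):
        out *= psi[i][x[i], x[i + 1]]
    return out

brute = np.array([sum(joint((x1,) + rest)
                      for rest in itertools.product(range(k), repeat=d - 1))
                  for x1 in range(k)])
brute /= brute.sum()
assert np.allclose(p_x1, brute)
```

Here every intermediate factor involves only two variables, so the chain costs \(\mathcal{O}(d\,k^2)\) instead of \(\mathcal{O}(k^{d-1})\).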

- Let \(X_i\in \{1,\ldots,k\}\).
- Describe how you would compute \(p(X_1=x_1)\).
- Computational complexity of variable elimination: \(\mathcal{O}((d-1)k^r)\), where \(d\) is the number of variables, \(k\) is the maximum value a variable can take and \(r\) is the number of variables participating in the largest intermediate "factor".
- Finding a good ordering can reduce the computational complexity.
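The effect of the ordering can be illustrated on a hypothetical star graph (center \(X_1\) connected to every leaf): eliminating leaves first keeps every intermediate factor small, while eliminating the center first creates one factor over all the leaves. The sketch below only tracks factor scopes (variable sets), not tables:

```python
# Hypothetical star graph: center 0 connected to leaves 1, ..., d-1.
d = 6
edges = [frozenset({0, leaf}) for leaf in range(1, d)]

def max_factor_size(order, factors):
    """Largest intermediate factor scope produced by an elimination ordering."""
    factors = [set(f) for f in factors]
    largest = 0
    for v in order:
        touching = [f for f in factors if v in f]
        merged = set().union(*touching)   # scope of the product phi_v
        largest = max(largest, len(merged))
        factors = [f for f in factors if v not in f]
        merged.discard(v)                 # sum out v, leaving tau's scope
        if merged:
            factors.append(merged)
    return largest

leaves_first = max_factor_size(range(1, d), edges)            # r stays 2
center_first = max_factor_size([0] + list(range(2, d)), edges)  # r blows up to d
print(leaves_first, center_first)  # 2 6
```

With leaves eliminated first, \(r=2\) and the cost is \(\mathcal{O}((d-1)k^2)\); eliminating the center first yields an intermediate factor over all \(d\) variables.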

**Next Lecture**: Belief propagation (Sum-Product Algorithm on Polytrees)

Required reading for the week: Chapter 9 of Koller and Friedman (2009)