Lecture 7: Exact Inference

September 27, 2016

Course Logistics

Marginal probabilities. Compute marginals of variables (given model parameters \(\mathbf{\theta}\)): \(p(x_i\mid \mathbf{\theta})=\sum_{\mathbf{x}': x_i'=x_i}p(\mathbf{x}'\mid \mathbf{\theta}).\) (Or conditional probabilities: posterior distribution)
Partition function. For a Gibbs distribution representable as normalized products of factors: \(p(\mathbf{x})=\frac{1}{Z}\prod_{C\in\mathcal{C}}\psi_{C}(\mathbf{x}_C)\) with respect to a graph \(\mathcal{G}\) with cliques \(\mathcal{C}\), compute \[Z(\mathbf{\theta})=\sum_{\mathbf{x}}\prod_{C\in\mathcal{C}}\psi_{C}(\mathbf{x}_C)\]
Maximum A Posteriori (MAP) Inference. Compute the variable configuration with the highest probability: \(\hat{\mathbf{x}}=\arg \max_{\mathbf{x}}p(\mathbf{x}\mid \mathbf{\theta}),\) where \(p\) is a generic notation for joint or conditional distributions with unobserved variables \(\mathbf{X}=\mathbf{x}\).
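The three tasks above can be illustrated by brute-force enumeration on a tiny model. This is only a sketch: the chain structure, the potential values, and the names `factors`, `unnormalized`, `marginal_x0`, and `x_map` are assumptions made for illustration, and variables are indexed from 0 with values in \(\{0,\ldots,k-1\}\) rather than \(\{1,\ldots,k\}\).

```python
import itertools

# Hypothetical toy model: three binary variables on a chain X0 - X1 - X2,
# with one pairwise potential per edge (values chosen only for illustration).
k = 2  # number of values per variable
d = 3  # number of variables
factors = [
    ((0, 1), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    ((1, 2), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 1.0}),
]

def unnormalized(x):
    """Product of all factor values at the full configuration x."""
    p = 1.0
    for scope, table in factors:
        p *= table[tuple(x[i] for i in scope)]
    return p

# Enumerate all k**d joint configurations.
configs = list(itertools.product(range(k), repeat=d))

# Partition function: sum of the unnormalized product over every configuration.
Z = sum(unnormalized(x) for x in configs)

# Marginal of X0: sum over the configurations that agree with the queried value.
marginal_x0 = [
    sum(unnormalized(x) for x in configs if x[0] == v) / Z for v in range(k)
]

# MAP configuration: the argmax does not need Z, since Z is a constant.
x_map = max(configs, key=unnormalized)

print(Z, marginal_x0, x_map)
```

Enumeration touches all \(k^d\) configurations, so it is only usable on very small models; it mainly serves as a reference against which faster methods can be checked.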

Let \(\mathbf{X}=\{X_1,\ldots,X_d\}\) denote the set of variable nodes, each taking values in \(\{1,\ldots,k\}\), and let \(F=\{\psi_{\alpha_1},\ldots, \psi_{\alpha_s}\}\) denote the set of factor nodes, where \(\alpha_1, \ldots, \alpha_s \subset \{1,\ldots, d\}\).
To compute the marginal probability naively, we evaluate: \[p(X_1=x_1)=\frac{1}{Z}\sum_{x_2,\ldots, x_d}\prod_{i=1}^s \psi_{\alpha_i}(\mathbf{x}_{\alpha_i})\]
Computational complexity: \(\mathcal{O}(k^{d-1})\) additions of values read from the probability table, one row for each configuration \(X_2=x_2,\ldots,X_d=x_d\).
Instead, we choose an elimination ordering \(\mathcal{I}\) of the variables \(\{X_1,\ldots,X_d\}\) and exploit the factorization of \(p(\mathbf{x})\) to push each sum inside the product as far as it will go. (Variable Elimination)
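As a worked illustration, assume a chain \(X_1 - X_2 - X_3 - X_4\) with pairwise factors \(\psi_{12},\psi_{23},\psi_{34}\) (this particular model is an assumption for the example, not one taken from the lecture). Each factor can be moved outside every sum whose variable it does not mention:
\[p(x_1)=\frac{1}{Z}\sum_{x_2,x_3,x_4}\psi_{12}(x_1,x_2)\,\psi_{23}(x_2,x_3)\,\psi_{34}(x_3,x_4)=\frac{1}{Z}\sum_{x_2}\psi_{12}(x_1,x_2)\sum_{x_3}\psi_{23}(x_2,x_3)\sum_{x_4}\psi_{34}(x_3,x_4)\]
The left-hand form sums \(k^3\) terms for each value of \(x_1\). In the right-hand form, each inner sum produces a table with \(k\) entries at a cost of \(k\) terms per entry, so the whole computation takes roughly \(3k^2\) operations for all values of \(x_1\).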

Explicitly encodes the relations among a subset of variables, which is sometimes not possible with a clique potential parameterization (see an example on the next slide).
Supports the inference/computational techniques
Unifies directed and undirected graphical models
Example (whiteboard)

Initialize an active list of potentials as read off from the factor graph.
Choose an ordering \(\mathcal{I}\) for the variables in which \(x_1\) appears last.
For each \(i=1,\ldots,d-1\), let \(j=\mathcal{I}_i\) be the \(i\)-th variable in the ordering and do the following:
- Find all potentials in the currently active list that reference \(x_j\) and remove them from the active list
- Let \(\phi_{j}(X_{T_j})\) denote the product of these potentials, with \(T_j\) being the union of nodes appearing in the active potentials referencing \(X_j\).
- Calculate \(\tau_i(x_{T_j-X_j})=\sum_{x_j}\phi_{j}(x_{T_j})\)
- Place \(\tau_i(x_{T_j-X_j})\) on the active list as one potential function
Finally, we obtain \(p(x_1)=\frac{1}{Z}\tau_{d-1}(x_1)\)
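The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: potentials are stored as `(scope, table)` pairs with dictionary tables, variables are 0-indexed with values in \(\{0,\ldots,k-1\}\), and the function names and the toy chain model are hypothetical. Any potentials left on the active list at the end are multiplied together, which covers the case where more than one potential mentions only the query variable.

```python
import itertools

def multiply_and_sum_out(potentials, var, k):
    """Form phi_j as the product of `potentials`, then sum out `var` to get tau."""
    # T_j: union of the variables appearing in the potentials that reference var.
    union = sorted({v for scope, _ in potentials for v in scope})
    new_scope = tuple(v for v in union if v != var)
    table = {}
    for assignment in itertools.product(range(k), repeat=len(new_scope)):
        ctx = dict(zip(new_scope, assignment))
        total = 0.0
        for value in range(k):  # the sum over x_j
            ctx[var] = value
            prod = 1.0
            for scope, tab in potentials:
                prod *= tab[tuple(ctx[v] for v in scope)]
            total += prod
        table[assignment] = total
    return new_scope, table

def eliminate(factors, order, k):
    """Eliminate the variables in `order`; return the final active list."""
    active = list(factors)
    for var in order:
        touching = [f for f in active if var in f[0]]
        active = [f for f in active if var not in f[0]]
        if touching:
            active.append(multiply_and_sum_out(touching, var, k))
    return active

def marginal(factors, query, order, k):
    """Return (p(query), Z); `order` must list every variable except `query`."""
    active = eliminate(factors, order, k)
    unnorm = []
    for v in range(k):
        p = 1.0
        for scope, tab in active:
            # Remaining scopes are either (query,) or empty.
            p *= tab[(v,)] if scope == (query,) else tab[()]
        unnorm.append(p)
    Z = sum(unnorm)
    return [u / Z for u in unnorm], Z

# Hypothetical chain X0 - X1 - X2 with illustrative potential values.
factors = [
    ((0, 1), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    ((1, 2), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 1.0}),
]
probs, Z = marginal(factors, query=0, order=[2, 1], k=2)
print(probs, Z)
```

Dictionary tables keep the sketch dependency-free and close to the notation in the steps; an array-based representation would be the more usual choice for larger models.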

Let \(X_i\in \{1,\ldots,k\}\).
Describe how you would compute \(p(X_1=x_1)\).
Computational complexity of variable elimination: \(\mathcal{O}((d-1)k^r)\), where \(d\) is the number of variables, \(k\) is the number of values each variable can take, and \(r\) is the number of variables participating in the largest intermediate "factor".
Finding a good ordering can reduce the computational complexity.
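To see how much the ordering matters, the sketch below tracks only the scopes of the potentials, not their values, and reports the size of the largest product formed during elimination. The star-shaped model, the function name `largest_factor_size`, and the reading of \(r\) as the number of variables in the product \(\phi_j\) (including the variable being summed out) are assumptions made for this example.

```python
def largest_factor_size(scopes, order):
    """Return the largest number of variables in any product formed while eliminating `order`."""
    active = [frozenset(s) for s in scopes]
    r = 0
    for var in order:
        touching = [s for s in active if var in s]
        active = [s for s in active if var not in s]
        if not touching:
            continue
        union = frozenset().union(*touching)  # variables in the product phi_j
        r = max(r, len(union))
        active.append(union - {var})          # scope of the new potential tau
    return r

# Hypothetical star graph: center X0 joined to leaves X1..X4 by pairwise factors.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]

# Query X1 in both cases; only the elimination order of the other variables differs.
r_leaves_first = largest_factor_size(star, [2, 3, 4, 0])
r_center_first = largest_factor_size(star, [0, 2, 3, 4])
print(r_leaves_first, r_center_first)
```

Eliminating the leaves first never creates a product over more than two variables, whereas eliminating the center first couples all the leaves into a single product over five variables, so the per-step cost grows from about \(k^2\) to about \(k^5\).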