\documentstyle[fleqn,epsf,aima-slides]{article}
\begin{document}
\begin{huge}
\titleslide{Belief networks}{Chapter 15.1--2}
\sf
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Outline}
\blob Conditional independence
\blob Bayesian networks: syntax and semantics
\blob Exact inference
\blob Approximate inference
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Independence}
Two random variables $A$ and $B$ are (absolutely) \u{independent} iff\al
\phantom{or }$P(A|B) = P(A)$\al
or $P(A,B) = P(A|B)P(B) = P(A)P(B)$\\
e.g., $A$ and $B$ are two coin tosses
If $n$ Boolean variables are independent, the full joint is\al
$\pv(X_1,\ldots,X_n) = \myprod_i \pv(X_i)$\\
hence can be specified by just $n$ numbers
Absolute independence is a very strong requirement, seldom met
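The $n$-numbers claim can be checked numerically; a minimal sketch with three independent Boolean variables and assumed marginals:

```python
from itertools import product

# P(X_i = true) for each variable: just n numbers (assumed values)
p = [0.5, 0.7, 0.1]

def joint(assignment):
    """P(x_1,...,x_n) as a product of marginals, by independence."""
    result = 1.0
    for pi, value in zip(p, assignment):
        result *= pi if value else 1 - pi
    return result

# All 2^n joint entries follow from only n parameters, and they sum to 1.
total = sum(joint(a) for a in product([True, False], repeat=3))
```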
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Conditional independence}
Consider the dentist problem with three random variables:\al
$Toothache$, $Cavity$, $Catch$ (steel probe catches in my tooth)
The full joint distribution has $2^3 - 1 = 7$ independent entries
If I have a cavity, the probability that the probe catches in it
doesn't depend on whether I have a toothache:\al
(1) $P(Catch|Toothache,Cavity) = P(Catch|Cavity)$\\
i.e., $Catch$ is \u{conditionally independent} of $Toothache$ given $Cavity$
The same independence holds if I haven't got a cavity:\al
(2) $P(Catch|Toothache,\lnot Cavity) = P(Catch|\lnot Cavity)$
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Conditional independence contd.}
Equivalent statements to (1)
(1a) $P(Toothache|Catch,Cavity) = P(Toothache|Cavity)$ \q{Why}
(1b) $P(Toothache,Catch|Cavity) = P(Toothache|Cavity)P(Catch|Cavity)$ \q{Why}
Full joint distribution can now be written as\al
$\pv(Toothache,Catch,Cavity) = \pv(Toothache,Catch|Cavity) \pv(Cavity)$\nl
= $\pv(Toothache|Cavity)\pv(Catch|Cavity)\pv(Cavity)$\\
i.e., 2 + 2 + 1 = 5 independent numbers (equations 1 and 2 remove 2)
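A numerical sketch of this factorization, using five assumed numbers (the slides give no concrete values), under which equation (1) holds by construction:

```python
# Five assumed, illustrative numbers: 1 for Cavity, 2 each for the CPTs
p_cavity = 0.2
p_catch = {True: 0.9, False: 0.2}   # P(catch | Cavity = true/false)
p_ache  = {True: 0.6, False: 0.1}   # P(toothache | Cavity = true/false)

def joint(ache, catch, cavity):
    """P(Toothache, Catch, Cavity) via the factorization on the slide."""
    p = p_cavity if cavity else 1 - p_cavity
    p *= p_catch[cavity] if catch else 1 - p_catch[cavity]
    p *= p_ache[cavity] if ache else 1 - p_ache[cavity]
    return p

# Equation (1): P(catch | toothache, cavity) equals P(catch | cavity) = 0.9
num = joint(True, True, True)
den = sum(joint(True, c, True) for c in (True, False))
```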
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Conditional independence contd.}
Equivalent statements to (1)
(1a) $P(Toothache|Catch,Cavity) = P(Toothache|Cavity)$ \q{Why}
\begin{LARGE}
$P(Toothache|Catch,Cavity)$\al
= $P(Catch|Toothache,Cavity)P(Toothache|Cavity)/P(Catch|Cavity)$\al
= $P(Catch|Cavity)P(Toothache|Cavity)/P(Catch|Cavity)$ (from~1)\al
= $P(Toothache|Cavity)$
\end{LARGE}
(1b) $P(Toothache,Catch|Cavity) = P(Toothache|Cavity)P(Catch|Cavity)$ \q{Why}
\begin{LARGE}
$P(Toothache,Catch|Cavity)$\al
= $P(Toothache|Catch,Cavity)P(Catch|Cavity)$ (product rule)\al
= $P(Toothache|Cavity)P(Catch|Cavity)$ (from 1a)
\end{LARGE}
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Belief networks}
A simple, graphical notation for conditional independence assertions\\
and hence for compact specification of full joint distributions
Syntax:\al
a set of nodes, one per variable\al
a directed, acyclic graph (link $\approx$ ``directly influences'')\al
a conditional distribution for each node given its parents:\nl
$\pv(X_i|Parents(X_i))$
In the simplest case, conditional distribution represented as\\
a \u{conditional probability table} (CPT)
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Example}
I'm at work, neighbor John calls to say my alarm is ringing, but
neighbor Mary doesn't call. Sometimes it's set off by minor
earthquakes. Is there a burglar?
Variables: $Burglar$, $Earthquake$, $Alarm$, $JohnCalls$, $MaryCalls$\\
Network topology reflects ``causal'' knowledge:
\vspace*{0.2in}
\epsfxsize=0.5\textwidth
\fig{\file{figures}{burglary2.ps}}
Note: $\leq k$ parents ${} \implies O(d^k n)$ numbers vs. $O(d^n)$
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Semantics}
``Global'' semantics defines the full joint distribution as\\
the product of the local conditional distributions:
\[
\pv(X_1,\ldots,X_n) = \myprod_{i\eq 1}^n \pv(X_i|Parents(X_i))
\]
e.g., $P(J\land M\land A\land \lnot B \land \lnot E)$ \q{is given by}\al
=
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Semantics}
``Global'' semantics defines the full joint distribution as\\
the product of the local conditional distributions:
\[
\pv(X_1,\ldots,X_n) = \myprod_{i\eq 1}^n \pv(X_i|Parents(X_i))
\]
e.g., $P(J\land M\land A\land \lnot B \land \lnot E)$ \q{is given by}\al
= $P(\lnot B)P(\lnot E)P(A|\lnot B \land \lnot E)P(J|A)P(M|A)$
``Local'' semantics: each node is conditionally independent\\
of its nondescendants given its parents
Theorem: Local semantics $\lequiv$ global semantics
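The product can be evaluated directly; the CPT values below are those usually quoted for this burglary network (an assumption here, since they appear only in the figure):

```python
# Assumed CPT values from the standard burglary-network figure
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J | A)
P_M = {True: 0.70, False: 0.01}                      # P(M | A)

# P(j, m, a, ~b, ~e) = P(~b) P(~e) P(a | ~b,~e) P(j | a) P(m | a)
p = (1 - P_B) * (1 - P_E) * P_A[(False, False)] * P_J[True] * P_M[True]
```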
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Markov blanket}
Each node is conditionally independent of all others given its\\
\u{Markov blanket}: parents + children + children's parents
\vspace*{0.3in}
\epsfxsize=0.5\textwidth
\fig{\file{figures}{markov-blanket.ps}}
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Constructing belief networks}
Need a method such that a series of locally testable assertions of\\
conditional independence guarantees the required global semantics
1. Choose an ordering of variables $X_1,\ldots,X_n$\\
2. For $i$ = 1 to $n$\al
add $X_i$ to the network\al
select parents from $X_1,\ldots,X_{i-1}$ such that\nl
$ \pv(X_i|Parents(X_i)) = \pv(X_i|X_1,\, \ldots,\, X_{i-1}) $
This choice of parents guarantees the global semantics:\al
$\pv(X_1,\ldots,X_n) = \myprod_{i\eq 1}^n \pv(X_i | X_1,\, \ldots,\, X_{i-1})$ (chain rule)\nl
= $\myprod_{i\eq 1}^n \pv(X_i|Parents(X_i))$ by construction
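The chain-rule step can be sanity-checked in code: for any full joint, the product of successive conditionals recovers every joint entry (a minimal sketch over an arbitrary random joint):

```python
from itertools import product
import random

random.seed(0)
# An arbitrary full joint over three Boolean variables (assumed example)
vals = [random.random() for _ in range(8)]
total = sum(vals)
joint = {a: v / total for a, v in zip(product([True, False], repeat=3), vals)}

def marginal(prefix):
    """P(X_1 = prefix[0], ..., X_k = prefix[k-1]), summing out the rest."""
    k = len(prefix)
    return sum(p for a, p in joint.items() if a[:k] == prefix)

def chain_product(assignment):
    """prod_i P(x_i | x_1,...,x_{i-1}) -- the chain rule on the slide."""
    result = 1.0
    for i in range(len(assignment)):
        result *= marginal(assignment[:i + 1]) / marginal(assignment[:i])
    return result
```

The product telescopes to $P(x_1,\ldots,x_n)$, so it matches the joint entry exactly.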
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Example}
Suppose we choose the ordering $M$, $J$, $A$, $B$, $E$
\vspace*{0.15in}
\epsfxsize=0.4\textwidth
\fig{\sfile{figures}{burglary-make1.ps}}
$P(J|M) = P(J)$?
%%%%%%%%%%%% Overlay %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\pheading{Example}
\ptext{Suppose we choose the ordering $M$, $J$, $A$, $B$, $E$}
\vspace*{0.15in}
\epsfxsize=0.4\textwidth
\fig{\sfile{figures}{burglary-make2.ps}}
\ptext{$P(J|M) = P(J)$?}\quad No\\
$P(A|J,M) = P(A|J)$? $P(A|J,M) = P(A)$?
%%%%%%%%%%%% Overlay %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\pheading{Example}
\ptext{Suppose we choose the ordering $M$, $J$, $A$, $B$, $E$}
\vspace*{0.15in}
\epsfxsize=0.4\textwidth
\fig{\sfile{figures}{burglary-make3.ps}}
\ptext{$P(J|M) = P(J)$?\quad No}\\
\ptext{$P(A|J,M) = P(A|J)$? $P(A|J,M) = P(A)$?}\quad No\\
$P(B|A,J,M) = P(B|A)$?\\
$P(B|A,J,M) = P(B)$?
%%%%%%%%%%%% Overlay %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\pheading{Example}
\ptext{Suppose we choose the ordering $M$, $J$, $A$, $B$, $E$}
\vspace*{0.15in}
\epsfxsize=0.4\textwidth
\fig{\sfile{figures}{burglary-make4.ps}}
\ptext{$P(J|M) = P(J)$?\quad No}\\
\ptext{$P(A|J,M) = P(A|J)$? $P(A|J,M) = P(A)$?\quad No}\\
\ptext{$P(B|A,J,M) = P(B|A)$?}\quad Yes\\
\ptext{$P(B|A,J,M) = P(B)$?}\quad No\\
$P(E|B,A,J,M) = P(E|A)$?\\
$P(E|B,A,J,M) = P(E|A,B)$?
%%%%%%%%%%%% Overlay %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\pheading{Example}
\ptext{Suppose we choose the ordering $M$, $J$, $A$, $B$, $E$}
\vspace*{0.15in}
\epsfxsize=0.4\textwidth
\fig{\sfile{figures}{burglary-make5.ps}}
\ptext{$P(J|M) = P(J)$?\quad No}\\
\ptext{$P(A|J,M) = P(A|J)$? $P(A|J,M) = P(A)$?\quad No}\\
\ptext{$P(B|A,J,M) = P(B|A)$?\quad Yes}\\
\ptext{$P(B|A,J,M) = P(B)$?\quad No}\\
\ptext{$P(E|B,A,J,M) = P(E|A)$?}\quad No\\
\ptext{$P(E|B,A,J,M) = P(E|A,B)$?}\quad Yes
%%[[13 numbers (vs.~10), and CPTs hard to assess---use \u{causal} direction!!]]
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Example: Car diagnosis}
Initial evidence: engine won't start\\
Testable variables (thin ovals), diagnosis variables (thick ovals)\\
Hidden variables (shaded) ensure sparse structure, reduce parameters
\vspace*{0.2in}
\epsfxsize=0.85\textwidth
\fig{\file{figures}{car-net.ps}}
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Example: Car insurance}
Predict claim costs (medical, liability, property)\\
given data on application form (other unshaded nodes)
\epsfxsize=0.85\textwidth
\fig{\file{figures}{insurance-net.ps}}
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Compact conditional distributions}
CPT grows exponentially with no.~of parents\\
CPT becomes infinite with continuous-valued parent or child
Solution: \u{canonical} distributions that are defined compactly
\u{Deterministic} nodes are the simplest case:\al
$X = f(Parents(X))$ for some function $f$
E.g., Boolean functions\al
$NorthAmerican \lequiv Canadian \lor US \lor Mexican$
E.g., numerical relationships among continuous variables
\[
\frac{\partial Level}{\partial t} = \mbox{ inflow + precipitation
 - outflow - evaporation}
\]
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Compact conditional distributions contd.}
\u{Noisy-OR} distributions model multiple noninteracting causes\al
1) Parents $U_1\ldots U_k$ include all causes (can add \u{leak node})\al
2) Independent failure probability $q_i$ for each cause alone\nl
${} \implies
P(X|U_1\ldots U_j,\lnot U_{j+1}\ldots \lnot U_k)
= 1 - \myprod_{i\eq 1}^j q_i$
\begin{center}
\begin{tabular}{|ccc|@{\ \ }l|@{\ \ }l|}
\hline
% .6 .2 .1
\tabhead \makebox[72pt]{$Cold$} & \makebox[72pt]{$Flu$} & \makebox[72pt]{$Malaria$} & $P(Fever)$ & $P(\lnot Fever)$ \\
\hline
\tabtop
F & F & F & $\mbf{0.0}$ & $1.0$\\
F & F & T & $0.9$ & $\mbf{0.1}$\\
F & T & F & $0.8$ & $\mbf{0.2}$\\
F & T & T & $0.98$ & $0.02 = 0.2 \times 0.1$\\
T & F & F & $0.4$ & $\mbf{0.6}$\\
T & F & T & $0.94$ & $0.06 = 0.6 \times 0.1$\\
T & T & F & $0.88$ & $0.12 = 0.6 \times 0.2$\\
\tabbot T & T & T & $0.988$ & $0.012 = 0.6 \times 0.2 \times 0.1$\\
\hline
\end{tabular}
\end{center}
Number of parameters \u{linear} in number of parents
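The table rows follow from the three $q_i$ values alone; a sketch of the noisy-OR computation (no leak node, matching the $Fever$ table):

```python
# Noisy-OR: independent failure probability for each cause acting alone
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

def p_fever(present):
    """P(Fever | exactly the causes in `present` are true)."""
    p_no_fever = 1.0
    for cause in present:
        p_no_fever *= q[cause]
    # Empty product = 1, so no causes => no fever (no leak node here)
    return 1 - p_no_fever
```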
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Hybrid (discrete+continuous) networks}
Discrete ($Subsidy?$ and $Buys?$); continuous ($Harvest$ and $Cost$)
\vspace*{0.2in}
\epsfxsize=0.35\textwidth
\fig{\file{figures}{continuous-net.ps}}
Option 1: discretization---possibly large errors, large CPTs
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete+continuous parents (e.g., $Cost$)\\
2) Discrete variable, continuous parents (e.g., $Buys?$)
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Continuous child variables}
Need one \u{conditional density} function for child variable given
continuous parents, for each possible assignment to discrete parents
Most common is the \u{linear Gaussian} model, e.g.,:
\begin{eqnarray*}
\lefteqn{P(Cost\eq c|Harvest\eq h,Subsidy?\eq true)}\\
& = & N(a_t h + b_t, \sigma_t)(c)\\
&=& \frac{1}{\sigma_t \sqrt{2\pi}}
\exp\left(-\frac{1}{2}
\left(\frac{c-(a_t h + b_t)}{\sigma_t}\right)^2
\right)
\end{eqnarray*}
Mean $Cost$ varies linearly with $Harvest$, variance is fixed\\
Linear variation is unreasonable over the full range\al
but works OK if the \u{likely} range of $Harvest$ is narrow
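The linear Gaussian density above, as a direct transcription (parameter values are the caller's assumptions):

```python
import math

def linear_gaussian(c, h, a, b, sigma):
    """Density N(a*h + b, sigma)(c): mean linear in the parent, fixed variance."""
    mu = a * h + b
    return math.exp(-0.5 * ((c - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
```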
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Continuous child variables}
\threegraph{\file{graphs}{linear-gaussian-true.ps}}{\file{graphs}{linear-gaussian-false.ps}}{\file{graphs}{linear-gaussian-average.ps}}
All-continuous network with LG distributions\al
$\implies$ full joint is a multivariate Gaussian\al
Discrete+continuous LG network is a \u{conditional Gaussian} network
i.e., a multivariate Gaussian over all continuous variables
for each combination of discrete variable values
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Discrete variable w/ continuous parents}
Probability of $Buys?$ given $Cost$ should be a ``soft'' threshold:
\epsfxsize=0.6\textwidth
\fig{\file{graphs}{probit.ps}}
\u{Probit} distribution uses integral of Gaussian:\al
$\Phi(x) = \int_{-\infty}^{x} N(0,1)(t)\,dt$\al
$P(Buys?\eq true \given Cost \eq c) = \Phi((-c + \mu)/\sigma)$\\
Can view as hard threshold whose location is subject to noise
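A sketch of the probit model via the Gaussian CDF; the $\mu$ and $\sigma$ defaults are illustrative assumptions:

```python
import math

def phi(x):
    """Standard normal CDF: Phi(x) = integral of N(0,1) up to x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def p_buys(c, mu=5.0, sigma=1.0):
    """Probit model: P(Buys = true | Cost = c) = Phi((-c + mu) / sigma)."""
    return phi((-c + mu) / sigma)
```

At $c = \mu$ the probability is exactly 0.5, and it falls off as cost rises.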
%%%%%%%%%%%% Slide %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\heading{Discrete variable contd.}
\u{Sigmoid} (or \u{logit}) distribution also used in neural networks:
\[
P(Buys?\eq true \given Cost \eq c) = \frac{1}{1+\exp(-2\frac{-c+\mu}{\sigma})}
\]
Sigmoid has similar shape to probit but much longer tails:
\epsfxsize=0.6\textwidth
\fig{\file{graphs}{logit.ps}}
\end{huge}
\end{document}