Machine Learning                                                Srihari

Max-Sum Inference Algorithm

Sargur Srihari
srihari@cedar.buffalo.edu
The max-sum algorithm

• Sum-product algorithm
  – Takes a joint distribution expressed as a factor graph
  – Efficiently finds marginals over component variables
• Max-sum addresses two other tasks
  1. Find the setting of the variables that has the highest probability
  2. Find the value of that probability
• The two algorithms are closely related
  – Max-sum is an application of dynamic programming to graphical models
Finding latent variable values having high probability
• Consider a simple approach
  – Use sum-product to obtain the marginal p(x_i) for every variable x_i
  – For each variable, find the value x_i* that maximizes its marginal
  • This gives the set of values that are individually most probable
• However, we wish to find the vector x^max that maximizes the joint distribution, i.e.,

      x^max = arg max_x p(x)

• With joint probability

      p(x^max) = max_x p(x)
Example

      p(x,y)   x=0   x=1
      y=0      0.3   0.4
      y=1      0.3   0.0

• Maximum of joint distribution
  – Occurs at x=1, y=0
  – With p(x=1, y=0) = 0.4
• Marginal p(x)
  – p(x=0) = p(x=0,y=0) + p(x=0,y=1) = 0.6
  – p(x=1) = p(x=1,y=0) + p(x=1,y=1) = 0.4
• Marginal p(y)
  – p(y=0) = 0.7
  – p(y=1) = 0.3
• The marginals are maximized by x=0 and y=0, which corresponds to a value of 0.3 in the joint distribution
• In fact, the set of individually most probable values can have probability zero in the joint
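A quick numerical check of the example above (table values taken from the slide; variable names are illustrative):

```python
import numpy as np

# Joint distribution p(x, y) from the table: rows index y, columns index x
p = np.array([[0.3, 0.4],   # y=0: p(x=0,y=0), p(x=1,y=0)
              [0.3, 0.0]])  # y=1: p(x=0,y=1), p(x=1,y=1)

# Marginals by summing out the other variable
p_x = p.sum(axis=0)   # [0.6, 0.4]
p_y = p.sum(axis=1)   # [0.7, 0.3]

# Individually most probable values (argmax of each marginal)
x_star, y_star = p_x.argmax(), p_y.argmax()   # x=0, y=0
print(p[y_star, x_star])                      # joint value at (x*, y*): 0.3

# True maximum of the joint distribution
y_max, x_max = np.unravel_index(p.argmax(), p.shape)
print(x_max, y_max, p[y_max, x_max])          # x=1, y=0, value 0.4
```

The marginal maximizers (x=0, y=0) land on a joint value of 0.3, while the true joint maximum is 0.4 at (x=1, y=0).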
Max-sum principle

• Seek an efficient algorithm for
  – Finding the value of x that maximizes p(x)
  – Finding the value of the joint distribution at that x
• The second task is written

      max_x p(x) = max_{x_1} ... max_{x_M} p(x)

  where M is the total number of variables
• Make use of the distributive law for the max operator

      max(ab, ac) = a max(b, c)

  – Which holds for a ≥ 0
  – Allows exchange of products with maximizations
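The distributive law is easy to verify numerically (a small sanity check, not from the slides):

```python
import random

random.seed(0)
for _ in range(1000):
    a = random.uniform(0, 10)       # a must be non-negative
    b = random.uniform(-10, 10)
    c = random.uniform(-10, 10)
    # max(ab, ac) = a * max(b, c) whenever a >= 0
    assert abs(max(a * b, a * c) - a * max(b, c)) < 1e-9
```

This is the property that lets a common factor be pulled outside a maximization, exactly as a common factor can be pulled outside a summation in sum-product.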
Chain example

• Markov chain joint distribution has the form

      p(x) = (1/Z) ψ_{1,2}(x_1, x_2) ψ_{2,3}(x_2, x_3) ... ψ_{N-1,N}(x_{N-1}, x_N)

• Evaluation of the probability maximum has the form

      max_x p(x) = (1/Z) max_{x_1} ... max_{x_N} [ ψ_{1,2}(x_1, x_2) ... ψ_{N-1,N}(x_{N-1}, x_N) ]

• Exchanging max and product operators results in

      max_x p(x) = (1/Z) max_{x_1} [ ψ_{1,2}(x_1, x_2) [ ... max_{x_N} ψ_{N-1,N}(x_{N-1}, x_N) ] ]

• More efficient computation
• Interpreted as messages passed from node x_N to node x_1
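A minimal sketch of this message-passing computation on a chain, checked against brute force (random potentials; variable names and sizes are illustrative, not from the slides):

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 3                       # N variables, each with K states
# Pairwise potential psi[n] couples x_n and x_{n+1}
psi = [rng.uniform(0.1, 1.0, size=(K, K)) for _ in range(N - 1)]

def joint(xs):
    # Unnormalized p(x) = prod_n psi_n(x_n, x_{n+1})
    return np.prod([psi[n][xs[n], xs[n + 1]] for n in range(N - 1)])

# Backward max-messages from x_N toward x_1:
#   msg(x_n) = max_{x_{n+1}} psi_n(x_n, x_{n+1}) * msg(x_{n+1})
msg = np.ones(K)
for n in reversed(range(N - 1)):
    msg = (psi[n] * msg[None, :]).max(axis=1)
max_val = msg.max()               # max over x of the unnormalized joint

# Brute-force check over all K**N configurations
brute = max(joint(xs) for xs in product(range(K), repeat=N))
print(np.isclose(max_val, brute))   # True
```

Message passing costs O(N K^2) instead of the O(K^N) of exhaustive enumeration.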
Generalization to tree factor graph

• Substitute the factor graph expansion

      p(x) = ∏_s f_s(x_s)

• Into

      max_x p(x) = max_{x_1} ... max_{x_M} ∏_s f_s(x_s)

• And exchange maximizations with products
• The final maximization is performed over the product of all messages arriving at the root node
• Could be called the max-product algorithm
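A minimal sketch of the root-node maximization on a tiny tree factor graph (two factors hanging off a root variable; names and potentials are illustrative assumptions):

```python
from itertools import product

import numpy as np

rng = np.random.default_rng(1)
K = 3
# Tree factor graph: root x1 with two leaf neighbors x2, x3,
# connected by factors f_a(x1, x2) and f_b(x1, x3)
f_a = rng.uniform(0.1, 1.0, size=(K, K))
f_b = rng.uniform(0.1, 1.0, size=(K, K))

# Max-product messages from each factor to the root:
#   mu_a(x1) = max_{x2} f_a(x1, x2),  mu_b(x1) = max_{x3} f_b(x1, x3)
mu_a = f_a.max(axis=1)
mu_b = f_b.max(axis=1)

# Final maximization at the root: product of all incoming messages
max_val = (mu_a * mu_b).max()

# Brute-force check over all joint configurations
brute = max(f_a[x1, x2] * f_b[x1, x3]
            for x1, x2, x3 in product(range(K), repeat=3))
assert np.isclose(max_val, brute)
```

Because x2 and x3 each appear in only one factor, the maximizations over them decompose into per-factor messages, which is exactly the structure the tree factor graph guarantees.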
Use of log probabilities

• Products of probabilities can lead to numerical underflow problems
• Convenient to work with the logarithm of the joint distribution
  – The logarithm is monotonic, so it commutes with the max operator
• Has the effect of replacing the products in the max-product algorithm with sums
  – The distributive law becomes max(a+b, a+c) = a + max(b, c)
• Thus we obtain the max-sum algorithm
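A small demonstration of why the log domain matters, using the chain recursion from earlier (chain length and potential ranges are illustrative assumptions chosen to force underflow):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 400, 3                     # long chain: product of 399 small potentials underflows
psi = [rng.uniform(1e-3, 1e-2, size=(K, K)) for _ in range(N - 1)]

# Max-product in the probability domain underflows to exactly 0.0
msg = np.ones(K)
for n in reversed(range(N - 1)):
    msg = (psi[n] * msg[None, :]).max(axis=1)
print(msg.max())                  # 0.0 (underflow)

# Max-sum: the same recursion on log-potentials, with products replaced by sums
log_msg = np.zeros(K)
for n in reversed(range(N - 1)):
    log_msg = (np.log(psi[n]) + log_msg[None, :]).max(axis=1)
print(log_msg.max())              # finite log of the true maximum
```

The probability-domain messages collapse to zero long before the end of the chain, while the log-domain messages stay finite throughout.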