An Introduction to the Perceptron Algorithm
Haoji Hu
22 November 2011
1 The Problem
Suppose there are $N$ vectors in a $p$-dimensional vector space. Each of these $N$
vectors belongs to one of two classes, $C_1$ and $C_2$. Our task is to find a
hyperplane that separates these vectors according to the classes they belong to.
Figure 1: A hyperplane (or a line) which separates two-dimensional vectors into
two classes.
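For a concrete picture of this setting (a minimal sketch in Python; the means, spread, and sample sizes are arbitrary choices of illustration, not from the note), one can generate two linearly separable clouds of two-dimensional points like those in Figure 1:

    import numpy as np

    rng = np.random.default_rng(0)
    # Two well-separated Gaussian clouds; with these means and this spread
    # they are linearly separable with overwhelming probability.
    X1 = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))     # class C1
    X2 = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))   # class C2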
2 The Mathematical Description
Here we formulate the problem in a more mathematical form. Suppose $x_1,
x_2, \ldots, x_N$ are $p$-dimensional vectors. Let us define their extended vectors
$\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N$ as follows: if $x_i \in C_1$, then

$$\vec{x}_i = \begin{bmatrix} x_i \\ 1 \end{bmatrix};$$

if $x_i \in C_2$, then

$$\vec{x}_i = \begin{bmatrix} -x_i \\ -1 \end{bmatrix}.$$

All extended vectors are $(p+1)$-dimensional. The problem thus becomes
finding a $(p+1)$-dimensional vector $\omega$ such that, for each $i = 1, 2, \ldots, N$,

$$\omega^T \vec{x}_i > 0.$$
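As a concrete illustration (a sketch; the function name extend is my own, not from the note), the construction can be written in a few lines of Python: class-$C_1$ vectors get a trailing $1$, and class-$C_2$ vectors are negated and get a trailing $-1$, so both classes are covered by the single inequality $\omega^T \vec{x}_i > 0$:

    import numpy as np

    def extend(X1, X2):
        """Build the (p+1)-dimensional extended vectors from two classes."""
        ext1 = np.hstack([X1, np.ones((len(X1), 1))])    # [x_i; 1] for C1
        ext2 = -np.hstack([X2, np.ones((len(X2), 1))])   # [-x_i; -1] for C2
        return np.vstack([ext1, ext2])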
3 The Algorithm
The perceptron algorithm was proposed by Frank Rosenblatt in 1957. It
is simple but significant, and it foreshadowed the fields of machine
learning and pattern recognition. The procedure of the algorithm is shown in
Algorithm 1.
Input: $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N$
Output: $\omega$

$\omega = [0, 0, \ldots, 0]^T$;
FLAG = 1;
while FLAG do
    FLAG = 0;
    for $i = 1:N$ do
        if $\omega^T \vec{x}_i \le 0$ then
            $\omega = \omega + \vec{x}_i$;
            FLAG = 1;
        end
    end
end
return $\omega$;

Algorithm 1: The Perceptron Algorithm
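For concreteness, a direct Python transcription of Algorithm 1 might look as follows (a minimal sketch; the function name perceptron and the NumPy representation of the extended vectors are my own choices, not from the note):

    import numpy as np

    def perceptron(X_ext):
        """Run Algorithm 1 on the rows of X_ext (the extended vectors)."""
        w = np.zeros(X_ext.shape[1])   # w starts as the zero vector
        flag = True
        while flag:
            flag = False
            for x in X_ext:
                if w @ x <= 0:         # x violates w^T x > 0
                    w = w + x          # correction step
                    flag = True        # force another full pass
        return w

Note that the loop terminates only if the extended vectors are linearly separable; on non-separable data it runs forever, exactly as the pseudocode would.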
4 The Proof of the Algorithm’s Convergence
Theorem 1 For $N$ vectors $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N$ in the vector space, if there exists
a vector $\omega_{opt}$ such that $\omega_{opt}^T \vec{x}_i > 0$ for all $i \in \{1, 2, \ldots, N\}$, then the per-
ceptron algorithm described in Algorithm 1 will eventually find a vector $\omega$ such
that $\omega^T \vec{x}_i > 0$ for all $i \in \{1, 2, \ldots, N\}$. The convergence does not depend on the
initial selection of $\omega$.
Proof Without loss of generality, we suppose that $\|\omega_{opt}\| = 1$ (think about
why we can make such an assumption).
We further define $\omega(k)$ as the value of $\omega$ at the $k$th iteration. Then there are two
cases:
1. If $\omega(k)^T \vec{x}_i > 0$ for all $i \in \{1, 2, \ldots, N\}$, the theorem has been proved.

2. Otherwise, there exists at least one $i \in \{1, 2, \ldots, N\}$ for which $\omega(k)^T \vec{x}_i \le 0$.
Then, based on the perceptron algorithm,

$$\omega(k+1) = \omega(k) + \vec{x}_i,$$

which means that

$$\omega(k+1) - a\,\omega_{opt} = \omega(k) - a\,\omega_{opt} + \vec{x}_i.$$

Taking the squared norm of each side, we obtain

$$\begin{aligned}
\|\omega(k+1) - a\,\omega_{opt}\|^2 &= \|\omega(k) - a\,\omega_{opt} + \vec{x}_i\|^2 \\
&= \|\omega(k) - a\,\omega_{opt}\|^2 + \|\vec{x}_i\|^2 + 2\,\omega(k)^T \vec{x}_i - 2a\,\omega_{opt}^T \vec{x}_i.
\end{aligned}$$
Here $a$ is a positive number whose value we will choose shortly. Because $\omega(k)^T \vec{x}_i \le 0$,
we have

$$\|\omega(k+1) - a\,\omega_{opt}\|^2 \le \|\omega(k) - a\,\omega_{opt}\|^2 + \|\vec{x}_i\|^2 - 2a\,\omega_{opt}^T \vec{x}_i.$$

Here we define $\beta = \max_{i=1}^{N} \|\vec{x}_i\|$ and $\gamma = \min_{i=1}^{N} \left(\omega_{opt}^T \vec{x}_i\right)$ ($\beta > 0$ and $\gamma > 0$,
why?). Since $\|\vec{x}_i\|^2 - 2a\,\omega_{opt}^T \vec{x}_i \le \beta^2 - 2a\gamma$, taking $a = \frac{\beta^2 + 1}{2\gamma}$ gives
$\beta^2 - 2a\gamma = -1$, and therefore

$$\|\omega(k+1) - a\,\omega_{opt}\|^2 \le \|\omega(k) - a\,\omega_{opt}\|^2 - 1. \qquad (1)$$

If we take $a = \frac{\beta^2 + 1}{2\gamma}$, then $\|a\,\omega_{opt}\| = a = \frac{\beta^2 + 1}{2\gamma}$. Define the distance between the
initial vector $\omega(0)$ and $a\,\omega_{opt}$ as $D = \|\omega(0) - a\,\omega_{opt}\|$. Based on Eqn. (1), each
correction decreases the squared distance from $\omega$ to $a\,\omega_{opt}$ by at least 1; since the
squared distance can never become negative, the algorithm performs at most $D^2$
corrections before $\omega^T \vec{x}_i > 0$ holds for all $i$.

Thus, we have finished the proof.
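The bound can also be checked numerically (a sketch reusing X1, X2, extend, and perceptron from the earlier snippets; the separating direction w_opt below is hand-picked for that particular toy data, not computed):

    import numpy as np

    def perceptron_count(X_ext):
        """Algorithm 1, additionally counting the correction steps."""
        w, updates, flag = np.zeros(X_ext.shape[1]), 0, True
        while flag:
            flag = False
            for x in X_ext:
                if w @ x <= 0:
                    w, updates, flag = w + x, updates + 1, True
        return w, updates

    X_ext = extend(X1, X2)

    # A unit-norm separating direction for the toy data.
    w_opt = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
    assert (X_ext @ w_opt).min() > 0   # w_opt really separates the data

    beta = max(np.linalg.norm(x) for x in X_ext)   # beta = max ||x_i||
    gamma = (X_ext @ w_opt).min()                  # gamma = min w_opt^T x_i
    a = (beta**2 + 1) / (2 * gamma)
    D2 = a**2   # w(0) = 0 and ||w_opt|| = 1, so D = ||a w_opt|| = a

    w, updates = perceptron_count(X_ext)
    print(f"{updates} corrections, bound D^2 = {D2:.1f}")   # updates <= D2

On well-separated data the actual number of corrections is typically far below $D^2$; the theorem only guarantees the worst case.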