中国矿业大学 计算机科学与技术学院 《模式识别》实验报告
实验报告要求:1.实验目的 2.实验内容 3.实验原理或步骤 4.运行结果和分析 5.实验体会与总结
一、实验目的
1.掌握监督学习方法与非监督学习方法的区别,2.对动态聚类方法有深入的理解,了解其步骤,3.掌握 C 均值算
法的基本思想、步骤,4.会熟练运用 C 均值算法进行聚类,5、通过 c 均值算法具体实现,熟练掌握其思想。
二、实验内容
写程序实现 c 均值算法,并用下表中的三维数据进行测试,下面给出了每种测试的类别数目和初始值。不
要求编程环境,可以使用 C,MATLAB 等语言来实现。
(1)c=2, m1(0)=(1,1,1)T, m2(0)=(-1,1,-1)T。 (2)c=2, m1(0)=(0,0,0)T, m2(0)=(1,1,-1)T。将(2)得到的结果与(1)
中的结果进行比较,并解释差别,包括迭代次数的差别。
( 3 ) c=3, m1(0)=(0,0,0)T, m2(0)=(1,1,1)T, m3(0)=(-1,0,2)T 。( 4 ) c=3, m1(0)=(-0.1,0,0.1)T, m2(0)=(0,-0.1,0.1)T,
m3(0)=(-0.1,-0.1,0.1)T。将(4)得到的结果与(3)中的结果进行比较,并解释差别,包括迭代次数的差别。
数据如下表所示:
样本编号
1
2
3
4
5
6
7
8
9
10
1x
-7.82
-6.68
4.36
6.72
-8.64
-6.87
4.47
6.73
-7.71
-6.91
2x
-4.58
3.16
-2.19
0.88
3.06
0.57
-2.62
-2.01
2.34
-0.49
3x
-3.97
2.71
2.09
2.80
3.50
-5.45
5.76
4.18
-6.33
-5.68
样本编号
11
12
13
14
15
16
17
18
19
20
1x
6.18
6.72
-6.25
-6.94
8.09
6.18
-5.19
-6.38
4.08
6.27
2x
2.81
-0.93
-0.26
-1.22
0.20
0.17
4.24
-1.74
1.30
0.93
3x
5.82
-4.04
0.56
1.13
2.25
-4.15
4.04
1.43
5.33
-2.78
三、实验原理或步骤
四、
(一)实验原理
首先该实验的前提是知道聚类的个数 C。
设 Ni 是第 i 聚类Γi 中的样本数目,mi 是样本均值
m
i
1
N
yi
i
y
,我们可以设计一个误差平方和聚类准则
Je,表示为:
J
e
c
i
1
y
i
2
my
i
,用该准则函数来度量用 C 个聚类中心 m1、m2、…、mi 代表的 C 个样本子
集Γ1、Γ2、…、Γc 时所产生的总的误差平方。因此为了使结果最优,我们只需使 Je 尽可能的小,越小就证明
结果越优。
(二)实验步骤:
(1)用 matlab 编出这些点的显示程序
(2) 计 算 出 每 个 聚 类 的 均 值 m1 、m2 、 … 、mi , 计 算 公 式 为
m
i
1
N
yi
i
y
; 再 计 算 出 Je , 计 算 公 式 为
J
e
c
i
1
y
i
2
my
i
。其中 y 指的是每一个样本点。
1
(3)选择一个 y,将其从Γi 移动到Γk 中,再重新计算 Je,如果使得 Je 的值变小,则执行移动,如果未变小,
则不执行。
(4)不断重复该迭代算发,知道连续迭代 N 次 Je 均不再变化为止。
程序:(对于划分成两类)
clear all;
a=[-7.28,-4.58,2.71;-6.68,3.16,-3.97;4.36,-2.19,2.09;6.72,0.88,2.80;-8.64,3.06,3.50;-6.87,0.57,-5.45;4.47,-2.62,
5.76;6.73,-2.01,4.18;-7.71,2.34,-6.33;
-6.91,-0.49,-5.68;6.18,2.81,5.82;6.72,-0.93,-4.04;-6.25,-0.26,0.56;-6.94,-1.22,1.13;8.09,0.20,2.25;6.18,0.17,-4.
15;-5.19,4.24,4.04;-6.38,-1.74,1.43;
4.08,1.30,5.33;6.27,0.93,-2.78
];
sum1=[0,0,0];
sum2=[0,0,0];
num=0;%ÏÔʾµü´ú´ÎÊý
m10=[1,1,1];%ÖÐÐÄλÖÃ
m20=[-1,1,-1];%ÖÐÐÄλÖÃ
%m10=[0,0,0];
%m20=[1,1,-1];
m0=[m10;m20];
site=zeros(1,20);%ÓÃÀ´±ê¼ÇÑù±¾µãÊôÓÚÄÄÒ»¸ö¾ÛÀ࣬Ϊ1ÊôÓÚ1¾ÛÀ࣬Ϊ2ÊôÓÚ2¾ÛÀà
m=0;k=0;
Je = 0;Je1 = 0;%Je±íʾ³õʼÎó²îƽ·½ºÍµü´úºóÎó²îºÍ
%Ñù±¾³õʼ»®·Ö
for i=1:20
s=1;%±ê¼ÇÊôÓÚÄÄÒ»¸öÀà
dis0 =(a(i,1)-m0(1,1))^2+(a(i,2)-m0(1,2))^2+(a(i,3)-m0(1,3))^2;%ÇóºÍm1µÄ¾àÀë
j = 2;
dis1 =(a(i,1)-m0(j,1))^2+(a(i,2)-m0(j,2))^2+(a(i,3)-m0(j,3))^2;%ÇóºÍm2µÄ¾àÀë
if (dis1 < dis0)
dis0 = dis1;
s = j;
end
site(1,i) = s;
end
%¼ÆËã³õʼµÄƽ¾ùÖµ¼´¾ÛÀàÖÐÐÄ
for i = 1:20
if(site(1,i) == 1)
sum1 = sum1 + a(i,:);
k = k + 1;
else
sum2 = sum2 + a(i,:);
m = m + 1;
end
end
2
m1=sum1/k;
m2=sum2/m;
for i = 1:20
if(site(1,i) == 1)
Je = Je + (a(i,1)-m1(1,1))^2+(a(i,2)-m1(1,2))^2+(a(i,3)-m1(1,3))^2;
else
Je = Je +(a(i,1)-m2(1,1))^2+(a(i,2)-m2(1,2))^2+(a(i,3)-m2(1,3))^2;
end
end
while(1)
p=0;
for i=1:20
%±ê¼ÇÔÚÕâ´ÎÑ-»·ÖÐʱºòÓеã¸Ä±äµØ·½
sum1 = [0,0,0];sum2 = [0,0,0];
if( site(1,i) == 1 )%´ËÌõ¼þÅжÏÏ൱ÓÚµü´ú¹ý³ÌÖеÚËIJ½
y1 = (a(i,1)-m1(1,1))^2+(a(i,2)-m1(1,2))^2+(a(i,3)-m1(1,3))^2*k/(k-1);
y2 =(a(i,1)-m2(1,1))^2+(a(i,2)-m2(1,2))^2+(a(i,3)-m2(1,3))*m/(m+1);
if(y2 <= y1)
site(1,i) = 2;
num = num + 1;
p=1;
end
else
y1 = (a(i,1)-m2(1,1))^2+(a(i,2)-m2(1,2))^2+(a(i,3)-m2(1,3))*m/(m-1);
y2 = (a(i,1)-m1(1,1))^2+(a(i,2)-m1(1,2))^2+(a(i,3)-m1(1,3))^2*k/(k+1);
if(y2 <= y1)
site(1,i) = 1;
num = num + 1;%µü´ú´ÎÊý¼Ó
p=1;
end
end
%ÖØÐ¼ÆËãÖÐÐÄλÖÃcenter1,center2
k = 0;m = 0;j=1;
while(j<=20)
if(site(1,j) == 1)
sum1 = sum1 + a(j,:);
k = k + 1;
else
sum2 = sum2 + a(j,:);
m = m + 1;
end
j=j+1;
end
m1 = sum1 / k;
3
m2 = sum2 / m;
i=i+1;
end
if(p==0)
break;
end
num=num+1;
end
figure(1)
i=1;
while(i<=20)
if(site(1,i) == 1)
Je1 = Je1 +(a(i,1)-m1(1,1))^2+(a(i,2)-m1(1,2))^2+(a(i,3)-m1(1,3))^2 ;
figure(1)
plot3(a(i,1),a(i,2),a(i,3),'g*')
hold on
else
Je1 = Je1 +(a(i,1)-m2(1,1))^2+(a(i,2)-m2(1,2))^2+(a(i,3)-m2(1,3))^2;
plot3(a(i,1),a(i,2),a(i,3),'r.')
end
i=i+1;
end
Je
Je1
num
site
程序(对于划分成三类)
m10 = [0,0,0] ; m20 = [1,1,1];m30= [-1,0,2]; %ʵÑéÈýµÄ³õʼÖÐÐÄλÖÃ
%m10 = [-0.1,0,0.1] ; m20 = [0,-0.1,0.1];m30 = [-0.1,-0.1,0.1] ;%ʵÑéËĵijõʼÖÐÐÄλÖÃ
m = 0;k = 0;n = 0;%¼Ç¼¾ÛÀàc1,c2,c3µÄÑù±¾Êý
sum1=[0,0,0];sum2=[0,0,0];sum3=[0,0,0];
y1 = 0;y2 = 0;y3 = 0;%·Ö±ð¼Ç¼Îó²îƽ·½ºÍµÄ¼õÉÙºÍÔö¼Ó
Je = 0;Je1 = 0;%Je±íʾ³õʼÎó²îƽ·½ºÍµü´úºóÎó²îºÍ
num = 0;%iteration_num±íʾµü´úµÄ´ÎÊý
a = [-7.82 -4.58 -3.97;-6.68 3.16 2.71;4.36 -2.19 2.09;
6.72 0.88 2.80;-8.64 3.06 3.50;-6.87 0.57 -5.45;
4.47 -2.62 5.76;6.73 -2.01 4.18;-7.71 2.34 -6.33;
-6.91 -0.49 -5.68;6.18 2.81 5.82;6.72 -0.93 -4.04;
-6.25 -0.26 0.56;-6.94 -1.22 1.13;8.09 0.20 2.25;
6.18 0.17 -4.15;-5.19 4.24 4.04;-6.38 -1.74 1.43;
4.08 1.30 5.33;6.27 0.93 -2.78];%Ñù±¾Öµ
site = zeros(1,20);%±ê¼ÇÑù±¾ÊôÓÚÄÄÒ»Àà
m1= [m10;m20;m30];
4
%»®·Ö³õʼÑù±¾¼¯
for i=1:20
l = 1;
current_dis =(a(i,1)-m1(1,1))^2+(a(i,2)-m1(1,2))^2+(a(i,3)-m1(1,3))^2;
for j = 2:3
dis = (a(i,1)-m1(j,1))^2+(a(i,2)-m1(j,2))^2+(a(i,3)-m1(j,3))^2;%·Ö±ð¼ÆËãµ½µÚ¶þÀàµÚÈýÀàµÄ¾àÀë
if (dis
end
end
%c¾ùÖµËã·¨¹ý³Ì
while(1)
for i=1:20
d=hist(site,unique(site));
p=0;
sum1 = [0,0,0];sum2 = [0,0,0];sum3 = [0,0,0];
if( site(1,i) == 1 )%´ËÌõ¼þÅжÏÏ൱ÓÚc¾ùÖµËã·¨¹ý³ÌÖеÚËIJ½
if(d(1,1)<=1)
continue;
end
y1 = ((a(i,1)-m1(1,1))^2+(a(i,2)-m1(1,2))^2+(a(i,3)-m1(1,3))^2)*k/(k-1);
y2 = ((a(i,1)-m2(1,1))^2+(a(i,2)-m2(1,2))^2+(a(i,3)-m2(1,3))^2)*m/(m+1);
y3 = ((a(i,1)-m3(1,1))^2+(a(i,2)-m3(1,2))^2+(a(i,3)-m3(1,3))^2)*n/(n+1);
if(y2 < y1 && y2 <= y3)
site(1,i) = 2;
p=1;
elseif(y3 < y1 && y3 <= y2)
site(1,i) = 3;
p=1;
end
elseif( site(1,i) == 2 )
if(d(1,2)<=1)
continue;
end
y1 = ((a(i,1)-m2(1,1))^2+(a(i,2)-m2(1,2))^2+(a(i,3)-m2(1,3))^2)*m/(m-1);
y2 = ((a(i,1)-m1(1,1))^2+(a(i,2)-m1(1,2))^2+(a(i,3)-m1(1,3))^2)*k/(k+1);
y3 = ((a(i,1)-m3(1,1))^2+(a(i,2)-m3(1,2))^2+(a(i,3)-m3(1,3))^2)*n/(n+1);
if(y2 < y1 && y2 <= y3)
site(1,i) = 1;
p=1;
elseif(y3 < y1 && y3 <= y2)
site(1,i) = 3;
p=1;
end
elseif(site(1,i) == 3)
if(d(1,3)<=1)
continue;
end
y1 = ((a(i,1)-m3(1,1))^2+(a(i,2)-m3(1,2))^2+(a(i,3)-m3(1,3))^2)*n/(n+1);
y2 = ((a(i,1)-m1(1,1))^2+(a(i,2)-m1(1,2))^2+(a(i,3)-m1(1,3))^2)*k/(k+1);
y3 = ((a(i,1)-m2(1,1))^2+(a(i,2)-m2(1,2))^2+(a(i,3)-m2(1,3))^2)*m/(m+1);
if(y2 < y1 && y2 <= y3)
6
site(1,i) = 1;
p=1;
elseif(y3 < y1 && y3 <= y2)
site(1,i) = 2;
p=1;
end
end
num=num+1;
%ÖØÐ¼ÆËãÖÐÐÄλÖÃcenter1,center2,center3
k = 0;m = 0;n = 0;
for j = 1:20
if(site(1,j) == 1)
sum1 = sum1 + a(j,:);
k = k + 1;
elseif(site(1,j) == 2)
sum2 = sum2 + a(j,:);
m = m + 1;
elseif(site(1,j) == 3)
sum3 = sum3 + a(j,:);
n = n + 1;
end
end
m1 = sum1 / k;
m2 = sum2 / m;
m3 = sum3 / n;
end
if(p==0)
break;
end
end
%ÖØÐ¼ÆËãÎó²îºÍ
figure(1)
for i=1:20
if(site(1,i) == 1)
Je1 = Je1 +((a(i,1)-m1(1,1))^2+(a(i,2)-m1(1,2))^2+(a(i,3)-m1(1,3))^2)*m/(m+1);
plot3(a(i,1),a(i,2),a(i,3),'*')
hold on
else
if( site(1,i) == 2)
Je1 = Je1 +((a(i,1)-m1(1,1))^2+(a(i,2)-m1(1,2))^2+(a(i,3)-m1(1,3))^2)*k/(k+1);
plot3(a(i,1),a(i,2),a(i,3),'r.')
else
if( site(1,i) == 3)
Je1 = Je1 +((a(i,1)-m1(1,1))^2+(a(i,2)-m1(1,2))^2+(a(i,3)-m1(1,3))^2)*n/(n+1);
plot3(a(i,1),a(i,2),a(i,3),'g.')
7
end
end
end
end
Je
Je1
num
site
四、运行结果和分析
二聚类结果
(1)
10
5
0
-5
-10
5
(2)
0
0
-5
-5
-10
10
5
8