
K-Means Clustering Algorithm: Matlab Code

function y=kMeansCluster(m,k,isRand)
%%%%%%%%%%%%%%%%
%
% kMeansCluster - Simple k-means clustering algorithm
% Author: Kardi Teknomo, Ph.D.
%
% Purpose: classify the objects in the data matrix based on their attributes
% Criteria: minimize Euclidean distance between centroids and object points
% For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html
% Output: the data matrix plus an additional column giving the group of each object
%
% Example: m = [ 1 1; 2 1; 4 3; 5 4] or in a nice form
%          m = [ 1 1;
%                2 1;
%                4 3;
%                5 4]
%          k = 2
% kMeansCluster(m,k) produces m = [ 1 1 1;
%                                   2 1 1;
%                                   4 3 2;
%                                   5 4 2]
% Input:
%   m      - required, matrix data: objects in rows and attributes in columns
%   k      - optional, number of groups (default = 1)
%   isRand - optional, isRand=0 (default) assigns the first k data rows as
%            initial centroids; any nonzero value selects random initialization
%
% Local Variables
%   f      - row numbers of the data that belong to group i
%   c      - centroid coordinates, size (1:k, 1:maxCol)
%   g      - current iteration group matrix, size (1:maxRow)
%   i      - scalar iterator
%   maxCol - scalar, number of columns in the data matrix m = number of attributes
%   maxRow - scalar, number of rows in the data matrix m = number of objects
%   temp   - previous iteration group matrix, size (1:maxRow)
%   z      - minimum value (not needed)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

if nargin<3, isRand=0; end
if nargin<2, k=1; end

[maxRow, maxCol]=size(m);
if maxRow<=k,
    y=[m, (1:maxRow)'];     % no more objects than groups: each object is its own group
else
    % initial value of centroid
    if isRand,
        p = randperm(size(m,1));    % random initialization
        for i=1:k
            c(i,:)=m(p(i),:);
        end
    else
        for i=1:k
            c(i,:)=m(i,:);          % sequential initialization
        end
    end

    temp=zeros(maxRow,1);   % initialize as zero vector

    while 1,
        d=DistMatrix(m,c);  % calculate objects-centroid distances
        [z,g]=min(d,[],2);  % find group matrix g
        if g==temp,
            break;          % stop the iteration
        else
            temp=g;         % copy group matrix to temporary variable
        end
        for i=1:k
            f=find(g==i);
            if ~isempty(f)  % only compute centroid if f is not empty
                c(i,:)=mean(m(f,:),1);
            end
        end
    end

    y=[m,g];
end

The Matlab function kMeansCluster above calls the function DistMatrix, shown in the code below. It computes Euclidean distance for data with any number of dimensions.

function d=DistMatrix(A,B)
%%%%%%%%%%%%%%%%%%%%%%%%%
% DISTMATRIX returns the distance matrix between points in A=[x1 y1 ... w1] and in B=[x2 y2 ... w2]
% Copyright (c) 2005 by Kardi Teknomo, http://people.revoledu.com/kardi/
%
% The numbers of rows (representing points) in A and B are not necessarily the same.
% It can be used for distance-in-a-slice (Spacing) or distance-between-slice (Headway).
%
% A and B must contain the same number of columns (representing variables of n dimensions);
% the first column is the X coordinates, the second column is the Y coordinates, and so on.
% The distance matrix is the distance between points in A as rows
% and points in B as columns.
% example: Spacing = DistMatrix(A,A)
%          Headway = DistMatrix(A,B), with hA ~= hB or hA=hB
%          A=[1 2 3; 4 5 6; 2 4 6; 1 2 3]; B=[4 5 1; 6 2 0]
%          DistMatrix(A,B)= [ 4.69 5.83;
%                             5.00 7.00;
%                             5.48 7.48;
%                             4.69 5.83]
%
%          DistMatrix(B,A)= [ 4.69 5.00 5.48 4.69;
%                             5.83 7.00 7.48 5.83]
%%%%%%%%%%%%%%%%%%%%%%%%%%%
[hA,wA]=size(A);
[hB,wB]=size(B);
if wA ~= wB, error('second dimension of A and B must be the same'); end
S=zeros(hA,hB);
for k=1:wA
    % squared difference in coordinate k between every point of A (rows) and of B (columns)
    S=S+(repmat(A(:,k),1,hB)-repmat(B(:,k)',hA,1)).^2;
end
d=sqrt(S);
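
To make the iteration concrete, the script below traces the example from the kMeansCluster header (m = [1 1; 2 1; 4 3; 5 4], k = 2) with sequential initialization. It is a minimal sketch, not the author's code: the loop is inlined so the script runs on its own, the distance step repeats the DistMatrix computation in place, and the counter iter and the fprintf trace are additions for illustration only.

```matlab
% Standalone trace of the header example (illustrative sketch).
m = [1 1; 2 1; 4 3; 5 4];       % four objects, two attributes
k = 2;                          % number of groups
c = m(1:k, :);                  % sequential initialization: first k objects
temp = zeros(size(m, 1), 1);    % previous group assignment
iter = 0;                       % hypothetical counter, not in the original function

while true
    % Object-to-centroid Euclidean distances (same computation as DistMatrix(m, c))
    S = zeros(size(m, 1), k);
    for j = 1:size(m, 2)
        S = S + (repmat(m(:, j), 1, k) - repmat(c(:, j)', size(m, 1), 1)).^2;
    end
    d = sqrt(S);

    [~, g] = min(d, [], 2);     % index of the nearest centroid for each object
    if isequal(g, temp)
        break;                  % assignments unchanged: converged
    end
    temp = g;
    iter = iter + 1;
    fprintf('iteration %d: groups = [%s]\n', iter, num2str(g'));

    % Move each centroid to the mean of its members (skip empty groups)
    for i = 1:k
        f = find(g == i);
        if ~isempty(f)
            c(i, :) = mean(m(f, :), 1);
        end
    end
end

y = [m, g];
disp(y)
```

The first pass assigns only the first object to group 1. The second pass moves object (2,1) into group 1, and the third pass leaves the assignment unchanged, so the loop stops. The final y matches the matrix shown in the header example, with centroids (1.5, 1) and (4.5, 3.5).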