logo资料库

K means clustering algorithm.doc

第1页 / 共4页
第2页 / 共4页
第3页 / 共4页
第4页 / 共4页
资料共4页,全文预览结束
function y=kMeansCluster(m,k,isRand) %%%%%%%%%%%%%%%% % % kMeansCluster - Simple k means clustering algorithm % Author: Kardi Teknomo, Ph.D. % % Purpose: classify the objects in data matrix based on the attributes % Criteria: minimize Euclidean distance between centroids and object points % For more explanation of the algorithm, see http://people.revoledu.com/kardi/tutorial/kMean/index.html % Output: matrix data plus an additional column represent the group of each object % % Example: m = [ 1 1; 2 1; 4 3; 5 4] % % % % % % kMeansCluster(m,k) produces m = [ 1 1 1; 2 1 1; % % 4 3 2; % 5 4 2] % Input: % columns % % input any number (default) % % % Local Variables % % % % % attributes % objects % % f c g i maxCol - scalar number of rows in the data matrix m = number of - row number of data that belong to group i - centroid coordinate size (1:k, 1:maxCol) - current iteration group matrix size (1:maxRow) - scalar iterator - optional, number of groups (default = 1) k isRand - optional, if using random initialization isRand=1, otherwise maxRow - scalar number of columns in the data matrix m = number of temp z - previous iteration group matrix size (1:maxRow) - minimum value (not needed) m - required, matrix data: objects in rows and attributes in or in a nice form m = [ 1 1; 2 1; 4 3; 5 4] k = 2 it will assign the first k data as initial centroids
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% if nargin<3, if nargin<2, isRand=0; k=1; end end [maxRow, maxCol]=size(m) if maxRow<=k, y=[m, 1:maxRow] else % initial value of centroid if isRand, p = randperm(size(m,1)); for i=1:k c(i,:)=m(p(i),:) % random initialization end else end end for i=1:k c(i,:)=m(i,:) % sequential initialization temp=zeros(maxRow,1); % initialize as zero vector while 1, d=DistMatrix(m,c); [z,g]=min(d,[],2); if g==temp, % calculate objcets-centroid distances % find group matrix g break; % stop the iteration else temp=g; % copy group matrix to temporary variable end for i=1:k f=find(g==i); if f % only compute centroid if f is not empty c(i,:)=mean(m(find(g==i),:),1); end end end y=[m,g]; end
The Matlab function kMeansCluster above call function DistMatrix as shown in the code below. It works for multi-dimensional Euclidean distance. Learn about other type of distance here. function d=DistMatrix(A,B) %%%%%%%%%%%%%%%%%%%%%%%%% % DISTMATRIX return distance matrix between points in A=[x1 y1 ... w1] and in B=[x2 y2 ... w2] % Copyright (c) 2005 by Kardi Teknomo, http://people.revoledu.com/kardi/ % % Numbers of rows (represent points) in A and B are not necessarily the same. % It can be use for distance-in-a-slice (Spacing) or distance-between-slice (Headway), % % A and B must contain the same number of columns (represent variables of n dimensions), % first column is the X coordinates, second column is the Y coordinates, and so on. A=[1 2 3; 4 5 6; 2 4 6; 1 2 3]; B=[4 5 1; 6 2 0] dist(A,B)= [ 4.69 5.00 5.48 4.69 % The distance matrix is distance between points in A as rows % and points in B as columns. % example: Spacing= dist(A,A) % Headway = dist(A,B), with hA ~= hB or hA=hB % % % % % % % % %%%%%%%%%%%%%%%%%%%%%%%%%%% [hA,wA]=size(A); [hB,wB]=size(B); if wA ~= wB, error(' second dimension of A and B must be dist(B,A)= [ 4.69 5.83 5.00 7.00 5.83; 7.00; 7.48; 5.83] 5.48 7.48 4.69; 5.83] the same'); end for k=1:wA C{k}= repmat(A(:,k),1,hB); D{k}= repmat(B(:,k),1,hA); end S=zeros(hA,hB); for k=1:wA S=S+(C{k}-D{k}').^2;
end d=sqrt(S);
分享到:
收藏