cluster analysis - Modifying k-means package in R


Is it possible to modify the notion of distance in the kmeans package?

I have cyclical data, and I want to use an alternative notion of distance suited to it.

K-means shouldn't be used with other distances. It is not a distance-based algorithm: k-means optimizes the sum of squares. If you want a proper distance-based variation of k-means, it is called PAM. PAM is the proper way of applying the k-means idea with other distance functions.

This is because the mean is a least-squares estimator: it optimizes the sum of squares.
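To make the least-squares point concrete, here is a minimal sketch in plain Python (the data values and helper names are made up for illustration). It checks on a grid of candidate centers that the arithmetic mean minimizes the sum of squared deviations, and that it is not the best center once the cost is switched to absolute deviations.

```python
def sum_of_squares(points, center):
    """Sum of squared deviations from a candidate center."""
    return sum((x - center) ** 2 for x in points)


def sum_of_abs(points, center):
    """Sum of absolute deviations from a candidate center."""
    return sum(abs(x - center) for x in points)


# Illustrative 1-D data (hypothetical values).
points = [1.0, 2.0, 3.0, 10.0]
mean = sum(points) / len(points)

# Brute-force search over candidate centers 0.00, 0.01, ..., 10.00.
candidates = [i / 100 for i in range(0, 1001)]
best_for_squares = min(candidates, key=lambda c: sum_of_squares(points, c))

# Under absolute deviations, a median (here 2.5) beats the mean.
median = 2.5
abs_cost_at_mean = sum_of_abs(points, mean)
abs_cost_at_median = sum_of_abs(points, median)

print(mean, best_for_squares)
print(abs_cost_at_mean, abs_cost_at_median)
```

So the choice of center and the choice of cost function go together: changing one without the other breaks the optimization that k-means relies on.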

If you put a different distance function into k-means, it may stop converging. Here is a counterexample:

Consider the absolute correlation distance. The two series

+1 +2 +3 +4 +5
-1 -2 -3 -4 -5

are perfectly negatively correlated, so they have an absolute correlation distance of 0.

Taking the mean of these two vectors yields

 0  0  0  0  0

for which correlation is no longer defined. If you wiggle the numbers slightly, you can avoid this definition gap, but the vectors will nevertheless be dissimilar to their mean, because the mean optimizes squared deviations and does not find the best-correlation center.
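The counterexample can be checked numerically. The sketch below assumes that "absolute correlation distance" means 1 - |r|, with r the Pearson correlation. The helper names and the size of the wiggle (5 changed to 5.1) are my own choices for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)


def abs_corr_distance(x, y):
    """Assumed definition: 1 - |Pearson r|."""
    return 1.0 - abs(pearson(x, y))


a = [1, 2, 3, 4, 5]
b = [-1, -2, -3, -4, -5]
center = [(u + v) / 2 for u, v in zip(a, b)]  # all zeros

d_pair = abs_corr_distance(a, b)              # the two series coincide
d_to_center = abs_corr_distance(a, center)    # undefined (NaN)

# Wiggle one value so the mean is no longer constant.
a2 = [1, 2, 3, 4, 5.1]
center2 = [(u + v) / 2 for u, v in zip(a2, b)]
d_pair2 = abs_corr_distance(a2, b)            # still close to 0
d_a2_to_center = abs_corr_distance(a2, center2)
d_b_to_center = abs_corr_distance(b, center2)

print(d_pair, d_to_center)
print(d_pair2, d_a2_to_center, d_b_to_center)
```

In the wiggled case, each series is much closer to the other series than to the mean. The mean is therefore a worse "center" under this distance than either data point would be.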

The crux is the mean. If you want to use a different distance function, you need to substitute the mean function, too. One possible choice is the medoid, which yields PAM (partitioning around medoids).
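For the cyclical data in the question, a medoid-based approach could look roughly like the sketch below. Several things here are assumptions on my part: the data are hours on a 24-hour clock, the distance is the shorter way around the circle, and the loop is the simple alternating assign/update scheme rather than the full PAM build-and-swap procedure. All function names are hypothetical. In R itself, `pam` from the `cluster` package accepts a precomputed dissimilarity matrix, so in practice you would only need to supply the circular distances.

```python
def circular_distance(a, b, period=24):
    """Shorter way around a circle of the given period (assumed: hours)."""
    diff = abs(a - b) % period
    return min(diff, period - diff)


def medoid(points, dist):
    """The member of `points` with the smallest total distance to the rest."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))


def k_medoids(points, k, dist, max_iter=100):
    """Simplified alternating k-medoids (not the full PAM swap search)."""
    medoids = list(points[:k])  # naive deterministic initialization
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Update step: the center is a medoid, never an arithmetic mean.
        new_medoids = [
            medoid(c, dist) if c else medoids[i]
            for i, c in enumerate(clusters)
        ]
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, clusters


# Hypothetical data: two groups, one of which wraps around midnight.
hours = [23, 0, 1, 11, 12, 13]
medoids, clusters = k_medoids(hours, k=2, dist=circular_distance)

# For comparison: the arithmetic mean of the midnight group.
naive_mean = sum([23, 0, 1]) / 3

print(medoids, clusters)
print(naive_mean)
```

The arithmetic mean of 23, 0 and 1 is 8, which is nowhere near midnight. The medoid is always one of the actual observations, so it stays meaningful under whatever distance you plug in.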

