cluster analysis - Modifying k-means package in R
Is it possible to modify the notion of distance in the kmeans package?
I have cyclical data, and want to use an alternative notion of distance for it.
K-means shouldn't be used with other distances. It is not a distance-based algorithm: k-means minimizes the within-cluster sum of squares. If you want a proper distance-based variation of k-means, it is called PAM. PAM is the proper way of doing k-means-style clustering with other distance functions.
This is because the mean is a least-squares estimator: it is the center that minimizes the sum of squares.
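The least-squares property of the mean can be checked numerically. The following is a minimal sketch on a made-up sample (`x`, `ssq`, `sad` and the search grid are illustrative names and values, not part of the original answer): it searches a grid of candidate centers and compares the minimizers with the mean and the median.

```r
# Hypothetical one-dimensional sample, chosen only for illustration.
x <- c(1, 2, 3, 4, 10)

# Sum of squared deviations from a candidate center.
ssq <- function(center) sum((x - center)^2)
# Sum of absolute deviations from a candidate center.
sad <- function(center) sum(abs(x - center))

# Brute-force search over a grid of candidate centers.
centers  <- seq(0, 10, by = 0.01)
best_ssq <- centers[which.min(sapply(centers, ssq))]
best_sad <- centers[which.min(sapply(centers, sad))]

best_ssq   # coincides with mean(x)
best_sad   # coincides with median(x)
```

Changing the objective changes the best center, which is why swapping only the distance inside k-means while still averaging the points is inconsistent.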
If you put a different distance function into k-means, it may stop converging. Here is a counterexample:
Consider the absolute correlation distance. The two series
+1 +2 +3 +4 +5
-1 -2 -3 -4 -5
are perfectly negatively correlated, and so have absolute correlation distance 0.
Taking the mean of these two vectors yields
0 0 0 0 0
for which correlation is no longer defined. If you wiggle the numbers around slightly, you can avoid this definition gap, but the vectors will still be dissimilar to the mean, because the mean is optimized for squared deviations and will not find the best-correlation center.
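The counterexample can be reproduced in R. This is a sketch that assumes the absolute correlation distance is defined as `1 - abs(cor(a, b))`; the helper `abs_cor_dist` and the size of the perturbation are illustrative choices, not taken from the original answer.

```r
# Absolute correlation distance (assumed definition): 0 for perfectly
# positively or negatively correlated series, 1 for uncorrelated ones.
abs_cor_dist <- function(a, b) 1 - abs(cor(a, b))

a <- c(1, 2, 3, 4, 5)
b <- -a

abs_cor_dist(a, b)   # 0, up to floating-point error

# The mean of the two series is constant, so cor() returns NA
# (with a warning that the standard deviation is zero).
m <- (a + b) / 2
d_mean <- suppressWarnings(abs_cor_dist(a, m))
d_mean               # NA

# Wiggle b slightly (hypothetical perturbation) to avoid the gap.
b2 <- b + c(0.01, -0.01, 0.01, -0.01, 0.01)
m2 <- (a + b2) / 2

abs_cor_dist(a, b2)  # still very close to 0
abs_cor_dist(a, m2)  # close to 1: the mean is a poor "center"
```

The two series stay almost identical under this distance, yet their mean ends up about as far from both of them as the distance allows.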
The crucial point is the mean. If you want to use a different distance function, you need to substitute the mean function, too. One possible choice is the medoid, which yields PAM (partitioning around medoids).
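For the cyclical data in the question, one way this might look is sketched below using `pam()` from the `cluster` package. The data (hours on a 24-hour clock), the helper `circ_dist` and the choice `k = 2` are assumptions made for illustration only.

```r
library(cluster)  # provides pam()

# Hypothetical cyclical data: hours on a 24-hour clock. The first
# four values form a group that straddles midnight.
hours <- c(23, 23.5, 0.5, 1, 11, 11.5, 12, 12.5)

# Circular distance: the shorter way around the clock.
circ_dist <- function(a, b, period = 24) {
  d <- abs(a - b) %% period
  pmin(d, period - d)
}

# Pairwise dissimilarities, converted to a "dist" object for pam().
d <- as.dist(outer(hours, hours, circ_dist))

# pam() treats a "dist" object as precomputed dissimilarities.
fit <- pam(d, k = 2)

fit$clustering     # cluster label for each observation
hours[fit$id.med]  # medoids are actual observations, not averages
```

Because the medoids are chosen from the data and only pairwise dissimilarities are needed, no mean ever has to be computed on the circle. Euclidean k-means on the raw hours would instead treat 23 and 1 as 22 hours apart.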