K-Means

Algorithm

  1. Take K random points as the initial centroids.
  2. For each point, decide which centroid is closest; this forms the clusters.
  3. Move each centroid to the mean of its cluster.
  4. Repeat 2-3 until the centroids stop moving.
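The steps above can be sketched in plain NumPy (a minimal version; it does not handle the edge case where a cluster becomes empty):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-Means: random init, assign, update, repeat until stable."""
    rng = np.random.default_rng(seed)
    # 1. take K random points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # 2. assign each point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. move each centroid to the mean of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4. stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```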

K-Means++

K-Means++ is a smart initialization method. If two initial centroids land close together, K-Means often converges to a suboptimal solution. So each new centroid is picked with probability proportional to its squared distance from the nearest centroid already chosen:

$distance(x_i)^2 / \sum_{j=1}^{K} distance(x_j)^2$
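A sketch of this seeding rule (the probabilities are exactly the formula above; sklearn's built-in `init='k-means++'` is the production version):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """k-means++ seeding: sample each next centroid with probability
    proportional to its squared distance to the nearest chosen centroid."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]  # first centroid: uniform at random
    for _ in range(k - 1):
        # squared distance of each point to its nearest existing centroid
        d2 = np.min(
            ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2),
            axis=1,
        )
        probs = d2 / d2.sum()  # distance(x_i)^2 / sum_j distance(x_j)^2
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)
```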

Choosing the right K

Sometimes K is dictated by the problem, e.g.:
K = number of CPU cores
K is fixed to a target number of clusters

Inertia

Smaller values correspond to tighter clusters. But the value is sensitive to the number of points in the clusters.

$\sum_{i=1}^{n} distance(x_i, c_i)^2$, where $c_i$ is the centroid of the cluster containing $x_i$.

Distortion

Because distortion is averaged over the points, adding more points will not necessarily increase it (whereas inertia only grows).

$\frac{1}{n} \sum_{i=1}^{n} distance(x_i, c_i)^2$
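The two quantities differ only by the $1/n$ factor. A quick check, assuming points have already been assigned to centroids:

```python
import numpy as np

def inertia_and_distortion(X, labels, centroids):
    """Inertia: sum of squared distances to each point's assigned centroid.
    Distortion: the same sum averaged over the n points."""
    sq = ((X - centroids[labels]) ** 2).sum(axis=1)
    inertia = sq.sum()
    distortion = inertia / len(X)
    return inertia, distortion
```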

Elbow Method

For each K, we run K-Means several times and keep the run with the lowest distortion or inertia. Then plot the value against K and pick the point where the rate of decrease drops sharply (the "elbow").

(Figure: Choosing K)

Code

from sklearn.cluster import KMeans, MiniBatchKMeans

# 'k-means++' initialization; n_init restarts keep the best of 10 runs
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
kmeans = kmeans.fit(X1)
y_predict = kmeans.predict(X2)
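`MiniBatchKMeans` is imported above but not used; a minimal sketch of how it might be applied (the toy data here stands in for a dataset too large for full-batch K-Means):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# toy data standing in for a large dataset: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(8, 1, (500, 2))])

# MiniBatchKMeans updates centroids from small random batches,
# trading a little accuracy for much faster fitting on large data
mbk = MiniBatchKMeans(n_clusters=2, batch_size=100, n_init=10, random_state=0)
labels = mbk.fit_predict(X)
```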

Inertia

inertia = []
list_clusters = list(range(1, 10))  # k must be >= 1
for k in list_clusters:
    kmeans = KMeans(n_clusters=k)
    kmeans.fit(X1)
    inertia.append(kmeans.inertia_)
