#12 K-Medoids Clustering

K-Medoids is an unsupervised machine learning algorithm used for clustering data points into groups.
It is similar to K-Means, but instead of using the mean (centroid) of a cluster, it uses an actual data point called a medoid as the center of the cluster.

K-Medoids is more robust to ... noise, outliers, extreme values.

because medoids are real points from the dataset not the means like the K-means clustering.

What is a Medoid?

A medoid is the data point inside a cluster whose total distance to all other points is minimum.

Unlike K-means :

K-Means → uses average point
K-Medoids → uses actual data point

K-Medoids Objective

The algorithm tries to minimize total distance inside clusters.

Cost Function

$$\mathbf{Cost = \sum_{i=1}^{n} distance(x_i, m_j)}$$

where:

$$\mathbf{x_i = data\ point}$$

$$ \mathbf{m_j = medoid}$$

Steps of K-Medoids

Select K random medoids
Assign points to nearest medoid
Compute total clustering cost
Swap medoids with non-medoid points
Keep the swap if cost decreases
Repeat until cost no longer improves

K-means vs. K-medoids

Feature	K-Means	K-Medoids
Center	Mean (centroid)	Actual data point
Sensitive to outliers	Yes	No
Robustness	Lower	Higher
Speed	Faster	Slower
Best for	Large datasets	Noisy datasets

Example Dataset

Point	X	Y
A	1	2
B	2	3
C	8	9
D	9	10

choose k=2

Step 1 — Initialize Medoids

choose 2 random centroids. suppose C1 = (1,2), C2 = (8,9)

Distance Formula : Eucledian Distance

$$\mathbf{d=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}}$$

Step 2 - Cluster Assignment

Point	Distance to (M_1)	Distance to (M_2)	Assigned Cluster
A	0	9.89	Cluster 1
B	1.41	8.48	Cluster 1
C	9.89	0	Cluster 2
D	11.31	1.41	Cluster 2

Step 3 - Medoid Update (Check for better medoid)

Now we test whether replacing the current medoid reduces the total cluster cost.

Current medoid in Cluster 1: A(1,2)

Other possible points : B(2,3)

Calculate Total Cost using A as Medoid

$$\mathbf{d(A,B)=\sqrt{2}\approx1.41}$$

Calculate Total Cost using B as Medoid

$$\mathbf{d(B,A)=\sqrt{2}\approx1.41}$$

You can see the 2 costs are same ... so that the medoid will not be updated.

Actually here we have to find the total costs from a particular point ... as here only 2 points in a single cluster So the changes will not happen because distance from A to B and from B to A always be the same.

So this will be better understandable if there is >2 points per cluster.

Suppose if there is A, B, C in the cluster 1 then we have to calculate total 3 costs

if A is medoid : total cost Cost(A) = d(A,B) + d(A,C)

if B is medoid : total cost Cost(B) = d(B, A) + d(B, C)

if C is medoid : total cost Cost(C) = d(C, A) + d(C, B)

we have to find out min(Cos(A), Cost(B), Cost(C))

suppose Cost(B) is minimum

so the updated medoid will be B.

Advantages of K-Medoids

Robust to outliers
Uses real data points
Better for noisy datasets
Produces stable clusters

Disadvantages

Slower than K-Means
Expensive for large datasets
Requires distance calculations repeatedly

Applications of K-Medoids

Customer segmentation
Medical data clustering
Fraud detection
Recommendation systems
Document clustering

Conclusion

K-Medoids is a powerful clustering algorithm that groups similar data points using actual data points as cluster centers called medoids. Compared to K-Means, it is more resistant to noise and outliers, making it suitable for real-world datasets where extreme values may affect clustering quality.

#12 K-Medoids Clustering

What is a Medoid?