Capacitated Clustering with improved K-means in R

Often in a supply chain design, we need to divide the whole customer space into some segments/polygons/clusters so that, each of these segments will not cross a given threshold of demand. It is achieved with a special type of clustering: Capacitated clustering.

Capacitated Clustering refers to a type of clustering where the following two conditions are generally met:

  1. The points of the clusters will be in the neighborhood
  2. The total demand of the cluster will not exceed a certain given upper limit

The condition number 1 can be easily achieved with K-means or Hierarchical clustering but it is impossible to restrain the demand of the clusters to a specific limit with these two.

In general, the Capacitated clustering problem can be expressed in an LP form (Page 3: http://www.din.uem.br/~ademir/sbpo/sbpo2017/pdf/168777.pdf) but it is time-consuming.

I came across a paper (Eetha, S. (2009). Improved K-Means Algorithm for Capacitated Clustering Problem.) where Capacitated clustering was done by a variant of K-means. The logic adopted here seemed pretty straightforward and easy to use. I have modified it in some aspects and embedded that into R. The results I am getting are pretty encouraging.

The function takes 4 inputs:

  1. Input: This should contain the dataframe with the details of the customers with their respective Latitude, Longitude and Demands.
  2. Capacity constraint: This is the upper limit of capacity for each cluster.
  3. relax_c: This is the relaxation constant. Ideally this should be between 0.7-1. This factor will be multiplied by the capacity constraint so that the number of clusters will be more than the optimal number of clusters, hence we will have some flexibility. Default value is 0.8.
  4. iterations: This is the number of iterations to achieve a proper set of centroids for the clusters.

The Github link: https://github.com/shibaprasadb/capacitated_clustering_improved_kmeans

Example:

library(data.table)
set.seed(4)
#Creating a data frame
customer_details <- data.frame(lat=runif(50,77,78),
                               long=runif(50,23,24),
                               demand=runif(50,50,100))
#Calling the function
cap <- capacitated_clustering(customer_details, 500, 0.8, 15)

#Changing the format to convert it into data.table
cap_dt<- data.table(cap)
hulls = cap_dt[,.SD[chull(long,lat)],by=.(cluster)]

#Visualising
ggplot() +
  geom_point(data=cap_dt,aes(x=long,y=lat,color=as.factor(cluster))) +
  geom_polygon(data = hulls,aes(x=long, y=lat, fill=as.factor(cluster),alpha = 0.5))+
  theme_bw()+
  coord_equal()+
  theme_bw()+
  theme(legend.position = 'none')

Few overlapping is there, but overall it looks good!

If this can be improved in any way, then please let me know by leaving a comment.

Advertisement