Connecting the dots with heuristics and empirics

Warning: This is an opinionated article, based on the limited experience I have. The world is Bayesian, and I may update these beliefs as I come across more intriguing problems in the near future.

One of my professors at Jadavpur University always emphasized “connecting the dots”. He used to say that to become an asset to an organisation, one needs to master this art. Over my last 16 months of working, heuristics and empirics have been two of my biggest ‘friends’ when it boiled down to connecting the dots.

When I first started working, I was given a variant of a Facility Location Problem. My manager told me at the time: “Try coming up with your own heuristics. They are more flexible.” At first I found this hard to understand, but over time I realised how right he was.

When it comes to mathematical modelling (especially if you’re using an open-source solver), there are a few trade-offs that you need to take care of. If the problem size is huge and there are too many real-life constraints, then you’re better off designing your own heuristic than aiming for an exact solution. I remember one MILP I wrote where the formulation plus solving took almost 2 hours. I then changed my approach: it took me almost 3 days to come up with my own heuristic, but the solution time went down to ~100 secs. Yes! Not even kidding! The solution cost was higher than the exact optimum, but if one wanted to put the whole thing into production then the heuristic would have been the far better choice. I am planning to write an academic paper on this, so I won’t go into the details of the method right now.

Next, I remember I had data at the Pincode level for different Indian cities. I had to convert them to latitude and longitude points so that every generated point would lie approximately around the given Pincode. This has to be one of the most interesting data pre-processing tasks I have done so far. So what did I do? I used numerical methods! More precisely, I formulated the problem as an approximate equation and then solved it with the Newton-Raphson method (learned this in my 3rd sem of Engineering, never thought it would come in handy like this). Bam! Solved!
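The actual equation used for the Pincode mapping isn’t shown here, but the core Newton-Raphson iteration is simple to sketch in R. Below is a generic root finder, applied to a hypothetical sub-problem of the same flavour: solving for the latitude offset (in degrees) that covers a target great-circle distance, using the rough approximation of 111.32 km per degree of latitude.

```r
# Generic Newton-Raphson root finder (a sketch; the original pincode
# equation is not reproduced here).
newton_raphson <- function(f, f_prime, x0, tol = 1e-8, max_iter = 100) {
  x <- x0
  for (i in seq_len(max_iter)) {
    step <- f(x) / f_prime(x)   # Newton step: f(x) / f'(x)
    x <- x - step
    if (abs(step) < tol) break
  }
  x
}

# Hypothetical example: find the degree offset that spans `target_km`
# kilometres of latitude (about 111.32 km per degree).
target_km <- 5
f       <- function(x) 111.32 * x - target_km
f_prime <- function(x) 111.32
offset  <- newton_raphson(f, f_prime, x0 = 0)
offset  # roughly 0.0449 degrees
```

For a linear function like this one, Newton-Raphson converges in a single step; the method earns its keep on the nonlinear approximations that actually arise in geographic conversions.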

Now, as I said earlier, it is all about those trade-offs. There will be situations when building a solution around your own heuristic is not worth it. For me, in 15+ months, this has happened once. I had to solve a knapsack problem in R. The knapsack function I found in R was erratic, and writing my own logic for it didn’t seem worth it, as it was just a sub-problem of a bigger problem. So I simply formulated it as an LP and solved it. The solution time was very short and the results were satisfactory!
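The original problem data isn’t shown, but the knapsack-as-LP formulation itself is easy to sketch with the lpSolve package: maximise total value subject to a single weight constraint, with binary take-it-or-leave-it decision variables. The item values, weights, and capacity below are illustrative, not the real data.

```r
# Knapsack formulated as a (binary) LP, sketched with lpSolve.
# Illustrative data: three items, one capacity constraint.
library(lpSolve)

values   <- c(60, 100, 120)  # value of each item
weights  <- c(10, 20, 30)    # weight of each item
capacity <- 50

sol <- lp(direction    = "max",
          objective.in = values,
          const.mat    = matrix(weights, nrow = 1),
          const.dir    = "<=",
          const.rhs    = capacity,
          all.bin      = TRUE)  # each item is taken fully or not at all

sol$solution  # 0 1 1 -> items 2 and 3 are selected
sol$objval    # 220
```

With `all.bin = TRUE` this is an exact integer formulation; dropping it gives the LP relaxation, which is often good enough when the knapsack is only a sub-problem.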

Nowadays, the term “connecting the dots” feels very pertinent to me for these reasons, and two of the best tools for doing it have been heuristics and empirics. The only downside is that the art of connecting the dots isn’t taught proactively; it is something you need to acquire. And one more important thing which I strongly feel: before anything else, aligning oneself with the business is the most crucial step. If you’re not doing that in a religious manner, then it will be impossible to take even a little help from these two friends!

Solving a capacitated VRP with the nearest neighbor heuristic in R

Google OR-Tools has dedicated functions and packages for Python, C++, Java and C#, but there is nothing available for R. So, in this post, I aim to tackle the Capacitated VRP in R with the help of the nearest neighbor heuristic. The results obtained here are encouraging, which is why I am sharing them in this blog post. The main aim is to improve this approach further through open-source collaboration; the GitHub link is shared at the end of this post. Before proceeding further, let’s look at some details of the VRP function:

  1. The function accepts a list of vehicles with distinct ids and capacities. It is assumed that the total capacity of the vehicles is enough to meet the customer demand for the given day.
  2. The nearest neighbor heuristic is an example of a greedy algorithm, which means it searches for a local optimum. To find a better result, a user-defined parameter (either the total iteration time or the number of iterations) is introduced, and the best solution found across these iterations is kept.
  3. To compare solutions, a cost function is introduced. Here it is designed with a simple assumption: the cost of each vehicle depends on the duration for which it has been hired, and if two vehicles are hired for the same duration, the one with the larger capacity costs more.
  4. One advantage of this approach is that both homogeneous and heterogeneous fleets can be accommodated in the model without any extra effort.
  5. The model can be built around either distance or travel time, whichever the designer prefers. For distance the geodist package can be used; for travel time the osrm package, or any model built for the specific problem statement. In this example travel time has been computed assuming a speed of 9 km/hr.
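The route-construction step behind the points above can be sketched in a few lines. This is a minimal version assuming a symmetric travel-time matrix `tmat` where row/column 1 is the depot; the randomised restarts, capacity checks and cost bookkeeping of the full function are omitted for brevity (see the GitHub repository for the complete implementation).

```r
# Minimal nearest neighbor route construction (single vehicle sketch).
# tmat: symmetric travel-time matrix; node 1 is the depot.
nearest_neighbor_route <- function(tmat) {
  n <- nrow(tmat)
  unvisited <- 2:n        # customer nodes
  route <- c(1)           # start at the depot
  current <- 1
  while (length(unvisited) > 0) {
    # greedily pick the closest unvisited customer
    nxt <- unvisited[which.min(tmat[current, unvisited])]
    route <- c(route, nxt)
    unvisited <- setdiff(unvisited, nxt)
    current <- nxt
  }
  c(route, 1)             # return to the depot
}

# Toy example with a depot and three customers
tmat <- matrix(c( 0, 2, 9, 10,
                  2, 0, 6,  4,
                  9, 6, 0,  8,
                 10, 4, 8,  0), nrow = 4, byrow = TRUE)
nearest_neighbor_route(tmat)  # 1 2 4 3 1
```

Each greedy pass is cheap, which is exactly what makes the restart loop in point 2 affordable: run the construction many times with perturbed starting choices and keep the cheapest route under the cost function from point 3.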

The code is available here: https://github.com/shibaprasadb/Capacitated_VRP_with_nearest_neighbor

Implementing Sweep Algorithm for Constrained Clustering in R

Constraint-based clustering is a way of clustering where we take into account not only the similarity of the points but also specific constraints which they might have, or which we impose manually.
The sweep algorithm is a popular algorithm for constrained clustering and vehicle routing problems. The basic idea is that we ‘sweep’ the destinations in angular order around a central point and cluster neighbouring ones together. The objective looks something like this:

Here we can see that the depot serves each cluster. Now, to implement it in real life, I have made a modification: not only is each point assigned to a cluster as in the image above, but each cluster also has a specific capacity that it will not exceed. This can be useful if someone is planning to use a single vehicle per cluster.
First, I have ordered the points in a clockwise manner with respect to the centre. After that, the clusters are formed with respect to the points’ demands. The R code for implementing this is below. Let’s first create a dummy data frame and load the tidyverse package.

library(tidyverse)

set.seed(123)
id <- 1:600
lon <- rnorm(600, 88.5, 0.125)
lat <- rnorm(600, 22.4, 0.15)
demand <- round(rnorm(600, 40, 20))

df <- data.frame(id, lon, lat, demand)

Now let’s create the clustering function directly. It will do both the sorting and the clustering:

constrained_cluster <- function(df, truck_load = 170,
                                lat_median = median(df$lat),
                                lon_median = median(df$lon)) {
  # Sweep: order the points by their angle around the centre
  df$angle <- atan2(df$lat - lat_median,
                    df$lon - lon_median)
  df <- df[order(df$angle, decreasing = TRUE), ]
  d <- 0                      # running demand of the current cluster
  cluster_number <- 1
  cluster_list <- c()
  i <- 1
  while (i <= nrow(df)) {
    d <- d + df$demand[i]
    if (d <= truck_load) {
      cluster_list[i] <- cluster_number
      i <- i + 1
    } else {
      # Current cluster is full: open a new one and retry this point.
      # (Assumes no single demand exceeds truck_load.)
      cluster_number <- cluster_number + 1
      d <- 0
    }
  }
  cbind(df, as.data.frame(cluster_list))
}

Next, let’s call the function and visualise the clusters, to see if they resemble the initial picture:

df_with_cluster <- constrained_cluster(df)

ggplot(data = df_with_cluster) +
  geom_point(aes(x = lon, y = lat, colour = factor(cluster_list))) +
  theme_bw() +
  theme(legend.position = 'none')

Voila! It is done!