Advanced modelling techniques: big M constraints

Expressing logical conditions in models.

Prerequisites

Advanced modelling techniques: "Redundant" variables

Advanced GMPL techniques: data/logic decoupling

The usual issue

When we are modeling something we often face conditions in the form of "if something holds, another something should be like this". Examples could be:

The domain for the size/capacity/activity level of something is often not just a closed interval, or a simple on/off. It can be a set of disjoint intervals, a very common situation is: 0 or a \([min,max]\) interval. This decision can not be described by a single decision variable, discrete or continuous. This is usually modelled by a binary \(y\) denoting, whether we are at 0 or in the interval, and then a continuous \(x\), that will hold the actual value. In this case, we have the constraint: if \(y=0\) then \(x\) should be 0 too, otherwise \(x\in [min,max]\)
In scheduling we often have constraints in the form of if two tasks are assigned to the same resource, their processing time should not overlap
When writing a model, we often end up with bilinear terms, i.e., a product of a continuous and an integer (typically binary) variable. Sucha a model is not linear, however, it can be converted to a linear one by assigning a continuous variable to the product, and say, that if the binary variable is 0, it should be 0, otherwise, it should be equal to the original continuous variable
Sometimes a binary decision is dependent on another. Let's say it can be decided to which site to sell a worker, and at each site, we can decide, which tasks he/she will perform. Obviously, one cannot perform a task at a location he/she did not visit.

The problem is, there is no conditional "language element" of MILP models. Luckily, there is a modeling trick, that could be applied to express the same thing, as discussed below. First, however, let's see an example where this is needed:

Repairman allocation problem

We provide technical support for some companies located in different cities. On each site, there are some issues, that we have to fix today. We work with specialist contractors, whom we will call repairmen from now on. For each day we have to tell in advance to these repairman whether we will be hiring them for that day, and if we do, where they should go, and which issues should they fix. The shift of a repairman can be at most 12 hours, and we have to consider, that the travel overhead of each location is 1 hour in average.

For the moment we will make some assumptions:

each repairman can fix any of the issues
an issue is fixed within the same time, regardless of the repairman
the wage of the repairman are the same
the traveling time between any two sites is the same
there is no timing involved, and no dependency between the issues

Later on, you can generalize the problem by removing some of these assumptions, and alter the model accordingly.

Now, the goal is to minimize the salary we have to pay, that is basically the hourly wage multiplied by all the travel time, and time used for fixing issues. Now, let's start making our model

The input data for the problem is given with these sets and parameters:

    set Sites;
    param traveltime;
    set Issues;
    param site{Issues} symbolic, within Sites;
    param time{Issues} >=0;
    set Repairmen;
    param maxhours;
    param wage;

The variables are self evident:

    var visits{Repairman,Sites} binary;
    var fixes{Repairman,Issues} binary;

As for the constraints: first, each and every one of the issues need to be fixed:

    s.t. FixAllIssues{i in Issues}:
      sum{r in Repairmen} fixes[r,i] = 1;

A repairman can not work more than the given limit:

    s.t. MaxWorkLoad{r in Repairmen}:
      sum{s in Sites} (traveltime*visits[r,s]) + sum{i in Issues}(time[i]*fixes[r,i]) <= maxhours;

The objective is simple, minimizing the total cost, which will be pretty similar to the lhs of the last constraint, only summed for all of the repairmen:

    minimize TotalCost:
      wage * sum {r in Repairmen} (sum{s in Sites} (traveltime*visits[r,s]) + sum{i in Issues}(time[i]*fixes[r,i]));

If we solver this model, we recognize, that all of the visits[r,s] variables are 0 in the optimal solution. The reason for that is, that we have not expressed yet in any of the constraints, that: A repairman can only fix an issue, if he/she visits its site.

Translating that to our notation, it means, that: fixes[r,i] can only be 1, if visits[r,site[i]] is also 1. Maybe the following equivalent statement from opposite direction is more straight forward: if a repairman does not visit a site, he/she can not fix any of the issues there., i.e., if visits[r,s]==0 then fixes[r,i] must be 0 as well for all i such that site[i]==s. We could be tempted to express this in the following way:

    s.t. NoMagicalTeleportation{r in Repairmen, s in Sites, i in Issues : visits[r,s]==0 && site[i]==s}:
      fixes[r,i]=0;

The problem with this constraint is, that the filtering is based not only on "static" parameters and sets, but "dynamic" variables as well. This would mean, that depending on the values of the variables, some constraints are added/removed during the optimization procedure. Well, that's not how MILP solvers work, and glpsol will give us back an error message.

So we have to find another way. Let's start by putting our idea down in an informal way:

    s.t. NoMagicalTeleportation{r in Repairmen, s in Sites, i in Issues: site[i]==s}:
      if visits[r,s]==0 then fixes[r,i]=0;

First of all, there is no need for the s in Sites part, as there will always be only a single s for each i:

    s.t. NoMagicalTeleportation{r in Repairmen, i in Issues}:
      if visits[r,site[i]]==0 then fixes[r,i]=0;

The core idea for our trick is to include the "deciding" binary variable, visits[r,site[i]] into the constraint fixes[r,i]=0 in such a way, that when it is 0, then it "disappears", and when it is 1, the constraint becomes "relaxed", meaning, that it will not actually state anything.

Relaxing an equality is kind of troublesome. So when possible, we should try to figure out, whether we can state the same with an inequality or not. In the worst case, an \(lhs=rhs\) equality can be transformed to the conjunction of two inequalities: \(lhs\ge rhs\) and \(lhs\le rhs). However, in this case, the following will simply do:

    s.t. NoMagicalTeleportation{r in Repairmen, i in Issues}:
      if visits[r,site[i]]==0 then fixes[r,i]<=0;

Since fixes[r,i] can only be 0 or 1, "equals to 0" is equivalent to "is equal or less than 0". Now, drumroll, visits[r,site[i]] can be included in the constraint like this:

    s.t. NoMagicalTeleportation{r in Repairmen, i in Issues}:
      fixes[r,i]<=0          + visits[r,site[i]];

Now, let us examine this constraint:

If visits[r,site[i]] is 0, then it "disappears" from the right side, and the constraint fixes[r,i]<=0 remains, as it should.
If visits[r,site[i]] is 1, then the RHS becomes 1, and the constraint is essentially fixes[r,i]<=1. Since fixes[r,i] is a binary variable, that can be at most 1, this constraint doesn't really say anything, as intended.

As we can see, the above constraint does exactly what we needed, and implements that conditional-like behaviour. Of course, it could be simplified like this:

    s.t. NoMagicalTeleportation{r in Repairmen, i in Issues}:
      fixes[r,i]<=visits[r,site[i]];

Pimping the model a little bit

Putting together everything from above, we have:

    set Sites;
      param traveltime;
    set Issues;
      param site{Issues} symbolic, within Sites;
      param time{Issues} >=0;
    set Repairmen;
      param maxhours;
      param wage;

    var visits{Repairman,Sites} binary;
    var fixes{Repairman,Issues} binary;

    s.t. FixAllIssues{i in Issues}:
      sum{r in Repairmen} fixes[r,i] = 1;

    s.t. MaxWorkLoad{r in Repairmen}:
      sum{s in Sites} (traveltime*visits[r,s]) + sum{i in Issues}(time[i]*fixes[r,i]) <= maxhours;

    s.t. NoMagicalTeleportation{r in Repairmen, i in Issues}:
      fixes[r,i]<=visits[r,site[i]];

    minimize TotalCost:
      wage * sum {r in Repairmen} (sum{s in Sites} (traveltime*visits[r,s]) + sum{i in Issues}(time[i]*fixes[r,i]));

Now let us assume a more realistic scenario: if we hire someone for the day, we will have to pay him/her at least 8 hours work, even if a single issue is solved in 3 hours for example.

To implement this, first it is beneficial to introduce a redundant variable describing, how much a person has actually worked. This will also simplify the MaxWorkLoad constraint and the objective function:

    set Sites;
      param traveltime;
    set Issues;
      param site{Issues} symbolic, within Sites;
      param time{Issues} >=0;
    set Repairmen;
      param maxhours;
      param wage;

    var worked{Repairmen} >= 0;  # <--- NEW REDUNDANT VARIABLE
    var visits{Repairman,Sites} binary;
    var fixes{Repairman,Issues} binary;

    s.t. FixAllIssues{i in Issues}:
      sum{r in Repairmen} fixes[r,i] = 1;

    s.t. Workload{r in Repairmen}: #<--- new constraint to set the new variable
      worked[r] = sum{s in Sites} (traveltime*visits[r,s]) + sum{i in Issues}(time[i]*fixes[r,i])
      
    s.t. MaxWorkLoad{r in Repairmen}: # <--- SIMPLIFIED CONSTRAINT
       worked[r] <= maxhours;

    s.t. NoMagicalTeleportation{r in Repairmen, i in Issues}:
      fixes[r,i]<=visits[r,site[i]];

    minimize TotalCost: # <--- SIMPLIFIED OBJECTIVE
      wage * sum {r in Repairmen} worked[r];

Now, we should somehow implement the "at least 8 hours rule". The first idea could be to change the declaration of the new variable like this:

    var worked{Repairmen} >= 8;

This would, however require all of the repairman to work at least 8 hours, which is double-bad:

Not everyone should work 8 hours.
The rule said nothing about working at least 8 hours, only about paying at least 8 hours.

From the second point above, it is straight forward, that we need a new variable, that tells, how much is paid to a person:

    var paid{Repairmen} >=0;

Then the objective can easily be changed:

    minimize TotalCost: 
      sum {r in Repairmen} paid[r];

And for the moment, lets's connect paid[r] with worked[r] like this:

    s.t. Payment{r in Repairman}:
      paid[r]=wage*worked[r];

With all these changes, the model does exactly the same so far.

However, now we have the opportunity to implement the "at least 8 hours" rule. First, let us introduce the new parameter:

    param minhourspaid;

Then we can say in the variable declaration, that:

    var paid{Repairmen} >= wage * minhourspaid;

And we need to relax the above constraint like this:

    s.t. Payment{r in Repairman}:
      paid[r]>=wage*worked[r];  # <--- GREATER-EQUAL INSTEAD OF EQUAL

What we achieved like this is that paid[r] is now greater or equal than wage*worked[r] and wage * minhourspaid. So basically, it is at least as much as the maximum of the two.

Note, that paid[r] can now be larger than wage*worked[r] even if worked[r]>minhourspaid. Basically, the model allows to pay more to a person than he/she deserved/worked for. This is, however not a problem, as we want to minimize the costs, so such solutions, even though feasible, would not lead to optimal solutions.

Another problem still remains, however. This way, everyone needs to be paid at least 8 hours, even those who are not hired for the day. To put it this way: now paid[r] is always greater-or equal than wage*worked[r], which is fine, but it is also always greater-or equal than wage * minhourspaid, which is not. We should remove that initial bound from the variable declaration, and have a constraint like this instead:

    var paid{Repairmen} >= 0; # <--- CHANGED LOWER BOUND TO 0

    ...

    s.t. MinimumWorkHours{r in Repairman}:
      if r is hired for the day then: paid[r] >= wage * minhourspaid;

Now we have a new problem (we just can't run out of them it seems), that we have no expression describing, whether someone is hired or not. A general rule of thumb: if there is no easy expression for something, it should probably be introduced as a variable, even if it is redundant:

    var paid{Repairmen} >= 0;
    var hired{Repairmen} binary;

    ...

    s.t. MinimumWorkHours{r in Repairman}:
      if hired[r]==1 then: paid[r] >= wage * minhourspaid;

Since the RHS is only a constant expression (no variables included), we can easily "resolve" this if:

    var paid{Repairmen} >= 0;
    var hired{Repairmen} binary;

    ...

    s.t. MinimumWorkHours{r in Repairman}:
      paid[r] >= wage * minhourspaid * hired[r];

It is easy to see, that if r is hired, the intended constraint holds, otherwise, it just says, that paid[r]>=0, which again, doesn't say anything at all.

We are not ready yet though, since hired[r] has no connection with the other binary variables, so the model allows a repairman not to be hired while still solving issues. What we would like to express: r can only solve issues or travel to sites if he/she is hired for the day. Putting it to a more formal way: visits[r,s] or fixes[r,i] can only be 1, if hired[r]=1. Again, it may be simpler to look at it from the other direction: if hired[r]==0 then all visits[r,s],fixes[r,i] variables must be 0 as well:

    s.t. CannotWorkIfNotHired{r in Repairmen}:
      if hired[r]==0 then all visits[r,s],fixes[r,i] must be 0;

There is an easy way of telling that a bunch of binary variables should be 0: their sum should be 0, or (since inequality is more loved - see above) less or equal than 0. So:

    s.t. CannotWorkIfNotHired{r in Repairmen}:
      if hired[r]==0 then sum{s in Sites }visits[r,s] + sum {i in Issues}fixes[r,i] <= 0;

Now we only need to resolve that conditional. The RHS should be 0 if hired[r] is 0, otherwise, it should be a big number to make the constraint "meaningless". So something like this would do:

    s.t. CannotWorkIfNotHired{r in Repairmen}:
      sum{s in Sites }visits[r,s] + sum {i in Issues}fixes[r,i] <= ReallyBigNumber * hired[r];

We can see, that this equation would do exactly what we want it to do, we just have to give a value to ReallyBigNumber, for example:

    param ReallyBigNumber := 1000;

But how do we know that 1000 is enough? What if we have 100000 sites and 99999999999 issues? Well, our model would probably never be solved on current machines for start, but putting that aside, even if r is hired, He could work on at most 1000 issues, regardless how quick they are. In practice, selecting such a value is usually easy, as we know roughly the range of other parameters, and the context in which it will be used. But, for the mathematician-hearted people that is not a satisfactory response. We can often be theoretically correct too. Here the sum of the sizes of the sets Sites and Issues is a good-enough value. And that can actually be expressed in GMPL:

    param ReallyBigNumber := card(Sites)+card(Issues);

One final touch: this parameter is often denoted with M, a capital/big M, that is large/big. Thus, these kind of constraints are often referred to as big-M constraints.

The final model

    set Sites;
      param traveltime;
    set Issues;
      param site{Issues} symbolic, within Sites;
      param time{Issues} >=0;
    set Repairmen;
      param maxhours;
      param wage;

    param M:=card(Sites)+card(Issues)

    var worked{Repairmen} >= 0;
    var paid{Repairmen} >= 0;
    
    var hired{Repairmen} binary;

    var visits{Repairman,Sites} binary;
    var fixes{Repairman,Issues} binary;

    s.t. FixAllIssues{i in Issues}:
      sum{r in Repairmen} fixes[r,i] = 1;

    s.t. Workload{r in Repairmen}: 
      worked[r] = sum{s in Sites} (traveltime*visits[r,s]) + sum{i in Issues}(time[i]*fixes[r,i])
      
    s.t. MaxWorkLoad{r in Repairmen}:
       worked[r] <= maxhours;
       
    s.t. Payment{r in Repairman}:
      paid[r]=wage*worked[r];

    s.t. NoMagicalTeleportation{r in Repairmen, i in Issues}:
      fixes[r,i]<=visits[r,site[i]];

    s.t. MinimumWorkHours{r in Repairman}:
      if hired[r]==1 then: paid[r] >= wage * minhourspaid;

    s.t. CannotWorkIfNotHired{r in Repairmen}:
      sum{s in Sites }visits[r,s] + sum {i in Issues}fixes[r,i] <= M * hired[r];
      
    minimize TotalCost: 
      sum {r in Repairmen} paid[r];

Several forms of big-M constraints / logical conditionals

In the example above we have seen several different ways of implementing logical conditionals in simple linear constraints. Here some of the most commonly used ones are summarized without aiming for any kind of completeness. It is always assumed that \(y,y',y^*,y_1,y_2,\dots,y_n\) are binary variables.

If \(y=1\) then \(LHS \ge RHS\): \[ LHS \ge RHS - M \cdot (1-y)\]
If \(y=0\) then \(LHS \ge RHS\): \[ LHS \ge RHS - M \cdot y\]
If \(y=1\) then \(LHS = RHS\): \[ LHS \ge RHS - M \cdot (1-y)\] \[ LHS \le RHS + M \cdot (1-y)\]
If \(y=0\) then \(y'\) must be 0: \[y' \le y\]
If \(y=1\) then \(y'\) must be 0: \[y' \le 1-y\]
If \(y=0\) then \(y_1,y_2,\dots,y_n\) must be 0: \[y_1+y_2+\dots+y_n \le M \cdot y\]
If \(y=0\) then \(y_1,y_2,\dots,y_n\) must be 1: \[y_1+y_2+\dots+y_n \ge n \cdot (1- y)\]
If \(y=1\) and \(y'=1\) then \(LHS \ge RHS\): \[ LHS \ge RHS - M \cdot (2-y-y')\]
If \(y=0\) and \(y'=0\) then \(LHS \ge RHS\): \[ LHS \ge RHS - M \cdot (y+y')\]
If \(y=0\) and \(y'=1\) then \(LHS \ge RHS\): \[ LHS \ge RHS - M \cdot (y+ (1-y')\]
If \(y=1\) and \(y'=0\) and \(y^*=1\) then \(LHS \ge RHS\): \[ LHS \ge RHS - M \cdot ( (1-y) +(y') +(1-y^*))\]
\(y^*\) must be 1 if and only if \(y=1\) and \(y'=1\) \[ y^* \ge -1 + y + y' \] \[y^* \le \frac{y+y'}{2} \]

Bilinear terms

A bilinear term is something like \(y \cdot x\) where \(y\) and \(x\) are both variables. If one of the terms is binary, and the other is, for example, a non-negative continuous variable, the model can be linearized using big M constraints, as follows. The term itself should be replaced by a new non-negative continuous variable \(z\), and the model should be extended with the following constraints: \[z \le x \] \[z \le M \cdot y\] \[z \ge x - M \cdot (1-y)\]

If \(y=1\) then \(z\) should just be \(x\). In this case, the above equations reduce to this: \[z \le x \] \[z \le M \] \[z \ge x \] The second doesn't do anything, the other two set \(z\) to \(x\).
If \(y=1\) then \(z\) should just be 0. In this case, the above equations reduce to this: \[z \le x \] \[z \le 0 \] \[z \ge -M (+x) \] The second is stronger than the first, and sets \(z\) to 0. The third doesn't do anything.

A note on efficiency

Big-M constraints can slow down the solution of an MILP problem, as they usually do not contribute well to the bounding of a subproblem. Thus, as a general thumb of rule: only use such constraints when necessary, and if so, try not to use unnecessarily big values in the constraints. Like don't write \(y'\le 1000 \cdot y\) instead of just \(y'\le y\). It is not guaranteed, that big-M constraints will make Your model slow, sometimes they don't, but it is generally good to avoid them if possible.

Final notes

The goal of this lecture was to introduce a modeling technique that allows seemingly unmodellable conditionals to be translated to standard linear constraints. In practice, this technique is used very extensively, as it is often needed.