Advanced GMPL techniques: data/logic decoupling

Introduction to using sets and parameters in GMPL in order to separate data from the logic

Prerequisites

Our previous good model is actually bad

The models we created so far are correct, there is no mistake about that. However, when building a mathematical model, we should pay attention to how "manageable" the code is in the future. Similar to programming, there is a huge difference between a code that just works, and a code that is easy to use if later modifications are needed.

Let us have a look at the festival model again:

var y1 binary;
var y2 binary;
var y3 binary;
var y4 binary;
var y5 binary;

s.t. Haggard: y1 + y3 + y4 >= 1;
s.t. Stratovarius: y1 + y2 + y3 + y5 >= 1;
s.t. Epica: y1 + y2 + y4 + y5 >= 1;
s.t. Dalriada: y3 + y4 >= 1;
s.t. Apocalyptica: y4 >= 1;
s.t. Liva: y2 + y3 + y4 +y5 >= 1;
s.t. Eluveitie: y3 + y5 >= 1;

minimize NumberOfFestivals: y1 + y2 + y3 + y4 + y5;

We use our model, plan our summer, everything is fine. Then, a new festival F6 is announced, where Haggard, Epica, and Apocalyptica will perform. What kind of changes would be needed for the model?

A new variable y6 will be needed.
In the constraints of Haggard, Epica, and Apocalyptica, an additional +y6 is needed to be added
The objective function needs an additional +y6 too.

This is a lot of changes, in different places of our code. But still more-or-less manageable. Now imagine, that we want to use our model one year later for the next summer as well. Our favorite bands may have changed, the set of interesting festivals may remain the same, but their list of bands will definitely change. We can, obviously, build a model similar to this one using the same logic, but we feel, that we would be doing something unnecessary repeatedly. This repetition, let us call it redundancy is hardly ever a sign of a quality code.

What we feel here is, that there are two things in our model, that are "interlaced" in the same file:

Logic: The "soul" of our model. What are the decisions to be made, how are they represented with variables, what kind of constraints are defined on them, what is our goal, etc.
Data: The "physical body" of our model. Which festivals do we have, which bands do we like, where do they perform.

The logic of the model is the same regardless of the actual data, and such, it should be separated from it. This is similar to programming, where we do not want to include our database in our source code. Besides redundancy, it is really easy to make semantic mistakes, when changing the data can alter the logic of the model as well. (Imagine to put somewhere a minus instead of plus, or write y23 instead of y32 somewhere. It will be really difficult to debug and find this kind of mistake in a bigger model.) GMPL allows this kind of separation of the code into model and data section (or even files) but let us go back to the mathematical realm of optimization briefly.

Problems, problem classes, instances

Looking at the same thing from a different perspective: if we have a model for this years festivals, another one for next years festivals, and a third for the fröccs example, then we would know that the first two are "similar" and the third is "different". We would may even say, that the first two models are of the same type, but different instances. But how can we define when two models are similar or different? What is the exact dividing factor?

After the introduction in the previous section, it is easy to see, that the first two models share the same logic, but have different data, while the third one has a completely different model as well. In practice, it is very common, that we have a lot of instances of the problems with the same logic but different data. For example: planning the schedule of a factory for the next week is always a similar task, but each week has different set of products, deadlines, initial resource stocks, etc.

As a result, we are usually more interested to provide a general solution for a type of problems, not just a single instance. On this page, we will use the term Problem class to denote the set of all of the problems, that are of the same type, i.e., share the same logic. The problem class itself has no data, but it has "features", that are shared among all of its members. Each element of a problem class is termed a problem instance, which already have the data as well.

We will generally want to provide generic models (model logic) to a problem class, rather than to an instance, and then it can be used for all of the instances in that class. In that sense, a generic model can be considered as a "recipe" which can be used to generate a complete model, once the data is provided for it. But let us have a hands on experience now:

Generic mathematical model for the Festivals example

To create our generic model, we need to scrap away the instance specific data, and describe it in a generic manner:

There will be five feastivals this summer, this is a problem-specific data. In general, we could say, that there will be a set of festivals, and we could denote that with \(S\).
We are interested in the bands Haggard, Stratovarius, etc. That is again, something very specific, Your music taste is probably different from mine. In general we could say, that there is a set of interesting bands, and we could denote that with \(B\).
Finally, we know, that Haggard will perform at festivals 1, 3 and 4. In general we could say, that for each band we have a subset of festivals, where it will perform, we may denote it by \(F_b\) for each \(b\in B\). While this is true, having a set of sets is a bit complex, so a simpler way to look at it would be to have a parameter, that describes whether a band \(b\in B\) performs at a festival \(f\in F\), and it could be denoted by \(p_{b,f}\). Either way is fine.

Note, that if there is something, that has to be defined for all values from a set, that will only have value in a problem instance, we put a dummy index into the subscript of that something. A subscript is like a key for an associative array in programming languages. On the contrary, when we will use superscripts, it will be the part of the name of the variable, not an index.

Now, we have a generic problem definition: Given a set of bands \(B\), a set of festivals \(F\), and a subset of festivals \(F_b\subseteq F\) for all \(b\in B\), we should find a minimal set of festivals \(F^*\) such that each \(F_b\) has at least one element in it.

To make our model, we just have to follow the previously learnt 3 steps, only do it in a general way:

Step 1: Variables: We still have to make the same decision: whether we go to a festival or not. We do not know, how many festivals there will be in a problem instance, but we can say, that we will declare a binary variable \(x_f\) for all of them (for all \(f\in F\)).
Step 2: Constraints: We should have a constraint for all of the bands \(b\in B\), and in the constraints we should sum up the variables with \(f\in F_b\), and that should be greater or equal to 1. Note, that we already have a value for \(b\) if we are writing a specific constraint.
Step 3: Objective function: The objective function is simple: sum all the variables, and that should be minimal.

Putting all of these together, we have:

\[ \begin{array}{rclp{2cm}r} y_f &\in& \{0,1\} & & \quad \forall f\in F\\ \\ \displaystyle\sum_{f \in F_b} y_f &\ge& 1 & & \forall b\in B\\ \\ \hline \\ \displaystyle\sum_{f \in F} y_f &\to& min \end{array} \]

Or, if You prefer to use the \(p_{b,f}\) parameter instead of the \(F_b\) sets, where we assume that \(p_{b,f}=1\) if band \(b\) performs at festival \(f\), and 0 otherwise, then:

\[ \begin{array}{rclp{2cm}r} y_f &\in& \{0,1\} & & \quad \forall f\in F\\ \\ \displaystyle\sum_{f \in F} p_{b,f} \cdot y_f &\ge& 1 & & \forall b\in B\\ \\ \hline \\ \displaystyle\sum_{f \in F} y_f &\to& min \end{array} \]

Note, that in this version, the constraint is a bit different, here the sum goes over all of the festivals, however, the variables for festivals where the band \(b\) does not perform are multiplied by 0, and all the others are multiplied by 1, so the constraints will be the same. For example, instead of \(y_1+y_3+y_4 \ge 1\) it will be \(1\cdot y_1 + 0\cdot y_2 + 1\cdot y_3 + 1\cdot y_4 + 0\cdot y_5\ge1\).

To keep things simpler, we will focus on the second model below.

Generic GMPL models

GMPL (and most modern modeling languages) support this generic model definition. The previous structure for our GMPL model was this:

Variable declarations
Constraint definitions
Objective function definition

To make the desired separation, the model file has to have this structure:

Model logic / Generic model
data; keyword
Model data
end; keywords

Note, that the absence of the end; keyword resulted a warning in our previous models too. We will talk a bit later about the data section, let's focus on the generic model first which usually follows this template:

Generic set and parameter definitions
Generic Variable declarations
Generic Constraint definitions
Generic Objective function definition

Sets and parameters can be declared with the set and param keywords, respectively. If a set, parameter, variable has an subscript/index, the name should be followed by the following in the declaration: {Setname}, where Setname is the domain on which that set, parameter, or variable is defined. When using this set, variable or parameter, the index should be put within brackets after the name of the variable. If there are more indeces, they should be sparated by commas both in the definition and in case of use. If we want a constraint to be defined for each element of a set, a {s \in Setname} should be put after the name of the constraint, where Setname is the name of the set, and s is a dummy index that can be used within the constraint. The syntax for the sum symbol is similar, but let's see how it looks for our second model with \(p_{b,f}\) parameter:

Mathematical notation	GMPL code
Sets and parameters
\(F\)	`set F;`
\(B\)	`set B;`
\(p_{b,f}\)	`param p{B,F};`
Variables
\(y_f \in \{0,1\} \qquad \quad \forall f\in F\)	`var y{F} binary;`
Constraints
\(\displaystyle\sum_{f \in F} p_{b,f} \cdot y_f \ge 1 \qquad \forall b\in B\)	`s.t. Constraints{b in B}: sum{f in F} p[b,f]*y[f] >= 1;`
Objective function
\(\displaystyle\sum_{f \in F} y_f \to min\)	`minimize Objective: sum{f in F} y[f];`

So all in all, our model is something like this:

    set B;
    set F;
    param p{B,F};

    var p{F} binary;

    s.t. Constraints{b in B}:
      sum{f in F} p[b,f]*y[f] >= 1;

    minimize Objective:
      sum{f in F} y[f];

But we have already learned, that it is always a good practice to have meaningful names, so before we dive into the data section, let's refactor our model to this:

set Bands;
set Festivals;

param performs{Bands,Festivals};

var go{Festivals};

s.t. ListenToAllBandsAtLeastOnce {b in Bands}:
  sum{f in Festivals} performs[b,f] * go[f] >= 1;

minimize FestivalsWentTo:
  sum{f in Festivals} go[f];

If we were to "run" this model with the usual glspsol -m model.mod command, we would receive an error message something like this: Missing value for S. That is because the data section is still missing.

The data section gives values for sets and parameters. Many syntax tweaks can be used here, so reading the GMPL reference is highly recommended. We only mention here the 4 simplest cases:

    # Set definition
    set Setname := element1 element2 element3;
    set Setname2 :=  element anotherelement;

    # Definition of parameter with 0 subscripts/indices/dimensions:
    # E.g. for declaration param Paramname0;
    param Paramname0 := numericvalue;

    # Definition of parameter with 1 subscript/index/dimension:
    # E.g. for declaration param Paramname1{Setname};
    param Paramname :=
      element3  numericvalue
      element2  numericvalue
      element1  numericvalue
      ;

    # Definition of parameter with 2 subscripts/indices/dimensions:
    # E.g. for declaration param Paramname2{Setname,Setname2}
    param Paramname :
                element         anotherelement :=
      element1  numericvalue    numericvalue
      element3  numericvalue    numericvalue
      element2  numericvalue    numericvalue
      ;

Note, that the order in which the elements of a set are listed in the parameter definitions is irrelevant. Also, sets can have numeric elements, and parameters can be symbolic too, but we will discuss such things in more advanced models. For now, let's write our data section with the syntax above:

set Bands :=
  Haggard
  Stratovarius
  Epica
  Dalriada
  Apocalyptica
  Liva
  Eluveitie
  ;

set Festivals := F1 F2 F3 F4 F5;

param performs:
                  F1  F2  F3  F4  F5 :=
    Haggard       1   0   1   1   0
    Stratovarius  1   1   1   0   1
    Epica         1   1   0   1   1
    Dalriada      0   0   1   0   1
    Apocalyptica  0   0   0   1   0
    Liva          0   1   1   1   1
    Eluveitie     0   0   1   0   1
    ;

Note, that in the definition of the set Bands, the elements are given in new lines. This is ok, GMPL does not care about the number and type of white space characters (spaces, tabs, new lines). So if our model is more readible like this, feel free to put in additional newlines or spaces. Similarly, a 1 dimensional parameter could be defined like this as well:

param Param1:= element1 value1 element3 value3 element2 value2;

Solve the new model

We are nearly done, we just have to copy these two parts of code into a single file, and separate them with data;. While that is good (and You should definitely try it), a better way is to put the model part into a festival.mod and the data part to a SummerOf18.dat. Then run the following command: glpsol -m Festival.mod -d SummerOf18.dat -o FestivalsSummer18.out

The solver puts similar things to the standard output as before, but let's have a look at the solution file:

Problem:    festival
Rows:       8
Columns:    5
Non-zeros:  25
Status:     OPTIMAL
Objective:  FestivalsWentTo = 2 (MINimum)

   No.   Row name   St   Activity     Lower bound   Upper bound    Marginal
------ ------------ -- ------------- ------------- ------------- -------------
     1 ListenToAllBandsAtLeastOnce[Haggard]
                    B              2             1               
     2 ListenToAllBandsAtLeastOnce[Stratovarius]
                    NL             1             1                           1 
     3 ListenToAllBandsAtLeastOnce[Epica]
                    B              1             1               
     4 ListenToAllBandsAtLeastOnce[Dalriada]
                    B              1             1               
     5 ListenToAllBandsAtLeastOnce[Apocalyptica]
                    NL             1             1                           1 
     6 ListenToAllBandsAtLeastOnce[Liva]
                    B              2             1               
     7 ListenToAllBandsAtLeastOnce[Eluveitie]
                    B              1             1               
     8 FestivalsWentTo
                    B              2                             

   No. Column name  St   Activity     Lower bound   Upper bound    Marginal
------ ------------ -- ------------- ------------- ------------- -------------
     1 go[F1]       NF             0                                     < eps
     2 go[F2]       NF             0                                     < eps
     3 go[F3]       B              1                             
     4 go[F4]       B              1                             
     5 go[F5]       NF             0                                     < eps

Karush-Kuhn-Tucker optimality conditions:

KKT.PE: max.abs.err = 0.00e+00 on row 0
        max.rel.err = 0.00e+00 on row 0
        High quality

KKT.PB: max.abs.err = 0.00e+00 on row 0
        max.rel.err = 0.00e+00 on row 0
        High quality

KKT.DE: max.abs.err = 0.00e+00 on column 0
        max.rel.err = 0.00e+00 on column 0
        High quality

KKT.DB: max.abs.err = 0.00e+00 on row 0
        max.rel.err = 0.00e+00 on row 0
        High quality

End of output

From the first table we can see, that the constraints were really generated separately for all of the bands, and from the second table, that the go variables were generated for all of the festivals. Also, we get the same optimal solution of going to F3 and F4 only.

Later on, we will learn some other features of GMPL, with which we can tweak this model to be nicer, but for now, we have all the tools needed to build more complex models.

Final notes

Putting the logic and the data in one place is never a good idea, and it isn't different for mathematical modeling either. GMPL lets us separate these things by abstracting the data into sets, parameters, and putting their declaration to the generic model, and their definition into a separate file.