Introduction to using sets and parameters in GMPL in order to separate data from the logic
The models we created so far are correct, there is no mistake about that. However, when building a mathematical model, we should pay attention to how "manageable" the code is in the future. Similar to programming, there is a huge difference between a code that just works, and a code that is easy to use if later modifications are needed.
Let us have a look at the festival model again:
var y1 binary; var y2 binary; var y3 binary; var y4 binary; var y5 binary; s.t. Haggard: y1 + y3 + y4 >= 1; s.t. Stratovarius: y1 + y2 + y3 + y5 >= 1; s.t. Epica: y1 + y2 + y4 + y5 >= 1; s.t. Dalriada: y3 + y4 >= 1; s.t. Apocalyptica: y4 >= 1; s.t. Liva: y2 + y3 + y4 +y5 >= 1; s.t. Eluveitie: y3 + y5 >= 1; minimize NumberOfFestivals: y1 + y2 + y3 + y4 + y5;
We use our model, plan our summer, everything is fine.
Then, a new festival F6
is announced, where Haggard, Epica, and Apocalyptica will perform.
What kind of changes would be needed for the model?
y6
will be needed.+y6
is needed to be added+y6
too.This is a lot of changes, in different places of our code. But still more-or-less manageable. Now imagine, that we want to use our model one year later for the next summer as well. Our favorite bands may have changed, the set of interesting festivals may remain the same, but their list of bands will definitely change. We can, obviously, build a model similar to this one using the same logic, but we feel, that we would be doing something unnecessary repeatedly. This repetition, let us call it redundancy is hardly ever a sign of a quality code.
What we feel here is, that there are two things in our model, that are "interlaced" in the same file:
The logic of the model is the same regardless of the actual data, and such, it should be separated from it. This is similar to programming, where we do not want to include our database in our source code. Besides redundancy, it is really easy to make semantic mistakes, when changing the data can alter the logic of the model as well. (Imagine to put somewhere a minus instead of plus, or write y23 instead of y32 somewhere. It will be really difficult to debug and find this kind of mistake in a bigger model.) GMPL allows this kind of separation of the code into model and data section (or even files) but let us go back to the mathematical realm of optimization briefly.
Looking at the same thing from a different perspective: if we have a model for this years festivals, another one for next years festivals, and a third for the fröccs example, then we would know that the first two are "similar" and the third is "different". We would may even say, that the first two models are of the same type, but different instances. But how can we define when two models are similar or different? What is the exact dividing factor?
After the introduction in the previous section, it is easy to see, that the first two models share the same logic, but have different data, while the third one has a completely different model as well. In practice, it is very common, that we have a lot of instances of the problems with the same logic but different data. For example: planning the schedule of a factory for the next week is always a similar task, but each week has different set of products, deadlines, initial resource stocks, etc.
As a result, we are usually more interested to provide a general solution for a type of problems, not just a single instance.
On this page, we will use the term
We will generally want to provide generic models (model logic) to a problem class, rather than to an instance, and then it can be used for all of the instances in that class. In that sense, a generic model can be considered as a "recipe" which can be used to generate a complete model, once the data is provided for it. But let us have a hands on experience now:
To create our generic model, we need to scrap away the instance specific data, and describe it in a generic manner:
Now, we have a generic problem definition: Given a set of bands \(B\), a set of festivals \(F\), and a subset of festivals \(F_b\subseteq F\) for all \(b\in B\), we should find a minimal set of festivals \(F^*\) such that each \(F_b\) has at least one element in it.
To make our model, we just have to follow the previously learnt 3 steps, only do it in a general way:
Or, if You prefer to use the \(p_{b,f}\) parameter instead of the \(F_b\) sets, where we assume that \(p_{b,f}=1\) if band \(b\) performs at festival \(f\), and 0 otherwise, then:
Note, that in this version, the constraint is a bit different, here the sum goes over all of the festivals, however, the variables for festivals where the band \(b\) does not perform are multiplied by 0, and all the others are multiplied by 1, so the constraints will be the same. For example, instead of \(y_1+y_3+y_4 \ge 1\) it will be \(1\cdot y_1 + 0\cdot y_2 + 1\cdot y_3 + 1\cdot y_4 + 0\cdot y_5\ge1\).
To keep things simpler, we will focus on the second model below.
GMPL (and most modern modeling languages) support this generic model definition. The previous structure for our GMPL model was this:
To make the desired separation, the model file has to have this structure:
data;
keywordend;
keywords
Note, that the absence of the end;
keyword resulted a warning in our previous models too.
We will talk a bit later about the data section, let's focus on the generic model first which usually follows this template:
Sets and parameters can be declared with the set
and param
keywords, respectively.
If a set, parameter, variable has an subscript/index, the name should be followed by the following in the declaration: {Setname}
, where Setname is the domain on which that set, parameter, or variable is defined.
When using this set, variable or parameter, the index should be put within brackets after the name of the variable.
If there are more indeces, they should be sparated by commas both in the definition and in case of use.
If we want a constraint to be defined for each element of a set, a {s \in Setname}
should be put after the name of the constraint, where Setname is the name of the set, and s is a dummy index that can be used within the constraint.
The syntax for the sum symbol is similar, but let's see how it looks for our second model with \(p_{b,f}\) parameter:
Mathematical notation | GMPL code |
---|---|
Sets and parameters | |
\(F\) | set F; |
\(B\) | set B; |
\(p_{b,f}\) | param p{B,F}; |
Variables | |
\(y_f \in \{0,1\} \qquad \quad \forall f\in F\) | var y{F} binary; |
Constraints | |
\(\displaystyle\sum_{f \in F} p_{b,f} \cdot y_f \ge 1 \qquad \forall b\in B\) | s.t. Constraints{b in B}: sum{f in F} p[b,f]*y[f] >= 1; |
Objective function | |
\(\displaystyle\sum_{f \in F} y_f \to min\) | minimize Objective: sum{f in F} y[f]; |
So all in all, our model is something like this:
set B; set F; param p{B,F}; var p{F} binary; s.t. Constraints{b in B}: sum{f in F} p[b,f]*y[f] >= 1; minimize Objective: sum{f in F} y[f];But we have already learned, that it is always a good practice to have meaningful names, so before we dive into the data section, let's refactor our model to this:
set Bands; set Festivals; param performs{Bands,Festivals}; var go{Festivals}; s.t. ListenToAllBandsAtLeastOnce {b in Bands}: sum{f in Festivals} performs[b,f] * go[f] >= 1; minimize FestivalsWentTo: sum{f in Festivals} go[f];
If we were to "run" this model with the usual glspsol -m model.mod
command, we would receive an error message something like this: Missing value for S.
That is because the data section is still missing.
The data section gives values for sets and parameters. Many syntax tweaks can be used here, so reading the GMPL reference is highly recommended. We only mention here the 4 simplest cases:
# Set definition set Setname := element1 element2 element3; set Setname2 := element anotherelement; # Definition of parameter with 0 subscripts/indices/dimensions: # E.g. for declaration param Paramname0; param Paramname0 := numericvalue; # Definition of parameter with 1 subscript/index/dimension: # E.g. for declaration param Paramname1{Setname}; param Paramname := element3 numericvalue element2 numericvalue element1 numericvalue ; # Definition of parameter with 2 subscripts/indices/dimensions: # E.g. for declaration param Paramname2{Setname,Setname2} param Paramname : element anotherelement := element1 numericvalue numericvalue element3 numericvalue numericvalue element2 numericvalue numericvalue ;
Note, that the order in which the elements of a set are listed in the parameter definitions is irrelevant. Also, sets can have numeric elements, and parameters can be symbolic too, but we will discuss such things in more advanced models. For now, let's write our data section with the syntax above:
set Bands := Haggard Stratovarius Epica Dalriada Apocalyptica Liva Eluveitie ; set Festivals := F1 F2 F3 F4 F5; param performs: F1 F2 F3 F4 F5 := Haggard 1 0 1 1 0 Stratovarius 1 1 1 0 1 Epica 1 1 0 1 1 Dalriada 0 0 1 0 1 Apocalyptica 0 0 0 1 0 Liva 0 1 1 1 1 Eluveitie 0 0 1 0 1 ;
Note, that in the definition of the set Bands, the elements are given in new lines. This is ok, GMPL does not care about the number and type of white space characters (spaces, tabs, new lines). So if our model is more readible like this, feel free to put in additional newlines or spaces. Similarly, a 1 dimensional parameter could be defined like this as well:
param Param1:= element1 value1 element3 value3 element2 value2;
We are nearly done, we just have to copy these two parts of code into a single file, and separate them with data;
.
While that is good (and You should definitely try it), a better way is to put the model part into a festival.mod
and the data part to a SummerOf18.dat
.
Then run the following command: glpsol -m Festival.mod -d SummerOf18.dat -o FestivalsSummer18.out
The solver puts similar things to the standard output as before, but let's have a look at the solution file:
Problem: festival Rows: 8 Columns: 5 Non-zeros: 25 Status: OPTIMAL Objective: FestivalsWentTo = 2 (MINimum) No. Row name St Activity Lower bound Upper bound Marginal ------ ------------ -- ------------- ------------- ------------- ------------- 1 ListenToAllBandsAtLeastOnce[Haggard] B 2 1 2 ListenToAllBandsAtLeastOnce[Stratovarius] NL 1 1 1 3 ListenToAllBandsAtLeastOnce[Epica] B 1 1 4 ListenToAllBandsAtLeastOnce[Dalriada] B 1 1 5 ListenToAllBandsAtLeastOnce[Apocalyptica] NL 1 1 1 6 ListenToAllBandsAtLeastOnce[Liva] B 2 1 7 ListenToAllBandsAtLeastOnce[Eluveitie] B 1 1 8 FestivalsWentTo B 2 No. Column name St Activity Lower bound Upper bound Marginal ------ ------------ -- ------------- ------------- ------------- ------------- 1 go[F1] NF 0 < eps 2 go[F2] NF 0 < eps 3 go[F3] B 1 4 go[F4] B 1 5 go[F5] NF 0 < eps Karush-Kuhn-Tucker optimality conditions: KKT.PE: max.abs.err = 0.00e+00 on row 0 max.rel.err = 0.00e+00 on row 0 High quality KKT.PB: max.abs.err = 0.00e+00 on row 0 max.rel.err = 0.00e+00 on row 0 High quality KKT.DE: max.abs.err = 0.00e+00 on column 0 max.rel.err = 0.00e+00 on column 0 High quality KKT.DB: max.abs.err = 0.00e+00 on row 0 max.rel.err = 0.00e+00 on row 0 High quality End of output
From the first table we can see, that the constraints were really generated separately for all of the bands, and from the second table, that the go variables were generated for all of the festivals. Also, we get the same optimal solution of going to F3 and F4 only.
Later on, we will learn some other features of GMPL, with which we can tweak this model to be nicer, but for now, we have all the tools needed to build more complex models.