A discrete decision problem for motivation, and to discuss basic concepts and principles
Who doesn't like music, and want to listen to their favorite bands live in summer festivals? In this example, this is what we want to do exactly, while saving as much money as possible. First things first, fill the fields with the name of your favorite bands, performers, or just leave there my selection.
Now, let us assume, that there are five festivals, we have the chance to go to during the summer. For the sake of simplicity, these festivals will be called F1, F2, etc. Below is a table, that details, which band will perform at which festival.
F1 | F2 | F3 | F4 | F5 | |
---|---|---|---|---|---|
${b1} | 🗸 | 🗸 | 🗸 | ||
${b2} | 🗸 | 🗸 | 🗸 | 🗸 | |
${b3} | 🗸 | 🗸 | 🗸 | 🗸 | |
${b4} | 🗸 | 🗸 | |||
${b5} | 🗸 | ||||
${b6} | 🗸 | 🗸 | 🗸 | 🗸 | |
${b7} | 🗸 | 🗸 |
Let us suppose for now, that the ticket for each festival costs the same, and we want to minimize the money we spend during the summer while hearing every favorite band of us at least once.
The problem with this size is easy to solve in a few minutes without any software tool, just by a trial and error approach. Try it yourself before clicking the button below. Show solution
It seems that going to festivals F3 and F4 only is the best solution. But how can we be sure, that this is really the optimal solution? A reasonable argument would be to say, that our solution has 2 festivals selected. If this is not the best solution, then the best should have at most 1 festival selected. With just checking the 5 festivals, it is easy to see, that none of them covers all of our favorite bands.
Now that we are assured, that F3,F4 is optimal, another question may arise: is this the only optimal solution?
It is pretty easy to see, that F4,F5 is also a feasible solution with 2 festivals, thus, let us note that
A problem with this size can be handled effortlessly by hand, however, as the number of bands and festivals increases, it is less and less so. Imagine a problem with 30 bands and 50 festivals. If the "performing-matrix" is not trivial, solving the problem will definitely require some kind of computer assistance.
Even if you have no background in optimization, but have basic programming skills, you should be able to write a computer program solving this problem in your favorite language. Try to come up with the algorithm yourself, then proceed by clicking on the button. Show algorithm
The first thing is to observe, that the flexibility we have is to make a yes-or-no type of decision about all of the festivals. This basically gives us \(2^{50}\) different solutions.
Some of these solutions are satisfying the constraints of the problem, i.e., that all of the bands are visited, others don't. Solutions of the former group are often called as good or feasible solutions, while the others as bad or infeasible.
In the case of the matrix above, selecting F1,F3,F4 is a feasible solution, while F1,F3 is infeasible, as we will miss ${b5}. Below is the complete list of the 32 subsets, and whether they are feasible or not.
none | F1 | F2 | F2,F1 |
F3 | F3,F1 | F3,F2 | F3,F2,F1 |
F4 | F4,F1 | F4,F2 | F4,F2,F1 |
F4,F3 | F4,F3,F1 | F4,F3,F2 | F4,F3,F2,F1 |
F5 | F5,F1 | F5,F2 | F5,F2,F1 |
F5,F3 | F5,F3,F1 | F5,F3,F2 | F5,F3,F2,F1 |
F5,F4 | F5,F4,F1 | F5,F4,F2 | F5,F4,F2,F1 |
F5,F4,F3 | F5,F4,F3,F1 | F5,F4,F3,F2 | F5,F4,F3,F2,F1 |
If we already have the set of finite solutions, we just have to select the smallest from the feasible ones, e.g.,:
S = generate_solutions(); min = infinity for(s in S) { if (feasible(s)){ if (value(s)<min) { min = value(s); } } }
or if you prefer functional programming style:
(min (filter (generate_solutions) feasible) value)
Let us assume, that we implemented the above algorithm in the most efficient way possible. It is fair to say, that evaluating a single solution takes at least as much time, as a floating point operation. As of today, the fastest supercomputer is the Summit with 200 petaflops of computational power. That would mean that even this computer could not investigate more than around \(10^{17}\) solutions in a second. Having the aforementioned problem with 50 festivals, the whole computation would take more than \(2^{50} / 10 ^ {17}\) seconds. Since \(2^{10}\)~\(10^3\), that would mean 10 milliseconds. That may seem acceptable for an average computer, but not for the fastest supercomputer of our time. Moreover, if the problem size increased to let's say 100, even this computer would need more than 300 thousand years. It is clear, that if we want to tackle this type of problems, with "practical" sizes in a reasonable amount of time, some accelerations are needed.
A reasonable intuition says, that festivals with a lot of bands should have priority, and should be selected. A simple algorithm could be built upon this idea:
Applying this procedure to the original problem, it would first select either F3, F4 or F5, as all three of them covers 5 bands. This ambiguity should somehow be resolved with the algorithm, we will always select the leftmost festival. So F3 is selected, and then ${b1},${b2},${b4},${b6},${b7} are removed from the list and the table is reduced to the following:
F1 | F2 | F4 | F5 | |
---|---|---|---|---|
${b3} | 🗸 | 🗸 | 🗸 | 🗸 |
${b5} | 🗸 |
As there are two bands still left, the algorithm continues with the festival covering most of them, that is F4. Selecting F4 will have all of the bands covered, thus we end up with the solution of F3,F4.
For this particular example, this so-called Greedy algorithm provided the optimal solution. For some problem classes, the optimality can be guaranteed in such a way, however, this problem is not like that. It is not difficult to create an instance, where this algorithm would provide a sub-optimal solution. Try to construct such an example by yourself. Show problem instance
F1 | F2 | F3 | |
---|---|---|---|
${b1} | 🗸 | 🗸 | |
${b2} | 🗸 | 🗸 | |
${b3} | 🗸 | 🗸 | |
${b4} | 🗸 | 🗸 | |
${b5} | 🗸 | ||
${b6} | 🗸 |
It is easy to see, that the optimal solution is F1,F2, however, the greedy algorithm would select F3 first and then include the other two.
Looking at the original problem, it is very obvious, that ${b5} is a rather picky band, as they only perform at F4. Hence, F4 must be on our festival list, no matter what. This decision can be made with 100% certainty without losing optimality.
Another less obvious observation could be, that F2 is inferior to F5, as it only covers a subset of bands covered by F5. Even if there is an optimal solution having F2 selected, another optimal solution must exist, where F5 is selected instead of F2. Thus, we can neglect F2 right from the beginning, and reduce the size of the problem like this.
This kind of simple tricks can usually be applied in a recursive manner. Note, that F3 does not dominate F1 in the same way, as F5 did F2, because ${b3} will perform at F1 but not at F3. However, if F4 gets selected because of ${b5}, then ${b3} becomes irrelevant, as we will see them there anyway. Thus, after selecting F4, F1 becomes inferior to F3, and thus can be neglected. Similarly, F3 and F5 will become equivalent after including F4, so one of them can be neglected.
For this specific problem instance, these two simple presolve techniques can solve the problem to optimality.
For this particular problem, half of the solutions were infeasible (see above). This means, that we are wasting computational time on a lot of infeasible solutions, that will definitely not end up optimal. This ratio is usually much higher for real world problems, thus it is a nice idea, to somehow come up with an approach generating only the feasible solutions.
Investigating this is not our main goal here, so such an algorithm is not presented here, but you should be able to construct such an algorithm, and implement it in your language of selection.
Since we are only interested in the smallest feasible subset of festivals, it would be reasonable to design our algorithm in a way, that it iterates through the singleton sets first, then the 2 element subsets, and so on. Thus, if a feasible solution is found, it can be crowned as an optimal one, and the algorithm could stop.
Unfortunately the "best" search algorithm, or subproblem selection approach can vary from one problem class to another. In this particular problem class, if all of the festivals are needed, then this algorithm would go through all of the solutions anyway.