Motivational example: festivals

A discrete decision problem for motivation, and to discuss basic concepts and principles

Problem description

Who doesn't like music, and want to listen to their favorite bands live in summer festivals? In this example, this is what we want to do exactly, while saving as much money as possible. First things first, fill the fields with the name of your favorite bands, performers, or just leave there my selection.

Now, let us assume, that there are five festivals, we have the chance to go to during the summer. For the sake of simplicity, these festivals will be called F1, F2, etc. Below is a table, that details, which band will perform at which festival.

F1 F2 F3 F4 F5
${b1} 🗸 🗸🗸
${b2} 🗸🗸🗸 🗸
${b3} 🗸🗸 🗸🗸
${b4} 🗸 🗸
${b5} 🗸
${b6} 🗸🗸🗸🗸
${b7} 🗸 🗸

Let us suppose for now, that the ticket for each festival costs the same, and we want to minimize the money we spend during the summer while hearing every favorite band of us at least once.

Solving the problem by hand

The problem with this size is easy to solve in a few minutes without any software tool, just by a trial and error approach. Try it yourself before clicking the button below. Show solution

It seems that going to festivals F3 and F4 only is the best solution. But how can we be sure, that this is really the optimal solution? A reasonable argument would be to say, that our solution has 2 festivals selected. If this is not the best solution, then the best should have at most 1 festival selected. With just checking the 5 festivals, it is easy to see, that none of them covers all of our favorite bands.

Note, that this logic would be much more complex to execute, if we were to have a larger problem, let's say 30 bands and 50 festivals. If our presumably optimal solution would have 9 festivals selected, all of the 8 element subsets of the 50 festivals should be checked, that is a pretty big number: \( {50 \choose 8} = 536878650 \).

Now that we are assured, that F3,F4 is optimal, another question may arise: is this the only optimal solution?

It is pretty easy to see, that F4,F5 is also a feasible solution with 2 festivals, thus, let us note that

A problem may have one or more optimal solutions at the same time.

What about a larger festival problem?

A problem with this size can be handled effortlessly by hand, however, as the number of bands and festivals increases, it is less and less so. Imagine a problem with 30 bands and 50 festivals. If the "performing-matrix" is not trivial, solving the problem will definitely require some kind of computer assistance.

Even if you have no background in optimization, but have basic programming skills, you should be able to write a computer program solving this problem in your favorite language. Try to come up with the algorithm yourself, then proceed by clicking on the button. Show algorithm

The first thing is to observe, that the flexibility we have is to make a yes-or-no type of decision about all of the festivals. This basically gives us \(2^{50}\) different solutions.

Some of these solutions are satisfying the constraints of the problem, i.e., that all of the bands are visited, others don't. Solutions of the former group are often called as good or feasible solutions, while the others as bad or infeasible.

In the case of the matrix above, selecting F1,F3,F4 is a feasible solution, while F1,F3 is infeasible, as we will miss ${b5}. Below is the complete list of the 32 subsets, and whether they are feasible or not.

none F1 F2 F2,F1
F3 F3,F1 F3,F2 F3,F2,F1
F4 F4,F1 F4,F2 F4,F2,F1
F4,F3 F4,F3,F1 F4,F3,F2 F4,F3,F2,F1
F5 F5,F1 F5,F2 F5,F2,F1
F5,F3 F5,F3,F1 F5,F3,F2 F5,F3,F2,F1
F5,F4 F5,F4,F1 F5,F4,F2 F5,F4,F2,F1
F5,F4,F3 F5,F4,F3,F1 F5,F4,F3,F2 F5,F4,F3,F2,F1

If we already have the set of finite solutions, we just have to select the smallest from the feasible ones, e.g.,:

      S = generate_solutions();
      min = infinity
      for(s in S) {
        if (feasible(s)){
          if (value(s)<min) {
            min = value(s);
          }
        }
      }
    

or if you prefer functional programming style:

(min (filter (generate_solutions) feasible) value)

Accelerating the solution procedure

Let us assume, that we implemented the above algorithm in the most efficient way possible. It is fair to say, that evaluating a single solution takes at least as much time, as a floating point operation. As of today, the fastest supercomputer is the Summit with 200 petaflops of computational power. That would mean that even this computer could not investigate more than around \(10^{17}\) solutions in a second. Having the aforementioned problem with 50 festivals, the whole computation would take more than \(2^{50} / 10 ^ {17}\) seconds. Since \(2^{10}\)~\(10^3\), that would mean 10 milliseconds. That may seem acceptable for an average computer, but not for the fastest supercomputer of our time. Moreover, if the problem size increased to let's say 100, even this computer would need more than 300 thousand years. It is clear, that if we want to tackle this type of problems, with "practical" sizes in a reasonable amount of time, some accelerations are needed.

Idea 1: Use the Greedy algorithm

A reasonable intuition says, that festivals with a lot of bands should have priority, and should be selected. A simple algorithm could be built upon this idea:

  1. Select and include a festival with the most bands.
  2. Remove the bands that are covered by the selected festival
  3. If there is at least one performer left, go to Step 1.

Applying this procedure to the original problem, it would first select either F3, F4 or F5, as all three of them covers 5 bands. This ambiguity should somehow be resolved with the algorithm, we will always select the leftmost festival. So F3 is selected, and then ${b1},${b2},${b4},${b6},${b7} are removed from the list and the table is reduced to the following:

F1 F2 F4 F5
${b3} 🗸🗸🗸🗸
${b5} 🗸

As there are two bands still left, the algorithm continues with the festival covering most of them, that is F4. Selecting F4 will have all of the bands covered, thus we end up with the solution of F3,F4.

For this particular example, this so-called Greedy algorithm provided the optimal solution. For some problem classes, the optimality can be guaranteed in such a way, however, this problem is not like that. It is not difficult to create an instance, where this algorithm would provide a sub-optimal solution. Try to construct such an example by yourself. Show problem instance

F1 F2 F3
${b1} 🗸 🗸
${b2} 🗸🗸
${b3} 🗸 🗸
${b4} 🗸🗸
${b5} 🗸
${b6} 🗸

It is easy to see, that the optimal solution is F1,F2, however, the greedy algorithm would select F3 first and then include the other two.

The greedy algorithm can be considered as a heuristic approach on this problem class, as it provides a (most probably) reasonably good solution very quickly, however, it can not guarantee the optimality of the solution (even if it happens to be optimal).

Idea 2: Make inevitable decisions right away

Looking at the original problem, it is very obvious, that ${b5} is a rather picky band, as they only perform at F4. Hence, F4 must be on our festival list, no matter what. This decision can be made with 100% certainty without losing optimality.

Another less obvious observation could be, that F2 is inferior to F5, as it only covers a subset of bands covered by F5. Even if there is an optimal solution having F2 selected, another optimal solution must exist, where F5 is selected instead of F2. Thus, we can neglect F2 right from the beginning, and reduce the size of the problem like this.

These kind of simplifications are often implemented as a presolve procedure in modern optimization software tools. In the best case they can solve the problem, in the worst the problem remains unaltered.

This kind of simple tricks can usually be applied in a recursive manner. Note, that F3 does not dominate F1 in the same way, as F5 did F2, because ${b3} will perform at F1 but not at F3. However, if F4 gets selected because of ${b5}, then ${b3} becomes irrelevant, as we will see them there anyway. Thus, after selecting F4, F1 becomes inferior to F3, and thus can be neglected. Similarly, F3 and F5 will become equivalent after including F4, so one of them can be neglected.

For this specific problem instance, these two simple presolve techniques can solve the problem to optimality.

Idea 3: Investigate only feasible solutions

For this particular problem, half of the solutions were infeasible (see above). This means, that we are wasting computational time on a lot of infeasible solutions, that will definitely not end up optimal. This ratio is usually much higher for real world problems, thus it is a nice idea, to somehow come up with an approach generating only the feasible solutions.

Investigating this is not our main goal here, so such an algorithm is not presented here, but you should be able to construct such an algorithm, and implement it in your language of selection.

Idea 4: Select the order of the solutions

Since we are only interested in the smallest feasible subset of festivals, it would be reasonable to design our algorithm in a way, that it iterates through the singleton sets first, then the 2 element subsets, and so on. Thus, if a feasible solution is found, it can be crowned as an optimal one, and the algorithm could stop.

Unfortunately the "best" search algorithm, or subproblem selection approach can vary from one problem class to another. In this particular problem class, if all of the festivals are needed, then this algorithm would go through all of the solutions anyway.


Final notes

This problem class is called Set covering problem, which was dressed up in this festival-band theme. This problem class is known to be NP-Complete, i.e., unless P=NP, no "fast" algorithm can be expected to solve it for larger instances.