My kids are on a summer swim team in the local league. It's a great activity--exercise, a bit of healthy competition, and all four of my kids and step-kids are on the same team for one season out of the year. It's a lot of fun for everyone.
It's complicated, though. There are five age groups for both boys and girls for a total of ten competition groups. There are 4 strokes (fly, back, breast, and freestyle). Each swimmer is assigned to one of three classes for each stroke. The A class has the faster kids, the B class the next faster kids, and the C class has the rest. First, second, and third place in each stroke-class (for each age/gender group) earn points for the team. The A, B, and C classes all count equally, in the spirit of the league. It's 5 points for a 1st, 3 for a 2nd, and 1 for a 3rd, regardless of class. This way, nearly everyone's performance can affect the outcome of the meet.
There's only one strategic variable in the meet. Each swimmer can only swim in 3 of the 4 stroke events in the meet. In other words, each swimmer has to skip one of the four strokes for the day. The manager of each team seeds the meet a couple days beforehand. You'd think that the best strategy is to have each swimmer participate in their 3 best strokes. But that's not the case.
Consider a swimmer who can take 2nd place in his age group/class in his best stroke, and who can take 3rd place in his worst. His manager may want to have him skip his best stroke if the team's next fastest swimmers were poised to take 3rd and 4th, (sliding up to 2nd and 3rd). The swimmer's absence in his best stroke would not cost his team any points, and his addition in his slowest stroke would add a point for his team. With 110 swimmers, 10 age/gender groups, and 3 classes in each group per team, there could be several convoluted and counter-intuitive possibilities--too many for any one person to think through.
I thought this could be conquered quickly by using Excel's its 'Solver' optimization add-in. But after a lot of work I learned that Solver doesn't play nice with problems like this. The swim meet scoring function is non-linear and non-continuous. There are only binary options (does a swimmer participate in a certain event or not?). Each option can have a large impact on the outcome. After investigating and experimenting, I realized Solver just can't handle complex binary problems.
It's a fun problem, so I pursued it further by coding my own solution. The swim league isn't intended to be super-competitive, and each kid should have a fair opportunity to compete in each stroke. But occasionally our team is challenged by our cross-town rivals for the championship, and the team wants to go all-out. With the swimmers' times posted online, all the necessary data is available to analyze.
There are two parts to the optimization problem. The first is computational complexity, and the second is strategic and related to game theory.
Complexity
I needed way to run through all possible combinations of event participation for each group, and identify which combination provides the optimum score. This sounds easy, and it is except for one thing--it's exponentially complex. There are 4 possible combinations of events for every swimmer in a group. If there are 6 swimmers in an age/gender group, that's 4^6 = 4,096 different combinations we need to check for the optimum score. Any computer can do that in a flash. What if there are 8 swimmers in a group? That's 65,536, which is still very crunchable in a few seconds. What about 12 swimmers in a group, which is very common? That's 4^12 = 16,777,216 permutations, which takes the Advanced NFL Stats supercomputer over 2 hours to crunch.
The toughest challenges are caused by the largest groups. The 8-10 girls age group on our team has 20 swimmers, which means there are 1,099,511,627,776 combinations of swimmer-events, just for one team. Uh-oh. That would take 5,461 days to crunch. We can do some triage to exclude some swimmers who rank low enough in every stroke/class not to matter, but directly computing the optimum would still take far too much time. Using a compiled executable program or multi-threaded coding could also accelerate things, but not to the degree necessary for the larger groups.
From xkcd.com (If you get this right away, you're way ahead of me.)
Game Theory
The second part of the problem has to do with game theory. This isn't the typical engineering or transportation optimization problem. There's a thinking human on the other side of equation trying to win. The value of my team's optimum lineup depends on the opponent's lineup and vice versa. In the optimization problem, I can only control my own team's lineup. Otherwise, the algorithm would make the other team swim all their worst strokes.
The algorithm needs a starting point to represent the opponent's lineup of events. A team's lineup needs to be drafted without knowledge of the other's. There are a few options here. We can start with a randomly generated opponent lineup, and then iterate. Once I've optimized against a random opponent lineup, I could optimize the opponent lineup vs my own team's, and then optimize my own lineup again to counter the opponent's optimization. Presuming there isn't another sports-analytics-obsessed parent on the other team, this might be a good approach.
Further, we can continue to iterate until we reach a Nash Equilibrium where both teams are optimized against each other's optimizations, and no team could benefit from changing its lineup. Although it's not realistic to ever expect that level of overkill by managers, it's still interesting to see where the equilibrium is and what score it represents. This approach is feasible for smaller groups of swimmers, but not for larger groups--at least using the direct computational method. It requires the optimization to be run multiple times.
Another method is to start with an intuitively reasonable opponent lineup. A very good heuristic is to have each swimmer participate in his three most competitive strokes. This would be reasonably optimum except for combinations like those mentioned previously--when it makes sense for a competitive swimmer to allow slower swimmers on his own team to score just as many points in order to allow him to place in another stroke. I'm not the manager of our team, but I imagine this is how most managers seed their lineups.
Solution
A classic example of computationally complex optimization problem is the traveling salesman problem, aka the TSP. Consider a salesman who must visit 10 cities and return to his starting point. There is known time/distance between each city, and the salesman needs to visit each city exactly once in the minimum amount of time. There are 362,880 combinations of paths he could travel. Add an 11th city, and there are now 3,628,800 combinations. Every day in the working life of a FedEx or UPS driver is one big TSP.
One of the ways to crack the TSP is to use a genetic algorithm. The power of evolution's selection effect can be harnessed to quickly approximate the optimum solution, and often find the precise optimum solution. For extremely complex problems, the tradeoff between computational time and the level of optimality usually makes sense. I thought this approach was super-cool, so I thought I'd try to apply it to the swim meet problem.
Here's how I approached the swim meet optimization problem using genetic concepts. In every age/gender group, think of the lineup of events that each swimmer skips as a genome--a strand of DNA. But instead of GCTACTCAGGCA, it's Fly-Breast-Fly-Fly-Back-Free-Fly-Breast-Free... I created an original generation of 30 "individuals" (combinations of own-team swimmer-event lineups) randomly. Each different combination of swimmer-event "DNA" will produce a fitness level, determined by the meet scoring rules.
Then I spawned a new generation of 30 individuals (lineups) based on three different methods. First, I kept the 10 best lineups from the previous generation, defined as producing the highest meet scores. These are known as "the elite." The other 20 lineups die off and are replaced by children of the elite. This ensures the evolution process doesn't randomly regress into less fit populations.
Ten of the next-generation "children" are created by asexual mutation. Each of the elite is randomly mutated slightly to create a batch of new children, some of which may happen be more 'fit' than their parent. [Yes, that's right. I wasted an unknown amount of time making imaginary mathematical 'asexual mutant children.']
The other ten children lineups were created sexually [I know how weird this sounds] by combining the genomes of two of the elite chosen randomly. In this way, the algorithm can eventually merge the best sub-combinations into fitter and fitter solutions. This can provide an evolutionary leap out of a local maximum, where slight mutations can no longer improve outcomes, and onto a region of the problem closer to the global maximum. The algorithm is repeated until it no longer produces any improvement in fitness (score) within a reasonable number of iterations.
I used a set of data from our team's closest meet this year as a test. I selected an age group that happened to have 12 swimmers from each team. This provided a useful test because it's enough swimmers to make things tough but also solvable by brute force. That way I know the true optimum as the answer key for testing the genetic algorithm.
The brute-force optimization approach took 2 hours and 14 minutes to evaluate every possible lineup. The score was 46 for the Barracudas, our home team, to 62 for the Lightning, our rival. (We happened to be outgunned in this age group.) This result was based on a single iteration optimization versus the 'intuitively reasonable' opponent lineup heuristic, where each opponent swimmer simply swims his three most competitive strokes. It began from a standing start, initialized with a population of randomly chosen lineups.
The genetic algorithm took a fraction of the time of the brute force computation, averaging 17 seconds to reach optimum and then continue for another 1000 iterations in vain search for a higher score. In the dozens of times I ran it, the algorithm never failed to produce a lineup worth the known optimum of 46 points. It reached the optimum after an average of 75 generations, which equates to 75 * 30 = 2,250 potential solutions, and never exceeded 184 generations.
The impractically large group of 20 swimmers produced a (probable) optimum in a very steady average of only 24 seconds. I wrote probable because I'm not going to try to run the program for 12 years to search all the possible combinations. The genetic algorithm consistently produced the same "best" score every time I ran it, but I can't prove it's the optimum. I think the reason it worked so well with the very large group was because the larger groups still has the same number of swimmers as the smaller groups that matter in terms of scoring, and the rest of the swimmers can choose any event combination without an impact on the meet score. (Kind of like real-world 'junk DNA', I guess.)
I could probably tune the constants in the model to improve the speed. The number of 'stall generations' allowed where there is no longer any improvement was set to 1000, which could be safely lowered to quicken things. The biggest help would be to initialize the population with reasonably high scoring lineups instead of random ones, but that wouldn't have been a fair comparison with the brute force method. The number of elites, mutants, and children could also be tweaked to improve things, but I'm satisfied with it as is. Besides, this is more of a learning exercise than a real attempt to squeeze out an extra few points at the meet.
I enjoyed the article. Really interesting. First, there's the implication that a swim meet is, in part, a contest of each coach's optimization skills.
Second, you've described a dual meet. You could add another layer of complexity with an open meet, where three or more teams compete. (That gives you something to work on for the rest of the summer. :)
One thing that wasn't quite clear to me: Were you optimizing both teams at once or optimizing your team vs. a set lineup? You discuss doing the former, but the testing you describe seems to work off the latter. I'd be curious how much difference there is between the Nash Equilibrium optimum vs. the optimum against a set lineup.
Brian,
I'm a bit shocked to no longer be the only stathead to publish an article on TSP and Genetic Algorithms.
http://dl.acm.org/citation.cfm?id=1756693
Sean-That's the real deal!
Bravo! I bet your team's coach and all of the opposing parents are going to hate you.
Did you assign winners deterministically or probabilistically based on variation in times?
Other avenues you could explore are:
1. What is the home pool advantage?
2. Do certain kids perform better in different air/water temperatures?
3. Do kids perform better breathing every 2, 3, or 4 strokes?
I'll be looking forward to your followup in the next decade on the 20 person case.
How about this algorithm (which assumes that swim times are known exactly)?
Step 1: Start with the illegal lineup where every swimmer swims in every event.
Step 2: With the lineup that you're looking at, check if any swimmer scores points in all 4 events. If not, go to step 4. If yes, go to step 3.
Step 3: Take a swimmer who scores points in all 4 events, and create 4 alternate lineups, which remove them from each of the events. Send each lineup back to step 2.
Step 4: Any swimmer that is left participating in 4 events can be removed from any one of the events in which they do not score points to create a legal lineup (it doesn't matter how you choose which of their non-scoring events to remove them from since the point totals will be the same).
You will end up with a set of possible lineups which covers all of the relevant variation and is much smaller than the set of all possible lineups. Essentially, you're taking the most general (illegal) case, and breaking it into subcases based on what events a top swimmer is skipping, and then breaking each of those subcases into subcases, and so on, until there is no relevant variation left (only options for the non-scoring swimmers). At most you'll end up with 4^k possible lineups, where k is the number of swimmers who could score points in all 4 events if they swam in all 4 (given that other swimmers can only swim in 3).
Then you need to optimize both team's lineups. That optimization process is not trivial (given the game theory issues), but it should be much simpler given that you only need to optimize over a small subset of possible lineups.
(In practice, you may want to pretend that the 4th & 5th spots also score points, to account for the fact that swimming times are non-deterministic. e.g., It's much better to have a swimmer in an event where they are expected to finish 4th than in one where they are expected to finish 14th, because of the chance that they will crack the top 3. Or you could account for this using a rough hack in step 4, where you remove each swimmer from their worst event.)
Dan-That's excellent. My own thought was to give 4th place a notional half point, or to weight all the expected 1st, 2nd, 3rd, 4th place points according to probabilistic averages.
Which software did you use for the genetic algorithm?
Sean-I wrote my own code. But there are several analytics applications and add-ins for MatLab or R that could do something like this. I wanted to do it myself just to learn it, and also, if I wanted to actually employ it, I could customize it to automatically download and parse the relevant data for each meet.
This is more like knapsack than the traveling salesman. (Of course both are NP-hard.) Something to remember is that general optimizations are hard, but real world examples can often be solved relatively easily.
Another heuristic that I would expect to be very powerful here is using a greedy strategy - get as many first places as possible, then look for 2nd places, and then 3rd places.
While this sort of thing is instructive as an exercise, it does seem to go in the face of the structure which is clearly set up so that lots of swimmers can get 'small wins'. (If it were really competitive, teams would just sandbag.) So, maybe, instead of optimizing for total team points, it may make more sense to spread out the podium spots -- especially in the B and C groups.
One doesn't have to examine every possibility for things like this... Treat it as depth-first tree searching using alpha-beta pruning and reasonable ordering of nodes will result in significant reductions. Basically, you can discard huge portions of the tree where you know that you cannot beat a previously calculated score.
In the swim league of my youth, a swimmer could race in any age group at or above their current age, which could add an interesting extra part of the algorithm (if there is a strong enough age group/swimmer). In one meet, an opponent had a very strong age group that projected out to taking 1st and 2nd in their own age group, but my team had a stronger 'next' age group. Against my team, they decided to swim one of the kids up to the older group in hopes that they'd take two 1sts instead of first/second (5+5 versus 5+3). Had the strategy worked, they would have won the meet, but, instead, they 'just' got the same score as if they didn't swim him up.
Miles-They can do that in this league too. But it's usually only done in the relay portion, which does add another fun optimization wrinkle. Sometimes a team can get a 3rd place (out of 3) in the relays just by being able to field an additional 4-swimmer lane.
Brian,
I've been reading your blog for some time, but this is the first time I have commented. As the son of an academic with a specialism in GAs/heuristics, I was pleasantly surprised to find mention of them on this site. Keep up the good work, I'm always interested to see real-world applications of his field.
Great stuff. This could almost be the wikipedia article explaining how genetic algorithm problem solving works in practice. Thanks.
Do your accounting homework, accounting assignments with our accounting homework solution and accounting tutoring help. Our homework assistance helps you to do your accounting homework and accounting assignments.
Please visit once www.homeworkhelpindia.com or email me on prashant8778@yahoo.co.in
Do your accounting homework, accounting assignments with our accounting homework solution and accounting tutoring help. Our homework assistance helps you to do your accounting homework and accounting assignments.
Please visit once www.homeworkhelpindia.com or email me on prashant8778@yahoo.co.in
Do your accounting homework, accounting assignments with our accounting homework solution and accounting tutoring help. Our homework assistance helps you to do your accounting homework and accounting assignments.
Please visit once www.homeworkhelpindia.com or email me on prashant8778@yahoo.co.in
Do your accounting homework, accounting assignments with our accounting homework solution and accounting tutoring help. Our homework assistance helps you to do your accounting homework and accounting assignments.
Please visit once www.homeworkhelpindia.com or email me on prashant8778@yahoo.co.in
Do your accounting homework, accounting assignments with our accounting homework solution and accounting tutoring help. Our homework assistance helps you to do your accounting homework and accounting assignments.
Please visit once www.homeworkhelpindia.com or email me on prashant8778@yahoo.co.in
Do your accounting homework, accounting assignments with our accounting homework solution and accounting tutoring help. Our homework assistance helps you to do your accounting homework and accounting assignments.
Please visit once www.homeworkhelpindia.com or email me on prashant8778@yahoo.co.in
Do your accounting homework, accounting assignments with our accounting homework solution and accounting tutoring help. Our homework assistance helps you to do your accounting homework and accounting assignments.
Please visit once www.homeworkhelpindia.com or email me on prashant8778@yahoo.co.in
Do your accounting homework, accounting assignments with our accounting homework solution and accounting tutoring help. Our homework assistance helps you to do your accounting homework and accounting assignments.
Please visit once www.homeworkhelpindia.com or email me on prashant8778@yahoo.co.in
Do your accounting homework, accounting assignments with our accounting homework solution and accounting tutoring help. Our homework assistance helps you to do your accounting homework and a
ccounting assignments.
Please visit once www.homeworkhelpindia.com or email me on prashant8778@yahoo.co.in