I'm a Baltimore guy, and aside from an affinity for steamed crabs and a regrettable taste for National Bohemian beer, the mid-Atlantic has given me an appreciation for the sport of lacrosse. To most North American sports fans, lacrosse must seem like some strange niche sport, like "jousting" or "baseball." But it's very entertaining and fun to watch. It's growing fast, particularly in the super-zips around DC where the ANS headquarters is.
For those not familiar with lacrosse, imagine hockey played on a football field but, you know, with cleats instead of skates. And instead of a flat puck and flat sticks, there's a round ball and the sticks have a small netted pocket to carry said ball. And instead of 3 periods, which must be some sort of weird French-Canadian socialist metric system thing, there's an even 4 quarters of play in lacrosse, just like God intended. But pretty much everything else is the same as hockey--faceoffs, goaltending, penalties & power plays. Lacrosse players tend to have more teeth though.
Because players carry the ball in their sticks rather than push it around on ice, possession tends to be more permanent than hockey. Lacrosse belongs to a class of sports I think of as "flow" sports. Soccer, hockey, lacrosse, field hockey, and to some degree basketball qualify. They are characterized by unbroken and continuous play, a ball loosely possessed by one team, and netted goals at either end of the field (or court). There are many variants of the basic team/ball/goal sport--for those of us old enough to remember the Goodwill Games of the 1980s, we have the dystopic sport of motoball burned into our brains. And for those of us (un)fortunate enough to attend the US Naval Academy (or the NY State penitentiary system) there's field ball. The interesting thing about these sports is that they can all be modeled the same way.
So with lacrosse season underway, I thought I'd take a detour from football work and make my contribution to lacrosse analytics. I built a parametric win probability model for lacrosse based on score, time, and possession. Here's how often a team can expect to win based on neutral possession--when there's a loose ball or immediately upon a faceoff following a previous score:
For example, a team leading by one goal (denoted by the green line) with 6 minutes left in the game can expect to win 70% of the time. And, obviously, with the score tied and possession undetermined, each team has a 50% chance of winning.
The model is built using a method similar to what I used to build my NHL WP model a few years ago. That model was based on a Poisson distribution of scoring, and lacrosse could be modeled the same way. However, because lacrosse scores tend to be much higher, a normal distribution works very well and is much simpler to model. Unfortunately, football can't be modeled accurately this way. But because lacrosse scores come in increments of 1 instead of 3 or 7 or 8 or 6 (or 2), and can generally happen at any time, a normal approximation is very suitable.
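For readers who want to experiment, here is a minimal Python sketch of the normal approximation. It is not the code behind the charts. The function name and the value of sigma (the standard deviation of the change in goal margin over one minute) are my assumptions; sigma was picked only so that a one-goal lead with 6 minutes left comes out near 70%.

```python
from math import sqrt
from statistics import NormalDist

# Assumed value, not taken from the post: standard deviation of the change
# in goal margin over one minute of play, in goals per sqrt(minute).
SIGMA_PER_SQRT_MIN = 0.78


def win_prob(lead, minutes_left, sigma=SIGMA_PER_SQRT_MIN):
    """Neutral-possession win probability for a team ahead by `lead` goals."""
    if minutes_left <= 0:
        # Game over: the final margin decides it (a tie is treated as a coin flip).
        return 1.0 if lead > 0 else 0.0 if lead < 0 else 0.5
    # The remaining change in margin is modeled as Normal(0, sigma^2 * minutes_left).
    z = lead / (sigma * sqrt(minutes_left))
    return NormalDist().cdf(z)


print(round(win_prob(1, 6), 2))  # one-goal lead, 6 minutes left
print(win_prob(0, 6))            # tied game, neutral possession
```

With the assumed sigma, the first call lands close to the 70% figure and the tied game comes out at exactly 50%.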
This approach also gives the model a great deal of flexibility. For example, to estimate WP for the team with possession, we can bias the score distribution according to scoring efficiency--goals per possession. If a team scores on 25% of its possessions, we can simply bias the model by 0.25 goals in that team's favor. Here's the resulting chart.
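The possession bias can be sketched the same way. This is a hedged illustration, not the model's actual code: the function name, the `possession` flag, and the sigma value are assumptions, while the 0.25 goals-per-possession figure is the example from the paragraph above.

```python
from math import sqrt
from statistics import NormalDist


def win_prob_with_possession(lead, minutes_left, possession=0,
                             goals_per_possession=0.25, sigma=0.78):
    """Win probability with a possession bias.

    possession: +1 if our team has the ball, -1 if the opponent does,
    0 for neutral possession. sigma is an assumed value.
    """
    if minutes_left <= 0:
        return 1.0 if lead > 0 else 0.0 if lead < 0 else 0.5
    # Shift the score margin by the expected value of the current possession.
    effective_lead = lead + possession * goals_per_possession
    return NormalDist().cdf(effective_lead / (sigma * sqrt(minutes_left)))


# Tied game with 5 minutes left: with the ball, neutral, and without the ball.
for flag in (+1, 0, -1):
    print(flag, round(win_prob_with_possession(0, 5, possession=flag), 3))
```

Because the bias simply flips sign, the without-possession probability is one minus the opponent's with-possession probability.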
I'll spare you the WP graph for a team without possession, but needless to say it's the mirror image of the graph above. This is significant because it allows us to place a value on gaining or losing possession, which is a critical aspect of any sport, but particularly for lacrosse because possession is relatively enduring. It allows us to answer questions like, how important is faceoff percentage? Or how important is winning possession of loose balls? Turnovers? Chase-downs? (A chase-down is a unique rule of lacrosse when, following a shot on goal that goes out of bounds, the team whose player is nearest the ball wins possession.) And the best thing is that the answers come in the form of what ultimately matters--winning.
Here is how the model values possession based on time and score. The chart below illustrates the difference between having possession and the opponent having possession.
For example, the value of possession up (or down) by one goal with 4 minutes to play is about 0.15 WP. The model makes intuitive sense because possession has its greatest value when the score is close in the end-game. And when a team is ahead or behind by more than one goal toward the end of the game, possession value drops precipitously. In other words, it just doesn't matter who has the ball because the team ahead is most likely going to keep its lead regardless of who has the ball.
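Possession value, as defined above, is just the gap between the with-possession and without-possession win probabilities. Here is a rough sketch under the same normal approximation. The function name and the sigma value are assumptions, so the numbers it prints will not match the chart exactly.

```python
from math import sqrt
from statistics import NormalDist


def possession_value(lead, minutes_left, goals_per_possession=0.25, sigma=0.78):
    """WP with possession minus WP with the opponent in possession."""
    if minutes_left <= 0:
        raise ValueError("minutes_left must be positive")
    scale = sigma * sqrt(minutes_left)  # sigma is an assumed value
    cdf = NormalDist().cdf
    with_ball = cdf((lead + goals_per_possession) / scale)
    without_ball = cdf((lead - goals_per_possession) / scale)
    return with_ball - without_ball


# Possession value by score margin at a few points late in the game.
for minutes in (8, 4, 2):
    row = [round(possession_value(lead, minutes), 3) for lead in (0, 1, 2, 3)]
    print(minutes, row)
```

The value is symmetric in the lead, which is why being up or down by one goal gives the same number.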
This type of model also allows us to easily adjust for team strength. If one team were a 2.5-goal favorite over another, here's how the WP chart for neutral possession would look.
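One simple way to fold team strength into a model like this is to spread the pregame goal spread evenly over the game and add the part that is still to come. That is my assumption for this sketch, not necessarily how the chart was produced. The 48-minute game length, the sigma value, and the function name are likewise assumed.

```python
from math import sqrt
from statistics import NormalDist


def win_prob_favorite(lead, minutes_left, pregame_spread=2.5,
                      game_minutes=48.0, sigma=0.78):
    """Neutral-possession WP for a team favored by `pregame_spread` goals."""
    if minutes_left <= 0:
        return 1.0 if lead > 0 else 0.0 if lead < 0 else 0.5
    # Assumption: the favorite's edge accrues evenly over the game, so only
    # the share belonging to the remaining time is added to the current lead.
    expected_future_margin = pregame_spread * minutes_left / game_minutes
    z = (lead + expected_future_margin) / (sigma * sqrt(minutes_left))
    return NormalDist().cdf(z)


print(round(win_prob_favorite(0, 48), 3))  # favorite's WP at the opening faceoff
print(round(win_prob_favorite(0, 1), 3))   # tied with one minute left
```

Under this assumption the favorite's edge in a tied game shrinks as time runs out, since there is less game left in which to be the better team.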
Applications for Strategy
Lacrosse doesn't offer the rich strategic questions that football does. It doesn't have the game-theory dynamic of run/pass, two-point conversion dilemmas, or 4th down conundrums. But there are questions of strategy that can be analyzed. For example, let's say a team has a 1-goal lead in the final minutes of the game. Should it try to score or try to simply run out the clock?
A team with a 1-goal lead and possession with 3 minutes to play has a 0.829 WP. If it can hold onto the ball until 1 minute to play (without trying to score), it will have a 0.950 WP.
But if the team tries to score in typical fashion, with a 25% immediate chance of scoring on its current possession, it has a lottery of outcomes to consider. It could score to make it a 2-goal lead with, say, 2:30 to play, which is good for a 0.969 WP. If it misses the shot, it could retain possession by 'chasing down' the ball out of bounds, which would give it a 0.956 WP. Alternatively, the opponent could gain possession, which equates to a 0.28 WP. Taken together, the try-to-score option is worth:
That suggests each alternative is just about at the break-even point. Playing keep-away would be slightly preferable, assuming the team can hold the ball for 2 minutes, even without the opportunity to score.
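The try-to-score comparison above is an expected value over a lottery of outcomes, which is easy to sketch. The WP values and the 25% scoring chance are the ones quoted above. How missed shots split between chase-downs and turnovers is not stated, so it is left as a required input here, and both function names are hypothetical.

```python
def expected_wp(outcomes):
    """Probability-weighted WP over (probability, win_probability) pairs."""
    outcomes = list(outcomes)
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * wp for p, wp in outcomes)


def try_to_score_wp(p_chase_given_miss, p_score=0.25,
                    wp_score=0.969, wp_chase=0.956, wp_turnover=0.28):
    """WP of trying to score with a 1-goal lead and about 3 minutes left.

    p_chase_given_miss is the share of missed possessions kept via a
    chase-down. It is not given in the text, so the caller must supply it.
    """
    p_miss = 1.0 - p_score
    return expected_wp([
        (p_score, wp_score),
        (p_miss * p_chase_given_miss, wp_chase),
        (p_miss * (1.0 - p_chase_given_miss), wp_turnover),
    ])


HOLD_THE_BALL_WP = 0.950  # keep-away option, as quoted above
```

Feeding in an observed chase-down share and comparing the result against `HOLD_THE_BALL_WP` reproduces the decision for any level of play.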
There's no limit to how we apply models like this. We can use Expected Goals Added or WPA just like we would in any sport. These stats are simply the measures of utility of any event. We can measure the effectiveness of players, squads, types of plays, types of defenses, substitution patterns--just about anything there is data for.
Closer to Home
Watching my son's team so far this season, I can see where just a cursory review of some good stats could help. The team is usually outscored but has a scoring rate equal to any of their opponents. It's just that unlike many sports, possessions are not guaranteed to be equal (or within 1). The team has lost not because their offense or defense struggles, but simply because they lose the battle for possession.
The coaches tend to focus practices on set offense and defense--running plays or countering plays. But I'd question how much that stuff really matters, at least at this level. The key to victory appears to be winning the battle of possession, which comes from winning faceoffs and loose balls. I'd recommend devoting more time in practice to those skills.
Faceoffs tend to be won by the fastest midfielder not directly engaged in the faceoff. Each team may have 2 midfielders several yards away from the point of the faceoff, one on each side of the field, while one representative from each team struggles for possession. At this level, it appears that the faceoff competitors rarely get the ball themselves, but it squirts out several yards, almost like a tipoff in basketball. I'd recommend making your two very fastest kids be the midfielders who race in for the loose ball, regardless of their otherwise natural position. Once possession is obtained, substitute as needed. (In lacrosse, substitutions are fluid just like in hockey.)
The real focus should be teaching the kids the sport and preparing them for the next level, or so we say. But the reality is that everyone, from the coaches to the kids to the screamer parents, really wants to win.
Tuning the Model
This model happens to be tuned for the scoring patterns of Virginia high school lacrosse. But it has only a handful of parameters, and they are easy enough to obtain for any level of play, including the top teams in the NCAA. Scoring rates and efficiency are the only inputs. Specific opponents can be modeled just as easily as an entire league.
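As a rough sketch of what tuning could look like, the two inputs can be estimated from box-score totals. This assumes each team's goals arrive as independent Poisson processes, in which case the variance of the goal margin over t minutes equals the combined scoring rate times t. That is my simplification, and the function names and sample numbers below are hypothetical.

```python
from math import sqrt


def sigma_from_scoring(total_goals_per_game, game_minutes):
    """Per-minute standard deviation of the goal margin.

    Assumes independent Poisson scoring, so the margin's variance per
    minute equals the combined goals per minute of both teams.
    """
    return sqrt(total_goals_per_game / game_minutes)


def goals_per_possession(goals, possessions):
    """Scoring efficiency, used as the possession bias in the WP model."""
    return goals / possessions


# Hypothetical league averages, for illustration only.
sigma = sigma_from_scoring(total_goals_per_game=20, game_minutes=48)
bias = goals_per_possession(goals=10, possessions=40)
print(round(sigma, 3), bias)
```

Swapping in one opponent's totals instead of league averages gives a matchup-specific version of the same two parameters.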
For more lacrosse analysis, I'd recommend starting with this series by Michael Mauboussin.
SB Nation has a blog on college lacrosse with some analytic-esque posts, found here.
Here is a slide deck from James Piette about bringing 'Moneyball' to lacrosse.
A website called Tempo Free Lax has some good stats and data. Here's a write-up about the site.