Monday, November 08, 2010

What Best Predicts a First Round MLS Playoff Winner?

Can you name one thing that all four MLS playoff quarterfinal winners had in common this past season?

They all played fewer games than the teams that they defeated:

SJ 31 vs 34 NY
COL 32 vs 42 CLB
LA 34 vs 42 SEA
DAL 31 vs 38 RSL

Once again, the curse of the CONCACAF Champions League comes into play. In six tries, only one MLS team that made the group stage has gone on to win a playoff series (Houston in 2009). This year, Columbus, Real Salt Lake, and Seattle all added to that 1-5 record.

I want to look a bit more at which factors best predict a team advancing. I looked at all 32 quarterfinal matchups since the current two-leg aggregate format was introduced (2003) and compared the two teams in each across a variety of statistics. Of the ones I looked at, the best predictor turns out to be fewer games played, just ahead of goal difference.

Won  Lost  Even  PCT    Predictor
 18     7     7  0.672  fewer games
 21    11     0  0.656  goal difference
 19    12     1  0.609  less experienced coach
 19    13     0  0.594  momentum
 19    13     0  0.594  higher seed
 17    11     4  0.594  top goalscorer
 18    13     1  0.578  goals for
 18    14     0  0.563  home record
 18    14     0  0.563  older team
 17    15     0  0.531  away record
 17    15     0  0.531  younger coach
 12    10    10  0.531  won season series
 16    13     3  0.547  goals against
 16    16     0  0.500  2 year record
 14    18     0  0.438  3 year record
 13    19     0  0.406  coach has better all time record

All records are regular season only. For teams that didn't have a full 2 or 3 year record, I used the seasons they had. For results from 1996-99 (such as the coach records), shootouts were counted as draws. Points per game (PPG) was used for all comparisons. Momentum compares the final 5 regular season games. In this table, each "even" result is counted the way it usually is in sports: half a win and half a loss.
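The PCT column can be reproduced with a few lines of Python. This is a minimal sketch that assumes only the half-win rule described above; `win_pct` is a hypothetical helper name, not part of the original analysis.

```python
def win_pct(won: int, lost: int, even: int = 0) -> float:
    """Winning percentage where each 'even' comparison counts as half a win and half a loss."""
    total = won + lost + even
    if total == 0:
        raise ValueError("no matchups to score")
    return (won + 0.5 * even) / total


# A few rows from the table: (won, lost, even, predictor)
rows = [
    (18, 7, 7, "fewer games"),
    (21, 11, 0, "goal difference"),
    (12, 10, 10, "won season series"),
    (16, 13, 3, "goals against"),
]

for won, lost, even, name in rows:
    print(f"{name:<20} {win_pct(won, lost, even):.3f}")
```

The same rule applies to the playoff-experience splits discussed in the comments: 17-14-1 works out to about 0.547 and 12-11-9 to about 0.516.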

If you have any other ideas to look at, please comment below.


Comments on "What Best Predicts a First Round MLS Playoff Winner?"

 

Anonymous ambrown said ... (5:59 AM, November 08, 2010) : 

I don't think this could possibly affect the outcome of these games, but sadly for MLS, the team with the lower season average attendance advanced in all four series as well.

 

Anonymous Anonymous said ... (11:11 AM, November 08, 2010) : 

What about number of injuries and red card suspensions?

Also, I know you only have 32 observations, but it would be cool to do this in a regression setting where you could run a true horse race between the different candidates.

 

Blogger Phillip Foose said ... (11:27 AM, November 08, 2010) : 

How about team playoff experience? Coaching playoff experience?

 

Blogger Zach Slaton said ... (2:40 PM, November 08, 2010) : 

Sadly, even the "fewer games" category is not statistically significant when compared to the 50/50 chance most teams have of advancing. Using this site (http://www.stat.ubc.ca/~rollin/stats/ssize/b1.html) and entering the following values:
p0 = 0.5
p1 = 0.672
1 sided Test
and default alpha and power levels
Yields a sample size of 51. In other words, a 0.672 win rate would have to hold up over 51 matchups before we could say that fewer games is a better predictor than a coin flip.
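I don't know exactly which formula the linked calculator uses, but the standard normal-approximation sample size for a one-sided, one-sample proportion test reproduces the 51, assuming the defaults are alpha = 0.05 and power = 0.80. A minimal sketch (`one_sample_proportion_n` is a hypothetical name):

```python
from math import ceil, sqrt
from statistics import NormalDist


def one_sample_proportion_n(p0: float, p1: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size for a one-sided test of H0: p = p0 vs H1: p = p1 (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)       # quantile that delivers the requested power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


print(one_sample_proportion_n(0.5, 0.672))             # coin flip vs. the "fewer games" rate
print(one_sample_proportion_n(0.5, 0.72))              # 18/25 series, alpha = 0.05
print(one_sample_proportion_n(0.5, 0.72, alpha=0.10))  # 18/25 series, alpha = 0.10
```

With p1 = 0.72 the same function gives 30 series at alpha = 0.05 and 22 at alpha = 0.10, which matches the figures Ryan quotes in the comments if his "90% confidence" means a one-sided alpha of 0.10. Note that this calculates how many matchups are needed for a test with 80% power; it is not a p-value for the record observed so far.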

What may be more interesting is plotting win percentage against the actual difference in games played. One could then run a Pearson correlation test to see whether a correlation exists and, if it does, run a linear regression to estimate the size of the effect. This could even be done for various metrics (win percentage, goal differential, etc.).

Sorry to be such a stickler, but I come from a viewpoint that most sports statistics actually aren't statistically significant. And it's a good thing they aren't - it's the random, 50/50 nature of any one sporting event that makes them interesting.

 

Anonymous Ryan said ... (7:03 PM, November 08, 2010) : 

Zach,

Very good points.

If you only count the series where one team actually played fewer games, the percentage is 18/25, or 72%.

At that percentage, to get 95% statistical confidence we'd need 30 series, which is pretty close to where we are. However, to get 90% confidence, we'd only need 22 series.

It's obviously not as significant as you'd hope (yet), but I think it's a solid trend.

 

Anonymous Ryan said ... (7:05 PM, November 08, 2010) : 

Also, scaryice, I'd love to see how well the Elo ratings I developed (http://mlselo.f2f2s.com) predict playoff performance.

If you can share the matchups, winners, and who had fewer games played, I can match up their values at the end of the regular seasons.

 

Blogger Daniel said ... (3:09 AM, November 09, 2010) : 

MLS is experiencing growing pains. San Jose and Colorado playing in the Eastern Conference final? How absurd. They should simply have changed the name to the MLS Cup semifinals. New York won the East and the Galaxy won the West. The playoffs should drop the conference names altogether, unless they plan to have four teams from each conference qualify.

MLS needs to help San Jose acquire a real stadium, and not one with an open end as is currently planned (please!). The Quakes have a rich, successful history in MLS. Build it and they will come. The Quakes only draw about 9,000 per game, but that's because Buck Shaw holds only 10,000. That's 90% full. More people would attend a professional soccer game not played at an amateur stadium.

 

Blogger scaryice said ... (3:46 AM, November 09, 2010) : 

Phillip -

Looking at team playoff experience, the team with more previous playoff games is 17-14-1.

That's kind of unfair since the early years had more playoff games, so I also looked at just the total number of years a team made the playoffs.

The teams with more previous years in the playoffs were 12-11-9.

 

Blogger scaryice said ... (4:30 AM, November 09, 2010) : 

I forgot when writing the post that Houston did win last year, so I edited the CCL record to 1-5 rather than 0-6. It's still a curse, just less of one. :)

 

Blogger scaryice said ... (4:43 AM, November 09, 2010) : 

Zach, I don't mind you being a stickler. I really appreciate comments like yours, since I'm not so knowledgeable when it comes to actual statistical analysis (go figure).

Ryan, here's the list of matchups along with the difference in games played (winners on the left):

2008 NY vs HOU -14
2007 CHI vs DC -10
2010 COL vs CLB -10
2010 LA vs SEA -8
2008 CHI vs NE -7
2010 DAL vs RSL -7
2009 RSL vs CLB -6
2003 SJ vs LA -4
2005 CHI vs DC -4
2005 COL vs DAL -4
2008 RSL vs CHV -4
2009 LA vs CHV -3
2010 SJ vs NY -3
2003 NE vs NY -1
2003 KC vs COL -1
2004 NE vs CLB -1
2004 KC vs SJ -1
2008 CLB vs KC -1
2003 CHI vs DC 0
2004 DC vs NY 0
2004 LA vs COL 0
2005 NE vs NY 0
2006 NE vs CHI 0
2006 COL vs DAL 0
2007 KC vs CHV 0
2006 DC vs NY 1
2009 CHI vs NE 1
2005 LA vs SJ 2
2006 HOU vs CHV 2
2007 HOU vs DAL 2
2007 NE vs NY 3
2009 HOU vs SEA 5

 

Blogger Zach Slaton said ... (1:53 PM, November 09, 2010) : 

Ryan -

I agree. The trend makes sense, and I'd love to be able to point to it being statistically significant after next year's season.

I think this points to the challenges of MLS being a salary capped league vs. Europe not being one. The elite European teams who regularly qualify for UCL have two things going for them:
1) Qualifying for Champions League - finishing top of the table - is also the thing that determines who's the champion in their domestic league.
2) They can spend as much money as they like (ignoring the soon-to-be-phased-in Financial Fair Play rules), which means their Starting XI is better than everyone else's, and typically their talent on the bench is too.
Combine these two, and you see why this is likely less of an issue in UEFA leagues than it is in MLS. MLS is performing a delicate balancing act here: trying to keep costs from running away, trying to continuously improve the league internally, and trying to improve its stature within CONCACAF.

Let's see how much of this LA team returns next year, and see how they do if they get serious about US Open Cup and do well in CCL.

Interesting conundrum for my Sounders though - do they ditch the effort for the US Open Cup three-peat, play scrubs in CCL, and focus on getting their first MLS playoff win and (hopefully) an MLS Cup?

Scaryice -

Thanks for the encouragement. My stats nerdiness sometimes comes off as abrasive. Nonetheless, what you've highlighted is very cool data. Do you have the goal differential data for each of the series/matches? I could combine that with the games-played differential data you already provided to make a nice correlation/regression analysis.

 

Anonymous Ryan said ... (9:36 PM, November 09, 2010) : 

scary,

Thanks for the stats. The higher rated team in Elo wins 21/32 times:


2007 HOU vs DAL 2 ELO-diff: 92.41
2006 HOU vs CHV 2 ELO-diff: 56.99
2008 CLB vs KC -1 ELO-diff: 56.84
2006 DC vs NY 1 ELO-diff: 53.85
2003 CHI vs DC 0 ELO-diff: 53.84
2005 NE vs NY 0 ELO-diff: 43.24
2009 CHI vs NE 1 ELO-diff: 42.04
2007 NE vs NY 3 ELO-diff: 40.12
2008 CHI vs NE -7 ELO-diff: 34.88
2003 NE vs NY -1 ELO-diff: 34.07
2010 LA vs SEA -8 ELO-diff: 27.33
2006 NE vs CHI 0 ELO-diff: 23.31
2003 SJ vs LA -4 ELO-diff: 20.61
2003 KC vs COL -1 ELO-diff: 16.81
2004 LA vs COL 0 ELO-diff: 12.96
2005 COL vs DAL -4 ELO-diff: 12.82
2009 LA vs CHV -3 ELO-diff: 12.48
2004 DC vs NY 0 ELO-diff: 12.3
2010 COL vs CLB -10 ELO-diff: 8.94
2009 HOU vs SEA 5 ELO-diff: 8.71
2004 KC vs SJ -1 ELO-diff: 4.91
2008 RSL vs CHV -4 ELO-diff: -30.67
2010 DAL vs RSL -7 ELO-diff: -39.07
2010 SJ vs NY -3 ELO-diff: -41.22
2006 COL vs DAL 0 ELO-diff: -51.11
2004 NE vs CLB -1 ELO-diff: -61.74
2009 RSL vs CLB -6 ELO-diff: -68.02
2007 KC vs CHV 0 ELO-diff: -70.75
2007 CHI vs DC -10 ELO-diff: -77.37
2005 CHI vs DC -4 ELO-diff: -81.97
2005 LA vs SJ 2 ELO-diff: -99.84
2008 NY vs HOU -14 ELO-diff: -105.34

What's interesting is that there is only one case where a team played more games, had a lower Elo rating, and still managed to win the series (LA in 2005).
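Ryan's observation can be checked mechanically by cross-tabulating the two columns he posted. This is a minimal sketch, assuming a positive ELO-diff means the series winner had the higher rating (which is consistent with the 21/32 count). The bucket labels and helper names are hypothetical.

```python
from collections import Counter

# (year, winner, loser, games-played difference, Elo difference), winner listed first,
# transcribed from Ryan's list
SERIES = [
    (2007, "HOU", "DAL", 2, 92.41), (2006, "HOU", "CHV", 2, 56.99),
    (2008, "CLB", "KC", -1, 56.84), (2006, "DC", "NY", 1, 53.85),
    (2003, "CHI", "DC", 0, 53.84), (2005, "NE", "NY", 0, 43.24),
    (2009, "CHI", "NE", 1, 42.04), (2007, "NE", "NY", 3, 40.12),
    (2008, "CHI", "NE", -7, 34.88), (2003, "NE", "NY", -1, 34.07),
    (2010, "LA", "SEA", -8, 27.33), (2006, "NE", "CHI", 0, 23.31),
    (2003, "SJ", "LA", -4, 20.61), (2003, "KC", "COL", -1, 16.81),
    (2004, "LA", "COL", 0, 12.96), (2005, "COL", "DAL", -4, 12.82),
    (2009, "LA", "CHV", -3, 12.48), (2004, "DC", "NY", 0, 12.3),
    (2010, "COL", "CLB", -10, 8.94), (2009, "HOU", "SEA", 5, 8.71),
    (2004, "KC", "SJ", -1, 4.91), (2008, "RSL", "CHV", -4, -30.67),
    (2010, "DAL", "RSL", -7, -39.07), (2010, "SJ", "NY", -3, -41.22),
    (2006, "COL", "DAL", 0, -51.11), (2004, "NE", "CLB", -1, -61.74),
    (2009, "RSL", "CLB", -6, -68.02), (2007, "KC", "CHV", 0, -70.75),
    (2007, "CHI", "DC", -10, -77.37), (2005, "CHI", "DC", -4, -81.97),
    (2005, "LA", "SJ", 2, -99.84), (2008, "NY", "HOU", -14, -105.34),
]


def games_bucket(diff: int) -> str:
    """Classify the winner by games played relative to the loser."""
    if diff < 0:
        return "fewer games"
    if diff > 0:
        return "more games"
    return "same games"


def elo_bucket(diff: float) -> str:
    """Classify the winner by Elo rating (no series in this list has a diff of exactly 0)."""
    return "higher Elo" if diff > 0 else "lower Elo"


# Count winners in each (games, Elo) cell
table = Counter((games_bucket(g), elo_bucket(e)) for _, _, _, g, e in SERIES)
for cell in sorted(table):
    print(cell, table[cell])

# Winners who played more games AND had the lower Elo rating
upsets = [(year, w, l) for year, w, l, g, e in SERIES if g > 0 and e < 0]
print(upsets)
```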

 

Blogger scaryice said ... (10:55 PM, November 11, 2010) : 

Alright, here's the same list with the first number being the difference in games played, and the second the goal differential of the series:

2008 NY vs HOU -14 3
2007 CHI vs DC -10 1
2010 COL vs CLB -10 0
2010 LA vs SEA -8 2
2008 CHI vs NE -7 3
2010 DAL vs RSL -7 1
2009 RSL vs CLB -6 2
2003 SJ vs LA -4 1
2005 CHI vs DC -4 4
2005 COL vs DAL -4 0
2008 RSL vs CHV -4 1
2009 LA vs CHV -3 1
2010 SJ vs NY -3 1
2003 NE vs NY -1 2
2003 KC vs COL -1 2
2004 NE vs CLB -1 1
2004 KC vs SJ -1 1
2008 CLB vs KC -1 2
2003 CHI vs DC 0 4
2004 DC vs NY 0 4
2004 LA vs COL 0 1
2005 NE vs NY 0 1
2006 NE vs CHI 0 0
2006 COL vs DAL 0 0
2007 KC vs CHV 0 1
2006 DC vs NY 1 1
2009 CHI vs NE 1 1
2005 LA vs SJ 2 2
2006 HOU vs CHV 2 1
2007 HOU vs DAL 2 2
2007 NE vs NY 3 1
2009 HOU vs SEA 5 1
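Zach's suggested correlation-then-regression check can be sketched on this list in plain Python. It is only one possible setup: each series is treated as a single observation from the winner's side, with games-played difference as x and series goal differential as y. The helper name is hypothetical, and the code computes the Pearson coefficient and least-squares line, not the significance test itself. With only 32 series, any fit should be read cautiously.

```python
from math import sqrt

# (games-played difference, series goal differential), winner's perspective,
# transcribed from the list above
DATA = [
    (-14, 3), (-10, 1), (-10, 0), (-8, 2), (-7, 3), (-7, 1), (-6, 2), (-4, 1),
    (-4, 4), (-4, 0), (-4, 1), (-3, 1), (-3, 1), (-1, 2), (-1, 2), (-1, 1),
    (-1, 1), (-1, 2), (0, 4), (0, 4), (0, 1), (0, 1), (0, 0), (0, 0),
    (0, 1), (1, 1), (1, 1), (2, 2), (2, 1), (2, 2), (3, 1), (5, 1),
]


def pearson_and_slope(pairs):
    """Return Pearson's r plus the least-squares slope and intercept of y on x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx  # change in series goal differential per extra game played
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


r, slope, intercept = pearson_and_slope(DATA)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

On Python 3.10 or later, `statistics.correlation` and `statistics.linear_regression` compute the same quantities from two lists of values.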

 

Blogger Zach Slaton said ... (11:51 PM, November 15, 2010) : 

Thanks for posting those. I am on my honeymoon right now, but I will do some analysis of the data when I get back and will send a link to the results.

 

Blogger Zach Slaton said ... (7:14 PM, December 14, 2010) : 

Any way you can post the goal differential and coach experience differential data as well?

 
