Recently there has been some discussion about the use of WAR in determining, or at least discussing, an MVP candidate among position players (pitchers are eligible for MVP too, obviously, and WAR includes defense and base running, but I am restricting my argument to position players and offensive WAR). Judging from the comments and questions coming my way, many people don’t understand exactly what WAR measures, how it is constructed, and what it can or should be used for.
In a nutshell, offensive WAR takes each of a player’s offensive events in a vacuum, without regard to the timing and context of the event or whether that event actually produced or contributed to any runs or wins, and assigns it a run value based on the theoretical run value of that event (linear weights). It then adds up all the run values, converts them to theoretical “wins” by dividing by some number around 10, and subtracts the approximate runs/wins that a replacement player would have produced in that many PA. A replacement player produces around 20 fewer runs than average for every 650 PA, by definition. This can vary a little by defensive position and by era. And of course a replacement player is defined as a player with the talent/value of one who can be signed for the league minimum even if he is not protected (a so-called “freely available player”).
For example, let’s say that a player had 20 singles, 5 doubles, 1 triple, 4 HR, 10 non-intentional BB+HP, and 60 outs in 100 PA. The approximate run values for these events are .47, .78, 1.04, 1.40, .31, and -.30, respectively. These are marginal run values and by definition are measured relative to a league-average position player. So, for example, if a player steps up to the plate and gets a single, on average he will generate .47 more runs than 1 generic PA of a league-average player. These run values, and the zero run value of a PA for a league-average player, assume the player bats in a random slot in the lineup, on a league-average team, in a league-average park, against a league-average opponent, etc.
If you were to add up all those run values for our hypothetical player, you would get about +5 runs. That means that theoretically this player would produce 5 more runs than a league-average player on a league-average team, etc. A replacement player would generate around 3 fewer runs than a league-average player in 100 PA (remember I said that replacement level was around -20 runs per 650 PA), so our hypothetical player is 8 runs above replacement in those 100 PA.
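The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular WAR implementation works: the function and variable names are made up for this post, the run values are approximate linear weights with the out taken at roughly -.30 (the value at which this stat line sums to about +5 runs), replacement level is taken as -20 runs per 650 PA, and 10 runs per win is a round number rather than an exact conversion.

```python
# Approximate linear-weight run values per event, relative to a league-average PA.
# Illustrative round numbers (assumptions), not official weights from any WAR site.
RUN_VALUES = {"1B": 0.47, "2B": 0.78, "3B": 1.04, "HR": 1.40, "BB_HP": 0.31, "OUT": -0.30}

REPLACEMENT_RUNS_PER_650_PA = -20.0  # replacement level relative to average
RUNS_PER_WIN = 10.0                  # rough conversion, "some number around 10"


def offensive_war(events, run_values=RUN_VALUES):
    """Return (runs above average, runs above replacement, wins above replacement)."""
    pa = sum(events.values())
    # Each event gets the same run value no matter when it happened.
    runs_above_avg = sum(run_values[event] * count for event, count in events.items())
    # What a replacement player would be expected to produce in the same number of PA.
    replacement_runs = REPLACEMENT_RUNS_PER_650_PA * pa / 650
    runs_above_repl = runs_above_avg - replacement_runs
    return runs_above_avg, runs_above_repl, runs_above_repl / RUNS_PER_WIN


player = {"1B": 20, "2B": 5, "3B": 1, "HR": 4, "BB_HP": 10, "OUT": 60}
raa, rar, war = offensive_war(player)
print(f"{raa:+.1f} runs above average, {rar:+.1f} above replacement, {war:.2f} WAR")
# +5.0 runs above average, +8.1 above replacement, 0.81 WAR
```

Notice that the only inputs are event counts. Nothing about the inning, the score, or the runners on base appears anywhere in the calculation.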
The key here is that these are hypothetical runs. If that player produced those offensive events in a league-average context an infinite number of times, he would produce, on average, exactly 5 runs more than an average player would in 100 PA, and his team would win around .5 more games (per 100 PA) than it would with an average player and .8 more games (8 more runs) than it would with a replacement player.
In reality, for those 100 PA, we have no idea how many runs or wins our player contributed to. On average, or after an infinite number of 100 PA trials, his results would have produced an extra 5 runs and half a win, but in one 100 PA trial, that exact result is unlikely, just as in 100 flips of a coin, exactly 50 heads and 50 tails is an unlikely though “mean” or “average” event. Perhaps 15 of those 20 singles didn’t result in a single run being produced. Perhaps all 4 of his HR were hit after his team was down by 5 or 10 runs and they were meaningless. On the other hand, maybe 10 of those hits were game-winning hits in the 9th inning. Similarly, of those 60 outs, what if 10 times there was a runner on third and 0 or 1 out, and our player struck out every single time? Alternatively, what if he drove in the runner 8 out of 10 times with an out, and half the time that run amounted to the game-winning run? WAR would value those 10 outs exactly the same in either case.
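To put a number on the coin-flip comparison, here is a small Python sketch (the function name is mine) that computes the exact binomial probability of a given number of heads. For a fair coin, exactly 50 heads in 100 flips comes up only about 8% of the time, even though 50 is the single most likely count.

```python
from math import comb


def prob_exact_heads(flips, heads, p=0.5):
    """Binomial probability of exactly `heads` heads in `flips` independent flips."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)


print(f"{prob_exact_heads(100, 50):.4f}")  # 0.0796
```

The same logic applies to a 100 PA sample: the "average" outcome is the center of a wide spread of possible results, not the result you should expect to see in any one sample.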
You see where I’m going here? Context is ignored in WAR (for a good reason, which I’ll get to in a minute), yet context is everything in an MVP discussion. Let me repeat that: Context is everything in an MVP discussion. An MVP is about the “hero” nature of a player’s seasonal performance. How much did he contribute to his team’s wins and, to a lesser extent, what did those wins mean or produce (hence the “must be on a contending team” argument)? Few rational people are going to consider a player MVP-quality if little of his performance contributed to runs and wins, no matter how “good” that performance was in a vacuum. No one is going to remember a 4 walk game when a team loses in a 10-1 blowout. 25 HR with most of them occurring in losing games, likely through no fault of the player? Ho-hum. 20 HR, where 10 of them were in the latter stages of a close game and directly led to 8 wins? Now we’re talking possible MVP! .250 wOBA in clutch situations but .350 overall? Choker and bum, hardly an MVP.
I hope you are getting the picture. While there are probably several reasonable ways to define an MVP, and reasonable and smart people can legitimately debate whether it is Trout, Miggy, Kershaw or Goldy, I think that most reasonable people will agree that an MVP has to have had some – no, a lot of – articulable performance contributing to actual, real-life runs and wins. Otherwise that “empty WAR” is merely a tree falling in the forest with no one to hear it.
So what is WAR good for and why was it “invented”? Mostly it was invented as a way to combine all aspects of a player’s performance – offense, defense, base running, etc. – on a common scale. It was also invented to be able to estimate player talent and to project future performance. For that it is nearly perfect. The reason it ignores context is that we know context is not part of a player’s skill set to any significant degree. That also means that context-non-neutral performance is not predictive, so if we want to project future performance, we need a metric that strips out context; hence WAR.
But, for MVP discussions? It is a terrible metric for the aforementioned reasons. Again, regardless of how you define MVP caliber performance, almost everyone is in agreement that it includes and needs context, precisely that which WAR disdains and ignores. Now, obviously WAR will correlate very highly with non-context-neutral performance. That goes without saying. It would be unlikely that a player who is a legitimate MVP candidate does not have a high WAR. It would be equally unlikely that a player with a high WAR did not specifically contribute to lots of runs and wins and to his team’s success in general. But that doesn’t mean that WAR is a good metric to use for MVP considerations. Batting average correlates well with overall offensive performance and pitcher wins correlate well with good pitching performance, but we would hardly use those two stats to determine who was the better overall batter or pitcher. And to say, for example, that Trout is the proper MVP and not Cabrera because Trout was 1 or 2 WAR better than Miggy, without looking at context, is an absurd and disingenuous argument.
So, is there a good or at least a better metric than WAR for MVP discussions? I don’t know. WPA perhaps. WPA in winning games only? WPA with more weight for winning games? RE27? RE27, again, adjusted for whether the team won or lost or scored a run or not? It is not really important what you use for these discussions but why you use it. It is not so much that WAR is a poor metric for determining an MVP. It is using WAR without understanding what it means, and why it is a poor choice for an MVP discussion in and of itself, that is the mistake. As long as you understand what each metric means (including traditional mundane ones like RBI, runs, etc.) and how it relates to the player in question and the team’s success, feel free to use whatever you like (hopefully a combination of metrics and statistics) – just make sure you can justify your position in a rational, logical, and accurate fashion.