There’s been much research and many articles over the years with respect to hitter (and other) aging curves. (I even came across in a Google search a fascinating component aging curve for PGA golfers!) I’ve publicly and privately been doing aging curves for 20 years. So has Tango Tiger. Jeff Zimmerman has also been prolific in this regard. Others have contributed as well. You can Google them if you want.
Most of the credible aging curves use some form of the delta method which is described in this excellent series on aging by the crafty n’er do well, MGL. If you’re too lazy to look it up, the delta method basically is this, from the article:
The “delta method” looks at all players who have played in back-to-back years. Many players have several back-to-back year “couplets,” obviously. For every player, it takes the difference between their rate of performance in Year I and Year II and puts that difference into a “bucket,” which is defined by the age of the player in those two years….
When we tally all the differences in each bucket and divide by the number of players, we get the average change from one age to the next for every player who ever played in at least one pair of back-to-back seasons in. So, for example, for all players who played in their age 29 and 30 seasons, we get the simple average of the rate of change in offensive performance between 29 and 30.
That’s really the only way to do an aging curve, as far as I know, unless you want to use an opaque statistical method like J.C Bradbury did back in 2009 (you can look that up too). One of the problems with aging curves, which I also discuss in the aforementioned article, and one that comes up a lot in baseball research, is survivorship bias. I’ll get to that in a paragraph or two.
Let’s say we want to use the delta method to compute the average change in wOBA performance from age 29 to 30. To do that, we look at all players who played in their age 29 and age 30 years, record each player’s difference, weight it by some number of PA (maybe the lesser of the two – either year 1 or year 2, maybe the harmonic mean of the two, or maybe weight them all equally – it’s hard to say), and then take the simple weighted average of all the differences. For example, say we have two players. Player A has a .300 wOBA in his age 29 season in 100 PA and a .290 wOBA in his age 30 season in 150 PA. Player B is .320 in year one in 200 PA and .300 in year two in 300 PA. Using the delta method we get a difference of -.010 (a decline) for player A weighted by, say, 100 PA (the lesser of 100 and 150), and a difference of -.020 for Player B in 200 PA (also the lesser of the two PA). So we have an average decline in our sample of (10 * 100 + 20 * 200) / (300), or 16.67 points of wOBA decline. We would do the same for all age intervals and all players and if we chain them together we get an aging curve for the average MLB player.
There are issues with that calculation, such as our choice to weight each player’s difference by the “lesser of the two PA,” what it means to compute “an average decline” for that age interval (since it includes all types of players, part-time, full-time, etc.) and especially what it means when we chain every age interval together to come up with an aging curve for the average major league player when it’s really a compendium of a whole bunch of players all with different career lengths at different age intervals.
Typically when we construct an aging curve, we’re not at all looking at the careers of any individual players. If we do that, we end up with severe selective sampling and survivorship problems. I’m going to ignore all of these issues and focus on survivorship bias only. It has the potential to be extremely problematic, even when using the delta method.
Let’s say that a player is becoming a marginal player for whatever reason, perhaps it is because he is at the end of his career. Let’s also say that we have a bunch of players like that and their true talent is a wOBA of .280. If we give them 300 PA, half will randomly perform better than that and half will randomly perform worse than that simply because 300 PA is just a random sample of their talent. In fact, we know that the random standard deviation of wOBA in 300 trials is around 25 points in wOBA, such that 5% of our players, whom we know have a true talent of .280, will actually hit .230 or less by chance alone. That’s a fact. There’s nothing they or anyone else can do about it. No player has an “ability” to fluctuate less than random variance tells is in any specific number of PA. There might be something about them that creates more variance on the average, but it is mathematically impossible to have less (actually the floor is a bit higher than that because of varying opponents and conditions).
Let’s assume that all players who hit less than .230 will retire or be cut – they’ll never play again, at least not in the following season. That is not unlike what happens in real life when a marginal player has a bad season. He almost always gets fewer PA the following season than he would have gotten had he not had an unlucky season. In fact, not playing at all is just a subset of playing less – both are examples of survivorship bias and create problems with aging curves. Let’s see what happens to our aging interval with these marginal players when 5% of them don’t play the next season.
We know that this entire group of players are .280 hitters because we said so. If 5% of them hit, on average, .210, then the other 95% must have hit .284 since the whole group must hit .280 – that’s their true talent. This is just a typical season for a bunch of .280 hitters. Nothing special going on here. We could have split them up any way we wanted, as long as in the aggregate they hit at their true talent level.
Now let’s say that these hitters are in their age 30 season and they are supposed to decline by 10 points in their age 31 season. If we do an aging calculation on these players in a typical pair of seasons we absolutely should see .280 in the first year and .270 in the second. In fact, if we let all our players play a random or a fixed number of PA in season two, that is exactly what we would see. It has to be. It is a mathematical certainty, given everything we stated. However survivorship bias screws up our numbers and results in an incorrect aging value from age 30 to age 31. Let’s try it.
Only 95% of our players play in season two, so 5% drop out of our sample, at least from age 30 to age 31. There’s nothing we can do about that. When we compute a traditional aging curve using the delta method, we only use numbers from pairs of years. We can never use the last year of a player’s career as the first year in a year pairing. We don’t have any information about that player’s next season. We can use a player’s last year, say, at age 30 in an age 29 to 30 pairing but not in a 30 to 31 pairing. Remember that the delta method always uses age pairings for each player in the sample.
What do those 95% hit in season one? Remember they are true .280 hitters. Well, they don’t hit .280. I already said that they hit .284. That is because they got a little lucky. The ones that got really unlucky to balance out the lucky ones, are not playing in season two, and thus dropped out of our aging curve sample. What do these true .280 players (who hit .284) hit in season two? Season two is an unbiased sample of their true talent. We know that their true talent was .280 in season one and we know that from age 30 to age 31 all players will lose 10 points in true talent because we said so. So they will naturally hit .270 in year two.
What does our delta method calculation tell us about how players age from age 30 to age 31? It tells us they lose 14 points in wOBA and not 10! It’s giving us a wrong answer because of survivorship bias. Had those other 5% of players played, they would have also hit .270 in year two and when we add everyone up, including the unlucky players, we would come up with the correct answer of a 10-point loss from age 30 to age 31 (the unlucky players would have improved in year two by 60 points).
One way to avoid this problem (survivorship bias will always make it look like players lose more or gain less as they age because the players that drop out from season to season always, on the average, got unlucky in season one) is to ignore the last season of a player’s career in our calculations. That’s fine and dandy, but survivorship bias exists in every year of a player’s career. As I wrote earlier, dropping out is just a small subset of this bias. Every player that gets unlucky in one season will see fewer PA in his next season, which creates the same kind of erroneous results. For example, if the 5% of unlucky players did play in season two, but only got 50 PA whereas the other 95% of slightly lucky players got 500 PA, we would still come up with a decline of more than 10 points of wOBA – again an incorrect answer.
To correct for this survivorship bias, which really wreaks havoc with aging curves, a number of years ago, I decided to add a phantom year for players after their last season of action. For that year, I used a projection – our best estimate of what they would have done had they been allowed to play another year. That reduced the survivorship bias but it didn’t nearly eliminate it because, as I said, every player suffers from it in reduced PA for unlucky players and increased PA for lucky ones, in their subsequent seasons.
Not only that, but we get the same effect within years. If two players have .300 wOBA true talents, but player A hits worse than .250 by luck alone in his first month (which will happen more than 16% of the time) and player B hits .350 or more, who do you think will get more playing time for the remainder of the season even though we know that they have the same talent, and that both, on the average, will hit exactly .300 for the remainder of the season?
I finally came up with a comprehensive solution based on the following thought process: If we were conducting an experiment, how would we approach the question of computing aging intervals? We would record every player’s season one (which would be an unbiased sample of his talent, so no problem so far) and then we would guarantee that every player would get X number of PA the next season, preferably something like 500 or 600 to create large samples of seasonal data. We would also give everyone a large number of PA in all season ones too, but it’s not really necessary.
How do we do that? We merely extend season two data using projections, just as I did in adding phantom seasons after a player’s career was over (or he missed a season in the middle of his career). Basically I’m doing the same thing, whether I’m adding 600 PA to a player who didn’t play (the phantom season) or I’m adding 300 PA to a player who only had 300 PA in season two. By doing this I am completely eliminating survivorship bias. Of course this correction method lives or dies with how accurate the projections are but even a simple projection system like Marcel will suffice when dealing with a large number of players of different talent levels. Now let’s get to the results.
I looked at all players from 1977 to 2016 and I park and league adjusted their wOBA for each season. Essentially I am using wOBA+. I also only looked at seasonal pairs (with a minimum of 10 PA in each season) where the player played on the same team. I didn’t have to do that, but my sample was large enough that I felt that the reduction in sample size was worth getting rid of any park biases even though I was dealing with park- adjusted numbers.
Using the delta method with no survivorship bias other than ignoring the last year of every player’s career, this is the aging curve I arrived at after chaining all of the deltas. This is the typical curve you will see in most of the prior research.
1977-2016 Aging Curve using Delta Method Without Correcting for Survivorship Bias
Here is the same curve after completing all season two’s with projections. For example, let’s say that a player is projected to hit .300 in his age 30 season and he hits .250 in only 150 PA (his manager benches him because he’s hit so poorly). His in-season projection would change because of the .250. It might now be .290. So I complete a 600 PA season by adding 450 PA of .290 hitting to the 150 PA of .250 hitting for a complete season of .280 in 600 PA.
If that same player hits .320 in season two in 620 PA then I add nothing to his season two data. Only players with less than 600 PA have their seasons completed with projections. How do I weight the season pairs? Without any completion correction, as in the first curve above, I weighted each season pair by the harmonic mean of the two PA. With correction, as in the second curve above, I weighted each pair by the number of PA in season one. This corrects for intra-season survivorship bias in season one as well.
1977-2016 Aging Curve using Delta Method and Correcting for Survivorship Bias
You can see that in the first curve, uncorrected for survivorship bias, players gain around 40 points in wOBA from age 21 to age 27, seven points per year, plateau from age 27 to 28, then decline by also around seven points a year after that. In the second curve, after we correct for survivorship bias, we have a slightly quicker ascension from age 21 to 26, more than eight points per year, a plateau from age 26 to age 27, then a much slower decline at around 3 points per year.
Keep in mind that these curves represent all players from 1977 to 2016. It is likely that aging has changed significantly from era to era due to medical advances, PED use and the like. In fact, if we limit our data to 2003 and later, after the so called steroid era, we get an uncorrected curve that plateaus between ages 24-28 and then declines by an average of 9 points a year from age 28 to 41.
In my next installment I’ll do some survivorship corrections for components like strikeout and walk percentage.