There’s been much research and many articles over the years with respect to hitter (and other) aging curves. (I even came across in a Google search a fascinating component aging curve for PGA golfers!) I’ve publicly and privately been doing aging curves for 20 years. So has Tango Tiger. Jeff Zimmerman has also been prolific in this regard. Others have contributed as well. You can Google them if you want.

Most of the credible aging curves use some form of the *delta method*, which is described in this excellent series on aging by the crafty *ne’er-do-well,* MGL. If you’re too lazy to look it up, the *delta method* is basically this, from the article:

The “delta method” looks at all players who have played in back-to-back years. Many players have several back-to-back year “couplets,” obviously. For every player, it takes the difference between their rate of performance in Year I and Year II and puts that difference into a “bucket,” which is defined by the age of the player in those two years….

When we tally all the differences in each bucket and divide by the number of players, we get the average change from one age to the next for every player who ever played in at least one pair of back-to-back seasons. So, for example, for all players who played in their age 29 and 30 seasons, we get the simple average of the rate of change in offensive performance between 29 and 30.

That’s really the only way to do an aging curve, as far as I know, unless you want to use an opaque statistical method like J.C. Bradbury did back in 2009 (you can look that up too). One of the problems with aging curves, which I also discuss in the aforementioned article, and one that comes up a lot in baseball research, is *survivorship bias.* I’ll get to that in a paragraph or two.

Let’s say we want to use the *delta method* to compute the average change in wOBA performance from age 29 to 30. To do that, we look at all players who played in their age 29 and age 30 years, record each player’s difference, weight it by some number of PA (maybe the lesser of the two – either year 1 or year 2, maybe the harmonic mean of the two, or maybe weight them all equally – it’s hard to say), and then take the simple weighted average of all the differences. For example, say we have two players. Player A has a .300 wOBA in his age 29 season in 100 PA and a .290 wOBA in his age 30 season in 150 PA. Player B is .320 in year one in 200 PA and .300 in year two in 300 PA. Using the *delta method* we get a difference of -.010 (a decline) for player A weighted by, say, 100 PA (the lesser of 100 and 150), and a difference of -.020 for Player B in 200 PA (also the lesser of the two PA). So we have an average decline in our sample of (10 * 100 + 20 * 200) / (300), or 16.67 points of wOBA decline. We would do the same for all age intervals and all players and if we *chain* them together we get an *aging curve* for the average MLB player.
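The two-player example above can be checked with a few lines of Python. This is only a sketch: the function name `weighted_delta`, the tuple layout, and the `weight` option names are made up for illustration, and the three weighting schemes are simply the ones floated in the paragraph above.

```python
def weighted_delta(pairs, weight="min"):
    """Average year-to-year change in wOBA for one age bucket.

    pairs: list of (woba_year1, pa_year1, woba_year2, pa_year2) tuples.
    weight: "min" (lesser of the two PA), "harmonic" (harmonic mean of
            the two PA), or "equal" (every player counts the same).
    """
    num = 0.0
    den = 0.0
    for woba1, pa1, woba2, pa2 in pairs:
        if weight == "min":
            w = min(pa1, pa2)
        elif weight == "harmonic":
            w = 2 * pa1 * pa2 / (pa1 + pa2)
        elif weight == "equal":
            w = 1.0
        else:
            raise ValueError(f"unknown weight scheme: {weight}")
        num += w * (woba2 - woba1)
        den += w
    return num / den


# Player A: .300 in 100 PA, then .290 in 150 PA.
# Player B: .320 in 200 PA, then .300 in 300 PA.
pairs = [(0.300, 100, 0.290, 150), (0.320, 200, 0.300, 300)]

delta_min = weighted_delta(pairs, "min")      # about -0.0167, i.e. 16.67 points
delta_equal = weighted_delta(pairs, "equal")  # about -0.0150
print(round(delta_min * 1000, 2), round(delta_equal * 1000, 2))
```

With only two players the choice of weights already moves the answer by more than a point and a half of wOBA, which is why the weighting question is not a trivial one.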

There are issues with that calculation, such as our choice to weight each player’s difference by the “lesser of the two PA,” what it means to compute “an average decline” for that age interval (since it includes all types of players, part-time, full-time, etc.) and especially what it means when we chain every age interval together to come up with an aging curve for the *average major league player* when it’s really a compendium of a whole bunch of players all with different career lengths at different age intervals.

Typically when we construct an aging curve, we’re not at all looking at the careers of any individual players. If we do that, we end up with severe selective sampling and survivorship problems. I’m going to ignore all of these issues and focus on survivorship bias only. It has the potential to be extremely problematic, even when using the *delta method*.

Let’s say that a player is becoming a marginal player for whatever reason, perhaps because he is at the end of his career. Let’s also say that we have a bunch of players like that and their true talent is a wOBA of .280. If we give them 300 PA, half will randomly perform better than that and half will randomly perform worse than that simply because 300 PA is just a random sample of their talent. In fact, we know that the random standard deviation of wOBA in 300 trials is around 25 points in wOBA, such that 5% of our players, whom we *know* have a true talent of .280, will actually hit .230 or less by chance alone. That’s a fact. There’s nothing they or anyone else can do about it. No player has an “ability” to fluctuate *less* than random variance tells us he should in any specific number of PA. There might be something about them that creates *more* variance on the average, but it is mathematically impossible to have *less* (actually the floor is a bit higher than that because of varying opponents and conditions).

Let’s assume that all players who hit less than .230 will retire or be cut – they’ll never play again, at least not in the following season. That is not unlike what happens in real life when a marginal player has a bad season. He almost always gets fewer PA the following season than he would have gotten had he not had an unlucky season. In fact, not playing at all is just a subset of playing less – both are examples of survivorship bias and create problems with aging curves. Let’s see what happens to our aging interval with these marginal players when 5% of them don’t play the next season.

We know that this entire group of players are .280 hitters because we said so. If 5% of them hit, on average, .210, then the other 95% must have hit .284 since the whole group *must* hit .280 – that’s their true talent. This is just a typical season for a bunch of .280 hitters. Nothing special going on here. We could have split them up any way we wanted, as long as in the aggregate they hit at their true talent level.

Now let’s say that these hitters are in their age 30 season and they are supposed to decline by 10 points in their age 31 season. If we do an aging calculation on these players in a typical pair of seasons we absolutely should see .280 in the first year and .270 in the second. In fact, if we let all our players play a random or a fixed number of PA in season two, that is exactly what we *would* see. It has to be. It is a mathematical certainty, given everything we stated. However, survivorship bias screws up our numbers and results in an incorrect aging value from age 30 to age 31. Let’s try it.

Only 95% of our players play in season two, so 5% drop out of our sample, at least from age 30 to age 31. There’s nothing we can do about that. When we compute a traditional aging curve using the delta method, we only use numbers from *pairs of years*. We can never use the last year of a player’s career *as the first year in a year pairing*. We don’t have any information about that player’s next season. We can use a player’s last year, say, at age 30 in an *age 29 to 30* pairing but not in a *30 to 31* pairing. Remember that the *delta method* always uses age pairings for each player in the sample.

What do those 95% hit in season one? Remember they are true .280 hitters. Well, they don’t hit .280. I already said that they hit .284. That is because they got a little lucky. The ones who got really unlucky, and who would *balance out* the lucky ones, are not playing in season two, and thus dropped out of our aging curve sample. What do these true .280 players (who hit .284) hit in season two? Season two is an unbiased sample of their true talent. We know that their true talent was .280 in season one and we know that from age 30 to age 31 all players will lose 10 points in true talent because we said so. So they will naturally hit .270 in year two.

What does our *delta method* calculation tell us about how players age from age 30 to age 31? It tells us they lose 14 points in wOBA and not 10! It’s giving us a wrong answer because of survivorship bias. Had those other 5% of players played, they would have also hit .270 in year two and when we add everyone up, including the unlucky players, we would come up with the correct answer of a 10-point loss from age 30 to age 31 (the unlucky players would have *improved* in year two by 60 points).
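The arithmetic behind the 14-point versus 10-point answers can be laid out in Python. This is a sketch of the thought experiment only, using expected values rather than simulated seasons; the variable names are illustrative.

```python
true_talent = 0.280     # year-one true talent of the whole group
aging = -0.010          # assumed true change from age 30 to age 31
dropout_share = 0.05    # share of players who never get a season two
dropout_woba = 0.210    # what the unlucky 5% hit in season one

# The whole group must average .280 in season one, so the survivors'
# season-one average is pinned down by what the dropouts hit.
survivor_woba_y1 = (true_talent - dropout_share * dropout_woba) / (1 - dropout_share)

# Season two is an unbiased sample of talent for anyone who plays.
year2_woba = true_talent + aging

observed_delta = year2_woba - survivor_woba_y1   # survivors only
full_delta = year2_woba - true_talent            # everyone, had they all played
dropout_delta = year2_woba - dropout_woba        # what the dropouts would have shown

print(round(survivor_woba_y1, 3))         # 0.284
print(round(observed_delta * 1000))       # -14
print(round(full_delta * 1000))           # -10
print(round(dropout_delta * 1000))        # 60
```

The 4 missing points are exactly the survivors' season-one luck, which never gets offset because the unlucky players are not in the pairing.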

One way to avoid this problem (survivorship bias will always make it look like players lose more or gain less as they age because the players that drop out from season to season always, on the average, got unlucky in season one) is to ignore the last season of a player’s career in our calculations. That’s fine and dandy, but survivorship bias exists in every year of a player’s career. As I wrote earlier, dropping out is just a small subset of this bias. Every player that gets unlucky in one season will see fewer PA in his next season, which creates the same kind of erroneous results. For example, if the 5% of unlucky players *did* play in season two, but only got 50 PA whereas the other 95% of slightly lucky players got 500 PA, we would still come up with a decline of more than 10 points of wOBA – again an incorrect answer.
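The 50 PA versus 500 PA case can be worked the same way. The sketch below assumes everyone had 300 PA in season one (as in the setup earlier) and works in expected values; the function name and tuple layout are hypothetical.

```python
# Each group: (share of players, year-one wOBA, year-one PA,
#              expected year-two wOBA, year-two PA)
groups = [
    (0.05, 0.210, 300, 0.270, 50),           # unlucky, mostly benched
    (0.95, 0.2695 / 0.95, 300, 0.270, 500),  # slightly lucky (about .284)
]


def group_weighted_delta(groups, weight_fn):
    """Weighted average delta, where weight_fn(pa1, pa2) gives the weight."""
    num = 0.0
    den = 0.0
    for share, woba1, pa1, woba2, pa2 in groups:
        w = share * weight_fn(pa1, pa2)
        num += w * (woba2 - woba1)
        den += w
    return num / den


# Weighting by the lesser of the two PA shrinks the unlucky group's
# +60 point rebound down to a 50 PA weight.
lesser_pa_delta = group_weighted_delta(groups, min)

# Weighting by season-one PA only, for comparison.
year_one_pa_delta = group_weighted_delta(groups, lambda pa1, pa2: pa1)

print(round(lesser_pa_delta * 1000, 1))    # about -13.0
print(round(year_one_pa_delta * 1000, 1))  # about -10.0
```

In this idealized setup the lesser-of-two-PA weighting shows a decline of about 13 points. Weighting by season-one PA gets back to 10 here, but only because every player still has a season two to pair with; it does nothing for players who drop out entirely.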

To correct for this survivorship bias, which really wreaks havoc with aging curves, a number of years ago, I decided to add a *phantom* year for players after their last season of action. For that year, I used a projection – our best estimate of what they would have done had they been allowed to play another year. That reduced the survivorship bias but it didn’t nearly eliminate it because, as I said, every player suffers from it in reduced PA for unlucky players and increased PA for lucky ones, in their subsequent seasons.

Not only that, but we get the same effect within years. If two players have .300 wOBA true talents, but player A hits worse than .250 by luck alone in his first month (which will happen more than 16% of the time) and player B hits .350 or more, who do you think will get more playing time for the remainder of the season even though *we know* that they have the same talent, and that both, on the average, will hit exactly .300 for the remainder of the season?

I finally came up with a comprehensive solution based on the following thought process: If we were conducting an experiment, how would we approach the question of computing aging intervals? We would record every player’s *season one* (which would be an unbiased sample of his talent, so no problem so far) and then we would guarantee that every player would get X number of PA the next season, preferably something like 500 or 600 to create large samples of seasonal data. We would also give everyone a large number of PA in all *season ones* too, but it’s not really necessary.

How do we do that? We merely extend *season two* data using projections, just as I did in adding *phantom* seasons after a player’s career was over (or he missed a season in the middle of his career). Basically I’m doing the same thing, whether I’m adding 600 PA to a player who didn’t play (the *phantom* season) or I’m adding 300 PA to a player who only had 300 PA in season two. By doing this I am completely eliminating survivorship bias. Of course this correction method lives or dies with how accurate the projections are but even a simple projection system like Marcel will suffice when dealing with a large number of players of different talent levels. Now let’s get to the results.

I looked at all players from 1977 to 2016 and I park- and league-adjusted their wOBA for each season. Essentially I am using wOBA+. I also only looked at seasonal pairs (with a minimum of 10 PA in each season) where the player played on the same team. I didn’t have to do that, but my sample was large enough that I felt that the reduction in sample size was worth getting rid of any park biases even though I was dealing with park-adjusted numbers.

Using the delta method with *no survivorship bias correction* other than ignoring the last year of every player’s career, this is the aging curve I arrived at after chaining all of the *deltas*. This is the typical curve you will see in most of the prior research.

**1977-2016 Aging Curve using Delta Method Without Correcting for Survivorship Bias**

Here is the same curve after completing all *season twos* with projections. For example, let’s say that a player is projected to hit .300 in his age 30 season and he hits .250 in only 150 PA (his manager benches him because he’s hit so poorly). His *in-season* projection would change because of the .250. It might now be .290. So I *complete* a 600 PA season by adding 450 PA of .290 hitting to the 150 PA of .250 hitting for a complete season of .280 in 600 PA.

If that same player hits .320 in season two in 620 PA then I add nothing to his season two data. Only players with fewer than 600 PA have their seasons *completed* with projections. How do I weight the season pairs? Without any *completion correction*, as in the first curve above, I weighted each season pair by the harmonic mean of the two PA. With correction, as in the second curve below, I weighted each pair by the number of PA in season one. This corrects for *intra-season* survivorship bias in season one as well.
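The completion step described above amounts to a PA-weighted blend of actual and projected performance. Here is a minimal sketch; the function name `complete_season` and its signature are hypothetical, and the updated projection (.290 in the example) is taken as an input rather than computed.

```python
def complete_season(observed_woba, observed_pa, projected_woba, target_pa=600):
    """Pad a short season out to target_pa using a projection.

    Returns (completed_woba, completed_pa). Seasons already at or above
    target_pa are returned unchanged. A player who did not play at all
    (observed_pa == 0) gets a pure "phantom" season at the projection.
    """
    if observed_pa >= target_pa:
        return observed_woba, observed_pa
    phantom_pa = target_pa - observed_pa
    completed = (observed_woba * observed_pa + projected_woba * phantom_pa) / target_pa
    return completed, target_pa


benched = complete_season(0.250, 150, 0.290)     # 150 real PA + 450 projected PA
full_time = complete_season(0.320, 620, 0.300)   # nothing added
phantom = complete_season(0.0, 0, 0.275)         # hypothetical missed season

print(round(benched[0], 3), benched[1])      # 0.28 600
print(full_time)                             # (0.32, 620)
print(round(phantom[0], 3), phantom[1])      # 0.275 600
```

The .275 projection in the phantom case is an invented number used only to show the no-PA path.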

**1977-2016 Aging Curve using Delta Method and Correcting for Survivorship Bias**

You can see that in the first curve, *uncorrected for survivorship bias*, players gain around 40 points in wOBA from age 21 to age 27, seven points per year, plateau from age 27 to 28, then decline by also around seven points a year after that. In the second curve, *after we correct for survivorship bias*, we have a slightly quicker ascension from age 21 to 26, more than eight points per year, a plateau from age 26 to age 27, then a *much slower decline* at around 3 points per year.
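To make the chaining step concrete, here is a sketch that rebuilds rough versions of both curves from the per-year rates quoted above. The rates are rounded (7, 8 and 3 points), the curves are expressed relative to age 21, and stopping at age 35 is an arbitrary choice for illustration, so these are not the actual data behind the charts.

```python
def chain(deltas_by_age, start_age):
    """Chain age-to-age deltas into a cumulative curve.

    deltas_by_age maps age N to the change from age N to age N + 1,
    in points of wOBA. The curve is relative to start_age (set to 0).
    """
    curve = {start_age: 0}
    level = 0
    for age in sorted(deltas_by_age):
        level += deltas_by_age[age]
        curve[age + 1] = level
    return curve


# Uncorrected: about +7 per year from 21 to 27, flat 27 to 28, then about -7.
uncorrected = {age: 7 for age in range(21, 27)}
uncorrected[27] = 0
uncorrected.update({age: -7 for age in range(28, 35)})

# Corrected: about +8 per year from 21 to 26, flat 26 to 27, then about -3.
corrected = {age: 8 for age in range(21, 26)}
corrected[26] = 0
corrected.update({age: -3 for age in range(27, 35)})

curve_unc = chain(uncorrected, 21)
curve_cor = chain(corrected, 21)

print(curve_unc[27], curve_unc[35])   # 42 -7
print(curve_cor[26], curve_cor[35])   # 40 16
```

With these rounded rates, the uncorrected curve has a 35-year-old slightly below where he was at 21, while the corrected curve still has him about 16 points above it.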

Keep in mind that these curves represent all players from 1977 to 2016. It is likely that aging has changed significantly from era to era due to medical advances, PED use and the like. In fact, if we limit our data to 2003 and later, after the so-called *steroid era*, we get an *uncorrected curve* that plateaus between ages 24-28 and then declines by an average of 9 points a year from age 28 to 41.

In my next installment I’ll do some survivorship corrections for components like strikeout and walk percentage.

“Of course this correction method lives or dies with how accurate the projections are but even a simple projection system like Marcel will suffice when dealing with a large number of players of different talent levels.”

Let’s look at that.

Your phantom data comes from a projection system which has an aging curve built in already, right? Start with the thought experiment for age 30-31. The only variable is age, and all the dropouts at age 31 are added back in as average performers (according to the known age curve of the thought experiment). You “know” the real talent and the real decline, such that both the continuing and the missing players would have performed at .270 wOBA. In the thought experiment you compute the delta including the phantom years, and you get back what you built in, a 10 point average decline.

What happens when you switch to the real data for 1977-2016? Effectively you are doing something which is kind of Bayesian – the aging curve in your projections for the phantom performances gets weighted together with the observed aging curve for the survivors’ performances. How heavily the aging curve assumed by the projection influences the results depends mostly on what the falloff rate really is – off the top of my head, 30-40% of the PA might disappear from age 30 to 31, so projections would provide about 30+% of the year 2 “performance” and also the delta.

Do we have confidence that you have found the “real” aging curves? I think that depends on how robust the results would be for the deltas if you re-computed them on new samples. The robustness of the result will be somewhat sensitive to how accurate the age adjustments were in the projection to begin with, and also sensitive to how much phantom data was used to compute the delta for a particular age. (I expect your method uses a much higher proportion of phantom data in ages 35-36 than for ages 25-26).

Also your method is asymmetrical, and that may matter for getting a correct curve for the younger ages – most 23 year old hitters will not have any year 1 (age 22) data in the majors – isn’t this the exact reverse of the problem for older players? In the majors presumably we will be biased toward better, luckier performers composing the year 1/age 22 side of the delta. Just as players past their peak with declining ability are more likely to drop out of the sample when they have some bad luck, so also for these players aging toward their peak, their improved ability at 23 means they can stick around and get major league PA (“drop in”) without needing as much good luck as they would have at 22. The graph you show “with corrections” for 21 to 27 is not much different from the uncorrected graph – I think you might see still steeper curves for these ages if you used a similar approach to impute missing year 1 data from good minor league equivalencies.

I don’t know how easy it would be for you to do, but could you split the sample in two (e.g. for age 30-31, consider only players who were 30 in an odd numbered year for the first sample, construct your corrected aging curve, then revise the projection system to use that aging curve, and run the process on sample 2: players who were 30 in an even numbered year)? If you’re close to having the “real” aging curves now, I think the curves from the two samples should be very similar…

“every player’s season one (which would be an unbiased sample of his talent)”

Season one still has the in-season effects you mentioned, though? Lucky rookies get to stay up, unlucky ones are sent down. An even bigger effect for rookies than veterans, I’m sure, since managers probably overweight the MLB data over the MiLB history, and rookies have no MLB history.

So I think the curve for young players could use further correction.
