Note: I updated the pinch hitting data to include a larger sample (previously I went back to 2008; now I go back to 2000).

Note: It was pointed out by a commenter below and another one on Twitter that you can’t look only at innings where the #9 and #1 batters batted (eliminating innings where the #1 hitter led off), as Russell did in his study, and which he uses to support his theory (he says that it is the best evidence). That creates a huge bias, of course. It eliminates all PA in which the #9 hitter made the last out of an inning or at least an out was made while he was at the plate. In fact, the wOBA for a #9 hitter, who usually bats around .300, is .432 in innings where he and the #1 hitter bat (after eliminating so many PA in which an out was made). How that got past Russell, I have no idea. Perhaps he can explain.
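To get a feel for how large this kind of selection effect can be, here is a minimal sketch. It uses a plain on-base rate rather than wOBA to keep the arithmetic simple, and both inputs (a .300 on-base rate and one third of PA beginning with two outs) are illustrative assumptions, not numbers from either study. It also assumes that a PA is dropped only when the batter makes the third out, which ignores double plays and outs on the bases.

```python
def retained_on_base_rate(p_reach: float, p_two_outs: float) -> float:
    """On-base rate among PA that survive a 'same inning' filter.

    Simplifying assumption: a PA is dropped only when the batter makes
    the third out, i.e. he makes an out in a PA that began with two outs.
    """
    kept_reach = p_reach                         # reaching base never ends the inning
    kept_out = (1 - p_reach) * (1 - p_two_outs)  # outs made with 0 or 1 out are kept
    return kept_reach / (kept_reach + kept_out)


# Hypothetical inputs: .300 on-base rate, one third of PA begin with two outs.
print(round(retained_on_base_rate(0.300, 1 / 3), 3))  # 0.391
```

Even under these rough assumptions, simply discarding the inning-ending outs lifts a .300 rate to roughly .390, which is the same direction and general magnitude as the jump described above.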

Recently, Baseball Prospectus published an article by one of their regular writers, Russell Carleton (aka Pizza Cutter), in which he examined whether the so-called “times through the order” penalty (TTOP) was in fact a function of how many times a pitcher has turned over the lineup in a game or whether it was merely an artifact of a pitcher’s pitch count. In other words, is it pitcher fatigue or batter familiarity (the more the batter sees the pitcher during the game, the better he performs) which causes this effect?

It is certainly possible that most or all of the TTOP is really due to fatigue, as “times through the order” is clearly a proxy for pitch count. In any case, after some of the mathematical gyrations that Mr. Carleton is wont to do in his articles (he is the “Warning: Gory Mathematical Details Ahead” guy), he concludes unequivocally that there is no such thing as a TTOP – that it is really a PCP, or Pitch Count Penalty, effect that makes a pitcher less and less effective as he goes through the order, and that it has little or nothing to do with batter/pitcher familiarity. In fact, in the first line of his article, he declares, “There is no such thing as the ‘times through the order’ penalty!”

If that is true, this is a major revelation which has slipped through the cracks in the sabermetric community and its readership. I don’t believe it is, however.

As one of the primary researchers (along with Tom Tango) of the TTOP, I was taken quite aback by Russell’s conclusion, not because I was personally affronted (the “truth” is not a matter of opinion), but because my research suggested that pitch count or fatigue was likely not a significant part of the penalty. In my BP article on the TTOP a little over 2 years ago, I wrote this: “…the TTOP is not about fatigue. It is about familiarity. The more a batter sees a pitcher’s delivery and repertoire, the more likely he is to be successful against him.” What was my evidence?

First, I looked at the number of pitches thrown going into the second, third, and fourth times through the order. I split that up into two groups—a low pitch count and a high pitch count. Here are those results. The numbers in parentheses are the average number of pitches thrown going into that “time through the order.”

| Times Through the Order | Low Pitch Count | High Pitch Count |
|---|---|---|
| 1 | .341 | .340 |
| 2 | .351 (28) | .349 (37) |
| 3 | .359 (59) | .359 (72) |
| 4 | .361 (78) | .360 (97) |

If Russell’s thesis were true, you should see a little more of a penalty in the “high pitch count” column on the right, which you don’t. The penalty appears to be the same regardless of whether the pitcher has thrown few or many pitches. To be fair, the difference in pitch count between the two groups is not large and there is obviously sample error in the numbers.
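As a quick check on that reading of the table, the short sketch below computes the cumulative penalty in points of wOBA relative to the first time through the order for each column. The variable and function names are mine, chosen for illustration; the wOBA values are copied from the table.

```python
# wOBA by time through the order, copied from the table above.
low_pitch_count = {1: 0.341, 2: 0.351, 3: 0.359, 4: 0.361}
high_pitch_count = {1: 0.340, 2: 0.349, 3: 0.359, 4: 0.360}


def penalty_in_points(woba_by_tto: dict[int, float]) -> dict[int, int]:
    """wOBA points gained by batters relative to the first time through."""
    base = woba_by_tto[1]
    return {
        tto: round((woba - base) * 1000)
        for tto, woba in woba_by_tto.items()
        if tto > 1
    }


print(penalty_in_points(low_pitch_count))   # {2: 10, 3: 18, 4: 20}
print(penalty_in_points(high_pitch_count))  # {2: 9, 3: 19, 4: 20}
```

The two columns never differ by more than a point at any time through the order, despite the high pitch count group having thrown 9 to 19 more pitches.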

The second way I examined the question was this: I looked only at individual batters in each group who had seen few or many pitches in their prior PA. For example, I looked at batters in their second time through the order who had seen fewer than three pitches in their first PA, and also batters who saw more than four pitches in their first PA. Those were my two groups. I did the same thing for each time through the order. Here are those results. The numbers in parentheses are the average number of pitches seen in the prior PA, for every batter in the group combined.

| Times Through the Order | Low Pitch Count each Batter | High Pitch Count each Batter |
|---|---|---|
| 1 | .340 | .340 |
| 2 | .350 (1.9) | .365 (4.3) |
| 3 | .359 (2.2) | .361 (4.3) |

As you can see, if a batter sees more pitches in his first or second PA, he performs better in his next PA than if he sees fewer pitches. The effect appears to be much greater from the first to the second PA. This lends credence to the theory of “familiarity” and not pitcher fatigue. It is unlikely that 2 or 3 extra pitches would cause enough fatigue to elevate a batter’s wOBA by 8.5 points per PA (the average of 15 and 2, the “bonuses” for seeing more pitches during the first and second PA, respectively).
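For anyone who wants to reproduce the 15, 2, and 8.5 point figures, here is the arithmetic as a small sketch. The names are illustrative; the wOBA values come straight from the table.

```python
# wOBA in the current PA, grouped by pitches seen in the prior PA (from the table).
low_prior_pitches = {2: 0.350, 3: 0.359}
high_prior_pitches = {2: 0.365, 3: 0.361}

# "Bonus" in wOBA points for having seen more pitches the previous time up.
bonus_points = {
    tto: round((high_prior_pitches[tto] - low_prior_pitches[tto]) * 1000)
    for tto in low_prior_pitches
}
average_bonus = sum(bonus_points.values()) / len(bonus_points)

print(bonus_points)   # {2: 15, 3: 2}
print(average_bonus)  # 8.5
```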

So how did Russell come to his conclusion, and is it right or wrong? I believe he made a fatal error in his methodology which led him to a faulty conclusion (that the TTOP does not exist).

Among other statistical tests, here is the primary one which led Russell to conclude that the TTOP is a mirage and merely a product of pitcher fatigue due to an ever-increasing pitch count:

This time, I tried something a little different. If we’re going to see a TTOP that is drastic, the place to look for it is as the lineup turns over. I isolated all cases in which a pitcher was facing the ninth batter in the lineup for the second time and then the first batter in the lineup for the third time. To make things fair, neither hitter was allowed to be the pitcher (this essentially limited the sample to games in AL parks), and the hitters needed to be faced in the same inning. Now, because the leadoff hitter is usually a better hitter, we need to control for that. I created a control variable for all outcomes using the log odds ratio method, which controls for the skills of the batter, as well as that of the pitcher. I also controlled for whether or not the pitcher had the platoon advantage in either case.

First of all, there was no reason to limit the data to “the same inning”. Regardless of whether the pitcher faces the 9th and 1st batters in the same inning or they are split up (the 9 hitter makes the last out), since one naturally follows the other, they will always have around the same pitch count, and the leadoff hitter will always be one time through the order ahead of the number nine hitter.

Anyway, what did Russell find? He found that TTOP was not a predictor of outcome. In other words, the effect on the #9 hitter was the same as the effect on the #1 hitter, even though the #1 hitter had faced the pitcher one more time than the #9 hitter.

I thought about this for a long time and I finally realized why that would be the case even if there *was* a “times through the order” penalty (mostly) independent of pitch count. Remember that in order to compare the effect of TTO on that #9 and #1 hitter, he had to control for the overall quality of the hitter. The last hitter in the lineup is going to be a much worse hitter overall than the leadoff hitter, on the average, in his sample.

So the results should look something like this if there were a true TTOP: Say the #9 batters are normally .300 wOBA batters, and the leadoff guys are .330. In this situation, the #9 batters should bat around .300 (during the second time through the order we see around a normal wOBA) but the leadoff guys should bat around .340 – they should have a 10 point wOBA bonus for facing the pitcher for the third time.

Russell, without showing us the data (he should!), presumably gets something like .305 for the #9 batters (since the pitcher has gone essentially 2 ½ times through the lineup, pitch count-wise) and the leadoff hitters should hit .335, or 5 points above their norm as well (maybe .336 since they are facing a pitcher with a few more pitches under his belt than the #9 hitter).

So if he gets those numbers, .335 and .305, is that evidence that there is *no* TTOP? Do we need to see numbers like .340 and .300 to support the TTOP theory rather than the PCP theory? I submit that even if Russell sees numbers like the former ones, that is *not* evidence that there is no TTOP and it’s all about the pitch count. I believe that Russell made a fatal error.

Here is where he went wrong:

Remember that he uses the log-odds method to compute the baseline numbers, or what he would expect from a given batter-pitcher matchup, based on their overall season numbers. In this experiment, there is no need to do that, since both batters, #1 and #9, are facing the same pitcher the same number of times. All he has to do is use each batter’s seasonal numbers to establish the baseline.

But where do those baselines come from? Well, it is likely that the #1 hitters are mostly #1 hitters throughout the season and that #9 hitters usually hit at the bottom of the order. #1 hitters get around 150 more PA than #9 hitters over a full season. Where do those extra PA come from? Some of them come from relievers, of course. But many of them come from facing the starting pitcher more often per game than those bottom-of-the-order guys. In addition, #9 hitters are sometimes removed for pinch hitters late in a game against a starter, such that they lose even more of those 3rd and 4th time through the order PA. Here is a chart of the mean TTO per game versus the starting pitcher for each batting slot:

| Batting Slot | Mean TTO/game |
|---|---|
| 1 | 2.15 |
| 2 | 2.08 |
| 3 | 2.02 |
| 4 | 1.98 |
| 5 | 1.95 |
| 6 | 1.91 |
| 7 | 1.86 |
| 8 | 1.80 |
| 9 | 1.77 |

(By the way, if Russell’s thesis is true, bottom of the order guys have it even easier, since they are *always* batting when the pitcher has a higher pitch count, per time through the order. Also, this is the *first* time you have been introduced to the concept that the top of the order batters have it a little easier than the bottom of the order guys, and that switching spots in the order can affect overall performance because of the TTOP or PCP.)

What that does is result in the baseline for the #1 hitter being higher than for the #9 hitter, because the baseline includes more pitcher TTOP (more times facing the starter for the 3rd and 4th times). That makes it look like the #1 hitter is not getting his advantage as compared to the #9 hitter, or at least he is only getting a partial advantage in Russell’s experiment.

In other words, the #9 hitter is really a true .305 hitter and the #1 hitter is really a true .325 hitter, even though their seasonal stats suggest .300 and .330. The #9 hitters are being hurt by not facing starters late in the game compared to the average hitter and the #1 hitters are being helped by facing starters for the 3rd and 4th times more often than the average hitter.

So if #9 hitters are really .305 hitters, then the second time through the order, we expect them to hit .305, if the TTOP is true. If the #1 hitters are really .325 hitters, despite hitting .330 for the whole season, we expect them to hit .335 the third time through the order, if the TTOP is true. And that is exactly what we see (presumably).

But when Russell sees .305 and .335 he concludes, “no TTOP!” He sees what he thinks is a true .300 hitter hitting .305 after the pitcher has thrown around 65 pitches and what he thinks is a true .330 hitter hitting .335 after 68 or 69 pitches. He therefore concludes that both hitters are being affected equally even though one is batting for the second time and the other for the third time – thus, there is no TTOP!

As I have shown, those numbers are perfectly consistent with a TTOP of around 8-10 points per times through the order, which is exactly what we see.
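The argument can be laid out as a small worked example. This is only a sketch using the illustrative numbers from the text (seasonal wOBA of .300 and .330, "true" talent of .305 and .325) and an assumed 10-point bonus for the third time through the order; none of these are measured values, and the function names are mine. Everything is in whole points of wOBA to keep the arithmetic exact.

```python
TTOP_POINTS = 10  # assumed familiarity bonus the third time through, in wOBA points


def expected_woba_points(true_talent: int, tto: int) -> int:
    """Expected wOBA under a pure familiarity model.

    The second time through is treated as roughly 'normal';
    the third time through adds the assumed TTOP bonus.
    """
    return true_talent + (TTOP_POINTS if tto == 3 else 0)


def apparent_bonus(seasonal: int, true_talent: int, tto: int) -> int:
    """What the analyst sees when he uses the seasonal line as the baseline."""
    return expected_woba_points(true_talent, tto) - seasonal


# #9 hitter, second time through: seasonal .300, true talent .305
print(apparent_bonus(seasonal=300, true_talent=305, tto=2))  # 5
# #1 hitter, third time through: seasonal .330, true talent .325
print(apparent_bonus(seasonal=330, true_talent=325, tto=3))  # 5
```

Both hitters appear to be 5 points above their seasonal baselines, so the extra time through the order looks like it adds nothing, even though the model that generated the numbers contains a full 10-point TTOP.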

Finally, I ran one other test which I think can give us more evidence one way or another. I looked at pinch hitting appearances against starting pitchers. If the TTOP is real and pitch count is not a significant factor in the penalty, we should see around the same performance for pinch hitters regardless of the pitcher’s pitch count, since the pinch hitter always faces the pitcher for the first time and the first time only. In fact, this is a test that Russell probably should have run. The only problem is sample size. Because there are relatively few pinch hitting PA versus starting pitchers, we have quite a bit of sample error in the numbers. I split the sample of pinch hitting appearances up into 2 groups: Low pitch count and high pitch count.

Here is what I got:

| PH TTO | Overall | Low Pitch Count | High Pitch Count |
|---|---|---|---|
| 2 | .295 (PA=4901) | .295 (PA=2494) | .293 (PA=2318) |
| 3 | .289 (PA=10774) | .290 (PA=5370) | .287 (PA=5404) |

I won’t comment on the fact that the pinch hitters performed a little better against pitchers with a low pitch count (the differences are not nearly statistically significant) other than to say that there is no evidence that pitch count has any influence on the performance of pinch hitters who are naturally facing pitchers for the first and only time. Keep in mind that the times through the order (the left column) is a good proxy for pitch count in and of itself and we also see no evidence that *that* makes a difference in terms of pinch hitting performance. In other words, if pitch count significantly influenced pitching effectiveness, we should see pinch hitters overall performing better when the pitcher is in the midst of his 3rd time through the order as opposed to the 2nd time (his pitch count would be around 30-35 pitches higher). We don’t. In fact, we see a worse performance (the difference is not statistically significant – one SD is 8 points of wOBA).
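To put the overall pinch hitting difference on a common scale, the sketch below expresses it in standard deviations. It assumes that the 8-point SD quoted above applies to the difference between the two overall wOBA figures; the function name is mine.

```python
def difference_in_sds(woba_a: float, woba_b: float, sd_points: float) -> tuple[int, float]:
    """Return (difference in wOBA points, difference in SD units) for b minus a."""
    diff_points = round((woba_b - woba_a) * 1000)
    return diff_points, diff_points / sd_points


# Overall pinch hitter wOBA: .295 during the pitcher's second time through
# the order, .289 during his third. SD of 8 points is taken from the text.
diff, z = difference_in_sds(0.295, 0.289, sd_points=8)
print(diff, z)  # -6 -0.75
```

A difference of three quarters of one SD, in the opposite direction from what a pitch count effect would predict, is well within the noise.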

I have to say that it is difficult to follow Russell’s chain of logic and his methodology in many of his articles because he often fails to “show his work” and he uses only somewhat esoteric and opaque statistical techniques. In this case, I believe that he made a fatal mistake in his methodology, as I have described above, which led him to the erroneous conclusion that “The TTOP does not exist.” I believe that I have shown fairly strong evidence that the penalty that we see pitchers incur as the game wears on is mostly or wholly a result of the TTO and not due to fatigue caused by an increasing pitch count.

I look forward to someone doing additional research to support one theory or the other.

Pretty clear there is enough noise here to say that we do not yet know whether there is a familiarity effect or not. I.e., you’re both wrong.

To be honest, I have never eliminated the possibility that fatigue plays a factor. I say as much in all of my articles. So I don’t think it is fair to say that, “I am wrong.”

The evidence points in the direction of TTOP and not pitch count, as I have explained. None of the evidence in my research points towards pitch count as the primary culprit. All we can do in science is conjecture in the direction of the evidence until further (and hopefully better) evidence points us in another direction.

I don’t think this question is intractable. I think with more and better research we will be more certain as to the cause of the penalty.

You point out that Carleton only counts PA where the 9 and 1 hitters bat in the same inning. Won’t that artificially remove many PA where the 9 hitter makes an out (while not removing any where he reaches base)? That is basically cherry-picking PA in which the 9 hitter is more likely to reach base safely, which would make his wOBA slightly higher and make up at least some of any TTOP advantage that the 1 hitter gets for seeing a pitcher the third time.

Someone brought this up on Twitter, and I thought it was a good point. I re-ran without the restriction that it be in the same inning. (Introduces a slightly different bias, but I think there’s an argument that it’s a less onerous one.) All significant findings disappeared, so I’m less convinced of that particular line of analysis, but I was more convinced to begin with by the second line of analysis, which was the 10-18 vs. 19-27 equations.

Fair enough. Can you run a regression with pinch hitters and pitch count? I don’t know that that would “clear up” the issue because of sample size problems, but it would certainly provide some solid evidence, no?

Great catch!

