Archive for the ‘Bullpen Management’ Category

Last night in game 4 of the 2017 World Series, the Astros manager, A.J. Hinch, sort of a sabermetric wunderkind, at least as far as managers go (the Astros are one of the more, if not the most, analytically oriented teams), brought in their closer, Ken Giles, to pitch the 9th in a tie game. This is standard operating procedure for the sabermetrically inclined team – bring in your best pitcher in a tie game in the 9th inning or later, especially if you’re the home team, where you’ll never have the opportunity to protect a lead. The reasoning is simple: You want to guarantee that you’ll use your best pitcher in the 9th or later inning, in a high leverage situation (in the 9+ inning of a tie game, the LI is always at least 1.73 to start the inning).

So what’s the problem? Hinch did exactly what he was supposed to do. It is more or less the optimal move, although it depends a bit on the quality of that closer against the batters he’s going to face, as opposed to the alternative (as well as other bullpen considerations). In this case, it was Giles versus, say, Devenski. Let’s look at their (my) normalized (4.00 is average) runs allowed per 9 inning projections:

Devenski: 3.37

That’s a very good reliever. That’s closer quality although not elite closer quality.

Giles: 2.71

That is an elite closer. In fact, I have Giles as the 6th best closer in baseball. The gap between the two pitchers is pretty substantial, .66 runs per 9 innings. For one inning with a leverage index (LI) of 2.0, that translates to a 1.5% win expectancy (WE) advantage for Giles over Devenski. As one-decision “swings” (the difference between the optimal and a sub-optimal move) go, that’s considered huge. Of course, if you are going to use Giles later in the game anyway if you stay with Devenski for another inning or two, if the game goes that long, you get some of that WE back. Not all of it (because he may not get to pitch), but some of it. Anyway, that’s not really the issue I want to discuss.
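The 1.5% figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes the common rule of thumb of roughly 10 runs per win, which is not stated above, and the function name `we_advantage` is purely illustrative.

```python
def we_advantage(ra9_gap: float, innings: float, leverage: float,
                 runs_per_win: float = 10.0) -> float:
    """Approximate win-expectancy gain from using the better pitcher.

    runs_per_win=10 is an assumed rule of thumb, not a figure from the article.
    """
    # Expected runs saved over the given innings at average leverage
    runs_saved = ra9_gap / 9.0 * innings
    # Scale by leverage, then convert runs to wins
    return runs_saved * leverage / runs_per_win


# Giles (2.71) versus Devenski (3.37) for one inning at LI = 2.0
gap = 3.37 - 2.71
advantage = we_advantage(gap, innings=1, leverage=2.0)
print(f"{advantage:.1%}")  # 1.5%
```

Under a different runs-per-win assumption the number moves proportionally, so treat it as an order-of-magnitude check rather than a precise estimate.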

Why were many of the so-called sabermetric writers (they often know just enough about sabermetrics or mathematical/logical thinking in general to be “dangerous,” although that’s a bit unfair on my part – let’s just say they know enough to be “right” much of the time, but “wrong” some of the time) aghast at, or at least critical of, this seemingly correct move?

First, it was due to the result of course, which belies the fact that these are sabermetric writers. The first thing they teach you in sabermetrics 101 is not to be results oriented. For the most part, the results of a decision have virtually no correlation with the “correctness” of the decision itself. Sure, some of them will claim that they thought or even publicly said beforehand that it was the wrong move, and some of them are not lying – but it doesn’t really matter. That’s only one reason why lots of people were complaining of this move – maybe even the secondary reason (or not the reason at all), especially for the saber-writers.

The primary reason (again, at least stated – I’m 100% certain that the result strongly influenced nearly all of the detractors) was that these naysayers had little or no confidence in Giles going into this game. He must have had a bad season, right, despite my stellar projection? After all, good projection systems use 3, 4 or more years of data along with a healthy dose of regression, especially with relievers who never have a large sample size of innings pitched or batters faced. Occasionally you can have a great projection for a player who had a mediocre or poor season, and that projection will be just as reliable as any other (because the projection model accurately includes the current season, but doesn’t give it as much weight as nearly all fans and media do). So what were Giles’ 2017 numbers?

Only a 2.30 ERA and 2.39 FIP in a league where the average ERA was 4.37! His career ERA and FIP are 2.43 and 2.25, and he throws 98 mph. He’s a great pitcher. One of the best. There’s little doubt that’s true. But….

He’s thrown terribly thus far in the post-season. That is, his results have been poor. In 7.2 IP his ERA is 11.74. Of course he’s also struck out 10 and has a BABIP of .409. But he “looked terrible” these naysayers keep saying. Well, no shit. When you give up 10 runs in 7.2 innings on the biggest stage in sports, you’re pretty much going to “look bad.” Is there any indication, other than having poor results, that there’s “something wrong with Giles?” Given that his velocity is fine (97.9 so far) and that Hinch saw fit to remove Devenski who was “pitching well” and insert Giles in a critical situation, I think we can say with some certainty that there is no indication that anything is wrong with him. In fact, the data, such as his 12 K/9 rate, normal velocity, and an “unlucky” .409 BABIP, all suggest that there is nothing “wrong with him.” But honestly, I’m not here to discuss that kind of thing. I think it’s a futile and silly discussion. I’ve written many times how the notion that you can just tell (or that a manager can tell – which is not the case here, since Hinch was the one who decided to use him!) when a player is hot or cold by observing him is one of the more silly myths in sports, at least in baseball, and I have reams of data-driven evidence to support that assertion.

What I’m interested in discussing right now, is, “What do the data say?” How do we expect a reliever to pitch after 6 or 7 innings or appearances in which he’s gotten shelled? It doesn’t have to be 7 IP of course, but for research like this, it doesn’t matter. Whatever you find in 7 IP you’re going to find in 5 IP or in 12 IP, assuming you have large enough sample sizes and you don’t get really unlucky with a Type I or II error. The same goes for what constitutes getting shelled compared to how you perceive or define “getting shelled.” With research like this, it doesn’t matter. Again, you’re going to get the same answer whether you define getting shelled (or pitching brilliantly) by wOBA against, runs allowed, hard hit balls, FIP, etc. It also doesn’t matter what thresholds you set – you’ll also likely get the same answer.

Here’s what I did to answer this question – or at least to shed some light on it. I looked at all relievers over the last 10 years and split them up into three groups, depending on how they pitched in all 6-game sequences. Group I pitched brilliantly over a 6-game span. The criterion I set was a wOBA against of less than .175. Group III were pitchers who got hammered over a 6-game stretch, at least as far as wOBA was concerned (of course in large samples you will get equivalent RA for these wOBA). They allowed a wOBA of at least .450. Group II was all the rest. Here is what the groups looked like:

Group   Average wOBA against   Equivalent RA9
I       .130                   Around 0
II      .308                   Around 3
III     .496                   Around 10

 

Then I looked at their very next appearance. Again, I could have looked at their next 2 or 3 appearances but it wouldn’t make any difference (other than increasing the sample size – at the risk of the “hot” or “cold” state wearing off).

 

Group   Average wOBA against   wOBA next appearance
I       .130                   .307
II      .308                   .312
III     .496                   .317
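The grouping procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: each appearance is represented as a hypothetical pair of (summed wOBA value allowed, plate appearances), the 6-game windows overlap, and all function names are invented here rather than taken from the actual research code.

```python
from typing import List, Tuple

# Hypothetical record: (sum of wOBA values allowed, plate appearances faced)
Appearance = Tuple[float, int]


def classify(woba: float, low: float = 0.175, high: float = 0.450) -> str:
    """Assign Group I (brilliant), III (hammered) or II (everyone else)."""
    if woba < low:
        return "I"
    if woba >= high:
        return "III"
    return "II"


def next_appearance_by_group(apps: List[Appearance],
                             span: int = 6) -> List[Tuple[str, Appearance]]:
    """Pair each 6-game window's group with the appearance that follows it."""
    pairs = []
    for i in range(len(apps) - span):
        window = apps[i:i + span]
        pa = sum(p for _, p in window)
        if pa == 0:
            continue  # nothing to classify in an empty window
        woba = sum(w for w, _ in window) / pa
        pairs.append((classify(woba), apps[i + span]))
    return pairs


# Six spotless outings followed by a rough one
sample = [(0.0, 4)] * 6 + [(1.2, 4)]
print(next_appearance_by_group(sample))  # [('I', (1.2, 4))]
```

Summing the "next appearance" values and plate appearances within each group would then give the wOBA figures in the right-hand column.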

 

While we certainly don’t see a large carryover effect, we do appear to see some effect. The relievers who have been throwing brilliantly continue to pitch 10 points better than the ones who have been getting hammered. 10 points in wOBA is equivalent to about .3 runs per 9 innings, so that would make a pitcher like Giles closer to Devenski, but still not quite there. But wait! Are these groups of pitchers of the same quality? No. The ones who were pitching brilliantly belong to a much better pool of pitchers than the ones who were getting hammered. Much better. This should not be surprising. I already assumed that when doing the research. How much better? Let’s look at their seasonal numbers (those will be a little biased because we already established that these groups pitched brilliantly or terribly for some period of time in the same season).

Group   Average wOBA against   wOBA next appearance   Season wOBA
I       .130                   .307                   .295
II      .308                   .312                   .313
III     .496                   .317                   .330

 

As you can see our brilliant pitchers are much better than our terrible ones. Even if we were able to back out the bias (say, by looking at last year’s wOBA), we still get .305 for the brilliant relievers and .315 for the hammered ones, based on the previous season’s numbers. In fact, we’ll use those instead.

Group   Average wOBA against   wOBA next appearance   Prior season wOBA
I       .130                   .307                   .305
II      .308                   .312                   .314
III     .496                   .317                   .315

 

Now that’s brilliant. We do have some sampling error. The number of PA in the “next appearance” for groups I and III is around 40,000 each (SD of wOBA = 2 points). However, look at the “expected” wOBA against, which is essentially the pitcher talent (Giles’ and Devenski’s projections) compared to their actual. They are almost identical. Regardless of how a reliever has pitched in his last 6 appearances, he pitches exactly as his normal projection would suggest on that 7th appearance. The last 6 IP has virtually no predictive value even at the extremes. I don’t want to hear, “Well he’s really (really, really) been getting hammered – what about that big shot?” Allowing a .496 wOBA is getting really, really, really hammered, and .130 is throwing almost no-hit baseball, so we’ve already looked at the extremes!
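To put the sampling error in perspective, here is a rough check of how the gaps between next-appearance and prior-season wOBA compare to one standard error. The per-PA spread of 0.45 is my own assumed value for illustration, not a number from the study; the wOBA figures come from the table above.

```python
import math

PA_PER_GROUP = 40_000
SD_PER_PA = 0.45  # assumed spread of wOBA outcomes per plate appearance (hypothetical)

# Standard error of a group's wOBA, expressed in wOBA points (thousandths)
se_points = SD_PER_PA / math.sqrt(PA_PER_GROUP) * 1000

# (next-appearance wOBA, prior-season wOBA) for the two extreme groups
groups = {"I": (0.307, 0.305), "III": (0.317, 0.315)}
diffs = {name: round((nxt - prior) * 1000) for name, (nxt, prior) in groups.items()}

for name, diff in diffs.items():
    print(f"Group {name}: {diff} points, about {diff / se_points:.1f} standard errors")
```

With that assumed spread the standard error works out to roughly 2 points, so a 2-point gap is well within noise.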

So, as you can clearly see, and exactly what you should have expected, if you really knew about sabermetrics (unlike some of these so-called saber-oriented writers and pundits who like to cherry pick the sabermetric principles that suit their narratives and biases), is that 7 IP of pitching compared to 150 or more, is almost worthless information. The data don’t lie.

But you just know that something is wrong with Giles, right? You can just tell. You are absolutely certain that he’ll continue to pitch badly. You just knew that he was going to implode again last night (and you haven’t been wrong about that 90% of the time in your previous feelings). It’s all bullshit folks. But if it makes you feel smart or happy, it’s fine by me. I have nothing invested in all of this. I’m just trying to find the truth. It’s the nature of my personality. That makes me happy.

In response to my two articles on whether pitcher performance over the first 6 innings is predictive of their 7th inning performance (no), a common response from saber and non-saber leaning critics and commenters goes something like this:

No argument with the results or general method, but there’s a bit of a problem in selling these findings. MGL is right to say that you can’t use the stat line to predict inning number 7, but I would imagine that a lot of managers aren’t using the stat line as much as they are using their impression of the pitcher’s stuff and the swings the batters are taking.

You hear those kinds of comments pretty often even when a pitcher’s results aren’t good, “they threw the ball pretty well,” and “they didn’t have a lot of good swings.”

There’s no real way to test this and I don’t really think managers are particularly good at this either, but it’s worth pointing out that we probably aren’t able to do a great job capturing the crucial independent variable.

That is actually a comment on The Book Blog by Neil Weinberg, one of the editors of Beyond the Box Score and a sabermetric blog writer (I hope I got that somewhat right).

My (edited) response on The Book Blog was this:

Neil, I hear that refrain all the time and, with all due respect, I’ve never seen any evidence to back it up. There is plenty of evidence, however, that for the most part it isn’t true.

If we are to believe that managers are any good whatsoever at figuring out which pitchers should stay and which should not, one of two things must be true:

1) The ones who stay must pitch well, especially in close games. That simply isn’t true.

2) The ones who do not stay would have pitched terribly. In order for that to be the case, we must be greatly under-estimating the TTO penalty. That strains credulity.

Let me explain the logic/math in # 2:

We have 100 pitchers pitching thru 6 innings. Their true talent is 4.0 RA9. 50 of them stay and 50 of them go, or some other proportion – it doesn’t matter.

We know that those who stay pitch to the tune of around 4.3. We know that. That’s what the data say. They pitch at the true talent plus the 3rd TTOP, after adjusting for the hitters faced in the 7th inning.

If we are to believe that managers can tell, to any extent whatsoever, whether a pitcher is likely to be good or bad in the next inning or so, then it must be true that the ones who stay will pitch better on the average than the ones who do not, assuming that the latter were allowed to stay in the game of course.

So let’s assume that those who were not permitted to continue would have pitched at a 4.8 level, .5 worse than the pitchers who were deemed fit to remain.

That tells us that if everyone were allowed to continue, they would pitch collectively at a 4.55 level, which implies a .55 rather than a .33 TTOP.

Are we to believe that the real TTOP is a lot higher than we think, but is depressed because managers know when to take pitchers out such that the ones they leave in actually pitch better than all pitchers would if they were all allowed to stay?

Again, to me that seems unlikely.
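The arithmetic in that argument can be written out as a short sketch. The function name and the alternative 74% stay share in the second call are illustrative choices of mine; the 4.0, 4.3 and 4.8 inputs are the ones used in the example above.

```python
def implied_ttop(true_talent: float, stay_ra9: float, removed_ra9: float,
                 stay_share: float = 0.5) -> float:
    """TTOP implied if every starter had been allowed to continue."""
    # Blend the observed stayers with the hypothetical performance of those pulled
    overall = stay_share * stay_ra9 + (1 - stay_share) * removed_ra9
    return overall - true_talent


# 50/50 split, as in the example
print(round(implied_ttop(4.0, 4.3, 4.8), 2))                    # 0.55
# A different split changes the size of the gap but not the conclusion
print(round(implied_ttop(4.0, 4.3, 4.8, stay_share=0.74), 2))   # 0.43
```

Either way, the implied penalty comes out well above the .33 that the data show, which is the point of the argument.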

Anyway, here is some new data which I think strongly suggests that managers and pitching coaches have no better clue than you or I as to whether a pitcher should remain in a game or not. In fact, I think that the data suggest that whatever criteria they are using, be it runs allowed, more granular performance like K, BB, and HR, or keen, professional observation and insight, it is simply not working at all.

After 6 innings, if a game is close, a manager should make a very calculated decision as far as whether or not he should remove his starter. That decision ought to be based primarily on whether the manager thinks that his starter will pitch well in the 7th and possibly beyond, as opposed to one of his back-end relievers. Keep in mind that we are talking about general tendencies which should apply in close games going into the 7th inning. Obviously every game may be a little different in terms of who is on the mound, who is available in the pen, etc. However, in general, when the game is close in the 7th inning and the starter has already thrown 6 full, the decision to yank him or allow him to continue pitching is more important than when the game is not close.

If the game is already a blowout, it doesn’t matter much whether you leave in your starter or not. It has little effect on the win expectancy of the game. That is the whole concept of leverage. In cases where the game is not close, the tendency of the manager should be to do whatever is best for the team in the next few games and in the long run. That may be removing the starter because he is tired and he doesn’t want to risk injury or long-term fatigue. Or it may be letting his starter continue (the so-called “take one for the team” approach) in order to rest his bullpen. Or it may be to give some needed work to a reliever or two.

Let’s see what managers actually do in close and not-so-close games when their starter has pitched 6 full innings and we are heading into the 7th, and then how those starters actually perform in the 7th if they are allowed to continue.

In close games, which I defined as a tied or one-run game, the starter was allowed to begin the 7th inning 3,280 times and he was removed 1,138 times. So the starter was allowed to pitch to at least 1 batter in the 7th inning of a close game 74% of the time. That’s a pretty high percentage, although the average pitch count for those 3,280 pitcher-games was only 86 pitches, so it is not a complete shock that managers would let their starters continue, especially when close games tend to be low scoring games. If a pitcher is winning or losing 2-1 or 3-2 or 1-0 or the game is tied 0-0, 1-1, 2-2, and the starter’s pitch count is not high, managers are typically loath to remove their starter. In fact, in those 3,280 instances, the average runs allowed for the starter through 6 innings was only 1.73 runs (a RA9 of 2.6) and the average number of innings pitched beyond 6 innings was 1.15.
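For anyone who wants to check the two derived figures in that paragraph, here is the arithmetic as a minimal sketch (variable names are mine):

```python
stayed, removed = 3280, 1138

# Share of starters allowed to begin the 7th in close games
share_stayed = stayed / (stayed + removed)

# Runs allowed through 6 innings, scaled to a 9-inning rate
ra9_through_6 = 1.73 * 9 / 6

print(f"{share_stayed:.0%}", round(ra9_through_6, 1))  # 74% 2.6
```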

So these are presumably the starters that managers should have the most confidence in. These are the guys who, regardless of their runs allowed, or even their component results, like BB, K, and HR, are expected to pitch well into the 7th, right? Let’s see how they did.

These were average pitchers, on the average. Their seasonal RA9 was 4.39 which is almost exactly league average for our sample, 2003-2013 AL. They were facing the order for the 3rd time on the average, so we expect them to pitch .33 runs worse than they normally do if we know nothing about them.

These games are in slight pitcher’s parks, average PF of .994, and the batters they faced in the 7th were worse than average, including a platoon adjustment (it is almost always the case that batters faced by a starter in the 7th are worse than league average, adjusted for handedness). That reduces their expected RA9 by around .28 runs. Combine that with the .33 run “nick” that we expect from the TTOP and we expect these pitchers to pitch at a 4.45 level, again knowing nothing about them other than their seasonal levels and attaching a generic TTOP penalty and then adjusting for batter and park.

Surely their managers, in allowing them to pitch in a very close game in the 7th know something about their fitness to continue – their body language, talking to their catcher, their mechanics, location, past experience, etc. All of this will help them to weed out the ones who are not likely to pitch well if they continue, such that the ones who are called on to remain in the game, the 74% of pitchers who face this crossroad and move on, will surely pitch better than 4.45, which is about the level of a near-replacement reliever.

In other words, if a manager thought that these starters were going to pitch at a 4.45 level in such a close game in the 7th inning, they would surely bring in one of their better relievers – the kind of pitchers who typically have a 3.20 to 4.00 true talent.

So how did these hand-picked starters do in the 7th inning? They pitched at a 4.70 level. The worst reliever in any team’s pen could best that by ½ run. Apparently managers are not making very good decisions in these important close and late game situations, to say the least.

What about in non-close game situations, which I defined as a 4 or more run differential?

73% of pitchers who pitch through 6 were allowed to continue even in games that were not close. No different from the close games. The other numbers are similar too. The ones who are allowed to continue averaged 1.29 runs over the first 6 innings with a pitch count of 84, and pitched an average of 1.27 innings more.

These guys had a true talent of 4.39, the same as the ones in the close games – league average pitchers, collectively. They were expected to pitch at a 4.50 level after adjusting for TTOP, park and batters faced. They pitched at a 4.78 level, slightly worse than our starters in a close game.
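A small side-by-side of the two situations, using only the expected and actual 7th-inning RA9 figures quoted above (the dictionary layout and labels are just for illustration):

```python
situations = {
    "close (tied or one-run)": {"expected": 4.45, "actual": 4.70},
    "not close (4+ runs)": {"expected": 4.50, "actual": 4.78},
}

# Positive shortfall means the starters pitched worse than expected
shortfall = {
    name: round(v["actual"] - v["expected"], 2) for name, v in situations.items()
}

for name, runs in shortfall.items():
    print(f"{name}: {runs:+.2f} runs per 9 versus expectation")
```

The shortfalls come out to .25 and .28 runs per 9, so the hand-picked starters in close games were barely distinguishable from those left in during blowouts.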

So here we have two very different situations that call for very different decisions, on the average. In close games, managers should be (and presumably think they are) making very careful decisions about whom to pitch in the 7th, trying to make sure that they use the best pitcher possible. In not-so-close games, especially blowouts, it doesn’t really matter who they pitch, in terms of the WE of the game, and the decision-making goal should be oriented toward the long-term.

Yet we see nothing in the data that suggests that managers are making good decisions in those close games. If we did, we would see much better performance from our starters than in not-so-close games and good performance in general. Instead we see rather poor performance, replacement level reliever numbers in the 7th inning of both close and not-so-close games. Surely that belies the, “Managers are able to see things that we don’t and thus can make better decisions about whether to leave starters in or not,” meme.

Let’s look at a couple more things to further examine this point.

In the first installment of these articles I showed that good or bad run prevention over the first 6 innings has no predictive value whatsoever for the 7th inning. In my second installment, there was some evidence that poor component performance, as measured by in-game, 6-inning FIP had some predictive value, but not good or great component performance.

Let’s see if we can glean what kind of things managers look at when deciding to yank starters in the 7th or not.

In all games in which a starter allows 1 or 0 runs through 6, even though his FIP was high, greater than 4, suggesting that he really wasn’t pitching such a great game, his manager let him continue 78% of the time, which was more than the 74% overall that starters pitched into the 7th.

In games where the starter allowed 3 or more runs through 6 but had a low FIP, less than 3, suggesting that he pitched better than his RA suggests, managers let them continue to pitch just 55% of the time.

Those numbers suggest that managers pay more attention to runs allowed than component results when deciding whether to pull their starter in the 7th. We know that that is not a good decision-making process as the data indicate that runs allowed have no predictive value while component results do, at least when those results reflect poor performance.

In addition, there is no evidence that managers can correctly determine who should stay and who to pull in close games – when that decision matters the most. Can we put to rest, for now at least, this notion that managers have some magical ability to figure out which of their starters has gas left in their tank and which do not? They don’t. They really, really, really don’t.

Note: “Guy,” a frequent participant on The Book Blog, pointed out an error I have been making in calculating the expected RA9 for starters. I have been using their season RA9 as the baseline, and then adjusting for context. That is wrong. I must consider the RA9 of the first 6 innings and then subtract that from the seasonal RA9. For example if a group of pitchers has a RA9 for the season of 4.40 and they have a RA9 of 1.50 for the first 6 innings, if they average 150 IP for the season, our baseline adjusted expectation for the 7th inning, not considering any effects from pitch count, TTOP, manager’s decision to let them continue, etc., is 73.3 (number of runs allowed over 150 IP for the season) minus 1 run for 6 innings, or 72.3 runs over 144 innings, which is an expected RA9 of 4.52, .12 runs higher than the seasonal RA9 of 4.40.

The same goes for the starters who have gotten shelled through 6. Their adjusted expected RA9 for any other time frame, e.g., the 7th inning, is a little lower than 4.40 if 4.40 is their full-season RA9. How much lower depends on the average number of runs allowed in those 6 innings. If it is 4, then we have 73.3 – 4, or 69.3, divided by 144, times 9, or 4.33.

So I will adjust all my numbers to the tune of .14 runs up for dealing pitchers and .07 down for non-dealing pitchers. The exact adjustments might vary a little from these, depending on the average number of runs allowed over the first 6 innings in the various groups of pitchers I looked at.
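The correction described in this note amounts to removing the in-game sample from the seasonal totals before computing the baseline. A minimal sketch, with a function name of my own choosing and the 150 IP season from the example:

```python
def baseline_excluding_sample(season_ra9: float, season_ip: float,
                              sample_runs: float, sample_ip: float) -> float:
    """Seasonal RA9 recomputed after removing the innings already observed."""
    season_runs = season_ra9 * season_ip / 9
    return (season_runs - sample_runs) / (season_ip - sample_ip) * 9


# Dealing: 1 run allowed through 6 (a 1.50 RA9 over those innings)
print(round(baseline_excluding_sample(4.40, 150, 1, 6), 2))  # 4.52
# Shelled: 4 runs allowed through 6
print(round(baseline_excluding_sample(4.40, 150, 4, 6), 2))  # 4.33
```

The size of the adjustment shrinks as the seasonal innings total grows, which is why the exact figures vary a little from group to group.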

The other day I wrote that pitcher performance through 6 innings, as measured solely by runs allowed, is not a good predictor of performance in the 7th inning. Whether a pitcher is pitching a shutout or has allowed 4 runs thus far, his performance in the 7th is best projected mostly by his full-season true talent level plus a times through the order penalty of around .33 runs per 9 innings (the average batter faced in the 7th inning appears for the 3rd time). Pitch count has a small effect on those late inning projections as well.

Obviously if you have allowed no or even 1 run through 6 your component results will tend to be much better than if you have allowed 3 or 4 runs, however there is going to be some overlap. Some small proportion of 0 or 1 run starters will have allowed a HR, 6 or 7 walks and hits, and few if any strikeouts. Similarly, some small percentage of pitchers who allow 3 or 4 runs through 6 will have struck out 7 or 8 batters and only allowed a few hits and walks.

If we want to know whether pitching “well” or not through 6 innings has some predictive value for the 7th (and later) inning, it is better to focus on things that reflect the pitcher’s raw performance than simply runs allowed. It is an established fact that pitchers have little control over whether their non-HR batted balls fall for hits or outs or whether their hits and walks get “clustered” to produce lots of runs or are spread out such that few if any runs are scored.

It is also established that the components most under control by a pitcher are HR, walks, and strikeouts, and that pitchers who excel at the K, and limit walks and HR tend to be the most talented, and vice versa. It also follows that when a pitcher strikes out a lot of batters in a game and limits his HR and walks total that he is pitching “well,” regardless of how many runs he has allowed – and vice versa.

Accordingly, I have extended my inquiry into whether pitching “well” or not has some predictive value intra-game to focus on in-game FIP rather than runs allowed.  My intra-game FIP is merely HR, walks, and strikeouts per inning, using the same weights as are used in the standard FIP formula – 13 for HR, 3 for walks and 2 for strikeouts.
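As a rough illustration of an in-game FIP, here is a sketch built on the standard formula, in which strikeouts are subtracted. The additive constant of 3.10 is an assumption on my part (the text does not say which constant, if any, was used), and the cutoffs of 3 and 4 are the ones applied to the dealing and not-dealing groups in this article.

```python
FIP_CONSTANT = 3.10  # assumed league constant; not stated in the article


def in_game_fip(hr: int, bb: int, k: int, innings: float,
                constant: float = FIP_CONSTANT) -> float:
    """Standard FIP weights (13, 3, 2) applied to a single-game sample."""
    return (13 * hr + 3 * bb - 2 * k) / innings + constant


def label(fip: float, dealing_below: float = 3.0,
          not_dealing_above: float = 4.0) -> str:
    """Bucket a start using the thresholds from the article."""
    if fip < dealing_below:
        return "dealing"
    if fip > not_dealing_above:
        return "not dealing"
    return "neither"


# 6 IP, 0 HR, 1 BB, 7 K versus 6 IP, 1 HR, 3 BB, 2 K
good = in_game_fip(hr=0, bb=1, k=7, innings=6)
bad = in_game_fip(hr=1, bb=3, k=2, innings=6)
print(round(good, 2), label(good))  # 1.27 dealing
print(round(bad, 2), label(bad))    # 6.1 not dealing
```

A different constant shifts every start by the same amount, so it changes which starts clear the thresholds but not their ordering.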

So, rather than defining dealing as allowing 1 or fewer runs through 6 and not dealing as 3 or more runs, I will define the former as an FIP through 6 innings below some maximum threshold and the latter as above some minimum threshold. Although I am not nearly convinced that managers and pitching coaches, and certainly not the casual fan, look much further than runs allowed, I think we can all agree that they should be looking at these FIP components instead.

Here is the same data that I presented in my last article, this time using FIP rather than runs allowed to differentiate pitchers who have been pitching very well through 6 innings or not.

Pitchers who have been dealing or not through 6 innings – how they fared in the 7th

Starters through 6 innings             Avg runs allowed through 6   # of games   RA9 in the 7th inning
Dealing (FIP less than 3 through 6)    1.02                         5,338        4.39
Not-dealing (FIP greater than 4)       2.72                         3,058        5.03

The first thing that should jump out at you is while our pitchers who are not pitching well do indeed continue to pitch poorly, our dealing pitchers, based upon K, BB, and HR rate over the first 6 innings, are not exactly breaking the bank either in the 7th inning.

Let’s put some context into those numbers.

Pitchers who have been dealing or not through 6 innings – how they fared in the 7th

Starters through 6 innings             True talent level based on season RA9   Expected RA9 in 7th   RA9 in the 7th inning
Dealing (FIP less than 3 through 6)    4.25                                    4.50                  4.39
Not-dealing (FIP greater than 4)       4.57                                    4.62                  5.03

As you can see, our new dealing pitchers are much better pitchers. They normally allow 4.25 runs per game during the season. Yet they allow 4.39 runs in the 7th despite pitching very well through 6, irrespective of runs allowed (and of course they allow few runs too). In other words, we have eliminated those pitchers who allowed few runs but may have actually pitched badly or at least not as well as their meager runs allowed would suggest. All of these dealing pitchers had some combination of high K rates, and low BB and HR rates through 6 innings. But still, we see only around .1 runs per 9 in predictive value – not significantly different from zero or none.

On the other hand, pitchers who have genuinely been pitching badly, at least in terms of some combination of a low K rate and high BB and HR rates, do continue to pitch around .4 runs per 9 innings worse than we would expect given their true talent level and the TTOP.
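The "predictive value" in each group is simply actual minus expected RA9 in the 7th, using the figures from the table above. A quick sketch (names are illustrative):

```python
# (expected RA9 in the 7th, actual RA9 in the 7th)
rows = {"dealing": (4.50, 4.39), "not dealing": (4.62, 5.03)}

# Negative means better than expected, positive means worse
residual = {name: round(actual - expected, 2)
            for name, (expected, actual) in rows.items()}

print(residual)  # {'dealing': -0.11, 'not dealing': 0.41}
```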

There is one other thing that is driving some of the difference. Remember that in our last inquiry we found that pitch count was a factor in future performance. We found that while pitchers who only had 78 pitches through 6 innings pitched about as well as expected in the 7th, pitchers with an average of 97 pitches through 6 performed more than .2 runs worse than expected.

In our above 2 groups, the dealing pitchers averaged 84 pitches through 6 and the non-dealing 88, so we expect some bump in the 7th inning performance of the latter group because of a touch of fatigue, at least as compared to the dealing group.

So when we use a more granular approach to determining whether pitchers have been dealing through 6, there is not any evidence that it has much predictive value – the same thing we concluded when we looked at runs allowed only. These pitchers pitch only .11 runs per 9 better than expected.

On the other hand, if pitchers have been pitching poorly for 6 innings, as reflected in the components in which they exert the most control, K, BB, and HR rates, they do in fact pitch worse than expected, even after accounting for a slight elevation in pitch count as compared to the dealing pitchers. That decrease in performance is about .4 runs per 9.

I also want to take this time to state that based on this data and the data from my previous article, there is little evidence that managers are able to identify when pitchers should stay in the game or should be removed. We are only looking at pitchers who were chosen to continue pitching in the 7th inning by their managers and coaches. Yet, the performance of those pitchers is worse than their seasonal numbers, even for the dealing pitchers. If managers could identify those pitchers who were likely to pitch well, whether they had pitched well in prior innings or not, clearly we would see better numbers from them in the 7th inning. At best a dealing pitcher is able to mitigate his TTOP, and a non-dealing pitcher who is allowed to pitch the 7th pitches terribly, which does not bode well for the notion that managers know whom to pull and whom to keep in the game.

For example, in the above charts, we see that dealing pitchers threw .14 runs per 9 worse than their seasonal average – which also happens to be exactly at league average levels. The non-dealing pitchers, who were also deemed fit to continue by their managers, pitched almost ½ run worse than their seasonal performance and more than .6 runs worse than the league average pitcher. Almost any reliever in the 7th inning would have been a better alternative than either the dealing or non-dealing pitchers. Once again, I have yet to see some concrete evidence that the ubiquitous cry from some of the sabermetric naysayers, “Managers know more about their players’ performance prospects than we do,” has any merit whatsoever.

Note: “Guy,” a frequent participant on The Book Blog, pointed out an error I have been making in calculating the expected RA9 for starters. I have been using their season RA9 as the baseline, and then adjusting for context. That is wrong. I must consider the RA9 of the first 6 innings and then subtract that from the seasonal RA9. For example if a group of pitchers has a RA9 for the season of 4.40 and they have a RA9 of 1.50 for the first 6 innings, if they average 150 IP for the season, our baseline adjusted expectation for the 7th inning, not considering any effects from pitch count, TTOP, manager’s decision to let them continue, etc., is 73.3 (number of runs allowed over 150 IP for the season) minus 1 run for 6 innings, or 72.3 runs over 144 innings, which is an expected RA9 of 4.52, .12 runs higher than the seasonal RA9 of 4.40.

The same goes for the starters who have gotten shelled through 6. Their adjusted expected RA9 for any other time frame, e.g., the 7th inning, is a little lower than 4.40 if 4.40 is their full-season RA9. How much lower depends on the average number of runs allowed in those 6 innings. If it is 4, then we have 73.3 – 4, or 69.3, divided by 144, times 9, or 4.33.
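
The two baseline examples above can be checked with a short calculation. This is only a sketch of the arithmetic described in the note; the function name and its parameters are made up for illustration and are not from the original analysis.

```python
def adjusted_ra9(season_ra9, season_ip, runs_in_game, innings_in_game=6.0):
    """Expected RA9 outside the given innings.

    Backs the runs and innings of the game sample out of the season line,
    as described in the note. All names here are illustrative.
    """
    season_runs = season_ra9 * season_ip / 9.0      # e.g. 4.40 * 150 / 9 = 73.3
    remaining_runs = season_runs - runs_in_game     # runs allowed everywhere else
    remaining_ip = season_ip - innings_in_game      # e.g. 150 - 6 = 144
    return remaining_runs / remaining_ip * 9.0


# Dealing group: 1 run through 6 (an RA9 of 1.50 over those innings).
dealing = adjusted_ra9(4.40, 150, runs_in_game=1)
# Shelled group: 4 runs through 6.
shelled = adjusted_ra9(4.40, 150, runs_in_game=4)
print(round(dealing, 2), round(shelled, 2))
```

With the figures from the note (4.40 RA9 over 150 IP), this gives about 4.52 for the dealing group and about 4.33 for the shelled group.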

So I will adjust all my numbers to the tune of .14 runs up for dealing pitchers and .07 down for non-dealing pitchers. The exact adjustments might vary a little from these, depending on the average number of runs allowed over the first 6 innings in the various groups of pitchers I looked at.

Almost everyone, to a man, thinks that a manager’s decision as to whether to allow his starter to pitch in the 6th, 7th, or 8th (or later) innings of an important game hinges, at least in part, on whether said starter has been dealing or getting banged around thus far in the game.

Obviously there are many other variables that a manager can and does consider in making such a decision, including pitch count, times through the order (not high in a manager’s hierarchy of criteria, as analysts have been pointing out more and more lately), the quality and handedness of the upcoming hitters, and the state of the bullpen, both in terms of quality and availability.

For the purposes of this article, we will put aside most of these other criteria. The two questions we are going to ask are these:

  • If a starter is dealing thus far, say, in the first 6 innings, and he is allowed to continue, how does he fare in the very next inning? Again, most people, including almost every baseball insider (player, manager, coach, media commentator, etc.), will assume that he will continue to pitch well.
  • If a starter has not been dealing, or worse yet, he is achieving particularly poor results, these same folks will usually argue that it is time to take him out and replace him with a fresh arm from the pen. As with the starter who has been dealing, the presumption is that the pitcher’s bad performance over the first, say, 6 innings, is at least somewhat predictive of his performance in the next inning or two. Is that true as well?

Keep in mind that one thing we are not able to look at is how a poorly performing pitcher who was removed would have performed had he been left in the game. In other words, we can’t do the controlled experiment we would like – start a bunch of pitchers, track how they perform through 6 innings and then look at their performance through the next inning or two.

So, while we have to assume that, in some cases at least, when a pitcher is pitching poorly and his manager allows him to pitch a while longer, that said manager still had some confidence in the pitcher’s performance over the remaining innings, we also must assume that if most people’s instincts are right, the dealing pitchers through 6 innings will continue to pitch exceptionally well and the not-so dealing pitchers will continue to falter.

Let’s take a look at some basic numbers before we start to parse them and do some necessary adjustments. The data below is from the AL only, 2003-2013.

 

 Pitchers who have been dealing or not through 6 innings – how they fared in the 7th

| Starters through 6 innings | # of Games | RA9 in the 7th inning |
| --- | --- | --- |
| Dealing (0 or 1 run allowed through 6) | 5,822 | 4.46 |
| Not-dealing (3 or more runs allowed through 6) | 2,960 | 4.48 |

First, let me explain what “RA9 in the 7th inning” means: It is the average number of runs allowed by the starter in the 7th inning extrapolated to 9 innings, i.e. runs per inning in the 7th multiplied by 9. Since the starter is often removed in the middle of the 7th inning whether he has been dealing or not, I calculated his runs allowed in the entire inning by adding together his actual runs allowed while he was pitching plus the run expectancy of the average pitcher when he left the game, scaled to his talent level and adjusted for time through the order, based on the number of outs and base runners.
For example, let’s say that a starter who is normally 10% worse than a league average pitcher allowed 1 run in the 7th inning and then left with 2 outs and a runner on first base. He would be charged with allowing 1 plus (.231 * 1.1 * 1.08) runs or 1.274 runs in the 7th inning. The .231 is the average run expectancy for a runner on first base and 2 outs, the 1.1 multiplier is because he is 10% worse than a league average pitcher, and the 1.08 multiplier is because most batters in the 7th inning are appearing for the 3rd time (TTOP). When all the 7th inning runs are tallied, we can convert them into a runs per 9 innings or the RA9 you see in the chart above.
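
The charge for a partial inning can be sketched in a few lines. The function and parameter names below are hypothetical; only the numbers (.231, 1.1, 1.08) come from the example above.

```python
def charged_runs(actual_runs, run_expectancy, talent_factor=1.0, ttop_factor=1.0):
    """Runs charged to a starter for a full inning he did not finish.

    actual_runs:    runs that scored while he was pitching
    run_expectancy: league-average run expectancy for the base/out state he left
    talent_factor:  1.1 means 10% worse than a league-average pitcher
    ttop_factor:    multiplier for the times through the order penalty
    """
    return actual_runs + run_expectancy * talent_factor * ttop_factor


def ra9(total_runs, innings):
    """Convert a run total over some number of innings to runs per 9."""
    return total_runs / innings * 9.0


# The example from the text: 1 run in, then removed with 2 outs and a runner on first.
example = charged_runs(1, 0.231, talent_factor=1.1, ttop_factor=1.08)
print(round(example, 3))
```

Summing `charged_runs` over every 7th inning in a group and passing the total to `ra9` with the number of innings would give the kind of RA9 figure shown in the chart.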

At first glance it appears that whether a starter has been dealing in prior innings or not has absolutely no bearing on how he is expected to pitch in the following inning, at least with respect to those pitchers who were allowed to remain in the game past the 6th inning. However, we have different pools of pitchers, batters, parks, etc., so the numbers will have to be parsed to make sure we are comparing apples to apples.

Let’s add some pertinent data to the above chart:

| Starters through 6 | RA9 in the 7th | Seasonal RA9 |
| --- | --- | --- |
| Dealing | 4.46 | 4.29 |
| Not-dealing | 4.48 | 4.46 |

As you can see, the starters who have been dealing are, not surprisingly, better pitchers. However, interestingly, we have a reverse hot and cold effect. The pitchers who have allowed only 1 run or less through 6 innings pitch worse than expected in the 7th inning, based on their season-long RA9. Many of you will know why – the times through the order penalty. If you have not read my two articles on the TTOP, and I suggest you do, each time through the order, a starting pitcher fares worse and worse, to the tune of about .33 runs per 9 innings each time he faces the entire lineup. In the 7th inning, the average TTO is 3.0, so we expect our good pitchers, the ones with the 4.29 RA9 during the season, to average around 4.76 RA9 in the 7th inning (the 3rd time through the order, a starter pitches about .33 runs per 9 worse than he pitches overall, and the seasonal adjustment – see the note above – adds another .14 runs). They actually pitch to the tune of 4.46 or .3 runs better than expected after considering the TTOP. What’s going on there?

Well, as it turns out, there are 3 contextual factors that depress a dealing starter’s results in the 7th inning that have nothing to do with his performance in the 6 previous innings:

  • The batters that a dealing pitcher is allowed to face are 5 points lower in wOBA than the average batter that each faces over the course of the season, after adjusting for handedness. This should not be surprising. If any starting pitcher is allowed to pitch the 7th inning, it is likely that the batters in that inning are slightly less formidable or more advantageous platoon-wise, than is normally the case. Those 5 points of wOBA translate to around .17 runs per 9 innings, reducing our expected RA9 to 4.59.
  • The parks in which we find dealing pitchers are, not surprisingly, slightly pitcher friendly, with an average PF of .995, which lowers our expectation of future performance by another .02 runs per 9, to 4.57.
  • The temperature in which this performance occurs is also slightly more pitcher friendly by around a degree F, although this would have a de minimis effect on run scoring (it takes about a 10 degree difference in temperature to move run scoring by around .025 runs per game).

So our dealing starters pitch .11 runs per 9 innings better than expected (4.46 actual versus 4.57 expected), a small effect that is nothing to write home about and well within the range of values that can be explained purely by chance.
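
The chain of adjustments for the dealing starters can be tallied in a few lines. This is a sketch of the bookkeeping only; the function name is made up, and the deltas are the rounded figures quoted in the text rather than anything recomputed from the underlying data.

```python
def expected_ra9(seasonal_ra9, adjustments):
    """Apply a list of (label, runs-per-9 delta) adjustments to a seasonal RA9."""
    expected = seasonal_ra9
    for _label, delta in adjustments:
        expected += delta
    return expected


# Figures quoted in the text for dealing starters entering the 7th inning.
dealing_adjustments = [
    ("seasonal baseline adjustment", +0.14),
    ("third time through the order (TTOP)", +0.33),
    ("batters 5 points of wOBA below average", -0.17),
    ("park factor of .995", -0.02),
]

expected = expected_ra9(4.29, dealing_adjustments)
actual = 4.46
print(f"expected {expected:.2f}, actual {actual:.2f}, gap {expected - actual:.2f}")
```

Starting from the 4.29 seasonal RA9, the tally lands on an expectation of about 4.57, so the actual 4.46 is roughly .11 runs per 9 better than expected. The temperature effect is left out because the text treats it as negligible.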

What about the starters who were not dealing? They out-perform their seasonal RA9 plus the TTOP by around .3 runs per 9. The batters they face in the 7th inning are 6 points worse than the average league batter after adjusting for the platoon advantage, and the average park and ambient temperature tend to slightly favor the hitter. Adjusting their seasonal RA9 to account for the fact that they pitched poorly through 6 (see my note at the beginning of this article), we get an expectation of 4.51. So these starters fare almost exactly as expected (4.48 to 4.51) in the 7th inning, after adjusting for the batter pool, despite allowing 3 or more runs for the first 6 innings. Keep in mind that we are only dealing with data from around 9,000 BF. One standard deviation in “luck” is around 5 points of wOBA which translates to around .16 runs per 9.

It appears to be quite damning that starters who are allowed to continue after pitching 6 stellar or mediocre to poor innings pitch almost exactly as (poorly as) expected – their normal adjusted level plus .33 runs per 9 because of the TTOP – as if we had no idea how well or poorly they pitched in the prior 6 innings.

Score one for simply using a projection plus the TTOP to project how any pitcher is likely to pitch in the middle to late innings, regardless of how well or poorly they have pitched thus far in the game. Prior performance in the same game has almost no bearing on that performance. If anything, when a manager allows a dealing pitcher to continue pitching after 6 innings, when facing the lineup for the 3rd time on the average, he is riding that pitcher too long. And, more importantly, presumably he has failed to identify anything that the pitcher might be doing, velocity-wise, mechanics-wise, repertoire-wise, command-wise, results-wise, that would suggest that he is indeed on that day and will continue to pitch well for another inning or so.

In fact, whether pitchers have pitched very well or very poorly or anything in between for the first 6 innings of a game, managers and pitching coaches seem to have no ability to determine whether they are likely to pitch well if they remain in the game. The best predictor of 7th inning performance for any pitcher who is allowed to remain in the game, is his seasonal performance (or projection) plus a fixed times through the order penalty. The TTOP is approximately .33 runs per 9 innings for every pass through the order. Since the second time through the order is roughly equal to a pitcher’s overall performance, starting with the 3rd time through the lineup we expect that starter to pitch .33 runs worse than he does overall, again, regardless of how he has pitched thus far in the game. The 4th time through the order, we expect a .66 drop in performance. Pitchers rarely if ever get to throw to the order for the 5th time.
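
The "projection plus a fixed penalty" rule can be written down as a tiny function. This is a minimal sketch under the assumptions stated in the paragraph above (second time through equals overall performance, .33 runs per 9 for each pass after that); the names are illustrative, and the first time through the order is deliberately not modeled because this article gives no RA9 figure for it.

```python
TTOP_PER_PASS = 0.33  # runs per 9 innings, per pass through the order (from the text)


def projected_ra9(projection, times_through_order):
    """Projected RA9 for a starter on a given pass through the lineup.

    Assumes the 2nd time through equals the pitcher's overall projection,
    with a fixed penalty added for each pass after that.
    """
    if times_through_order < 2:
        raise ValueError("first time through the order is not modeled in this sketch")
    return projection + TTOP_PER_PASS * (times_through_order - 2)


for tto in (2, 3, 4):
    print(tto, round(projected_ra9(4.29, tto), 2))
```

Nothing about the current game's line score enters the function, which is the point of the paragraph above: the inputs are the projection and the pass through the order, and context such as batters and park would be layered on separately.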

Fatigue and Pitch Counts

Let’s look at fatigue using pitch count as a proxy, and see if that has any effect on 7th inning performance for pitchers who allowed 3 or more runs through 6 innings. For example, if a pitcher has not pitched particularly well, should we allow him to continue if he has a low pitch count?

Pitch count and 7th inning performance for non-dealing pitchers:

| Pitch count through 6 | Expected RA9 | Actual RA9 |
| --- | --- | --- |
| Less than 85 (avg=78) | 4.56 | 4.70 |
| Greater than 90 (avg=97) | 4.66 | 4.97 |

 

Expected RA9 accounts for the pitchers’ adjusted seasonal RA9 plus the pool of batters faced in the 7th inning including platoon considerations, as well as park and weather. The latter 2 affect the numbers minimally. As you can see, pitchers who had relatively high pitch counts going into the 7th inning but were allowed to pitch for whatever reasons despite allowing at least 3 runs thus far, fared .3 runs worse than expected, even after adjusting for the TTOP. Pitchers with low pitch counts did only about .14 runs worse than expected, including the TTOP. Those 20 extra pitches appear to account for around .17 runs per 9, not a surprising result. Again, please keep in mind that we are dealing with limited sample sizes, so these small differences are inferential suggestions and are not to be accepted with a high degree of certainty. They do point us in a certain direction, however, and one which comports with our prior expectation – at least my prior expectation.

What about if a pitcher has been dealing and he also has a low pitch count going into the 7th inning? Very few managers, if any, would remove a starter who allowed zero or 1 run through 6 innings and has only thrown 65 or 70 pitches. That would be baseball blasphemy. Besides the affront to the pitcher (which may be a legitimate concern, but one which is beyond the scope of this article), the assumption by nearly everyone is that the pitcher will continue to pitch exceptionally well. After all, he is not at all tired and he has been dealing! Let’s see if that is true – that these starters continue to pitch well, better than expected based on their projections or seasonal performance plus the TTOP.

Pitch count and 7th inning performance for dealing pitchers:

| Pitch count through 6 | Expected RA9 | Actual RA9 |
| --- | --- | --- |
| Less than 80 (avg=72) | 4.75 | 4.50 |
| Greater than 90 (avg=96) | 4.39 | 4.44 |

Keep in mind that these pitchers normally allow 4.30 runs per 9 innings during the entire season (4.44 after doing the seasonal adjustment). The reason the expected RA9 is so much higher for pitchers with a low pitch count is primarily due to the TTOP. For pitchers with a high pitch count, the batters they face in the 7th are 10 points less in wOBA than league average, thus the 4.39 expected RA9, despite the usual .3 to .35 TTOP.

Similar to the non-dealing pitchers, fatigue appears to play a factor in a dealing pitcher’s performance in the 7th. However, in either case, low-pitch or high-pitch, their performance through the first 6 innings has little bearing on their 7th inning performance. With no fatigue they out-perform their expectation by .25 runs per 9. The fatigued pitchers under-performed their overall season-long adjusted talent plus the usual TTOP by .05 runs per 9.

Again, we see that there is little value to taking out a pitcher who has been getting a little knocked around or leaving in a pitcher who has been dealing for 6 straight innings. Both groups will continue to perform at around their expected full-season levels plus any applicable TTOP, with a slight increase in performance for a low-pitch count pitcher and a slight decrease for a high-pitch count pitcher. The biggest increase we see, .25 runs, is for pitchers who were dealing and had very low pitch counts.
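
The gaps discussed in this section are simple differences between the actual and expected RA9 in the two pitch count charts. The sketch below just lays them side by side; the dictionary layout and function name are my own, and the only inputs are the expected and actual figures quoted in those charts.

```python
# (expected RA9, actual RA9) in the 7th inning, taken from the two pitch count charts.
groups = {
    ("not dealing", "low pitch count"): (4.56, 4.70),
    ("not dealing", "high pitch count"): (4.66, 4.97),
    ("dealing", "low pitch count"): (4.75, 4.50),
    ("dealing", "high pitch count"): (4.39, 4.44),
}


def gap(expected, actual):
    """Actual minus expected RA9; positive means worse than expected."""
    return actual - expected


gaps = {key: gap(*values) for key, values in groups.items()}
for (status, pitch_count), value in gaps.items():
    print(f"{status:12s} {pitch_count:17s} {value:+.2f}")

# Difference between high and low pitch count groups within each status.
fatigue_not_dealing = gaps[("not dealing", "high pitch count")] - gaps[("not dealing", "low pitch count")]
fatigue_dealing = gaps[("dealing", "high pitch count")] - gaps[("dealing", "low pitch count")]
print(round(fatigue_not_dealing, 2), round(fatigue_dealing, 2))
```

The not-dealing difference reproduces the roughly .17 runs per 9 mentioned earlier. The same subtraction for the dealing group is straightforward arithmetic on the chart but is not a figure quoted in the text, and the sample size caveat applies to both.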

What about if we increase our threshold to pitchers who allow 4 or more runs over 6 innings and those who are pitching a shutout?

| Starters through 6 | Seasonal RA9 | Expected RA9 | 7th inning RA9 |
| --- | --- | --- | --- |
| Dealing (shutouts only) | 4.23 | 4.62 | 4.70 |
| Not-dealing (4 or more runs) | 4.62 | 4.81 | 4.87 |

Here, we see no predictive value in the first 6 innings of performance. In fact, for some reason starters pitching a shutout pitched slightly worse than expected in the 7th inning, after adjusting for the pool of batters faced and the TTOP.

How about the holy grail of starters who are expected to keep lighting it up in the 7th inning – starters pitching a shutout and with a low pitch count? These were true talent 4.25 pitchers facing better than average batters in the 7th, mostly for the third time in the game, so we expect a .3 bump or so for the TTOP. Our expected RA9 was 4.78 after making all the adjustments, and the actual was 4.61. Nothing much to speak of. Their dealing combined with a low pitch count had a very small predictive value in the 7th. Less than .2 runs per 9 innings.

Conclusion

As I have been preaching for what seems like forever – and the data are in accordance – however a pitcher is pitching through X innings in a game, at least as measured by runs allowed, even at the extremes, has very little relevance with regard to how he is expected to pitch in subsequent innings. The best marker for whether to pull a pitcher or not seems to be pitch count.

If you want to know the most likely result, or the mean expected result at any point in the game, you should mostly ignore prior performance in that game and use a credible projection plus a fixed times through the order penalty, which is around .33 runs per 9 the 3rd time through, and another .33 the 4th time through. Of course the batters faced, park, weather, etc. will further dictate the absolute performance of the pitcher in question.

Keep in mind that I have not looked at a more granular approach to determining whether a pitcher has been pitching extremely well or getting shelled, such as hits, walks, strikeouts, and the like. It is possible that such an approach might yield a subset of pitching performance that indeed has some predictive value within a game. For now, however, you should be pretty convinced that run prevention alone during a game has little predictive value in terms of subsequent innings. Certainly a lot less than what most fans, managers, and other baseball insiders think.

This is a follow up to my article on baseballprospectus.com about starting pitcher times through the order penalties (TTOP).

Several readers wondered whether pitchers who throw lots of fastballs (or one type of pitch) have a particularly large penalty as opposed to pitchers who throw more of a variety of pitches. The speculation was that it would be harder or take longer for a batter to acclimate himself to a pitcher who has lots of different pitches in his arsenal. As well, since most starters tend to throw more fastballs the first time through the order, those pitchers who follow that up with more off-speed pitches for the remainder of the game would have an advantage over those pitchers who continue to throw mostly fastballs.

First I split all the starters up into 3 groups: One, over 75% fastballs, two, under 50% fastballs, and three, all the rest. The data is from 2002-2012. I downloaded pitcher pitch type data from fangraphs.com. The results will amaze you.

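| FB % | N (Pitcher Seasons) | Overall | First Time | Second Time | Third Time | Fourth Time | Second Minus First | Third Minus Second | Fourth Minus Third |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| > 75% | 159 | .357 | .341 | .363 | .376 | .348 | .027 | .020 | -.013 |
| < 50% | 359 | .352 | .346 | .349 | .360 | .361 | .003 | .015 | .010 |
| All others | 2632 | .359 | .346 | .361 | .370 | .371 | .015 | .015 | .013 |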

Pitchers who throw mostly fastballs lose 35 points in wOBA against by the third time facing the lineup. Those with a much lower fastball frequency only lose 24 points. Interestingly, the former group reverts back to better than normal levels the fourth time (I don’t know why that is, but I’ll return to that issue later), but the latter group continues to suffer a penalty as do all the others. Keep in mind that the fourth time numbers are small samples for the first two groups, and that fourth time TBF are only around 15% of first time TBF (i.e., starters don’t often make it past the third time through the order).

The takeaway here is that a starter’s pitch repertoire is extremely important in terms of how long he should be left in the game and whether he should start or relieve (we already knew the latter, right?). If we look at columns three and four, we can get an idea as to the difference between a pitcher as a starter and as a reliever, at least as far as times through the order is concerned (there are other considerations, such as velocity – e.g., when a pitcher is a short reliever, he can usually throw harder). The mostly fastball group is 16 points (around .5 runs per 9 innings) more effective the first time through the order than overall, while the low frequency fastball group only has a 6 point (.20 RA9) advantage. Keep in mind that some of that first time through the order advantage for all groups is due to the “first inning” effect (see my original article on BP).
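
The wOBA-to-runs conversions quoted above (16 points is around .5 RA9, 6 points is around .20 RA9) imply a roughly constant exchange rate. The sketch below uses 0.033 runs per 9 per point of wOBA, which is my own rounding inferred from those pairs and not a figure stated in the article, so treat the constant and the function name as assumptions.

```python
# Assumed rate, inferred from pairs quoted in the text (not stated by the author).
RA9_PER_WOBA_POINT = 0.033


def woba_points_to_ra9(points):
    """Rough conversion from points of wOBA (1 point = .001) to runs per 9 innings."""
    return points * RA9_PER_WOBA_POINT


for points in (6, 16):
    print(points, round(woba_points_to_ra9(points), 2))
```

A linear rate like this is only a rule of thumb for small differences around league average; the pairs in the text are themselves rounded, so agreement to within a few hundredths of a run is about the best one should expect.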

Next I split the pitchers into four groups based on the number of pitches they threw at least 10% of the time. The categories of pitches (from the FG database) were fast balls, sliders, cutters, curve balls, change ups, splitters, and knuckle balls.

| # Pitches in Repertoire (> 10%) | N (Pitcher Seasons) | Overall | First Time | Second Time | Third Time | Fourth Time | Second Minus First | Third Minus Second | Fourth Minus Third |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 41 | .359 | .344 | .370 | .375 | .303 | .027 | .009 | -.061 |
| 2 | 1000 | .358 | .343 | .359 | .371 | .366 | .016 | .018 | .007 |
| 3 | 1712 | .361 | .349 | .362 | .371 | .372 | .013 | .015 | .014 |
| 4 | 378 | .351 | .340 | .351 | .360 | .368 | .011 | .013 | .019 |

This is even more interesting. It appears that the fewer pitches you have in your repertoire, the more that batters become quickly familiar with you, as we might expect. One-pitch pitchers lose 36 points by the third time through the order, while four-pitch pitchers lose only 24 points. The fourth time through the order is exactly the opposite. Against one-pitch pitchers, batters lose 61 points (small sample size warning – 639 PA). Again, I have no idea why. Maybe fastball pitchers are able to ramp it up in the later innings, or maybe they start throwing more off-speed pitches. A pitch f/x analysis would shed some more light on this issue. Against the four-pitch pitchers, batters gain 19 points the fourth time around compared to the third. If we weight and combine the third and fourth times in order to increase our sample sizes, we get this:

| # Pitches in Repertoire (> 10%) | N (Pitcher Seasons) | Overall | First Time | Second Time | Third and Fourth Times | Second Minus First | Third+ Minus Second |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 41 | .359 | .344 | .370 | .364 | .027 | -.001 |
| 2 | 1000 | .358 | .343 | .359 | .370 | .016 | .017 |
| 3 | 1712 | .361 | .349 | .362 | .371 | .013 | .015 |
| 4 | 378 | .351 | .340 | .351 | .361 | .011 | .015 |

Again, we see the largest, by far, second time penalty for the one-pitch pitchers (27 points), and a gradually decreasing penalty for two, three, and four-pitch pitchers (16, 13, and 11). Interestingly, they all have around the same penalty the third time and later, other than the one-pitch pitchers, who essentially retain their quality or even get a bit better, although this is driven by their large fourth time advantage, as you saw in the previous table.
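
Combining the third and fourth times through the order is a weighted average, presumably weighted by batters faced (the text says only "weight and combine"). The sketch below shows that average and backs out the fourth-time share implied by the published one-pitch figures (.375 third time, .303 fourth time, .364 combined). The function names are hypothetical, and because the inputs are rounded to three decimals the implied share is only approximate.

```python
def combine_woba(woba_values, weights):
    """Weighted average of wOBA values; weights can be PA counts or shares."""
    total = sum(weights)
    return sum(w * n for w, n in zip(woba_values, weights)) / total


def implied_fourth_time_share(third, fourth, combined):
    """Share of combined PA that must be fourth-time PA to produce `combined`."""
    return (third - combined) / (third - fourth)


# One-pitch pitchers, from the two tables above.
share = implied_fourth_time_share(third=0.375, fourth=0.303, combined=0.364)
print(round(share, 3))

# Sanity check: weighting by that share recovers the combined figure.
print(round(combine_woba([0.375, 0.303], [1 - share, share]), 3))
```

For the one-pitch group this suggests that roughly 15% of the combined sample comes from the fourth time through, which is why the large fourth-time swing moves the combined number by only about 11 points.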

It is not clear that you should take your one-pitch starters out early and leave in those who have multiple pitches in their weaponry. In fact, it may be the opposite. While the one-pitch pitchers would do well if they only face the order one time (and so would the two-pitch starters actually), once you allow them to stay in the game for the second go around, you might as well keep them in there as long as they are not fatigued, at least as compared to the multiple-pitch starters. Starters with more than one pitch appear to get 10-15 points worse each time through the order even though they don’t have the large penalty between the first and second time, as the one-pitch pitchers do. Remember, for the last two tables, a pitch is considered part of a starter’s repertoire if he throws it at least 10% of the time.
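
The repertoire definition used for these tables (a pitch counts if it is thrown at least 10% of the time) can be sketched as a simple count. The function name, the dictionary layout, and the pitch mix of the example pitcher are all hypothetical; the real pitch type data came from the FanGraphs database and is not reproduced here.

```python
def repertoire_size(usage, threshold=0.10):
    """Number of pitch types thrown at least `threshold` of the time.

    usage: mapping of pitch type to share of all pitches (shares sum to about 1.0).
    """
    return sum(1 for share in usage.values() if share >= threshold)


# Made-up pitch mix for illustration only.
example_pitcher = {
    "fastball": 0.62,
    "slider": 0.18,
    "changeup": 0.12,
    "curveball": 0.08,
}

for threshold in (0.10, 0.15, 0.20):
    print(threshold, repertoire_size(example_pitcher, threshold))
```

The same pitcher can land in different buckets depending on the cutoff, so the threshold is left as a parameter. Whether the boundary is inclusive ("at least 10%") or strict ("> 10%", as the table headers read) is a detail the article leaves ambiguous; this sketch treats it as inclusive.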

I’ll now split the pitchers into four groups again based on how many pitches they throw, but this time, the cutoff for a “pitch” will be 15% rather than 10%. The number of pitchers who throw four pitches at least 15% of the time each is too small for their numbers to be meaningful, so I’ll throw them in with the three pitch pitchers. I’ll also combine the third and fourth times through the order again.

| # Pitches in Repertoire (> 15%) | N (Pitcher Seasons) | Overall | First Time | Second Time | Third and Fourth Times | Second Minus First | Third+ Minus Second |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 447 | .358 | .342 | .362 | .364 | .027 | -.001 |
| 2 | 1954 | .359 | .346 | .361 | .370 | .016 | .017 |
| 3+ | 742 | .355 | .347 | .352 | .371 | .013 | .015 |

The three and four-pitch starters are better overall by three or four points of wOBA (.11 RA9). The first time through the order, however, the one-pitch starters are better by 5 points or so (.15 RA9). The second time around, the one-pitch pitchers fare the worst, but by the third and fourth times through the order, they are once again the best (by 6 or 7 points, or .22 RA9). It is difficult to say what the optimal use of these starters would look like. At the very least, these numbers should give a manager/team more information in terms of estimating a starter’s penalty at various points in the game, based on his pitch repertoire.

I’ll try one more thing: Two groups. The first group are pitchers who throw at least 80% of one type of pitch, excluding knuckleballers. These are truly one-pitch pitchers. The second group throw three (or more) pitches at least 20% of the time each. These are truly three-pitch pitchers. Let’s see the contrast.

| # Pitches in Repertoire | N (Pitcher Seasons) | Overall | First Time | Second Time | Third and Fourth Times | Second Minus First | Third+ Minus Second |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 (> 80%) | 47 | .360 | .343 | .367 | .370 | .025 | .009 |
| 3+ (> 20%) | 104 | .353 | .350 | .357 | .357 | .008 | .009 |

It certainly looks like the 42 one-pitch pitchers (47 is the number of pitcher seasons) would be much better off as relievers, facing each batter only one time. They are not very good overall, and after only one go around, they are 25 points (.85 RA9) worse than the first time facing the lineup! The three-pitch pitchers suffer only a small (8 point) penalty after the first time through the order. Both groups actually suffer the same penalty from the second to the third (and more) time through the order (9 points).

So who are these 42 pitchers who are ill-suited to being a starter? Perhaps they are swingmen or emergency starters. I looked at all pitchers who started at least one game – not just regular starters. Here is the complete list from 2002 to 2012. The numbers after the names are the number of batters faced (TBF) as starters and as relievers.

Mike Timlin 20, 352

Kevin Brown 206, 68

Ben Diggins 114, 0

Jarrod Washburn 847, 0

Mike Crudale 9, 199

Grant Balfour 17, 94

Shane Loux 69, 69

Jimmy Anderson 180, 3

Kirk Rueter 620, 0

Jaret Wright 768, 0

Logan Kensing 55, 11

Tanyon Sturtze 57, 277

Chris Young 156, 0

Nate Bump 33, 286

Bartolo Colon 2683, 49

Carlos Silva 876, 10

Aaron Cook 3337, 0

Cal Eldred 12, 141

Rick Bauer 21, 281

Mike Smith 18, 0

Shawn Estes 27, 0

Troy Percival 4, 146

Andrew Miller 306, 0

Luke Hochevar 12, 41

Luke Hudson 13, 0

Dana Eveland 15, 13

Denny Bautista 9, 38

Dennis Sarfate 81, 274

Roberto Hernandez 548, 0

Mike Pelfrey 812, 0

Daniel Cabrera 881, 0

Frankie de la Cruz 15, 37

Mark Mulder 3, 9

Ty Taubenheim 27, 0

Brad Kilby 7, 58

Darren Oliver 17, 264

Justin Masterson 1794, 4

Luis Mendoza 60, 0

Ross Detwiler 627, 51

Cesar Ramos 11, 109

Josh Stinson 17, 21

Ross Detwiler 627, 51

Many of these pitchers barely had a cup of coffee in the majors. Others were emergency starters, swingmen, or they changed roles at some point in their careers. Others were simply mediocre or poor starting pitchers, like Kirk Rueter, Jarrod Washburn, Mike Pelfrey, Carlos Silva, and Daniel Cabrera, while others were good or even excellent starters, like Kevin Brown, Mark Mulder, and Bartolo Colon.

I think the lesson is clear. Unless a team has a compelling reason to make a one-pitch pitcher a starter (perhaps they are an extreme sinker-baller, like Brown, Cook, and Masterson), they should probably only relieve. If a team is going to use a swingman for an occasional start or a reliever for an emergency start, they would do well to use a two or three-pitch pitcher or limit him to one time through the order.

If we remove the swingmen and emergency starters as well as those pitchers who faced fewer than 50 batters in a season, we get this:

| # Pitches in Repertoire | N (Pitcher Seasons) | Overall | First Time | Second Time | Third and Fourth Times | Second Minus First | Third+ Minus Second |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 (> 80%) | 28 | .353 | .336 | .364 | .365 | .028 | .004 |
| 3+ (> 20%) | 104 | .353 | .350 | .357 | .357 | .008 | .009 |

Even if we only look at regular starters with one primary pitch other than a knuckleball, we still see a huge penalty after the first time facing the order. In fact, the second time penalty (compared to the first) is worse than when we include the swingmen and emergency starters. Although these pitchers overall are as good as multiple-pitch starters, they still would have been much better off as short relievers.

Here is that updated list of starters once we remove the ones who rarely start. These guys as a whole should probably have been short relievers.

Cook

Miller

Colon

Diggins

Silva

Young

Cabrera

Wright

Washburn

Anderson

Masterson

Brown

Rueter

Kensing

Mendoza

Pelfrey

Hernandez

Detwiler

You might think that the one-pitch starters in the above list who are good or at least had one or two good seasons might not necessarily be good candidates for short relief. You would be wrong. These pitchers had huge second to first penalties and pitched much better the first time through the order than overall. Here is the same chart as before, but only including above-average starters for that season.

| # Pitches in Repertoire | N (Pitcher Seasons) | Overall | First Time | Second Time | Third and Fourth Times | Second Minus First | Third+ Minus Second |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 (> 80%) | 11 | .328 | .307 | .332 | .332 | .039 | -.013 |
| 3+ (> 20%) | 35 | .321 | .318 | .323 | .323 | .004 | .003 |

Here are those pitchers who pitched very well overall, but were lights out the first time facing the lineup. Remember that these pitchers were above average in the season or seasons that they went into this bucket – they were not necessarily good pitchers throughout their careers or even in any other season.

Kevin Brown

Jarrod Washburn

Jaret Wright

Chris Young

Bartolo Colon

Carlos Silva

Justin Masterson

Ross Detwiler

Interestingly, the very good multiple-pitch pitchers had very small penalties each time through the order. These are probably the only kind of starters we want to go deep into games! Here is that list of starters.

Sonnanstine

B. Myers

Pavano

Sabathia

Billingsley

Carpenter

Hamels

Haren

F. Garcia

Iwakuma

Shields

J. Contreras

Beckett

Duchscherer

Gabbard

K. Rogers

Buehrle

M. Clement

Halladay

R. Hernandez

T. Hunter

Finally, in case you are interested, here are the numbers for all of the one-pitch knuckleballers that I have been omitting in some of the tables thus far:

Knuckle Ballers Only

| N | First Time | Second Time | Third+ Time | Second Minus First | Third+ Minus Second |
| --- | --- | --- | --- | --- | --- |
| 20 | .321 | .354 | .345 | .034 | -.006 |

Where are all the knuckle ball relievers? We don’t have tremendous sample sizes here (3024 second time TBF), so we have to take the numbers with a grain of salt, but it looks like they are brilliant the first time through the order. Once a batter has seen a knuckleballer one time, he does pretty well against him thereafter (although we do see a 6 point rebound the third time and later through the order).

I think that more research, especially using the pitch f/x data, is needed. However, I think that teams can use the information above to make more informed decisions about what roles pitchers should occupy and when to take out a starter during a game.