Richard Nichols (@RNicholsLV on Twitter) sent me this link. These are notes that the author, Lee Judge, a Royals blogger for the K.C. Star, took during the season. They reflect thoughts and comments from players, coaches, etc. I thought I’d briefly comment on each one. Hope you enjoy!

Random, but interesting, things about baseball – Lee Judge

▪ If a pitcher does not have a history of doubling up on pickoff throws (two in a row) take a big lead, draw a throw and then steal on the next pitch.

Of course you can do that. But how many times can you get away with it? Once? If the pitcher or one of his teammates or coaches notices it, he’ll pick you off the next time by “doubling up.” Basically by exploiting the pitcher’s non-random and thus exploitable strategy, the runner becomes exploitable himself. A pitcher, of course, should be picking a certain percentage of the time each time he goes into the set position, based on the likelihood of the runner stealing and the value of the steal attempt. That “percentage” must be randomized by the pitcher and it “resets” each time he throws a pitch or attempts a pickoff.

By “randomize” I mean the prior action, pick or no pick, cannot affect the percentage chance of a pick. If a pitcher is supposed to pick 50% prior to the next pitch he must do so whether he’s just attempted a pickoff 0, 1, 2, or 10 times in a row. The runner can’t know that a pickoff is more or less likely based on how many picks were just attempted. In fact you can tell him, “Hey every time I come set, there’s a 50% (or 20%, or whatever) chance I will attempt to pick you off,” and there’s nothing he can do to exploit that information.

For example, if he decides that he must throw over 50% of the time he comes set (in reality the optimal percentage changes with the count), then he flips a mental coin (or uses something – unknown to the other team – to randomize his decision with a .5 mean). What will happen on average is that half the time he won’t pick at all, 25% of the time he’ll pick exactly once, 12.5% of the time he’ll pick exactly twice, 25% of the time he’ll pick at least twice, etc.
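If you want to see that arithmetic play out, here’s a minimal simulation of the memoryless rule (the 0.5 probability and the loop are purely illustrative, not anyone’s actual pickoff rate):

```python
import random
from collections import Counter

def picks_before_pitch(p_pick=0.5):
    """Each time the pitcher comes set, he throws over with probability p_pick,
    independent of what he just did; otherwise he delivers the pitch."""
    n = 0
    while random.random() < p_pick:
        n += 1
    return n

counts = Counter(picks_before_pitch() for _ in range(100_000))
total = sum(counts.values())
for k in range(3):
    print(f"exactly {k} pickoff(s) before the pitch: {counts[k] / total:.3f}")
print(f"two or more: {sum(v for k, v in counts.items() if k >= 2) / total:.3f}")
# Roughly 0.50, 0.25, 0.125, and 0.25 – the percentages described above.
```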

Now, the tidbit from the player or coach says, “does not have a history of doubling up.” I’m not sure what that means. Surely most pitchers when they do pick, will pick once sometimes and twice sometimes, etc. Do any pitchers really never pick more than once per pitch? If they do, I would guess that it’s because the runner is not really a threat and the one-time pick is really a pick with a low percentage. If a runner is not much of a threat to run, then maybe the correct pick percentage is 10%. If that’s the case, then they will not double-up 99% of the time and correctly so. That cannot be exploited, again, assuming that a 10% rate is optimal for that runner in that situation. So while it may look like they never double up, they do in fact double up 1% of the time, which is correct and cannot be exploited (assuming the 10% is correct for that runner and in that situation).

Basically what I’m saying is that this person’s comment is way too simple and doesn’t really mean anything without putting it into the context I explain above.

▪ Foul balls with two strikes can indicate a lack of swing-and-miss stuff; the pitcher can get the batters to two strikes, but then can’t finish them off.

Not much to say here. Some pitchers have swing-and-miss stuff and others don’t, and everything in-between. You can find that out by looking at…uh…their swing-and-miss percentages (presuming a large enough sample size to give you some minimum level of certainty). Foul balls with two strikes? That’s just silly. A pitcher without swing-and-miss stuff will get more foul balls and balls in play with two strikes. That’s a tautology. He’ll also get more foul balls and balls in play with no strikes, one strike, etc.

▪ Royals third-base coach Mike Jirschele will walk around the outfield every once in a while just to remind himself how far it is to home plate and what a great throw it takes to nail a runner trying to score.

If my coach has to do that, I’m not sure I want him coaching for me. That being said, whatever little quirks he has or needs in order to send or hold runners the correct percentage of the time are fine by me. I don’t know that I would be teaching or recommending that to my coaches – again, not that there’s anything necessarily wrong with it.

Bottom line is that he better know the minimum percentages that runners need to be safe in any given situation (mostly # of outs) – i.e. the break-even points – and apply them correctly to the situation (arm strength and accuracy etc.) in order to make optimal decisions. I would surely be going over those numbers with my coaches from time to time and then evaluating his sends and holds to make sure he’s not making systematic errors or too many errors in general.
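For what it’s worth, those break-even points fall straight out of run expectancy (or, late in a close game, win expectancy). A generic sketch of the send/hold decision, where RE is the run expectancy of the base/out state that results:

break-even p* = (RE_hold − RE_out) / ((1 + RE_safe) − RE_out)

Here “safe” means the run scores plus whatever state is left behind, “out” means the runner is erased at the plate with one more out, and “hold” means he stays at third. The coach should send the runner whenever his estimate of the chance of being safe – given the arm, the throw, and the runner – beats p*.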

▪ For the most part, the cutter is considered a weak contact pitch; the slider is considered a swing-and-miss pitch.

If that’s confirmed by pitch f/x, fine. If it’s not, then I guess it’s not true. Swing-and-miss is really just the extreme end of weak contact, weak contact is a subset of contact, and contact is a subset of a swing. The result of a swing depends on the naked quality of the pitch, where it is thrown, and the count. So while for the most part (however you want to define that – words are important!) it may be true, surely it depends on the quality of each of the pitches, the counts they tend to be thrown in, how often they are thrown in those counts, and the location they are thrown to. Pitches away from the heart of the plate tend to be balls and swing-and-miss pitches. Pitches nearer the heart tend to be contacted more often, everything else being equal.

▪ With the game on the line and behind in the count, walk the big-money guys; put your ego aside and make someone else beat you.

Stupid. Just. Plain. Stupid. Probably the dumbest thing a pitcher or manager can think/do in a game. I don’t even know what it means and neither do they. So tie game in the 9th, no one on base, 0 outs, count is 1-0. Walk the batter? That’s what he said! I can think of a hundred stupid examples like that. A pitcher’s approach changes with every batter and every score, inning, outs, runners, etc. A blanket statement like that, even as a rule of thumb, is Just. Plain. Dumb. Any interpretation of that by players and coaches can only lead to sub-optimal decisions – and does. All the time. Did I say that one is stupid?

▪ A pitcher should not let a hitter know what he’s thinking; if he hits a batter accidentally he shouldn’t pat his chest to say “my bad.” Make the hitter think you might have drilled him intentionally and that you just might do it again.

O.K. To each his own.

▪ Opposition teams are definitely trying to get into Yordano Ventura’s head by stepping out and jawing with him; anything to make him lose focus.

If he says so. I doubt much of that goes on in baseball. Not that kind of game. Some, but not much.

▪ In the big leagues, the runner decides when he’s going first-to-third; he might need a coach’s help on a ball to right field — it’s behind him — but if the play’s in front of him, the runner makes the decision.

Right – we teach that in Little League (a good manager does, anyway). You teach your players that they are responsible for all base-running decisions until they get to third. Then it’s up to the third-base coach. It’s true that the third-base coach can and should help the runner on a ball hit to RF, but ultimately the decision on whether to try to take third is the runner’s.

Speaking of taking third, while the old adage “don’t make the first or third out at third base” is a good rule of thumb, players should know that it doesn’t mean, “Never take a risk on trying to advance to third.” It means the risk has to be low (like 10-20%), but that the risk can be twice as high with 0 outs as with 2 outs. So really, the adage should be, “Never make the third out at third base, but you can sometimes make the first out at third base.”

You can also just forget about the first-out part of that adage. Really, the no-out break-even point sits almost exactly in between the one-out and two-out ones. In other words, with no outs you need to be safe at third around 80% of the time, with one out around 70%, and with two outs around 90%. Players should be taught that and not just the “rule of thumb.” They should also be taught that the numbers change with trailing runners, the pitcher, and who the next batter or batters are. For example, with a trailing runner, making the third out at third is really bad, but making the first out is partially offset when the trailing runner can move up behind you.
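As a sketch of where numbers like 70/80/90% come from, here is the standard run-expectancy arithmetic for a runner on second deciding whether to try for third. The RE values below are rough, era-dependent approximations I’m plugging in for illustration, not exact figures:

```python
# Approximate run expectancies (runs from this base/out state to the end of the inning).
# Ballpark historical values, for illustration only.
RE = {
    ("2nd", 0): 1.10, ("2nd", 1): 0.68, ("2nd", 2): 0.32,
    ("3rd", 0): 1.35, ("3rd", 1): 0.95, ("3rd", 2): 0.35,
    ("none", 1): 0.25, ("none", 2): 0.10, ("none", 3): 0.00,
}

def break_even(outs):
    """Minimum success rate that makes trying for third worthwhile."""
    hold = RE[("2nd", outs)]        # stay at second
    safe = RE[("3rd", outs)]        # make it to third, same number of outs
    out  = RE[("none", outs + 1)]   # thrown out: bases empty, one more out
    return (hold - out) / (safe - out)

for outs in (0, 1, 2):
    print(f"{outs} out(s): need to be safe ~{break_even(outs):.0%} of the time")
# Prints roughly 77%, 68%, and 91% with these inputs – the same neighborhood
# as the 80/70/90 figures above.
```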

▪ Even in a blowout there’s something to play for; if you come close enough to make the other team use their closer, maybe he won’t be available the next night.

I’m pretty sure the evidence suggests that players play at their best (more or less) regardless of the score. That makes sense under almost any economic or cognitive theory of behavior since players get paid big money to have big numbers. Maybe they do partially because managers and coaches encourage them to do so with tidbits like that. I don’t know.

Depending on what they mean by blowout, what they’re saying is that, say, you have a 5% chance of winning a game down six runs in the late innings. Now say you have a 20% chance of making it a 3-run or less game, and that means that the opponent’s closer comes into the game. And say that his coming into the game gives you another 2% chance of winning tomorrow because he might not be available, and an extra 1% the day after that (if it’s the first game in a series). So rather than a 5% win expectancy, you actually have 5% plus 20% * 3%, or 5.6%, WE. Is that worth extra effort? To be honest, managers and coaches are supposed to teach their players to play hard (within reason) regardless of the score, for two reasons: one, because it makes for better habits when the game is close, and two, at exactly what point is the game a blowout (Google the sorites paradox)?

▪ If it’s 0-2, 1-2 and 2-2, those are curveball counts and good counts to run on. That’s why pitchers often try pickoffs in those counts.

On the other hand, 0-2 is not a good count to run on because of the threat of the pitchout. As it turns out, the majority of SB attempts (around 68%) occur at neutral counts. Only around 16% of all steal attempts occur at those pitchers’ counts. So whoever said that is completely wrong.

Of course pitchers should (and do) attempt more pickoffs the greater the chance of a steal attempt. That also tends to make it harder to steal (hence the game theory aspect).

That being said, some smart people (e.g., Professor Ted Turocy of Chadwick Baseball Bureau) believe that there is a Nash equilibrium between the offense and defense with respect to base stealing (for most players – not at the extremes) such that neither side can exploit the other by changing their strategy. I don’t know if it’s true or not. I think Professor Turocy may have a paper on this. You can check it out on the web or contact him.

▪ Don’t worry about anyone’s batting average until they have 100 at-bats.

How about “Don’t worry about batting average…period.” In so many ways this is wrong. I would have to immediately fire whoever said that if it was a coach, manager or executive.

▪ It’s hard to beat a team three times in a row; teams change starting pitchers every night and catching three different pitchers having a down night is not the norm.

Whoever said this should be fired sooner than the one above. As in, before they even finished that colossally innumerate sentence.

▪ At this level, “see-it-and-hit” will only take you so far. The best pitchers are throwing so hard you have to study the scouting reports and have some idea of what’s coming next.

If that’s your approach at any level, you have a lot to learn. That goes for 20 or 50 years ago the same as it does today. If pitchers were throwing maybe 60 mph, not so much, I guess. But even at 85 you definitely need to know what you’re likely to get at any count and in any situation from that specific pitcher. Batters who tell you that they are “see-it-and-hit-it” batters are lying to you or to themselves. There is no such thing in professional baseball. Even the most unsophisticated batter in the world knows that at 3-0, no outs, no runners on, his team down 6 runs, he’s likely to be getting 100% fastballs.

▪ If a pitcher throws a fastball in a 1-1 count, nine out of 10 times, guess fastball. But if it’s that 10th time and he throws a slider instead, you’re going to look silly.

WTF? If you go home expecting your house to be empty but there are two giraffes and a midget, you’re going to be surprised.

▪ Good hitters lock in on a certain pitch, look for it and won’t come off it. You can make a guy look bad until he gets the pitch he was looking for and then he probably won’t miss it.

Probably have to fire this guy too. That’s complete bullshit. Makes no sense from a game-theory perspective or from any perspective for that matter. So just never throw him that pitch right? Then he can’t be a good hitter. But now if you never throw him the pitch he’s looking for, he’ll stop looking for it, and will instead look for the alternative pitch you are throwing him. So you’ll stop throwing him that pitch and then…. Managers and hitting coaches (and players) really (really) need a primer on game theory. I am available for the right price.

▪ According to hitting coach Dale Sveum, hitters should not give pitchers too much credit; wait for a mistake and if the pitcher makes a great pitch, take it. Don’t start chasing great pitches; stick to the plan and keep waiting for that mistake.

Now why didn’t I think of that!

▪ The Royals are not a great off-speed hitting club, so opposition pitchers want to spin it up there.

Same as above. Actually, remember this: You cannot tell how good or bad a player or team is at hitting any particular pitch by looking at the results. You can only tell by how often they get each type of pitch. Game theory tells us that the results of all the different pitches (type, location, etc.) will be about the same to any hitter. What changes depending on that hitter’s strengths and weaknesses are the frequencies. And this whole, “Team is good/bad at X” is silly. It’s about the individual players of course. I’m pretty sure there was at least one hitter on the team who is good at hitting off-speed.

Also, never evaluate or define “good hitting” based on batting average, which most coaches and managers still do even in 2016. I don’t have to tell you that, dear sophisticated reader. However, you should also not define good or bad hitting on a pitch level based on OPS or wOBA (presumably on contact) either. You need to include pitches not put into play and you need to incorporate the count. For example, at a 3-ball count there is a huge premium on not swinging at a ball; your result on contact is not so important. At 2-strike counts, not taking a strike is also especially important. Whenever you see pitch-level numbers that don’t include balls not swung at, or especially numbers based only on balls put into play (which is usually the case), be very wary of them. For example, a good off-speed-hitting player will tend to have good strike-zone-recognition skills (and not necessarily good results on contact) because many more off-speed pitches are thrown in pitchers’ counts and out of the strike zone.
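One way to do what that paragraph asks, as a rough sketch: value every pitch by the change in the run value of the count (or by the terminal event), so taken balls and strikes count too. The count and event values below are hypothetical placeholders I made up for illustration; real ones would be estimated from league data:

```python
# Hypothetical run values of each count relative to 0-0, from the batter's view.
# Placeholders only - not measured values.
COUNT_RV = {
    (0, 0): 0.000, (1, 0): 0.035, (0, 1): -0.040,
    (2, 0): 0.090, (1, 1): -0.010, (0, 2): -0.090,
    (3, 0): 0.180, (2, 1): 0.050, (1, 2): -0.060,
    (3, 1): 0.130, (2, 2): -0.020, (3, 2): 0.030,
}
EVENT_RV = {"BB": 0.30, "K": -0.28, "1B": 0.45, "2B": 0.75, "HR": 1.40, "out_in_play": -0.27}

def pitch_value(before, after=None, event=None):
    """Run value of a single pitch to the batter: either the count moves
    from `before` to `after`, or the plate appearance ends with `event`."""
    start = COUNT_RV[before]
    end = EVENT_RV[event] if event else COUNT_RV[after]
    return end - start

# A ball taken at 3-1 (walk) versus a changeup chased and missed at 3-1:
print(pitch_value((3, 1), event="BB"))    # big positive (about +0.17 here)
print(pitch_value((3, 1), after=(3, 2)))  # modest negative (about -0.10 here)
```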

▪ According to catcher Kurt Suzuki, opposition pitchers should not try to strike out the Royals. Kansas City hitters make contact and a pitcher that’s going for punchouts might throw 100 pitches in five innings.

Wait. If they are a good contact team, doesn’t that mean that you can try and strike them out without running up your pitch count? Another dumb statement. Someone should tell Mr. Suzuki that pitch framing is really important.

▪ If you pitch down in the zone you can use the whole plate; any pitch at the knees is a pretty good pitch (a possible exception is down-and-in to lefties). If you pitch up in the zone you have to hit corners.

To some extent that’s true though it’s (a lot) more complicated than that. What’s probably more important is that when pitching down in the zone you want to pitch more away and when pitching up in the zone more inside. By the way, is it true lefties like (hit better) the down-and-in pitch more than righties? No, it is not. Where does that pervasive myth come from? Where do all the hundreds of myths that players, fans, coaches, managers, and pundits think are true come from?

▪ If you pitch up, you have to be above the swing path.

Not really sure what that means. Above the swing “path?” Swing path tends to follow the pitch, so that doesn’t make too much sense. “Path” implies angle of attack, and to say “above” or “below” an angle of attack doesn’t really make sense. Maybe he means, “If you are going to pitch high, pitch really high?” Or, “If the batter tends to be a high-ball hitter, pitch really high?”

▪ Numbers without context might be meaningless; or worse — misleading

I don’t know what that means. Anything might be misleading or worthless without context. Words, numbers, apple pie, dogs, cats…

▪ All walks are not equal: a walk at the beginning of an inning is worth more than a walk with two outs, a walk to Jarrod Dyson is worth more than a walk to Billy Butler.

Correct. I might give this guy one of the other guys’ (that I fired) jobs. Players, especially pitchers (but batters and fielders too), should always know the relative value of the various offensive events depending on the batter, pitcher, score, inning, count, runners, etc., and then tailor their approach to those values. This is one of the most important things in baseball.

▪ So when you look at a pitcher’s walks, ask yourself who he walked and when he walked them.

True. Walks should be weighted towards bases open, 2 outs, sluggers, close games, etc. If not, and the sample is large, then the pitcher is likely either doing something wrong, or he has terrible command/control, or both. For example, Greg Maddux went something like 10 years before he walked his first pitcher.

▪ When a pitcher falls behind 2-0 or 3-1, what pitch does he throw to get back in the count? Can he throw a 2-0 cutter, sinker or slider, or does he have to throw a fastball down the middle and hope for the best?

All batters, especially in this era of big data, should be acutely aware of a pitcher’s tendencies against their type of batter in any given situation and count. One of the most important questions is, “Does he have enough command of his secondary pitches (and how good is his fastball even when the batter knows it’s coming) to throw them in hitters’ counts, especially the 3-2 count?”

▪ Hitters who waggle the bat head have inconsistent swing paths.

I never heard that before. Doubt it is anything useful.

▪ The more violent the swing, the worse the pitch recognition. So if a guy really cuts it loose when he swings and allows his head to move, throw breaking stuff and change-ups. If he keeps his head still, be careful.

Honestly, if that’s all you know about a batter, someone is not doing their homework. And again, there’s game theory that must be accounted for and appreciated. Players, coaches and managers are just terrible at understanding this very important part of baseball especially the batter/pitcher matchup. If you think you can tell a pitcher to throw a certain type of pitch in a certain situation (like if the batter swings violently throw him off-speed), then surely the batter can and will know that too. If he does, which he surely will – eventually – then he basically knows what’s coming and the pitcher will get creamed!


There’s been much research, and many articles written, over the years with respect to hitter (and other) aging curves. (In a Google search I even came across a fascinating component aging curve for PGA golfers!) I’ve publicly and privately been doing aging curves for 20 years. So has Tango Tiger. Jeff Zimmerman has also been prolific in this regard. Others have contributed as well. You can Google them if you want.

Most of the credible aging curves use some form of the delta method, which is described in this excellent series on aging by the crafty ne’er-do-well, MGL. If you’re too lazy to look it up, the delta method basically is this, from the article:

The “delta method” looks at all players who have played in back-to-back years. Many players have several back-to-back year “couplets,” obviously. For every player, it takes the difference between their rate of performance in Year I and Year II and puts that difference into a “bucket,” which is defined by the age of the player in those two years….

When we tally all the differences in each bucket and divide by the number of players, we get the average change from one age to the next for every player who ever played in at least one pair of back-to-back seasons. So, for example, for all players who played in their age 29 and 30 seasons, we get the simple average of the rate of change in offensive performance between 29 and 30.

That’s really the only way to do an aging curve, as far as I know, unless you want to use an opaque statistical method like J.C. Bradbury did back in 2009 (you can look that up too). One of the problems with aging curves, which I also discuss in the aforementioned article, and one that comes up a lot in baseball research, is survivorship bias. I’ll get to that in a paragraph or two.

Let’s say we want to use the delta method to compute the average change in wOBA performance from age 29 to 30. To do that, we look at all players who played in their age 29 and age 30 years, record each player’s difference, weight it by some number of PA (maybe the lesser of the two – either year 1 or year 2, maybe the harmonic mean of the two, or maybe weight them all equally – it’s hard to say), and then take the simple weighted average of all the differences. For example, say we have two players. Player A has a .300 wOBA in his age 29 season in 100 PA and a .290 wOBA in his age 30 season in 150 PA. Player B is .320 in year one in 200 PA and .300 in year two in 300 PA. Using the delta method we get a difference of -.010 (a decline) for player A weighted by, say, 100 PA (the lesser of 100 and 150), and a difference of -.020 for Player B in 200 PA (also the lesser of the two PA). So we have an average decline in our sample of (10 * 100 + 20 * 200) / (300), or 16.67 points of wOBA decline. We would do the same for all age intervals and all players and if we chain them together we get an aging curve for the average MLB player.
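Here is a minimal sketch of that calculation in code. The two rows are just the Player A and Player B example above, and the “lesser of the two PA” weighting is the one described in the text, not the only possible choice:

```python
# Delta-method aging estimate for one age interval (e.g., age 29 -> 30).
# Each tuple: (wOBA_year1, PA_year1, wOBA_year2, PA_year2) for one player.
pairs_29_to_30 = [
    (0.300, 100, 0.290, 150),  # Player A from the example
    (0.320, 200, 0.300, 300),  # Player B from the example
]

def delta(pairs, weight=lambda pa1, pa2: min(pa1, pa2)):
    """Weighted average year-over-year change in wOBA for one age bucket."""
    num = sum((w2 - w1) * weight(pa1, pa2) for w1, pa1, w2, pa2 in pairs)
    den = sum(weight(pa1, pa2) for w1, pa1, w2, pa2 in pairs)
    return num / den

print(f"{delta(pairs_29_to_30) * 1000:.2f} points of wOBA change")  # about -16.67
# Chaining these bucket averages across every age interval gives the curve.
```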

There are issues with that calculation, such as our choice to weight each player’s difference by the “lesser of the two PA,” what it means to compute “an average decline” for that age interval (since it includes all types of players, part-time, full-time, etc.) and especially what it means when we chain every age interval together to come up with an aging curve for the average major league player when it’s really a compendium of a whole bunch of players all with different career lengths at different age intervals.

Typically when we construct an aging curve, we’re not at all looking at the careers of any individual players. If we do that, we end up with severe selective sampling and survivorship problems. I’m going to ignore all of these issues and focus on survivorship bias only. It has the potential to be extremely problematic, even when using the delta method.

Let’s say that a player is becoming a marginal player for whatever reason, perhaps it is because he is at the end of his career. Let’s also say that we have a bunch of players like that and their true talent is a wOBA of .280. If we give them 300 PA, half will randomly perform better than that and half will randomly perform worse than that simply because 300 PA is just a random sample of their talent. In fact, we know that the random standard deviation of wOBA in 300 trials is around 25 points in wOBA, such that 5% of our players, whom we know have a true talent of .280, will actually hit .230 or less by chance alone. That’s a fact. There’s nothing they or anyone else can do about it. No player has an “ability” to fluctuate less than random variance tells us in any specific number of PA. There might be something about them that creates more variance on the average, but it is mathematically impossible to have less (actually the floor is a bit higher than that because of varying opponents and conditions).

Let’s assume that all players who hit less than .230 will retire or be cut – they’ll never play again, at least not in the following season. That is not unlike what happens in real life when a marginal player has a bad season. He almost always gets fewer PA the following season than he would have gotten had he not had an unlucky season. In fact, not playing at all is just a subset of playing less – both are examples of survivorship bias and create problems with aging curves. Let’s see what happens to our aging interval with these marginal players when 5% of them don’t play the next season.

We know that this entire group of players are .280 hitters because we said so. If 5% of them hit, on average, .210, then the other 95% must have hit .284 since the whole group must hit .280 – that’s their true talent. This is just a typical season for a bunch of .280 hitters. Nothing special going on here. We could have split them up any way we wanted, as long as in the aggregate they hit at their true talent level.

Now let’s say that these hitters are in their age 30 season and they are supposed to decline by 10 points in their age 31 season. If we do an aging calculation on these players in a typical pair of seasons we absolutely should see .280 in the first year and .270 in the second. In fact, if we let all our players play a random or a fixed number of PA in season two, that is exactly what we would see. It has to be. It is a mathematical certainty, given everything we stated. However survivorship bias screws up our numbers and results in an incorrect aging value from age 30 to age 31. Let’s try it.

Only 95% of our players play in season two, so 5% drop out of our sample, at least from age 30 to age 31. There’s nothing we can do about that. When we compute a traditional aging curve using the delta method, we only use numbers from pairs of years. We can never use the last year of a player’s career as the first year in a year pairing. We don’t have any information about that player’s next season. We can use a player’s last year, say, at age 30 in an age 29 to 30 pairing but not in a 30 to 31 pairing. Remember that the delta method always uses age pairings for each player in the sample.

What do those 95% hit in season one? Remember they are true .280 hitters. Well, they don’t hit .280. I already said that they hit .284. That is because they got a little lucky. The ones that got really unlucky to balance out the lucky ones, are not playing in season two, and thus dropped out of our aging curve sample. What do these true .280 players (who hit .284) hit in season two? Season two is an unbiased sample of their true talent. We know that their true talent was .280 in season one and we know that from age 30 to age 31 all players will lose 10 points in true talent because we said so. So they will naturally hit .270 in year two.

What does our delta method calculation tell us about how players age from age 30 to age 31? It tells us they lose 14 points in wOBA and not 10! It’s giving us a wrong answer because of survivorship bias. Had those other 5% of players played, they would have also hit .270 in year two and when we add everyone up, including the unlucky players, we would come up with the correct answer of a 10-point loss from age 30 to age 31 (the unlucky players would have improved in year two by 60 points).
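A quick simulation makes the mechanism concrete. The Bernoulli draw below is a crude stand-in for a wOBA season, and the cut line is applied mechanically, so the size of the bias won’t match the stylized 14-point example exactly, but the direction and the reason are the same:

```python
import random

random.seed(1)
TRUE_Y1, TRUE_Y2 = 0.280, 0.270   # true talent in year one and year two
PA, CUT, N_PLAYERS = 300, 0.230, 20_000

def observed(talent, pa):
    # Crude stand-in for a wOBA season: each PA is a Bernoulli trial with p = talent.
    return sum(random.random() < talent for _ in range(pa)) / pa

diffs = []
for _ in range(N_PLAYERS):
    y1 = observed(TRUE_Y1, PA)
    if y1 <= CUT:        # the unlucky player is cut and never forms a year pairing
        continue
    diffs.append(observed(TRUE_Y2, PA) - y1)

print(f"apparent change: {1000 * sum(diffs) / len(diffs):.1f} points (true change is -10)")
# Prints a decline noticeably steeper than 10 points, because the unluckiest
# year-one seasons drop out of the sample.
```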

One way to avoid this problem (survivorship bias will always make it look like players lose more or gain less as they age because the players that drop out from season to season always, on the average, got unlucky in season one) is to ignore the last season of a player’s career in our calculations. That’s fine and dandy, but survivorship bias exists in every year of a player’s career. As I wrote earlier, dropping out is just a small subset of this bias. Every player that gets unlucky in one season will see fewer PA in his next season, which creates the same kind of erroneous results. For example, if the 5% of unlucky players did play in season two, but only got 50 PA whereas the other 95% of slightly lucky players got 500 PA, we would still come up with a decline of more than 10 points of wOBA – again an incorrect answer.

To correct for this survivorship bias, which really wreaks havoc with aging curves, a number of years ago I decided to add a phantom year for players after their last season of action. For that year, I used a projection – our best estimate of what they would have done had they been allowed to play another year. That reduced the survivorship bias but didn’t come close to eliminating it because, as I said, every player suffers from it: unlucky players get fewer PA and lucky players get more PA in their subsequent seasons.

Not only that, but we get the same effect within years. If two players have .300 wOBA true talents, but player A hits worse than .250 by luck alone in his first month (which will happen more than 16% of the time) and player B hits .350 or more, who do you think will get more playing time for the remainder of the season even though we know that they have the same talent, and that both, on the average, will hit exactly .300 for the remainder of the season?

I finally came up with a comprehensive solution based on the following thought process: If we were conducting an experiment, how would we approach the question of computing aging intervals? We would record every player’s season one (which would be an unbiased sample of his talent, so no problem so far) and then we would guarantee that every player would get X number of PA the next season, preferably something like 500 or 600 to create large samples of seasonal data. We would also give everyone a large number of PA in all season ones too, but it’s not really necessary.

How do we do that? We merely extend season two data using projections, just as I did in adding phantom seasons after a player’s career was over (or he missed a season in the middle of his career). Basically I’m doing the same thing, whether I’m adding 600 PA to a player who didn’t play (the phantom season) or I’m adding 300 PA to a player who only had 300 PA in season two. By doing this I am completely eliminating survivorship bias. Of course this correction method lives or dies with how accurate the projections are but even a simple projection system like Marcel will suffice when dealing with a large number of players of different talent levels. Now let’s get to the results.

I looked at all players from 1977 to 2016 and I park and league adjusted their wOBA for each season. Essentially I am using wOBA+. I also only looked at seasonal pairs (with a minimum of 10 PA in each season) where the player played on the same team. I didn’t have to do that, but my sample was large enough that I felt that the reduction in sample size was worth getting rid of any park biases, even though I was dealing with park-adjusted numbers.

Using the delta method with no survivorship bias other than ignoring the last year of every player’s career, this is the aging curve I arrived at after chaining all of the deltas. This is the typical curve you will see in most of the prior research.

1977-2016 Aging Curve using Delta Method Without Correcting for Survivorship Bias

[Figure: curve1]

Here is the same curve after completing all season two’s with projections. For example, let’s say that a player is projected to hit .300 in his age 30 season and he hits .250 in only 150 PA (his manager benches him because he’s hit so poorly). His in-season projection would change because of the .250. It might now be .290. So I complete a 600 PA season by adding 450 PA of .290 hitting to the 150 PA of .250 hitting for a complete season of .280 in 600 PA.

If that same player hits .320 in season two in 620 PA then I add nothing to his season two data. Only players with less than 600 PA have their seasons completed with projections. How do I weight the season pairs? Without any completion correction, as in the first curve above, I weighted each season pair by the harmonic mean of the two PA. With correction, as in the second curve above, I weighted each pair by the number of PA in season one. This corrects for intra-season survivorship bias in season one as well.
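Here’s a small sketch of that completion step, using the numbers from the example above. The in-season projection itself is glossed over; `updated_projection` just stands in for whatever projection you would actually compute at that point:

```python
TARGET_PA = 600

def complete_season(actual_woba, actual_pa, updated_projection, target_pa=TARGET_PA):
    """Pad a short season two out to target_pa using the player's current projection,
    so lucky and unlucky players contribute equally sized, unbiased samples."""
    if actual_pa >= target_pa:
        return actual_woba, actual_pa   # e.g., a 620 PA season is left alone
    pad_pa = target_pa - actual_pa
    blended = (actual_woba * actual_pa + updated_projection * pad_pa) / target_pa
    return blended, target_pa

# The example from the text: hits .250 in 150 PA, in-season projection slips to .290.
woba, pa = complete_season(0.250, 150, 0.290)
print(round(woba, 3), pa)   # 0.28 600
```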

1977-2016 Aging Curve using Delta Method and Correcting for Survivorship Bias

[Figure: curve2]

You can see that in the first curve, uncorrected for survivorship bias, players gain around 40 points in wOBA from age 21 to age 27, seven points per year, plateau from age 27 to 28, then decline by also around seven points a year after that. In the second curve, after we correct for survivorship bias, we have a slightly quicker ascension from age 21 to 26, more than eight points per year, a plateau from age 26 to age 27, then a much slower decline at around 3 points per year.

Keep in mind that these curves represent all players from 1977 to 2016. It is likely that aging has changed significantly from era to era due to medical advances, PED use and the like. In fact, if we limit our data to 2003 and later, after the so-called steroid era, we get an uncorrected curve that plateaus between ages 24-28 and then declines by an average of 9 points a year from age 28 to 41.

In my next installment I’ll do some survivorship corrections for components like strikeout and walk percentage.

Now that Adam Eaton has been traded from the White Sox to the Nationals much has been written about his somewhat unusual “splits” in his outfield defense as measured by UZR and DRS, two of the more popular batted-ball defensive metrics. In RF, his career UZR per 150 games is around +20 runs and in CF, -8 runs. He has around 100 career games in RF and 300 in CF. These numbers do not include “arm runs” as I’m going to focus only on range and errors in this essay. If you are not familiar with UZR or DRS you can do some research on the net or just assume that they are useful metrics for quantifying defensive performance and for projecting defense.

In 2016 Eaton was around -13 in CF and +20 in RF. DRS was similar but with a narrower (but still unusual) spread. We expect that a player who plays at both CF and the corners in a season or within a career will have a spread of around 5 or 6 runs between CF and the corners (more between CF and RF than between CF and LF). For example, a CF’er who has a UZR of zero and thus is exactly average among all CF’ers, will have a UZR at around +5.5 at the corners, again a bit more in RF than LF (LF’ers are better fielders than RF’ers).

This has nothing to do with how “difficult” each position is (that is hard to define anyway – you could even make the argument that the corner positions are “harder” than CF), as UZR and DRS are calculated as runs above or below the average fielder at that position. It merely means that the average CF’er is a better fielder than the average corner OF’er by around 5 or 6 runs. Mostly they are faster. The reason teams put their better fielder in CF is not because it is an inherently more “difficult” position but because it gets around twice the number of opportunities per game than the corner positions such that you can leverage talent in the OF.

Back to Eaton. He appears to have performed much better in RF than we would expect given his performance in CF (or vice versa) or even overall. Does this mean that he is better suited to RF (and perhaps LF, where he hasn’t played much in his career) or that the big, unusual gap we see is just a random fluctuation, or somewhere in the middle as is often (usually) the case? Should the Nationals make every effort to play him in RF and not CF? After all, their current RF’er, Harper, has unusual splits too, but in the opposite direction – his career CF UZR is better than his career RF UZR! Or perhaps the value they’re getting from Eaton is diminished if they’re going to play him in CF rather than RF.

How could it be that a fielder could have such unusual defensive splits and it be solely or mostly due to chance only? The same reason a hitter can have unusual but random platoon splits or a pitcher can have unusual but random home/road or day/night splits. A metric like UZR or DRS, like almost all metrics, contains a large element of chance, or noise if you will. That noise comes from two sources – one is because the data and methodology are far from perfect and two is that actual defensive performance can fluctuate randomly (or for reasons we are just not aware of) from one time period to another – from play to play, game to game, or position to position, for various reasons or for no reason at all.

To the first point, just because our metric “says” that a player was +10 in UZR, that does not necessarily mean that he performed exactly that well. In reality, he might have performed at a +15 level, or he might have performed at a 0 or even a -10 level. It’s more likely of course that he performed at +5 than at +20 or 0, but because of the limits of our data and methodology, the +10 is only an estimate of his performance. To the second point, actual fielding performance, even if we could measure it precisely, is, like hitting and pitching, subject to random fluctuations for reasons known (or at least speculated) and unknown to us. On one play a player can get a great jump and make a spectacular play, and on another that same player can take a bad route, get a bad jump, the ball can pop out of his glove, etc. Some days fielders probably feel better than others. Etc.

So whenever we compare one time period to another or one position to another, even ones which require similar, perhaps even identical, skills, like in the OF, it is possible, even likely, that we are going to get different results by chance alone, or at least because of the two dynamics I explained above (don’t get hung up on the words “luck”, “chance” or “random”). Statistics tell us that those random differences will be more and more unlikely the further away we get from what is expected (e.g., we expect that play in CF will be 5 or 6 runs “worse” than play in RF or LF), however, statistics also tells us that any difference, even large ones like we see with Eaton (or more), can and do occur by chance alone.

At the same time, it is possible, maybe even likely, that a player could somehow be more suited to RF (or LF) than CF, or vice versa. So how do we determine how much of an unusual “split” in OF defense, for example, is likely chance and how much is likely “skill?” In other words, what would we expect future defense to be in RF and in CF for a player with unusual RF/CF splits? Remember that future performance always equates to an estimate of talent, more or less. For example, if we find strong evidence that almost all of these unusual splits are due to chance alone (virtually no skill), then we must assume that in the future the player with the unusual splits will revert to normal splits in any future time frame. In the case of Eaton that would mean that we would construct an OF projection based on all of his OF play, adjusted for position, and then do the normal adjustment for our CF or RF projection, such that his RF projection will be around 7 runs greater than his CF projection rather than the 20 run or more gap that we see in his past performance.

To examine this question, I looked at all players who played at least 20 games in CF and RF or LF from 2003 through 2015. I isolated those with various unusual splits. I also looked at all players to establish a baseline. At the same time, I crafted a basic one-season Marcel-like projection from that CF and corner performance combined. The way I did that was to adjust the corners to represent CF by subtracting 4 runs from LF UZR and 7 runs from RF UZR. Then I regressed that number based on the number of total games in that one season, added in an aging factor (-.5 runs for players under 27 and -1.5 runs for players 27 and older), and the resulting number was a projection for CF.

We can then take that number and add 4 runs for a LF projection and 7 runs for a RF projection. Remember these are range and errors only (no arm). So, for example, if a player was -10 in CF per 150 in 50 games and +3 in RF in 50 games, his projection would be:

Subtract 7 runs from his RF UZR to convert it into “CF UZR,” so it’s now -4. Average that with his -10 UZR in CF, which gives him a total of -7 runs in 100 games. I am using 150 games as the 50% regression point, so we regress this player 150/(150+100), or 60%, toward a mean of -3 (because these are players who play both CF and corner, they are below-average CF’ers). That comes out to -4.6. Add in an aging factor, say -.5 for a 25-year-old, and we get a projection of -5.1 for CF. That would mean a projection of -1.1 in LF, a +4 run adjustment, and +1.9 in RF, a +7 run adjustment, assuming normal “splits.”
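The same steps in code, as a sketch. The constants are the ones stated above (the -4/-7 position adjustments, the 150-game regression point, the -3 mean for dual players, and the simple aging bumps); everything else is just the worked example:

```python
CORNER_TO_CF = {"LF": -4.0, "RF": -7.0}   # convert corner UZR/150 to CF-equivalent
REGRESSION_GAMES = 150                     # games at which we regress 50%
DUAL_PLAYER_CF_MEAN = -3.0                 # mean CF UZR/150 of players who play both

def cf_projection(stints, age):
    """stints: list of (position, uzr_per_150, games). Returns a CF UZR/150 projection."""
    games = sum(g for _, _, g in stints)
    cf_equiv = sum((u + CORNER_TO_CF.get(pos, 0.0)) * g for pos, u, g in stints) / games
    shrink = REGRESSION_GAMES / (REGRESSION_GAMES + games)  # fraction regressed to the mean
    regressed = (1 - shrink) * cf_equiv + shrink * DUAL_PLAYER_CF_MEAN
    aging = -0.5 if age < 27 else -1.5
    return regressed + aging

cf = cf_projection([("CF", -10, 50), ("RF", +3, 50)], age=25)
print(round(cf, 1), round(cf + 4, 1), round(cf + 7, 1))
# CF, LF, RF projections: about -5.1, -1.1, +1.9
```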

So let’s look at some numbers. To establish a baseline and test (and calibrate) our projections, let’s look at all players who played CF and LF or RF in season one (min 20 games in each) and then their next season in either CF or the corners:

Position    UZR season one            UZR season two           Projected UZR
LF or RF    +6.0 (N games = 11629)    +2.1 (N = 42866)         +2.1
CF          -3.0 (N games = 9955)     -0.8 (N = 23083)         -0.9

The spread we see in column 2, “UZR season one” is based on the “delta method”. It is expected to be a little wider than the normal talent spread we expect between CF and LF/RF which is around 6 runs. That is because of selective sampling. Players who do well at the corners will tend to also play CF and players who play poorly in CF will tend to get some play at the corners. The spread we see in column 3, “UZR season two” does not mean anything per se. In season two these are not necessarily players who played both positions again (they played either one or the other or both). All it means is that of players who played both positions in season one, they are 2.1 runs above average at the corners and .8 runs below average in CF, in season two.

Now let’s look at the same table for players like Eaton, who had larger than normal splits between a corner position and CF. I used a threshold of at least a 10-run difference (5.5 is typical). There were 254 players who played at least 20 games in CF and in RF or LF in one season and then played in LF in the next season, and 138 players who played in CF and LF or RF in one season and in RF in the next.

Position    UZR season one            UZR season two           Projected UZR
LF or RF    +12.7 (N games = 4924)                             +1.4
CF          -12.3 (N games = 4626)                             +0.3

For now, I’m leaving the third column, their UZR in season two, empty. These are players who appeared to be better suited at a corner position than in CF. If we assume that these unusual splits are merely noise, a random fluctuation, and that we expect them to have a normal split in season two, we can use the method I describe above to craft a projection for them. Notice the small split in the projections. The projection model I am using creates a CF projection and then it merely adds +4 runs for LF and +7 for RF. Given a 25-run split in season one rather than a normal 6-run split, we might assume that these players will play better, maybe much better, in RF or LF than in CF, in season two. In other words, there is a significant “true talent defensive split” in the OF. So rather than 1.4 in LF or RF (our projection assumes a normal split), we might see a performance of +5, and instead of .3 in CF, we might see -5, or something like that.

Remember that our projection doesn’t care how the CF and corner OF UZR’s are distributed in season one. It assumes static talent and just converts corner UZR to CF UZR by subtracting 4 or 7 runs. Then when it finalizes the CF projection, it assumes we can just add 4 runs for a LF projection and 7 runs for a RF one. It treats all OF positions the same, with a static conversion, regardless of the actual splits. The projection assumes that there is no such thing as “true talent OF splits.”

Now let’s see how well the projection does with that assumption (no such thing as “true talent OF defensive splits”). Remember that if we assume that there is “something” to those unusual splits, we expect our CF projection to be too high and our LF/RF projection to be too low.

Position    UZR season one            UZR season two           Projected UZR
LF or RF    +12.7 (N games = 4924)    +0.9 (N = 16857)         +1.4
CF          -12.3 (N games = 4626)    +0.8 (N = 10250)         +0.3

We don’t see any evidence of a “true talent OF split” when we compare projected to actual. In fact, we see the opposite effect, which is likely just noise (our projection model is pretty basic and not very precise). Instead of seeing better than expected defense at the corners as we might expect from players like Eaton who had unusually good defense at the corners compared to CF in season one, we see slightly worse than projected defense. And in CF, we see slightly better defense than projected even though we might have expected these players to be especially unsuited to CF.

Let’s look at players, unlike Eaton, who have “reverse” splits. These are players who in at least 20 games in both CF and LF or RF, had a better UZR in CF than at the corners.

Position    UZR season one            UZR season two           Projected UZR
LF or RF    -4.8 (N games = 3299)     +1.4 (N = 15007)         +2.4
CF          +7.8 (N games = 3178)     -4.4 (N = 6832)          -2.6

Remember, the numbers in column two, season one UZR “splits” are based on the delta method. Therefore, every player in our sample had a better UZR in CF than in LF or RF and the average difference was 12.6 runs (in favor of CF) whereas we expected an average difference of minus 6 runs or so (in favor of LF/RF). The “delta method” just means that I averaged all of the players’ individual differences weighted by the lesser of their games, either in CF or LF/RF.

Again, according to the “these unusual splits must mean something” (in terms of talent and what we expect in the next season) theory, we expect these players to significantly exceed their projection in CF and undershoot it at the corners. Again, we don’t see that. We see that our projections are high for both positions; in fact we overshoot more in CF than in RF/LF, exactly the opposite of what we would expect if there were any significance to these unusual splits. Again we see no evidence of a “true talent split in OF defense.”

For players with unusual splits in OF defense, we see that a normal projection at CF or at the corners suffices. We treat LF/RF/CF UZR exactly the same making static adjustments regardless of the direction and magnitude of the empirical splits. What about the idea that, “We don’t know what to expect with a player like Eaton?” I don’t really know what that means, but we hear it all the time when we see numbers that look unusual or “trendy” or appear to follow a “pattern.” Does that mean we expect there to be more fluctuation in season two UZR? Perhaps even though on the average they revert to normal spreads, we see a wider spread of results in these players who exhibit unusual splits in season one. Let’s look at that in our final analysis.

When we look at all players who played CF and LF/RF in season one, remember the average spread was 9 runs, +6 at the corners and -3 in CF. In season two, 28% of the players who played RF or LF had a UZR greater than +10 and 26% in CF had a UZR of -10 or worse. The standard deviation of the distribution in season two UZR was 13.9 runs for LF/RF and 15.9 runs in CF.

What about our players like Eaton? Can we expect more players to have a poor UZR in CF and a great one at a corner? No. 26% of these players had a UZR greater than +10 at a corner and 25% had a UZR less than -10 in CF, around the same as for all “dual” players from season one. In fact we get a smaller spread with these players with unusual splits, as we would expect given that their means in CF and at the corners are actually closer together (look at the tables above). The standard deviation of the distribution in season two UZR for these players was 13.2 runs for LF/RF and 15.3 in CF, slightly smaller than for all “dual” players combined.

In conclusion, there is simply nothing to write about when it comes to Eaton’s or anyone else’s unusual outfield UZR or DRS splits. If you want to estimate their UZR going forward simply adjust and combine all of their OF numbers and do a normal projection. It doesn’t matter if they have -16 in LF and +20 in CF, 0 runs in CF only, or +4 runs in LF only. It’s all the same thing with exactly the same projection and exactly the same distribution of results the next season.

As far as we can tell there is simply no such thing (to any significant or identifiable degree) as an outfielder who is more suited to one OF position than another. There is outfield defense – period. It doesn’t matter where you are standing in the OF. The ability to catch line drives and fly balls in the OF is more or less the same whether you are standing in the middle or on the sides of the OF (yes it could take some time to get used to a position if you are unfamiliar with it). If you are good in one location you will be good at another, and if you are bad at one location you will be bad at another. Your UZR or DRS might change in a somewhat predictable fashion depending upon what position, CF, LF, or RF is being measured, but that’s only because the players you are measured against (those metrics are relative) differ in their average ability to catch fly balls and line drives. More importantly, when you see a player who has an unusual “split” in their outfield numbers, like Eaton, you will be tempted to think that they are intrinsically better at one position than another and that the unusual split will tend to continue in the future. When you see really large splits you will be tempted even more. Remember the words in this paragraph and remember this analysis to avoid being fooled by randomness into drawing faulty conclusions, as all human beings, even smart ones, are wont to do.

In Game 7 of the World Series, anyone who was watching the top of the 9th inning probably remembers Javier Baez attempting a bunt (presumably a safety squeeze) on a 3-2 count with 1 out and Jason Heyward on 3rd base. You also remember that Baez struck out on a foul ball, much to the consternation of Cubs fans.

There was plenty of noise on social media criticizing Maddon (or Baez, if he did that on his own) for such an unusual play (you rarely see position players bunt on 2-strike counts, let alone with a 3-2 count and let alone with a runner on 3rd) and of course because it failed and eventually led to a scoreless inning. I was among those screaming bloody murder on Twitter and continuing my long-running criticism of Maddon’s dubious (in my opinion) post-season in-game tactics dating back to his Tampa days. I did, however, point out that I didn’t know off the top of my head (and it was anything but obvious or trivial to figure out) what the “numbers” were but that I was pretty sure it was a bad strategy.

Some “prima facie” evidence that it might be a bad play, as I also tweeted, was, “When have you ever seen a play like that in a baseball game?” That doesn’t automatically mean that it’s a bad play, but it is evidence nonetheless. And the fact that it was a critical post-season game meant nothing. If it was correct to do it in that game, it would be correct to do it in any game – at least in the late innings of a tie or 1-run game.

Anyway, I decided to look at some numbers although it’s not an easy task to ascertain whether in fact this was a good, bad, or roughly neutral (or we just don’t know) play. I turned to Retrosheet as I often do, and looked at what happens when a generic batter (who isn’t walked, which probably eliminates lots of good batters) does not bunt (which is almost all of the time of course) on a 3-2 count with 1 out, runner on third base and no runner on first, in a tie game or one in which the batting team was ahead, in the late innings, when the infield would likely be playing in to prevent a run from scoring on a ground ball. This is what I found:

The runner scores around 28% of the time overall. There were 33% walks (pitcher should be pitching a bit around the batter in this situation), 25% strikeouts and 25% BIP outs. When the ball is put in play, which occurs 42% of the time, the runner scores 63% of the time.

Now let’s look at what happens when a pitcher simply bunts the ball on a 3-2 count in a sacrifice situation. We’ll use that as a proxy for what Baez might do when trying to bunt in this situation. Pitchers are decent bunters overall (although they don’t run well on a bunt) and Baez is probably an average bunter at best for a position player. In fact, Baez has a grand total of one sacrifice hit in his entire minor and major league career, so he may be a poor bunter – but to give him and Maddon the benefit of the doubt we’ll assume that he is as good at bunting as your typical NL pitcher.

On a 3-2 count in a sac situation when the pitcher is still bunting, he strikes out 40% of the time and walks 22% of the time. Compare that to the hitter who swings away at 3-2 with a runner on 3rd and 1 out, where he K’s 25% of the time and walks 33% of the time. Of those 40% strikeouts, lots are bunt fouls. In fact, pitchers strike out on a foul bunt with a 3-2 count 25% of the time. The rest, 15%, are called strikes and missed bunt attempts. It’s very easy to strike out on a foul bunt when you have two strikes, even when there are 3 balls (and you can take some close pitches).

How often does the run score on a 3-2 bunt attempt with a runner on 3rd, such as in the Baez situation? From that data we can’t tell, because we’re only looking at 3-2 bunts from pitchers with no runner on 3rd, so we have to make some inferences.

The pitcher puts the ball in play 36% of the time when bunting on a 3-2 count. How often would a runner score if there were a runner on 3rd? We’ll have to make some more inferences. In situations where a batter attempts a squeeze (either a suicide or a safety – for the most part, we can’t tell from the Retrosheet data), the runner scores 80% of the time when the ball is bunted in play. So let’s assume the same with our pitchers/Baez. So 36% of the time the ball is put in play on a 3-2 bunt, and 80% of the time the run scores. That’s a score rate of 29% – around the same as when swinging away.

So swinging away, the run scores 28% of the time. With a bunt attempt the run scores 29% of the time, so it would appear to be a tie with no particular strategy a clear winner. But wait….

When the run doesn’t score, the batter who is swinging away at 3-2 walks 33% of the time while the pitcher who is attempting a bunt on a 3-2 pitch walks only 25% of the time. But, we won’t count that as an advantage for the batter swinging away. The BB difference is likely due to the fact that pitchers are pitching around batters in that situation and they are going right after pitchers on 3-2 counts in sacrifice situations. In a situation like Baez’ the pitcher is going to issue more than 25% walks since he doesn’t mind the free pass and he is not going to groove one. So we’ll ignore the difference in walks. But wait again….

When a run scores on a squeeze play the batter is out 72% of the time and ends up mostly on first 28% of the time (a single, error, or fielder’s choice). When a run scores with a batter swinging away on a 3-2 count, the batter is out only 36% of the time. 21% of those are singles and errors and 15% are extra base hits including 10% triples and 5% HR.

So even though the run scores with both bunting and hitting away on a 3-2 count around the same percentage of the time, the batter is safe, including walks, hits, errors and fielder’s choices, only 26% of the time when bunting and 50% when swinging away. Additionally, when the batter swinging away gets a hit, 20% are triples and 6% are HR. So even though the runner on third scores around the same percentage of the time whether swinging away or bunting on that 3-2 count, when the run does score, the batter who is swinging away reaches base safely (with some extra-base hits, including HR) more than twice as often as the batter who is bunting.
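To keep the comparison straight, here are the key rates from above side by side. Nothing new here, just the quoted Retrosheet-derived numbers:

```python
# Rates quoted above for a 3-2 count, runner on 3rd, 1 out, infield in.
SWING = {"run_scores": 0.28, "batter_out_when_run_scores": 0.36}
BUNT  = {"run_scores": 0.36 * 0.80,   # in play 36% of the time x scores 80% = ~29%
         "batter_out_when_run_scores": 0.72}

for name, s in (("swing away", SWING), ("bunt attempt", BUNT)):
    print(f"{name}: run scores {s['run_scores']:.0%}; "
          f"when it does, the batter is out {s['batter_out_when_run_scores']:.0%} of the time")
```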

I’m going to say that the conclusion is that while the bunt attempt was probably not a terrible play, it was still the wrong strategy given that it was the top of the inning. The runner from third will probably score around the same percentage of the time whether Baez is bunting or swinging away, but when the run does score, Baez is going to be safe a much higher percentage of the time, including via the double, triple or HR, leading to an additional run scoring significantly more often than with the squeeze attempt.

I’m not giving a pass to Maddon on this one. That would be true regardless of whether the bunt worked or not – of course.

Addendum: A quick estimate is that an additional run (or more) will score around 12% more often when swinging away. An extra run in the top of the 9th, going from a 1-run lead to a 2-run lead, increases a team’s chances of winning by 10% (after that, every additional run is worth half the value of the preceding run). So we get an extra 1.2% (10% times 12%) in win expectancy from swinging away rather than bunting, via the extra hits that occur when the ball is put into play.
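The same arithmetic, spelled out (again, just the numbers above):

```python
# Back-of-the-envelope version of the addendum.
extra_run_freq = 0.12   # an additional run scores ~12% more often swinging away
run_win_value = 0.10    # value of going from a 1-run to a 2-run lead, top of the 9th

print(f"WE gained by swinging away: {extra_run_freq * run_win_value:.1%}")  # 1.2%
```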



Let me explain game theory wrt sac bunting using tonight’s CLE game as an example. Bottom of the 10th, leadoff batter on first, Gimenez is up. He is a very weak batter with little power or on-base skills, and the announcers say, “You would expect him to be bunting.” He clearly is.

Now, in general, to determine whether to bunt or not, you estimate the win expectancies (WE) based on the frequencies of the various outcomes of the bunt, versus the frequencies of the various outcomes of swinging away. Since, for a position player, those two final numbers are usually close, even in late tied-game situations, the correct decision usually hinges on: on the swing side, whether the batter is a good hitter or not, and his expected GDP rate; on the bunt side, how good a sac bunter he is and how fast he is (which affect the single and ROE frequencies, an important part of the bunt WE).

Gimenez is a terrible hitter which favors the bunt attempt but he is also not a good bunter and slow which favors hitting away. So the WE’s are probably somewhat close.

One thing that affects the WE for both bunting and swinging, of course, is where the third baseman plays before the pitch is thrown. Now, in this game, it was obvious that Gimenez was bunting all the way and everyone seemed fine with that. I think the announcers and probably everyone would have been shocked if he didn’t (we’ll ignore the count completely for this discussion – the decision to bunt or not clearly can change with it).

The announcers also said, “Sano is playing pretty far back for a bunt.” He was playing just on the dirt I think, which is pretty much “in between when expecting a bunt.” So it did seem like he was not playing up enough.

So what happens if he moves up a little? Maybe now it is correct to NOT bunt because the more he plays in, the lower the WE for a bunt and the higher the WE for hitting away! So maybe he shouldn’t play up more (the assumption is that if he is bunting, then the closer he plays, the better). Maybe then the batter will hit away and correctly so, which is now better for the offense than bunting with the third baseman playing only half way. Or maybe if he plays up more, the bunt is still correct but less so than with him playing back, in which case he SHOULD play up more.

So what is supposed to happen? Where is the third baseman supposed to play and what does the batter do? There is one answer and one answer only. How many managers and coaches do you think know the answer (they should)?

The third baseman is supposed to play all the way back “for starters” in his own mind, such that it is clearly correct for the batter to bunt. Now he knows he should play in a little more. So in his mind again, he plays up just a tad bit.

Now is it still correct for the batter to bunt? IOW, is the bunt WE higher than the swing WE given where the third baseman is playing? If it is, of course he is supposed to move up just a little more (in his head).

When does he stop? He stops of course when the WE from bunting is exactly the same as the WE from swinging. Where that is completely depends on those things I talked about before, like the hitting and bunting prowess of the batter, his speed, and even the pitcher himself.

What if he keeps moving up in his mind and the WE from bunting is always higher than hitting, like with most pitchers at the plate with no outs? Then the 3B simply plays in as far as he can, assuming that the batter is bunting 100%.

So in our example, if Sano is indeed playing at the correct depth which maybe he was and maybe he wasn’t, then the WE from bunting and hitting must be exactly the same, in which case, what does the batter do? It doesn’t matter, obviously! He can do whatever he wants, as long as the 3B is playing correctly.

So in a bunt situation like this, assuming that the 3B (and other fielders if applicable) is playing reasonably correctly, it NEVER matters what the batter does. That should be the case in every single potential sac bunt situation you see in a baseball game. It NEVER matters what the batter does. Either bunting or not are equally “correct.” They result in exactly the same WE.

The only exceptions (which do occur) are when the WE from bunting is always higher than swinging when the 3B is playing all the way up (a poor hitter and/or exceptional bunter) OR the WE from swinging is always higher even when the 3B is playing completely back (a good or great hitter and/or poor bunter).

So unless you see the 3B playing all the way in or all the way back, and the defense is playing reasonably optimally, it NEVER matters what the batter does. Bunt or not bunt and the win expectancy is exactly the same! And if the 3rd baseman plays all the way in or all the way back and is playing optimally, then it is always correct for the batter to do one thing 100% of the time – bunt (3B all the way in) or swing away (3B all the way back).

I won’t go into this too much because the post assumed that the defense was playing optimally, i.e. it was in a “Nash equilibrium” (as I explained, playing at a depth such that the WE for bunting and swinging are exactly equal), or it was correctly playing all the way in (the WE for bunting is still equal to or greater than for swinging) or all the way back (the WE for swinging is >= that of bunting). If the defense is NOT playing optimally, then the batter MUST bunt or swing away 100% of the time, whichever has the higher WE.
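Here is a sketch of that positioning logic. The win-expectancy functions are hypothetical placeholders (a real model would supply them); the point is just the stopping rule described above.

```python
# Hypothetical sketch: the 3B's depth runs from 0.0 (all the way in) to 1.0
# (all the way back). Playing deeper raises the bunt WE and lowers the swing WE.
def optimal_depth(we_bunt, we_swing, steps=50):
    if we_bunt(0.0) >= we_swing(0.0):
        return 0.0            # bunting is better even with the 3B all the way in
    if we_swing(1.0) >= we_bunt(1.0):
        return 1.0            # swinging is better even with the 3B all the way back
    lo, hi = 0.0, 1.0         # otherwise the two WEs cross somewhere in between
    for _ in range(steps):
        mid = (lo + hi) / 2
        if we_bunt(mid) > we_swing(mid):
            hi = mid          # bunt still better -> move in a little more
        else:
            lo = mid
    return (lo + hi) / 2      # depth at which WE(bunt) == WE(swing)

# Toy example with made-up linear WE curves:
depth = optimal_depth(lambda d: 0.48 + 0.04 * d, lambda d: 0.53 - 0.03 * d)
print(round(depth, 2))        # ~0.71; at this depth the offense is indifferent
```

Short of that equilibrium depth the batter should be swinging away 100% of the time, beyond it he should be bunting 100% of the time, and at the equilibrium itself it doesn’t matter – which is the point.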

This is critical, and amazingly there is not ONE manager or coach in MLB who understands it and thus uses a correct bunt strategy or bunt defense.

Note: There is the beginning of a very good discussion about this topic on The Book blog. If this topic interests you, feel free to check it out and participate if you want to.

I’ve been thinking about this for many years and in fact I have been threatening to redo my UZR methodology, in order to try and reduce one of the biggest weaknesses inherent in most if not all of the batted ball advanced defensive metrics.

Here is how most of these metrics work: Let’s say a hard hit ball was hit down the third base line and the third baseman made the play and threw the runner out. He would be credited with an out minus the percentage of time that an average fielder would make the same or similar play, perhaps 40% of the time. So the third baseman would get credit for 60% of a “play” on that ball, which is roughly .9 runs (the difference between the average value of a hit down the 3rd base line and an out) times .6 or .54 runs. Similarly, if he does not make the play, he gets debited with .4 plays or minus .36 runs.
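In code form, the crediting scheme looks something like this – a minimal sketch of the bookkeeping, not the full UZR engine; the 0.9 run value and 40% catch rate are the illustrative numbers above:

```python
# Credit/debit for one batted ball, as described above.
def play_runs(made_play: bool, avg_catch_rate: float, run_value: float = 0.9) -> float:
    """Runs credited (+) or debited (-) to the fielder relative to average."""
    if made_play:
        return (1 - avg_catch_rate) * run_value   # 0.6 "plays" * 0.9 = +0.54 runs
    return -avg_catch_rate * run_value            # -0.4 "plays" * 0.9 = -0.36 runs

print(round(play_runs(True, 0.40), 2))    # 0.54
print(round(play_runs(False, 0.40), 2))   # -0.36
```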

There are all kind of adjustments which can be made, such as park effects, handedness of the batter, speed of the runner, outs and base runners (these affect the positioning of the fielders and therefore the average catch rate), and even the G/F ratio of the pitcher (e.g., a ground ball pitcher’s “hard” hit balls will be a little softer than a fly ball pitcher’s “hard” hit ball).

Anyway here is the problem with this methodology which, as I said, is basic to most if not all of these defensive metrics, and it has to do with our old friend Bayes. As is usually the case, this problem is greater in smaller sample sizes. We don’t really know the probability of an average fielder making any given play; we can only roughly infer it from the characteristics of the batted ball that we have access to and perhaps from the context that I described above (like the outs, runners, batter hand, park, etc.).

In the above example, a hard hit ground ball down the third base line, I said that the league average catch rate was 40%. Where did I get that number from? (Actually, I made it up, but let’s assume that it is a correct number in MLB over the last few years, given the batted ball location database that we are working with.) We looked at all hard hit balls hit to that approximate location (right down the third base line), according to the people who provide us with the database, and found that of those 600 some odd balls over the last 4 years, 40% of them were turned into outs by the third baseman on the field.

So what is wrong with giving a third baseman .6 credit when he makes the play and .4 debit when he doesn’t? Well, surely not every single play, if you were to “observe” and “crunch” the play like, say, Statcast would do, is caught exactly 40% of the time. For any given play in that bucket, whether the fielder caught the ball or not, we know that he didn’t really have exactly a 40% chance of catching it if he were an average fielder. You knew that already. That 40% is the aggregate for all of the balls that fit into that “bucket” (“hard hit ground ball right down the third base line”).

Sometimes it’s 30%. Other times it’s 50%. Still other times it is near 0 (like if the 3rd baseman happens to be playing way off the line, and correctly so) or near 100% (like when he is guarding the line and he gets a nice big hop right in front of him), and everything in between.

On the average it is 40%, so you say, well, what are we to do? We can’t possibly tell from the data how much it really varies from that 40% on any particular play, which is true. So the best we can do is assume 40%, which is also true. That’s just part of the uncertainty of the metric. On the average, it’s right, but with error bars. Right? Wrong!

We do have information which helps us to nail down the true catch percentage of the average fielder given that exact same batted ball, at least how it is recorded by the people who provide us with the data. I’m not talking about the above-mentioned adjustments like the speed of the batter, his handedness, or that kind of thing. Sure, that helps us and we can use it or not. Let’s assume that we are using all of these “contextual adjustments” to the best of our ability. There is still something else that can help us to tweak those “league average caught” percentages such that we don’t have to use 40% on every hard hit ground ball down the line. Unfortunately, most metrics, including my own UZR, don’t take advantage of this valuable information even though it is staring us right in the face. Can you guess what it is?

The information that is so valuable is whether the player caught the ball or not! You may be thinking that that is circular logic or perhaps illogical. We are using that information to credit or debit the fielder. How and why would we also use it to change the base line catch percentage – in our example, 40%? In comes Bayes.

Basically what is happening is this: Hard ground ball is hit down the third base line. Overall 40% of those plays are made, but we know that not every play has a 40% chance of being caught because we don’t know where the fielder was positioned and we don’t really know the exact characteristics of the ball which greatly affect its chances of being caught: it was hit hard, but how hard? What kind of a bounce did it take? Did it have spin? Was it exactly down the line or 2 feet from the line (they were all classified as being in the same “location”)? We know the runner is fast (let’s say we created a separate bucket for those batted balls with a fast runner at the plate), but exactly how fast was he? Maybe he was a blazer and he beat it out by an eyelash.

So what does that have to do with whether the fielder caught the ball or not? That should be obvious by now. If the third baseman did not catch the ball, on the average, it should be clear that the ball tended to be one of those balls that were harder to catch than the average ball in that bucket. In other words, the chances that any ball that was not caught would have been caught by an average fielder are clearly less than 40%. Similarly, if a ball was caught, by any fielder, it was more likely to be an easier play than the average ball in that bucket. What we want are conditional probabilities, based on whether the ball was caught or not.

How much easier are the caught balls than the not-caught ones in any given bucket? That’s hard to say. Really hard to say. One would have to have lots of information in order to apply Bayes theorem to better estimate the “catch rate” of a ball in a particular bucket based on whether it is caught or not caught. I can tell you that I think the differences are pretty significant. It mostly depends on the spread (and what the actual distribution looks like) of actual catch rates in any given bucket. That depends on a lot of things. For one thing, the “size” and accuracy of the locations and other characteristics which make up the buckets. For example, if the unique locations were pretty large, say, one “location bucket” is anywhere from down the third base line to 20 feet off the bag (about 1/7 of the total distance from line to line), then the spread of actual catch rates versus the average catch rate in that bucket is going to be huge. Therefore the difference between the true catch rates for caught ball and non-caught ball is going to be large as well.

Speed of the batted ball is important as well. On very hard hit balls, the distribution of actual catch rates within a certain location will tend to be polarized or “bi-modal.” Either the ball will tend to be hit near the fielder and he makes the play or a little bit away from the fielder and he doesn’t. In other words, a catch might have a 75% true catch rate and non-catch, 15%, on the average, even if the overall rate is 40%.
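A toy simulation makes the Bayes point concrete. The 15%/75% mix of hard and easy plays below is invented purely for illustration, but the effect it demonstrates is the one described above: conditioning on the outcome moves the true catch rate a long way from the bucket average.

```python
# Invented bucket: half the balls are hard plays (15% true catch rate) and half
# are easy plays (75%). Conditioning on whether the ball was caught separates them.
import random

random.seed(1)
caught, missed = [], []
for _ in range(100_000):
    p = random.choice([0.15, 0.75])
    (caught if random.random() < p else missed).append(p)

overall = (sum(caught) + sum(missed)) / (len(caught) + len(missed))
print(f"bucket average true catch rate: {overall:.2f}")                   # ~0.45
print(f"avg true rate given a catch:    {sum(caught)/len(caught):.2f}")   # ~0.65
print(f"avg true rate given no catch:   {sum(missed)/len(missed):.2f}")   # ~0.29
```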

Again, most metrics use the same base line catch rate for catches and non-catches because that seems like the correct and intuitive thing to do. It is incorrect! The problem, of course, is what number to assign to a catch and to a non-catch in any given bucket. How do we figure that out? Well, I haven’t gotten to that point yet, and I don’t think anyone else has either (I could be wrong). I do know, however, that it is guaranteed that if I use 39% for a non-catch and 41% for a catch, in that 40% bucket, I am going to be more accurate in my results, so why not do that? Probably 42/38 is better still. I just don’t know when to stop. I don’t want to go too far so that I end up cutting my own throat.

This is similar to the problem with park factors and MLE’s (among other “adjustments”). We don’t know that using 1.30 for Coors Field is correct but we surely know that using 1.05 is better than 1.00. We don’t know that taking 85% of player’s AAA stats to convert them to a major league equivalency is correct, but we definitely know that 95% is better than nothing.

Anyway, here is what I did today (other than torture myself by watching the Ringling Brothers and…I mean the Republican debates). I took a look at all ground balls that were hit in vector “C” according to BIS and were either caught or went through the infield in less than 1.5 seconds – basically hard hit balls down the third base line. If you watch these plays, even though I would put them in the same bucket in the UZR engine, it is clear that some are easy to field and others are nearly impossible. You would be surprised at how much variability there is. On paper they “look” almost exactly the same. In reality they can vary from day to night and everything in between. Again, we don’t really care about the variance per se, but we definitely care about the mean catch rates when they are caught and when they are not.

Keep in mind that we can never empirically figure out those mean catch rates like we do when we aggregate all of the plays in the bucket (and then simply use the average catch rate of all of those balls). You can’t figure out the “catch rate” of a group of balls that were caught. It would be 100% right? We are interested in the catch rate of an average fielder when these balls were caught by these particular fielders, for whatever reasons they caught them. Likewise we want to know the league average catch rates of a group of balls that were not caught by these particular fielders for whatever reasons.

We can make these estimates (the catch rates of caught balls and non-caught balls in this bucket) in one of two ways: the first way is probably better and much less prone to human bias. It is also way more difficult to do in practice. We can try and observe all of the balls in this bucket and then try and re-classify them into many buckets according to the exact batted ball characteristics and fielder positioning. In other words, one bucket might be hard hit ground huggers right down the line with the third baseman playing roughly 8 feet off the line. Another might be, well, you get the point. Then we can actually use the catch rates in those sub-buckets.

When we are done, we can figure out the average catch rate on balls that were caught and those that were not, in the entire bucket. If that is hard to conceptualize, try constructing an example yourself and you will see how it works.

As I said, that is a lot of work. You have to watch a lot of plays and try and create lots and lots of sub-buckets. And then, even in the sub-buckets you will have the same situation, although much less problematic. For example, in one of those sub-buckets, a caught ball might be catchable 20% of the time in reality and a non-caught one only 15% – not much to worry about. In the large, original bucket, it might be 25% and 60%, as I said before. And that is a problem, especially for small samples.

Keep in mind that this problem will be mitigated in large samples but it will never go away. It will always overrate a good performance and underrate a bad one. But, in small samples, like even in one season, it will overrate so-called good fielding performance and underrate bad ones. The better the numbers the more they overstate the actual performance. The same is true for bad numbers. This is why I have been saying for years to regress what you see from UZR or DRS, even if you want to estimate “what happened.” (You would have to regress even more if you want to estimate true fielding talent.)

This is one of the problems with simply combining offense and defense to generate WAR. The defensive component needs to be regressed while the offensive one does not (base running needs to be regressed too. It suffers from the same malady as the defensive metrics).

Anyway, I looked at 20 or so plays in one particular bucket and tried to use the second method of estimating true catch rates for catches and non-catches. I simply observed the play and tried to estimate how often an average fielder would have made the play whether it was caught or not.

This is not nearly as easy as you might think. For one thing, guessing an average “catch rate” number like 60% or 70%, even if you’ve watched thousands of games in your life like I have, is incredibly difficult. The 0-10% and 90-100% ones are not that hard. Everything else is. I would guess that my uncertainty is something like 25% on a lot of plays, and my uncertainty on that estimate of uncertainty is also high!

The other problem is bias. When a play is made, you will overrate the true average catch rate (how often an average fielder would have made the play) and vice versa for plays that are not made. Or maybe you will underrate them because you are trying to compensate for the tendency to overrate them. Either way, you will be biased by whether the play was made or not, and remember you are trying to figure out the true catch rate on every play you observe with no regard to whether the play was made or not. (In actuality maybe whether it was made or not can help you with that assessment).

Here is a condensed version of the numbers I got. In that one location, presumably from the third base line to around 6 feet off the line, for ground balls that arrive in less than 1.5 seconds (I have 4 such categories of speed/time for GB), the average catch rate overall was 36%. However, for balls that were not caught (and I only looked at 6 random ones), I estimated the average catch rate to be 11% (that varied from 0 to 35%). For balls that were caught (also 6 of them), it was 53% (from 10% to 95%). That is a ridiculously large difference, and look at the variation even within those two groups (caught and not-caught). Even though using 11% for non-catches and 53% for catches is better than using 36% for everything, we are still making lots of mistakes within the new caught and not-caught buckets!

How does that affect a defensive metric? Let’s look at a hypothetical example: Third baseman A makes 10 plays in that bucket and misses 20. Third baseman B makes 15 and misses 15. B clearly had a better performance, but how much better? Let’s assume that the average fielder makes 26% of the plays in the bucket and the misses are 15% and the catches are 56% (actually a smaller spread than I estimated). Using 15% and 56% yields an overall catch rate of around 26%.

UZR and most of the other metrics will do the calculations this way: Player A’s UZR is 10 * .74 – 20 * .26, or plus 2.2 plays which is around plus 2 runs. Player B is 15 * .74 – 15 * .26, or plus 7.2 plays, which equals plus 6.5 runs.

What if we use the better numbers, 15% for missed plays and 56% for made ones? Now for Player A we have: 10 * .44 – 20 * .15, or 1.4 plays, which is 1.3 runs. Player B is 3.9 runs. So Player A’s UZR for those 30 plays went from +2 to +1.3 and Player B went from +6.5 to +3.9. Each player regressed around 35-40% toward zero. That’s a lot!
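The hypothetical Player A/Player B comparison above, worked through both ways (these are the made-up numbers from the example, not real UZR output):

```python
# Standard method: one 26% catch rate for every ball in the bucket.
# Conditional method: 56% for balls that were caught, 15% for balls that were not.
RUN_VALUE = 0.9

def uzr_runs(made, missed, p_made, p_missed):
    plays = made * (1 - p_made) - missed * p_missed
    return plays * RUN_VALUE

for name, made, missed in [("A", 10, 20), ("B", 15, 15)]:
    flat = uzr_runs(made, missed, 0.26, 0.26)
    cond = uzr_runs(made, missed, 0.56, 0.15)
    print(f"Player {name}: standard {flat:+.1f} runs, conditional {cond:+.1f} runs")
# Player A: standard +2.0 runs, conditional +1.3 runs
# Player B: standard +6.5 runs, conditional +3.9 runs
```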

Now I have to figure out how to incorporate this “solution” to all of the UZR buckets in some kind of fairly elegant way, short of spending hundreds of hours observing plays. Any suggestions would be appreciated.


Note: I updated the pinch hitting data to include a larger sample (previously I went back to 2008. Now, 2000).

Note: It was pointed out by a commenter below and another one on Twitter that you can’t look only at innings where the #9 and #1 batters batted (eliminating innings where the #1 hitter led off), as Russell did in his study, and which he uses to support his theory (he says that it is the best evidence). That creates a huge bias, of course. It eliminates all PA in which the #9 hitter made the last out of an inning or at least an out was made while he was at the plate. In fact, the wOBA for a #9 hitter, who usually bats around .300, is .432 in innings where he and the #1 hitter bat (after eliminating so many PA in which an out was made). How that got past Russell, I have no idea.  Perhaps he can explain.

Recently, Baseball Prospectus published an article by one of their regular writers, Russell Carleton (aka Pizza Cutter), in which he examined whether the so-called “times through the order” penalty (TTOP) was in fact a function of how many times a pitcher has turned over the lineup in a game or whether it was merely an artifact of a pitcher’s pitch count. In other words, is it pitcher fatigue or batter familiarity (the more the batter sees the pitcher during the game, the better he performs) which causes this effect?

It is certainly possible that most or all of the TTOP is really due to fatigue, as “times through the order” is clearly a proxy for pitch count. In any case, after some mathematical gyrations that Mr. Carleton is wont to do (he is the “Warning: Gory Mathematical Details Ahead” guy) in his articles, he concludes unequivocally that there is no such thing as a TTOP – that it is really a PCP, or Pitch Count Penalty, effect that makes a pitcher less and less effective as he goes through the order, and that it has little or nothing to do with batter/pitcher familiarity. In fact, in the first line of his article, he declares, “There is no such thing as the ‘times through the order’ penalty!”

If that is true, this is a major revelation which has slipped through the cracks in the sabermetric community and its readership. I don’t believe it is, however.

As one of the primary researchers (along with Tom Tango) of the TTOP, I was taken quite aback by Russell’s conclusion, not because I was personally affronted (the “truth” is not a matter of opinion), but because my research suggested that pitch count or fatigue was likely not a significant part of the penalty. In my BP article on the TTOP a little over 2 years ago, I wrote this: “…the TTOP is not about fatigue. It is about familiarity. The more a batter sees a pitcher’s delivery and repertoire, the more likely he is to be successful against him.” What was my evidence?

First, I looked at the number of pitches thrown going into the second, third, and fourth times through the order. I split that up into two groups—a low pitch count and a high pitch count. Here are those results. The numbers in parentheses are the average number of pitches thrown going into that “time through the order.”

Times Through the Order Low Pitch Count High Pitch Count
1 .341 .340
2 .351 (28) .349 (37)
3 .359 (59) .359 (72)
4 .361 (78) .360 (97)


If Russell’s thesis were true, you should see a little more of a penalty in the “high pitch count” column on the right, which you don’t. The penalty appears to be the same regardless of whether the pitcher has thrown few or many pitches. To be fair, the difference in pitch count between the two groups is not large and there is obviously sample error in the numbers.

The second way I examined the question was this: I looked only at individual batters in each group who had seen few or many pitches in their prior PA. For example, I looked at batters in their second time through the order who had seen fewer than three pitches in their first PA, and also batters who saw more than four pitches in their first PA. Those were my two groups. I did the same thing for each time through the order. Here are those results. The numbers in parentheses are the average number of pitches seen in the prior PA, for every batter in the group combined.


Times Through the Order Low Pitch Count each Batter High Pitch Count each Batter
1 .340 .340
2 .350 (1.9) .365 (4.3)
3 .359 (2.2) .361 (4.3)


As you can see, if a batter sees more pitches in his first or second PA, he performs better in his next PA than if he sees fewer pitches. The effect appears to be much greater from the first to the second PA. This lends credence to the theory of “familiarity” and not pitcher fatigue. It is unlikely that 2 or 3 extra pitches would cause enough fatigue to elevate a batter’s wOBA by 8.5 points per PA (the average of 15 and 2, the “bonuses” for seeing more pitches during the first and second PA, respectively).

So how did Russell come to his conclusion and is it right or wrong? I believe he made a fatal flaw in his methodology which led him to a faulty conclusion (that the TTOP does not exist).

Among other statistical tests, here is the primary one which led Russell to conclude that the TTOP is a mirage and merely a product of pitcher fatigue due to an ever-increasing pitch count:

This time, I tried something a little different. If we’re going to see a TTOP that is drastic, the place to look for it is as the lineup turns over. I isolated all cases in which a pitcher was facing the ninth batter in the lineup for the second time and then the first batter in the lineup for the third time. To make things fair, neither hitter was allowed to be the pitcher (this essentially limited the sample to games in AL parks), and the hitters needed to be faced in the same inning. Now, because the leadoff hitter is usually a better hitter, we need to control for that. I created a control variable for all outcomes using the log odds ratio method, which controls for the skills of the batter, as well as that of the pitcher. I also controlled for whether or not the pitcher had the platoon advantage in either case.

First of all, there was no reason to limit the data to “the same inning”. Regardless of whether the pitcher faces the 9th and 1st batters in the same inning or they are split up (the 9 hitter makes the last out), since one naturally follows the other, they will always have around the same pitch count, and the leadoff hitter will always be one time through the order ahead of the number nine hitter.

Anyway, what did Russell find? He found that TTOP was not a predictor of outcome. In other words, that the effect on the #9 hitter was the same as the #1 hitter, even though the #1 hitter had faced the pitcher one more time than the #9 hitter.

I thought about this for a long time and I finally realized why that would be the case even if there was a “times order” penalty (mostly) independent of pitch count. Remember that in order to compare the effect of TTO on that #9 and #1 hitter, he had to control for the overall quality of the hitter. The last hitter in the lineup is going to be a much worse hitter overall than the leadoff hitter, on the average, in his sample.

So the results should look something like this if there were a true TTOP: Say the #9 batters are normally .300 wOBA batters, and the leadoff guys are .330. In this situation, the #9 batters should bat around .300 (during the second time through the order we see around a normal wOBA) but the leadoff guys should bat around .340 – they should have a 10 point wOBA bonus for facing the pitcher for the third time.

Russell, without showing us the data (he should!), presumably gets something like .305 for the #9 batters (since the pitcher has gone essentially 2 ½ times through the lineup, pitch count-wise) and the leadoff hitters should hit .335, or 5 points above their norm as well (maybe .336 since they are facing a pitcher with a few more pitches under his belt than the #9 hitter).

So if he gets those numbers, .335 and .305, is that evidence that there is no TTOP? Do we need to see numbers like .340 and .300 to support the TTOP theory rather than the PCP theory? I submit that even if Russell sees numbers like the former ones, that is not evidence that there is no TTOP and it’s all about the pitch count. I believe that Russell made a fatal error.

Here is where he went wrong:

Remember that he uses the log-odds method to compute the baseline numbers, or what he would expect from a given batter-pitcher matchup, based on their overall season numbers. In this experiment, there is no need to do that, since both batters, #1 and #9, are facing the same pitcher the same number of times. All he has to do is use each batter’s seasonal numbers to establish the baseline.
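For reference, here is a sketch of the standard odds-ratio matchup baseline, which is what I understand the “log-odds method” to be (Russell’s exact implementation may differ):

```python
# Combine the batter's rate, the pitcher's rate allowed, and the league rate
# in odds space to get an expected rate for the matchup.
def matchup_rate(batter: float, pitcher: float, league: float) -> float:
    odds = (batter / (1 - batter)) * (pitcher / (1 - pitcher)) / (league / (1 - league))
    return odds / (1 + odds)

# e.g. a .330 batter vs a pitcher who allows .310, in a .320 league:
print(f"{matchup_rate(0.330, 0.310, 0.320):.3f}")   # ~0.320
```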

But where do those base lines come from? Well, it is likely that the #1 hitters are mostly #1 hitters throughout the season and that #9 hitters usually hit at the bottom of the order. #1 hitters get around 150 more PA than #9 hitters over a full season. Where do those extra PA come from? Some of them come from relievers of course. But many of them come from facing the starting pitcher more often per game than those bottom-of-the-order guys. In addition, #9 hitters sometimes are removed for pinch hitters late in a game against a starter such that they lose even more of those 3rd and 4th time through the order PA’s. Here is a chart of the mean TTO per game versus the starting pitcher for each batting slot:


Batting Slot Mean TTO/game
1 2.15
2 2.08
3 2.02
4 1.98
5 1.95
6 1.91
7 1.86
8 1.80
9 1.77

(By the way, if Russell’s thesis is true, bottom of the order guys have it even easier, since they are always batting when the pitcher has a higher pitch count, per time through the order. Also, this is the first time you have been introduced to the concept that the top of the order batters have it a little easier than the bottom of the order guys, and that switching spots in the order can affect overall performance because of the TTOP or PCP.)

What that does is result in the baseline for the #1 hitter being higher than for the #9 hitter, because the baseline includes more pitcher TTOP (more times facing the starter for the 3rd and 4th times). That makes it look like the #1 hitter is not getting his advantage as compared to the #9 hitter, or at least he is only getting a partial advantage in Russell’s experiment.

In other words, the #9 hitter is really a true .305 hitter and the #1 hitter is really a true .325 hitter, even though their seasonal stats suggest .300 and .330. The #9 hitters are being hurt by not facing starters late in the game compared to the average hitter and the #1 hitters are being helped by facing starters for the 3rd and 4th times more often than the average hitter.

So if #9 hitters are really .305 hitters, then the second time through the order, we expect them to hit .305, if the TTOP is true. If the #1 hitters are really .325 hitters, despite hitting .330 for the whole season, we expect them to hit .335 the third time through the order, if the TTOP is true. And that is exactly what we see (presumably).

But when Russell sees .305 and .335 he concludes, “no TTOP!” He sees what he thinks is a true .300 hitter hitting .305 after the pitcher has thrown around 65 pitches and what he thinks is a true .330 hitter hitting .335 after 68 or 69 pitches. He therefore concludes that both hitters are being affected equally even though one is batting for the second time and the other for the third time – thus, there is no TTOP!

As I have shown, those numbers are perfectly consistent with a TTOP of around 8-10 points per times through the order, which is exactly what we see.

Finally, I ran one other test which I think can give us more evidence one way or another. I looked at pinch hitting appearances against starting pitchers. If the TTOP is real and pitch count is not a significant factor in the penalty, we should see around the same performance for pinch hitters regardless of the pitcher’s pitch count, since the pinch hitter always faces the pitcher for the first time and the first time only. In fact, this is a test that Russell probably should have run. The only problem is sample size. Because there are relatively few pinch hitting PA versus starting pitchers, we have quite a bit of sample error in the numbers. I split the sample of pinch hitting appearances up into 2 groups: Low pitch count and high pitch count.


Here is what I got:

PH TTO Overall Low Pitch Count High Pitch Count
2 .295 (PA=4901) .295 (PA=2494) .293 (PA=2318)
3 .289 (PA=10774) .290 (PA=5370) .287 (PA=5404)


I won’t comment on the fact that the pinch hitters performed a little better against pitchers with a low pitch count (the differences are not nearly statistically significant) other than to say that there is no evidence that pitch count has any influence on the performance of pinch hitters who are naturally facing pitchers for the first and only time. Keep in mind that the times through the order (the left column) is a good proxy for pitch count in and of itself and we also see no evidence that that makes a difference in terms of pinch hitting performance. In other words, if pitch count significantly influenced pitching effectiveness, we should see pinch hitters overall performing better when the pitcher is in the midst of his 3rd time through the order as opposed to the 2nd time (his pitch count would be around 30-35 pitches higher). We don’t. In fact, we see a worse performance (the difference is not statistically significant – one SD is 8 points of wOBA).


I have to say that it is difficult to follow Russell’s chain of logic and his methodology in many of his articles because he often fails to “show his work” and he relies on somewhat esoteric and opaque statistical techniques. In this case, I believe that he made a fatal mistake in his methodology, as I have described above, which led him to the erroneous conclusion that “the TTOP does not exist.” I believe that I have shown fairly strong evidence that the penalty we see pitchers incur as the game wears on is mostly or wholly a result of the TTO and not due to fatigue caused by an increasing pitch count.

I look forward to someone doing additional research to support one theory or the other.

There seems to be an unwritten rule in baseball – not on the field, but in the stands, at home, in the press box, etc.

“You can’t criticize a manager’s decision if it doesn’t directly affect the outcome of the game, if it appears to ‘work’, or if the team goes on to win the game despite the decision.”

That’s ridiculous of course. The outcome of a decision or the game has nothing to do with whether the decision was correct or not. Some decisions may raise or lower a team’s chances of winning from a baseline of 90%, and other decisions may operate on a baseline of 10 or 15%.

If decision A results in a team’s theoretical chances of winning of 95% and decision B, 90%, obviously A is the correct move. Choosing B would be malpractice. Equally obvious: if the manager chooses B, an awful decision, he is still going to win the game 90% of the time, and based on the “unwritten rule” we rarely get to criticize him. Similarly, if decision A results in a 15% win expectancy (WE) and B results in 10%, A is the clear choice, yet the team still loses most of the time and we get to second-guess the manager whether he chooses A or B. All of that is silly and counter-productive.

If your teenager drives home drunk yet manages to not kill himself or anyone else, do you say nothing because “it turned out OK?” I hope not. In sports, most people understand the concept of “results versus process” if they are cornered into thinking about it, but in practice, they just can’t bring themselves to accept it in real time. No one is going to ask Terry Collins in the post-game presser why he didn’t pinch hit for DeGrom in the 6th inning – no one. The analyst – a competent one at least – doesn’t give a hoot what happened after that. None whatsoever. He looks at a decision and if it appears questionable at the time, he tries to determine what the average consequences are – with all known data at the time the decision is made – with the decision or with one or more alternatives. That’s it. What happens after that is irrelevant to the analyst. For some reason this is a hard concept for the average fan – the average person – to apply. As I said, I truly think they understand it, especially if you give obvious examples, like the drunk driving one. They just don’t seem to be able to break the “unwritten rule” in practice. It goes against their grain.

Well, I’m an analyst and I don’t give a flying ***k whether the Mets won, lost, tied, or Wrigley Field collapsed in the 8th inning. The “correctness” of the decision to allow DeGrom to hit or not in the top of the 6th, with runners on second and third, boiled down to this question and this question only:

“What is the average win expectancy (WE) of the Mets with DeGrom hitting and then pitching some number of innings and what is the average WE with a pinch hitter and someone else pitching in place of DeGrom?”

Admittedly the gain, if there is any, from making the decision to bring in a PH and reliever or relievers must be balanced against any known or potential negative consequences for the Mets not related to the game at hand. Examples of these might be: 1) limiting your relief possibilities in the rest of the series or the World Series. 2) Pissing off DeGrom or his teammates for taking him out and thus affecting the morale of the team.

I’m fine with the fans or the manager and coaches including these and other considerations in their decision. I am not fine with them making their decision not knowing how it affects the win expectancy of the game at hand, since that is clearly the most important of the considerations.

My guess is that if we asked Collins about his decision-making process, and he was honest with us, he would not say, “Yeah, I knew that letting him hit would substantially lower our chances of winning the game, but I also wanted to save the pen a little and give DeGrom a chance to….” I’m pretty sure he thought that with DeGrom pitching well (which he usually does, by the way – it’s not like he was pitching well-above his norm), his chances of winning were better with him hitting and then pitching another inning or two.

At this point, and before I get into estimating the WE of the two alternatives facing Collins, letting DeGrom hit and pitch or pinch hitting and bringing in a reliever, I want to discuss an important concept in decision analysis in sports. In American civil law, there is a thing called a summary judgment. When a party in a civil action moves for one, the judge makes his decision based on the known facts and assuming controversial facts and legal theories in a light most favorable to the non-moving party. In other words, if everything that the other party says is true is true (and is not already known to be false) and the moving party would still win the case according to the law, then the judge must accept the motion and the moving party wins the case without a trial.

When deciding whether a particular decision was “correct” or not in a baseball game or other contest, we can often do the same thing in order to make up for an imperfect model (which all models are by the way). You know the old saw in science – all models are wrong, but some are useful. In this particular instance, we don’t know for sure how DeGrom will pitch in the 6th and 7th innings to the Cubs order for the 3rd time, we don’t know for how much longer he will pitch, we don’t know how well DeGrom will bat, and we don’t know who Collins can and will bring in.

I’m not talking about the fact that we don’t know whether DeGrom or a reliever is going to give up a run or two, or whether he or they are going to shut the Cubs down. That is in the realm of “results-based analysis” and I’ve already explained how and why that is irrelevant. I’m talking about what is DeGrom’s true talent, say in runs allowed per 9 facing the Cubs for the third time, what is a reliever’s or relievers’ true talent in the 6th and 7th, how many innings do we estimate DeGrom will pitch on the average if he stays in the game, and what is his true batting talent.

Our estimates of all of those things will affect our model’s results – our estimate of the Mets’ WE with and without DeGrom hitting. But what if we assumed everything in favor of keeping DeGrom in the game – we looked at all controversial items in a light most favorable to the non-moving party – and it was still a clear decision to pinch hit for him? Well, we get a summary judgment! Pinch hitting for him would clearly be the correct move.

There is one more caveat. If it is true that there are indirect negative consequences to taking him out – and I’m not sure that there are – then we also have to look at the magnitude of the gain from taking him out and then decide whether it is worth it. In order to do that, we have to have some idea as to what is a small and what is a large advantage. That is actually not that hard to do. Managers routinely bring in closers in the 9th inning with a 2-run lead, right? No one questions that. In fact, if they didn’t – if they regularly brought in their second or third best reliever instead, they would be crucified by the media and fans. How much does bringing in a closer with a 2-run lead typically add to a team’s WE, compared to a lesser reliever? According to The Book, an elite reliever compared to an average reliever in the 9th inning with a 2-run lead adds around 4% to the team’s WE. So we know that 4% is a big advantage, which it is.

That brings up another way to account for the imperfection of our models. The first way was to use the “summary judgment” method, or assume things most favorable to making the decision that we are questioning. The second way is to simply estimate everything to the best of our ability and then look at the magnitude of the results. If the difference between decision A and B is 4%, it is extremely unlikely that any reasonable tweak to the model will change that 4% to 0% or -1%.

In this situation, whether we assume DeGrom is going to pitch 1.5 more innings or 1.6 or 1.4, it won’t change the results much. If we assume that DeGrom is an average hitting pitcher or a poor one, it won’t change the result all that much. If we assume that the “times through the order penalty” is .25 runs or .3 runs per 9 innings, it won’t change the results much. If we assume that the relievers used in place of DeGrom have a true talent of 3.5, 3.3, 3.7, or even 3.9, it won’t change the results all that much. Nothing can change the results from 4% in favor of decision A to something in favor of decision B. 4% is just too much to overcome even if our model is not completely accurate. Now, if our results assuming “best of our ability estimates” for all of these things yield a 1% advantage for choosing A, then it is entirely possible that B is the real correct choice and we might defer to the manager in case he knows some things that we don’t or we simply are mistaken in our estimates or we failed to account for some important variable.

Let’s see what the numbers say, assuming “average” values for all of these relevant variables and then again making reasonable assumptions in favor of allowing DeGrom to hit (assuming that pinch hitting for him appears to be correct).

What is the win expectancy with DeGrom batting? We’ll assume he is an average-hitting pitcher or so (I have heard that he is a poor-hitting pitcher). An average pitcher’s batting line is around 10% single, 2% double or triple, .3% HR, 4% BB, and 83.7% out. The average WE for an average team leading by 1 run in the top of the 6th, with runners on second and third, 2 outs, and a batter with this line, is…..

63.2%.

If DeGrom were an automatic out, the WE would be 59.5%. That is the average WE leading off the bottom of the 6th with the visiting team winning by a run. So an average pitcher batting in that spot adds a little more than 3.5% in WE. That’s not wood. What if DeGrom were a poor hitting pitcher?

Whirrrrr……

62.1%.

So whether DeGrom is an average or poor-hitting pitcher doesn’t change the Mets’ WE in that spot all that much. Let’s call it 63%. That is reasonable. He adds 3.5% to the Mets’ WE compared to an out.

What about a pinch hitter? Obviously the quality of the hitter matters. The Mets have some decent hitters on the bench – notably Cuddyer from the right side and Johnson from the left. Let’s assume a league-average hitter. Given that, the Mets’ WE with runners on second and third, 2 outs, and a 1-run lead, is 68.8%. A league-average hitter adds over 9% to the Mets’ WE compared to an out. The difference between DeGrom as a slightly below-average hitting pitcher and a league-average hitter is 5.8%. That means, unequivocally, assuming that our numbers are reasonably accurate, that letting DeGrom hit cost the Mets almost 6% in their chances of winning the game.
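Putting those pieces side by side (the WE values are the ones just estimated):

```python
# Batting side of the decision, using the estimates above.
we_auto_out  = 0.595   # Mets WE if the batter were an automatic out
we_degrom    = 0.630   # DeGrom hitting (average-to-poor hitting pitcher)
we_league_ph = 0.688   # a league-average pinch hitter

print(f"DeGrom adds     {we_degrom - we_auto_out:+.1%} vs an out")     # ~+3.5%
print(f"Average PH adds {we_league_ph - we_auto_out:+.1%} vs an out")  # ~+9.3%
print(f"Cost of letting DeGrom hit: {we_league_ph - we_degrom:.1%}")   # ~5.8%
```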

That is enormous of course. Remember we said that bringing in an elite reliever in the 9th of a 2-run game, as compared to a league-average reliever, is worth 4% in WE. You can’t really make a worse decision as a manager than reducing your chances of winning by 5.8%, unless you purposely throw the game. But, that’s not nearly the end of the story. Collins presumably made this decision thinking that DeGrom pitching the 6th and perhaps the 7th would more than make up for that. Actually he’s not quite thinking, “Make up for that.” He is not thinking in those terms. He does not know that letting him hit “cost 5.8% in win expectancy” compared to a pinch hitter. I doubt that the average manager knows what “win expectancy” means let alone how to use it in making in-game decisions. He merely thinks, “I really want him to pitch another inning or two, and letting him hit is a small price to pay,” or something like that.

So how much does he gain by letting him pitch the 6th and 7th rather than a reliever? To be honest, it is debatable whether he gains anything at all. Not only that, but if we look back in history to see how many innings starters end up pitching, on the average, in situations like that, we will find that it is not 2 innings. It is probably not even 1.5 innings. He was at 82 pitches through 5. He may throw 20 or 25 pitches in the 6th (like he did in the first), in which case he may be done. He may give up a base runner or two, or even a run or two, and come out in the 6th, perhaps before recording an out. At best, he pitches 2 more innings, and once in a blue moon he pitches all or part of the 8th, I guess (as it turned out, he pitched 2 more effective innings and was taken out after seven). Let’s assume 1.5 innings, which I think is generous.

What is DeGrom’s expected RA9 for those 2 innings? He has pitched well thus far but not spectacularly well. In any case, there is no evidence that pitching well through 5 innings tells us anything about how a pitcher is going to pitch in the 6th and beyond. What is DeGrom’s normal expected RA9? Steamer, ZIPS and my projection systems say about 83% of league-average run prevention. That is equivalent to a #1 or #2 starter. It is equivalent to an elite starter, but not quite the level of the Kershaw’s, Arrieta’s, or even the Price’s or Sale’s. Obviously he could turn out to be better than that – or worse – but all we can do in these calculations and all managers can do in making these decisions is use the best information and the best models available to estimate player talent.

Then there is the “times through the order penalty.” There is no reason to think that this wouldn’t apply to DeGrom in this situation. He is going to face the Cubs for the third time in the 6th and 7th innings. Research has found that the third time through the order a starter’s RA9 is .3 runs worse than his overall RA9. So a pitcher who allows 83% of league average runs allows 90% when facing the order for the 3rd time. That is around 3.7 runs per 9 innings against an average NL team.

Now we have to compare that to a reliever. The Mets have Niese, Robles, Reed, Colon, and Gilmartin available for short or long relief. Colon might be the obvious choice for the 6th and 7th inning, although they surely could use a combination of righties and lefties, especially in very high leverage situations. What do we expect these relievers’ RA9 to be? The average reliever is around 4.0 to start with, compared to DeGrom’s 3.7. If Collins uses Colon, Reed, Niese or some combination of relievers, we might expect them to be better than the average NL reliever. Let’s be conservative and assume an average, generic reliever for those 1.5 innings.

How much does that cost the Mets in WE? To figure that, we take the difference in run prevention between DeGrom and the reliever(s), multiply by the game leverage, and convert it into WE (at roughly 10 runs per win). The difference between a 3.7 RA9 and a 4.0 RA9 in 1.5 innings is .05 runs. The average expected leverage index in the 6th and 7th innings where the road team is up by a run is around 1.7. So we multiply .05 by 1.7 and convert that into WE. The final number is .0085, or less than 1% in win expectancy gained by allowing DeGrom to pitch rather than an average reliever.
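Here is that pitching-side calculation laid out. The 10-runs-per-win conversion is my assumption for the sketch; the rest are the estimates given above.

```python
# Pitching side of the decision.
degrom_ra9   = 3.7    # ~83% of league average, plus the 3rd-time-through penalty
reliever_ra9 = 4.0    # generic average reliever
innings      = 1.5    # expected additional DeGrom innings
leverage     = 1.7    # average expected LI, 6th-7th innings, road team up by 1
runs_per_win = 10.0   # assumed conversion

run_diff = (reliever_ra9 - degrom_ra9) * innings / 9
we_gain = run_diff * leverage / runs_per_win
print(f"WE gained by leaving DeGrom in: {we_gain:.2%}")   # ~0.85%
```

Net of the ~5.8% given up on the batting side, pinch hitting comes out ahead by roughly 5%, which is the comparison made below.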

That might shock some people. It certainly should shock Collins, since that is presumably his reason for allowing DeGrom to hit – he really, really wanted him to pitch another inning or two. He presumably thought that that would give his team a much better chance to win the game as opposed to one or more of his relievers. I have done this kind of calculation dozens of times and I know that keeping good or even great starters in the game for an inning or two is not worth much. For some reason, the human mind, in all its imperfect and biased glory, overestimates the value of 1 or 2 innings of a pitcher who is “pitching well” as compared to an “unknown entity” (of course we know the expected performance of our relievers almost as well as we know the expected performance of the starter). It is like a manager who brings in his closer in a 3-run game in the 9th. He thinks that his team has a much better chance of winning than if he brings in an inferior pitcher. The facts say that he is wrong, but tell that to a manager and see if he agrees with you – he won’t. Of course, it’s not a matter of opinion – it’s a matter of fact.

Do I need to go any further? Do I need to tweak the inputs? Assuming average values for the relevant variables yields a loss of over 5% in win expectancy by allowing DeGrom to hit. What if we knew that DeGrom were going to pitch two more innings rather than an average of 1.5? He saves .07 runs rather than .05 which translates to 1.2% WE rather than .85%, which means that pinch hitting for him increases the Mets’ chances of winning by 4.7% rather than 5.05%. 4.7% is still an enormous advantage. Reducing your team‘s chances of winning by 4.7% by letting DeGrom hit is criminal. It’s like pinch hitting Jeff Mathis for Mike Trout in a high leverage situation – twice!

What about if our estimate of DeGrom’s true talent is too conservative? What if he is as good as Kershaw and Arrieta? That’s 63% of league average run prevention or 2.6 RA9. Third time through the order and it’s 2.9. The difference between that and an average reliever is 1.1 runs per 9, which translates to a 3.1% WE difference in 1.5 innings. So allowing Kershaw to hit in that spot reduces the Mets chances of winning by 2.7%. That’s not wood either.

What if the reliever you replaced DeGrom with was a replacement level pitcher – the worst pitcher in the major leagues? He allows around 113% of league average runs, or 4.6 RA9. The difference between DeGrom and him for 1.5 innings? 2.7%, for a net loss of 3.1% by letting him hit rather than pinch hitting for him and letting the worst pitcher in baseball pitch the next 1.5 innings. If you told Collins, “Hey genius, if you pinch hit for DeGrom and let the worst pitcher in baseball pitch for another inning and a half instead of DeGrom, you will increase your chances of winning by 3.1%,” what do you think he would say?

What if DeGrom were a good hitting pitcher? What if….?

You should be getting the picture. Allowing him to hit is so costly, assuming reasonable and average values for all the pertinent variables, that even if we are missing something in our model, or some of our numbers are a little off – even if we assume everything in the best possible light of allowing him to hit – the decision is a no-brainer in favor of a pinch hitter.

If Collins truly wanted to give his team the best chance of winning the game, or in the vernacular of ballplayers, putting his team in the best position to succeed, the clear and unequivocal choice was to lift DeGrom for a pinch hitter. It’s too bad that no one cares because the Mets ultimately won the game, which they were going to do at least 60% of the time anyway, regardless of whether Collins made the right or wrong decision.

The biggest loser, other than the Cubs, is Collins (I don’t mean he is a loser, as in the childish insult), because every time you use results to evaluate a decision and the results are positive, you deprive yourself of the opportunity to learn a valuable lesson. In this case, the analysis could have and should have been done before the game even started. All managers should know the importance of bringing in pinch hitters for pitchers in high leverage situations in important games, no matter how good the pitchers are or how well they are pitching in the game so far. Maybe someday they will.

As an addendum to my article on platoon splits from a few days ago, I want to give you a simple trick for answering a question about a player, such as, “Given that a player performs X in time period T, what is the average performance we can expect in the future (or the present, which is essentially the same thing, or at least a subset of it)?” I also want to illustrate the folly of using unusual single-season splits for projecting the future.

The trick is to identify as many players as you can in some period of time in the past (the more, the better, but sometimes the era matters so you often want to restrict your data to more recent years) that conform to the player in question in relevant ways, and then see how they do in the future. That always answers your question as best as it can. The certainty of your answer depends upon the sample size of the historical performance of similar players. That is why it is important to use as many players and as many years as possible, without causing problems by going too far back in time.

For example, say you have a player whom you know nothing about other than that he hit .230 in one season of 300 AB. What do you expect that he will hit next year? Easy to answer. There are thousands of players who have done that in the past. You can look at all of them and see what their collective BA was in their next season. That gives you your answer. There are other more mathematically rigorous ways to arrive at the same answer, but much of the time the “historical similar player method” will yield a more accurate answer, especially when you have a large sample to work with, because it captures all the things that your mathematical model may not. It is real life! You can’t do much better than that!

You can of course refine your “similar players” comparative database if you have more information about the player in question. He is left-handed? Use only left-handers in your comparison. He is 25? Use only 25-year-olds. What if you have so much information about the player in question that your “comp pool” starts to be too small to have a meaningful sample size (which only means that the certainty of your answer decreases, not necessarily the accuracy)? Let’s say that he is 25, left-handed, 5’10” and 170 pounds, he hit .273 in 300 AB, and you want to include all of these things in your comparison. That obviously will not apply to many players in the past. Your sample size of “comps” will be small. In that case, you can use players between the ages of 24 and 26, between 5’9” and 5’11”, who weighed between 160 and 180 pounds, and who hit .265-.283 in 200 to 400 AB. It doesn’t have to be those exact numbers, but as long as you are not biasing your sample compared to the player in question, you should arrive at an accurate answer to your question.

What if we do that with a .230 player in 300 AB? I’ll use .220 to .240 and between 200 and 400 AB. We know intuitively that we have to regress the .230 towards the league average around 60 or 65%, which will yield around .245 as our answer. But we can do better using actual players and actual data. Of course our answer depends on the league average BA for our player in question and the league average BA for the historical data. Realistically, we would probably use something like BA+ (BA as compared to league-average batting average) to arrive at our answer. Let’s try it without that. I looked at all players who batted in that range from 2010-2014 in 200-400 AB and recorded their collective BA the next year. If I wanted to be a little more accurate (for this question it is probably not necessary), I might weight the results in year 2 by the AB in year 1, or use the delta method, or something like that.
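For what it’s worth, the “similar players” lookup is easy to automate once you have a table of player-seasons. Here is a minimal sketch using pandas; the data frame and its column names (player_id, year, ab, hits) are hypothetical placeholders, not a reference to any particular database.

```python
import pandas as pd

def comp_pool_next_year_ba(batting: pd.DataFrame,
                           ba_lo: float = 0.220, ba_hi: float = 0.240,
                           ab_lo: int = 200, ab_hi: int = 400,
                           first_year: int = 2010, last_year: int = 2014) -> float:
    """Collective next-season BA for every player-season in first_year..last_year
    with an AB total in [ab_lo, ab_hi] and a BA in [ba_lo, ba_hi].

    `batting` is assumed to have one row per player-season with (hypothetical)
    columns player_id, year, ab, hits."""
    df = batting.copy()
    df["ba"] = df["hits"] / df["ab"]

    # Year-1 seasons that match the player in question.
    year1 = df[df["year"].between(first_year, last_year)
               & df["ab"].between(ab_lo, ab_hi)
               & df["ba"].between(ba_lo, ba_hi)]

    # Line each qualifying season up with the same player's following season.
    year2 = df[["player_id", "year", "ab", "hits"]].copy()
    year2["year"] -= 1
    merged = year1.merge(year2, on=["player_id", "year"], suffixes=("_y1", "_y2"))

    # "Collective" BA the next year; note this implicitly weights by year-2 AB,
    # which is one of the weighting choices mentioned above.
    return merged["hits_y2"].sum() / merged["ab_y2"].sum()
```

Swapping in something like BA+ instead of raw BA, or adding age, handedness, height, and weight filters, is just a matter of adding columns and conditions to the year-1 filter.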

If I do that for just five years of year-one seasons, 2010-2014 (so the follow-up seasons run through 2015), I get 49 players who hit a collective .230 in year 1 in an average of 302 AB. The next year, they hit a collective .245, around what we would expect. That answers our question, “What would a .230 hitter in 300 AB hit next year, assuming he were allowed to play again (we don’t know from the historical data what players who were not allowed to play would have hit)?”

What about .300 in 400 AB? I looked at all players from .280 to .350 in year 1 and between 300 and 450 AB. They hit a collective .299 in year 1 and .270 in year 2. Again, that answers the question, “What do we expect Player A to hit next year if he hit .300 this year in around 400 AB?”

For Siegrist, with his -47 point reverse split, we can use the same method to answer the question, “What do we expect his platoon split to be in the future, given 230 TBF versus lefties in the past?” That is such an unusual split that we might have to tweak the criteria a little and then extrapolate. Remember that asking, “What do we expect Player A to do in the future?” is almost exactly the same thing as asking, “What is his true talent with respect to this metric?”

I am going to look at only one season for pitchers with around 200 BF versus lefties, even though Siegrist’s 230 TBF versus lefties came over several seasons. It should not make much difference, as the key is the number of lefty batters faced. I included all left-handed pitchers with at least 150 TBF versus LHB who had a reverse wOBA platoon difference of more than 10 points and who pitched again the next year. Let’s see how they do, collectively, in that next year.

There were 76 such pitchers from 2003-2014. They had a collective platoon differential of -39 points, less extreme than Siegrist’s -47 points, in an average of 194 TBF versus LHB, also less than Siegrist’s 231. But we should be in the ballpark with respect to estimating Siegrist’s true split using this “in vivo” method. How did they do the next year, which is a good proxy (an unbiased estimate) for their true splits?

In year 2, they had an average of 161 TBF versus lefties, a little less than the previous year, which is to be expected, and their collective platoon split was plus 8.1 points. So they went from -39 to +8.1 from one season to the next, because one season of reverse splits is mostly a fluke, as I explained in my previous article on platoon splits. Plus 21 points is around the average split for lefty pitchers with more than 150 TBF v. lefties in this time period, so these pitchers moved 47 points from year 1 to year 2, out of a total of 60 points from year 1 to the league average. That is a 78% regression toward the mean, around what we estimated Siegrist’s regression should be (I think it was 82%). That suggests that our mathematical model is good, since it produces around the same result as our “real live players” method.
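The regression arithmetic in that last step is worth making explicit. A minimal sketch, using the +21-point league-average lefty-pitcher split quoted above:

```python
LEAGUE_AVG_SPLIT = 21.0   # avg wOBA split (points) for LHP with >150 TBF v. LHB, per the text

def regression_pct(year1_split: float, year2_split: float,
                   league_avg: float = LEAGUE_AVG_SPLIT) -> float:
    """Fraction of the distance from the year-1 split to the league average
    that the group covered by year 2."""
    return (year2_split - year1_split) / (league_avg - year1_split)

# The >10-point reverse-split group: -39 points in year 1, +8.1 in year 2.
print(f"{regression_pct(-39.0, 8.1):.1%}")   # ~78.5%, i.e. the ~78% regression cited above
```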

How extreme would the observed split have to be for us to estimate a true reverse split for a lefty? Let’s look at some more numbers. I’ll raise the bar to lefty pitchers with at least a 20-point reverse split. There were only 57 in those 12 years of data. They had a collective split in year 1 of -47, just like Siegrist, in an average of 191 TBF v. LHB. How did they do in year 2, which is the answer to our question about their true split? Plus 6.4 points. That is a 78% regression, the same as before.

What about pitchers with at least a 25 point reverse split? They averaged -51 points in year 1. Can we get them to a true reverse split?  Nope. Not even close.

What if we raise the sample size bar? I’ll require at least 175 TBF and at least a 15-point reverse split in year 1. Only 45 lefty pitchers fit this bill, and they had a -43 point split in year 1 in an average of 209 TBF v. lefties. The next year? Plus 2.8 points! Close, but no cigar. There is of course an error bar around only 45 pitchers with 170 TBF v. lefties in year 2, but we’ll take those numbers on faith since that’s what we got. That is a 72% regression, which is about what we would expect given the slightly larger year-1 sample (209 TBF v. lefties, versus 194 before).
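The other two cuts work out the same way; plugging them into the regression_pct sketch above (written out inline here) reproduces the percentages quoted:

```python
# Same arithmetic as the regression_pct sketch above, written out inline:
# distance covered from the year-1 split toward the +21-point league average.
print(f"{(6.4 + 47.0) / (21.0 + 47.0):.1%}")   # ~78.5% for the 20+ point reverse-split group (text: 78%)
print(f"{(2.8 + 43.0) / (21.0 + 43.0):.1%}")   # ~71.6% for the 175+ TBF group (text: 72%)
```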

So please, please, please: when you see or hear of a pitcher with severe reverse splits in 200 or so BF versus lefties (around a full year for a starting pitcher, or two or three years for a reliever), remember that our best estimate of his true platoon split, i.e., what his manager should expect when he sends him out there, is very, very different from what those actual one-to-three-year splits suggest when they are far from the norm. Most of that unusual split, in either direction – almost all of it, in fact – is likely a fluke. And when we say “likely,” we mean that we must assume it is a fluke and that the true number is the weighted mean of all the possibilities, which is what those year-2 numbers represent: year 1 (or multiple years) heavily regressed toward the league average.
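To close the loop on what “heavily regressed” means for a single pitcher, here is a minimal sketch using the roughly 82% regression figure mentioned above for Siegrist; the point estimate it produces is illustrative, not a formal projection.

```python
# Point estimate of a true platoon split: the observed split regressed toward
# the league-average LHP split. The 82% figure is the estimate mentioned above.
observed, league_avg, regress = -47.0, 21.0, 0.82
true_split_estimate = observed + regress * (league_avg - observed)
print(round(true_split_estimate, 1))   # ~8.8 points: a small positive split, not a reverse one
```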

 

With all the hullabaloo about Utley’s slide last night and the umpires’ calls or non-calls, including the call or calls made by the replay officials in New York (whose names, addresses, telephone numbers, and social security numbers should be posted on the internet, according to Pedro Martinez), what was lost – or at least there was much confusion about it – was a discussion of the specific rule or rules that apply to that exact situation – the take-out slide, that is, not whether Utley was safe or not on replay. For that you need to download the complete 2015 rule book, I guess. If you Google certain rule numbers, you are taken to the “official rules” portion of MLB’s website, in which at least some of the rule numbers appear to be completely different from those in the actual current rule book.

In any case, last night, after a flurry of tweets, Rob Neyer of Fox Sports pointed out the clearly applicable rule (although other rules come close): it is Rule 5.09(a)(13) in the PDF version of the current rulebook. It reads, in full:

The batter is out when… “A preceding runner shall, in the umpire’s judgment, intentionally interfere with a fielder who is attempting to catch a thrown ball or to throw a ball in an attempt to complete any play;”

That rule is unambiguous and crystal clear. 1) Umpire, in his judgment, determines that runner intentionally interferes with the pivot man. 2) The batter must be called out.

By the way, the runner himself may or may not be out. This rule does not address that. There is a somewhat common misperception that the umpire calls both players out according to this rule. Another rule might require the umpire to call the runner also out on interference even if he arrived before the ball/fielder or the fielder missed the bag – but that’s another story.

Keep in mind that if you ask the umpire, “Excuse me, Mr. Umpire, but in your judgment, did the runner intentionally interfere with the fielder?” and his answer is “Yes,” then he must call the batter out. There is no more judgment. The only judgment allowed in this rule is whether the runner intentionally interfered or not. If the rule had said, “the batter may be called out,” then there would presumably be two levels of judgment. There are other rules which explicitly say the umpire may do certain things, in which case there is presumably some judgment that goes into whether he decides to do them or not. Sometimes those rules provide guidelines for that judgment (the may part) and sometimes they do not. In any case, this rule does not provide that may judgment. If the umpire thinks it is intentional interference, the batter (not the runner) is automatically out.

So clearly the umpire should have called the batter out on that play, unless he could say with a straight face, “In my judgment, I don’t think that Utley intentionally interfered with the fielder.” That is not a reasonable judgment of course. Not that there is much recourse for poor or even terrible judgment. Judgment calls are not reviewable, I don’t think. Perhaps umpires can get together and overturn a poor judgment call. I don’t know.

But that’s not the end of the story. There is a comment to this rule which reads:

“Rule 5.09(a)(13) Comment (Rule 6.05(m) Comment): The objective of this rule is to penalize the offensive team for deliberate, unwarranted, unsportsmanlike action by the runner in leaving the baseline for the obvious purpose of crashing the pivot man on a double play, rather than trying to reach the base. Obviously this is an umpire’s judgment play.”

Now that throws a monkey wrench into this situation. Apparently this is where the “runner must be so far away from the base that he cannot touch it in order for the ‘automatic double play’ to be called” rule came from (I always thought that was an unwritten rule). Only it’s not a rule. It is a comment, which clearly adds a wrinkle to the rule.

The rule is unambiguous. If the runner interferes with the fielder trying to make the play (whether he would have completed the DP or not), then the batter is out. There is no mention of where the runner has to be or not be. The comment changes the rule. It adds another requirement (and another level of judgment). The runner must have been “outside the baseline” in the umpire’s judgment. In addition, it adds some vague requirements about the action of the runner. The original rule says only that the runner must “intentionally interfere” with the fielder. The comment adds words that require the runner’s actions to be more egregious – deliberate, unwarranted, and unsportsmanlike.

But the comment doesn’t really require that to be the case for the umpire to call the batter out. I don’t think. It says, “The objective of this rule is to penalize the offensive team….” I guess if the comment is meant to clarify the rule, MLB really doesn’t want the umpire to call the batter out unless the requirements in the comment are met (runner out of the baseline and his action was not only intentional but deliberate, unwarranted, and unsportsmanlike, a higher bar than just intentional).

Of course the rule doesn’t need clarification. It is crystal clear. If MLB wanted to make sure that the runner is outside of the baseline and acts more egregiously than just intentionally, then they should change the rule, right? Especially if comments are not binding, which I presume they are not.

Also, the comment starts off with: “The objective of this rule is to…”

Does that mean that this rule is only to be applied in double play situations? What if a fielder at second base fields a ball, starts to throw to first base to retire the batter, and the runner tackles him or steps in front of the ball? Is rule 5.09(a)(13) meant to apply? The comment says that the objective of the rule is to penalize the offensive team for trying to break up the double play. In this hypothetical, there is no double play being attempted. There has to be some rule that applies to this situation, right? If there isn’t, then MLB should not have written in the comment, “The objective of this rule….”

There is another rule which also appears to clearly apply to a take-out slide at second base, like Utley’s, with no added comments requiring that the runner be out of the baseline, or that his actions be unwarranted and unsportsmanlike. It is 6.01(6). Or 7.09(e) on the MLB web site. In fact, I tweeted this rule last night thinking that it addressed the Utley play 100% and that the runner and the batter should have been called out.

“If, in the judgment of the umpire, a base runner willfully and deliberately interferes with a batted ball or a fielder in the act of fielding a batted ball with the obvious intent to break up a double play, the ball is dead. The umpire shall call the runner out for interference and also call out the batter-runner because of the action of his teammate.”

The only problem there is the words, “interferes with a batted ball or a fielder in the act of fielding a batted ball.” A lawyer would say that the plain meaning of those words precludes this rule from applying to an attempt to interfere with a middle infielder tagging second base and throwing to first, because the fielder is not fielding or attempting to field a batted ball and the runner is not interfering with a batted ball. The runner, in this case, is interfering with a thrown ball or with a fielder attempting to tag second and then make a throw to first.

So if this rule is not meant to apply to a take-out slide at second, what is it meant to apply to? That would leave only one thing, really: a ground ball is hit in the vicinity of the runner and he interferes with the ball or with a fielder trying to field the ball. But there also must be “an obvious intent to break up a double play.” That is curious wording. Would a reasonable person consider that an attempt to break up a double play? Perhaps “obvious intent to prevent a double play” would have been better. Using the words “break up” sure sounds like this rule is meant to apply to a runner trying to take out the pivot man on a potential double play. But then why write “fielding a batted ball” rather than “making a play or a throw”?

A good lawyer working for the Mets would try and make the case that “fielding a batted ball” includes everything that happens after someone actually “fields the batted ball,” including catching and throwing it. In order to do so, he would probably need to find that kind of definition somewhere else in the rule book. It is a stretch, but it is not unreasonable, I don’t think.

Finally, Eric Byrnes, on MLB Tonight, had one of the more intelligent and reasonable comments regarding this play that I have ever heard from an ex-player. He said, and I paraphrase:

“Of course it was a dirty slide. But all players are taught to do whatever it takes to break up the DP, especially in a post-season game. Until umpires start calling an automatic double play on slides like that, aggressive players like Utley will continue to do that. I think we’ll see a change soon.”

P.S. For the record, since there was judgment involved, and judgment is supposed to represent fairness and common sense, I think that Utley should not have been ruled safe at second on appeal.

Postscript:

Perhaps comments are binding. From the foreword to the rules, on the MLB web site:

The Playing Rules Committee, at its December 1977 meeting, voted to incorporate the Notes/Case Book/Comments section directly into the Official Baseball Rules at the appropriate places. Basically, the Case Book interprets or elaborates on the basic rules and in essence have the same effect as rules when applied to particular sections for which they are intended.