Archive for March, 2014

Yesterday, I posted an article describing how I modeled, to some extent, whether and by how much pitchers may be able to pitch in such a way as to allow fewer or more runs than their components suggest, including the more subtle components like balks, SB/CS, WP, catcher PB, GIDP, and ROE.

For various reasons, I suggest taking these numbers with a grain of salt. For one thing, I need to tweak my RA9 simulator to take into consideration a few more of these subtle components. For another, there may be some things that stick with a pitcher from year to year that have nothing to do with his “RA9 skill” but which serve to increase or decrease run scoring, given the same set of components. Two of these are a pitcher’s outfielder arms and the vagaries of his home park, both of which affect base runner advances on hits and outs. Using a pitcher’s actual sac flies against will mitigate this, but the sim is also using league averages for base runner advances on hits, which, as I said, can vary from pitcher to pitcher and tend to persist from year to year (if a pitcher stays on the same team) based on his outfielders and his home park. Like DIPS, it would be better to do these correlations only on pitchers who switch teams, but I fear that the sample would be too small to get any meaningful results.

Anyway, I have a database now of the last 10 years’ differences between a pitcher’s RA9 and his sim RA9 (the runs per 27 outs generated by my sim), for all pitchers who threw to at least 100 batters in a season.

First, here are some interesting categorical observations:

Jared Cross, of Steamer projections, suggested to me that perhaps some pitchers, like lefties, might hold base runners on first base better than others, and therefore depress scoring a little as compared to the sim, which uses league-average base running advancement numbers. Well, lefties actually did a hair worse in my database: their RA9 was .02 runs greater than their sim RA, while righties were .01 runs better. That does not necessarily mean that RHP have some kind of RA skill that LHP do not have. It is more likely a bias in the sim that I am not correcting for.

How about the number of pitches in a pitcher’s repertoire? I hypothesized that pitchers with more pitches would be better able to tailor their approach to the situation. For example, with a base open, you want your pitcher to be able to throw lots of good off-speed pitches in order to induce a strikeout or weak contact, whereas you don’t mind if he walks the batter.

I was wrong. Pitchers with 3 or more pitches that they throw at least 10% of the time are .01 runs worse in RA9. Pitchers with only 2 or fewer pitches are .02 runs better. I have no idea why that is.

How about pitchers who are just flat out good in their components, such that their sim RA is low, like under 4.00 runs? Their RA9 is .04 worse. Again, there might be some bias in the sim which is causing that. Or perhaps if you just go out there and “air it out” and try to get as many outs and strikeouts as possible, regardless of the situation, you are not pitching optimally.

Conversely, pitchers with a sim RA of 4.5 or greater shave .03 runs off their RA9. If you are over 5 in your sim RA, your actual RA9 is .07 runs better, and if you are below 3.5, your RA9 is .07 runs higher. So, there probably is something about having extreme components that even the sim is not picking up. I’m not sure what that could be. Or, perhaps if you are simply not that good of a pitcher, you have to find ways to minimize run scoring above and beyond the hits and walks you allow overall.

For the NL pitchers, their RA9 is .05 runs better than their sim RA, and for the AL, they are .05 runs worse. So the sim is not doing a good job with respect to the leagues, likely because of pitchers batting. I’m not sure why, but I need to fix that. For now, I’ll adjust a pitcher’s sim RA according to his league.

You might think that younger pitchers would be “throwers” and older ones would be “pitchers” and thus their RA skill would reflect that. This time you would be right – to some extent.

Pitchers less than 26 years old were .01 runs worse in RA9. Pitchers older than 30 were .03 better. But that might just reflect the fact that pitchers older than 30 are just not very good – remember, we have a bias in terms of quality of the sim RA and the difference between that and regular RA9.

Actually, even when I control for the quality of the pitcher, the older pitchers had more RA skill than the younger ones by around .02 to .04 runs. As you can see, none of these effects, even if they are other than noise, is very large.
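For the curious, here is a minimal sketch of what that kind of quality control looks like. It is not the actual code behind these numbers, and the column names are hypothetical; the idea is just to bucket pitcher-seasons by sim RA and by age and then average the RA9 minus sim RA difference, weighted by TBF, within each bucket.

```python
import pandas as pd

# Hypothetical file: one row per pitcher-season with ra9, sim_ra, age, tbf
df = pd.read_csv("pitcher_seasons.csv")
df = df[df["tbf"] >= 100]                      # same 100-batter cutoff as above
df["diff"] = df["ra9"] - df["sim_ra"]          # positive = worse than components

df["quality"] = pd.cut(df["sim_ra"], bins=[0, 3.5, 4.0, 4.5, 5.0, 99])
df["age_grp"] = pd.cut(df["age"], bins=[0, 25, 30, 99],
                       labels=["under 26", "26-30", "over 30"])

# TBF-weighted average difference within each quality/age bucket
df["wdiff"] = df["diff"] * df["tbf"]
g = df.groupby(["quality", "age_grp"], observed=True)
print(g["wdiff"].sum() / g["tbf"].sum())
```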

Finally, here are the lists of the 10 best and worst pitchers with respect to “RA skill,” with no commentary. I adjusted for the “quality of the sim RA” bias, as well as the league bias. Again, take these with a large grain of salt, considering the discussion above.

Best, 2004-2013:

Shawn Chacon -.18

Steve Trachsel -.18

Francisco Rodriguez -.18

Jose Mijares -.17

Scott Linebrink -.16

Roy Oswalt -.16

Dennys Reyes -.15

Dave Riske -.15

Ian Snell -.15

5 others tied for 10th.

Worst:

Derek Lowe .27

Luke Hochevar .20

Randy Johnson .19

Jeremy Bonderman .18

Blaine Boyer .18

Rich Hill .18

Jason Johnson .18

5 others tied for 8th place.

(None of these pitchers stand out to me one way or another. The “good” ones are not any you would expect, I don’t think.)


We showed in The Book that there is a small but palpable “pitching from the stretch” talent. That of course would affect a pitcher’s RA as compared to some kind of base runner and “timing” neutral measure like FIP or component ERA, or really any of the ERA estimators.

As well, a pitcher’s ability to tailor his approach to the situation, runners, outs, score, batter, etc., would also implicate some kind of “RA talent,” again, as compared to a “timing” neutral RA estimator.

A few months ago I looked to see if RE24 results for pitchers showed any kind of talent for pitching to the situation, by comparing that to the results of a straight linear weights analysis or even a BaseRuns measure. I found no year-to-year correlations for the difference between RE24 and regular linear weights. In other words, I was trying to see if some pitchers were able to change their approach to benefit them in certain bases/outs situations more than other pitchers. I was surprised that there was no discernible correlation, i.e., that it didn’t seem to be much of a skill, if one at all. You would think that some pitchers would either be smarter than others or have a certain skill set that would enable them, for example, to get more strikeouts with a runner on 3rd and fewer than 2 outs, more walks and fewer hits with a base open, or fewer home runs with runners on base or with 2 outs and no one on base. Obviously all pitchers, on the average, vary their approach a lot with respect to these things, but I found nothing much when doing these correlations. Essentially an “r” of zero.

To some extent the pitching from the stretch talent should show up in comparing RE24 to regular lwts, but it didn’t, so again, I was a little surprised at the results.

Anyway, I decided to try one more thing.

I used my “pitching sim” to compute a component ERA for each pitcher. I tried to include everything that would create or not create runs while he was pitching, like WP/PB, SB/CS, GIDP, and ROE, in addition to singles, doubles, triples, home runs, walks, and so on. I considered an IBB as a 1/2 BB in the sim, since I didn’t program IBB into it.
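The sim itself is too involved to show here. As a rough stand-in only, here is a minimal BaseRuns-style estimator that turns a component line into runs per 27 outs, with IBB counted as half a walk; the coefficients are one common published version of BaseRuns, not anything taken from my sim.

```python
def component_ra9(singles, doubles, triples, hr, bb, ibb, hbp, outs):
    """Rough runs per 27 outs from a component line (BaseRuns-style sketch).

    bb is total walks, including intentional; ibb are subtracted at half weight.
    """
    hits = singles + doubles + triples + hr
    tb = singles + 2 * doubles + 3 * triples + 4 * hr
    walks = bb - 0.5 * ibb + hbp              # IBB counted as 1/2 BB, as above
    a = hits + walks - hr                     # baserunners
    b = (1.4 * tb - 0.6 * hits - 3.0 * hr + 0.1 * walks) * 1.02  # advancement
    c = outs
    runs = a * b / (b + c) + hr               # home runs score themselves
    return 27.0 * runs / outs
```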

So now, for each year, I recorded the difference between a pitcher’s RA9 and his simulated component RA9, and then ran year-to-year correlations. This was again to see if I could find a “RA talent” wherever it may lie – clutch pitching, stretch talent, approach talent, etc.
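The correlation itself is straightforward. Here is a sketch (again with hypothetical column names, not my actual code) of pairing each pitcher’s consecutive seasons and correlating the difference across the pair:

```python
import pandas as pd

# Hypothetical file: pitcher_id, year, tbf, ra9, sim_ra9 for each pitcher-season
df = pd.read_csv("pitcher_seasons.csv")
df["diff"] = df["ra9"] - df["sim_ra9"]

nxt = df.copy()
nxt["year"] -= 1                                  # align year N+1 with year N
pairs = df.merge(nxt, on=["pitcher_id", "year"], suffixes=("_y1", "_y2"))

min_tbf = 500                                     # e.g. the 500 TBF cutoff below
pairs = pairs[(pairs["tbf_y1"] >= min_tbf) & (pairs["tbf_y2"] >= min_tbf)]
print(len(pairs), pairs["diff_y1"].corr(pairs["diff_y2"]))
```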

I got a small year-to-year correlation which, as always, varied with the underlying sample size – TBF in each of the paired years. When I limited it to pitchers with at least 500 TBF in each year, I got an “r” of .142 with an average PA of 791 in each year. That comes out to a 50% regression at around 5000 PA, or 5 years for a full-time starter, similar to BABIP for pitchers. In other words, the “stabilization” point was around 5,000 TBF.
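As a back-of-the-envelope check, an r of .142 at an average of 791 TBF per season implies a 50% regression point of roughly n(1 - r)/r:

```python
r, n = 0.142, 791
print(round(n * (1 - r) / r))   # ~4780, i.e. on the order of 5,000 TBF
```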

If that .142 is accurate (at 2 sigma the confidence interval is .072 to .211), I think that is pretty interesting. For example, notable “ERA whiz” Tom Glavine, from 2001 to 2006, was an average of .246 in RA9 better than his sim RA9 (simulated component RA). If we regress that difference 50%, we get .133 runs per game, which is pretty sizable I think. That is over 1/3 of a win per season. Notable “ERA hack” Ricky Nolasco, from 2008 to 2010 (I only looked at 2001-2010), was an average of .357 worse in his RA9. Regress that 62.5%, and we get .134 runs worse per season, also 1/3 of a win.

So, for example, if you want to know how to reconcile fWAR (FG) and bWAR (B-R) for pitchers, take the difference and regress according to the number of TBF, using the formula 5000/(5000+TBF) for the amount of regression.
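In code, the reconciliation looks something like the sketch below. The Nolasco TBF is an assumption on my part, chosen to match the 62.5% regression quoted above.

```python
def regressed_ra_skill(ra9_diff_per_game, tbf, stabilization=5000):
    """Keep TBF/(TBF+5000) of the RA9-vs-component difference as 'RA skill'."""
    return ra9_diff_per_game * tbf / (tbf + stabilization)

# Nolasco: .357 runs worse; ~3,000 TBF (assumed) implies a 62.5% regression
print(round(regressed_ra_skill(0.357, 3000), 3))   # 0.134 runs per game
```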

Here are a couple more interesting ones, off the top of my head. I thought that Livan Hernandez seemed like a crafty pitcher, despite having inferior stuff late in his career. Sure enough, he out-pitched his components by around .164 runs per game over 9 seasons. After regressing, that’s .105 rpg.

The other name that popped into my head was Wakefield. I always wondered if a knuckleballer was able to pitch to the situation as well as other pitchers could. It does not seem like they can, with only one pitch and comparatively little control over it. His RA9 was .143 worse than his components suggest, despite his FIP being .3 runs per 9 worse than his ERA! After regressing, he is around .095 worse than his simulated component RA.

Of course, after looking at Wake, we have to check Dickey as well. He didn’t start throwing a knuckleball until 2005, and then only half the time. His average difference between RA9 and simulated RA9 is .03 on the good side, but our sample size for him is small, a total of only 1600 TBF, implying a regression of 76%.

If you want the numbers on any of your favorite or not-so-favorite pitchers, let me know in the comments section.

If anyone is out there (hello? helloooo?), as promised, here are the AL team expected winning percentages and their actual winning percentages, conglomerated over the last 5 years. In case you were waiting with bated breath, as I have been.

Combined results for all five years (AL 2009-2013), in order of the “best” teams to the “worst:”

| Team | My WP | Vegas WP | Actual WP | Diff | My Starters | Actual Starters | My Batting | Actual Batting |
|------|-------|----------|-----------|------|-------------|-----------------|------------|----------------|
| NYA | .546 | .566 | .585 | .039 | 98 | 99 | .30 | .45 |
| TEX | .538 | .546 | .558 | .020 | 102 | 95 | .14 | .24 |
| OAK | .498 | .490 | .517 | .019 | 104 | 101 | -.08 | .07 |
| LAA | .508 | .526 | .522 | .014 | 103 | 106 | .07 | .17 |
| TBA | .556 | .544 | .562 | .006 | 100 | 102 | .24 | .17 |
| BAL | .460 | .452 | .463 | .003 | 110 | 115 | -.03 | -.27 |
| DET | .548 | .547 | .550 | .002 | 97 | 91 | .21 | .31 |
| BOS | .546 | .596 | .546 | .000 | 99 | 98 | .26 | .36 |
| CHW | .489 | .450 | .488 | -.001 | 99 | 97 | -.16 | -.29 |
| TOR | .479 | .482 | .478 | -.001 | 106 | 107 | -.05 | .12 |
| MIN | .468 | .469 | .464 | -.004 | 108 | 109 | -.07 | -.07 |
| SEA | .462 | .464 | .446 | -.016 | 106 | 106 | -.26 | -.36 |
| KCR | .474 | .460 | .444 | -.030 | 108 | 106 | -.22 | -.28 |
| CLE | .492 | .469 | .462 | -.030 | 108 | 109 | .13 | .01 |
| HOU | .420 | .420 | .386 | -.034 | 106 | 109 | -.46 | -.61 |

I find this chart quite interesting. As with the NL, it looks to me like the top over-performing teams are managed by stable, high-profile, peer- and player-respected guys – Torre, Washington, Maddon, Scioscia, Leyland, Showalter.

Also, as with the NL teams, much of the differences between my model and the actual results are due to over-regression on my part, especially on offense. Keep in mind that I do include defense and base running in my model, so there may be some similar biases there.

Even after accounting for too much regression, some of the teams completely surprised me with respect to my model. Look at Oakland’s batting. I had them projected as a -.08 run per game team and somehow they managed to produce .07 rpg. That’s a huge miss over many players and many years. There has to be something going on there. Perhaps they know a lot more about their young hitters than we (I) do. That extra offense alone accounts for 16 points in WP, almost all of their 19 point over-performance. Even the A’s pitching outdid my projections.

Say what you will about the Yankees, but even though my undershooting their offense cost my model 16 points in WP, they still over-performed by a whopping 39 points, or 6.3 wins per season! I’m sure Rivera had a little to do with that even though my model includes him as closer. Then there’s the Yankee Mystique!

Again, even accounting for my too-aggressive regression, I completely missed the mark with the TOR, CLE, and BAL offense. Amazingly, while the Orioles pitched 5 points in FIP- worse than I projected and .24 runs per game worse on offense, they somehow managed to equal my projection.

Other notable anomalies are the Rangers’ and Tigers’ pitching. Those two starting staffs outdid me by seven and six points in FIP-, respectively, which is around 1/4 run in ERA – 18 points in WP. Texas did indeed win games at a 20 point clip better than I expected, but the Tigers, despite out-pitching my projections by 18 points in WP, AND outhitting me by another 11 points in WP, somehow managed to only win .3 games per season more than I expected. Must be that Leyland (anti-) magic!

Ok, enough of the bad Posnanski and Woody Allen rants and back to some interesting baseball analysis – sort of. I’m not exactly sure what to make of this, but I think you might find it interesting, especially if you are a fan of a particular team, which I’m pretty sure most of you are.

I went back five years and compared every team’s performance in each and every game to what would be expected based on their lineup that day, their starting pitcher, an estimate of their reliever and pinch hitter usage for that game, as well as the same for their opponent. Basically, I created a win/loss model for every game over the last five years. I didn’t simulate the game as I have done in the past. Instead, I used a theoretical model to estimate mean runs scored for each team, given a real-time projection for all of the relevant players, as well as the run-scoring environment, based on the year, league, and ambient conditions, like the weather and park (among other things).
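The model’s internals are more than I can show here, but to give a flavor of the runs-to-wins step: one common way to turn two expected run means into a single-game win probability is a Pythagenpat-style estimate. This sketch is illustrative, not necessarily the exact form my model uses, and it ignores things like home-field advantage.

```python
def game_win_prob(exp_runs_a, exp_runs_b):
    """Team A's single-game win probability from each side's expected runs."""
    rpg = exp_runs_a + exp_runs_b             # total run environment for the game
    x = rpg ** 0.287                          # Pythagenpat exponent
    return exp_runs_a ** x / (exp_runs_a ** x + exp_runs_b ** x)

print(round(game_win_prob(4.8, 4.2), 3))      # ~0.56 for a 4.8 vs 4.2 matchup
```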

When I say “real-time” projections, they are actually not up-to-the game projections. They are running projections for the year, updated once per month. So, for the first month of every season, I am using pre-season projections, then for the second month, I am using pre-season projections updated to include the first month’s performance, etc.

For a “sanity check” I am also keeping track of a consensus expectation for each game, as reflected by the Las Vegas line, the closing line at Pinnacle Sports Book, one of the largest and most respected online sports books in the internet betosphere.

The results I will present are the combined numbers for all five years, 2009 to 2013. Basically, you will see something like, “The Royals had an expected 5-year winning% of .487 and this is how they actually performed – .457.” I will present two expected WP actually – one from my models and one from the Vegas line. They should be very similar. What is interesting of course is the amount that the actual WP varies from the expected WP for each team. You can make of those variations what you want. They could be due to random chance, bad expectations for whatever reasons, or poor execution by the teams for whatever reasons.
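One mechanical note on the Vegas numbers: a closing moneyline has to be converted to a win probability with the book’s margin (the vig) stripped out. The details don’t matter much here, but a standard conversion, which I assume for this purpose, looks like this:

```python
def implied_prob(moneyline):
    """Raw implied probability of one side of a moneyline."""
    ml = float(moneyline)
    return 100.0 / (ml + 100.0) if ml > 0 else -ml / (-ml + 100.0)

def no_vig_prob(favorite_ml, underdog_ml):
    p1, p2 = implied_prob(favorite_ml), implied_prob(underdog_ml)
    return p1 / (p1 + p2)                     # normalize away the book's margin

print(round(no_vig_prob(-135, +125), 3))      # ~0.564 for the -135 side
```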

Keep in mind that the composite expectations for the entire 5-year period are based on the expectation of each and every game. And because those expectations are updated every month by my model and presumably every day by the Vegas model, they reflect the changing expected talent of the team as the season progresses. By that, I mean this: Rather than using a pre-season projection for every player and then applying that to the personnel used or presumed used (in the case of the relievers and pinch hitters) in every game that season, after the first 30 games, for example, those projections are updated and thus reflect, to some extent, actual performance that season. For example, last year, pre-season, Roy Halladay might have been expected to have a 3.20 ERA or something like that. After he pitched horribly for a few weeks or months, and it was well-known that he was injured, his expected performance presumably changed in my model as well as in the Vegas model. Again, the Vegas model likely changes every day, whereas my model can only change after each month, or 5 times per season.

Here are the combined results for all five years (NL 2009-2013):

| Team | My Model | Vegas | Actual | My Exp. Starting Pitching (RA9-) | Actual Starting Pitching (FIP-) | My Exp. Batting (marginal rpg) | Actual Batting (marginal rpg) |
|------|----------|-------|--------|----------------------------------|----------------------------------|---------------------------------|--------------------------------|
| ARI | .496 | .495 | .486 | 103 | 103 | 0 | -.08 |
| ATL | .530 | .545 | .564 | 100 | 97 | .25 | .21 |
| CHC | .488 | .478 | .446 | 103 | 102 | -.09 | -.17 |
| CIN | .522 | .517 | .536 | 104 | 108 | .01 | .12 |
| COL | .494 | .500 | .486 | 102 | 96 | -.04 | -.09 |
| MIA | .493 | .472 | .453 | 102 | 102 | .01 | -.05 |
| LAD | .524 | .526 | .542 | 96 | 99 | .02 | -.03 |
| MLW | .519 | .509 | .504 | 105 | 108 | .13 | .30 |
| NYM | .474 | .470 | .464 | 106 | 108 | -.02 | .01 |
| PHI | .516 | .546 | .554 | 96 | 98 | -.01 | .07 |
| PIT | .461 | .454 | .450 | 109 | 111 | -.19 | -.28 |
| SDP | .469 | .463 | .483 | 110 | 115 | -.12 | -.26 |
| STL | .532 | .554 | .558 | 100 | 98 | .23 | .40 |
| SFG | .506 | .518 | .515 | 98 | 102 | -.19 | -.30 |
| WAS | .497 | .484 | .486 | 103 | 103 | .01 | .07 |

If you are an American League fan, you’ll have to wait until Part II. This is a lot of work, guys!

By the way, if you think that the Vegas line is remarkably good, and much better than mine, it is at least partly an illusion. They get to “cheat,” and to some extent they do. I can do the same thing, but I don’t. I am not looking at the expected WP and result of each game and then doing some kind of RMS error to test the accuracy of my model and the Vegas “model” on a game-by-game basis. I am comparing the composite results of each model to the composite W/L results of each team, for the entire 5 years. That probably makes little sense, so here is an example which should explain what I mean by the oddsmakers being able to “cheat,” thus making their composite odds close to the actual odds for the entire 5-year period.

Let’s say that before the season starts Vegas thinks that the Nationals are a .430 team. And let’s say that after 3 months, they were a .550 team. Now, Vegas by all rights should have them as something like a .470 team for the rest of the season – numbers for illustration purposes only – and my model should too, assuming that I started off with .430 as well. And let’s say that the updated expected WP of .470 were perfect and that they went .470 for the second half. Vegas and I would have a composite expected WP of .450 for the season, .430 for the first half and .470 for the second half. The Nationals record would be .510 for the season, and both of our models would look pretty bad.

However, Vegas, to some extent uses a team’s W/L record to-date to set the lines, since that’s what the public does and since Vegas assumes that a team’s W/L record, even over a relatively short period of time, is somewhat indicative of their true talent, which it is of course. After the Nats go .550 for the first half, Vegas can set the second-half odds as .500 rather than .470, even if they think that .470 is truly the best estimate of their performance going forward.

Once they do that, their composite expected WP for the season will be (.430 + .500) / 2, or .465, rather than my .450. And even if the .470 were correct, and the Nationals go .470 for the second half, whose composite model is going to look better at the end of the season? Theirs will of course.

If Vegas wanted to look even better for the season, they can set the second half lines to .550, on the average. Even if that is completely wrong, and the team goes .470 over the second half, Vegas will look even better at the end of the season! They will be .490 for the season, I will be .450, and the Nats will have a final W/L percentage of .490! Vegas will look perfect and I will look bad, even though we had the same “wrong” expectation for the first half of the season, and I was right on the money for the second half and they were completely and deliberately wrong. Quite the paradox, huh? So take those Vegas lines with a grain of salt as you compare them to my model and to the final composite records of the teams. I’m not saying that my model is necessarily better than the Vegas model, only that in order to fairly compare them, you would have to take them one game at a time, or always look at each team’s prospective results compared to the Vegas line or my model.
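Here is the illustration above reduced to plain arithmetic:

```python
# Honest second-half estimate (.470) vs. a second half shaded toward the
# first-half record (.500). The composite season expectation is just the
# average of the two halves.
my_composite    = (0.430 + 0.470) / 2      # .450
vegas_composite = (0.430 + 0.500) / 2      # .465
actual_season   = (0.550 + 0.470) / 2      # .510 - what the Nats actually did

print(my_composite, vegas_composite, actual_season)
```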

Here is the same table as above, ordered by the difference between my expected W/L percentage and each team’s actual W/L percentage. The fifth column is that difference. Call those differences whatever you want – luck, team “efficiency,” good or bad managing, player development, team chemistry, etc. I hope you find these numbers as interesting as I do!

Combined results for all five years (NL 2009-2013), in order of the “best” teams to the “worst:”

| Team | My Model | Vegas | Actual | Difference | My Exp. Starting Pitching (RA9-) | Actual Starting Pitching (FIP-) | My Exp. Batting (marginal rpg) | Actual Batting (marginal rpg) |
|------|----------|-------|--------|------------|----------------------------------|----------------------------------|---------------------------------|--------------------------------|
| PHI | .516 | .546 | .554 | .038 | 96 | 98 | -.01 | .07 |
| ATL | .530 | .545 | .564 | .034 | 100 | 97 | .25 | .21 |
| STL | .532 | .554 | .558 | .026 | 100 | 98 | .23 | .40 |
| LAD | .524 | .526 | .542 | .018 | 96 | 99 | .02 | -.03 |
| SDP | .469 | .463 | .483 | .014 | 110 | 115 | -.12 | -.26 |
| CIN | .522 | .517 | .536 | .014 | 104 | 108 | .01 | .12 |
| SFG | .506 | .518 | .515 | .009 | 98 | 102 | -.19 | -.30 |
| COL | .494 | .500 | .486 | -.008 | 102 | 96 | -.04 | -.09 |
| NYM | .474 | .470 | .464 | -.010 | 106 | 108 | -.02 | .01 |
| PIT | .461 | .454 | .450 | -.010 | 109 | 111 | -.19 | -.28 |
| ARI | .496 | .495 | .486 | -.010 | 103 | 103 | 0 | -.08 |
| WAS | .497 | .484 | .486 | -.011 | 103 | 103 | .01 | .07 |
| MLW | .519 | .509 | .504 | -.015 | 105 | 108 | .13 | .30 |
| MIA | .493 | .472 | .453 | -.040 | 102 | 102 | .01 | -.05 |
| CHC | .488 | .478 | .446 | -.042 | 103 | 102 | -.09 | -.17 |

As you can see from either chart, it appears as if my model over-regresses both batting and starting pitching, especially the former.

Also, a quick and random observation from the above chart – it may mean absolutely nothing. It seems as though those top teams, most of them at least, have had notable, long-term, “players’ managers,” like Manuel, LaRussa, Mattingly, Torre, Black, Bochy, and Baker, while you might not be able to even recall or name most of the managers of the teams at the bottom. It will be interesting to see if the American League teams evince a similar pattern.