Beware, everybody: things are about to get super nerdy here.
I thought I’d stash this in the Off-Topic area, so it doesn’t get swallowed by a flood of angry threads about the team. I figure this is a topic that only a few of us will be interested in.
Some of you may remember that in the offseason I did a little bit of spreadsheet work on xBABIP (expected batting average on balls in play) for hitters. By the way, I have a new controversy-free formula there, but that’s another story. Just recently, I was inspired to look at pitchers, to see whether they have any influence on their BABIP, contrary to what Voros McCracken might say. I’m not fully apprised of the current state of research in this area, so I would appreciate any input.
So in a nutshell, what I’ve found is that, yes, there are some factors that pitchers influence that significantly affect their BABIP. Much like with batters, the two main factors are line drive percentage and the frequency of infield popups. This makes perfect sense to me – an infield popup is an automatic out, basically, and a line drive is going to have much greater odds of not being caught. Pitchers seem to have quite a bit more control over how many popups they get than how many line drives they allow, much like batters, but they definitely influence both significantly.
A very simple formula that does a pretty good job of estimating a pitcher’s BABIP against is:
xBABIP = 0.4*LD% - 0.6*FB%*IFFB% + 0.237
You can find those stats at Fangraphs, by the way.
LD% = Line Drive percentage
FB% = Fly Ball percentage
IFFB% = Infield Fly Ball percentage (the percentage of fly balls that are hit to the infield; when multiplied by FB%, it gives the total percentage of balls hit that are infield popups)
Enter the percentages as decimals (e.g., 20% = 0.20).
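Here’s a quick sketch of the formula in Python, in case anyone wants to run it outside a spreadsheet. It assumes the Fangraphs percentages have already been converted to decimals; the function name and the sample rates are made up for illustration, not any real pitcher’s line.

```python
def xbabip(ld_pct: float, fb_pct: float, iffb_pct: float) -> float:
    """Estimate a pitcher's BABIP against.

    All inputs are decimals (20% -> 0.20):
      ld_pct   -- LD%, line drives per batted ball
      fb_pct   -- FB%, fly balls per batted ball
      iffb_pct -- IFFB%, share of fly balls that stay in the infield
    """
    # FB% * IFFB% = share of all batted balls that are infield popups
    popup_rate = fb_pct * iffb_pct
    # More liners push BABIP up; more popups pull it down
    return 0.4 * ld_pct - 0.6 * popup_rate + 0.237


# Hypothetical pitcher: 20% liners, 35% fly balls, 10% of fly balls are popups
print(round(xbabip(0.20, 0.35, 0.10), 3))  # 0.296
```

Raising LD% or lowering IFFB% pushes the estimate up, which matches the reasoning above about liners and popups.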
Not surprisingly, this formula works better for a pitcher’s career BABIP, based on career numbers: 0.637 correlation, with an RMSE (root mean square error, roughly the typical size of the gap between estimated and actual BABIP) of 0.00968. That’s for qualified pitchers from 2002-2011, and it’s pretty dang good, if you ask me, considering there are basically only two factors, and this was supposed to be something pretty random.
For single seasons of qualified pitchers over the same span, it’s a correlation of 0.441 and an RMSE of 0.01581.
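For anyone who wants to check the error numbers, here’s a sketch of how RMSE is computed: square each miss, average the squares, and take the square root. Because of the squaring, a few big misses count for more than they would in a plain average difference. The function name and the three sample values are hypothetical, just for illustration.

```python
import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean square error between actual and estimated BABIP."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two equal-length, non-empty lists")
    # Square each gap so misses in either direction count the same
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up example: three pitchers' actual vs. estimated BABIP
actual = [0.300, 0.280, 0.290]
estimated = [0.290, 0.300, 0.290]
print(round(rmse(actual, estimated), 4))  # 0.0129
```

For comparison, the plain average miss in that example is 0.010, a bit smaller than the RMSE, because the one 0.020 miss gets extra weight.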
Of course, these are just two of the most important defense-independent factors. The impact of defense is undoubtedly pretty important, and the park has some influence as well.
Refresher on correlations: 1 means a perfect correlation (when one factor goes up, the other follows in a linear fashion), -1 means a perfect negative correlation (when one goes up, the other goes down), and 0 means no correlation (no apparent connection between the factors). And just because two factors are correlated doesn’t mean one causes the other. I do sort of imply some causal relationships below, though, when they make sense.
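If it helps, here’s a minimal sketch of the usual (Pearson) correlation, which is what Excel’s CORREL function returns; I’m assuming that’s the kind of correlation meant here. It’s written out longhand so you can see that it measures how well two columns line up on a straight line. The function name is just for illustration.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient, from -1 to 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style sum: positive when both columns move the same way
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / (spread_x * spread_y)


print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
print(round(pearson([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0
```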
You may be wondering what is correlated with a pitcher’s LD% and with how many popups they induce... here you go:
Strongest correlations to LD%:
KN%: -0.658 (percentage of knuckleballs thrown; more knucklers equals fewer line drives allowed)
O-Swing%: -0.483 (getting batters to chase pitches outside the zone leads to fewer liners hit)
XX%: 0.3465 (percentage of mystery pitches thrown... these are probably typically sliders that don’t slide much, curves that don’t curve, etc.; throwing more iffy pitches leads to more liners allowed)
Zone%: 0.3455 (pitchers who throw in the zone more get hit harder... the price they pay for allowing fewer walks)
GB%: -0.273 (groundball pitchers allow fewer liners; the same can’t be said about fly ball pitchers)
Now, some of the strongest correlations to infield popups:
GB%: -0.871 (percentage of grounders... not a shocker that ground ball pitchers get a lot fewer popups...)
Z-Contact%: -0.529 (how often hitters make contact with pitches thrown in the zone... less contact equals more popups)
SwStr%: 0.356 (percent of pitches swung on and missed; more misses equals more popups)
HR/FB: -0.302 (home runs per fly ball hit; fewer homers are tied to more popups... hitters aren’t squaring the ball up as well, or swinging with as much authority)
Swing%: 0.293 (when hitters are swinging more frequently, they’re popping it up more... overconfidence, swinging defensively, or what?)
Z-Swing%: 0.277 (specifically, swinging at pitches in the zone is what’s most connected; unlike with liners, O-Swing% has pretty much no correlation)
KN%: 0.208 (knuckleballs are good)
Zone%: 0.192 (pitching in the zone more leads to more popups)
There’s very little connection between LD% and popups, but somewhat surprisingly, pitchers who give up more liners also get more popups (only a 0.080 correlation, though). That probably has something to do with the popup pitchers being the more aggressive type, while the line drive preventers are more the nibbling type.
When I came up with the formula, by the way, it was right before Weaver had his awful game. His BABIP was standing at 0.225, whereas his xBABIP per my formula was 0.294. This year, he had been giving up more liners and getting fewer popups than usual, so that 0.225 looked very fluky (though having Trout and/or Bourjos in the OF helps him a lot, for sure). It’s now up to 0.233, so more correction could be on the way. Over his career, however, Weaver has a 0.276 BABIP (one of the best), and it’s no fluke, because his career xBABIP according to the formula is 0.271. The average for qualified pitchers was 0.292, by the way.
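Putting those Weaver numbers into code: the gap between actual BABIP and xBABIP is the luck signal. A negative gap means fewer hits are falling in than his batted-ball mix would predict, which is why the 0.225 looked fluky. The function name is hypothetical; the inputs are the figures quoted above.

```python
def babip_luck(actual_babip: float, expected_babip: float) -> float:
    """Actual minus expected BABIP.

    Negative: fewer hits dropping in than the batted-ball mix suggests
    (good luck, good defense, or both). Positive: the opposite.
    """
    return actual_babip - expected_babip


# Weaver figures quoted in the post
season_gap = babip_luck(0.225, 0.294)  # before the bad game
career_gap = babip_luck(0.276, 0.271)

print(f"season: {season_gap:+.3f}")  # season: -0.069
print(f"career: {career_gap:+.3f}")  # career: +0.005
```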
So, any questions, comments or criticisms?
You'd be surprised how quickly I did that in Excel -- just downloaded all the data from Fangraphs and ran some simple calculations on it.
Anyway, in a nutshell, this is a way to tell how lucky or unlucky a pitcher is in terms of how many balls are dropping in for hits. The modern thinking on it, basically, is that pitchers have next to no control over how many balls put into play get caught. I was always a bit skeptical about this, so I tried to see if I could figure out any patterns that might have been missed.
Looks like probably not many people are finding this down in OT, huh?
Let me know if there's anything you'd like me to explain further.
The formula for BABIP is:
(H - HR) / (AB - K - HR + SF)
It looks at the results of any AB where the ball is put in play, other than HRs (because the assumption is that the fielders couldn't have caught a HR, despite what you might have seen Mike Trout do). Sac flies get added back in because they're balls in play that don't count as at bats.
So Weaver, in this game, gave up 7 hits and 0 homers, with 28 at bats against him, 5 strikeouts, and no sac flies allowed.
The BABIP against him was: (7 - 0) / (28 - 5 - 0 + 0) = 0.304
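Same calculation as a small Python function, for anyone who'd rather not do it by hand. The function and parameter names are my own; the formula is the one above, and the sample line is Weaver's from this game.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """BABIP = (H - HR) / (AB - K - HR + SF)."""
    # Drop strikeouts and HRs (not fieldable); add back sac flies (in play, but not ABs)
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Weaver in this game: 7 H, 0 HR, 28 AB, 5 K, 0 SF
print(round(babip(h=7, hr=0, ab=28, k=5, sf=0), 3))  # 0.304
```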
A good game, but his BABIP is climbing closer to where it ought to be.
Interesting, but how does one distinguish a hard-hit groundball (more likely to get through the infield for a hit) from a softly hit one (more likely to be turned into an out)? Same goes for line drives: a looping liner versus a rocket off the bat. And fly balls: a can of corn to the outfield versus one hit deep enough to carry the fielder to the warning track or further.
All of these factor into how well a pitcher is really doing out there.
Thanks, exile. I probably will submit it to fangraphs, but I think it could use some refinement first.
If you (or anybody reading) can think of any way to make it better -- e.g. what to emphasize or de-emphasize -- that would be much appreciated.
Not all liners and grounders are created equal, true. I bet pitchers who give up more liners are getting hit harder in general, though, so they're probably giving up harder grounders as well.
Yeah, it would be nice to have something like "average velocity off the bat" listed for pitchers for that reason. Like exile says, the info is out there, but if there are any sites that display it, I'm not aware of them. For the time being, though, HR/FB (home runs per fly ball) is a bit of an indicator of how hard the pitchers are being hit. It's not very consistent, though... it could be more influenced by bad luck.
Meanwhile, I think it's very safe to say that pitchers who get a lot of popups are also getting more weak flyouts, even though that's not directly accounted for in the stats... I mean, the difference in bat position between a popup and a flyout is so small, I don't see how that connection couldn't exist.
Yeah, that's relevant, but tricky to incorporate, for me anyway. The data I got doesn't show how many games a pitcher pitched on each type of field, so it would be a ton of work to figure that out.
I'm sure it affects a groundball pitcher differently than a flyball pitcher -- I think a groundball pitcher on AstroTurf gives up more hits, but a flyball pitcher doesn't see much of that. Maybe a flyball pitcher sees more singles turning into doubles or triples, though, when balls quickly roll or bounce past the outfielders. But I don't think that modern AstroTurf is anywhere near as far off from grass as it used to be.
Some domes have a roof color that makes it hard for fielders to spot balls against it... a fly ball pitcher probably suffers there. But a fly ball pitcher loves a place like Oakland, where there's tons of foul territory to catch popups, and it's not easy to hit homers. Ours is a pretty good park for them too.
There are a ton of factors at play here, but I'm trying to keep it as manageable as I can.