Beware, everybody: things are about to get super nerdy here.

I thought I’d stash this in the Off-Topic area, so it doesn’t get swallowed by a flood of angry threads about the team. I figure this is a topic that only a few of us will be interested in.

Some of you may remember that in the offseason I did a little bit of spreadsheet work on xBABIP (expected batting average on balls in play) for hitters. By the way, I have a new controversy-free formula there, but that’s another story. Just recently, I was inspired to look at pitchers and whether they have any influence on their BABIP, contrary to what Voros McCracken might say. I’m not fully up to date on the current state of research in this area, so I would appreciate any input.

So in a nutshell, what I’ve found is that, yes, there are some factors that pitchers influence that significantly affect their BABIP. Much like with batters, the two main factors are line drive percentage and the frequency of infield popups. This makes perfect sense to me – an infield popup is an automatic out, basically, and a line drive is going to have much greater odds of not being caught. Pitchers seem to have quite a bit more control over how many popups they get than how many line drives they allow, much like batters, but they definitely influence both significantly.

A very simple formula that does a pretty good job of estimating a pitcher’s BABIP against is:

0.4*LD% - 0.6*FB%*IFFB% + 0.237 = xBABIP

You can find those stats at FanGraphs, by the way.

LD% = Line Drive percentage

FB% = Fly Ball percentage

IFFB% = Infield Fly Ball percentage (which is the percentage of fly balls that are hit to the infield; when multiplied by FB%, it gives the total percentage of balls hit that are infield popups)
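For anyone who would rather plug numbers into a script than a spreadsheet, here is a minimal sketch of the formula above. It assumes the three rates are entered as fractions (0.19 for 19%), which is what the 0.237 constant seems to call for. The function name and the sample pitcher are made up for illustration.

```python
def xbabip(ld_pct, fb_pct, iffb_pct):
    """Estimate BABIP against from batted-ball rates given as fractions (0.19 = 19%)."""
    # FB% times IFFB% is the share of all batted balls that are infield popups
    popup_rate = fb_pct * iffb_pct
    return 0.4 * ld_pct - 0.6 * popup_rate + 0.237


# Hypothetical pitcher: 19% liners, 36% fly balls, 10% of those fly balls on the infield
estimate = xbabip(0.19, 0.36, 0.10)
print(round(estimate, 3))  # 0.291
```

If you are copying rates that are displayed as percentages (19.0 rather than 0.19), divide by 100 first.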

Not surprisingly, this formula works better for a pitcher’s career BABIP, based on career numbers: 0.637 correlation, with an RMSE of 0.00968 (root mean square error, roughly the typical size of the gap between the estimated and actual BABIP, with bigger misses weighted more heavily). That’s for qualified pitchers from 2002-2011, and it’s pretty dang good, if you ask me, considering there are basically only two factors, and this was supposed to be something pretty random.

For single seasons of qualified pitchers over the same span, it’s a correlation of 0.441 and an RMSE of 0.01581.
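If you want to check numbers like these yourself, here is a rough sketch of how a correlation and an RMSE can be computed from a list of estimated BABIPs and a list of actual ones. The four pitchers' values below are invented just to show the mechanics; they are not from my data set, and the function names are my own.

```python
import math


def rmse(predicted, actual):
    """Root mean square error: square the misses, average them, take the square root."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up xBABIP estimates and actual BABIPs for four pitchers
estimated = [0.285, 0.301, 0.278, 0.295]
actual = [0.290, 0.310, 0.270, 0.288]

print("correlation:", round(pearson(estimated, actual), 3))
print("RMSE:", round(rmse(estimated, actual), 5))
```

Because the misses are squared before averaging, one big miss moves the RMSE more than several small ones would.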

Of course, these are just two of the most important defense-independent factors. The impact of defense is undoubtedly pretty important, and the park has some influence as well.

Refresher on correlations: 1 means a perfect correlation (when one factor goes up, the other follows in a linear fashion), negative one means a perfect negative correlation (when one goes up, the other goes down), and 0 means no correlation (no apparent connection between the factors). And just because two factors are correlated doesn’t mean one causes the other. I do sort of imply some causal relationships below, when they make sense, though.

You may be wondering what is correlated with a pitcher’s LD% and with how many popups they induce... here you go:

Strongest Correlations to LD%:

KN%: -0.658 (percentage of knuckleballs thrown; more knucklers equals fewer line drives allowed)

O-Swing%: -0.483 (getting batters to chase pitches outside the zone leads to fewer liners hit)

XX%: 0.3465 (percentage of mystery pitches thrown... these probably are typically sliders that don’t slide much, curves that don’t curve, etc.; throwing more iffy pitches leads to more liners allowed)

Zone%: 0.3455 (pitchers who throw in the zone more get hit harder... the price they pay for allowing fewer walks)

GB%: -0.273 (groundball pitchers allow fewer liners; the same can’t be said about fly ball pitchers)

Now, some of the strongest correlations to infield popups:

GB%: -0.871 (percentage of grounders... not a shocker that ground ball pitchers get a lot fewer popups...)

Z-Contact%: -0.529 (how often hitters make contact with pitches thrown in the zone... less contact equals more popups)

SwStr%: 0.356 (percentage of pitches swung at and missed; more misses equals more popups)

HR/FB: -0.302 (home runs per fly ball hit; fewer homers are tied to more popups... hitters aren’t squaring the ball as well, or swinging with as much authority)

Swing%: 0.293 (when hitters are swinging more frequently, they’re popping it up more... overconfidence, swinging defensively, or what?)

Z-Swing%: 0.277 (specifically, swinging at pitches in the zone is what’s most connected; unlike with liners, O-Swing% has pretty much no correlation)

KN%: 0.208 (knuckleballs are good)

Zone%: 0.192 (pitching in the zone more leads to more popups)
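Lists like the two above can be put together by correlating each stat column against the target and sorting by the size of the correlation, ignoring the sign. Here is a small sketch of that idea; it is only an illustration, not necessarily how I got my numbers, and the five pitchers' values and the function names are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(columns, target):
    """Return (stat name, r) pairs, strongest absolute correlation first."""
    pairs = [(name, pearson(values, target)) for name, values in columns.items()]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Toy data for five hypothetical pitchers (all rates as fractions)
columns = {
    "GB%": [0.52, 0.45, 0.40, 0.38, 0.33],
    "SwStr%": [0.070, 0.085, 0.080, 0.100, 0.095],
}
popup_rate = [0.020, 0.030, 0.034, 0.041, 0.047]  # FB% times IFFB% for each pitcher

for name, r in rank_by_correlation(columns, popup_rate):
    print(f"{name}: {r:+.3f}")
```

Sorting on the absolute value keeps strong negative relationships, like GB% against popups, near the top instead of burying them at the bottom.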

There’s very little connection between LD% and popups, but somewhat surprisingly, pitchers who give up more liners also get more popups (only a 0.080 correlation, though). That probably has something to do with the popup pitchers being the more aggressive type, while the line drive preventers are more the nibbling type.

When I came up with the formula, by the way, it was right before Weaver had his awful game. His BABIP was standing at 0.225, whereas his xBABIP per my formula was 0.294. This year, he had been giving up more liners and getting fewer popups than usual, so that 0.225 looked very fluky (though having Trout and/or Bourjos in the OF helps him a lot, for sure). It’s now up to 0.233, so more correction could be on the way. Weaver, however, in his career has a 0.276 BABIP (one of the best), and it’s no fluke, because his xBABIP according to the formula is 0.271. The average for qualified pitchers was 0.292, by the way.

So, any questions, comments or criticisms?