Collegiate Linear Weights
Linear Weights, according to FanGraphs, is a “class of linear run estimators that we use to determine the relative values of particular events.” We use linear weights to calculate metrics like wOBA (weighted On-Base Average) and FIP (Fielding Independent Pitching). This article covers the linear weights behind both, as well as some others. Learn more about wOBA here and FIP here, both at the college level.
Why should we care about linear weights? To answer that question, I offer an analogy built on something everyone does: sleep. Let’s say two people each average 8 hours of sleep a night. They get the same amount of sleep, so they should be treated the same for analysis, correct? No. Although the two sleep the same number of hours per night, other factors matter. Age, weight, diet, and caffeine consumption, among others, all affect how well each person sleeps. In this sense, average hours of sleep is like batting average: it tells us how much of something we got, but not the quality of it. We in the baseball community know that not all hits are equal, but we should also know that a double isn’t twice as valuable as a single, a triple isn’t three times as valuable as a single, and so on. That’s where the aforementioned sleep factors come in. We know they affect the outcome in some way, but by how much? Linear weights attempt to answer that by assigning a run value to each event (non-intentional walks, HBP, 1Bs, 2Bs, 3Bs, home runs, and outs in this case, as well as some other weights, which I’ll discuss later).
To get an idea of what each linear weight looks like at the MLB level, here’s a screenshot of all linear weight values from 2006–2019. If you want to see more, check out the FanGraphs Guts page, as I will be doing an analysis with more of those linear weights.
First, let’s break down the linear weights of the events I mentioned earlier (walks, HBP, 1Bs, 2Bs, 3Bs, and HRs). Walks are valued between .69 and .70 estimated runs, hit by pitches between .72 and .73, singles between .87 and .89, and so on. Each value is multiplied by the number of times a player recorded that event; the products are then added together and divided by plate appearances minus intentional walks, and you get wOBA.
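As a minimal sketch of that arithmetic in Python: the weights and the stat line below are illustrative placeholders, not values from the Guts table. Only the BB, HBP, and 1B weights fall inside the ranges quoted above; the 2B, 3B, and HR weights are made up for the example, and `woba` is a hypothetical helper name.

```python
# Hypothetical weights: BB, HBP and 1B sit inside the ranges quoted above;
# the 2B, 3B and HR values are placeholders chosen only for illustration.
WEIGHTS = {"BB": 0.69, "HBP": 0.72, "1B": 0.88, "2B": 1.25, "3B": 1.60, "HR": 2.00}


def woba(counts, plate_appearances, intentional_walks=0):
    """Weighted sum of events divided by (PA - IBB), as described in the text.

    counts["BB"] should hold non-intentional walks only.
    """
    denominator = plate_appearances - intentional_walks
    if denominator <= 0:
        raise ValueError("need at least one non-IBB plate appearance")
    weighted_sum = sum(WEIGHTS[event] * n for event, n in counts.items())
    return weighted_sum / denominator


# Made-up stat line for one hitter.
line = {"BB": 30, "HBP": 5, "1B": 60, "2B": 15, "3B": 2, "HR": 10}
player_woba = woba(line, plate_appearances=300, intentional_walks=2)
print(round(player_woba, 3))
```

Swapping in a season's actual weights from a Guts table only requires replacing the `WEIGHTS` dictionary.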
You’re probably wondering how these values are calculated and what some of the other values mean, and I promise you we’ll be getting to that.
It’s great that we have this data at the major league level, and for most people that is enough, but what about the collegiate level? I’m not just talking about Division 1 college baseball, but all of NCAA baseball (which includes Division 2 and Division 3 baseball).
I decided to dive down that rabbit hole and calculate those weights at each level for however many years I could find. First, I needed some play-by-play data, which is less accessible than MLB’s play-by-play data.
Using Bill Petti’s baseballr package, I found that I could scrape play-by-play data from stats.ncaa.org for 2013 to the present across all three levels. Not every game in that span had play-by-play data, but a large majority did, which I am grateful for.
After making some adjustments to the play-by-play data (mostly adding runners on base, number of outs, ball in play location, etc., thanks largely to a project by Dave Miller, former Director of Baseball Analytics for the University of Hartford and current Red Sox analyst), I could begin the analysis.
Most of the methodology came from this FanGraphs article, so I will walk through it and provide examples at the college level so you, the reader, can follow along.
First, we need a run expectancy matrix. What is run expectancy? Run expectancy is the average number of runs scored from a given base-out state through the rest of the half-inning. Here’s an example. Let’s say the leadoff hitter reaches first to start a half-inning. The base-out state is runner on first, nobody out. Let’s say, for example’s sake, that this happens 100 times over a season, and that from that point through the end of those half-innings, 140 total runs were scored. In this example, the base-out state of runner on first, nobody out has a run expectancy of 140 / 100 = 1.4 runs. Do the same for the other base-out states and we have a matrix. Below are the Run Expectancy Matrices for the Division 1, Division 2, and Division 3 levels for the 2019 season, calculated by adapting the run_expectancy_code function in baseballr to the collegiate level and to games that do not go nine innings (7-inning games and doubleheaders are frequent at the college level).
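The averaging step can be sketched in a few lines of Python. This is not the baseballr code, only an illustration under assumptions: the record layout (bases, outs, runs scored through the rest of the half-inning), the `run_expectancy` name, and the base-state strings are all hypothetical.

```python
from collections import defaultdict


def run_expectancy(plate_appearances):
    """Average runs scored from each base-out state to the end of the half-inning.

    Each record is (bases, outs, runs_rest_of_inning), where `bases` is a string
    such as "1__" (runner on first) or "___" (bases empty).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for bases, outs, runs_rest in plate_appearances:
        totals[(bases, outs)] += runs_rest
        counts[(bases, outs)] += 1
    return {state: totals[state] / counts[state] for state in totals}


# The example from the text: runner on first, nobody out, reached 100 times,
# with 140 total runs scoring from that point to the end of those half-innings.
# The per-occurrence split (40 occurrences of 2 runs, 60 of 1) is made up.
sample = [("1__", 0, 2)] * 40 + [("1__", 0, 1)] * 60
matrix = run_expectancy(sample)
print(round(matrix[("1__", 0)], 3))  # 1.4
```

A full matrix has 24 entries (8 base states times 0, 1, or 2 outs), one per key in the returned dictionary.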
Now, I will use the above Division 1 Run Expectancy Matrix for the remaining examples. We have to understand how run expectancy shifts from one base-out state to another with each plate appearance. For example, say we have a runner on first and nobody out, and the next batter singles, leaving runners on first and second with 0 out. Run expectancy goes from 1.167 to 1.907, a change of +.74. Every plate appearance produces a change like this, positive or negative, depending on the starting and ending states.
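Here is a small sketch of that calculation using the two Division 1 values quoted above. The `runs_scored` term is the standard addition for plays on which a run crosses the plate (it is not spelled out in the text above), and `run_value` is a hypothetical helper name.

```python
def run_value(re_before, re_after, runs_scored=0):
    """Run value of a plate appearance: change in run expectancy plus any
    runs that scored on the play. If the play ends the half-inning,
    pass re_after=0.0."""
    return re_after - re_before + runs_scored


# Runner on first, 0 out -> runners on first and second, 0 out (D1, 2019).
single = run_value(re_before=1.167, re_after=1.907)
print(round(single, 3))  # 0.74
```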
So, going back to linear weights, we need to calculate the average run value of each of the events I mentioned (walks, hit by pitches, singles, doubles, triples, home runs). We do this by taking the total change in run expectancy across every occurrence of an event and dividing it by the number of occurrences. For a basic example, in the 2019 season there were 69,831 walks at the D1 level and the total RE value for walks was 21,647.09, which gives a value of .31 runs above average after rounding. We do the same for each event to get its runs above average value. We then need to scale these values so that wOBA looks like OBP, or On Base Percentage.
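A rough Python sketch of the per-event averaging follows. The `average_run_values` helper and the four-play sample are made up for illustration; only the walk totals at the end come from the text.

```python
from collections import defaultdict


def average_run_values(plays):
    """Mean change in run expectancy per event type.

    `plays` is an iterable of (event, run_value) pairs, one per plate appearance.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, value in plays:
        totals[event] += value
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}


# Tiny made-up sample, just to show the grouping.
sample = [("BB", 0.40), ("BB", 0.20), ("1B", 0.74), ("1B", 0.30)]
print(average_run_values(sample))

# Season totals from the text: 21,647.09 total runs over 69,831 D1 walks in 2019.
walk_value = 21647.09 / 69831
print(round(walk_value, 2))  # 0.31
```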
Here are the runs above average values for each event for the D1 season in 2019:
hit by pitch: .33
home run: 1.41
To put wOBA on a scale that looks like OBP, we need to treat an out as zero, so we add .33 runs to each event (which implies an out is worth about -.33 runs above average). That gives us run values relative to an out:
hit by pitch: 0.66
home run: 1.74
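The shift is a single addition per event. A short sketch with the two values listed above (the dictionary names are hypothetical):

```python
OUT_OFFSET = 0.33  # amount added to every event so that an out is worth zero

runs_above_average = {"HBP": 0.33, "HR": 1.41}
runs_relative_to_out = {
    event: round(value + OUT_OFFSET, 2)
    for event, value in runs_above_average.items()
}
print(runs_relative_to_out)  # {'HBP': 0.66, 'HR': 1.74}
```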
We’re almost there, so bear with me.
We multiply each value above by the number of times that event occurred during the season, add the products together, then divide by plate appearances (removing IBBs) to get the denominator. The league wOBA is scaled to match league OBP, again with intentional walks removed. The Division 1 wOBA in 2019 was .363 and the denominator was approximately .304. Divide the wOBA by the denominator and we get 1.194, after rounding. This is the wOBA Scale, which is used for two things: calculating weighted Runs Above Average (wRAA) and calculating the final linear weights. We multiply each of the weights above by the wOBA Scale and WE MADE IT! Linear weights!
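In code, the last two steps look roughly like this. It is a sketch that uses the rounded figures quoted in this article, so the resulting weights will not necessarily match a published table to the third decimal; the variable names are my own.

```python
league_woba = 0.363    # 2019 Division 1 league wOBA (scaled to league OBP)
unscaled_rate = 0.304  # weighted events per (PA - IBB), the "denominator"

woba_scale = league_woba / unscaled_rate
print(round(woba_scale, 3))  # 1.194

# Run values relative to an out, from the list above.
relative_to_out = {"HBP": 0.66, "HR": 1.74}

# Final linear weights: each out-relative value times the wOBA Scale.
linear_weights = {event: value * woba_scale for event, value in relative_to_out.items()}
for event, weight in linear_weights.items():
    print(event, round(weight, 3))
```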
Now the moment you all have been waiting for, the NCAA guts page. Data from 2013–2020:
I mentioned most of these values and what they mean, so we know what wOBA is, the wOBA Scale, and each of the weights, wBB, wHBP, w1B, w2B, w3B, wHR. Let me explain the values you may not know.
runSB: the run value of a stolen base. This is set at .2 every year (mimicking FanGraphs). It is used in calculating weighted Stolen Bases (wSB), which measures how valuable a player was in attempting stolen bases.
runCS: the (negative) run value of a caught stealing. In all cases, being caught stealing costs much more than a stolen base gains. It’s also used in calculating wSB, and it shows how many runs each caught stealing costs the runner’s team.
runsPA: Runs per plate appearance. Simply the number of runs scored at each level divided by the number of plate appearances.
runsWin: Runs per Win, a converter for turning something like wRAA into Wins Above Average (WAA), which is useful in calculating WAR (Wins Above Replacement).
cFIP: The constant used to adjust FIP, an ERA estimator that shows how well a pitcher does with what he has the most control over (walks, hit-by-pitches, strikeouts, and home runs).
runsOut: Runs per Out. The league runs divided by the league outs. Useful in calculating runCS.
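To show where several of these constants get used, here is a hedged sketch built on the formulas FanGraphs publishes for runCS, wRAA, and FIP. I am assuming the collegiate values plug into the same formulas; every input below is made up except the 2019 D1 league wOBA (.363) and wOBA Scale (1.194) from earlier, and the function names are my own.

```python
def run_cs(runs_per_out):
    """FanGraphs' published runCS: -(2 * runs per out + 0.075)."""
    return -(2 * runs_per_out + 0.075)


def wraa(woba, league_woba, woba_scale, plate_appearances):
    """Weighted Runs Above Average: ((wOBA - lgwOBA) / wOBA Scale) * PA."""
    return (woba - league_woba) / woba_scale * plate_appearances


def waa(runs_above_average, runs_per_win):
    """Convert runs above average into wins above average."""
    return runs_above_average / runs_per_win


def fip(hr, bb, hbp, k, innings, c_fip):
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + cFIP.

    `innings` must be true innings (e.g. 90.333 for 90 1/3), not 90.1 notation.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + c_fip


# Hypothetical hitter and pitcher; runs_per_out, runs_per_win and c_fip
# are placeholders, not values from the NCAA guts table.
hitter_wraa = wraa(woba=0.400, league_woba=0.363, woba_scale=1.194,
                   plate_appearances=250)
print(round(run_cs(runs_per_out=0.25), 3))
print(round(hitter_wraa, 1))
print(round(waa(hitter_wraa, runs_per_win=10.0), 2))
print(round(fip(hr=5, bb=20, hbp=4, k=80, innings=90.0, c_fip=4.0), 2))
```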
On to data validation: do these numbers make sense?
To validate this data, I compared it to MLB’s Guts! data since the integration era (1947–2019). I could have used more seasons, but I wanted enough data points to validate against without using too many.
I validated the data by measuring correlations between the variables. I asked questions like, what is the relationship between the league wOBA and the wOBA Scale? What is the relationship between the wOBA Scale and its components? What about runs per plate appearance and the wOBA Scale?
First let’s look at the relationship between league wOBA and the wOBA Scale.
What this graph shows is that there is a very strong relationship (measured by R², the coefficient of determination) between league wOBA and the wOBA Scale: as league wOBA increases, the wOBA Scale typically decreases.
Let’s compare that with each collegiate level.
The general trend fits the MLB data: as league wOBA increases, the wOBA Scale decreases. That said, a higher wOBA doesn’t always mean a lower wOBA Scale (see the last 3 years at the D3 level, where the wOBA is exactly the same but there are three different wOBA Scales).
Some other trends that I analyzed at the MLB level: a weak relationship between wBB and wOBA Scale (-.24), a very weak relationship between wHBP and wOBA (-.115), a moderately strong relationship between w1B and wOBA Scale (.481), and very strong relationships between w2B, w3B, and wHR and wOBA Scale (.903 for w2B, .961 for w3B, and .996 for wHR).
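For readers who want to reproduce this kind of check, here is a minimal sketch of a correlation calculation in plain Python. The four seasons of data are made up purely to exercise the function (they are not MLB or NCAA values), and `pearson_r` is a hypothetical helper; squaring its result gives R² for a simple linear fit.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up league wOBA and wOBA Scale values for four seasons.
league_woba = [0.310, 0.320, 0.330, 0.340]
woba_scale = [1.30, 1.26, 1.21, 1.18]

r = pearson_r(league_woba, woba_scale)
print(round(r, 3), round(r * r, 3))  # r is negative; r squared is R²
```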
To check out the source code and the rest of the graphs, go to my GitHub page.
Thanks for reading. Go out there and calculate wOBA at the collegiate level!
Other work I have done includes: