The Data Science Revolution in Baseball
How Data Analytics and Sabermetrics Upended the 2010 Cy Young Award
It all started with a book. When Bill James, a baseball-obsessed statistician, self-published The Bill James Baseball Abstract in 1977, the data analytics pioneer began a sabermetric revolution in Major League Baseball that still resonates today.
At the time, most baseball writers worried about who won a given game, how many strikeouts a pitcher recorded, how many walks he gave up and how many home runs Willie Mays hit that week. James was interested in a deeper dive. He used a small number of data points to create stats like the pitcher-catcher combination that gave up the most stolen bases, a fact that most MLB general managers considered useless at the time. Now it's a stat debated in contract negotiations and among amateur baseball analysts on social media.
If you're interested in studying data science and its impact on the world around us, baseball is an excellent example of how big data allows people to comprehend the complexities and nuances of what was once considered a simple game.
Take pitching statistics. At one point, the only thing that mattered to most fans was the number of wins a pitcher racked up by the end of the year. Unfortunately, while wins are simple to count, they don't always tell the whole story of a pitcher’s season. Not even ERA (earned run average) can properly break down a pitcher's value to an organization.
The Year Efficiency Took the Trophy
When Felix Hernandez won the Cy Young Award for the Seattle Mariners in 2010, he did so with a paltry 13-12 record. There was a debate leading up to the announcement that pitted old-school baseball heads, including scouts and coaches, against the new Moneyball-era analysts who looked at advanced stats to drive discussions around award winners.
Statisticians understood that Hernandez’s WHIP (walks plus hits per inning pitched), WAR (wins above replacement) and RAR (runs above replacement) added up to an elite season that no one else could match that year. However, while King Felix’s numbers were impressive, many journalists argued that wins were the true marker of a pitcher’s worth, and David Price and CC Sabathia, the two runner-up candidates, had winning percentages of .750 or better (Price went 19-6, Sabathia 21-7).
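To make the arithmetic behind two of these numbers concrete, here is a minimal Python sketch of WHIP and winning percentage. The function names are illustrative, not from any baseball library, and the WHIP stat line is made up for demonstration; only the 13-12 record comes from this article. The sketch also assumes the usual box-score convention for innings pitched, where the digit after the dot counts outs rather than tenths.

```python
def innings_to_float(ip: str) -> float:
    """Convert box-score innings notation to true innings.

    Assumes the common convention that the digit after the dot counts
    outs, so "7.1" means 7 innings plus 1 out (7 + 1/3 innings).
    """
    whole, _, outs_text = ip.partition(".")
    outs = int(outs_text) if outs_text else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + outs / 3


def whip(walks: int, hits: int, ip: str) -> float:
    """Walks plus hits per inning pitched."""
    innings = innings_to_float(ip)
    if innings == 0:
        raise ValueError("WHIP is undefined for zero innings pitched")
    return (walks + hits) / innings


def win_pct(wins: int, losses: int) -> float:
    """Wins divided by total decisions (wins plus losses)."""
    decisions = wins + losses
    if decisions == 0:
        raise ValueError("winning percentage is undefined with no decisions")
    return wins / decisions


# The 13-12 record is the one cited in this article.
print(f"{win_pct(13, 12):.3f}")  # 0.520

# Hypothetical single-game line, for illustration only:
# 2 walks and 5 hits over 7 full innings.
print(f"{whip(walks=2, hits=5, ip='7.0'):.3f}")  # 1.000
```

A lower WHIP means fewer baserunners allowed per inning, which is why it can flatter a pitcher whose win total does not.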
It was data science that revealed the real value Felix Hernandez provided his team, even if his winning percentage couldn’t compare to that of Price or Sabathia, who both put up impressive numbers in 2010.
Consider the defense behind Hernandez. King Felix was plagued for most of his career by below-average to average defenses, and the 2010 season was no different. The Mariners' RA9def (a measurement of defensive performance compared to league average) while Hernandez was on the mound was 0.07. Compare that to David Price (0.31) and CC Sabathia (0.27) and it’s easy to see that Hernandez got the least defensive support of the three. While all three numbers are positive, signifying above-average defense, many analysts believed the Mariners' inability to match elite-level pitching with elite-level fielding explained the .520 winning percentage Hernandez put up that season.
Sabathia, Price and Hernandez are all Hall of Fame-level pitchers who dominated the 2010 season. Sabathia won the Cy Young Award in 2007 and played a pivotal role in the Yankees’ World Series victory in 2009. David Price went on to win the Cy Young in 2012 and helped lead the Boston Red Sox to a World Series title in 2018.
With that level of firepower vying for the Cy Young, it took sabermetrics to help Felix Hernandez, whose team didn’t even make the playoffs, take home the award for best pitcher in the American League.
Data Analytics for the Win
As you can see, the numbers don't always speak for themselves. It takes data science professionals who understand how to derive true value out of a vast quantity of statistics for people to truly match the stats with what their eyes see on the field.
It doesn't take a baseball mind to see how important advanced statistics are to Major League Baseball and almost any other organization that struggles to comprehend big data. That's why so many employers are searching for trained and qualified data science professionals.
If you're interested in a new career, one that is analytical and research-driven, INE's Data Science training is an affordable option. Get started today and you'll learn from experts, back up your knowledge with practical exercises and build a resume-enhancing GitHub portfolio with your new skills.
Data Science Week is just getting started. Don't miss out.