-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathReport.Rmd
224 lines (168 loc) · 13.9 KB
/
Report.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
---
title: |
| \vspace{5cm} Balls, Strikes, and Controversies:
| The Career of Angel Hernandez
author: |
| Peter D. DePaul III
date: "05-29-2024"
abstract: ""
bibliography: project_files/references.bib
header-includes:
- \usepackage[usenames,dvipsnames]{xcolor}
- \usepackage[table]{xcolor}
- \usepackage{booktabs}
output:
bookdown::pdf_document2:
fig_width: 12
toc: no
number_sections: true
linkcolor: blue
urlcolor: blue
citecolor: blue
link-citations: yes
editor_options:
markdown:
wrap: 72
---
\newpage
```{=latex}
\hypersetup{linkcolor=black}
\setcounter{tocdepth}{4}
\tableofcontents
\hypersetup{linkcolor=blue}
```
\newpage
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(reticulate)
library(tidyverse)
library(kableExtra)
source_python("report.py")
```
# An Honest Review of MLB Umpires
Today was a historic day in the MLB. Angel Hernandez has finally retired. Hernandez has had quite the career as an umpire, I can't say it was a great career but for sure it was historic. However I wanted to look further beneath the surface to see if there's any truth to him being the "worst" umpire in the MLB. I will utilize data sourced from the UmpScorecards [@umpscorecards] project to assess who is objectively the worst umpire in the MLB.
# A note on UmpScorecards Methodology
For a better understanding of the UmpScorecards methodology here are some useful links to UmpScorecards resources:
- [Explainers](https://umpscorecards.com/info/#explainers)
- [FAQs](https://umpscorecards.com/info/#faqs)
- [Glossary](https://umpscorecards.com/info/#glossary)
# Table of Variables
```{r}
variables <- data.frame(
Variable = c("Date", "Umpire", "Home", "Away", "R [H]", "R [A]",
"PC", "IC", "xIC", "CC", "xCC", "CCAx", "Acc", "xAcc", "AAx",
"Con", "Fav [H]", "totRI"),
Description = c(
"Date of the game", "Name of the umpire",
"Home team", "Away team", "Runs scored by the home team",
"Runs scored by the away team", "Pitches called", "Incorrect calls",
"Expected incorrect calls", "Correct calls", "Expected correct calls",
"Correct calls above expected", "Accuracy", "Expected accuracy",
"Accuracy above expected", "Consistency", "Favor to home team",
"Total run impact"
),
Type = c(
"Date", "Character",
"Character", "Character", "Integer",
"Integer", "Integer", "Integer",
"Integer", "Integer", "Integer",
"Numeric", "Numeric", "Numeric",
"Numeric", "Numeric", "Numeric",
"Numeric"
)
)
var_tbl <-
variables %>%
kbl(caption = "Table of Variables", booktabs = TRUE,
format = "latex") %>%
kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
stripe_color = "#e6e6fa", font_size = 12) %>%
row_spec(0, bold = TRUE, align = "c")
var_tbl
```
# A Reflection on Notable Angel Hernandez blunders
As a Phillies fan, I have quite a few fond memories of him over the last 3 seasons.
In April 2022, my favorite funny moment in baseball happened when Kyle Schwarber had a tantrum after a missed Angel Hernandez call. In a game against the Brewers when the Phillis were losing by 1 run in the bottom of the 9th Kyle Schwarber found himself in a full count. Josh Hader deals him a pitch low and outside which Hernandez incorrectly calls a ball. Hernandez had an incosistent strike zone all night, and Schwarber could not contain himself anymore. With 2 hands he overhead spikes his bat on the ground, then throws his helmet, and starts yelling at Hernandez. He points to both sides of the plate referring to the outside zone Hernandez had been missing the whole night. This resulted in Schwarber's ejection but it will forever be burned into my mind as an example of what makes people desire automated umpires.
One I'll never forget is a confrontation with Bryce Harper from the 2023 season. It was 1-1 with a full count in the bottom of the 3rd against the Pittsburgh Pirates. Bryce is dealt a pitch low and out of the zone that he checks his swing on, and Angel Hernandez makes a notable blunder saying Bryce goes around. Harper lost his mind on Hernandez's terrible call, and was promptly ejected from the game as a result.
# Cleaning the Data
I conducted the data cleaning process utilizing the `pandas` [@pandas] package in Python (script - `report.py`). I then utilized the `reticulate` [@reticulate] package in R to access the objects featured in this RMarkdown report, while utilizing `tidyverse` [@tidyverse], `bookdown` [@bookdown], and `kableExtra` [@kableExtra] to format and produce the PDF of this report.
```{r umpData}
ump_data <-
head(summary, 10) %>%
kbl(caption = "Overall Umpire Data", booktabs = TRUE,
format = "latex") %>%
kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
stripe_color = "#e6e6fa", font_size = 12) %>%
row_spec(0, bold = TRUE, align = "c")
ump_data
```
The data displayed above in Table \@ref(tab:umpData) is a snippet of the data downloaded from UmpScorecards, and is a summary of the overall game-by-game data. This includes totals for the columns PC through xCC, and averages for the columns CCAx through totRI.
# Angel Hernandez's Stats
```{r angelTable}
angel_tbl <-
angel %>%
kbl(caption = "Angel Hernandez Data", booktabs = TRUE,
format = "latex") %>%
kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
stripe_color = "#e6e6fa", font_size = 12) %>%
row_spec(0, bold = TRUE, align = "c")
angel_tbl
```
As we can see in Table \@ref(tab:angelTable) Angel Hernandez called over 247 games since the 2015 season, when UmpScorecards data begins, and this includes 34,592 pitches called in these games.
As a result I will arbitrarily constrict this to Umpires who have called within about 5,000 pitches less than Angel Hernandez, so we will use only umpires with $PC \geq 29,500$
# So how bad is Angel Hernandez truly?
The best part about Angel Hernandez's possibly forced retirement from umpiring is seeing the rejoice from the fans of Major League Baseball, yet the fairly opposite reactions of other umpires. The best example is of fellow bad umpire Joe West who commented on a radio talk show, "I know the lawyer that handled his case, and I know that when they went through everything that he was graded on, he was in the top 20 percent [of umpires]".
If I'm being honest I have no idea who is letting a lawyer speak on the matter of Umpire performance, that does not make any sense. I would love to know what the statisticians and data scientists who analyzed Hernandez's calls came to a conclusion on. Another thing that bothered me is Joe West's criticism of UmpScorecards and in general review of umpires calls utilizing technology after the game. West stated "I hate the fact that these people that sit behind a desk and get behind a computer and send out all these social media things, they don’t know what they’re talking about."
I find this funny because Joe West sounds like every other scared, low-performance umpire because after years of ineptitude technology is beginning to elucidate his flaws to the baseball community. I will continue to focus primarily on Joe West and Angel Hernandez from here on out.
Below is the full list of umpires ranked upon a few selected columns ranked using percentiles:
- The `avg_pct` column is an average of the other `_pct` columns with the exception of `Fav_pct`. This aims to measure overall performance of the umpires based on the average percentile of their rankings in the following categories:
- Accuracy (% of Correct Calls)
- Consistency
- AAx (Accuracy above Expected Accuracy)
- IC_per_100 (Incorrect Calls per 100 pitches called)
- totRI (total Run Impact)
```{r umpsTable}
seasoned_umps_tbl <-
head(seasoned_umps, 10) %>%
select(Umpire, GC, c(IC_per_100:Fav_pct)) %>%
kbl(caption = "Experienced Umpires Data", booktabs = TRUE,
format = "latex") %>%
kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
stripe_color = "#e6e6fa", font_size = 14) %>%
row_spec(0, bold = TRUE, align = "c")
seasoned_umps_tbl
```
In Table \@ref(tab:umpsTable) we see a dataframe containing the Percentile rankings for the columns with the exception of the `Fav_pct` which is simply the numerical rank in the Favor of calls for an umpire. A higher `Fav_pct` indicates the absolute value of the `totalRI` of the umpire was closer to a perfect favor of calls which is equal to 0.
As we can see from this Table of the Experienced Umpires with greater than 29,500 Pitches Called (`PC`), Will Little, Pat Hoberg, and Ben May have been the top 3 overall umpires based on Accuracy (`Acc`), Consistency (`Con`), Accuracy Above expected (`AAX`), Incorrect Calls per 100 Pitches Called (`IC_per_100`), and total Run Impact (`totRI`)
# The Angel and Joe Show
```{r angelJoe}
angel_joe_tbl <-
angel_joe %>%
kbl(caption = "Angel Hernandez and Joe West", booktabs = TRUE,
format = "latex", ) %>%
kable_styling(latex_options = c("HOLD_position", "striped", "scale_down"),
stripe_color = "#e6e6fa", font_size = 14) %>%
row_spec(0, bold = TRUE, align = "c")
angel_joe_tbl
```
Table \@ref(tab:angelJoe) is a dataframe containing specifically the ranks of the umpires Angel Hernandez, and Joe West who we will be focusing on for analysis.
## General Overview of their performance
As we can see from the above rankings Angel Hernandez and Joe West are low-performing umpires. To be clear, I'm utilizing the data available to the general public and this is not an opinion piece. From our rankings we are able to see that both of these umpires are in the bottom 20% of umpire performance.
Overall out of 56 MLB umpires with at least 29,500 Pitches Called (`PC`) since UmpScorecards data is available in 2015 MLB Season, Angel Hernandez and Joe West are the 46th and 52nd best performing MLB umpires respectively.
## Discussing their Respectives Flaws
### Angel Hernandez
Let's start with Angel Hernandez. Angel's worst traits when it comes to umpiring are his Consistency and his total Run Impact in the games he has umpired. He ranked in the 7th percentile for Consistency, and the 11th percentile for total Run Impact. These are both dreadful performances, and I believe these are the primary reasons why people were constantly on Hernandez's back and criticizing him more severely than other umpires.
Hernandez underperforms as an umpire but time and time again he does this in high leverage, and impactful situations within the games. As a fan this provides me a better understanding of why he's so widely disliked, recall the Phillies situations I mentioned earlier.
### Joe West
As for Joe West, he is noticeably worst than Hernandez in every category with the exception of Consistency and incorrect calls per 100. The two statistics I want to highlight about West is that he is the 1st percentile for both Accuracy and total Run Impact. Since 2015, Joe West has been the least accurate umpire and the most negatively impactful umpire to terms in terms of runs. I don't believe there's much more needed to be said about Joe West to understand his poor performance.
## Giving Credit where it's Due
I want to remark that the one thing I find statistically remarkable about Angel Hernandez during his umpiring career is that Hernandez ranks in the 94th percentile of umpires for the `Favor [H]` variable. This is good for 4th best among all umpires.
This variable measure the number of runs in favor of the home team where a positive value represents a benefit for the home team, while a negative value represents a benefit for the away team. The ranking was calculated using the absolute values of the `Fav [H]` column, where the closer a value is to 0 then the closer it's percentile is to 1.
I'm interested in this variable because despite Hernandez's lack of consistency and rather low accuracy on Pitch Calls he is usually hurting both teams about equally. While the impact of favor typically doesn't more than around 0.1 runs per game in either direction, often times you will see umpires who favor the home/away teams consistently. I have to admit Angel Hernandez did a great job at being unbiased in his bad calls, a lot of respect for that.
Joe West I can't say the same for you, you were still mediocre at your best being in the 52nd percentile of umpires for Favor.
# Conclusion
All in all Angel Hernandez has historically been a low-end performing umpire since 2015. There's no argument around it, from the objective data we have on umpires he has under performed compared to the rest of the MLB umpires. As for Joe West and the comments he made regarding Angel Hernandez, I have no idea what he is talking about or what statistics he is referencing. Obviously he is wrong in saying Angel Hernandez is top 20% of MLB Umpires perhaps he meant the bottom 20% and simply mispoke.
I would urge Joe West, regardless of his lengthy experience as an MLB umpire should probably remain quite on situations like this. Yes Joe West was an umpire for 43 seasons, but from the 6 years of games we have from him he should never discuss the performance of other umpires. Quite frankly he will probably only draw further criticism to himself with this statement, as in our rankings West was the 5th worst umpire in all of the MLB since 2015.
Furthermore, I hope that the data presented in this report could be utilized as a general ranking for monitoring the performance of umpires. The MLB Umpire Association is a controversial union, as they operate under near immunity from being fired due to performance. This promotes a "god complex" within the umpire community, and I believe they should be held responsible for their performance. Perhaps it could promote a possible model of Umpire relegation similar to soccer leagues. For example, if you are at or below the 20th percentile of overall umpire performance like Hernandez and West for an individual then you will be relegated for the next season to the Minor League Baseball (MiLB) system, and an MiLB umpire would be promoted likewise.
\newpage
# Bibliography