-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathREADME.Rmd
379 lines (291 loc) · 14.9 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
---
output: github_document
always_allow_html: yes
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
```{r packages, include = FALSE}
library(pensionviewr)
library(tidyverse)
library(kableExtra)
library(reasontheme)
```
# pensionviewr
The goal of `pensionviewr` is to simplify the process of gathering and visualizing public pension plan data from the Reason pension database. This repo contains the functions of the `pensionviewr` package, which once installed locally, provides helpful functions for creating and exporting graphics made in ggplot in the style used by the Reason Pension Integrity Project team.
## Create Token:
To use devtools you'd need to authenticate yourself by
creating Personal Access Tokens (PAT):
- Obtain a PAT by typing `usethis::browse_github_pat()`.
Click "Generate token" and Copy to Clipboard the displayed string of 40 letters/digits.
- Find your .Renviron in your home directory by typing:
`usethis::edit_r_environ()`
- Put your PAT in your .Renviron file. Have a line that looks like this:
`GITHUB_PAT=8c70fd8419398999c9ac5bacf3192882193cadf2` (but use your own PAT instead)
- Save edited .Renviron file, and
- Lastly, restart R & check your PAT by typing:
`Sys.getenv("GITHUB_PAT")`
## Installing pensionviewr
`pensionviewr` is not on CRAN, so you will have to install it directly from Github using `devtools`.
If you do not have the `devtools` package installed, you will have to run the first line in the code below as well.
``` r
install.packages('devtools')
devtools::install_github("ReasonFoundation/pensionviewr")
```
## Using the functions:
* The package has seven functions for data pulling and preparation:
`planList()`,
`pullData()`,
`pullStateData()`,
`pullSourceData()`,
`filterData()`,
`masterView()`,
`loadData()`, and
`selectedData()`.
* The package has four functions for plots:
`glPlot()`,
`linePlot()`,
`areaPlot()`,
`barPlot()`, and
`savePlot()`.
* The package has Reason color palette:
`palette_reason` (e.g. `palette_reason$Orange`))
A basic explanation and summary here:
### `planList()`
1. `planList()`: returns a stripped down list of the pension plans in the database along with their state and the internal databse id.
Example of how it is used in a standard workflow:
```{r planList}
pl <- planList()
pl %>%
head() %>%
kable() %>%
kable_styling(full_width = FALSE, position = "left")
```
### `pullData()`
2. `pullData()`: pulls the data for a specified plan from the Reason pension database. `pullData` has two arguments:
`pullData(pl, plan_name)`
* `pl`: A dataframe containing the list of plan names, states, and ids in the form produced by the `planList()` function.
* `plan_name`: A string enclosed in quotation marks containing a plan name as it is listed in the Reason pension database.
Example of how it is used in a standard workflow:
The next step would be to load the data for the specific plan of interest. Let's use the Vermont State Retirement System as an example. Let's first see what plans in Vermont are available:
```{r Kansas}
VT <- pl %>% filter(state == 'Vermont')
VT %>%
kable() %>%
kable_styling(full_width = FALSE, position = "left")
```
The full plan name we are interested in is there listed as "Vermont State Retirement System". We can pull the data for it now:
```{r pullData}
vtsrs_data <- pullData(pl, plan_name = "Vermont State Retirement System")
vtsrs_data %>%
head() %>%
kable() %>%
kable_styling(full_width = FALSE, position = "left")
```
### `pullStateData()`
3. `pullStateData()`: pulls all state-level pension data in wide format from the Reason pension database. `pullStateData` has a single argument:
`pullStateData(fy)`
* `fy`: Starting fiscal year for the data pulled from the Reason pension database.
This could be the initial step for either making state-level pension analysis or then filter pulled data for a specific state plan.
Example of how it is used in a standard workflow:
```{r pullStateData}
state.data <- pullStateData(2001)
New_Mexico.data <- state.data %>% filter(state == 'New Mexico')
NMERB.data <- state.data %>% filter(display_name == "New Mexico Educational Retirement Board")
state.data <- state.data %>% filter(year == 2019)#filter for 2019 for display
state.data %>%
head() %>%
kable() %>%
kable_styling(full_width = FALSE, position = "left")
```
### `pullSourceData()`
4. `pullSourceData()`: pulls pension data for a specific plan, along with `data_source_name` column in wide format from the Reason pension database. `pullSourceData(pl,plan_name, fy)` has 3 arguments:
* `pl`: A dataframe containing the list of plan names, states, and ids in the form produced by the `planList()` function.
* `plan_name`: A string enclosed in quotation marks containing a plan name as it is listed in the Reason pension database.
* `fy`: Starting fiscal year for the data pulled from the Reason pension database.
Example how this function is used:
```{r pullSourceData}
NMERB.source <- pullSourceData(pl, "New Mexico Educational Retirement Board", 2001)
NMERB.source %>%
head() %>%
kable() %>%
kable_styling(full_width = FALSE, position = "left")
```
### `filterData()`
5. `filterData()`: filters existing data (data.frame/data.table format) keeping & renaming set of commonly used variables in pension analysis. `filterData(Data, fy, source = FALSE)` has 4 arguments:
* `Data`: A data table already pulled with `pullData`, `pullStateData`, or other ways.
* `fy`: Starting fiscal year for the data pulled from the Reason pension database.
* `employee`: Character designating type of employees covered (e.g. "teacher", "state", "local", "state and local", "police and fire").
* `source`: Set to `FALSE`. It should be set to `source = TRUE` if you filter data pulled with `pullSourceData` function.
Example of the workflow around the filtering function:
```{r filterData}
state.data <- pullStateData(2001)
filtered <- filterData(state.data, 2010)
filtered %>%
head() %>%
kable() %>%
kable_styling(full_width = FALSE, position = "left")
```
### `masterView()`
6. `masterView()`: allows to view list of mapped & unmapped columns in pension database per data source.
`masterView(source = NULL, expand = FALSE)` has 2 arguments:
* `source`: main data sources: "Reason", "Public Plans Database", "Census".
* `expand`: Set to `FALSE` (default) to see mapped variables. Change to `TRUE` if you want to see all unmapped variables.
Example of how to use this function:
```{r masterView}
view.reason <- masterView("Reason", FALSE)
view.reason %>%
head() %>%
kable() %>%
kable_styling(full_width = FALSE, position = "left")
```
### `loadData`
7. `loadData`: loads the data for a specified plan from an Excel file. `loadData` has one argument:
`loadData(file_name)`
* `file_name`: A string enclosed in quotation marks containing a file name with path of a pension plan Excel data file.
```
data_from_file <- loadData('data/NorthCarolina_PensionDatabase_TSERS.xlsx')
```
### `selectedData()`
8. `selectedData()`: selects the only the variables used in historical analyses. `selectedData` has one argument, `wide_data`, that is required:
`selectedData(wide_data)`
* `wide_data`: a datasource in wide format
Back to the Kansas Public Employees' example. That is a lot of variables. The `selectedData()` function selects only a handful of needed variables:
```
df <- selectedData(vtsrs_data)
df %>%
head() %>%
kable() %>%
kable_styling(full_width = FALSE, position = "left")
```
### `glPlot()`
9. `glPlot()`: creates the 'Gain/Loss' plot using a CSV file as an input. glPlot has two arguments:
`glPlot(filename, ylab_unit)`
* `filename`: a csv (comma separated value) file containing columns of gain loss category names with one row of values.
* `ylab_unit`: a string contained within quotation marks containing th y-axis label unit. The default value is "Billions"
Example of how it is used in a standard workflow:
`filename <- "data/GainLoss_data.csv"`
`glPlot(filename)`
### `linePlot()`
10. `linePlot()`: creates a line chart comparing up to 5 variables, such as ADEC vs. Actual contributions. `linePlot()` has 22 arguments, with `data` being required:
`linePlot(data, title, caption, grid, interactive, Ytitle, treasury, inv.returns, ticks, font, yaxisMin, yaxisMax, yaxisSeq, yaxisScale, format, str, labelY, lab1, lab2, lab3, lab4, lab5)`
* `data` a dataframe produced by the filterData function or in the same format.
* `title` naming the chart (e.g. "Unfunded Liability Growth").
* `caption` set to `TRUE` to add "reason.org/pensions" caption at the bottom right corner
* `grid` set to `TRUE` to add major gridlines
* `ticks` set to `FALSE` to remove ticks
* `interactive` set to `TRUE` to create interactive plot
* `Ytitle` naming the Y-axis title
* `treasury` if set to `TRUE` shows 30-Year treasury Yields on the secondary Y-axis
* `inv.returns` if set to `FALSE` allows to graph anything other than Investment Return graph
* `yaxisMin` minimum value for Y-axis
* `yaxisMax` maximum value for Y-axis
* `yaxisSeq` value that sets space between Major breaks.
* `yaxisScale` value that sets Y-axis scale. Example: 100 for percentages or 1/1000 for thousands.
* `format` format of Y-axis scale. Examples: "%", "$", or something else.
* `str` value that sets number of strings at which to cut legend text -- Defaulted to 20.
* `labelY` title of the Y-axis.
* `lab1` text label for the 1st variable - optional.
* `lab2` text label for the 2nd variable - optional.
* `lab3` text label for the 3rd variable - optional.
* `lab4` text label for the 4th variable - optional.
* `lab5` text label for the 5th variable - optional.
Example of how it is used in a standard workflow:
```
graph <- linePlot(data = PERSI.data, title = "", inv.returns = TRUE, treasury = FALSE,
interactive = TRUE, Ytitle = NULL,
font = "Calibri", yaxisMin = -21, yaxisMax = 21, yaxisSeq = 3,
yaxisScale = 100, format = "%", str = 60,
labelY = "",
lab1 = "Market Valued Returns (Actual)",
lab2 = "Actuarially Valued Investment Return (Smoothed by Plan)",
lab3 = "Assumed Rate of Return",
lab4 = "10-Year Geometric Rolling Average")
```
### `areaPlot()`
11. `areaPlot()`: creates Mountain of Debt or S&P500 chart. `areaPlot` has seven arguments, with `data` being required:
`areaPlot(data, title, caption, grid, ticks, sp500, font)`
* `data` a dataframe or data.table produced by the pullStateData function or in the same format.
* `title` naming the chart (e.g. "Unfunded Liability Growth").
* `caption` set to `TRUE` to add "reason.org/pensions" caption at the bottom right corner
* `grid` set to TRUE to add major gridlines
* `ticks` Set to `FALSE` to remove ticks
* `sp500` default is `FALSE` for Mountain of Debt chart. Set sp500 to `TRUE` to visualize annual S&P500 Index values on the secondary Y-axis
* `font` directly paste name of a font (e.g. "Calibri") to change the default font of the text
Example of how it is used in a standard workflow:
```
debt <- areaPlot(PERSI.debt, caption = F, grid = F, ticks = F, sp500 = F, font = "Calibri")
```
### `barPlot()`
12. `barPlot()`: creates bar charts for Compound Gain/Loss, Net Amortization, Stochastic, and ADEC vs. ERCs. `barPlot()` has 24 arguments, with `data` not being required for G/L or Stochastic charts:
`barplot(data, net.amo, contributions, compound.gl, gl.url, stochastic.alpha, Ymax, Ymin, Ybreak, yaxisScale, interactive, font, lab1, lab2, lab3, lab4, lab5, lab6, lab7, lab8, lab9)`
* `data`: data used for the bar chart (default it set to `NULL`)
* `net.amo`: Set to `TRUE` to graph Net Amortization bar chart
* `contributions`: Set to `TRUE` to graph ADEC vs. Actual Contribution bar chart
* `compound.gl`: Set to `TRUE` to graph Compound Change in Gain/Loss bar chart
* `gl.url`: Provide a url to GitHub or Dropbox file with Compound Gain/Loss data
* `str`: Length of legend text (default set to 25 characters),
* `stochastic`: Set to `TRUE` to graph Stochastic projections bar chart
* `stochastic.url`: provide url to the file with Stochastic data outputs
* `Ymax`: Set the Y-axis maximum value
* `Ymin`: Set the Y-axis minimum value
* `Ybreak`: Set the chucks for Y-axis breaks
* `yaxisScale`: Set the Y-axis scale (e.g., 1/1000000 – in $Millions)
* `stochastic.alpha`: Set the transparency for the blue color used for Stochastic bar chart (e.g., 0.95 - 5% transparent)
* `interactive`: Set to ‘TRUE` to create an interactive, as opposed to static, chart (using ggplotly)
* `lab1`: Gain/Loss chart Legend 1 (e.g., "Investment Experience")
* `lab2`: Gain/Loss chart Legend 2 (e.g., "Benefit Changes")
* `lab3`: Gain/Loss chart Legend 3 (e.g.,"Changes to Actuarial Methods & Assumptions")
* `lab4`: Gain/Loss chart Legend 4 (e.g., "Deviations from Demographic Assumptions")
* `lab5`: Gain/Loss chart Legend 5 (e.g.,"Negative Amortization")
* `lab6`: Gain/Loss chart Legend 6 (e.g., "Gains From Pay Increases Not Given")
* `lab7`: Gain/Loss chart Legend 7 (e.g., "Net Change to Unfunded Liability")
* `lab8`: Gain/Loss chart Legend 8 (e.g., NULL)
* `lab9`: Gain/Loss chart Legend 9 (e.g., NULL)
Example of how it is used in a standard workflow:
```
barPlot <- function(
data = NULL,
net.amo = FALSE,
contributions = FALSE,
compound.gl = TRUE,
gl.url = "https://raw.githubusercontent.com/ReasonFoundation/GraphicsR/master/PERSI_GL2.csv",
str = 25,
stochastic = FALSE,
stochastic.url = "https://raw.githubusercontent.com/ReasonFoundation/databaseR/master/files/perctest2.csv",
stochastic.alpha = 0.40,
Ymax = 25,
Ymin = 0,
Ybreak = 5,
yaxisScale = 1/1000000000,
interactive = FALSE,
font = "Calibri",
lab1 = "Investment Experience",
lab2 = "Benefit Changes",
lab3 = "Changes to Actuarial Methods & Assumptions",
lab4 = "Deviations from Demographic Assumptions",
lab5 = "Negative Amortization",
lab6 = "Gains From Pay Increases Not Given",
lab7 = "Net Change to Unfunded Liability",
lab8 = NULL,
lab9 = NULL)
```
### `savePlot()`
13. `savePlot()`: adds a source and save ggplot chart. `savePlot` takes five arguments:
`savePlot(plot = myplot, source = "The source for my data", save_filepath = "filename_that_my_plot_should_be_saved_to.png", width_pixels = 648, height_pixels = 384.48)`
* `plot`: The variable name of the plot you have created that you want to format and save
* `source`: The text you want to come after the text 'Source:' in the bottom left hand side of your side
* `save_filepath`: Exact filepath that you want the plot to be saved to
* `width_pixels`: Width in pixels that you want to save your chart to - defaults to 648
* `height_pixels`: Height in pixels that you want to save your chart to - defaults to 384.48
```
savePlot(debt_plot, source = "Source: KPERS", save_filepath = "output/test.png")
```
The BBC has created a wonderful data journalism cookbook for R graphics located here:
https://bbc.github.io/rcookbook/