Add daily Federal Tax Revenue scraper and resulting data files #79
Thanks @benarthur91 - this scraper looks good. The output format we have been trying to conform to is a table with these columns:
We have usually been dumping into a CSV for testing, and will push directly to a Postgres database once this is integrated. Do you want to try building the table parser for those raw text files as well?
Definitely! I'll get started.
I'm moving into the testing phase with this parser. I expect to update the pull request sometime over the weekend. I've noticed the file contains a variety of metrics, and adequately describing them in the standard schema may be difficult. For example, a label that uses all available descriptions might look something like:
It may be helpful to either:
I'll continue to build the parser to write fully qualified names in the CSV, but let me know if you have any input on this.
I think we should avoid making any changes to the database schema for this specific data source. We want to keep things as general as possible. On the "this month to date" specifier, we actually only need to scrape the raw daily value. Something like "this month to date" would be calculated on the graphing and visualization side. Also think about what makes sense to keep as a single table. For example, in this case it might make sense to separate deposits and withdrawals into their own tables.
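To illustrate why only the raw daily value needs to be scraped, here is a minimal sketch of how a "this month to date" figure could be derived downstream from daily rows. The function name `month_to_date` and the `(date, metric, value)` row shape are assumptions made for illustration, not part of the project's schema.

```python
from collections import defaultdict
from datetime import date


def month_to_date(daily_rows):
    """Derive running month-to-date totals from raw daily values.

    daily_rows is an iterable of (date, metric, value) tuples; this row
    shape is hypothetical and chosen only for the example.
    """
    totals = defaultdict(int)
    result = []
    # Sort by date so each running total accumulates in calendar order.
    for day, metric, value in sorted(daily_rows):
        key = (day.year, day.month, metric)  # totals reset each month
        totals[key] += value
        result.append((day, metric, totals[key]))
    return result


rows = [
    (date(2017, 11, 2), "Deposits", 5),
    (date(2017, 11, 1), "Deposits", 10),
    (date(2017, 12, 1), "Deposits", 7),
]
mtd = month_to_date(rows)
```

Because the running total is keyed by year, month, and metric, the same function would work whether deposits and withdrawals live in one table or in separate ones.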
Currently the parser grabs all the data for Today and writes it into a single file (one output file per source file). If it's possible to populate multiple DB tables per source file, I can look into separating the output into multiple files.
I think that's a good idea. Given how many different metrics are in each file, we will likely want a configurable way to filter specific metrics and map them to a specific database. Also, when we push a certain set of metrics to the database, data from different days will all be in the same table. This is looking good!
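As a rough sketch of what configurable filtering and mapping could look like, the snippet below routes parsed rows to named outputs based on a mapping of output name to metric names. The `CONFIG` layout, the metric names, and the `route_rows` helper are all hypothetical and are not taken from the parser in this pull request.

```python
# Hypothetical config: output name -> fully qualified metric names it receives.
CONFIG = {
    "deposits": ["Deposits: Example Metric A", "Deposits: Example Metric B"],
    "withdrawals": ["Withdrawals: Example Metric C"],
}


def route_rows(rows, config):
    """Group rows by target output; rows with unlisted metrics are dropped."""
    # Invert the config so each metric name points at its output.
    metric_to_output = {}
    for output_name, metrics in config.items():
        for metric in metrics:
            metric_to_output[metric] = output_name

    routed = {output_name: [] for output_name in config}
    for row in rows:
        target = metric_to_output.get(row["metric"])
        if target is not None:
            routed[target].append(row)
    return routed


rows = [
    {"date": "2017-11-01", "metric": "Deposits: Example Metric A", "value": 10},
    {"date": "2017-11-01", "metric": "Withdrawals: Example Metric C", "value": 4},
    {"date": "2017-11-01", "metric": "Unlisted Metric", "value": 99},
]
routed = route_rows(rows, CONFIG)
```

Since each row carries its own date, rows from different days can be appended to the same output without changing the mapping.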
I've added a config file for the parser. This allows the caller to specify the names of the files that should be output, and which data fields should go in which file. The metric name is still the "fully qualified" attribute name as mentioned above. Quick guide to configuration:
One thing I noticed is that some of the values in the source file are asterisks rather than integers, and at the bottom is a note explaining the asterisk: "Statutory debt limit is temporarily suspended through December 8, 2017". How should the parser handle these?
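The thread does not settle this question. One possible option, sketched below, is to treat an asterisk as a missing value so it can be written as an empty CSV field or a database NULL. The `parse_value` name is hypothetical, and the handling of thousands separators is an assumption about the source files rather than something confirmed here.

```python
def parse_value(raw):
    """Convert a raw table cell to an int, or None if it is an asterisk.

    Assumes (not confirmed by the source files) that numbers may contain
    commas as thousands separators.
    """
    text = raw.strip().replace(",", "")
    if text == "*":
        # Footnoted placeholder, e.g. a suspended limit: no numeric value.
        return None
    return int(text)


values = [parse_value(cell) for cell in ["1,234", " * ", "56"]]
```

Returning `None` keeps the row in the output while making it clear that no number was reported, rather than silently substituting zero.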
#74