---
title: "Foreign Travel: About the Data"
author: "@ryanes"
date: "February 17, 2017"
output: html_document
---

If you haven't already, please review our readme to learn more about this project and how to contribute.

This document links to the original sources of the data used in the Data for Democracy/ProPublica repository official-foreign-travel. It also describes each dataset and gives context for it, including the data cleaning methods used. It will be updated as new datasets are introduced.

Dataset: Foreign Travel Reports

The original datasets can be downloaded from the Office of the Clerk.

From Derek Willis of ProPublica:

House Official Foreign travel reports, which are published quarterly by the House Clerk, are produced either by committees or delegations that are not committee-sponsored. They contain the name of each traveler, arrival and departure dates, the destination, three spending categories (per diem, transportation and other) along with a grand total of money spent (usually in US dollars).

For committee trips, the name of the committee is in the line beginning REPORT OF EXPENDITURES FOR OFFICIAL FOREIGN TRAVEL in the files. Those without a committee might contain DELEGATION or an individual's name.
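As a rough illustration of the header rule above, the following Python sketch finds the header line in a report's text and labels it. It is not the logic used in scraper.py. The function name `classify_report_header`, the label strings, and the assumption that the committee or delegation name follows the fixed phrase on the same line are all hypothetical.

```python
HEADER_PREFIX = "REPORT OF EXPENDITURES FOR OFFICIAL FOREIGN TRAVEL"


def classify_report_header(text):
    """Return (kind, remainder) for the first header line found, else None.

    Assumption: whatever follows the fixed phrase on the header line names
    the committee, delegation, or individual. Real files may differ.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.upper().startswith(HEADER_PREFIX):
            continue
        # Drop the fixed phrase and any punctuation that separates it
        # from the rest of the header.
        remainder = stripped[len(HEADER_PREFIX):].strip(" ,.")
        upper = remainder.upper()
        if "COMMITTEE" in upper:
            kind = "committee"
        elif "DELEGATION" in upper:
            kind = "delegation"
        else:
            # Hypothetical catch-all, e.g. a header with an individual's name.
            kind = "individual_or_other"
        return kind, remainder
    return None
```

Checking for COMMITTEE before DELEGATION is a design choice made for this sketch; a header that contains both words would be labeled as a committee report.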

Caveats: in some cases, the destination is a continent, not a country. This usually happens for trips paid for by the Intelligence Committee. Lawmakers are typically identified by the prefix "Hon" before their names. There could be amended reports, meaning substantially duplicative information would occur. To the extent we can identify those cases, we want to retain the most recent report.
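Two of these caveats lend themselves to a short sketch: spotting the "Hon" prefix, and keeping only the most recent copy of a trip when a report has been amended. The Python below is a minimal illustration and does not reproduce scraper.py. The row keys (`traveler`, `arrival`, `departure`, `destination`, `filed`) are hypothetical, and so is the idea that a trip is identified by the first four of them and that each row carries a comparable filing date.

```python
import re

# Matches "Hon" or "Hon." followed by whitespace at the start of a name.
HON_PREFIX = re.compile(r"^\s*Hon\.?\s+", re.IGNORECASE)


def split_lawmaker(name):
    """Return (is_lawmaker, name_without_prefix)."""
    match = HON_PREFIX.match(name)
    if match:
        return True, name[match.end():].strip()
    return False, name.strip()


def keep_latest_reports(rows):
    """Keep one row per trip, taken from the most recently filed report.

    Assumption: each row is a dict with the hypothetical keys named below,
    and "filed" values can be compared (for example datetime.date objects).
    """
    latest = {}
    for row in rows:
        key = (row["traveler"], row["arrival"], row["departure"], row["destination"])
        if key not in latest or row["filed"] > latest[key]["filed"]:
            latest[key] = row
    return list(latest.values())
```

An amended report that changes a traveler's name or dates would not match the original under this key, so cases like that would still need manual review.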

The data-cleaning scripts are a work in progress. scraper_report_text.py downloads the report text files from the server. scraper.py cleans the data and writes the output, which is stored on our data.world page and can also be read in using this link.

To keep things consistent, please use links to our data.world page in your scripts whenever possible.