A small example of an ETL process using The Guardian API.

leoalmeidasant/theguardian-etl

TheGuardianETL

This project extracts articles from the technology section of The Guardian API and stores them locally as CSV files.
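The extract and load steps could look roughly like the sketch below. It is not the code in this repository: the function names, the injected `get_json` helper, and the choice of CSV columns are assumptions made for illustration. The query parameters (`section`, `from-date`, `to-date`, `page-size`, `page`, `api-key`) and the `response.pages` / `response.results` fields are those of the Guardian content search endpoint.

```python
import csv
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Columns kept in the CSV; this selection is an assumption for illustration.
FIELDS = ["id", "webPublicationDate", "webTitle", "webUrl"]


def http_get_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract(uri, api_key, section, from_date, to_date, page_size=200,
            get_json=http_get_json):
    """Yield every result in the date range, following the API's paging."""
    page, pages = 1, 1
    while page <= pages:
        query = urlencode({
            "section": section,
            "from-date": from_date,
            "to-date": to_date,
            "page-size": page_size,
            "page": page,
            "api-key": api_key,
        })
        body = get_json(uri + "?" + query)["response"]
        pages = body["pages"]  # total page count reported by the API
        yield from body["results"]
        page += 1


def load(rows, path):
    """Write the selected fields of each row to a CSV file; return the row count."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops any keys that are not listed in FIELDS.
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

Passing `get_json` as a parameter keeps the paging logic testable without network access: a test can supply a function that returns canned pages.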

Getting Started

Requirements

Before you start, register for a developer account to get an API key: https://bonobo.capi.gutools.co.uk/register/developer

You need python 3.6 and pip installed on your machine to run this application.

You also need to set the configuration values in the config.ini file:

[DEFAULT]
SECTION=technology
API_KEY=<your-api-key>
URI=https://content.guardianapis.com/search
PAGE_SIZE=200
OUTPUT_DIR=
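
As a rough illustration of how these values can be read, the sketch below uses the standard-library `configparser`. The `load_settings` name, the returned dict layout, and the fallback to the current directory when `OUTPUT_DIR` is empty are assumptions, not necessarily what this project does.

```python
import configparser


def load_settings(path="config.ini"):
    """Read the [DEFAULT] section of config.ini into a plain dict."""
    # interpolation=None keeps a literal "%" in a value (for example in an
    # API key) from being treated as an interpolation placeholder.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files and returns the list it did read.
    if not parser.read(path, encoding="utf-8"):
        raise FileNotFoundError(path)
    cfg = parser["DEFAULT"]
    return {
        "section": cfg["SECTION"],
        "api_key": cfg["API_KEY"],
        "uri": cfg["URI"],
        "page_size": cfg.getint("PAGE_SIZE"),
        # Assumption: an empty OUTPUT_DIR means "write to the current directory".
        "output_dir": cfg.get("OUTPUT_DIR", fallback="") or ".",
    }
```

Option names are case-insensitive in `configparser`, so `cfg["SECTION"]` and `cfg["section"]` return the same value.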

Installing

pip install -r requirements.txt

Running

To run the ETL process, pass a start date and an end date with the --from-date and --to-date arguments:

python run.py --from-date "2018-08-15" --to-date "2018-08-17"
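
A minimal sketch of how the two arguments might be parsed and validated with `argparse` follows. The `iso_date` and `parse_args` names and the check that the start date is not after the end date are assumptions; the actual run.py may handle this differently.

```python
import argparse
from datetime import datetime


def iso_date(text):
    """Parse a YYYY-MM-DD string; argparse turns a ValueError into a usage error."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_args(argv=None):
    """Parse --from-date and --to-date and reject a reversed range."""
    parser = argparse.ArgumentParser(description="Run the ETL for a date range.")
    parser.add_argument("--from-date", type=iso_date, required=True)
    parser.add_argument("--to-date", type=iso_date, required=True)
    args = parser.parse_args(argv)
    if args.from_date > args.to_date:
        parser.error("--from-date must not be later than --to-date")
    return args


# Usage with the same values as the command above.
args = parse_args(["--from-date", "2018-08-15", "--to-date", "2018-08-17"])
print(args.from_date.isoformat(), args.to_date.isoformat())
# prints: 2018-08-15 2018-08-17
```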

Running tests

To run all test cases, run the following command:

python -m unittest discover ./app/tests
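
By default, `unittest discover` collects files matching `test*.py` in the given directory and runs every `test_*` method of each `unittest.TestCase` subclass. The sketch below shows the shape of a file it would pick up, for example a hypothetical `app/tests/test_dates.py`. The `days_between` helper is invented for this example and is not part of the project.

```python
import unittest
from datetime import date, timedelta


def days_between(start, end):
    """Hypothetical helper: every date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


class DaysBetweenTest(unittest.TestCase):
    def test_inclusive_range(self):
        days = days_between(date(2018, 8, 15), date(2018, 8, 17))
        self.assertEqual(
            days, [date(2018, 8, 15), date(2018, 8, 16), date(2018, 8, 17)]
        )

    def test_reversed_range_is_empty(self):
        self.assertEqual(days_between(date(2018, 8, 17), date(2018, 8, 15)), [])
```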

Built With

  • Python 3.6 - Main language of the project
  • pip - Dependency manager
  • Pandas - Used in the test cases
