hateryx/newspyscraper

PH News PDF Generator

How to Use

  1. Install the requirements by entering this in the terminal:

    pip install -r requirements.txt

  2. Run the app through this command:

    python app.py

Description:

The PH News PDF Generator delivers the latest news articles from ABS-CBN to the reader as a PDF (a news PDF, if you will). When you open the app, it asks you to select a news category. Once you enter one, voila! A PDF containing the latest news is generated for you to read.

Behind the scenes, this Python-based app has two (2) main components to deliver the news:

1.) Web scraping

Enabling Library: Beautiful Soup (bs4)

This library handles the app's web scraping tasks:

  • Extract the links to all news articles in the selected news category;
  • Extract the contents of a news article and store them in a string for loading into the PDF.
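
The two scraping steps above can be sketched as follows. This is a minimal illustration, not the app's actual code: the sample HTML, the `news.example.com` base URL, and the function names `extract_article_links` and `extract_article_text` are all assumptions, and the real ABS-CBN markup will likely need different tag lookups.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical category page markup, used only for this sketch.
CATEGORY_HTML = """
<main>
  <article><a href="/news/sample-story-one">Sample story one</a></article>
  <article><a href="/news/sample-story-two">Sample story two</a></article>
  <a href="/about">About us</a>
</main>
"""

# Hypothetical article page markup, used only for this sketch.
ARTICLE_HTML = """
<article>
  <h1>Sample story one</h1>
  <p>First paragraph of the story.</p>
  <p>Second paragraph of the story.</p>
</article>
"""


def extract_article_links(html, base_url):
    """Collect absolute links found inside <article> tags, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for article in soup.find_all("article"):
        anchor = article.find("a", href=True)
        if anchor is None:
            continue
        url = urljoin(base_url, anchor["href"])
        if url not in links:
            links.append(url)
    return links


def extract_article_text(html):
    """Join the text of every <p> tag into one newline-separated string."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return "\n".join(paragraphs)


links = extract_article_links(CATEGORY_HTML, "https://news.example.com")
text = extract_article_text(ARTICLE_HTML)
```

Working on inline HTML keeps the sketch runnable offline; in the app, the HTML would come from the downloaded category and article pages.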

Supporting Libraries Used:

  • request - Accesses the website for web scraping by sending a modified header;
  • re (regex) - While bs4 extracts most of the content, the re.match function helps clean it before it is finalized.
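
As a rough sketch of how these two supporting pieces might fit together: the example below builds a request with a custom `User-Agent` header and drops unwanted lines with `re.match`. It uses the standard library's `urllib.request` as a stand-in (the app may well use the third-party `requests` package, which accepts the same kind of header dictionary through its `headers=` argument). The header value, the boilerplate patterns, and the function names are assumptions made for illustration.

```python
import re
import urllib.request


def build_request(url):
    """Prepare a request carrying a browser-like User-Agent header."""
    # Illustrative header value; the app's real header is not shown in this README.
    headers = {"User-Agent": "Mozilla/5.0 (compatible; news-pdf-sketch)"}
    return urllib.request.Request(url, headers=headers)


# Hypothetical prefixes that mark lines we do not want in the PDF.
BOILERPLATE_PATTERN = r"(ADVERTISEMENT|READ MORE|Share\b)"


def clean_content(raw_text):
    """Drop blank lines and lines that START with a boilerplate prefix."""
    kept = []
    for line in raw_text.splitlines():
        line = line.strip()
        # re.match only matches at the beginning of the string.
        if not line or re.match(BOILERPLATE_PATTERN, line, re.IGNORECASE):
            continue
        kept.append(line)
    return "\n".join(kept)


request = build_request("https://news.example.com/news")
raw = (
    "ADVERTISEMENT\n"
    "  The senate met today. \n"
    "\n"
    "Read more: other story\n"
    "Share this article\n"
    "Voters share their views."
)
cleaned = clean_content(raw)
```

Because `re.match` anchors at the start of each line, the sentence containing "share" in the middle is kept while the line beginning with "Share" is removed. The request object is only constructed here; nothing is sent over the network.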

2.) PDF Generation

Enabling Library: ReportLab

This library takes the title and finalized content and lays them out in the PDF in an orderly fashion. Other functions used to achieve the desired PDF product include:

  • Use of Frame and Paragraph to load and format the news content for proper wrapping and presentation. These also handle generation of a multi-page PDF for lengthy content.
  • Use of ImageReader to load and properly place the logo image in the PDF.
  • Use of drawString to place the title, date, and disclaimer footer at the appropriate locations.
  • Use of TTFont to load TrueType fonts for styling the content.
  • save generates the PDF!

Supporting Libraries Used:

  • datetime - converts the date from the link into a string that is printed with the title;
  • titlecase - formats the title for styling purposes.
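
The date and title handling could look roughly like the sketch below. The URL layout (a `/MM/DD/YY/` segment followed by a hyphenated slug), the `news.example.com` address, and the helper names are assumptions for illustration only; the real links may be structured differently. If the `titlecase` package is not installed, the sketch falls back to `str.title()`, which is a cruder substitute and not what the app uses.

```python
import re
from datetime import datetime

try:
    from titlecase import titlecase  # third-party package named in this README
except ImportError:
    def titlecase(text):
        # Fallback for this sketch only; str.title() capitalizes every word.
        return text.title()


def date_from_link(url):
    """Return the date embedded in the link as e.g. 'January 05, 2024', or None."""
    # Assumed URL layout: .../MM/DD/YY/slug
    match = re.search(r"/(\d{2})/(\d{2})/(\d{2})/", url)
    if match is None:
        return None
    parsed = datetime.strptime("/".join(match.groups()), "%m/%d/%y")
    return parsed.strftime("%B %d, %Y")


def title_from_link(url):
    """Turn the hyphenated slug at the end of the link into a styled title."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    return titlecase(slug.replace("-", " "))


link = "https://news.example.com/news/01/05/24/senate-approves-new-budget"
date_text = date_from_link(link)
title_text = title_from_link(link)
```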

Disclaimer: All news articles delivered by this app are owned and published by ABS-CBN Corporation, a media and entertainment organization in the Philippines (PH). All rights and credits go directly to ABS-CBN. No copyright infringement intended.
