
Twitter scraping pipeline that collects the latest posts for given keywords.


nimanov/twitter-pipeline


Pipeline for scraping Twitter posts for a given keyword

This pipeline scrapes the latest posts for given keywords. This repository uses the keywords "təhsil" ("education") and "iqtisadiyyat" ("economy"). Twitter posts are accessed with the Python library "snscrape", so no personal API token from a Twitter developer account is needed. The pipeline takes the current date and scrapes the posts published on that day that match each keyword. This process repeats every hour.
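A minimal sketch of the hourly, same-day scraping logic described above. It assumes the one-day window is expressed with Twitter's `since:`/`until:` search operators (`until:` is exclusive) and that runs are aligned to the top of each hour. The helper names `build_daily_query`, `seconds_until_next_hour`, `scrape_keyword` and `run_forever` are hypothetical and not taken from the repository; only `snscrape.modules.twitter.TwitterSearchScraper` is snscrape's own API.

```python
import time
from datetime import date, datetime, timedelta

# Keywords used in this repository.
KEYWORDS = ["təhsil", "iqtisadiyyat"]


def build_daily_query(keyword: str, day: date) -> str:
    """Hypothetical helper: search query limited to posts from `day`.

    `since:` is inclusive and `until:` is exclusive, so the window
    covers exactly one calendar day.
    """
    next_day = day + timedelta(days=1)
    return f"{keyword} since:{day.isoformat()} until:{next_day.isoformat()}"


def seconds_until_next_hour(now: datetime) -> float:
    """Hypothetical helper: seconds to sleep until the top of the next hour."""
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return (next_hour - now).total_seconds()


def scrape_keyword(keyword: str, day: date, limit: int = 100):
    """Hypothetical helper: yield up to `limit` tweets for `keyword` posted on `day`."""
    # Imported lazily so the helpers above work without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    scraper = sntwitter.TwitterSearchScraper(build_daily_query(keyword, day))
    for i, tweet in enumerate(scraper.get_items()):
        if i >= limit:
            break
        yield tweet


def run_forever() -> None:
    """Hypothetical main loop: scrape today's posts for each keyword, then wait."""
    while True:
        today = date.today()  # container's local clock
        for keyword in KEYWORDS:
            for tweet in scrape_keyword(keyword, today):
                # Placeholder: store the tweet (e.g. in the "neurotime" database).
                print(tweet.url)
        time.sleep(seconds_until_next_hour(datetime.now()))
```

Because every hourly run re-queries the whole current day, consecutive runs return overlapping posts; storing them with the tweet id as a unique key (an assumption about the schema) would keep the table free of duplicates.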

Docker

Creating a network so the containers can communicate.

docker network create myNetwork

PostgreSQL database (this part can be skipped if the database container was already created in a previous project)

Downloading PostgreSQL image

docker pull postgres 

Running a PostgreSQL container from the postgres image in the "myNetwork" network with the credentials below.

docker run --name postgres-cnt-0 -e POSTGRES_USER=nurlan -e POSTGRES_PASSWORD=1234  --network="myNetwork" -d postgres

Creating the "neurotime" database inside the "postgres-cnt-0" container.

docker exec -it postgres-cnt-0 bash
# psql -U nurlan
nurlan=# create database neurotime;
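The application container reaches the database over "myNetwork". On a user-defined Docker network, containers can resolve each other by container name, so the database host is "postgres-cnt-0". A minimal sketch of building the connection URL follows; the function `build_postgres_url` and the environment variable names `DB_USER` and `DB_PASSWORD` are hypothetical, the defaults mirror the credentials and database created above, and 5432 is PostgreSQL's default port.

```python
import os
from urllib.parse import quote


def build_postgres_url(
    host: str = "postgres-cnt-0",  # container name, resolvable on "myNetwork"
    port: int = 5432,              # PostgreSQL default port
    dbname: str = "neurotime",
) -> str:
    """Hypothetical helper: PostgreSQL connection URL for the app container.

    Credentials come from (hypothetical) environment variables and fall
    back to the values passed to `docker run` for postgres-cnt-0.
    """
    user = os.environ.get("DB_USER", "nurlan")
    password = os.environ.get("DB_PASSWORD", "1234")
    # Percent-encode credentials so characters like '@' or '/' stay valid in the URL.
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{dbname}"
```

The resulting URL can be passed to a PostgreSQL driver, for example `psycopg2.connect(build_postgres_url())`, since libpq accepts connection URIs. No `-p` port mapping is needed on the database container because the traffic stays inside the Docker network.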

Application dockerization

Building an image of the application (run from the directory that contains the Dockerfile)

docker image build -t twitter:1.0 . 

Running a container from the image in the "myNetwork" network.

docker run  --name twitter_cnt --network="myNetwork" -d  twitter:1.0
