Skip to content

rharige/GrepTwitter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Grep Twitter

A small utility program to try out PostgreSQL DB and the support for JSON data type introduced in v9.3. Performs keyword-search on twitter, stores resulting tweet json data in PostgreSQL DB.


Create following two tables in your db:

CREATE TABLE tweets_db (id serial, data json, search_id integer);
CREATE TABLE tweets_metadb (id serial, start_time varchar(20), end_time varchar(20), search_mode smallint, keywords varchar(200));

(Notice that I am directly storing JSON data into db, without splitting fields.)

The db connection params & Twitter keys go into keys.cnf config file.

With above setup, program can be launched as shown below:

Usage: python search.py <mode:forward|backward> <keyword1, keyword2..>

Two search modes are supported: backward & forward (using Search API & Streaming API respectively).


JSON Support in PostgreSQL is pretty impressive. With the newly collected data in JSON format, I could run following SQL queries to filter it based on some basic criteria - something that I would have done in code otherwise.

  • In the downloaded data see how many of the tweets have/ do not have geo tags:
select count(*) from tweets_db where (data->>'geo') is not null;
select count(*) from tweets_db where (data->>'geo') is null;
  • Search for the keywords in tweets
select data->>'text' from tweets_db where data->>'text' LIKE '%bod%' limit 20;
  • Show only unique tweets
select count(distinct(data->>'id_str')) from tweets_db;
  • Filter tweets by date of creation
select data->>'id_str', data->>'created_at'::text as time from tweets_db where to_date(data->>'created_at'::text,'Dy Mon DD HH24:MI:SS +0000 YYYY')<'2014-03-18' limit 20;

These were some basics queries I tried. Ofcourse you could do more by using powerful SQL operators and functions on JSON data.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages