IDenra/elon_musk_dataset

elon_musk_dataset

Scripts for preparing datasets of Elon Musk phrases.

Preparing

  1. Install Poetry
  2. Install dependencies: poetry install
  3. Activate virtual environment: poetry shell

Scraping

To scrape data from rev.com, use a command like this:
PYTHONPATH=. python scrap/rev.py --base_url 'https://www.rev.com/blog/transcripts?s=Elon+Musk' --save_path rev-interviews.jsonlines

From vox.com:
PYTHONPATH=. python scrap/vox.py --interview_url https://www.vox.com/2018/11/2/18053428/recode-decode-full-podcast-transcript-elon-musk-tesla-spacex-boring-company-kara-swisher --save_path vox-interview.jsonlines
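
Both scrapers write to a `.jsonlines` file. As a quick sanity check before collecting data, a minimal sketch like the one below can load such a file, assuming it follows the usual JSON Lines convention of one JSON value per line. The helper name `read_jsonlines` is hypothetical and not part of this repository, and the field names inside each record are not documented here, so the sketch prints them instead of assuming them.

```python
import json
from pathlib import Path


def read_jsonlines(path):
    """Return one parsed JSON value per non-empty line of the file.

    Hypothetical helper, not part of this repository. Assumes the file
    holds one JSON value per line (the JSON Lines convention).
    """
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # File name taken from the rev.com command above.
    path = Path("rev-interviews.jsonlines")
    if path.exists():
        records = read_jsonlines(path)
        print(f"{len(records)} records")
        if records and isinstance(records[0], dict):
            # Field names depend on the scraper; inspect rather than assume.
            print(sorted(records[0].keys()))
```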

Collecting data

By script

PYTHONPATH=. python collect_data/run.py collect-all-dialogs --interview-paths=rev-interviews.jsonlines,vox-interview.jsonlines --save-path=all_dialog_dataset.csv

PYTHONPATH=. python collect_data/run.py collect-short-answer-dialogs --interview-paths=rev-interviews.jsonlines,vox-interview.jsonlines --save-path=short_answer_dialog_dataset.csv

PYTHONPATH=. python collect_data/run.py collect-all-phrases --interview-paths=rev-interviews.jsonlines,vox-interview.jsonlines --save-path=all_phrases_dataset.csv
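
Once the three commands have run, the resulting CSV files can be checked with a short script. The sketch below is only an illustration: `summarize_csv` is a hypothetical helper, and it assumes each file starts with a header row, which the commands above do not confirm. It reports the column names and the number of data rows without assuming what the columns are called.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file.

    Hypothetical helper, not part of this repository. Assumes the first
    row is a header; if it is not, the first data row is reported as the
    header instead.
    """
    with Path(path).open(newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])  # empty list for an empty file
        n_rows = sum(1 for _ in reader)
    return header, n_rows


if __name__ == "__main__":
    # Output names taken from the --save-path values above.
    for name in (
        "all_dialog_dataset.csv",
        "short_answer_dialog_dataset.csv",
        "all_phrases_dataset.csv",
    ):
        if Path(name).exists():
            header, n_rows = summarize_csv(name)
            print(f"{name}: columns={header}, rows={n_rows}")
```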

By notebook

  • notebooks/create_twitter_dataset.ipynb
  • notebooks/create_oneliners_dataset.ipynb
