Just a random scraper to retrieve some data about movies listed on Allociné.fr.
The script saves the movie data available on the http://www.allocine.fr/films webpage both as a .csv
file and in a postgres database.
The movie attributes retrieved when available are:
- The movie ID ;
- The title ;
- The release date ;
- The duration ;
- The genre(s) ;
- The director(s) ;
- The main actor(s) ;
- The press rating ;
- The spectators rating ;
- The movie summary.
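As a rough illustration of the attributes above, one scraped movie could be modelled as a small record. This is only a sketch: the class name, field names, and types are assumptions made for the example and are not taken from the script itself. Every field other than the ID is optional because attributes are only retrieved when available.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Movie:
    """Hypothetical record for one movie; names and types are assumptions."""

    movie_id: int
    title: Optional[str] = None
    release_date: Optional[str] = None
    duration: Optional[int] = None  # assumed here to be a number of minutes
    genres: List[str] = field(default_factory=list)
    directors: List[str] = field(default_factory=list)
    actors: List[str] = field(default_factory=list)
    press_rating: Optional[float] = None
    spectators_rating: Optional[float] = None
    summary: Optional[str] = None


# Placeholder values, not real Allociné data.
example = Movie(movie_id=1, title="Example title", genres=["Drama"])

# A flat dict like this maps naturally to one CSV row or one database record.
row = asdict(example)
```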
First, clone the repository:
git clone git@github.com:kinoute/scraper-allocine.git
Go to the folder and build the container:
docker-compose build
# or "make build"
Important: before the first run, rename the .env.dist template file to .env, then fill it with your own values. On first start, the postgres environment variables are used to create the postgres server.
By default, the script will:
- Scrape the first 50 pages of Allociné ;
- Save every movie to the postgres database in its own container ;
- Wait 10 seconds between each page scraped ;
- Save the full results in a CSV file called allocine.csv in the files folder.
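The default behaviour listed above can be pictured as a loop over page URLs with a pause between requests. The sketch below is not the script's actual code: the `?page=N` pagination scheme, the function names, and the injected `fetch` and `sleep` callables are all assumptions chosen to keep the example self-contained.

```python
import time
from typing import Callable, List

# Assumption: pages are addressed with a "?page=N" query string.
PAGE_URL_TEMPLATE = "http://www.allocine.fr/films/?page={page}"


def page_urls(pages: int = 50) -> List[str]:
    """Build the URLs of the first `pages` listing pages."""
    return [PAGE_URL_TEMPLATE.format(page=n) for n in range(1, pages + 1)]


def crawl(
    urls: List[str],
    fetch: Callable[[str], str],
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing `delay` seconds between two pages."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # no pause before the very first page
        results.append(fetch(url))
    return results
```

Passing `fetch` and `sleep` as parameters is a design choice for the example only; it lets the loop be exercised without touching the network or actually waiting.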
To run the script with these default options, simply do:
docker-compose up --build
# or make start
The script has 3 customizable options that can be changed in the .env file:
- The number of pages to scrape (Default: 50) ;
- The time in seconds to wait before each page is scraped (Default: 10) ;
- The CSV filename where results will be stored (Default: allocine.csv).
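For illustration, options like these are typically read from environment variables with fallbacks to the documented defaults. The variable names `NUMBER_PAGES`, `DELAY`, and `CSV_FILE` below are hypothetical; check the .env.dist template for the real ones.

```python
import os
from typing import Mapping


def load_options(env: Mapping[str, str] = os.environ) -> dict:
    """Read the three options, falling back to the documented defaults.

    The environment variable names are assumptions made for this sketch.
    """
    return {
        "pages": int(env.get("NUMBER_PAGES", "50")),
        "delay": int(env.get("DELAY", "10")),
        "csv_file": env.get("CSV_FILE", "allocine.csv"),
    }


# With no overrides, the documented defaults apply.
defaults = load_options({})
```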
The script automatically updates and saves the results to the .csv file after every page scraped. For postgres, the database is updated after every movie scraped.
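One possible way to save after every page is to append that page's rows to the CSV file and write the header only once. This is a sketch under assumptions, not necessarily how the script does it (it may rewrite the whole file instead); the shortened column list and the function name are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Hypothetical, shortened column list used only for this example.
FIELDS = ["id", "title", "release_date"]


def append_page(csv_path: str, movies: List[Dict[str, object]]) -> None:
    """Append one page of movies, writing the header if the file is new."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerows(movies)
```

Appending page by page means an interrupted run still leaves a valid file containing every page completed so far.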
If, for whatever reason, you want to stop the scraping, just press Ctrl+C in your terminal.
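In Python, Ctrl+C raises `KeyboardInterrupt`, so a scraping loop can catch it and exit cleanly. The sketch below only illustrates that idea; the function name and the `scrape_page` callable are hypothetical and not taken from the script.

```python
from typing import Callable, Iterable


def scrape_until_interrupted(
    pages: Iterable[int], scrape_page: Callable[[int], None]
) -> int:
    """Scrape pages in order and stop quietly on Ctrl+C.

    Returns the number of pages that were fully processed.
    """
    completed = 0
    try:
        for page in pages:
            scrape_page(page)
            completed += 1
    except KeyboardInterrupt:
        # Results are saved after every page, so earlier pages are not lost.
        print(f"Interrupted after {completed} page(s).")
    return completed
```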
While the scraper is running, you can connect to the postgres container and use psql to run any SQL operation by typing make admin-db in your project.
You can also simply type make test-db. It should return 5 records from the movies table if everything went well.
This script was just made for fun to play around with BeautifulSoup and Python. Please don't use it to do bad things and ruin Allociné's servers!