This project aims to examine public traffic camera feeds, build vision models that detect objects in these feeds, and store the object counts for further analysis, such as trending over time. Camera data is stored in the short term to enable models to be trained and to support late data arrival. Once processed, the camera data no longer needs to be stored, unless alternative models are to be trialled later.
This document describes how to host the project on a single machine.
- NE Travel Data ingestion is not supported
- For simplicity, database support is not provided - instead CSV files are generated, one per day (akin to database sharding), which can be imported into a database as required.
- Time series analysis has not been ported to `localhost`; the R code is available in `cloud/vm`, but assumes the data is hosted in BigQuery.
The object identification is experimental, and hence needs to be re-evaluated multiple times across the same dataset. With this in mind, rather than streaming data through a model to generate results, we instead collect the data in bucket storage for later processing (and potentially re-processing with a refined model). This creates the following requirements:
- Discover available image sources
- Update image sources regularly (e.g. daily)
- Download all images at regular intervals and store for later processing
Once imagery is downloaded, we can then process it. There are two scenarios:
- Run a specified model over historical data, or
- Run a specified model over newly received data
Finally, images can be deleted once no longer required.
This is initiated by manually looking for public sources of data, as we are focused on a publicly available and reusable solution. If you are using your own data, this step is simply to determine what data you have and where it is stored - assuming you will follow this workflow and store your data locally to be processed.
The result of this phase is a JSON file per camera provider (e.g. Transport for London (TfL), North East Travel Information), which is stored in the `localhost/config/ingest` folder. For instance, TfL camera sources would be stored in `localhost/config/ingest/TfL-images.json`, which is a list of image URLs.
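For illustration, a minimal provider file might look as follows - the URLs here are invented placeholders, not real camera endpoints; the authoritative format is whatever `update_sources.py` generates:

```json
[
  "https://example-supplier.test/cameras/00001.01251.jpg",
  "https://example-supplier.test/cameras/00001.01252.jpg"
]
```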
Given that suppliers of publicly available cameras often provide API endpoints on the web, we have a Python tool that calls these once a day to update the list of camera images we should download. This is the `scripts/localhost/update_sources.py` script, which generates a JSON file describing each available camera and the image URL from which it can be downloaded. Detailed usage instructions are presented in `scripts/localhost/README.md`.

This script needs to be called daily (or as often as you wish to refresh the list of available imagery); we recommend daily at 3am. Other suppliers may not provide such an endpoint, in which case this function is not needed.
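For example, a daily 3am refresh could be scheduled with a crontab entry such as the following (the wrapper script path is a placeholder; wrapper scripts are described in the crontab section below):

```
# refresh the camera source lists daily at 03:00
0 3 * * * /location/of/your/update_sources_bash_script.sh
```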
Note you may wish to reduce the number of images the system ingests - in which case, run this utility once to populate the JSON files, then remove unnecessary cameras. For example, TfL will ingest >900 cameras, yet you may only wish to download (and later process) far fewer. Don't forget to likewise reduce the cameras to be analysed to reflect the reduced list, otherwise the CSV will contain many entries flagged as missing.
The Python script `scripts/localhost/download_files.py` needs to be called at 10-minute intervals, starting on the hour; it will request each image listed in the JSON files stored in the `localhost/config/ingest` folder. Each image is stored in the `localhost/data` folder, using the naming convention `<camera supplier>/<date as YYYYMMDD>/<time HHMM>/<camera ID>.<format>`.
- "Camera supplier" is the name of the JSON file (without the
.json
extension) - Date and time are when the request was triggered - not when the
image was actually captured
- Hence the date/time may be up to 10 minutes out
- However, this ensures a consistent, simplified folder structure
- Actual capture date/time may be stored in JPEG metadata, depending on supplier
- Camera ID is the base name of the source image file - which we assume is a unique ID for that supplier's camera
- File extension is as originally provided - it may be JPEG, or it something else; for instance, TfL offer both JPEG and MP4.
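The naming convention can be sketched in Python as follows (the helper function is illustrative, not part of the codebase):

```python
from datetime import datetime


def image_path(supplier: str, camera_id: str, extension: str, when: datetime) -> str:
    """Build the storage path <supplier>/<YYYYMMDD>/<HHMM>/<camera ID>.<format>."""
    return f"{supplier}/{when:%Y%m%d}/{when:%H%M}/{camera_id}.{extension}"


# A TfL camera requested at 09:40 on 25 January 2021:
print(image_path("TfL-images", "00001.01251", "jpg", datetime(2021, 1, 25, 9, 40)))
# TfL-images/20210125/0940/00001.01251.jpg
```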
To trigger regular downloads, the local system's scheduler should be used: under Microsoft Windows, the Task Scheduler (note: this link is for developer API access rather than the UI); under UN*X, `crontab` (also suitable for MacOS).
`scripts/localhost/download_files.py` internally retries downloads when the URL returns neither a 200 (success) nor a 404 (not found) code - e.g. if a 504 "Gateway timeout" error code is returned, then the client will try again. A random-length delay is triggered before retrying, to improve load balance and the chances of success. Detailed usage instructions are presented in `scripts/localhost/README.md`.
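The retry behaviour can be sketched as follows - an illustrative re-implementation rather than the actual `download_files.py` code, with the attempt limit and delay bounds as assumptions:

```python
import random
import time


def download_with_retry(fetch, url, max_attempts=3, max_delay=5.0):
    """Retry a download unless the server returns 200 (success) or 404 (not found).

    `fetch` is any callable returning an object with a `status_code` attribute,
    e.g. `requests.get`. The attempt limit and delay bound are assumptions.
    """
    for _ in range(max_attempts):
        response = fetch(url)
        if response.status_code in (200, 404):
            return response
        # e.g. 504 "Gateway timeout": wait a random time before retrying,
        # to spread load on the server and improve the chance of success
        time.sleep(random.uniform(0, max_delay))
    return response  # still failing after max_attempts; return last response
```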
Using `crontab`, we are able to execute Python scripts automatically at regular intervals. This is perfect for running bash scripts which in turn will execute `download_files.py` and `process_scheduled.py`. Below is a quick, simple guide to installing and setting up `crontab` for users on MacOS. We highly recommend NOT setting up chrono_lens within the `Documents` folder, due to its symbolic link nature; instead we recommend the home folder.
1. Creating Bash Scripts for `crontab`

Within a text editor, write two simple bash scripts: both will activate your virtual environment, then one will execute `download_files.py` and the other will execute `process_scheduled.py`.

An example bash script for `download_files.py`:
```bash
#!/bin/bash
export PYTHONPATH='/location/of/your/chrono_lens_folder'
cd /location/of/your/chrono_lens_folder
source venv/bin/activate
python3 scripts/localhost/download_files.py
```

Save the above as `download_files_bash_script.sh`. Repeat accordingly for `process_scheduled.py`.
2. Accessing `crontab`

Within the Terminal app, execute `sudo crontab -u your_username -e` and `crontab` will open in the vi text editor. Press `SHIFT + I` to allow editing ("insert" mode in vi). Enter the following:

```
*/10 * * * * /location/of/your/download_files_bash_script.sh
*/10 * * * * /location/of/your/process_scheduled_bash_script.sh
```
Press `ESC` and then `SHIFT + Z` twice (i.e. `ZZ`). This will save your edits to `crontab` and exit.
The `*/10 * * * *` prefix commands an execution every 10 minutes, starting at 0 minutes past the hour; we found crontab guru useful for an explanation of the fields.
3. Setting Permissions (to Enable `crontab` to Access Your Scripts)

WARNING: this step will ensure `crontab` can access your files - you may be able to use weaker permissions, but the approach below has been tested to work. Consider carefully whether you want to do this on a shared / multi-user machine, as it will render your files visible to other users.
We need to ensure that all of the bash scripts, and the tree of directories above them, have 'read and execute access for everyone'. We can achieve this by executing simple commands within the Terminal application. Go to the location of your saved bash scripts and execute the following separately:

```
chmod 755 download_files_bash_script.sh
chmod 755 process_scheduled_bash_script.sh
```

Then run `chmod 755` on the folder containing your bash scripts, and on each parent folder above it.
Images and counts should now be populating your `chrono_lens/localhost` sub-folders within 10 minutes.
The storage of the data in a local folder enables us to re-process data with different models, and to compare results to see the impact of each model. With this in mind, we need to be able to run any model and store its results for comparison.

The approach we took is to store model parameters in the `localhost/config/models` folder, where the root folder name is a specific model version - hence for the RCNN model received from Newcastle University, we have a folder called `NewcastleV0`. Inside this folder, the file `configuration.json` defines any model-specific parameters, such as the filename of the serialised model.
In addition, models can be daisy-chained to form a pipeline, where the pipeline is defined as a sequence of model names separated by `_`. At present this is limited to pre-processing with a `FaultyImageFilter` and post-processing with a `StaticObjectFilter`, used when the model name is defined as `FaultyImageFilterV0_NewcastleV0_StaticObjectFilterV0`.
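Recovering the individual stages from a pipeline name is a simple string split (the function name here is hypothetical):

```python
def pipeline_stages(model_name: str) -> list[str]:
    """Split a daisy-chained model name into its individual stage names."""
    return model_name.split("_")


print(pipeline_stages("FaultyImageFilterV0_NewcastleV0_StaticObjectFilterV0"))
# ['FaultyImageFilterV0', 'NewcastleV0', 'StaticObjectFilterV0']
```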
The specific model to use is declared in `localhost/config/analyse-configuration.json`.

NOTE: given that settings are local to the user, we do not store them in GitHub. However, as a starting point, we have provided our configurations in the `chrono_lens/localhost/exampleJSON` folder. The JSON files present need to be copied to the appropriate `localhost` subfolder; this is achieved by replacing underscores (`_`) with folder separators (which depend on your local system; Windows will need `\`, UN*X will need `/`). For example, `localhost_config_models_FaultyImageFilterV0_configuration.json` should be copied to a UN*X folder of `localhost/config/models/FaultyImageFilterV0/configuration.json`. Note that the RCNN weights must be supplied as an additional file - see the next section.
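The underscore-to-separator mapping can be sketched as follows (an illustrative helper; note that it relies on the individual folder names containing no underscores):

```python
import os


def example_json_to_target(filename: str, sep: str = os.sep) -> str:
    """Map an exampleJSON filename to its target path by replacing '_'
    with the local folder separator."""
    return filename.replace("_", sep)


print(example_json_to_target(
    "localhost_config_models_FaultyImageFilterV0_configuration.json", sep="/"))
# localhost/config/models/FaultyImageFilterV0/configuration.json
```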
The script to use is `scripts/localhost/process_scheduled.py`; detailed usage instructions are presented in `scripts/localhost/README.md`.
This model is detected by its name starting with `Newcastle` - underneath, it is an RCNN defined using TensorFlow. The JSON configuration file defines the file name of the serialised model weights, along with reference information for the user (ignored by the pipeline), such as the GitHub source. The user can hence provide multiple variants of the model by putting each model in a different root folder. A local copy of the serialised weights is provided at `tests/test_data/test_detector_data/fig_frcnn_rebuscov-3.pb` and must be copied into the `localhost/config/models/NewcastleV0/` folder.
This filter rejects images if they are corrupt; two forms are detected:
- A large portion of the image contains repeated rows of exactly the same R,G,B values, and
- A large proportion of the image is pure greyscale where R=G=B (for different values of pure grey in the area); this detects a "camera off" static image.
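A minimal sketch of the first check - the repeated-row detector - where the function and its threshold are illustrative assumptions rather than the shipped implementation:

```python
def mostly_repeated_rows(rows, threshold=0.5):
    """Flag an image as faulty if more than `threshold` of its rows are
    exact copies of the row above - a symptom of a corrupt frame.

    `rows` is a list of rows, each a list of (R, G, B) tuples.
    The 0.5 threshold is an illustrative assumption.
    """
    if len(rows) < 2:
        return False
    repeats = sum(1 for previous, current in zip(rows, rows[1:]) if previous == current)
    return repeats / (len(rows) - 1) > threshold


# A 4-row "image" where the bottom 3 rows are identical copies:
corrupt = [[(10, 20, 30)] * 4] + [[(0, 0, 0)] * 4] * 3
print(mostly_repeated_rows(corrupt))  # True
```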
The constants that define thresholds for faulty image detection are defined in the JSON configuration file. To run alternative settings, a different folder should be created so that existing model configurations are unchanged; this enables re-running of models as required and hence ensures repeatability. The new configuration hence has a new configuration folder, and this is what is supplied (for instance, you may make a v1 variant, so the model name would be `FaultyImageFilterV1`, which could be used with the Newcastle model via `FaultyImageFilterV1_NewcastleV0` or with the static object filter via `FaultyImageFilterV1_NewcastleV0_StaticObjectFilterV0`).
The constants that define thresholds for static object detection are defined in the JSON configuration file. To run alternative settings, a different folder should be created so that existing model configurations are unchanged; this enables re-running of models as required and hence ensures repeatability. The new configuration hence has a new configuration folder, and this is what is supplied (for instance, you may make a v1 variant, so the model name would be `StaticObjectFilterV1`, which could be used with the Newcastle model via `NewcastleV0_StaticObjectFilterV1`).
The Python script `scripts/localhost/batch_process_images.py` will run selected models over named cameras across a selected date range. Refer to `scripts/localhost/README.md` for full instructions.
Once images have been processed, they do not need to be retained, unless they may need to be re-processed (e.g. if model experimentation is being carried out). The script `scripts/localhost/remove_old_images.py` deletes images older than 28 days (4 weeks), as this is deemed a sufficient window to detect recent issues while retaining sufficient historical images to correct the time series. Detailed usage instructions are presented in `scripts/localhost/README.md`.
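The 28-day cut-off test can be sketched as follows (illustrative; the actual deletion logic lives in `scripts/localhost/remove_old_images.py`):

```python
from datetime import date, timedelta


def is_expired(date_folder: str, today: date, retention_days: int = 28) -> bool:
    """Return True if a YYYYMMDD date folder is older than the retention window."""
    folder_date = date(int(date_folder[:4]), int(date_folder[4:6]), int(date_folder[6:8]))
    return (today - folder_date).days > retention_days


print(is_expired("20201225", today=date(2021, 1, 25)))  # True: 31 days old
print(is_expired("20210120", today=date(2021, 1, 25)))  # False: only 5 days old
```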
Note, running the default models and camera locations provided will yield results that differ from those published by ONS Faster Indicators. This difference will arise from internal factors such as variation in camera sampling times and manual imputation when cameras are unavailable for sustained periods. Furthermore, external changes such as camera addresses changing over time will impact results. The current lists of selected cameras are valid as of 25/01/2021.
The Python script `scripts/localhost/update_sources.py` populates the folder `localhost/config/ingest`, creating one JSON file per camera image supplier. However, images are not analysed unless they are specified in the `localhost/config/analyse` JSON files.
Our default models have already been installed for you. However, this section details how to add your own models if you wish to do so.
Each model, pre- and post-processor follows the naming pattern `objectV#`, where the `V#` is used to denote a version number - such as `V0`. This enables multiple configurations to be defined, with the idea that existing configurations that have been used are not modified; new variants are created instead. With this, previous processing runs can be recreated with ease, as the full name of each model/processor is used to name the folder where results are stored. Already set up and ready, we have one of each defined (pre-processor, model, post-processor):

- `FaultyImageFilterV0` - pre-processor; marks images as faulty if they are unchanging or contain "camera unavailable" imagery
- `NewcastleV0` - a model; an RCNN as provided by Newcastle University's Urban Observatory (see file `uo-object_counting/app/fig_frcnn_rebuscov-3.pb`)
- `StaticObjectFilterV0` - post-processor; rejects detected objects that do not move between frames
If, for instance, the pre-processor `FaultyImageFilterV0` and the model `NewcastleV0` are used, then a folder named `FaultyImageFilterV0_NewcastleV0` will be created in `localhost/counts`, where a CSV file will be generated for each calendar day.
Camera locations have already been added for you and include TfL. The following section details how one would add additional locations.
Each camera supplier needs a JSON file in the `localhost/config/analyse` folder, which is used to determine which images will be processed. An example JSON file is present at `chrono_lens/localhost/exampleJSON/localhost_config_analyse_TfL-images.json`. Each JSON file contains a list of camera names; for example, `TfL-images.json`:
```json
[
  "00001.01251",
  "00001.01252",
  "00001.01445",
  "00001.01606",
  "..."
]
```
The named cameras in a JSON file are mapped to images named `<JSON base filename>/YYYYMMDD/HHMM/<camera name>.jpg`, where "JSON base filename" is the JSON filename without the `.json` extension.