This project analyzes the Foursquare Open Source Places dataset to explore the distribution of coffee shops across the United States, with a special focus on Portland, Oregon. It provides insights into top coffee chains, their locations, open versus closed status, and closure trends over time.

The project is divided into two main components:
- Analysis Script (coffee_shops.py): Processes the dataset, performs data analysis, and generates summary reports.
- Visualization Module (coffee-shops-viz.py): Creates interactive visualizations such as bar charts, choropleth maps, and heatmaps based on the analyzed data.
The analysis script (coffee_shops.py) fetches the Foursquare Places dataset from S3 (or reads a local file), analyzes the distribution of coffee shops, and generates summary reports. It caches processed data so that subsequent runs are faster, and it can create maps for the top coffee chains.
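The caching behaviour described above can be pictured as a "load if cached, otherwise build and save" step. The sketch below is only an illustration and is not the script's actual code. The helper name `load_with_cache`, the `build` callback and the choice of CSV as the cache format are all assumptions.

```python
from pathlib import Path
from typing import Callable

import pandas as pd


def load_with_cache(cache_path: Path, build: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical helper: return the cached DataFrame, or build it and cache it as CSV."""
    if cache_path.exists():
        # Cache hit: skip the expensive fetch-and-process step entirely.
        return pd.read_csv(cache_path)
    df = build()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # index=False keeps the cached file free of an extra index column.
    df.to_csv(cache_path, index=False)
    return df
```

On the first run `build()` does the real work. Later runs read the cached file until it is deleted.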
Key features include:
- Filtering and processing coffee shop data (excluding Starbucks).
- Analyzing top chains by location count.
- Examining open versus closed status and closure trends.
- Exporting processed data in CSV, Parquet, and GeoJSON formats.
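The filtering and counting features listed above can be sketched with pandas. This is a minimal sketch, not the script's implementation. It assumes the places table has columns named `name`, `country`, `fsq_category_labels` and `date_closed`, and it assumes coffee shops carry a category label ending in "Coffee Shop". The helper names are hypothetical.

```python
import pandas as pd

# Assumed columns (not confirmed by this README): "name", "country",
# "fsq_category_labels" (list of "A > B > C" label strings), "date_closed".


def _has_coffee_label(labels) -> bool:
    """True if any hierarchical category label ends in "Coffee Shop"."""
    if labels is None or isinstance(labels, float):  # missing value (None or NaN)
        return False
    return any(str(label).endswith("Coffee Shop") for label in labels)


def filter_coffee_shops(places: pd.DataFrame) -> pd.DataFrame:
    """Keep US coffee shops and drop Starbucks locations (case-insensitive name match)."""
    is_us = places["country"] == "US"
    is_coffee = places["fsq_category_labels"].apply(_has_coffee_label)
    is_starbucks = places["name"].str.contains("starbucks", case=False, na=False)
    return places[is_us & is_coffee & ~is_starbucks].copy()


def top_chains(shops: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Rank chains by number of locations, treating identical names as one chain."""
    counts = shops["name"].value_counts().head(n)
    return counts.rename_axis("name").reset_index(name="locations")


def open_closed_status(shops: pd.DataFrame) -> pd.Series:
    """Count locations as open (no closing date) or closed (closing date present)."""
    status = shops["date_closed"].isna().map({True: "open", False: "closed"})
    return status.value_counts()
```

Grouping by exact name is a simplification. Real chain detection may need name normalization.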
This module provides functions to create various interactive visualizations based on the analyzed data. It can be used independently with properly formatted data or alongside the analysis script.
Visualizations include:
- Bar charts of top coffee chains and open/closed status.
- Choropleth maps of coffee shops by state.
- Heatmaps and marker maps for specific chains or regions (e.g., Portland, OR).
- Closure trend charts and comprehensive dashboards.
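A closure trend chart needs closures aggregated over time. The sketch below counts closed locations per calendar year. It assumes a `date_closed` column holding date strings (empty for open locations), and the function name is hypothetical. It does not reproduce how the module actually builds its chart.

```python
import pandas as pd


def closures_by_year(shops: pd.DataFrame) -> pd.Series:
    """Count closed locations per year, assuming a 'date_closed' column of date strings."""
    # Unparseable or missing dates become NaT and are dropped.
    closed = pd.to_datetime(shops["date_closed"], errors="coerce").dropna()
    return closed.dt.year.value_counts().sort_index()
```

The resulting Series can be plotted directly, for example with `closures_by_year(shops).plot(kind="bar")`, which uses pandas' matplotlib backend.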
To run this project, you need the following Python libraries:
- pandas
- polars
- numpy
- matplotlib
- altair
- plotly
- folium
- requests
- pyarrow

For optimized data processing, you can optionally install Daft.

Install the required libraries using pip:

```bash
pip install pandas polars numpy matplotlib altair plotly folium requests pyarrow
```

To include Daft (optional):

```bash
pip install getdaft
```
Run the coffee_shops.py script to process and analyze the coffee shop data. It accepts several command-line arguments to customize the analysis:
```bash
python coffee_shops.py [--output OUTPUT_DIR] [--save-maps] [--chains TOP_N] [--skip-daft] [--local-data FILE]
```
- --output OUTPUT_DIR: Directory to save output files (default: ./output).
- --save-maps: Save generated maps as HTML files.
- --chains TOP_N: Number of top chains to analyze (default: 5).
- --skip-daft: Skip Daft processing and load processed CSV if available.
- --local-data FILE: Use a local Parquet/CSV file instead of fetching from S3.
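The options above map naturally onto Python's `argparse`. The parser below is a hypothetical reconstruction based only on the flags and defaults documented here. The script's real argument handling may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented coffee_shops.py options."""
    parser = argparse.ArgumentParser(description="Analyze US coffee shops in Foursquare OS Places.")
    parser.add_argument("--output", default="./output", metavar="OUTPUT_DIR",
                        help="Directory to save output files (default: ./output).")
    parser.add_argument("--save-maps", action="store_true",
                        help="Save generated maps as HTML files.")
    parser.add_argument("--chains", type=int, default=5, metavar="TOP_N",
                        help="Number of top chains to analyze (default: 5).")
    parser.add_argument("--skip-daft", action="store_true",
                        help="Skip Daft processing and load processed CSV if available.")
    parser.add_argument("--local-data", default=None, metavar="FILE",
                        help="Use a local Parquet/CSV file instead of fetching from S3.")
    return parser
```

`argparse` converts dashes to underscores, so `--save-maps` becomes `args.save_maps`.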
To analyze the top 10 coffee chains and save maps:

```bash
python coffee_shops.py --chains 10 --save-maps
```

This command will:
- Fetch and process the dataset (or load from cache/local file).
- Analyze the top 10 coffee chains.
- Generate and save interactive maps for each chain and a US heatmap.
- Save summary reports and processed data in the ./output directory.
The coffee-shops-viz.py module offers functions to generate visualizations from the analyzed data.
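Import and use it in your Python code as follows:

```python
# Python cannot import a file whose name contains hyphens, so this assumes the
# module is importable as coffee_shops_viz (e.g., saved as coffee_shops_viz.py).
from coffee_shops_viz import CoffeeShopVisualizer

# top_chains, status_df and pdx_df are DataFrames prepared beforehand (e.g., by coffee_shops.py).
viz = CoffeeShopVisualizer(output_dir="./output")
viz.create_chain_bar_chart(top_chains)  # Bar chart of top chains
viz.create_open_closed_chart(status_df)  # Open vs. closed status chart
viz.create_portland_map(pdx_df)  # Portland coffee shop map
```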
This generates HTML and PNG files in the visualizations subdirectory of the output directory (./output/visualizations).
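The visualization example uses a Portland subset named `pdx_df`. The following is one hedged way to derive such a subset. It assumes the data has `locality` (city) and `region` (two-letter state code) columns. The exact shape `CoffeeShopVisualizer` expects is not documented here, and `portland_subset` is a hypothetical helper.

```python
import pandas as pd


def portland_subset(shops: pd.DataFrame) -> pd.DataFrame:
    """Select Portland, Oregon rows, assuming 'locality' and 'region' columns."""
    locality = shops["locality"].str.strip().str.casefold()
    region = shops["region"].str.strip().str.upper()
    # Require both fields so Portland, Maine (region "ME") is excluded.
    mask = (locality == "portland") & (region == "OR")
    return shops[mask].reset_index(drop=True)
```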
The analysis uses the Foursquare Open Source Places dataset, which provides point-of-interest data for places worldwide. This project works with its US records.
The project generates the following outputs in the specified output directory (default: ./output):
- us_coffee_shops.csv and us_coffee_shops.parquet (all US coffee shops).
- portland_coffee_shops.csv (Portland-specific data).
- coffee_shops.geojson (for mapping applications).
- Maps (if --save-maps is used):
  - HTML files for top chain locations, US heatmap, and Portland map (in ./output/maps).
- Visualizations (from coffee-shops-viz.py):
  - HTML and PNG files for charts and maps (in ./output/visualizations).
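The coffee_shops.geojson output can be produced from latitude/longitude rows with the standard library alone. The sketch below is illustrative and not the project's exporter. It assumes `latitude` and `longitude` field names and uses hypothetical helper names. Note that GeoJSON (RFC 7946) orders coordinates as longitude first.

```python
import json
import math
from pathlib import Path
from typing import Any, Iterable, Mapping


def to_feature_collection(rows: Iterable[Mapping[str, Any]], lat_key: str = "latitude",
                          lon_key: str = "longitude") -> dict:
    """Convert row mappings into a GeoJSON FeatureCollection of Point features."""
    features = []
    for row in rows:
        try:
            lat, lon = float(row[lat_key]), float(row[lon_key])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric coordinates
        if math.isnan(lat) or math.isnan(lon):
            continue  # pandas represents missing floats as NaN
        properties = {k: v for k, v in row.items() if k not in (lat_key, lon_key)}
        features.append({
            "type": "Feature",
            # RFC 7946: Point coordinates are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


def write_geojson(rows: Iterable[Mapping[str, Any]], path: Path) -> None:
    """Write rows to a GeoJSON file; property values must be JSON-serializable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(to_feature_collection(rows), f)
```

With pandas, rows can come from `df.to_dict("records")`. List-valued or NumPy-typed columns should be dropped or converted first, because `json.dump` cannot serialize them.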
If you encounter issues or have suggestions for improvements, please open an issue on the GitHub repository.
- Foursquare OS Places Open Dataset (https://location.foursquare.com/resources/blog/products/foursquare-open-source-places-a-new-foundational-dataset-for-the-geospatial-community/)
- Simon Willison's writing on Foursquare OS Places (https://simonwillison.net/search/?q=Foursquare+OS+Places)
License
Distributed under the GNU Affero General Public License v3.0. See LICENSE for more information.