Emphasizing financial data processing and pipeline automation
EC-Bridge is a financial data pipeline and analysis system designed to extract, transform, and validate SEC financial statement data efficiently. The project aims to support analysts conducting fundamental analysis of US public companies by building a robust, scalable, and structured financial database using Snowflake, Airflow, FastAPI, and Streamlit.
- Vedant Mane
- Abhinav Gangurde
- Yohan Markose
WE ATTEST THAT WE HAVEN’T USED ANY OTHER STUDENTS’ WORK IN OUR ASSIGNMENT AND ABIDE BY THE POLICIES LISTED IN THE STUDENT HANDBOOK
Streamlit Application: Streamlit App
Backend API: Google Cloud Run
Airflow API: Airflow
Google Codelab: Codelab
Google Docs: Project Document
Video Walkthrough: Video
- Streamlit: Frontend Framework
- FastAPI: API Framework
- Google Cloud Run: Backend Deployment
- AWS S3: External Cloud Storage
- Cloud & Storage: Snowflake, AWS S3
- ELT & Pipeline: Apache Airflow
- Programming: Python, SQL, JSON Transformation
- Validation & Testing: dbt (Data Build Tool)
- SEC files are quarterly financial reports
- Users select: year, quarter, and processing pipeline (RAW/JSON/Fact Tables)
- Triggers appropriate pipeline based on user selection
- RAW Pipeline:
- Stores SEC file in S3
- Converts to CSV format
- Loads to Snowflake as VARCHAR
- Creates processed tables with correct datatypes
- Fact Tables Pipeline:
- Loads S3 data to Snowflake
- Uses DBT for staging views
- Creates fact tables for Balance Sheet, Income Statement, and Cash Flow
- JSON Pipeline:
- Converts SEC files to parquet
- Generates company-specific JSON files
- Loads to Snowflake as VARIANT type
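As a rough illustration of the "company-specific JSON" step in the JSON pipeline, the sketch below groups flat rows by company and serializes one JSON document per company. The column names (cik, name, tag, value) and the function name group_by_company are assumptions made for illustration, not the project's actual schema or code. The real pipeline reads parquet and loads the result into a Snowflake VARIANT column.

```python
import json


def group_by_company(rows):
    """Group flat fact rows into one JSON-serializable document per company.

    `rows` is an iterable of dicts with the assumed keys
    cik, name, tag, and value (a hypothetical schema).
    """
    companies = {}
    for row in rows:
        # Create the company document on first sight, then reuse it.
        doc = companies.setdefault(
            row["cik"], {"cik": row["cik"], "name": row["name"], "facts": {}}
        )
        doc["facts"][row["tag"]] = row["value"]
    return companies


# Hypothetical sample rows, not real SEC data.
sample_rows = [
    {"cik": "0000000001", "name": "Example Corp", "tag": "Assets", "value": 100.0},
    {"cik": "0000000001", "name": "Example Corp", "tag": "Liabilities", "value": 40.0},
    {"cik": "0000000002", "name": "Sample Inc", "tag": "Assets", "value": 250.0},
]

documents = group_by_company(sample_rows)
# One JSON string per company, ready to be written to a file or staged for loading.
json_files = {cik: json.dumps(doc, sort_keys=True) for cik, doc in documents.items()}
```

Keeping one document per company maps naturally onto one VARIANT value per row, which is presumably why the pipeline produces company-specific files.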
- Users can verify data availability for specific year/quarter
- Run custom queries on available tables
- FastAPI handles database connections
- Returns query results to Streamlit interface
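The selection-and-trigger flow described in the list above (year, quarter, and pipeline chosen by the user, then the matching pipeline started) could be sketched as follows. The DAG identifiers, the Selection class, and resolve_pipeline are hypothetical names. This README does not state the real Airflow DAG names or the request format, so treat this as a minimal sketch under those assumptions.

```python
from dataclasses import dataclass

# Hypothetical DAG identifiers; the real Airflow DAG names are not given here.
PIPELINE_DAGS = {
    "RAW": "raw_pipeline_dag",
    "JSON": "json_pipeline_dag",
    "FACT": "fact_tables_pipeline_dag",
}


@dataclass(frozen=True)
class Selection:
    year: int
    quarter: int
    pipeline: str


def resolve_pipeline(selection):
    """Validate a user selection and return (dag_id, run configuration)."""
    if selection.quarter not in (1, 2, 3, 4):
        raise ValueError(f"quarter must be 1-4, got {selection.quarter}")
    key = selection.pipeline.upper()
    if key not in PIPELINE_DAGS:
        raise ValueError(f"unknown pipeline: {selection.pipeline}")
    # Configuration that would be passed along when the DAG run is triggered.
    conf = {"year": selection.year, "quarter": f"Q{selection.quarter}"}
    return PIPELINE_DAGS[key], conf


dag_id, conf = resolve_pipeline(Selection(year=2023, quarter=4, pipeline="json"))
```

Validating the selection before any pipeline is triggered keeps malformed requests from reaching Airflow or Snowflake.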
Required Python Version 3.12.*
git clone https://github.com/BigDataIA-Spring2025-4/DAMG7245_Assignment02.git
cd DAMG7245_Assignment02
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Go inside the airflow directory and start the services:
docker compose up -d
Step 1: Create an AWS Account
- Go to AWS Signup and click Create an AWS Account.
- Follow the instructions to enter your email, password, and billing details.
- Verify your identity and choose a support plan.
Step 2: Log in to AWS Management Console
- Visit AWS Console and log in with your credentials.
- Search for S3 in the AWS services search bar and open it.
Step 3: Create an S3 Bucket
- Click Create bucket.
- Enter a unique Bucket name.
- Select a region closest to your users.
- Configure settings as needed (e.g., versioning, encryption).
- Click Create bucket to finalize.
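Because the bucket name in Step 3 must satisfy S3's naming rules, it can help to check a candidate name locally before opening the console. The sketch below covers only a subset of those rules (length, allowed characters, no adjacent periods, no IP-address format). The function name is_valid_bucket_name is an illustrative assumption. Global uniqueness cannot be checked offline and is only confirmed when AWS accepts the name.

```python
import re

# Subset of the S3 general-purpose bucket naming rules: 3-63 characters,
# lowercase letters, digits, dots, and hyphens, starting and ending with
# a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name):
    """Return True if `name` passes the basic naming checks listed above."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:  # adjacent periods are not allowed
        return False
    if _IPV4_RE.fullmatch(name):  # names formatted like IP addresses are rejected
        return False
    return True
```

For example, is_valid_bucket_name("sec-financial-data-2025") passes these checks, while "My_Bucket" fails because of the uppercase letter and the underscore.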
Step 1: Download and Install Google Cloud SDK
- Visit the Google Cloud SDK documentation for platform-specific installation instructions.
- Download the installer for your operating system (Windows, macOS, or Linux).
- Follow the installation steps provided for your system.
Step 2: Initialize Google Cloud SDK
- Open a terminal or command prompt.
- Run gcloud init to begin the setup process.
- Follow the prompts to log in with your Google account and select a project.
Step 3: Verify Installation
- Run gcloud --version to confirm installation.
- Use gcloud config list to check the active configuration.
- Build the Docker Image
# Build and tag your image (make sure you're in the project directory)
docker build --platform=linux/amd64 --no-cache -t gcr.io/<YOUR_PROJECT_ID>/fastapi-app .
- Test Locally (Optional but Recommended)
# Run the container locally
docker run -p 8080:8080 gcr.io/<YOUR_PROJECT_ID>/fastapi-app
# Or pass environment variables from a .env file
docker run --env-file .env -p 8080:8080 gcr.io/<YOUR_PROJECT_ID>/fastapi-app
Visit http://localhost:8080/docs to verify the API works.
- Push to Google Container Registry
# Push the image (run "gcloud auth configure-docker" first if Docker is not yet authenticated with gcr.io)
docker push gcr.io/<YOUR_PROJECT_ID>/fastapi-app
- Deploy to Cloud Run
gcloud run deploy fastapi-service \
--image gcr.io/<YOUR_PROJECT_ID>/fastapi-app \
--platform managed \
--region us-central1 \
--allow-unauthenticated
- Get your Service URL
gcloud run services describe fastapi-service \
--platform managed \
--region <REGION> \
--format 'value(status.url)'
- Check Application Logs
gcloud run services logs read fastapi-service --region <REGION>
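The deploy command above uses us-central1 while the describe and logs commands use a <REGION> placeholder, and all three must point at the same region to find the service. One way to keep them consistent is to build the command lines from a single set of inputs, as in the sketch below. The helper name cloud_run_commands and the project ID "my-project" are placeholders introduced for illustration. The flags are the ones already shown in this section.

```python
import shlex


def cloud_run_commands(project_id, region, service="fastapi-service", image="fastapi-app"):
    """Build the deploy, describe, and log commands with one shared region."""
    image_uri = f"gcr.io/{project_id}/{image}"
    deploy = [
        "gcloud", "run", "deploy", service,
        "--image", image_uri,
        "--platform", "managed",
        "--region", region,
        "--allow-unauthenticated",
    ]
    describe = [
        "gcloud", "run", "services", "describe", service,
        "--platform", "managed",
        "--region", region,
        "--format", "value(status.url)",
    ]
    logs = ["gcloud", "run", "services", "logs", "read", service, "--region", region]
    return {"deploy": deploy, "describe": describe, "logs": logs}


# "my-project" is a placeholder project ID.
commands = cloud_run_commands("my-project", "us-central1")
for name, argv in commands.items():
    # Print each command in a copy-and-paste friendly, shell-quoted form.
    print(f"{name}: {shlex.join(argv)}")
```

Each list can also be passed directly to subprocess.run(argv, check=True) once the Google Cloud SDK is installed and initialized.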