
SECBridge is a high-performance financial data pipeline that automates the extraction, transformation, validation, and visualization of SEC financial statement data. Built with Snowflake, Apache Airflow, FastAPI, and Streamlit, it enables efficient ETL processing, structured data storage, and API-driven access for financial analysis.


SEC Bridge

Project Overview

Emphasizing financial data processing and pipeline automation

SECBridge is a financial data pipeline and analysis system designed to efficiently extract, transform, and validate SEC financial statement data. The project supports analysts conducting fundamental analysis of US public companies by building a robust, scalable, and structured financial database using Snowflake, Airflow, FastAPI, and Streamlit.

Team Members

  • Vedant Mane
  • Abhinav Gangurde
  • Yohan Markose

Attestation:

WE ATTEST THAT WE HAVEN’T USED ANY OTHER STUDENTS’ WORK IN OUR ASSIGNMENT AND ABIDE BY THE POLICIES LISTED IN THE STUDENT HANDBOOK

Resources

Streamlit Application: Streamlit App

Backend API: Google Cloud Run

Airflow API: Airflow

Google Codelab: Codelab

Google Docs: Project Document

Video Walkthrough: Video

Technologies Used

  • Streamlit: Frontend Framework
  • FastAPI: API Framework
  • Google Cloud Run: Backend Deployment
  • Cloud & Storage: Snowflake, AWS S3 (external cloud storage)
  • ELT & Pipeline: Apache Airflow
  • Programming: Python, SQL, JSON Transformation
  • Validation & Testing: dbt (Data Build Tool)

Application Workflow Diagram


Workflow

1. Initial User Input

  • SEC files are quarterly financial reports
  • Users select: year, quarter, and processing pipeline (RAW/JSON/Fact Tables); a minimal selection sketch follows this list
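
The sketch below shows what this selection step could look like in Streamlit. The widget labels, year range, backend URL, and the /trigger-pipeline route are illustrative placeholders, not the project's actual code.

# Minimal sketch of the selection form (names and routes are assumptions)
import streamlit as st
import requests

year = st.selectbox("Year", list(range(2009, 2025)))
quarter = st.selectbox("Quarter", ["Q1", "Q2", "Q3", "Q4"])
pipeline = st.radio("Processing pipeline", ["RAW", "JSON", "Fact Tables"])

if st.button("Run pipeline"):
    # <BACKEND_URL> and /trigger-pipeline are hypothetical placeholders
    resp = requests.post(
        "<BACKEND_URL>/trigger-pipeline",
        json={"year": year, "quarter": quarter, "pipeline": pipeline},
    )
    st.write(resp.json())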

2. Airflow Processing

  • Triggers the appropriate pipeline based on the user's selection (see the DAG-trigger sketch after this list)
  • RAW Pipeline:
    • Stores SEC file in S3
    • Converts to CSV format
    • Loads to Snowflake as VARCHAR
    • Creates processed tables with correct datatypes
  • Fact Tables Pipeline:
    • Loads S3 data to Snowflake
    • Uses DBT for staging views
    • Creates fact tables for Balance Sheet, Income Statement, and Cash Flow
  • JSON Pipeline:
    • Converts SEC files to parquet
    • Generates company-specific JSON files
    • Loads to Snowflake as VARIANT type
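
As an illustration of how the selected pipeline could be started, the sketch below calls Airflow's stable REST API to create a DAG run with the user's selection passed as run configuration. The DAG IDs, webserver URL, and credentials are placeholders; the project's actual trigger mechanism may differ.

# Hypothetical sketch: trigger one of the three DAGs through Airflow's REST API
import requests

AIRFLOW_URL = "http://localhost:8080"  # placeholder Airflow webserver URL
DAG_IDS = {  # illustrative DAG IDs
    "RAW": "raw_pipeline",
    "JSON": "json_pipeline",
    "Fact Tables": "fact_tables_pipeline",
}

def trigger_pipeline(pipeline: str, year: int, quarter: str) -> dict:
    """Start a DAG run, passing year/quarter through the run conf."""
    resp = requests.post(
        f"{AIRFLOW_URL}/api/v1/dags/{DAG_IDS[pipeline]}/dagRuns",
        json={"conf": {"year": year, "quarter": quarter}},
        auth=("airflow", "airflow"),  # placeholder credentials
    )
    resp.raise_for_status()
    return resp.json()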

3. Secondary User Interface

  • Users can verify data availability for specific year/quarter
  • Run custom queries on available tables

4. Data Retrieval

  • FastAPI handles database connections
  • Returns query results to the Streamlit interface (a minimal endpoint sketch follows this list)
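
A minimal sketch of such an endpoint is shown below, using the Snowflake Python Connector. The route name, request model, and environment variable names are assumptions for illustration, not the project's actual API.

# Illustrative FastAPI endpoint that runs a query against Snowflake
import os
import snowflake.connector
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    query: str

@app.post("/query")
def run_query(req: QueryRequest):
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse=os.environ.get("SNOWFLAKE_WAREHOUSE"),
        database=os.environ.get("SNOWFLAKE_DATABASE"),
    )
    try:
        cur = conn.cursor()
        cur.execute(req.query)
        columns = [c[0] for c in cur.description]
        rows = cur.fetchall()
        return {"columns": columns, "rows": rows}
    finally:
        conn.close()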

Environment Setup

Required Python version: 3.12.*

1. Clone the Repository

git clone https://github.com/BigDataIA-Spring2025-4/DAMG7245_Assignment02.git
cd DAMG7245_Assignment02

2. Setting up the virtual environment

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Setting Up Airflow

Go inside the airflow directory and start the services:

docker compose up -d

3. AWS S3 Setup

Step 1: Create an AWS Account

  • Go to AWS Signup and click Create an AWS Account.
  • Follow the instructions to enter your email, password, and billing details.
  • Verify your identity and choose a support plan.

Step 2: Log in to AWS Management Console

  • Visit AWS Console and log in with your credentials.
  • Search for S3 in the AWS services search bar and open it.

Step 3: Create an S3 Bucket

  • Click Create bucket.
  • Enter a unique Bucket name.
  • Select a region closest to your users.
  • Configure settings as needed (e.g., versioning, encryption).
  • Click Create bucket to finalize (a quick upload sketch follows these steps).
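
Once the bucket exists, you can sanity-check access from Python with boto3. The bucket name, key, and local file path below are placeholders, and credentials are assumed to come from your AWS configuration or environment variables.

# Quick access check: upload a file to the new bucket with boto3
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="data/2024q4.zip",     # local SEC quarterly archive (illustrative path)
    Bucket="<YOUR_BUCKET_NAME>",
    Key="raw/2024q4.zip",
)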

4. Google Cloud SDK Setup

Step 1: Download and Install Google Cloud SDK

  • Visit the Google Cloud SDK documentation for platform-specific installation instructions.
  • Download the installer for your operating system (Windows, macOS, or Linux).
  • Follow the installation steps provided for your system.

Step 2: Initialize Google Cloud SDK

  • Open a terminal or command prompt.
  • Run gcloud init to begin the setup process.
  • Follow the prompts to log in with your Google account and select a project.

Step 3: Verify Installation

  • Run gcloud --version to confirm installation.
  • Use gcloud config list to check the active configuration.

5. Setting up the Docker Image on Google Cloud Run

  1. Build the Docker Image
# Build and tag your image (make sure you're in the project directory)
docker build --platform=linux/amd64 --no-cache -t gcr.io/<YOUR_PROJECT_ID>/fastapi-app .

  2. Test Locally (Optional but Recommended)
# Run the container locally
docker run -p 8080:8080 gcr.io/<YOUR_PROJECT_ID>/fastapi-app

# For Managing Environment Variables
docker run --env-file .env -p 8080:8080 gcr.io/<YOUR_PROJECT_ID>/fastapi-app

Visit http://localhost:8080/docs to verify the API works.
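
The --env-file flag expects a plain key=value file. The variable names below are illustrative placeholders for the credentials such a service would typically need, not the project's actual configuration; fill in real values and never commit this file.

# .env (illustrative variable names)
AWS_ACCESS_KEY_ID=<your-access-key>
AWS_SECRET_ACCESS_KEY=<your-secret-key>
AWS_S3_BUCKET=<your-bucket-name>
SNOWFLAKE_ACCOUNT=<your-account-identifier>
SNOWFLAKE_USER=<your-username>
SNOWFLAKE_PASSWORD=<your-password>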

  3. Push to Google Container Registry
# Push the image
docker push gcr.io/<YOUR_PROJECT_ID>/fastapi-app

  4. Deploy to Cloud Run
gcloud run deploy fastapi-service \
  --image gcr.io/<YOUR_PROJECT_ID>/fastapi-app \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated  

  5. Get your Service URL
gcloud run services describe fastapi-service \
  --platform managed \
  --region <REGION> \
  --format 'value(status.url)'

  6. Check Application Logs
gcloud run services logs read fastapi-service --region <REGION>

References

Streamlit documentation

SQLAlchemy Integration

FastAPI Documentation

Apache Airflow

Snowflake Code Lab

Snowflake Python Connector

DBT

Snowflake, Airflow, DBT Code Lab
