ETL Flow Examples

This directory contains examples of different ETL (Extract, Transform, Load) workflows implemented using Prefect.

Examples

1. ETL CI/CD Pipeline

Located in /etl-cicd-pipeline

A complete production-ready ETL pipeline that demonstrates:

  • GitHub API data extraction
  • AWS S3 integration for data storage
  • Data validation and cleaning
  • Automated deployment via GitHub Actions
  • Email notifications
  • Error handling and retries
  • Environment configuration management

Key features:

  • Uses Docker infrastructure
  • Includes CI/CD pipeline configuration
  • Configurable via environment variables
  • Production-ready monitoring setup

The pipeline extracts repository data from GitHub, processes it through multiple transformation stages, and loads it into S3 buckets. It includes comprehensive error handling and notification systems.
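
For illustration, here is a minimal sketch of the extract-and-store step, assuming the requests and boto3 libraries; the repository, bucket, and key names are placeholders, and the actual pipeline in /etl-cicd-pipeline layers validation, retries, and notifications on top of this.

import json
import os

import boto3
import requests


def extract_repo_data(owner: str, repo: str) -> dict:
    # Fetch repository metadata from the GitHub REST API.
    response = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}", timeout=30
    )
    response.raise_for_status()
    return response.json()


def store_raw_data(data: dict, bucket: str, key: str) -> None:
    # Persist the untouched payload to S3 before any transformation.
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(data).encode("utf-8"))


if __name__ == "__main__":
    raw = extract_repo_data("prefecthq", "prefect")  # placeholder repository
    store_raw_data(raw, os.environ["S3_BUCKET_NAME"], "raw/repo.json")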

2. ETL S3 Upload Pipeline

Located in /etl-s3-upload-pipeline

A streamlined ETL pipeline demonstrating two implementation approaches:

Full Prefect Integration (full-example-etl.py)

  • Uses Prefect's complete feature set
  • Prefect Blocks for AWS credentials
  • Built-in S3 integration with prefect-aws
  • Email notifications using prefect-email
  • Human-in-the-loop capabilities
  • Automatic retries and error handling
  • Flow run tracking and observability
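
A minimal sketch of this approach, assuming Blocks named "etl-bucket" and "email-creds" already exist in your Prefect workspace; the block names, object key, and email address are placeholders rather than the example's actual configuration.

from prefect import flow, task
from prefect_aws import S3Bucket
from prefect_email import EmailServerCredentials, email_send_message


@task(retries=3, retry_delay_seconds=10)
def upload_report(content: bytes, key: str) -> str:
    # The S3Bucket block bundles credentials and bucket configuration.
    bucket = S3Bucket.load("etl-bucket")
    return bucket.write_path(key, content)


@flow
def full_example_etl():
    path = upload_report(b"col_a,col_b\n1,2\n", "processed/report.csv")
    # Notify on completion via the prefect-email integration.
    email_server_credentials = EmailServerCredentials.load("email-creds")
    email_send_message(
        email_server_credentials=email_server_credentials,
        subject="ETL run complete",
        msg=f"Report written to {path}",
        email_to="alerts@example.com",
    )


if __name__ == "__main__":
    full_example_etl()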

Orchestration Only (simple-example-etl.py)

  • Lightweight implementation using basic Prefect features
  • Direct boto3 integration with AWS
  • Simpler configuration management
  • Minimal dependencies
  • More control over AWS interactions
  • Basic error handling and logging
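
A minimal sketch of the orchestration-only approach, assuming boto3 picks up credentials from the environment variables listed in the configuration section below; the file paths and object keys are placeholders.

import os

import boto3
from prefect import flow, task, get_run_logger


@task(retries=2, retry_delay_seconds=5)
def upload_to_s3(local_path: str, key: str) -> None:
    # boto3 is called directly; Prefect only provides orchestration and retries.
    s3 = boto3.client("s3", region_name=os.getenv("AWS_REGION", "us-east-1"))
    s3.upload_file(local_path, os.environ["S3_BUCKET_NAME"], key)


@flow
def simple_example_etl(local_path: str = "data/report.csv"):
    logger = get_run_logger()
    upload_to_s3(local_path, "processed/report.csv")
    logger.info("Upload complete")


if __name__ == "__main__":
    simple_example_etl()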

Getting Started

Each example contains:

  • Detailed README with setup instructions
  • Environment configuration examples
  • Complete source code
  • Deployment instructions

To use any example:

  1. Navigate to the specific example directory
  2. Follow the README instructions
  3. Configure the required environment variables
  4. Deploy using the provided scripts

Prerequisites

General Requirements

  • Python 3.12+
  • AWS account and credentials
  • Prefect installation (pip install prefect)

CI/CD Pipeline Additional Requirements

  • Prefect Cloud account
  • Docker
  • GitHub account with repository access
  • GitHub Actions enabled

S3 Upload Pipeline Additional Requirements

  • prefect-aws for the full integration example
  • boto3 for the simple example
  • pandas for data processing
  • pydantic for data validation

Environment Configuration

CI/CD Pipeline

AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_DEFAULT_REGION=us-east-1
S3_BUCKET_NAME=your-bucket-name
NOTIFICATION_EMAIL=alerts@example.com

S3 Upload Pipeline

AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-bucket-name
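
As a rough sketch, a flow can read these variables at runtime as shown below; the helper function is illustrative only and not part of the example code.

import os


def load_s3_config() -> dict:
    # Required variables raise a KeyError if missing; the region falls back to a default.
    return {
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
        "region_name": os.getenv("AWS_REGION", "us-east-1"),
        "bucket": os.environ["S3_BUCKET_NAME"],
    }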

Pipeline Flow Structure

Both pipelines follow a similar ETL pattern:

  1. Extract data from source
  2. Store raw data in S3
  3. Transform and validate data
  4. Clean and structure data
  5. Perform aggregations
  6. Load final results to S3

The two versions differ mainly in implementation approach and in the additional features each includes.
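
A bare-bones skeleton of that shared pattern might look like the following; the task bodies are placeholders, and the single transform step stands in for the separate validation, cleaning, and structuring stages in the real examples.

from prefect import flow, task


@task
def extract() -> list[dict]:
    # Placeholder for pulling data from the source system.
    return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]


@task
def store_raw(records: list[dict]) -> None:
    ...  # write the untouched payload to S3, e.g. under raw/


@task
def transform(records: list[dict]) -> list[dict]:
    # Placeholder for validation, cleaning, and structuring.
    return [r for r in records if r["value"] is not None]


@task
def aggregate(records: list[dict]) -> dict:
    return {"total": sum(r["value"] for r in records)}


@task
def load(summary: dict) -> None:
    ...  # write the final results to S3, e.g. under processed/


@flow
def etl_pipeline():
    records = extract()
    store_raw(records)
    clean = transform(records)
    summary = aggregate(clean)
    load(summary)


if __name__ == "__main__":
    etl_pipeline()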