Beam Pipelines - Streaming Analytics Techniques

This project demonstrates several Apache Beam techniques for streaming analytics.

Running the demo

  1. Create a GCP project
  2. Create a file named terraform.tfvars in the terraform directory with the following content:
    project_id = "<GCP Project Id>"
    
    Additional Terraform variables can be overridden; see variables.tf for details.
  3. Run the following commands:
    export PROJECT_ID=<project-id>
    export GCP_REGION=us-central1
    export BIGQUERY_REGION=us-central1
  4. Create BigQuery tables, Pub/Sub topics and subscriptions, and GCS buckets by running this script:
    source ./setup-env.sh
  5. Start the event generation process:
    ./start-event-generation.sh
  6. Start the event processing pipeline:
    (cd pipeline; ./run-streaming-pipeline.sh)
  7. Optionally, start the pipeline that ingests the findings published as Pub/Sub messages into BigQuery (a sketch for verifying the running jobs follows these steps):
    ./start-findings-to-bigquery-pipeline.sh
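
Once the pipelines are submitted, you can confirm they are running without opening the console. This is a minimal sketch, assuming the gcloud CLI is authenticated against the same project and using the PROJECT_ID and GCP_REGION variables exported above; the exact job names depend on the run scripts and are not assumed here.

    # List active Dataflow jobs in the region used by the demo
    gcloud dataflow jobs list \
      --project="${PROJECT_ID}" \
      --region="${GCP_REGION}" \
      --status=active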

Cleaning up

  1. Shut down the pipelines via the GCP console (TODO: add scripts); a gcloud-based sketch is shown after this list.
  2. Run this command:
    cd terraform; terraform destroy
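
Until dedicated shutdown scripts are added, the pipelines can also be drained from the command line. This is a minimal sketch, assuming the gcloud CLI is configured for the project and region exported earlier; look up the job IDs first, then drain (or cancel) each one.

    # Find the IDs of the running pipelines
    gcloud dataflow jobs list --project="${PROJECT_ID}" --region="${GCP_REGION}" --status=active

    # Drain each job so in-flight elements finish before the job stops
    gcloud dataflow jobs drain <job-id> --project="${PROJECT_ID}" --region="${GCP_REGION}"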

Alternatively, delete the project you created.
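
If you created a dedicated project for the demo, deleting it removes all of its resources at once:

    gcloud projects delete "${PROJECT_ID}"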

Disclaimer

The techniques and code contained here are not supported by Google and are provided as-is (under the Apache License). This repo provides some options you can investigate, evaluate, and employ if you choose to.
