
Read and Write Bigtable with Google Dataflow

Description

This code demonstrates how to connect to, read from, and write data to Cloud Bigtable using Google Dataflow.

Install Docker

  1. Create the Dockerfile (a minimal sketch is shown after this list) and build the image.
docker build . -t python27_beam_docker
  2. List the images.
docker image ls
  3. Run your Docker image, with your shared folder set up (-v).
docker run --security-opt seccomp:unconfined -it -v C:\Users\Juan\Project\python:/usr/src/app python27_beam_docker bash

The first element of the -v argument is the Windows shared path and the second is the path inside the Docker container; the two are separated by a colon (:). python27_beam_docker is the name of your Docker image.

  4. Go to your work directory.
cd /usr/src/app/
  5. Install your dependencies.
pip install -r requirements.txt
  6. Add the Google credentials.
export GOOGLE_APPLICATION_CREDENTIALS="/path/of/your/credential/file/file.json"
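A minimal Dockerfile for step 1 might look like the sketch below. The base image and the pip dependencies (apache-beam[gcp] and google-cloud-bigtable) are assumptions; adjust them to match your requirements.txt.

    # Minimal Dockerfile sketch -- base image and dependencies are assumptions,
    # adjust them to match your requirements.txt.
    FROM python:2.7

    # Directory that the docker run command above mounts the shared folder into.
    WORKDIR /usr/src/app

    # Apache Beam with the Google Cloud extras plus the Cloud Bigtable client library.
    RUN pip install "apache-beam[gcp]" google-cloud-bigtable

    CMD ["bash"]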

Create the Bigtable extra package to upload to Dataflow

  1. This section demonstrates how to create and upload an extra package to Dataflow so that your code runs on Dataflow without problems. The package layout is:

    ├── beam_bigtable           # Package folder
    │   ├── __init__.py         # Package initialization file
    │   ├── __version__.py      # Version file
    │   └── bigtable.py         # Your package code
    ├── setup.py                # Setup code
    ├── LICENSE
    └── README.md
    
  2. Create setup.py using the name of the package, create a folder with that name, and inside that folder create an __init__.py file that imports your package module (minimal sketches of both files follow this list).

  3. Run the command to build the installable package archive.

    $ python setup.py sdist --formats=gztar
  4. Install the built package on your system, because you are going to use it to run your Python Dataflow code.

    $ pip install beam_bigtable-0.1.1.tar.gz
  5. Set the arguments in the PipelineOptions. You need the extra_package and setup_file arguments: extra_package sets the path of the compressed package, and setup_file sets the file used to install that package. A sketch of how these options are passed to the pipeline follows this list.

    [
        '--experiments=beam_fn_api',
        '--project=project-id',
        '--requirements_file=requirements.txt',
        '--runner=dataflow',
        '--staging_location=gs://storage-instance/stage',
        '--temp_location=gs://storage-instance/temp',
        '--setup_file=./beam_bigtable/setup.py',
        '--extra_package=./beam_bigtable/dist/beam_bigtable-0.1.1.tar.gz'
    ]
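For step 2, minimal versions of setup.py and beam_bigtable/__init__.py could look like the following sketch. The package name, version, and install_requires list are assumptions chosen to match the layout and file names above; adapt them to your actual code.

    # setup.py -- minimal sketch; name, version, and install_requires are assumptions.
    from setuptools import setup, find_packages

    setup(
        name='beam_bigtable',
        version='0.1.1',
        packages=find_packages(),
        install_requires=['apache-beam[gcp]', 'google-cloud-bigtable'],
    )

    # beam_bigtable/__init__.py -- re-export the package code so that
    # `import beam_bigtable` exposes the classes defined in bigtable.py.
    from beam_bigtable.bigtable import *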
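For step 5, the argument list above is typically passed to PipelineOptions when the pipeline is constructed. The sketch below shows one way to wire it up; the project ID, bucket paths, and package paths are the same placeholders used in the list above.

    # Sketch: passing the arguments above to a Beam pipeline.
    # The project ID, bucket names, and package paths are placeholders.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    argv = [
        '--experiments=beam_fn_api',
        '--project=project-id',
        '--requirements_file=requirements.txt',
        '--runner=dataflow',
        '--staging_location=gs://storage-instance/stage',
        '--temp_location=gs://storage-instance/temp',
        '--setup_file=./beam_bigtable/setup.py',
        '--extra_package=./beam_bigtable/dist/beam_bigtable-0.1.1.tar.gz',
    ]

    pipeline_options = PipelineOptions(argv)
    pipeline = beam.Pipeline(options=pipeline_options)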

Run the examples

Run the Read Example

python test_million_rows_read.py

Run the Write Example

python test_million_rows_write.py
After launching a job, go to the Dataflow page in the Google Cloud Console to monitor it.
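The test scripts themselves are not reproduced in this README, but a read pipeline built on the beam_bigtable package generally has the shape sketched below. The BigTableRead transform and its constructor arguments are hypothetical placeholders; check beam_bigtable/bigtable.py for the actual class names and parameters exported by this repository.

    # Rough sketch of a read pipeline. BigTableRead and its arguments are
    # hypothetical placeholders -- see beam_bigtable/bigtable.py for the
    # actual transform names exported by this repository.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms.combiners import Count

    from beam_bigtable import BigTableRead  # hypothetical import

    # Reuse the argument list from the PipelineOptions sketch above.
    options = PipelineOptions([
        '--runner=dataflow',
        '--project=project-id',
        '--staging_location=gs://storage-instance/stage',
        '--temp_location=gs://storage-instance/temp',
        '--setup_file=./beam_bigtable/setup.py',
        '--extra_package=./beam_bigtable/dist/beam_bigtable-0.1.1.tar.gz',
    ])

    with beam.Pipeline(options=options) as p:
        rows = p | 'ReadFromBigtable' >> BigTableRead(   # hypothetical transform
            project_id='project-id',
            instance_id='instance-id',
            table_id='table-id')
        rows | 'CountRows' >> Count.Globally()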

Contributing changes

Licensing
