This code demonstrates how to connect to Bigtable and read and write data using Google Cloud Dataflow.
- You need to create the `Dockerfile`.
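A minimal `Dockerfile` sketch, assuming a Python 2.7 base image and the `/usr/src/app` work directory used below (the base image tag and contents are assumptions, not the repository's actual file):

```dockerfile
# Sketch: Python 2.7 environment for running the Beam examples.
FROM python:2.7
WORKDIR /usr/src/app
RUN pip install --upgrade pip setuptools
CMD ["bash"]
```

Then build the image: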
```
docker build . -t python2_beam_docker
```
- List the images:

```
docker image ls
```
- Run your Docker image with your shared folder mounted (`-v`):

```
docker run --security-opt seccomp:unconfined -it -v C:\Users\Juan\Project\python:/usr/src/app python2_beam_docker bash
```

The first element of the `-v` argument is the Windows shared path and the second is the path inside the Docker container; the two are separated by a `:`. `python2_beam_docker` is the name of your Docker image.
- Go to your work directory:

```
cd /usr/src/app/
```
- Install your dependencies:

```
pip install -r requirements.txt
```
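A plausible `requirements.txt` for these examples (the exact package set and version pins are assumptions):

```
apache-beam[gcp]
google-cloud-bigtable
```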
- Add the Google credentials:

```
export GOOGLE_APPLICATION_CREDENTIALS="/path/of/your/credential/file/file.json"
```
This code demonstrates how to create and upload an extra package to Dataflow so that your code runs on Dataflow without import problems.
```
├── beam_bigtable          # Package folder
│   ├── __init__.py        # Package initialization file.
│   ├── __version__.py     # Version file.
│   └── bigtable.py        # Your package code.
├── setup.py               # Setup code.
├── LICENSE
└── README.md
```
- Create `setup.py` using the name of the package. Create a folder with that name, add an `__init__.py` file inside it, and import your package module there (sketches of both files follow the command below). Then run the command to build the installable package archive:
```
$ python setup.py sdist --formats=gztar
```
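Minimal sketches of the two files, matching the layout shown above (the metadata values are illustrative):

```python
# beam_bigtable/__init__.py -- expose the package code at the package root.
from .bigtable import *
```

```python
# setup.py -- minimal source-distribution setup; version and metadata are assumptions.
from setuptools import setup, find_packages

setup(
    name='beam_bigtable',
    version='0.1.1',
    packages=find_packages(),
)
```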
- Install the package on your system, because you are going to use it to run your Python Dataflow code:
```
$ pip install beam_bigtable-0.1.1.tar.gz
```
- Set the arguments in the `PipelineOptions`. You need the `extra_package` and `setup_file` arguments: `extra_package` sets the path of the compressed package, and `setup_file` sets the file used to install that package.
```python
[
    '--experiments=beam_fn_api',
    '--project=project-id',
    '--requirements_file=requirements.txt',
    '--runner=dataflow',
    '--staging_location=gs://storage-instance/stage',
    '--temp_location=gs://storage-instance/temp',
    '--setup_file=./beam_bigtable/setup.py',
    '--extra_package=./beam_bigtable/dist/beam_bigtable-0.1.1.tar.gz'
]
```
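A short sketch of passing these arguments to a pipeline (`project-id` and the `gs://` paths are placeholders):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

argv = [
    '--experiments=beam_fn_api',
    '--project=project-id',
    '--requirements_file=requirements.txt',
    '--runner=dataflow',
    '--staging_location=gs://storage-instance/stage',
    '--temp_location=gs://storage-instance/temp',
    '--setup_file=./beam_bigtable/setup.py',
    '--extra_package=./beam_bigtable/dist/beam_bigtable-0.1.1.tar.gz',
]

pipeline_options = PipelineOptions(argv)
with beam.Pipeline(options=pipeline_options) as p:
    pass  # add your read/write transforms here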
- Run the read example:

```
python test_million_rows_read.py
```

- Run the write example:

```
python test_million_rows_write.py
```
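For orientation, a minimal write-pipeline sketch using the Beam SDK's `bigtableio` connector; the project, instance, table, and column-family names are placeholders, and the repository's example scripts may be structured differently:

```python
import apache_beam as beam
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud.bigtable.row import DirectRow


def make_row(i):
    # Build one DirectRow per element; 'cf1' is an assumed column family.
    row = DirectRow(row_key=b'key-%09d' % i)
    row.set_cell('cf1', b'field', b'value-%d' % i)
    return row


with beam.Pipeline(options=PipelineOptions()) as p:
    (p
     | 'Create' >> beam.Create(range(100))
     | 'MakeRows' >> beam.Map(make_row)
     | 'Write' >> WriteToBigTable(project_id='project-id',
                                  instance_id='instance-id',
                                  table_id='table-id'))
```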
- Go to the Dataflow API page in the Google Cloud Console and make sure the API is enabled.
- See CONTRIBUTING.md
- See LICENSE