Commit b8e18b0

Author: John Plaisted
refactor(docker): make docker files easier to use during development. (datahub-project#1777)
* Make docker files easier to use during development. During development it is quite nice to have docker work with locally built code. This allows you to launch all services very quickly, with your changes, and optionally with debugging support. Changes made to docker files:
  - Removed all redundant docker-compose files. We now have one giant file, plus smaller files to use as overrides (see the sketch below).
  - Removed redundant README files that provided little information.
  - Renamed docker/<dir> to match the service name in the docker-compose file for clarity.
  - Moved environment variables to .env files. We only provide dev / the default environment for quickstart.
  - Added debug options to docker files using multistage builds, to build minimal images with the idea that built files will be mounted instead.
  - Added a docker/dev.sh script + compose file to easily use the dev override images (separate tag; images never published; uses debug docker files; mounts binaries into the image).
  - Added docs/docker documentation for this.
1 parent 43dfce8 commit b8e18b0
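
The override layering the message describes is plain docker-compose mechanics; as a hedged sketch (not a command taken from this commit, though docker/dev.sh later in this diff uses exactly this shape):

```
# One base file plus override files; later files win on conflicting keys.
docker-compose -f docker-compose.yml -f docker-compose.override.yml -p datahub up
```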

84 files changed: +699 −1434 lines

.github/workflows/docker-frontend.yml (+2)

@@ -21,6 +21,8 @@ jobs:
           echo "tag=$TAG"
           echo "::set-output name=tag::$TAG"
       - uses: docker/build-push-action@v1
+        env:
+          DOCKER_BUILDKIT: 1
         with:
           dockerfile: ./docker/frontend/Dockerfile
           username: ${{ secrets.DOCKER_USERNAME }}

.github/workflows/docker-gms.yml (+2)

@@ -21,6 +21,8 @@ jobs:
           echo "tag=$TAG"
           echo "::set-output name=tag::$TAG"
       - uses: docker/build-push-action@v1
+        env:
+          DOCKER_BUILDKIT: 1
         with:
           dockerfile: ./docker/gms/Dockerfile
           username: ${{ secrets.DOCKER_USERNAME }}

.github/workflows/docker-mae-consumer.yml (+2)

@@ -21,6 +21,8 @@ jobs:
           echo "tag=$TAG"
           echo "::set-output name=tag::$TAG"
       - uses: docker/build-push-action@v1
+        env:
+          DOCKER_BUILDKIT: 1
         with:
           dockerfile: ./docker/mae-consumer/Dockerfile
           username: ${{ secrets.DOCKER_USERNAME }}

.github/workflows/docker-mce-consumer.yml (+2)

@@ -21,6 +21,8 @@ jobs:
           echo "tag=$TAG"
           echo "::set-output name=tag::$TAG"
       - uses: docker/build-push-action@v1
+        env:
+          DOCKER_BUILDKIT: 1
         with:
           dockerfile: ./docker/mce-consumer/Dockerfile
           username: ${{ secrets.DOCKER_USERNAME }}

.gitignore (−3)

@@ -17,11 +17,8 @@ metadata-events/mxe-registration/src/main/resources/**/*.avsc
 .java-version

 # Python
-.env
 .venv
-env/
 venv/
-ENV/
 env.bak/
 venv.bak/
 .mypy_cache/

docker/README.md (+42 −13)

@@ -1,27 +1,56 @@
 # Docker Images
+
+## Prerequisites
+You need to install [docker](https://docs.docker.com/install/) and
+[docker-compose](https://docs.docker.com/compose/install/) (if using Linux; on Windows and Mac, compose is included with
+Docker Desktop).
+
+Make sure to allocate enough hardware resources to the Docker engine. Tested & confirmed config: 2 CPUs, 8GB RAM, 2GB swap
+area.
+
+## Quickstart
 The easiest way to bring up and test DataHub is using DataHub [Docker](https://www.docker.com) images
 which are continuously deployed to [Docker Hub](https://hub.docker.com/u/linkedin) with every commit to repository.

+You can easily download and run all these images and their dependencies with our
+[quick start guide](../docs/quickstart.md).
+
+DataHub Docker Images:
+
 * [linkedin/datahub-gms](https://cloud.docker.com/repository/docker/linkedin/datahub-gms/)
 * [linkedin/datahub-frontend](https://cloud.docker.com/repository/docker/linkedin/datahub-frontend/)
 * [linkedin/datahub-mae-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mae-consumer/)
 * [linkedin/datahub-mce-consumer](https://cloud.docker.com/repository/docker/linkedin/datahub-mce-consumer/)

-Above Docker images are created for DataHub specific use. You can check subdirectories to check how those images are
-generated via [Dockerbuild](https://docs.docker.com/engine/reference/commandline/build/) files or
-how to start each container using [Docker Compose](https://docs.docker.com/compose/). Other than these, DataHub depends
-on below Docker images to be able to run:
+Dependencies:
 * [**Kafka and Schema Registry**](kafka)
-* [**Elasticsearch**](elasticsearch)
+* [**Elasticsearch**](elasticsearch-setup)
 * [**MySQL**](mysql)

-Local-built ingestion image allows you to create on an ad-hoc basis `metadatachangeevent` with Python script.
-The pipeline depends on all the above images composing up.
-* [**Ingestion**](ingestion)
+### Ingesting demo data

-## Prerequisites
-You need to install [docker](https://docs.docker.com/install/) and [docker-compose](https://docs.docker.com/compose/install/).
+If you want to test ingesting some data once DataHub is up, see [**Ingestion**](ingestion/README.md).

-## Quickstart
-If you want to quickly try and evaluate DataHub by running all necessary Docker containers, you can check
-[Quickstart Guide](quickstart).
+## Using Docker Images During Development
+
+See [Using Docker Images During Development](../docs/docker/development.md).
+
+## Building and Deploying Docker Images
+
+We use GitHub Actions to build and continuously deploy our images. There should be no need to do this manually; a
+successful release on GitHub will automatically publish the images.
+
+### Building images
+
+To build the full images (the ones we publish), you need to run the following:
+
+```
+COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub build
+```
+
+This is because we rely on BuildKit for multistage builds. It does not hurt to also set `DATAHUB_VERSION` to
+something unique.
+
+This is not our recommended development flow; most developers should follow the
+[Using Docker Images During Development](#using-docker-images-during-development) guide.
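
As a hedged illustration of the above (the version value is invented, and the commit does not spell out how `DATAHUB_VERSION` is consumed, presumably as the image tag):

```
# Build all publishable images with BuildKit, tagging them with a unique local version.
# The DATAHUB_VERSION value below is purely an example.
DATAHUB_VERSION=local-$(git rev-parse --short HEAD) \
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 \
docker-compose -p datahub build
```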

docker/broker/env/docker.env (+6)

@@ -0,0 +1,6 @@
+KAFKA_BROKER_ID=1
+KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
+KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
+KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
+KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
+KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0
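
These env files are what the compose services load. As a hedged sketch of the same idea outside compose (the image name below is an assumption for illustration, not taken from this commit):

```
# Feed the broker's env file to a one-off container run.
# confluentinc/cp-kafka is assumed here purely for illustration.
docker run --env-file docker/broker/env/docker.env confluentinc/cp-kafka
```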
File renamed without changes.

docker/datahub-frontend/README.md (+16)

@@ -0,0 +1,16 @@
+# DataHub Frontend Docker Image
+
+[![datahub-frontend docker](https://github.com/linkedin/datahub/workflows/datahub-frontend%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-frontend+docker%22)
+
+Refer to [DataHub Frontend Service](../../datahub-frontend) for a quick understanding of the architecture and
+responsibility of this service within DataHub.
+
+## Checking out the DataHub UI
+
+After starting your Docker containers, you can connect by typing the address below into your favorite web browser:
+
+```
+http://localhost:9001
+```
+
+You can sign in with `datahub` as both username and password.

+5

@@ -0,0 +1,5 @@
+DATAHUB_GMS_HOST=datahub-gms
+DATAHUB_GMS_PORT=8080
+DATAHUB_SECRET=YouKnowNothing
+DATAHUB_APP_VERSION=1.0
+DATAHUB_PLAY_MEM_BUFFER_SIZE=10MB

docker/datahub-gms/Dockerfile (+28)

@@ -0,0 +1,28 @@
+# Defining environment
+ARG APP_ENV=prod
+
+FROM openjdk:8-jre-alpine as base
+ENV DOCKERIZE_VERSION v0.6.1
+RUN apk --no-cache add curl tar \
+    && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.20.v20190813/jetty-runner-9.4.20.v20190813.jar --output jetty-runner.jar \
+    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+
+FROM openjdk:8 as prod-build
+COPY . /datahub-src
+RUN cd /datahub-src && ./gradlew :gms:war:build
+RUN cp /datahub-src/gms/war/build/libs/war.war /war.war
+
+FROM base as prod-install
+COPY --from=prod-build /war.war /datahub/datahub-gms/bin/war.war
+COPY --from=prod-build /datahub-src/docker/datahub-gms/start.sh /datahub/datahub-gms/scripts/start.sh
+RUN chmod +x /datahub/datahub-gms/scripts/start.sh
+
+FROM base as dev-install
+# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
+# See this excellent thread https://github.com/docker/cli/issues/1134
+
+FROM ${APP_ENV}-install as final
+
+EXPOSE 8080
+
+CMD /datahub/datahub-gms/scripts/start.sh
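
The `APP_ENV` build arg selects which install stage becomes the final image. As a hedged sketch (the tag name is invented for illustration; the consumer Dockerfiles below follow the same pattern), building the dev variant directly would look like:

```
# With BuildKit, only the stages the final image needs are built,
# so the dev variant skips the slow prod-build stage entirely.
DOCKER_BUILDKIT=1 docker build \
  --build-arg APP_ENV=dev \
  -t datahub-gms:dev \
  -f docker/datahub-gms/Dockerfile .
```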

docker/datahub-gms/README.md (+22)

@@ -0,0 +1,22 @@
+# DataHub Generalized Metadata Store (GMS) Docker Image
+[![datahub-gms docker](https://github.com/linkedin/datahub/workflows/datahub-gms%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-gms+docker%22)
+
+Refer to [DataHub GMS Service](../../gms) for a quick understanding of the architecture and
+responsibility of this service within DataHub.
+
+## Other Database Platforms
+
+While GMS defaults to using MySQL as its storage backend, it is possible to switch to any of the
+[database platforms](https://ebean.io/docs/database/) supported by Ebean.
+
+For example, you can run the following command to start a GMS that connects to a PostgreSQL backend:
+
+```
+(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.postgre.yml -p datahub up)
+```
+
+or a MariaDB backend:
+
+```
+(cd docker/ && docker-compose -f docker-compose.yml -f docker-compose.mariadb.yml -p datahub up)
+```

docker/datahub-gms/env/docker.env (+13)

@@ -0,0 +1,13 @@
+EBEAN_DATASOURCE_USERNAME=datahub
+EBEAN_DATASOURCE_PASSWORD=datahub
+EBEAN_DATASOURCE_HOST=mysql:3306
+EBEAN_DATASOURCE_URL=jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
+EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
+KAFKA_BOOTSTRAP_SERVER=broker:29092
+KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
+ELASTICSEARCH_HOST=elasticsearch
+ELASTICSEARCH_PORT=9200
+NEO4J_HOST=neo4j:7474
+NEO4J_URI=bolt://neo4j
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=datahub

+13

@@ -0,0 +1,13 @@
+EBEAN_DATASOURCE_USERNAME=datahub
+EBEAN_DATASOURCE_PASSWORD=datahub
+EBEAN_DATASOURCE_HOST=mariadb:3306
+EBEAN_DATASOURCE_URL=jdbc:mariadb://mariadb:3306/datahub
+EBEAN_DATASOURCE_DRIVER=org.mariadb.jdbc.Driver
+KAFKA_BOOTSTRAP_SERVER=broker:29092
+KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
+ELASTICSEARCH_HOST=elasticsearch
+ELASTICSEARCH_PORT=9200
+NEO4J_HOST=neo4j:7474
+NEO4J_URI=bolt://neo4j
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=datahub

+13

@@ -0,0 +1,13 @@
+EBEAN_DATASOURCE_USERNAME=datahub
+EBEAN_DATASOURCE_PASSWORD=datahub
+EBEAN_DATASOURCE_HOST=postgres:5432
+EBEAN_DATASOURCE_URL=jdbc:postgresql://postgres:5432/datahub
+EBEAN_DATASOURCE_DRIVER=org.postgresql.Driver
+KAFKA_BOOTSTRAP_SERVER=broker:29092
+KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
+ELASTICSEARCH_HOST=elasticsearch
+ELASTICSEARCH_PORT=9200
+NEO4J_HOST=neo4j:7474
+NEO4J_URI=bolt://neo4j
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=datahub

docker/gms/start.sh → docker/datahub-gms/start.sh (+1 −1)

@@ -6,4 +6,4 @@ dockerize \
   -wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \
   -wait http://$NEO4J_HOST \
   -timeout 240s \
-  java -jar jetty-runner.jar gms.war
+  java -jar /jetty-runner.jar /datahub/datahub-gms/bin/war.war

docker/datahub-mae-consumer/Dockerfile (+27)

@@ -0,0 +1,27 @@
+# Defining environment
+ARG APP_ENV=prod
+
+FROM openjdk:8-jre-alpine as base
+ENV DOCKERIZE_VERSION v0.6.1
+RUN apk --no-cache add curl tar \
+    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+
+FROM openjdk:8 as prod-build
+COPY . datahub-src
+RUN cd datahub-src && ./gradlew :metadata-jobs:mae-consumer-job:build
+RUN cd datahub-src && cp metadata-jobs/mae-consumer-job/build/libs/mae-consumer-job.jar ../mae-consumer-job.jar
+
+FROM base as prod-install
+COPY --from=prod-build /mae-consumer-job.jar /datahub/datahub-mae-consumer/bin/
+COPY --from=prod-build /datahub-src/docker/datahub-mae-consumer/start.sh /datahub/datahub-mae-consumer/scripts/
+RUN chmod +x /datahub/datahub-mae-consumer/scripts/start.sh
+
+FROM base as dev-install
+# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
+# See this excellent thread https://github.com/docker/cli/issues/1134
+
+FROM ${APP_ENV}-install as final
+
+EXPOSE 9090
+
+CMD /datahub/datahub-mae-consumer/scripts/start.sh

docker/datahub-mae-consumer/README.md (+5)

@@ -0,0 +1,5 @@
+# DataHub MetadataAuditEvent (MAE) Consumer Docker Image
+[![datahub-mae-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mae-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mae-consumer+docker%22)
+
+Refer to [DataHub MAE Consumer Job](../../metadata-jobs/mae-consumer-job) for a quick understanding of the architecture and
+responsibility of this service within DataHub.

@@ -0,0 +1,8 @@
+KAFKA_BOOTSTRAP_SERVER=broker:29092
+KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
+ELASTICSEARCH_HOST=elasticsearch
+ELASTICSEARCH_PORT=9200
+NEO4J_HOST=neo4j:7474
+NEO4J_URI=bolt://neo4j
+NEO4J_USERNAME=neo4j
+NEO4J_PASSWORD=datahub

docker/mae-consumer/start.sh → docker/datahub-mae-consumer/start.sh (+1 −1)

@@ -5,4 +5,4 @@ dockerize \
   -wait http://$ELASTICSEARCH_HOST:$ELASTICSEARCH_PORT \
   -wait http://$NEO4J_HOST \
   -timeout 240s \
-  java -jar mae-consumer-job.jar
+  java -jar /datahub/datahub-mae-consumer/bin/mae-consumer-job.jar

docker/datahub-mce-consumer/Dockerfile (+27)

@@ -0,0 +1,27 @@
+# Defining environment
+ARG APP_ENV=prod
+
+FROM openjdk:8-jre-alpine as base
+ENV DOCKERIZE_VERSION v0.6.1
+RUN apk --no-cache add curl tar \
+    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+
+FROM openjdk:8 as prod-build
+COPY . datahub-src
+RUN cd datahub-src && ./gradlew :metadata-jobs:mce-consumer-job:build
+RUN cd datahub-src && cp metadata-jobs/mce-consumer-job/build/libs/mce-consumer-job.jar ../mce-consumer-job.jar
+
+FROM base as prod-install
+COPY --from=prod-build /mce-consumer-job.jar /datahub/datahub-mce-consumer/bin/
+COPY --from=prod-build /datahub-src/docker/datahub-mce-consumer/start.sh /datahub/datahub-mce-consumer/scripts/
+RUN chmod +x /datahub/datahub-mce-consumer/scripts/start.sh
+
+FROM base as dev-install
+# Dummy stage for development. Assumes code is built on your machine and mounted to this image.
+# See this excellent thread https://github.com/docker/cli/issues/1134
+
+FROM ${APP_ENV}-install as final
+
+EXPOSE 9090
+
+CMD /datahub/datahub-mce-consumer/scripts/start.sh

docker/datahub-mce-consumer/README.md (+5)

@@ -0,0 +1,5 @@
+# DataHub MetadataChangeEvent (MCE) Consumer Docker Image
+[![datahub-mce-consumer docker](https://github.com/linkedin/datahub/workflows/datahub-mce-consumer%20docker/badge.svg)](https://github.com/linkedin/datahub/actions?query=workflow%3A%22datahub-mce-consumer+docker%22)
+
+Refer to [DataHub MCE Consumer Job](../../metadata-jobs/mce-consumer-job) for a quick understanding of the architecture and
+responsibility of this service within DataHub.

@@ -0,0 +1,4 @@
+KAFKA_BOOTSTRAP_SERVER=broker:29092
+KAFKA_SCHEMAREGISTRY_URL=http://schema-registry:8081
+GMS_HOST=datahub-gms
+GMS_PORT=8080

docker/mce-consumer/start.sh → docker/datahub-mce-consumer/start.sh (+1 −1)

@@ -4,4 +4,4 @@
 dockerize \
   -wait tcp://$KAFKA_BOOTSTRAP_SERVER \
   -timeout 240s \
-  java -jar mce-consumer-job.jar
+  java -jar /datahub/datahub-mce-consumer/bin/mce-consumer-job.jar

docker/dev.sh (+17)

@@ -0,0 +1,17 @@
+#!/bin/bash
+
+# Launches dev instances of DataHub images. See documentation for more details.
+# YOU MUST BUILD VIA GRADLE BEFORE RUNNING THIS.
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+cd $DIR && \
+  COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose \
+    -f docker-compose.yml \
+    -f docker-compose.override.yml \
+    -f docker-compose.dev.yml \
+    pull \
+  && \
+  COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub \
+    -f docker-compose.yml \
+    -f docker-compose.override.yml \
+    -f docker-compose.dev.yml \
+    up
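
A hedged sketch of the intended dev loop (the plain `./gradlew build` target is an assumption; the script itself only says to build via Gradle first):

```
# Build binaries locally, then launch the dev images, which mount the
# freshly built binaries instead of baking them into the image.
./gradlew build
./docker/dev.sh
```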
