Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor base HA image #355

Merged
merged 59 commits into from
Apr 24, 2023
Merged

Refactor base HA image #355

merged 59 commits into from
Apr 24, 2023

Conversation

graveland
Copy link
Collaborator

@graveland graveland commented Feb 1, 2023

This is a large refactor that strips out all the things that aren't required for this image.

It also adds extra checks in the cicd scripts to make sure the requested extensions and
packages are actually present in the image. It does this by looking in the new /.image_config
file that contains the build parameters so we can do some introspection of the image.

The build scripts have been overhauled as well, including some matching for cargo-pgx
vs extension versions to optimize rust builds. Another major change is that requested
extensions first try to install .deb packages, and fall back to building them. The
timescaledb extension is always built though, as we want different build options than
what's currently in the debian packages.

Extension version and pg compatibility is described in build_scripts/versions.yaml
and that file should be the only thing that needs updating to add new versions of
extensions as they are released.

This also adds most of the required support for arm64 builds as well, but we need to
fix public arm64 build runners before fully building multiplatform images. We do currently
build images with -amd64 tags, and create a manifest for the base tags, but only add
the one architecture to it for now.

There's also new github step summary information available after the build completes
that helps to easily see what's included in the image.

This currently pushes to timescaledev/timescaledb-ha instead of the main
timescale/timescaledb-ha docker hub repository. Changing that will be the last
thing to happen prior to merging this PR. This also won't be merged until other
downstream changes have been sorted out. Until then, we can build and test from
the timescaledev/timescaledb-ha images that this will build.

In order to speed things up, this branch's publish-images github action only builds pg15,
so adding 12-14 to the matrix would happen before merging as well.

@graveland graveland requested a review from a team February 1, 2023 02:39
@feikesteenbergen
Copy link
Member

and versions later than that are installed from pre-built debian packages.

Tricky with that is that we want debug symbols included in the libraries - that is the reason we compile them ourselves currently. -DCMAKE_BUILD_TYPE=RelWithDebInfo

The second thing we wanted to ensure is that the downgrade scripts are included (DGENERATE_DOWNGRADE_SCRIPT=ON), it looks like that is covered nowadays in the deb packages themselves.

The third thing we wanted to insure is the INSTALL_METHOD is distinct from timescaledb installed from packages.

        ./bootstrap \
            -DTAP_CHECKS=OFF \
            -DWARNINGS_AS_ERRORS=off \
            -DCMAKE_BUILD_TYPE=RelWithDebInfo \
            -DREGRESS_CHECKS=OFF \
            -DGENERATE_DOWNGRADE_SCRIPT=ON \
            -DPROJECT_INSTALL_METHOD="${INSTALL_METHOD}${OSS_ONLY}"

Copy link

@manute manute left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of bash code that has not got tests, sure it will work but not sure if we can make tests or is it too complex?

@graveland
Copy link
Collaborator Author

graveland commented Feb 1, 2023

and versions later than that are installed from pre-built debian packages.

Tricky with that is that we want debug symbols included in the libraries - that is the reason we compile them ourselves currently. -DCMAKE_BUILD_TYPE=RelWithDebInfo

The second thing we wanted to ensure is that the downgrade scripts are included (DGENERATE_DOWNGRADE_SCRIPT=ON), it looks like that is covered nowadays in the deb packages themselves.

The third thing we wanted to insure is the INSTALL_METHOD is distinct from timescaledb installed from packages.

        ./bootstrap \
            -DTAP_CHECKS=OFF \
            -DWARNINGS_AS_ERRORS=off \
            -DCMAKE_BUILD_TYPE=RelWithDebInfo \
            -DREGRESS_CHECKS=OFF \
            -DGENERATE_DOWNGRADE_SCRIPT=ON \
            -DPROJECT_INSTALL_METHOD="${INSTALL_METHOD}${OSS_ONLY}"

Sounds good. It's an easy tweak to build always. Looking at the debian packages, there are only a few with corresponding -dbgsym packages. I'll change that. If we get .debs sometime for hot-forge, we should be able to use them here too and then go back to installing instead of building.

@timescale timescale deleted a comment from manute Feb 1, 2023
@graveland
Copy link
Collaborator Author

There is a lot of bash code that has not got tests, sure it will work but not sure if we can make tests or is it too complex?

I can't say I ever really made tests for shell scripts? Between running the cicd/install_checks during build time, and CI time, and the github step summaries showing what actually got installed, I figured it was good enough? It's certainly more complex than before, but we also were missing things without manually looking inside the resulting images.

If people want tests, or if there is any more complexity at all, I'd switch languages.

@graveland
Copy link
Collaborator Author

The current procedure for adding a new version of an extension includes modifying build_scripts/shared_versions.sh to update the appropriate pgx_X_versions structure if you're changing one of those, and/or modifying the supported_X functions that determine if a package version + pg version should result in an installation/build.

Streamlining that would be a win. I'll think a bit on it more.

@graveland graveland force-pushed the brent/base branch 4 times, most recently from 79c0ff6 to b625d21 Compare February 5, 2023 14:20
@graveland
Copy link
Collaborator Author

The current procedure for adding a new version of an extension includes modifying build_scripts/shared_versions.sh to update the appropriate pgx_X_versions structure if you're changing one of those, and/or modifying the supported_X functions that determine if a package version + pg version should result in an installation/build.

Streamlining that would be a win. I'll think a bit on it more.

The latest push includes build_scripts/versions.yaml and is now the only place that you need to edit when adding a new version of an extension. This also adds the ability to use all and latest when specifying versions of extensions that you want to build, and the Makefile defaults to all. To add timescaledb 2.9.3 for example, add one line to the file, and it will be included. For faster builds / testing, the make target latest uses to latest version specification for timescaledb, timescaledb_toolkit, and promscale.

Copy link
Contributor

@sebastianwebber sebastianwebber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two small comments. LGTM

publish-combined-builder-manifest: # publish a combined builder image manifest
@echo "Creating manifest $(DOCKER_BUILDER_URL) that includes $(DOCKER_BUILDER_URL)-amd64"
amddigest_image="$$(./fetch_tag_digest $(DOCKER_BUILDER_URL)-amd64)"
echo "AMD: $$amddigest_image"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm info is missing. is this the case to add it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. Arm was working 100%, but we need to sort out a secure arm64 builder infrastructure. I've got a PoC for that, but I need to sort out the webhooks for it from GH before re-introducing arm64.

@graveland graveland force-pushed the brent/base branch 5 times, most recently from 6e807b4 to 21eeb16 Compare February 7, 2023 15:29
@alexeyklyukin
Copy link
Member

Tricky to review, but we can try it on dev. A declarative way to describe build options and extensions/versions is a huge win. I'd add a step to list build docker images, which helps find the tag to put into the deployer config.

Copy link
Member

@feikesteenbergen feikesteenbergen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a few nits, nothing major

- "*.md"

schedule:
- cron: '0 7 * * 2'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For confirmation: we'll be publishing an image every Tuesday at 07:00 UTC, that's what we want?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time picked was arbitrary, and it stepped on me during testing a couple times so I bounced it around :)

The idea is to automatically pick up updated .debs to automatically include security fixes. I'd like it scheduled, but I don't care at all about when.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

README.md Outdated
@@ -10,6 +10,8 @@ This directory contains everything that allows us to create a Docker image with

Currently, our base image is Ubuntu, as we require glibc 2.33+.

It's currently pushing the resuting images to: https://hub.docker.com/r/timescaledev/timescaledb-ha
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
It's currently pushing the resuting images to: https://hub.docker.com/r/timescaledev/timescaledb-ha
It's currently pushing the resulting images to: https://hub.docker.com/r/timescale/timescaledb-ha

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we push to /timescale/ since commit 0c54000

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, is pushing to timescaledev just so that it can run concurrently with the master branch, and would be a thing that changes just before merging.

Copy link
Member

@alexeyklyukin alexeyklyukin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@graveland
Copy link
Collaborator Author

Tricky to review, but we can try it on dev. A declarative way to describe build options and extensions/versions is a huge win. I'd add a step to list build docker images, which helps find the tag to put into the deployer config.

We wouldn't be using the results of this in the deployer, and the full image URLs/tags are in the action step summary for easy access :)

@cryptobench
Copy link

Looking forward to this as i'm interested in the ARM image for local development so I can achieve env parity. Thank you for the work you're putting in!

`build_scripts/versions.yaml` contains packages, versions, min-max pg
versions, and cargo-pgx version information for timescaledb and
rust extensions. It's the one place to add configuration for extensions
to decide if they'll be included in what pg versions.

The next step will be to remove the `Makefile` environment variables
containing all the requested versions. If the version environment
variable is empty, we'll iterate through all the available versions
in `versions.yaml`. If the environment variables are passed, we'll
keep the current behaviour of using that to pick versions to build.
No need to update the Makefile, Dockerfile, or workflow files
for most standard extension version changes. Timescaledb, toolkit,
and promscale versions are all taken from versions.yaml unless
overridden by the environment variables.

This also adds `all` and `latest` as version specifications as well
when they're the only item in the `*_VERSIONS` environment variables.
These expand to all available, or the latest available versions from
`versions.yaml`.
We have to expand the versions requested at build-time so it makes
sense afterwards when checking the image
Getting prepared to merge
Docker hub: timescaledev -> timescale
@graveland graveland merged commit 880b01f into master Apr 24, 2023
@graveland graveland deleted the brent/base branch April 24, 2023 17:27
@leppaott
Copy link

the -latest suffix here indicates the latest docker build, not the latest commit. In particular, images built from a tag will be published with the -latest suffix in addition to the tag-based suffix.

From Main README are the image tagging prefixes still valid? How about this '-all'?

@graveland
Copy link
Collaborator Author

-all means that the specified image includes all the pg versions (pg15-all includes pg12-15). Images with the -latest tag aren't actually getting published as of this PR merge. I'll have to update that README in a PR ASAP!

Now that the builds are scheduled, there's a slight change. Taking pg15.2-ts2.10.3 as an example, between two builds, pg and timescale won't change, but it's possible that the postgis version does change since we're getting upstream fixes from debian and pgdg automatically. If somebody doesn't want the image to change at all, then I'd suggest using the SHA-based tags sha256:8a32cbd9709f5cb9bc56e39a7832e6285ecbfc2f978e58165c946c87c36a8d2e.

If you want to suggest any changes to how the tags are pushed, let us know!

@polarn
Copy link

polarn commented May 17, 2023

@graveland I think the introduction of --platform for the docker build introduced a small problem for people running in for example Rancher Desktop on Silicon M1/M2 Macs. Before this, the image produced was of the old non multi-arch type and now it is of multi arch (with only one arch, amd64). It confuses Rancher to try to pull the arm64 image, but there is no such image (yet). The behaviour before with the "single" architecture image was to pull that.

Not thinking you should do anything about it, just wanted to mention it if others see this in their Kubernetes:

Failed to pull image "timescale/timescaledb-ha:pg15.2-ts2.10.3": rpc error: code = Unknown desc = no matching manifest for linux/arm64/v8 in the manifest list entries

A work-around for users could maybe be to point out the image digest instead of tag, or just pulling the image manually to the dockerd.

If you are curious to see the difference in image types, check the output of these:

docker manifest inspect --verbose timescale/timescaledb-ha:pg15.2-ts2.10.3 
docker manifest inspect --verbose timescale/timescaledb-ha:pg15.2-ts2.10.1-latest

You will see an array with one element for the first command and you will see the element directly on the root for the second image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants