Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] natively support .dockerignore #10921

Closed
1 of 2 tasks
blaenk opened this issue Oct 17, 2020 · 1 comment · Fixed by #10922
Closed
1 of 2 tasks

[core] natively support .dockerignore #10921

blaenk opened this issue Oct 17, 2020 · 1 comment · Fixed by #10922
Assignees
Labels
@aws-cdk/core Related to core CDK functionality effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. in-progress This issue is being actively worked on. p1

Comments

@blaenk
Copy link
Contributor

blaenk commented Oct 17, 2020

A not-uncommon pattern to use with Docker is to turn the .dockerignore file into a whitelist by making the first rule one that ignores everything and then having subsequent negated patterns explicitly exempting specific files and directories.

For example:

*
!Dockerfile
!.dockerignore
!foobar.txt
!index.py
!subdirectory

This is useful to guard against accidentally passing files to the Docker context (or worse, to the image itself) due to forgetting to add a corresponding entry to the .dockerignore file.

CDK currently not only prevents this pattern from being used, but actually makes the situation worse, due to its under-the-hood reading of .dockerignore patterns and processing them with minimatch, which has different pattern matching behavior.

I noticed this when I noticed that my CDK operations were incredibly slow (cdk diff taking 5 minutes) and upon inspection of the cloud assemblies I noticed that everything was being copied in, including node_modules.

Use Case

Ergonomics

Directories for generic assets are pretty straightforward to reason about what is being passed to the cloud assembly and what can be excluded via exclude. Directories for docker assets often contain more directories including large ones such as node_modules or build directories.

Users may mistakenly assume that having a .dockerignore means that they don't have to provide an exclude option since DockerImageAsset gives the impression of a deeper understanding of Docker assets compared to a generic Asset, so I don't think it's unreasonable to expect that CDK may be able to Do The Right Thing™ when a .dockerignore is present.

I think this expectation is reasonable especially since CDK currently does read and process .dockerignore under the hood, albeit incorrectly applying minimatch semantics instead of .dockerignore semantics (see below).

Adding this feature will mean that users of DockerImageAsset get an out-of-the-box experience that Just Works™ without any further exclude modifications, unless they need to for whatever reason, but then it would be obvious to them.

Correctness

Even setting aside ergonomics, it's simply incorrect to read in .dockerignore and process its patterns with minimatch semantics, since it differs from .dockerignore semantics, yet CDK is doing just this, and it's also doing some workarounds for preventing exclusion of the Dockerfile and .dockerignore from the cloud assembly, which has consequences on user-provided exclusions (see below).

Perceived Slowness of CDK

Incorrectly configured exclusions can have a severe impact on the perceived slowness of CDK when every synthesis requires copying large directories like node_modules, and it is easy to misattribute that to CDK itself being slow (which in a way it is being), possibly driving users away.

My motivation for writing this and implementing the proposed solution is because something like cdk diff was taking my project over 5 minutes to finish. I did believe it was just CDK being this slow.

Whitelist Support

Users cannot use a whitelist style .dockerignore due to the way things work (explained in-depth below).

Even if one recognizes this, someone who uses a whitelist approach with .dockerignore may see the benefits of such an approach and attempt to apply them to exclude, but it is also not possible there (in particular with directory contents, see below).

Proposed Solution

I propose to remedy this by adding native support for .dockerignore style patterns (rather than trying to process them with minimatch). This will be exposed via an optional CopyOptions.ignoreMode setting which defaults to IgnoreMode.GLOB (for backward compatibility) which would use minimatch, while DockerImageAsset would use IgnoreMode.DOCKER (and error for any other mode) with full, proper .dockerignore behavior.

This way, users of DockerImageAsset that have a .dockerignore will have a everything just work out of the box, even if they are using a whitelist style .dockerignore.

Summary of Problems

Here is an in-depth summary of the current problems when attempting to use a whitelist .dockerignore:

CDK drops all whitelist-style patterns

The guard against excluding Dockerfile or .dockerignore from the cloud assembly does not consider that a pattern may be an 'ignore-all' * pattern or a negated pattern !, causing every pattern in a whitelist to be dropped.

// make sure the docker file and the dockerignore file end up in the staging area
// see https://github.com/aws/aws-cdk/issues/6004
exclude = exclude.filter(ignoreExpression => {
return !(minimatch(file, ignoreExpression, { matchBase: true }) ||
minimatch(ignore, ignoreExpression, { matchBase: true }));
});

This guard checks that the given pattern doesn't end up excluding Dockerfile or .dockerignore.

If the pattern is * for example then it will definitely match against Dockerfile or .dockerignore, which this code interprets as a dangerous pattern which would filter out those necessary files, so that pattern is filtered out.

Likewise all negated patterns are filtered out as well. For example if !foo and !bar will match against Dockerfile and .dockerignore since match(Dockerfile, !foo) is true because Dockerfile is not foo.

This causes every single pattern to get dropped in a whitelist-style .dockerignore without the user having done anything at all, since .dockerignore is automatically read and combined with possibly-user-supplied exclude patterns.

.dockerignore patterns are not minimatch patterns

CDK automatically under-the-hood reads .dockerignore files and combines them with exclude patterns which may be user-supplied. These patterns are then processed by the minimatch package which provides functionality somewhat like fnmatch(3).

This is incorrect because minimatch interprets patterns differently. For example, per .dockerignore rules, a simple !subdirectory pattern is enough to whitelist subdirectory/ and any of its contents. However the same is not true for minimatch.

CDK passes the matchBase option which means that if the pattern contains no slashes, it'll match against the path basename. So the directory itself will correctly whitelist, but when CDK checks child paths, those will match against the basename so e.g. subdirectory/child will be a check of minimatch(child, subdirectory).

.dockerignore is more like .gitignore pattern behavior with some deviations which you can read about here.

There is one package which seems to provide .dockerignore matching behavior called @balena/dockerignore, which conforms to the API of node-ignore which is a package providing .gitignore matching behavior.

Note that @balena/dockerignore also details the differences between .dockerignore and .gitignore behavior.

CDK .dockerignore-sourced patterns override user patterns

Being aware of the above issue, I can simply, albeit redundantly, add my own explicit patterns to plug the leaks that CDK opens up.

Specifically, since .dockerignore directory exemptions (negated patterns) aren't interpreted correctly (the .dockerignore way), I can remedy this by adding my own !subdirectory/* for any !subdirectory in the .dockerignore.

As already mentioned, CDK automatically reads a .dockerignore file if present and concatenates it after user supplied rules.

The exclude pattern logic operates in a 'last pattern wins' fashion.

All of this together means that my !subdirectory/* pattern will be clobbered by the * at the beginning of the .dockerignore, meaning my pattern won't take effect at all.

This can be remedied by making user-provided patterns come after .dockerignore patterns, so that user-provided ones can have the 'final say'. The Dockerfile/.dockerignore guard will still filter out the * pattern so CDK won't be using a whitelist as intended, however, and it would be up to the user to recreate the whitelist present in .dockerignore in exclude minimatch patterns.

Minimatch matchBase behavior

Even if we allow user-provided exclude patterns to 'win', CDK uses minimatch's matchBase option which:

If set, then patterns without slashes will be matched against the basename of the path if it contains slashes. For example, a?b would match the path /xyz/123/acb, but not /xyz/acb/123.

CDK passes absolute paths to minimatch(), which is probably why passing this option is considered useful.

The problem is that my above workaround of explicit providing a !subdirectory/* pattern won't work because it contains a slash, so it will be matched against the full absolute path, and it won't match, and there's no way for me as a user to work around this.

The solution here is to pass to minimatch() paths relative to the root directory (i.e. root of the asset).

Other

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

@blaenk blaenk added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Oct 17, 2020
@github-actions github-actions bot added the @aws-cdk/core Related to core CDK functionality label Oct 17, 2020
@SomayaB SomayaB added the in-progress This issue is being actively worked on. label Oct 19, 2020
@rix0rrr rix0rrr added effort/medium Medium work item – several days of effort p1 labels Oct 26, 2020
@SomayaB SomayaB removed the needs-triage This issue or PR still needs to be triaged. label Nov 6, 2020
@mergify mergify bot closed this as completed in #10922 Nov 10, 2020
mergify bot pushed a commit that referenced this issue Nov 10, 2020
Patterns in `.dockerignore` used to be interpreted as globs, which has different behavior from how `.dockerignore` is supposed to be interpreted. Specifically it was hard to properly ignore whole directories at once (such as a large `node_modules`).

This PR adds an `ignoreMode` flag which controls how the ignore patterns should be interpreted. The choices are:

* `IgnoreMode.GLOB` - interpret as before
* `IgnoreMode.GIT` - interpret as in `.gitignore`
* `IgnoreMode.DOCKER` - interpret as in `.dockerignore`

Users might have dependencies on the current behavior, which is why the default `ignoreMode` depends on a feature flag: `@aws-cdk/aws-ecr-assets:dockerIgnoreSupport` (which is set to `true` for new projects) enables the new, correct Docker-ignore behavior by default.

----

Closes #10921

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/core Related to core CDK functionality effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. in-progress This issue is being actively worked on. p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants