-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow outdated files on mirrors #188
Open
elboulangero
wants to merge
21
commits into
etix:master
Choose a base branch
from
elboulangero:github/allow-outdated-files
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Allow outdated files on mirrors #188
elboulangero
wants to merge
21
commits into
etix:master
from
elboulangero:github/allow-outdated-files
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
In this commit, all the code block that filter mirrors (ie. separate them in two groups, mirrors candidates for the redirect, and mirrors excluded) is moved to a separate function, for the purpose of adding test cases for it. The test cases added don't cover every possible situation, far from that, but at least it's a start.
Instead of modifying mlist in place, we can just create a new slice to hold the mirrors that are accepted. Then we don't have to move elements within an existing slice, making the code simpler. Gbp-Pq: Name prep-selection-Simplify-Filter-a-bit.patch
In this piece of the code we compare distances twice, and IMO it's easier to read if we put the mirror distance on the same side of the operator for both comparisons.
Define and use a new helper `HasAnyPrefix`, to make some lines of code more concise. At the moment, it's used only twice, but we're going to use it more in the upcoming commits, so it's going to be useful.
Those functions are trivial for now, but in the upcoming commits we will allow a mirror to support both HTTP and HTTPS. The logic to know if it's up will be more complex, given that a mirror can be up over HTTP but down over HTTPS, for example. Similarily, a status string will be a bit more than just "up" or "down", eg. when the mirror is up for HTTP and down for HTTPS.
With this change, the actual health check implementation (ie. the http request) is moved to a new function called `healthCheckDo`. The original `HealthCheck` function does only two things now: 1) pick the file that will be used to perform the health check and 2) call `healthCheckDo`. The purpose of this split is to allow, in the upcoming commits, to perform 2 health checks instead of one. That will be used for mirrors that support both HTTP and HTTPS, we will need to check that requests work for both protocols.
This commit adds the protocol (HTTP or HTTPS) to the health-check logs. This in preparation for the next commits, where a mirror can support both HTTP and HTTPS, and health checks will be done twice: over HTTP and HTTPS. In this case, logging that a mirror is "up" or "down" won't be enough, we'll need to know the protocol concerned as well. 1) Changes to the `mirrorbits logs` command Let's take a look at how `mirrorbits log MIRROR` looks like before this commit: ``` 2025-01-08 08:54:42 UTC: Mirror is down: Unreachable 2025-01-08 09:11:12 UTC: Mirror is up ``` And after this commit: ``` 2025-01-08 08:55:02 UTC: HTTPS mirror is down: Unreachable 2025-01-08 09:11:12 UTC: HTTPS mirror is up ``` If you considered `mirrorbits log` as a stable format (ie. maybe you have scripts that parse its output), then take this commit as a breaking change. You'll need to adjust your scripts accordingly. 2) Changes to the mirrorbits stderr logs Similarily, the mirrorbits logs (that go to stderr by default) are updated as such. Before this commit: ``` 2025/01/09 15:35:53.121 UTC kali.darklab.sh Down! Status: 503 2025/01/10 03:48:56.864 UTC repo.jing.rocks Up! (509ms) ``` After this commit: ``` 2025/01/09 15:35:53.121 UTC kali.darklab.sh HTTP Down! Status: 503 2025/01/10 03:48:56.864 UTC repo.jing.rocks HTTPS Up! (509ms) ``` 3) How to review this commit File `mirrors/logs.go` is simple to review, there is a new field `proto` in the struct, and updates to support this new field. The field `proto` is of type `Protocol`, this is a new type introduced in `mirrors/mirrors.go`. Finally, in order to actually add this field to the log, we need to update the signatures of `MarkMirrorUp`, `MarkMirrorDown` and `SetMirrorState` (`mirrors/mirrors.go`), then the caller `daemon/monitor.go` can make use of it.
This is in preparation to support relative URLs in the `HttpURL` field. These are trivial code changes, aimed at making the diff more readable in the upcoming commit that will add support for relative URLs.
This is in preparation to support relative URLs. This commit adds a new field `AbsoluteURL` in the `Mirror` struct (`mirrors/mirrors.go`). This field is derived from the field `HttpURL`, and is guaranteed to be a absolute URL (ie. to have a scheme). At the moment, it's equal to `HttpURL` (since `HttpURL` always have a scheme), but that will change in the next commits. As can be seen (or rather, guessed) in the Mirror struct, this field is among the fields that are set during mirror selection, so only the code that comes after that can make use of it, and it is not saved in the redis db. In practice, this field is set in two places: `http/selection.go` and also in `http/http.go` (in the code that handles fallbacks mirrors).
Before this commit, the value for HttpURL must start with either `http://` or `https://`, for example `http://mirror.com/foobar/`. With this commit, we also allow "relative URLs", ie. URLs without a scheme, meaning that `mirror.com/foobar/` is now a valid value, for mirrors and also for fallback mirrors (defined in mirrorbits.conf). In case of relative URL, the protocol is assumed to be HTTP, but that's only temporary. In upcoming commits, relative URL will mean that the mirror supports both HTTP and HTTPS. This change adds some complexity: all the code that uses `HttpURL` must now check if the URL has a scheme, and deal with that. There's no one-size-fits-all helper for that (even though for now it looks like that, but that's because relative URL default to HTTP now, and that's only temporary). Additionally, codes that runs after the selection process, and that requires a absolute URL, must now use `AbsoluteURL` instead of `HttpURL` (that's `templates/mirrorlist.html` and `http/pagerenderer.go`).
So far, the field ExcludeReason had two uses. First, it was set during health-check when the mirror was found to be down (hence commited to redis db), and it was later set during mirror selection when the mirror was excluded (in this case, the value set was never set in the redis db, it was just discarded after the request was served). So these are two different things that were conflated in one field, and that worked, but now that we're about to support both HTTP and HTTPS for mirrors, that won't cut it. So, this commit does two things: 1) Rename ExcludeReason to DownReason - this is the field that is set by the health check 2) Add a new ExcludeReason field, which is to be set by the mirror selection, and used by the code that comes after that and wants to know why the mirror was excluded Note that, given that the redis key is also renamed, there needs to be a database migration - that will come in a later commit.
In previous commit "Support relative URL in the mirror HttpURL" we started to accept mirror URLs without a scheme, eg. mirror.com/foobar/, but in this case the scheme was HTTP by default, so it was not very interesting... With this commit, we change this behaviour: mirrors with a relative URL now support both HTTP and HTTPS. It means that the health check is done twice (for HTTP and for HTTPS, cf. `daemon/monitor.go`). We duplicate the fields `Up, DownReason` to become `HttpUp, HttpsUp, HttpDownReason, HttpsDownReason` in the Mirror struct `mirrors/mirrors.go`). And finally rework the selection logic to redirect appropriately (`http/selection.go`). Note that, given that some redis keys are renamed, there needs to be a database migration - that will come in a later commit.
Refer to previous commit for details. Note that a good part of the code was taken from database/v1/version1.go, it wasn't written from scratch.
As there was changes in rpc.proto in previous commits
Mirrorbits doens't respect the incoming protocol *strictly*, instead the behavior is as such: - HTTPS request: HTTPS redirect - HTTP request: HTTP or HTTPS redirect So it doesn't try to respect the HTTP protocol, and is happy to redirect HTTP requests to HTTPS. Which makes sense if mirrorbits is meant to serve files downloaded from a web browser, where HTTPS is the norm. However for Linux distro, for a package repository, it's important to respect the incoming protocol. There are minimal systems that don't have HTTPS support out of the box. For Debian (and Debian-like), the standard Docker image doesn't have CA certificates pre-installed, and apt is configured to use HTTP repositories. The first command to run is `apt update`, and if ever that gets redirected to HTTPS, apt fails. With this commit, we can now change this behavior, and ask mirrorbits to respect the protocol strictly. As a bonus point, having it in mirrobits.conf allows to document a rather obscure (but important) aspect of mirrorbits behavior. Note that, by default, the setting is true, so the default behavior of mirrorbits is still the same: redirect HTTP to HTTPS is allowed.
I find it easier to break long lines and have more variables with meaningful names. Maybe just a matter of taste.
In preparation for the next commit
The new setting AllowOutdatedFiles allows user to define which files are allowed to be outdated on the mirrors, and for how long. The user defines a list of rules, each rule is of the form: - Prefix: matched against the beginning of the path of the requested file - Minute: if Prefix matches, how long the file is allowed to be oudated AllowOutdatedFiles is a list of rules, they are checked in order, and the first rule that matches is selected. Note that, when a rule matches, the filesize check is also disabled for this file. As it wouldn't make much sense if we allowed a file to be outdated, but didn't allow it to be of a different size. Now, here's the use-case for this setting. For a Debian-like distribution, the directory `/dists` (aka. the metadata of the repository) contains a lot of files that are updated in-place. Each time the repository is updated, and immediately after mirrorbits rescans the local repo, mirrorbits redirects all the traffic for those files to the fallback mirror, since they have a new modtime, a new size, and mirrorbits doesn't know yet any mirror with those new files. It's only after 1) mirrors sync with the origin repository and 2) mirrorbits scans the updated mirrors, that it can redirect traffic to mirrors again. For more details in a real-life setup: Kali Linux is a rolling distro, the repository is updated every 6 hours, and mirrors are scanned every hour. In effect, it means that every 6 hours, mirrorbits redirects most of the metadata traffic to the fallback mirrors, then it takes around 1 to 2 hours before all the mirrors are scanned and traffic flows back to normal. Then again, 4 times a day. To prevent that, Kali uses the following setting: ``` AllowOutdatedFiles: - Prefix: /dists/ Minutes: 540 ``` Cf. etix#85 for more details.
Add TestMain to make sure that the configuration is always loaded, as the Filter function needs to get some configuration values. Add variables noFileInfo and noClientInfo variables to improve readability.
This was referenced Feb 14, 2025
Open
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR applies after #186, therefore it will need to be rebased after/if #186 gets merged. Only the 6 last commits really belong to this PR.
No need to review it before #186 gets sorted out.
This PR replaces #147
It fixes #85 for Debian-like distributions. The setting proposed here is kind of minimal, but it should be easy to extend it in the future (ie. match files with regex rather than prefix), if ever a needs arise.
The new setting
AllowOutdatedFiles
allows user to define which files are allowed to be outdated on the mirrors, and for how long.The user defines a list of rules, each rule is of the form:
Prefix
: matched against the beginning of the path of the requested fileMinute
: if Prefix matches, how long the file is allowed to be oudatedAllowOutdatedFiles
is a list of rules, they are checked in order, and the first rule that matches is selected.Note that, when a rule matches, the filesize check is also disabled for this file. As it wouldn't make much sense if we allowed a file to be outdated, but didn't allow it to be of a different size.
Now, here's the use-case for this setting.
For a Debian-like distribution, the directory
/dists
(aka. the metadata of the repository) contains a lot of files that are updated in-place. Each time the repository is updated, and immediately after mirrorbits rescans the local repo, mirrorbits redirects all the traffic for those files to the fallback mirror, since they have a new modtime, a new size, and mirrorbits doesn't know yet any mirror with those new files. It's only after 1) mirrors sync with the origin repository and 2) mirrorbits scans the updated mirrors, that it can redirect traffic to mirrors again.For more details in a real-life setup: Kali Linux is a rolling distro, the repository is updated every 6 hours, and mirrors are scanned every hour. In effect, it means that every 6 hours, mirrorbits redirects most of the metadata traffic to the fallback mirrors, then it takes around 1 to 2 hours before all the mirrors are scanned and traffic flows back to normal. Then again, 4 times a day.
To prevent that, Kali uses the following setting: