Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow outdated files on mirrors #188

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

elboulangero
Copy link
Contributor

@elboulangero elboulangero commented Feb 14, 2025

This PR applies after #186, therefore it will need to be rebased after/if #186 gets merged. Only the 6 last commits really belong to this PR.

No need to review it before #186 gets sorted out.

This PR replaces #147

It fixes #85 for Debian-like distributions. The setting proposed here is kind of minimal, but it should be easy to extend it in the future (ie. match files with regex rather than prefix), if ever a needs arise.


The new setting AllowOutdatedFiles allows user to define which files are allowed to be outdated on the mirrors, and for how long.

The user defines a list of rules, each rule is of the form:

  • Prefix: matched against the beginning of the path of the requested file
  • Minute: if Prefix matches, how long the file is allowed to be oudated

AllowOutdatedFiles is a list of rules, they are checked in order, and the first rule that matches is selected.

Note that, when a rule matches, the filesize check is also disabled for this file. As it wouldn't make much sense if we allowed a file to be outdated, but didn't allow it to be of a different size.

Now, here's the use-case for this setting.

For a Debian-like distribution, the directory /dists (aka. the metadata of the repository) contains a lot of files that are updated in-place. Each time the repository is updated, and immediately after mirrorbits rescans the local repo, mirrorbits redirects all the traffic for those files to the fallback mirror, since they have a new modtime, a new size, and mirrorbits doesn't know yet any mirror with those new files. It's only after 1) mirrors sync with the origin repository and 2) mirrorbits scans the updated mirrors, that it can redirect traffic to mirrors again.

For more details in a real-life setup: Kali Linux is a rolling distro, the repository is updated every 6 hours, and mirrors are scanned every hour. In effect, it means that every 6 hours, mirrorbits redirects most of the metadata traffic to the fallback mirrors, then it takes around 1 to 2 hours before all the mirrors are scanned and traffic flows back to normal. Then again, 4 times a day.

To prevent that, Kali uses the following setting:

AllowOutdatedFiles:
    - Prefix: /dists/
      Minutes: 540

In this commit, all the code block that filter mirrors (ie. separate
them in two groups, mirrors candidates for the redirect, and mirrors
excluded) is moved to a separate function, for the purpose of adding
test cases for it.

The test cases added don't cover every possible situation, far from
that, but at least it's a start.
Instead of modifying mlist in place, we can just create a new slice to
hold the mirrors that are accepted. Then we don't have to move elements
within an existing slice, making the code simpler.

Gbp-Pq: Name prep-selection-Simplify-Filter-a-bit.patch
In this piece of the code we compare distances twice, and IMO it's
easier to read if we put the mirror distance on the same side of the
operator for both comparisons.
Define and use a new helper `HasAnyPrefix`, to make some lines of code
more concise.

At the moment, it's used only twice, but we're going to use it more in
the upcoming commits, so it's going to be useful.
Those functions are trivial for now, but in the upcoming commits we will
allow a mirror to support both HTTP and HTTPS. The logic to know if it's
up will be more complex, given that a mirror can be up over HTTP but
down over HTTPS, for example.

Similarily, a status string will be a bit more than just "up" or "down",
eg. when the mirror is up for HTTP and down for HTTPS.
With this change, the actual health check implementation (ie. the http
request) is moved to a new function called `healthCheckDo`. The original
`HealthCheck` function does only two things now: 1) pick the file that
will be used to perform the health check and 2) call `healthCheckDo`.

The purpose of this split is to allow, in the upcoming commits, to
perform 2 health checks instead of one. That will be used for mirrors
that support both HTTP and HTTPS, we will need to check that requests
work for both protocols.
This commit adds the protocol (HTTP or HTTPS) to the health-check logs.
This in preparation for the next commits, where a mirror can support
both HTTP and HTTPS, and health checks will be done twice: over HTTP and
HTTPS. In this case, logging that a mirror is "up" or "down" won't be
enough, we'll need to know the protocol concerned as well.

1) Changes to the `mirrorbits logs` command

Let's take a look at how `mirrorbits log MIRROR` looks like before this
commit:

```
2025-01-08 08:54:42 UTC: Mirror is down: Unreachable
2025-01-08 09:11:12 UTC: Mirror is up
```

And after this commit:

```
2025-01-08 08:55:02 UTC: HTTPS mirror is down: Unreachable
2025-01-08 09:11:12 UTC: HTTPS mirror is up
```

If you considered `mirrorbits log` as a stable format (ie. maybe you
have scripts that parse its output), then take this commit as a breaking
change. You'll need to adjust your scripts accordingly.

2) Changes to the mirrorbits stderr logs

Similarily, the mirrorbits logs (that go to stderr by default) are
updated as such.

Before this commit:

```
2025/01/09 15:35:53.121 UTC kali.darklab.sh                Down! Status: 503
2025/01/10 03:48:56.864 UTC repo.jing.rocks                Up! (509ms)
```

After this commit:

```
2025/01/09 15:35:53.121 UTC kali.darklab.sh                HTTP  Down! Status: 503
2025/01/10 03:48:56.864 UTC repo.jing.rocks                HTTPS Up! (509ms)
```

3) How to review this commit

File `mirrors/logs.go` is simple to review, there is a new field `proto`
in the struct, and updates to support this new field.

The field `proto` is of type `Protocol`, this is a new type introduced
in `mirrors/mirrors.go`.

Finally, in order to actually add this field to the log, we need to
update the signatures of `MarkMirrorUp`, `MarkMirrorDown` and
`SetMirrorState` (`mirrors/mirrors.go`), then the caller
`daemon/monitor.go` can make use of it.
This is in preparation to support relative URLs in the `HttpURL` field.

These are trivial code changes, aimed at making the diff more readable
in the upcoming commit that will add support for relative URLs.
This is in preparation to support relative URLs.

This commit adds a new field `AbsoluteURL` in the `Mirror` struct
(`mirrors/mirrors.go`).

This field is derived from the field `HttpURL`, and is guaranteed to be
a absolute URL (ie. to have a scheme).  At the moment, it's equal to
`HttpURL` (since `HttpURL` always have a scheme), but that will change
in the next commits.

As can be seen (or rather, guessed) in the Mirror struct, this field is
among the fields that are set during mirror selection, so only the code
that comes after that can make use of it, and it is not saved in the
redis db.

In practice, this field is set in two places: `http/selection.go` and
also in `http/http.go` (in the code that handles fallbacks mirrors).
Before this commit, the value for HttpURL must start with either
`http://` or `https://`, for example `http://mirror.com/foobar/`.

With this commit, we also allow "relative URLs", ie. URLs without a
scheme, meaning that `mirror.com/foobar/` is now a valid value, for
mirrors and also for fallback mirrors (defined in mirrorbits.conf).

In case of relative URL, the protocol is assumed to be HTTP, but that's
only temporary. In upcoming commits, relative URL will mean that the
mirror supports both HTTP and HTTPS.

This change adds some complexity: all the code that uses `HttpURL` must
now check if the URL has a scheme, and deal with that. There's no
one-size-fits-all helper for that (even though for now it looks like
that, but that's because relative URL default to HTTP now, and that's
only temporary).

Additionally, codes that runs after the selection process, and that
requires a absolute URL, must now use `AbsoluteURL` instead of `HttpURL`
(that's `templates/mirrorlist.html` and `http/pagerenderer.go`).
So far, the field ExcludeReason had two uses. First, it was set during
health-check when the mirror was found to be down (hence commited to
redis db), and it was later set during mirror selection when the mirror
was excluded (in this case, the value set was never set in the redis db,
it was just discarded after the request was served).

So these are two different things that were conflated in one field, and
that worked, but now that we're about to support both HTTP and HTTPS for
mirrors, that won't cut it.

So, this commit does two things:

1) Rename ExcludeReason to DownReason - this is the field that is set by
   the health check

2) Add a new ExcludeReason field, which is to be set by the mirror
   selection, and used by the code that comes after that and wants to
   know why the mirror was excluded

Note that, given that the redis key is also renamed, there needs to be a
database migration - that will come in a later commit.
In previous commit "Support relative URL in the mirror HttpURL" we
started to accept mirror URLs without a scheme, eg. mirror.com/foobar/,
but in this case the scheme was HTTP by default, so it was not very
interesting...

With this commit, we change this behaviour: mirrors with a relative URL
now support both HTTP and HTTPS. It means that the health check is done
twice (for HTTP and for HTTPS, cf. `daemon/monitor.go`). We duplicate
the fields `Up, DownReason` to become `HttpUp, HttpsUp, HttpDownReason,
HttpsDownReason` in the Mirror struct `mirrors/mirrors.go`). And finally
rework the selection logic to redirect appropriately
(`http/selection.go`).

Note that, given that some redis keys are renamed, there needs to be a
database migration - that will come in a later commit.
Refer to previous commit for details.

Note that a good part of the code was taken from
database/v1/version1.go, it wasn't written from scratch.
As there was changes in rpc.proto in previous commits
Mirrorbits doens't respect the incoming protocol *strictly*, instead the
behavior is as such:
- HTTPS request: HTTPS redirect
- HTTP  request: HTTP or HTTPS redirect

So it doesn't try to respect the HTTP protocol, and is happy to redirect
HTTP requests to HTTPS. Which makes sense if mirrorbits is meant to
serve files downloaded from a web browser, where HTTPS is the norm.

However for Linux distro, for a package repository, it's important to
respect the incoming protocol. There are minimal systems that don't have
HTTPS support out of the box. For Debian (and Debian-like), the standard
Docker image doesn't have CA certificates pre-installed, and apt is
configured to use HTTP repositories. The first command to run is `apt
update`, and if ever that gets redirected to HTTPS, apt fails.

With this commit, we can now change this behavior, and ask mirrorbits to
respect the protocol strictly. As a bonus point, having it in
mirrobits.conf allows to document a rather obscure (but important)
aspect of mirrorbits behavior.

Note that, by default, the setting is true, so the default behavior of
mirrorbits is still the same: redirect HTTP to HTTPS is allowed.
I find it easier to break long lines and have more variables with
meaningful names. Maybe just a matter of taste.
In preparation for the next commit
The new setting AllowOutdatedFiles allows user to define which files are
allowed to be outdated on the mirrors, and for how long.

The user defines a list of rules, each rule is of the form:
- Prefix: matched against the beginning of the path of the requested file
- Minute: if Prefix matches, how long the file is allowed to be oudated

AllowOutdatedFiles is a list of rules, they are checked in order, and
the first rule that matches is selected.

Note that, when a rule matches, the filesize check is also disabled for
this file. As it wouldn't make much sense if we allowed a file to be
outdated, but didn't allow it to be of a different size.

Now, here's the use-case for this setting.

For a Debian-like distribution, the directory `/dists` (aka. the
metadata of the repository) contains a lot of files that are updated
in-place.  Each time the repository is updated, and immediately after
mirrorbits rescans the local repo, mirrorbits redirects all the traffic
for those files to the fallback mirror, since they have a new modtime, a
new size, and mirrorbits doesn't know yet any mirror with those new
files. It's only after 1) mirrors sync with the origin repository and 2)
mirrorbits scans the updated mirrors, that it can redirect traffic to
mirrors again.

For more details in a real-life setup: Kali Linux is a rolling distro,
the repository is updated every 6 hours, and mirrors are scanned every
hour.  In effect, it means that every 6 hours, mirrorbits redirects most
of the metadata traffic to the fallback mirrors, then it takes around 1
to 2 hours before all the mirrors are scanned and traffic flows back to
normal. Then again, 4 times a day.

To prevent that, Kali uses the following setting:

```
AllowOutdatedFiles:
    - Prefix: /dists/
      Minutes: 540
```

Cf. etix#85 for more details.
Add TestMain to make sure that the configuration is always loaded, as
the Filter function needs to get some configuration values.

Add variables noFileInfo and noClientInfo variables to improve
readability.
@elboulangero elboulangero changed the title GitHub/allow outdated files Allow outdated files on mirrors Feb 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mirrorbits for linux distributions (rapidly changing metadata files)
1 participant