Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow mirrors that support both HTTP and HTTPS #186

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

elboulangero
Copy link
Contributor

This pull request is a complete rework of #156 and replaces it.

short story

This PR brings a new feature to mirrorbits: users can now define mirrors that support both HTTP and HTTPS, by using a relative URL (ie. URL without scheme) for the field HttpURL. Additionally, a new setting AllowHTTPToHTTPSRedirects (default: true) allow users to control this behavior that was hardcoded in mirrorbits before.

Bonus point: in the process of reworking the mirror selection, this PR added serious unit testing to this critical function!

long story

So far, mirrorbits would only accept URLs that start with either "http://" or "https://", meaning that it's not possible to express that a mirror supports both HTTP and HTTPS. Which was Ok in a sense, due to another detail: mirrorbits happily redirects HTTP requests to HTTPS mirrors. I suspect the initial use-case for mirrorbits is to serve downloads from a web page, and redirecting HTTP to HTTPS without asking is standard practice for the web.

However, for Linux distros (at least, based on Debian), this behaviour doesn't work well. The decision to use HTTP or HTTPS belongs to the user, and the redirector must respect it. There are minimal systems that don't have HTTPS support out of the box. For Debian (and Debian-like), the standard Docker image doesn't have CA certificates pre-installed, and apt is configured to use HTTP repositories. The first command to run is apt update, and if ever that gets redirected to HTTPS, apt fails.

Therefore, this PR makes mirrorbits much more useful for distro mirrors, by doing mostly two things:

  1. Support relative URLs (ie. mirror.example.com/foo/ in place of http://mirror.example.com/foo/) for the field HttpURL of a mirror, therefore declaring that a mirror supports both HTTP and HTTPS.
  2. Allow user to disabled the HTTP -> HTTPS redirects by setting the new config option AllowHTTPToHTTPSRedirects to false.

under the hood

When a mirror has a relative URL, it means that mirrorbits will perform 2 health-checks (one over HTTP and one over HTTPS) instead of one. It also means that, internally, mirrorbits now has 2 fields to keep track of the status of a mirror: HttpUp and HttpsUp, and also to different field HttpDownReason and HttpsDownReason to describe why the mirror is down.

Which also means that this pull request brings: a rebuild of the generated file rpc.pb.go and a database upgrade (v1 -> v2).

visible changes for users

For the command mirrorbits list, column STATE, and when a mirror supports both HTTP and HTTPS, a mirror can now be up/down (if it's up over HTTP but down over HTTPS) or down/up (HTTP down, HTTPS up).

For the command mirrorbits logs, now mirrorbits reports the protocol, meaning that a line Mirror is up is now HTTPS mirror is up, for example. Same with the mirrorbits stdout logs, the line repo.jing.rocks Up! (509ms) becomes repo.jing.rocks HTTPS Up! (509ms).

The HTML templates have been updated in this PR, users who don't use the stock template will need to update their copies.

Database upgrade (should take only a few seconds), meaning NO ROLL BACK POSSIBLE AFTER UPDATING!

quick guide to review

The 2 initial commits actually belong to #185, and can be dropped if ever #185 is merged.

Commits b4bc62d to 88a07e7 add unit testing to mirror selection (and could be merged in independently of this PR).

Commits 825278f to 493d8b9 are fairly trivial, and just prepare the code for changes to come.

Commits cf647db to afbd77d are where the big changes are done.

Finally comes the 3 last commits: db upgrade, regen of rpc.pb.go, and the addition of the setting AllowHTTPToHTTPSRedirects which is needed to make the whole PR really useful.

to finish

The first iteration of this work at #156 has been running in production for more than a year now. This new PR has been deployed in a staging instance for a few weeks, and is now in production at eg. https://http.kali.org/README?mirrorlist.

While #156 was functional, it was a bit rushed out and not ready, the approach was clumsy. I believe this PR is a much better attempt.

This PR will close #157

Thanks!

@jbkempf
Copy link
Collaborator

jbkempf commented Feb 9, 2025

This needs a manual rebase.

In this commit, all the code block that filter mirrors (ie. separate
them in two groups, mirrors candidates for the redirect, and mirrors
excluded) is moved to a separate function, for the purpose of adding
test cases for it.

The test cases added don't cover every possible situation, far from
that, but at least it's a start.
Instead of modifying mlist in place, we can just create a new slice to
hold the mirrors that are accepted. Then we don't have to move elements
within an existing slice, making the code simpler.

Gbp-Pq: Name prep-selection-Simplify-Filter-a-bit.patch
In this piece of the code we compare distances twice, and IMO it's
easier to read if we put the mirror distance on the same side of the
operator for both comparisons.
Define and use a new helper `HasAnyPrefix`, to make some lines of code
more concise.

At the moment, it's used only twice, but we're going to use it more in
the upcoming commits, so it's going to be useful.
Those functions are trivial for now, but in the upcoming commits we will
allow a mirror to support both HTTP and HTTPS. The logic to know if it's
up will be more complex, given that a mirror can be up over HTTP but
down over HTTPS, for example.

Similarily, a status string will be a bit more than just "up" or "down",
eg. when the mirror is up for HTTP and down for HTTPS.
With this change, the actual health check implementation (ie. the http
request) is moved to a new function called `healthCheckDo`. The original
`HealthCheck` function does only two things now: 1) pick the file that
will be used to perform the health check and 2) call `healthCheckDo`.

The purpose of this split is to allow, in the upcoming commits, to
perform 2 health checks instead of one. That will be used for mirrors
that support both HTTP and HTTPS, we will need to check that requests
work for both protocols.
This commit adds the protocol (HTTP or HTTPS) to the health-check logs.
This in preparation for the next commits, where a mirror can support
both HTTP and HTTPS, and health checks will be done twice: over HTTP and
HTTPS. In this case, logging that a mirror is "up" or "down" won't be
enough, we'll need to know the protocol concerned as well.

1) Changes to the `mirrorbits logs` command

Let's take a look at how `mirrorbits log MIRROR` looks like before this
commit:

```
2025-01-08 08:54:42 UTC: Mirror is down: Unreachable
2025-01-08 09:11:12 UTC: Mirror is up
```

And after this commit:

```
2025-01-08 08:55:02 UTC: HTTPS mirror is down: Unreachable
2025-01-08 09:11:12 UTC: HTTPS mirror is up
```

If you considered `mirrorbits log` as a stable format (ie. maybe you
have scripts that parse its output), then take this commit as a breaking
change. You'll need to adjust your scripts accordingly.

2) Changes to the mirrorbits stderr logs

Similarily, the mirrorbits logs (that go to stderr by default) are
updated as such.

Before this commit:

```
2025/01/09 15:35:53.121 UTC kali.darklab.sh                Down! Status: 503
2025/01/10 03:48:56.864 UTC repo.jing.rocks                Up! (509ms)
```

After this commit:

```
2025/01/09 15:35:53.121 UTC kali.darklab.sh                HTTP  Down! Status: 503
2025/01/10 03:48:56.864 UTC repo.jing.rocks                HTTPS Up! (509ms)
```

3) How to review this commit

File `mirrors/logs.go` is simple to review, there is a new field `proto`
in the struct, and updates to support this new field.

The field `proto` is of type `Protocol`, this is a new type introduced
in `mirrors/mirrors.go`.

Finally, in order to actually add this field to the log, we need to
update the signatures of `MarkMirrorUp`, `MarkMirrorDown` and
`SetMirrorState` (`mirrors/mirrors.go`), then the caller
`daemon/monitor.go` can make use of it.
This is in preparation to support relative URLs in the `HttpURL` field.

These are trivial code changes, aimed at making the diff more readable
in the upcoming commit that will add support for relative URLs.
This is in preparation to support relative URLs.

This commit adds a new field `AbsoluteURL` in the `Mirror` struct
(`mirrors/mirrors.go`).

This field is derived from the field `HttpURL`, and is guaranteed to be
a absolute URL (ie. to have a scheme).  At the moment, it's equal to
`HttpURL` (since `HttpURL` always have a scheme), but that will change
in the next commits.

As can be seen (or rather, guessed) in the Mirror struct, this field is
among the fields that are set during mirror selection, so only the code
that comes after that can make use of it, and it is not saved in the
redis db.

In practice, this field is set in two places: `http/selection.go` and
also in `http/http.go` (in the code that handles fallbacks mirrors).
Before this commit, the value for HttpURL must start with either
`http://` or `https://`, for example `http://mirror.com/foobar/`.

With this commit, we also allow "relative URLs", ie. URLs without a
scheme, meaning that `mirror.com/foobar/` is now a valid value, for
mirrors and also for fallback mirrors (defined in mirrorbits.conf).

In case of relative URL, the protocol is assumed to be HTTP, but that's
only temporary. In upcoming commits, relative URL will mean that the
mirror supports both HTTP and HTTPS.

This change adds some complexity: all the code that uses `HttpURL` must
now check if the URL has a scheme, and deal with that. There's no
one-size-fits-all helper for that (even though for now it looks like
that, but that's because relative URL default to HTTP now, and that's
only temporary).

Additionally, codes that runs after the selection process, and that
requires a absolute URL, must now use `AbsoluteURL` instead of `HttpURL`
(that's `templates/mirrorlist.html` and `http/pagerenderer.go`).
So far, the field ExcludeReason had two uses. First, it was set during
health-check when the mirror was found to be down (hence commited to
redis db), and it was later set during mirror selection when the mirror
was excluded (in this case, the value set was never set in the redis db,
it was just discarded after the request was served).

So these are two different things that were conflated in one field, and
that worked, but now that we're about to support both HTTP and HTTPS for
mirrors, that won't cut it.

So, this commit does two things:

1) Rename ExcludeReason to DownReason - this is the field that is set by
   the health check

2) Add a new ExcludeReason field, which is to be set by the mirror
   selection, and used by the code that comes after that and wants to
   know why the mirror was excluded

Note that, given that the redis key is also renamed, there needs to be a
database migration - that will come in a later commit.
In previous commit "Support relative URL in the mirror HttpURL" we
started to accept mirror URLs without a scheme, eg. mirror.com/foobar/,
but in this case the scheme was HTTP by default, so it was not very
interesting...

With this commit, we change this behaviour: mirrors with a relative URL
now support both HTTP and HTTPS. It means that the health check is done
twice (for HTTP and for HTTPS, cf. `daemon/monitor.go`). We duplicate
the fields `Up, DownReason` to become `HttpUp, HttpsUp, HttpDownReason,
HttpsDownReason` in the Mirror struct `mirrors/mirrors.go`). And finally
rework the selection logic to redirect appropriately
(`http/selection.go`).

Note that, given that some redis keys are renamed, there needs to be a
database migration - that will come in a later commit.
Refer to previous commit for details.

Note that a good part of the code was taken from
database/v1/version1.go, it wasn't written from scratch.
As there was changes in rpc.proto in previous commits
Mirrorbits doens't respect the incoming protocol *strictly*, instead the
behavior is as such:
- HTTPS request: HTTPS redirect
- HTTP  request: HTTP or HTTPS redirect

So it doesn't try to respect the HTTP protocol, and is happy to redirect
HTTP requests to HTTPS. Which makes sense if mirrorbits is meant to
serve files downloaded from a web browser, where HTTPS is the norm.

However for Linux distro, for a package repository, it's important to
respect the incoming protocol. There are minimal systems that don't have
HTTPS support out of the box. For Debian (and Debian-like), the standard
Docker image doesn't have CA certificates pre-installed, and apt is
configured to use HTTP repositories. The first command to run is `apt
update`, and if ever that gets redirected to HTTPS, apt fails.

With this commit, we can now change this behavior, and ask mirrorbits to
respect the protocol strictly. As a bonus point, having it in
mirrobits.conf allows to document a rather obscure (but important)
aspect of mirrorbits behavior.

Note that, by default, the setting is true, so the default behavior of
mirrorbits is still the same: redirect HTTP to HTTPS is allowed.
@elboulangero
Copy link
Contributor Author

Rebased

@jbkempf jbkempf mentioned this pull request Feb 12, 2025
@jbkempf
Copy link
Collaborator

jbkempf commented Feb 12, 2025

I've reviewed patches 1 to 8 so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support dual HTTP/HTTPS for a mirror
2 participants