mirrorbits for linux distributions (rapidly changing metadata files) #85
ping
Perhaps it would be best if the origin server also kept old files for as long as it expects that clients have not updated their metadata. This would remove this burden from the mirrors and the download redirector.
With this setting, files are allowed to be outdated on the mirrors for at most MaxOutdated minutes. The filesize check is also disabled for those files. Use case: for a Debian-like distribution, the metadata (i.e. the directory /dists) is updated in place, so we must give the mirrors time to sync, and then give Mirrorbits time to become aware of the changes. Otherwise, as soon as the source is updated and scanned, Mirrorbits will go into fallback mode for all the files under /dists, since at that point either the mirrors haven't synced yet, or they have but Mirrorbits isn't aware of it yet (as the interval to scan mirrors is longer than the interval to scan the source). Cf. etix#85 for more details.
@stormi I'd be curious to know if/how you solved it for XCP-ng. I've been looking at the same issue for Kali Linux. Quoting what you said at the time:
For Kali, we don't need to « "remember" deleted files for a while », in the sense that we solve this issue with reprepro, the tool that generates the repository. When a package is replaced by a new one, we keep the old package around in the archive for a few days. Hence this problem doesn't need to be solved at the Mirrorbits level. However, I definitely observed the issue that you mentioned with metadata (the files that are updated in place). When we push an update of the repo, Mirrorbits quickly becomes aware of the new version of the metadata files, and since it hasn't rescanned the mirrors yet (and maybe the mirrors haven't even synced anyway), it can't redirect, and it goes into fallback mode for those files. « accepting to serve an old version of some files » seems to be a good solution for Kali, so I implemented this feature in #147.
@elboulangero no, we haven't solved it. Thankfully, the metadata on non-testing repositories doesn't change often, so issues are rare.
@stormi I'd like to improve MR #147 so that it works for RPM repos as well. As I briefly said above, the idea of this MR is to tell Mirrorbits to accept serving old versions of some files, within a certain time limit (when files are really too old, Mirrorbits stops serving them). So far, the setting I proposed is pretty crude, as the only matching option is a prefix. It works for Kali, as all I want to do is to match request paths that start with
Now, how would that go for the XCP-ng repo: which outdated files do you need to match? I had a quick look, and it seems like we could match
As you rightfully pointed out, from the moment we allow Mirrorbits to serve old metadata, the next issue is that clients that get this old metadata will also request old files, which might not be in the repo anymore. (NB: Mirrorbits won't serve files that are not in the local repo; it doesn't matter whether those files are still on the mirrors.)
You suggested that Mirrorbits could keep track of deleted files for a while. Now that I'm familiar with the Mirrorbits code, I'd prefer to avoid this route. OK, I'm a bit biased, as for my own use case (Kali) we already solve this issue outside of Mirrorbits. But still, I wonder if you could look at the options offered by the tool you use to create and manage your RPM repository. Is there any option to snapshot a repository?
I suggest the idea of "snapshots" because that's how we do it for the Kali repo, with reprepro. Every time we update the repo (4 times a day, as Kali is a rolling distro), we take a snapshot of the distro. We keep roughly the last 10 snapshots. This means that after packages are removed from Kali rolling, they still linger around for 2.5 days, as the snapshots still hold a reference to them. Can you take the same approach for your RPM distro?
@lazka Please allow me to pull you into the discussion, as you seem to be maintaining an Arch-based distro, and I'd also like your feedback. Do you have the same kind of issue, to start with?
We only upload about once a day, and the only metadata change then is that the database files change, which amounts to ~12MB. While that means all clients will pull from the main server, it hasn't been a problem so far (at least no one has complained). We don't have that many users, and most traffic comes from downloading packages, which this doesn't affect; also, we have enough mirrors that things get in sync quite fast. So it's definitely not a problem traffic-wise, but it might result in sluggish database syncs for some far-away users for a while. There is also an upcoming change in pacman where package signatures will be moved out of the database files, which will reduce the metadata size by ~50%. As for trying to fetch files that no longer exist: we keep all packages for more than 1.5 years before we prune them, so this isn't really an issue for us. tl;dr: we don't have enough packages or users for this to be a big problem. The only potential problem I see with serving existing files from outdated mirrors is that two DB syncs in a short period might lead to pacman doing package downgrades, if it happens to hit an old mirror after a fresh one, which we don't really support.
Ack, thanks very much for your detailed reply @lazka!
Ah OK. This is not a problem on Debian's side, as apt will silently discard a Release file that is older than the local one. So if we hit an old mirror after a fresh one, from apt's point of view it just means that the system is up to date.
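A much simplified Go sketch of that behaviour: compare the `Date` field of the Release file already on disk with the one just downloaded, and keep whichever is newer. This is not apt's actual code; `pickRelease` is a hypothetical helper, and it assumes the `Date` field uses the RFC 1123 form with a `UTC` zone, e.g. `Sat, 01 Jun 2024 06:00:00 UTC`.

```go
package main

import (
	"fmt"
	"time"
)

// releaseDateLayout is assumed to match the Date field of a Release file.
const releaseDateLayout = time.RFC1123

// pickRelease keeps the local Release date unless the fetched one is
// strictly newer, so a stale mirror cannot roll the index back.
func pickRelease(localDate, fetchedDate string) (string, error) {
	local, err := time.Parse(releaseDateLayout, localDate)
	if err != nil {
		return "", err
	}
	fetched, err := time.Parse(releaseDateLayout, fetchedDate)
	if err != nil {
		return "", err
	}
	if fetched.After(local) {
		return fetchedDate, nil
	}
	return localDate, nil
}

func main() {
	local := "Sat, 01 Jun 2024 06:00:00 UTC"
	kept, err := pickRelease(local, "Sat, 01 Jun 2024 00:00:00 UTC") // stale mirror
	if err != nil {
		panic(err)
	}
	fmt.Println(kept == local) // true: the older Release file is ignored
}
```

The pacman concern above is the opposite case: without such a comparison, hitting a stale mirror right after a fresh one can move the client's view backwards.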
Something else I wanted to share in this discussion: the methodology (and scripts) I used to monitor the availability of some files. In short:
And here's the result, requesting the What we clearly see above is that, after the sync of 18:00 and the sync of 06:00, for a while the I don't know why the number of returned mirrors goes way above 4 and then suddenly drops to 4 at some point. I'm sure this can be explained by a careful reading of the selection algorithm... Anyway. So if someone wants to do the same check and produce a similar graph, I pushed the scripts to: https://gitlab.com/kalilinux/tools/mirrorbits-scripts/-/tree/main/check-availability. They're very straightforward to use, and there's even a README!
Hi! Sorry for the late reply. So, as I understand it, the problem is that most filenames contain unique identifiers in See the current contents of one of the
And the contents of
The next time we regenerate the metadata, the filenames will change.
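To illustrate why stale RPM metadata breaks: in a yum/dnf repository, `repodata/repomd.xml` lists the other metadata files through `<data type="..."><location href="..."/></data>` entries, and when the repo is generated with unique metadata filenames, each href carries a checksum prefix. A client holding an old `repomd.xml` therefore asks for files that no longer exist after regeneration. The Go sketch below extracts those hrefs with `encoding/xml`; the helper `referencedFiles` and the trimmed-down sample document (with made-up `abc123`/`def456` prefixes) are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// repomd models only the part of repomd.xml needed here. Tags without a
// namespace match elements in any namespace, including the repo default one.
type repomd struct {
	Data []struct {
		Type     string `xml:"type,attr"`
		Location struct {
			Href string `xml:"href,attr"`
		} `xml:"location"`
	} `xml:"data"`
}

// referencedFiles maps each metadata type (primary, filelists, ...) to the
// file path that this repomd.xml points to.
func referencedFiles(doc []byte) (map[string]string, error) {
	var r repomd
	if err := xml.Unmarshal(doc, &r); err != nil {
		return nil, err
	}
	files := make(map[string]string, len(r.Data))
	for _, d := range r.Data {
		files[d.Type] = d.Location.Href
	}
	return files, nil
}

func main() {
	// Hypothetical sample; real files also carry checksums, sizes and timestamps.
	doc := []byte(`<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary"><location href="repodata/abc123-primary.xml.gz"/></data>
  <data type="filelists"><location href="repodata/def456-filelists.xml.gz"/></data>
</repomd>`)
	files, err := referencedFiles(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(files["primary"]) // repodata/abc123-primary.xml.gz
}
```

Comparing the output for an old and a new `repomd.xml` shows exactly which paths a client with a stale cache will request but the regenerated repo no longer has.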
If I understand correctly: yum (or is it dnf?) downloads the I ask for comparison with
I think it does hit the redirector again, because with Mirrorbits it is not aware that there is any redirector at all. This is the big difference from other mirror management software that distros may use, be it with Now maybe I'm wrong and there's some logic in
Sorry for being late, I missed your reply.
100% sure, let me explain in detail. First, we can easily log the requests that are sent by apt. So here's a
To put that into words:
It was implemented in
The new setting AllowOutdatedFiles allows the user to define which files are allowed to be outdated on the mirrors, and for how long. The user defines a list of rules; each rule has the form:
- Prefix: matched against the beginning of the path of the requested file
- Minutes: if Prefix matches, how long the file is allowed to be outdated
AllowOutdatedFiles is a list of rules; they are checked in order, and the first rule that matches is selected.
Note that when a rule matches, the filesize check is also disabled for this file, as it wouldn't make much sense to allow a file to be outdated but not allow it to be of a different size.
Now, here's the use case for this setting. For a Debian-like distribution, the directory `/dists` (aka the metadata of the repository) contains a lot of files that are updated in place. Each time the repository is updated, and immediately after mirrorbits rescans the local repo, mirrorbits redirects all the traffic for those files to the fallback mirror, since they have a new modtime and a new size, and mirrorbits doesn't yet know any mirror with those new files. It's only after 1) mirrors sync with the origin repository and 2) mirrorbits scans the updated mirrors that it can redirect traffic to mirrors again.
For more details, in a real-life setup: Kali Linux is a rolling distro, the repository is updated every 6 hours, and mirrors are scanned every hour. In effect, every 6 hours mirrorbits redirects most of the metadata traffic to the fallback mirrors, and it then takes around 1 to 2 hours before all the mirrors are scanned and traffic flows back to normal. This happens 4 times a day.
To prevent that, Kali uses the following setting:
```
AllowOutdatedFiles:
  - Prefix: /dists/
    Minutes: 540
```
Cf. etix#85 for more details.
As discussed on IRC, in the context of a Linux distribution, repository metadata files can change quite often, and when they change it can cause a delay during which no mirror can serve those files (unless you provide at least one mirror that syncs instantly).
It can also happen that a user with a slightly older repository metadata cache tries to install a file from the mirrors and gets an error because the file no longer exists in Mirrorbits' local reference, and there's no grace delay to let the cache expire (usually a few hours). It might be preferable to let the request reach one of the mirrors that have not synced yet and that still have the file.
A few leads (may contain very bad ideas!):