
Switch to CLOCK_BOOTTIME and friends to improve accuracy. #58

Closed
tobz opened this issue Sep 23, 2021 · 2 comments


tobz commented Sep 23, 2021

Per the discussion happening on rust-lang/rust#88714, there's a meaningful difference between CLOCK_BOOTTIME and CLOCK_MONOTONIC in how they track time across system suspends on Linux: CLOCK_MONOTONIC stops ticking during suspend, while CLOCK_BOOTTIME does not. This raises two problems for quanta:

Monotonic mode

When invariant TSC support is not detected, we fall back to the "monotonic" mode, where we query the OS clock directly. That's fine on its own, but we're querying with CLOCK_MONOTONIC (and its equivalents on other platforms), which leaves us open to the exact problem described in the issue above.
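For illustration, here's a minimal sketch (Linux-only, using the libc crate; not quanta's actual code) of reading both clocks back to back. Across a suspend/resume cycle, the CLOCK_BOOTTIME reading pulls ahead of CLOCK_MONOTONIC by however long the machine was suspended:

```rust
use std::mem::MaybeUninit;

fn read_clock_ns(clock_id: libc::clockid_t) -> u64 {
    let mut ts = MaybeUninit::<libc::timespec>::uninit();
    // SAFETY: `clock_gettime` fully initializes `ts` on success.
    let ret = unsafe { libc::clock_gettime(clock_id, ts.as_mut_ptr()) };
    assert_eq!(ret, 0, "clock_gettime failed");
    let ts = unsafe { ts.assume_init() };
    (ts.tv_sec as u64) * 1_000_000_000 + (ts.tv_nsec as u64)
}

fn main() {
    // On a freshly booted machine these are nearly equal; after a suspend,
    // the boottime reading is ahead by the time spent suspended.
    let mono = read_clock_ns(libc::CLOCK_MONOTONIC);
    let boot = read_clock_ns(libc::CLOCK_BOOTTIME);
    println!("monotonic: {mono} ns, boottime: {boot} ns");
}
```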

Counter (TSC) mode

While I have not fully traced whether this matters, there's a potential scenario where CLOCK_MONOTONIC stops ticking during lower CPU power states, such that our reference drifts on every pass through the calibration loop. While the invariant TSC should be guaranteed to tick at a constant rate -- recent Intel manuals specifically state that "the invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states" -- this is moot if our initial reference/source calibration is off, since we need that calibration to convert TSC cycles into real time units.
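To make the drift concern concrete, here's a rough sketch of the calibration idea -- not quanta's actual loop -- that samples the reference clock and the TSC across a fixed window to derive a cycles-to-time ratio. If the reference stalls partway through the window, the ratio comes out wrong, and every later TSC-to-time conversion inherits that error:

```rust
#[cfg(target_arch = "x86_64")]
fn calibrate_tsc_ratio() -> f64 {
    use std::arch::x86_64::_rdtsc;
    use std::time::{Duration, Instant};

    let ref_start = Instant::now();
    // SAFETY: RDTSC has no memory-safety preconditions.
    let tsc_start = unsafe { _rdtsc() };

    // Spin for a fixed window; a longer window reduces measurement noise.
    while ref_start.elapsed() < Duration::from_millis(10) {}

    let ref_ns = ref_start.elapsed().as_nanos() as f64;
    let tsc_cycles = (unsafe { _rdtsc() } - tsc_start) as f64;

    // Nanoseconds per TSC cycle, used later to turn raw cycle deltas into
    // time deltas. If the reference stopped ticking mid-window, this ratio
    // is silently wrong.
    ref_ns / tsc_cycles
}
```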

At any rate, switching shouldn't do anything but improve accuracy. That said, the issue also raises concerns about when support for CLOCK_BOOTTIME was introduced and on which platforms it matters, so we likely need to wait for that PR to shake out to make sure we have a good example of where we'll need to make our changes.


tobz commented Sep 24, 2021

Thinking about this more...

The issue is specific to when the system is suspended, which means being in a "sleep state" -- in ACPI parlance, S1-S5. The aforementioned snippet from the Intel developer's manual states that the TSC runs at a constant rate in P-, C-, and T-states, but those are mutually exclusive with sleep states.

Thus, if quanta is in counter/TSC mode, it cannot correctly handle the transition from S0 (essentially, not suspended) to S1-S5 and back to S0. Regardless of whether our calibration is correct, the TSC simply stops, so we're going to lose time.

Given the performance goals of quanta, there are likely two things we should do here:

  • implement the CLOCK_BOOTTIME/CLOCK_MONOTONIC cascaded logic mentioned in the stdlib PR (sketched below)
  • document that quanta is not guaranteed to maintain wall-time through system suspend

The first just makes sense: if we can, we might as well make the monotonic reference clock track wall time as closely as possible. The second also just makes sense: quanta is about performance, but time is important, so we need to call out this limitation.
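As a sketch of what the cascade might look like (assuming, per the stdlib PR, that kernels without CLOCK_BOOTTIME support reject it with an error), we'd try CLOCK_BOOTTIME first and fall back to CLOCK_MONOTONIC, again using the libc crate:

```rust
fn reference_now_ns() -> u64 {
    let mut ts = libc::timespec { tv_sec: 0, tv_nsec: 0 };
    // CLOCK_BOOTTIME (Linux 2.6.39+) keeps counting across suspend.
    let mut ret = unsafe { libc::clock_gettime(libc::CLOCK_BOOTTIME, &mut ts) };
    if ret != 0 {
        // Older kernels: fall back to CLOCK_MONOTONIC, which any kernel we
        // could plausibly run on provides.
        ret = unsafe { libc::clock_gettime(libc::CLOCK_MONOTONIC, &mut ts) };
    }
    assert_eq!(ret, 0, "clock_gettime failed for both clock sources");
    (ts.tv_sec as u64) * 1_000_000_000 + (ts.tv_nsec as u64)
}
```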

There's likely a future change we could make to allow using only the monotonic reference clock, for users who want quanta for its ability to mock time but also need that time to advance with wall time.


tobz commented Dec 25, 2021

Thinking more about this still, I don't think switching to CLOCK_BOOTTIME (or equivalents for other platforms) is the correct answer.

Consider our thoughts from the last comment: while CLOCK_MONOTONIC does not advance during suspend, neither should the TSC. Further still, programs should not be calibrating at the very instant a machine is going to sleep or waking up... and even if one were, the loss of TSC or CLOCK_MONOTONIC advancement would skew that pass of the calibration loop and produce an obviously bad sample that would get averaged out.

The earlier thought -- documenting that quanta does not track time across suspends, and that callers should exercise caution when taking deltas between Instants -- is still the correct decision, I believe. I would also add another item to that list, though: maybe we should make Clock avoid using the TSC if a proper calibration cannot be achieved. Using the OS-provided timing facilities is still very fast in most cases, and trading accuracy for speed is not a good trade-off unless it's made explicitly, i.e. by using Clock::recent.
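As a hypothetical sketch of that "avoid the TSC" idea, reusing the calibrate_tsc_ratio sketch from the first comment: run several calibration passes and only trust the TSC if the measured ratios agree within a tight tolerance. The 0.01% bound here is illustrative, not a quanta constant:

```rust
enum ClockSource {
    Tsc { ns_per_cycle: f64 },
    Os,
}

fn choose_source() -> ClockSource {
    // `calibrate_tsc_ratio` is the earlier sketch: one pass of deriving
    // nanoseconds-per-cycle from the reference clock and the TSC.
    let ratios: Vec<f64> = (0..5).map(|_| calibrate_tsc_ratio()).collect();
    let min = ratios.iter().copied().fold(f64::INFINITY, f64::min);
    let max = ratios.iter().copied().fold(f64::NEG_INFINITY, f64::max);

    // If repeated passes disagree by more than 0.01%, the calibration is
    // suspect; prefer correctness and stay on the OS clock.
    if (max - min) / min < 1e-4 {
        ClockSource::Tsc { ns_per_cycle: min }
    } else {
        ClockSource::Os
    }
}
```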
