8319447: Improve performance of delayed task handling #23702

Open
wants to merge 45 commits into
base: master

Conversation

DougLea
Contributor

@DougLea DougLea commented Feb 19, 2025

(Copied from https://bugs.openjdk.org/browse/JDK-8319447)

The problems addressed by this CR/PR are that ScheduledThreadPoolExecutor is both ill-suited for many (if not most) of its applications and a performance bottleneck (as seen especially in Loom and CompletableFuture usages). After considering many options over the years, the approach taken here is to connect (lazily, only if used) a form of ScheduledExecutorService (DelayScheduler) to any ForkJoinPool (including the commonPool), which can then use more efficient and scalable techniques to request and trigger delayed actions, periodic actions, and cancellations, as well as coordinate shutdown and termination mechanics (see the internal documentation in DelayScheduler.java for algorithmic details). This speeds up some Loom operations by almost an order of magnitude (and similarly for CompletableFuture). Further incremental improvements may be possible, but delay scheduling overhead is now unlikely to be a common performance concern.
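As an illustration only (not code from this PR), the three kinds of requests named above (delayed actions, periodic actions, and cancellations) can be sketched against the existing ScheduledExecutorService interface. The class and method names below (DelayedActions, exercise, run) are hypothetical. The sketch uses a ScheduledThreadPoolExecutor so that it runs on current JDKs; the assumption is that, with this change applied, a ForkJoinPool could be passed to exercise instead.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class DelayedActions {
    // Hypothetical helper: issues one delayed, one periodic, and one cancelled request.
    static String exercise(ScheduledExecutorService ses) {
        CountDownLatch threeTicks = new CountDownLatch(3);
        // Delayed action: a Callable that becomes enabled after 20 ms.
        ScheduledFuture<String> oneShot =
            ses.schedule(() -> "one-shot", 20, TimeUnit.MILLISECONDS);
        // Periodic action: counts down every 5 ms until cancelled below.
        ScheduledFuture<?> periodic =
            ses.scheduleAtFixedRate(threeTicks::countDown, 5, 5, TimeUnit.MILLISECONDS);
        // Cancellation: this task is cancelled long before its delay elapses.
        ScheduledFuture<?> never =
            ses.schedule(() -> System.out.println("never printed"), 1, TimeUnit.HOURS);
        never.cancel(false);
        try {
            String result = oneShot.get(5, TimeUnit.SECONDS);
            boolean ticked = threeTicks.await(5, TimeUnit.SECONDS);
            periodic.cancel(false);
            return result + ", ticked=" + ticked + ", cancelled=" + never.isCancelled();
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            return "failed: " + e;
        }
    }

    // Runs the exercise on a ScheduledThreadPoolExecutor, available on any current JDK.
    static String run() {
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(1);
        try {
            return exercise(ses);
        } finally {
            ses.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Writing the exercise against the interface rather than a concrete pool keeps the caller's code unchanged if the scheduler behind it is swapped.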

We also introduce method submitWithTimeout to schedule a timeout that cancels or otherwise completes a submitted task that takes too long. Support for this very common usage was missing from the ScheduledExecutorService API, and workarounds that users have tried are wasteful, often leaky, and error-prone. This cannot be added to the ScheduledExecutorService interface because it relies on ForkJoinTask methods (such as completeExceptionally) being available in user-supplied timeout actions. The need to allow a pluggable handler reflects experience with the similar CompletableFuture.orTimeout, which users have found not to be flexible enough, so it might be the subject of future improvements.
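For context, here is a minimal sketch of the kind of workaround available with existing APIs: a separate timer pool cancels a task submitted to a ForkJoinPool. This is not the PR's implementation, and the names TimeoutWorkaround, submitWithCancelAfter, and demo are hypothetical. It also shows one way such workarounds leak: when the task finishes early, the timer entry stays queued until its delay elapses unless the caller remembers to cancel and remove it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class TimeoutWorkaround {
    // Hypothetical helper: submit to the pool, and use a second executor to cancel on timeout.
    static <V> ForkJoinTask<V> submitWithCancelAfter(ForkJoinPool pool,
                                                     ScheduledExecutorService timer,
                                                     Callable<V> callable,
                                                     long timeout, TimeUnit unit) {
        ForkJoinTask<V> task = pool.submit(callable);
        // The timer entry is never cancelled here, so it outlives a task that finishes early.
        timer.schedule(() -> { task.cancel(true); }, timeout, unit);
        return task;
    }

    // Runs a task that sleeps for workMillis under a timeout of timeoutMillis.
    static String demo(long workMillis, long timeoutMillis) {
        ForkJoinPool pool = new ForkJoinPool(2);
        ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);
        try {
            ForkJoinTask<String> task = submitWithCancelAfter(pool, timer,
                () -> { Thread.sleep(workMillis); return "done"; },
                timeoutMillis, TimeUnit.MILLISECONDS);
            String outcome;
            try {
                outcome = task.get();
            } catch (CancellationException e) {
                outcome = "timed out";
            } catch (InterruptedException | ExecutionException e) {
                outcome = "failed: " + e;
            }
            // Count timer entries that are still queued after the task has finished.
            return outcome + ", pending timers=" + timer.getQueue().size();
        } finally {
            timer.shutdownNow();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(10, 2000));  // task finishes well before the timeout
        System.out.println(demo(500, 50));   // timeout fires first
    }
}
```

According to the description above, submitWithTimeout takes (Callable callable, long timeout, TimeUnit unit, Consumer<ForkJoinTask> timeoutAction), so the cancel call in this sketch corresponds to just one possible timeoutAction.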

A DelayScheduler is optionally (on first use of a scheduling method) constructed and started as part of a ForkJoinPool, not any other kind of ExecutorService. It doesn't make sense to do so with the other j.u.c pool implementation, ThreadPoolExecutor: ScheduledThreadPoolExecutor already extends it in incompatible ways (which is why we can't just improve or replace STPE internals). However, as discussed in internal documentation, the implementation isolates calls and callbacks in a way that could be extracted out into (package-private) interfaces if another j.u.c pool type is introduced.

Only one of the policy controls in ScheduledThreadPoolExecutor applies to ForkJoinPools with DelaySchedulers: new method cancelDelayedTasksOnShutdown controls whether quiescent shutdown should wait for delayed tasks to become enabled and execute, or cancel them and terminate. The default (to wait) matches default settings of STPE. Also, new method getDelayedTaskCount allows monitoring.
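For comparison, the existing STPE behavior that this default matches can be observed with setExecuteExistingDelayedTasksAfterShutdownPolicy. Treating that setter as the counterpart of cancelDelayedTasksOnShutdown is an assumption made for illustration; the sketch uses only STPE, and the class and method names are hypothetical.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownPolicyDemo {
    // Returns true if a task scheduled before shutdown() still ran afterwards.
    static boolean delayedTaskRanAfterShutdown(boolean waitForDelayed) {
        ScheduledThreadPoolExecutor stpe = new ScheduledThreadPoolExecutor(1);
        stpe.setExecuteExistingDelayedTasksAfterShutdownPolicy(waitForDelayed);
        AtomicBoolean ran = new AtomicBoolean();
        stpe.schedule(() -> ran.set(true), 100, TimeUnit.MILLISECONDS);
        stpe.shutdown(); // orderly shutdown: pending delayed tasks either run or are cancelled
        try {
            stpe.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ran.get();
    }

    public static void main(String[] args) {
        System.out.println("default policy (wait): "
            + new ScheduledThreadPoolExecutor(1).getExecuteExistingDelayedTasksAfterShutdownPolicy());
        System.out.println("wait   -> ran = " + delayedTaskRanAfterShutdown(true));
        System.out.println("cancel -> ran = " + delayedTaskRanAfterShutdown(false));
    }
}
```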

We don't expect any compatibility issues: In the unlikely event that someone else has added the four SES methods to a ForkJoinPool subclass, they will continue to work as overrides. It's hard to imagine that anyone has added a form of submitWithTimeout with the same signature (Callable callable, long timeout, TimeUnit unit, Consumer<ForkJoinTask> timeoutAction), or methods with names cancelDelayedTasksOnShutdown or getDelayedTaskCount.

A snag: for reasons that should now be deprecated, ForkJoinPool allows users to externally (via properties) set the commonPool to zero parallelism, disabling worker thread creation, which makes no sense if a DelayScheduler is used. This property setting was made available to JavaEE frameworks so they could ensure that no new threads would be created in service of parallelStream operations (which are structured to be (slowly) executable single-threadedly via caller joins). However, no other async uses would work, which led to workarounds in CompletableFuture (also SubmissionPublisher) to handle this case by generating other threads. Doing this was arguably another wrong decision. A better solution that is backward compatible is to internally override commonPool parallelism zero only if known-async methods are used. This preserves original intent, and passes jtreg/tck tests that check for lack of thread creation in parallelStreams and related usages that don't otherwise use any async j.u.c components (although with some changes in unnecessarily implementation-dependent tests that made assumptions about exactly which threads/pools were used). It does require some changes in the wording of disclaimers in CompletableFuture and elsewhere though. And eventually, all this should go away, although it's not clear how to deprecate a property setting. For now, the class-level javadoc has been updated to discourage use.
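The existing CompletableFuture workaround mentioned above can be observed with a small probe (hypothetical class name; not part of the PR). On JDKs that predate this change, CompletableFuture documents that async methods without an explicit Executor run in the commonPool unless it does not support a parallelism level of at least two, in which case a new thread is created per task. Running the probe with -Djava.util.concurrent.ForkJoinPool.common.parallelism=0 should therefore report a non-ForkJoin thread there. What it reports with this PR applied is not asserted here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;

class CommonPoolAsyncCheck {
    // Returns true if a default-executor async task ran on a ForkJoinPool worker thread.
    static boolean asyncRanOnForkJoinWorker() {
        return CompletableFuture
            .supplyAsync(() -> Thread.currentThread() instanceof ForkJoinWorkerThread)
            .join();
    }

    public static void main(String[] args) {
        System.out.println("commonPool parallelism = "
            + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("async ran on FJ worker = " + asyncRanOnForkJoinWorker());
    }
}
```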

One remaining issue is whether to expose the underlying ScheduledForkJoinTask type. It currently isn't exposed, in part because doing so would also require exposing currently non-public intervening classes (mainly FJT.InterruptibleTask). The main disadvantage of not exposing it is that the schedule methods merely document that the returned ScheduledFuture is a ForkJoinTask rather than include this in the signature. This could be revisited in the future without introducing incompatibilities (but with some internal implementation challenges to remove reliance on non-public-ness).

As minor follow-ups, we might expand use of DelaySchedulers internally in j.u.c classes to replace some timed waits.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8350493 to be approved

Issues

  • JDK-8319447: Improve performance of delayed task handling (Enhancement - P4)
  • JDK-8350493: Improve performance of delayed task handling (CSR)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23702/head:pull/23702
$ git checkout pull/23702

Update a local copy of the PR:
$ git checkout pull/23702
$ git pull https://git.openjdk.org/jdk.git pull/23702/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23702

View PR using the GUI difftool:
$ git pr show -t 23702

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23702.diff

Using Webrev

Link to Webrev Comment

@DougLea
Contributor Author

DougLea commented Feb 23, 2025

As a bit of cleanup, I updated to regularize parameter checking (mainly null checks) in ForkJoin classes, not just those directly impacted when adding scheduling methods. Sorry to add so many boring diffs.

@sunmisc

sunmisc commented Feb 25, 2025

> @sunmisc You are right that it would be nice if there were a way to efficiently use getAndSet here because a failed reference CAS hits slow paths that vary across GCs. But all of the ways I know to do this are much worse.

After a few days of benchmarks, I realized that you are absolutely right, although I had thought that separating the head (for deleting) and the tail (for inserting) would reduce contention.
Even the fact that the head can be modified (deleted) by only one thread, without volatile access, does not help.
Perhaps I have made a mistake somewhere in the implementation.

@DougLea
Contributor Author

DougLea commented Feb 25, 2025

@sunmisc Thanks for independently trying alternatives. We both had reasons to suspect that other mechanics might work out as well or better, but none seem to.

@AlanBateman
Contributor

I've done testing with the latest changes (to commit 9cc670b), and with the VirtualThread implementation changed to use it (pull/24030), and all is looking good.

@dchuyko
Member

dchuyko commented Mar 19, 2025

Not related to delayed task handling performance but related to the TC_MASK cleanup: as noted in the review of the TC masking fix [1], RC is not always masked accurately (though it's harmless), so maybe such cleanup could be made here as well.

[1] #24034 (comment)

Labels

  • core-libs (core-libs-dev@openjdk.org)
  • csr (Pull request needs approved CSR before integration)
  • rfr (Pull request is ready for review)
6 participants