Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempted clipboard issue workaround on Windows #8277

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

eirikbakke
Copy link
Contributor

Try to work around clipboard issues on Windows by always calling Clipboard.setContents from the Event Dispatch Thread.

This attempted workaround was motivated by @xxDark's investigation here.

The problem mentioned was that IsGettingOwnership would be returning true when awt_toolkit.cpp in the JDK tries to handle the WM_DESTROYCLIPBOARD event from the Win32, leading to the event being left unhandled and the cached clipboard content in Clipboard.contents remaining stale.

Grepping through the OpenJDK codebase, the IsGettingOwnership flag seems only to be modified when Java_sun_awt_windows_WClipboard_openClipboard is called with a non-null argument for newOwner. This, in turn, happens only when Clipboard.setContents is called.

Since race condition is related to a flag touched only by Clipboard.setContents, I am inclined to test this workaround where we call setContents only from the Event Dispatch Thread.

Note that IsGettingOwnership is called from the Toolkit thread, not the Event Dispatch Thread; from @xxDark's explanation I understand that these are different. But there's a high chance that the Toolkit thread and the EDT are "better" synchronized than the Toolkit thread and a random RequestProcessor thread.

Unfortunately I have never been able to reproduce the clipboard bug reliably on my own machine, and I spent my time mostly on MacOS these days. So I'm leaving this PR here for others to try if they have the chance.

…board.setContents from the Event Dispatch Thread.
@eirikbakke eirikbakke added do not merge Don't merge this PR, it is not ready or just demonstration purposes. Platform [ci] enable platform tests (platform/*) UI User Interface ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) labels Feb 24, 2025
@xxDark
Copy link

xxDark commented Feb 24, 2025

Hi. This is not the issue and won't really help. You can call setContents on any thread. See the change in this PR openjdk/jdk#23736 to see what needs to be changed in the OpenJDK. There is a race condition in an upcall from native code. The only way I see this can be fixed temporary in NetBeans is to write an agent that will transform handleContentsChanged to schedule the execution on the EDT when needed and synchronize on clipboard access.

@eirikbakke
Copy link
Contributor Author

@xxDark Just to confirm; did the change in that OpenJDK PR fix the original problem in NetBeans, rather than just avoiding the assertion failures?

Did you figure out how calling handleContentsChanged from the wrong thread could end up causing IsGettingOwnership to return true when it shouldn't?

Our challenge is that we can't easily change the JDK, but we can easily make changes to NetBeans, if a workaround can be found there.

@xxDark
Copy link

xxDark commented Feb 24, 2025

@xxDark Just to confirm; did the change in that OpenJDK PR fix the original problem in NetBeans, rather than just avoiding the assertion failures?

Did you figure out how calling handleContentsChanged from the wrong thread could end up causing IsGettingOwnership to return true when it shouldn't?

Our challenge is that we can't easily change the JDK, but we can easily make changes to NetBeans, if a workaround can be found there.

Yes, I've verified that this fixes the issue.
However, I've not got to figuring out how this fixes the issue. I may take a look tomorrow at what OpenClipboard does on Windows under the hood/take a deeper look into AWT innerworkings with the clipboard.
I made a java agent that patches WClipboard that can be used in NetBeans and can confirm that with it as well, I can't reproduce the issue. (Without it, I can reproduce it immediately)

@xxDark
Copy link

xxDark commented Feb 24, 2025

I've published the agent to https://github.com/xxDark/clipboard-agent. Someone from NetBeans team can adapt it as I've not figured out how to set the project up properly in IntelliJ, unfortunately.

@eirikbakke
Copy link
Contributor Author

@xxDark Whoa, I didn't know you could do that! Thanks for posting and documenting the agent code. I think we could adapt that, if nobody discovers a simpler workaround.

@neilcsmith-net
Copy link
Member

@eirikbakke well, we did just decide against adding an agent from #3386 as part of #7928 That also has clipboard implications to address in NB26.

@eirikbakke
Copy link
Contributor Author

@neilcsmith-net Yeah, it's a bit of a heavy-handed approach. I'd prefer if we could find a simpler workaround.

I still think it would be worth testing this PR, in case there are bugs that make setContents not fully thread-safe.

@xxDark
Copy link

xxDark commented Feb 24, 2025

I'd prefer if we could find a simpler workaround.

I can probably make a workaround that does not need a java agent specifically for NetBeans, but that will require working with JDK internals regardless. If that is OK to do so, then I can probably make a PR

@mbien
Copy link
Member

mbien commented Feb 24, 2025

if it is indeed a JDK bug, getting it in should have priority IMO. Workarounds are just distractions which would have to be removed again anyway most likely.

This bug exists since pre-apache times its not something what has to be rushed. Some JDK bugs are also backported, this is something what we can't do in NB since the major versions go only forward with no hotfixes - JDK update would cover that too.

JDK PRs won't even reach the mailing list before the paperwork is done. Would be good to get in touch with someone from the JDK client team but I don't know anyone (anymore) unfortunately. And just-to-be-sure it would be good to check that this is indeed fixing the issues of those who are affected.

If it would be linux I would have hosted a JDK dev build by now but I don't want to setup a win image in vbox right now.

@xxDark
Copy link

xxDark commented Feb 24, 2025

And just-to-be-sure it would be good to check that this is indeed fixing the issues of those who are affected.

I think that it would be great if some users could test the agent (edit: or, I could implement some hack in NetBeans to address this issue without the agent, if it is OK to reach into JDK internals, effectively achieving the same fix) to see if it fixes the issue. Would be awkward if it ends up not fixing the bug (although assert failures in the OpenJDK already show that something is wrong..).

@mbien
Copy link
Member

mbien commented Feb 24, 2025

I think that it would be great if some users could test the agent

yep. I am a bit busy right now, but any other PMC could take a look through the agent code and if its ok create a dev build of NB (via PR) for testing purposes which injects it on startup, then post a mail to dev/users to get feedback etc.

Thanks for putting so much effort into this btw @xxDark

@mbien
Copy link
Member

mbien commented Feb 24, 2025

(i mean. It doesn't even need a dev build. the instructions in the agent readme should be enough.)

@xxDark
Copy link

xxDark commented Feb 24, 2025

I think I've figured out what happens (this is just an educated guess because I didn't really follow what pfnWowEmptyClipBoard calls, but the symptoms of lost `WM_DESTROYCLIPBOARD` are here):
BOOL __stdcall OpenClipboard(HWND hWndNewOwner)
{
  DWORD emptyClipboard;
  DWORD result = NtUserOpenClipboard(hWndNewOwner, &emptyClipboard);
  if ( emptyClipboard )
    ClientEmptyClipboard();
  return result;
}

Call to OpenClipboard fails in the JDK, because both toolkit & EDT are trying to access it.
emptyClipboard never gets set to true (?), so ClientEmptyClipboard is never called. I think that ClientEmptyClipboard is a user-mode version of EmptyClipboard which calls NtUserEmptyClipboard. In MS documentation, there is a reference to WM_DESTROYCLIPBOARD (keep this code in mind).
ClientEmptyClipboard looks like this (and not called):

void ClientEmptyClipboard(void)
{
  struct _HANDLENODE *v0; // rbx
  struct _HANDLENODE *v1; // rdi

  RtlEnterCriticalSection(&gcsClipboard);
  v0 = (struct _HANDLENODE *)gphn;
  if ( gphn )
  {
    do
    {
      v1 = *(struct _HANDLENODE **)v0;
      if ( *((_QWORD *)v0 + 3) )
        DeleteClientClipboardHandle(v0);
      RtlFreeHeap(pUserHeap, 0, v0);
      v0 = v1;
    }
    while ( v1 );
  }
  gphn = 0LL;
  if ( pfnWowEmptyClipBoard )
    pfnWowEmptyClipBoard(); // Supposed to send WM_DESTROYCLIPBOARD here, but because OpenClipboard did not succeed, the message is silently discarded (?)
  RtlLeaveCriticalSection(&gcsClipboard);
}

So TL;DR LostOwnership never gets a chance to be called because WM_DESTROYCLIPBOARD does not get delivered. If we follow what native code does: supposed to clear current cached contents, but never gets a chance to do so.

Edit: There also could be a possibility that due to how the code is structured, JDK code calls CloseClipboard all the time as well, which again, introduces another function into the mix that is constantly called from multiple threads, regardless of whether the previous OpenClipboard call succeeded on another thread. Either way, this is 99% a race condition that somehow results in the window manager not sending WM_DESTROYCLIPBOARD~ or JDK ignoring the sent message due to the bogus check on line 1039 in awt_toolkit.cpp that at the right moment gets flipped to true (most likely!), discarding the message.

@neilcsmith-net
Copy link
Member

@xxDark curious on your thoughts on the patch at JetBrains/JetBrainsRuntime@6b2c6ff Mind you we know JBR doesn't fix the issue for some reporters here, and I also noticed people are reporting the issue isn't fixed there either, and are even referring to the bug report and replicator in NetBeans.

@xxDark
Copy link

xxDark commented Feb 25, 2025

@xxDark curious on your thoughts on the patch at JetBrains/JetBrainsRuntime@6b2c6ff Mind you we know JBR doesn't fix the issue for some reporters here, and I also noticed people are reporting the issue isn't fixed there either, and are even referring to the bug report and replicator in NetBeans.

Yep, sounds exactly like the issue in JDK we pinpointed.
JetBrains/JetBrainsRuntime@6b2c6ff#diff-ac6d3b12e337195a310db3fc283231f5b7e4e7c555eb0955c49fede89ed6046bR102 is the same I did with breakpoint trick "If one puts a breakpoint here https://github.com/openjdk/jdk/blob/a891630817844c8c42994da3b3110925ca4595a0/src/java.desktop/share/classes/sun/awt/datatransfer/SunClipboard.java#L135 and forcefully sets contents to null, magically, everything will work again."
Call to JetBrains/JetBrainsRuntime@6b2c6ff#diff-ac6d3b12e337195a310db3fc283231f5b7e4e7c555eb0955c49fede89ed6046bR122 on window activation if ownership is lost. "AWT is supposed to call lostSelectionOwnershipImpl from the toolkit event loop, but this never happens, leaving cached "contents" in SunClipboard poisoned."

@xxDark
Copy link

xxDark commented Feb 25, 2025

I think it doesn't fix the issue because to me it seems like a hacky attempt to fix the main issue with the race condition that was found. The toolkit is still calling OpenClipboard without proper synchronization on Java side.

@xxDark
Copy link

xxDark commented Feb 25, 2025

After more digging, to answer your question @eirikbakke: I can't say for sure the cause of this.
There are multiple factors:

  • Windows is closed source, so tracing the interaction between user mode and the kernel is difficult. I don't have enough knowledge to do this. Does pfnWowEmptyClipBoard post the related window message? Is it done in the kernel mode?
  • JDK on its own has issues with concurrent clipboard access, as was figured out
  • There is a lot of code without comments in AWT making investigation harder. For example, my current fix ensures that clipboard interactions from the troolkit thread are enqueued to EDT if there is one running. However, is this the correct fix? There is this comment https://github.com/openjdk/jdk/blob/cfeb7d6c964f63184c939f6f0625c6e7f1afdc31/src/java.desktop/windows/classes/sun/awt/windows/WClipboard.java#L67. While it talks about not calling the code on the toolkit thread, what if the opposite is also true and there are scenarios when some code should never be called on the EDT? There is no further elaboration on what "security holes" authors of this class talks about.
  • Further extending the second point, native code is also missing a lot of comments. Take the problematic AwtClipboard::IsGettingOwnership check (which I now think is probably the culprit). I don't have the time at this moment to do further diagnostics on this. From the sources:
    /* This flag is set while we call EmptyClipboard to indicate to
    WM_DESTROYCLIPBOARD handler that we are not losing ownership */

    But why is this flag needed? If you follow along, you would assume, that if the window messages are processed on the toolkit thread, this seem to make no sense. The only point at which this flag is set to true is here. Why? This method never blocks the caller thread and never waits for the WM_DESTROYCLIPBOARD to arrive. The WM_DESTROYCLIPBOARD arrives on the toolkit thread, but the caller can be any thread. So this is just a race condition and a check here seem bogus to me. Thats why a lot of users probably see the bug manifest only after some time. But for me it happens immediately. I did further experiments yesterday, and noticed that if I just spam CTRL+C on a non-empty line, which apparently copies the current line, this is enough to trigger the bug if you quickly swap to the secondary window.

One thing for certain is that there is a flaw at how JDK handles clipboard access IMO. Logic like this for example is equivalent to this:

try {
    lock.lock();
    ....
} finally {
    lock.unlock();
}

What if Lock::lock throws? finally closure will just blindly unlock the lock if another thread is currently holding it? Absolutely, if there is no protection against that. According to ReactOS source code, there is some protection in CloseClipboard. But I can't for sure that there is in Windows. Maybe the current logic is correct and intended to be this way. But this seems very wrong to me

@johntor
Copy link
Contributor

johntor commented Feb 27, 2025

I can confirm that clipboard-agent is working fine!

Nice job!
J!

@eirikbakke
Copy link
Contributor Author

eirikbakke commented Feb 28, 2025

@xxDark Thanks for the further investigation! That's a very useful writeup for the future.

@johntor Thanks for testing! Did you observe copy/paste bugs in the past, which were eliminated by the agent? EDIT: And in particular, I noticed in another thread that you have also been experimenting with changes to the SecurityManager settings. It would be useful to know how sure you are that the bug is fixed by the xxDark's "agent" workaround, as opposed to other changes you may have experimented with. E.g. disable the agent, run NetBeans until you see the cut/paste bugs, then enable the agent without making any other configuration changes, and see how long you can go without seeing the cut/paste bugs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) do not merge Don't merge this PR, it is not ready or just demonstration purposes. Platform [ci] enable platform tests (platform/*) UI User Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants