Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to Fix PauliSumCollector.collect_async warning under notebook(short-term solution) #4004

Merged
merged 15 commits into from
Jul 19, 2021

Conversation

BichengYing
Copy link
Collaborator

@BichengYing BichengYing commented Apr 8, 2021

As mentioned in #3654:

PauliSumCollector.collect which uses asyncio doesn't always mix well with notebooks - throws RuntimeError: This event loop is already running or a more mysterious papermill error (KeyError: 'language_info') - but collect_async seems to work fine

The current status is using collect_async. However, in the website, it shows the warning like:
image

Also, the Energy 0.0 is not desired value, which is because the collector is not finished.

The problem mainly is because notebook itself is running under the asyncio. Importing nest-asyncio maybe is the easist fix.

Note: nest-asyncio seems working well in most cases, but in the colab environment, run all can cause the cell of async function hang forever. Fortunately, it can be avoided by run cell one-by-one.

@google-cla google-cla bot added the cla: yes Makes googlebot stop complaining. label Apr 8, 2021
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@BichengYing BichengYing marked this pull request as ready for review April 8, 2021 15:56
@BichengYing BichengYing requested review from cduck, vtomole and a team as code owners April 8, 2021 15:56
@rmlarose
Copy link
Contributor

rmlarose commented Apr 9, 2021

Also, the Energy 0.0 is not desired value, which is because the collector is not finished.

Oh, I didn't realize this value was incorrect - thanks for this. I was mistakenly content with CI passing after a long battle, mea culpa. (Note the papermill error was totally unrelated.)

@@ -789,7 +803,7 @@
"collector = cirq.PauliSumCollector(circuit, psum, samples_per_term=10_000)\n",
"\n",
"# Provide a sampler. See also: collector.collect(...).\n",
"collector.collect_async(sampler=cirq.Simulator())\n",
"collector.collect(sampler=cirq.Simulator())\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to run this cell, but it's taking an unreasonably long time to complete (5+ minutes). To minimize the impact on test times, we should reduce samples_per_term to a number that allows this to complete in 30 seconds or less.

If this requires choosing an unrealistically low value for samples_per_term, please also add a comment explaining why we're using the new value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is strange. It is super quick to run that. In google colab that line takes less than 300ms.
image
In the github action, the whole notebook execution is like one and half minutes.
image

I wonder why it takes you so long time to complete. (I guess maybe related to live lock?) Could you show your test environment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm running this directly from the Colab link (modified to point to the latest commit on this PR) and the execution consistently hangs or crashes - I haven't had a run complete successfully yet.

Logs include IndexError: pop from an empty deque, but it's unclear whether that's coming from the Colab restart or from asyncio.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity, I've been running the linked colab with the Runtime > Run all command.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Yes, if I click Run all, it will fail. But if click Run before that Cell and then manually shift+Enter run that cell can be finished quickly. I guess Run all added another layer of running environment caused this. I will take a look to understand this better.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Yes, if I click Run all, it will fail. But if click Run before that Cell and then manually shift+Enter run that cell can be finished quickly.

I can reproduce this result - Run before and then running this cell also succeeds for me. Thanks for digging into this!

Copy link
Collaborator Author

@BichengYing BichengYing Apr 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still do not understand why this happens. What I found is the TensorFlow Federated colab has encountered similar asynchrony issue and they used nest_asyncio as well for solution. I tried to use Run all their colab, and they do not have similar issue :/

@balopat balopat assigned balopat and 95-martin-orion and unassigned balopat Jul 1, 2021
@95-martin-orion
Copy link
Collaborator

@ybc1991, have you been able to find any solution for this? Based on #4197 it appears that the issue is still occurring.

If we have no way to resolve this from our end (e.g. if it's caused by an issue with nest_asyncio that has no workaround), I'd consider moving forwards with the PR with clear warning messages on the affected cell noting that "Run all" will misbehave.

@BichengYing
Copy link
Collaborator Author

@ybc1991, have you been able to find any solution for this? Based on #4197 it appears that the issue is still occurring.

If we have no way to resolve this from our end (e.g. if it's caused by an issue with nest_asyncio that has no workaround), I'd consider moving forwards with the PR with clear warning messages on the affected cell noting that "Run all" will misbehave.

I don't find the solution to fix it. I do test on other environment such as local jupyter notebook and the note book in vscode. Both can finish correctly with run-all button. So as suggested, I added a warning cell there. PTAL.

"id": "5d8e48230d73"
},
"source": [
"**Warning: If you run this notebook in colab environment with \"Run all\", the process may hang forever or you may encounter crash. Please manually run this cell either press the play button to the left of the code, or use the keyboard shortcut \"Command/Ctrl+Enter\".**"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits:

  • in colab environment -> in a colab environment
  • either press...or use -> either by pressing...or using
  • Instead of the process, I recommend saying the cell below to be abundantly clear which cell is the source of the issue. (As it is, it's unclear if this refers to the cell above or below)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@95-martin-orion 95-martin-orion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. Would you mind either commenting on #4004 or opening a new issue to track the "run all" dilemma here? I'm happy to merge this PR as-is, I just want to keep tabs on this for the future.

@BichengYing BichengYing changed the title Solve PauliSumCollector.collect_async warning under notebook Attempt to Fix PauliSumCollector.collect_async warning under notebook(short-term solution) Jul 7, 2021
@BichengYing
Copy link
Collaborator Author

Sure, I have updated the title and content of this PR. To point out this is the short-term solution and the problem of "run all" in the top comments.

@95-martin-orion 95-martin-orion added the automerge Tells CirqBot to sync and merge this PR. (If it's running.) label Jul 7, 2021
@CirqBot CirqBot added the front_of_queue_automerge CirqBot uses this label to indicate (and remember) what's being merged next. label Jul 7, 2021
@CirqBot
Copy link
Collaborator

CirqBot commented Jul 7, 2021

Automerge cancelled: A status check is failing.

@CirqBot CirqBot removed automerge Tells CirqBot to sync and merge this PR. (If it's running.) front_of_queue_automerge CirqBot uses this label to indicate (and remember) what's being merged next. labels Jul 7, 2021
@95-martin-orion
Copy link
Collaborator

The error is an instance of #4293, unrelated to this PR.

@balopat
Copy link
Contributor

balopat commented Jul 7, 2021

The error is an instance of #4293, unrelated to this PR.

I reran the checks, that seems to push the situation further. This is a nasty flake from npm, as I'm looking into it, retries might be the only way to get over it until they fix it upstream.

@balopat balopat added the automerge Tells CirqBot to sync and merge this PR. (If it's running.) label Jul 8, 2021
@CirqBot
Copy link
Collaborator

CirqBot commented Jul 8, 2021

Automerge cancelled: A status check is failing.

@CirqBot CirqBot removed the automerge Tells CirqBot to sync and merge this PR. (If it's running.) label Jul 8, 2021
@95-martin-orion 95-martin-orion merged commit dc01372 into quantumlib:master Jul 19, 2021
@BichengYing BichengYing deleted the async_warning branch August 13, 2021 06:19
rht pushed a commit to rht/Cirq that referenced this pull request May 1, 2023
…(short-term solution) (quantumlib#4004)

* add nest_asyncio to solve collector under notebook environment

* Fix the typo in clifford_gate::test_commutes_pauli

* Revert "Fix the typo in clifford_gate::test_commutes_pauli"

This reverts commit b2d9269.

* Add warning above the cell

* Fix for nit

Co-authored-by: Cirq Bot <craiggidney+github+cirqbot@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Makes googlebot stop complaining.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants