Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

state: avoid infinite recursion in JobStore #1084

Merged
merged 4 commits into from
Oct 11, 2022
Merged

Conversation

radeksimko
Copy link
Member

@radeksimko radeksimko commented Oct 7, 2022

Fixes #1065

I couldn't come up with a more light-weight test which would reproduce the bug. As mentioned in the comment, it (scheduling over 1 million jobs) takes about 3-4 minutes to finish on M1 Pro and consumes decent chunk of almost all cores + ~5-6GB of memory at peak.

The Windows GHA environment runs out of memory before the test can even finish. 🙊

There are a few options to consider:

  1. remove the test entirely (some risk that we re-introduce the bug)
  2. gate the test on ENV variable, such as LONG_TESTS=1
  3. gate the test on build tags, such as -tags=longtests

We could also consider not running the test on non-empty CI ENV variable (which most CI systems set, incl. GitHub Actions), but I'd argue that it's probably not as valuable to run that test locally for most of the time either.

@radeksimko radeksimko added the bug Something isn't working label Oct 7, 2022
@radeksimko radeksimko force-pushed the b-fix-jobstore-recursion branch 6 times, most recently from 81aa877 to f52db4a Compare October 7, 2022 16:47
@radeksimko radeksimko marked this pull request as ready for review October 7, 2022 17:23
@radeksimko radeksimko requested a review from a team as a code owner October 7, 2022 17:23
Copy link
Member

@dbanck dbanck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We (especially you) did quite some work to reduce CI run time over the last couple of weeks. Therefore, I would instead not run this as part of our usual CI pipeline.

What do you think about running this as a nightly job, similar to the benchmarks? That would still ensure that we catch this bug before a release.

@radeksimko
Copy link
Member Author

What do you think about running this as a nightly job, similar to the benchmarks?

We could, but that would have to be limited to linux and/or macos as it's just impossible to run that in the Windows GHA environment.

@radeksimko radeksimko force-pushed the b-fix-jobstore-recursion branch 2 times, most recently from 909a2be to f6d7476 Compare October 10, 2022 18:18
@radeksimko
Copy link
Member Author

@dbanck I have gated the test on longtest build tag and added a nightly workflow, as suggested, PTAL

@radeksimko radeksimko requested a review from dbanck October 10, 2022 18:19
@radeksimko radeksimko force-pushed the b-fix-jobstore-recursion branch from f6d7476 to 155ded9 Compare October 10, 2022 18:20
@radeksimko radeksimko added this to the v0.29.3 milestone Oct 10, 2022
@radeksimko radeksimko force-pushed the b-fix-jobstore-recursion branch from 155ded9 to 09be5f9 Compare October 10, 2022 18:25
Copy link
Member

@dbanck dbanck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I would just update the name

Co-authored-by: Daniel Banck <dbanck@users.noreply.github.com>
@radeksimko radeksimko merged commit e8397dc into main Oct 11, 2022
@radeksimko radeksimko deleted the b-fix-jobstore-recursion branch October 11, 2022 09:42
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

runtime: goroutine stack exceeds 1000000000-byte limit
2 participants