cmd/compile: fixedbug/issue10958.go failures #18589
Comments
CL https://golang.org/cl/35051 mentions this issue.
Can you try upping the timeout in that test? See https://go-review.googlesource.com/35051 for an example:
I figure I get one try at not-flaky; if it fails, we yank the test.
Do I have to launch the test via …? Because if I …
In the test directory: … The data race is more or less required; the threads have to enter their actually-but-not-obviously infinite loops, and those loops cannot contain calls. The test is whether the compiler is properly inserting rescheduling checks into those loops, and whether they have the intended effect.
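For illustration, here is a minimal sketch of the failure mode that test guards against. This is an assumption-laden example, not the actual test source; the 10-second watchdog and the runtime.GC() trigger are stand-ins chosen for the sketch.

```go
// Hedged sketch only: illustrates the failure mode the test guards against,
// not the real test source.
package main

import (
	"fmt"
	"os"
	"runtime"
	"time"
)

// spin contains no calls in its loop body. Before signal-based preemption
// (added to Go well after this issue), such a loop can only be preempted if
// the compiler inserts a rescheduling check on the loop backedge.
func spin() {
	for {
	}
}

func main() {
	// Watchdog (10s is an assumption mirroring the test's timeout): fail
	// instead of hanging forever if the loop is never preempted.
	time.AfterFunc(10*time.Second, func() {
		fmt.Println("timed out: call-free loop was never preempted")
		os.Exit(1)
	})

	go spin()
	time.Sleep(10 * time.Millisecond) // let spin() enter its loop

	// runtime.GC() must stop every goroutine. It can only complete if spin()
	// reaches a preemption point, i.e. if the backedge check was inserted.
	runtime.GC()
	fmt.Println("GC completed: the loop was preemptible")
	os.Exit(0)
}
```

Note that with GOMAXPROCS=1 even the watchdog goroutine can be starved by the non-preemptible loop, which is why the hang tends to surface as the external test timeout rather than a clean in-process failure.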
That message is from an "old" compiler. There's a new phase in the one paired with that test, and the phase is present whether GOEXPERIMENT is set or not.
Yeah, sorry, I set a bad …
Still seeing failures with …
Too much concurrency in all.bash, perhaps? Thanks for checking this.
Ok. I'll let you decide what to do with this issue.
It's worth disabling. It's only for an experiment, and flaky tests are vile.
A few thoughts: You should be able to reproduce it with just … If you run with …
Thanks for the suggestion.
In the meantime, another CL was merged; the builders tested it and two failed on this test: freebsd/arm (https://build.golang.org/log/02cb9ac66d2d9a19feb6fb479917e709a50850d5). So yeah, it's really flaky.
With a really high timeout (I tried 600) it does not fail, so I guess the machine is just choking a little on that test. It has been disabled now so we're fine.
@ALTree can you say more about the machine? Is it a VM? Hardware? Software? How many cores? What is GOMAXPROCS? Are there other things running on the machine? It's possible the OS descheduled a thread for 10 seconds but that seems like a long time.
Just to confirm that the timeout was correctly set to 10, I tried again (with …):
(Note the 10.025s at the end.) No VM; it's real hardware. The machine is an old NeXtScale cluster (we'll decommission it during 2017, I believe). OS is CentOS 6.5, …
There are many nodes, and each node has … Maybe what's happening is that …
Not set.
Possibly. Since I've only requested about half a node, if another user asks for just a few cores the job manager could assign the job to the same node. The 8 cores I requested are mine though (and they're guaranteed).
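For context on the GOMAXPROCS question above: when GOMAXPROCS is unset, the runtime defaults it to the number of CPUs it can see, which a trivial program can report. This snippet is purely illustrative and is not from the thread:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

func main() {
	fmt.Println("GOMAXPROCS env :", os.Getenv("GOMAXPROCS")) // empty string means "not set"
	fmt.Println("GOMAXPROCS used:", runtime.GOMAXPROCS(0))   // passing 0 only reads the current value
	fmt.Println("NumCPU         :", runtime.NumCPU())        // CPUs visible to the Go runtime
}
```

On a shared cluster node, whether NumCPU reports the 8 requested cores or the whole node depends on whether the job manager actually pins the process to its allotted CPUs.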
I think all we need to do here is increase the timeout, for Go 1.9.
I modified the test (and also the rescheduling code) in this CL; it may work better now: https://go-review.googlesource.com/c/36206/
Seen on our openbsd-amd64 builder: https://build.golang.org/log/86fd5a7f6d52845eeecce77a39bb5e799ebf2b6a Alex
The last four openbsd-386 builds all failed with this too. I'm going to disable the test.
CL https://golang.org/cl/40651 mentions this issue.
Updates #18589

Change-Id: I2c3bbc8257c68295051bd2e63e1e11794d0609c3
Reviewed-on: https://go-review.googlesource.com/40651
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
CL https://golang.org/cl/42431 mentions this issue.
5 shards, each of which spins up NumCPU processes, each of which is running at GOMAXPROCS=NumCPU, is too much for one machine. It makes my laptop unusable.

It might also be in part responsible for test flakes that require a moderately responsive system, like #18589 (backedge scheduling) and #19276 (locklinear).

It's possible that Go should be a better neighbor in general; that's #17969. In the meantime, fix this corner of the world.

Builders snapshot the world and run shards on different machines, so keeping sharding high for them is good.

This is a partial reversion of CL 18199.

Fixes #20141.

Change-Id: I123cf9436f4f4da3550372896265c38117b78071
Reviewed-on: https://go-review.googlesource.com/42431
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
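To make the arithmetic in that commit message concrete, here is a back-of-the-envelope sketch (not code from the CL; the shard count is taken from the message above):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	const shards = 5 // per the commit message above
	cpus := runtime.NumCPU()

	// Each shard starts NumCPU test processes, and each process runs at
	// GOMAXPROCS=NumCPU, so the worst-case number of runnable threads is:
	worst := shards * cpus * cpus
	fmt.Printf("%d cores -> up to %d runnable threads (%dx oversubscription)\n",
		cpus, worst, worst/cpus)
	// e.g. 8 cores -> up to 320 runnable threads, a 40x oversubscription;
	// easily enough load to delay a goroutine past a 10-second test timeout.
}
```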
Moved to 1.12; the test will likely change or be obsoleted when we add signal-based preemption.
Punting to 1.13; we aren't going to do anything for this cycle.
The fixedbug/issue10958.go test, introduced in 7f1ff65 (cmd/compile: insert scheduling checks on loop backedges), is flaky.

What version of Go are you using (go version)?

go version devel +7f1ff65 Mon Jan 9 21:01:29 2017 +0000 linux/amd64

What operating system and processor architecture are you using (go env)?

What did you do?

Ran all.bash 3 times on a CentOS 6.5 server.

What did you see?

It failed reliably with:
It happened on the linux/mips64 builder too:
https://build.golang.org/log/3f4be6b0c792fd179683d0046a2e6f178f8928d2