-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: clean up async preemption loose ends #36365
Comments
Change https://golang.org/cl/229298 mentions this issue: |
Change https://golang.org/cl/229300 mentions this issue: |
Change https://golang.org/cl/229299 mentions this issue: |
Change https://golang.org/cl/230539 mentions this issue: |
Change https://golang.org/cl/230542 mentions this issue: |
Change https://golang.org/cl/230544 mentions this issue: |
Change https://golang.org/cl/230541 mentions this issue: |
Change https://golang.org/cl/230540 mentions this issue: |
Change https://golang.org/cl/230543 mentions this issue: |
Currently, newproc1 allocates, initializes, and schedules a new goroutine. We're about to change debug call injection in a way that will need to create a new goroutine without immediately scheduling it. To prepare for that, make scheduling the responsibility of newproc1's caller. Currently, there's exactly one caller (newproc), so this simply shifts that responsibility. For #36365. Change-Id: Idacd06b63e738982e840fe995d891bfd377ce23b Reviewed-on: https://go-review.googlesource.com/c/go/+/229298 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently, when a debugger injects a call, that call happens on the goroutine where the debugger injected it. However, this requires significant runtime complexity that we're about to remove. To prepare for this, this CL switches to a different approach that leaves the interrupted goroutine parked and runs the debug call on a new goroutine. When the debug call returns, it resumes the original goroutine. This should be essentially transparent to debuggers. It follows the exact same call injection protocol and ensures the whole protocol executes indivisibly on a single OS thread. The only difference is that the current G and stack now change part way through the protocol. For #36365. Change-Id: I68463bfd73cbee06cfc49999606410a59dd8f653 Reviewed-on: https://go-review.googlesource.com/c/go/+/229299 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
A debugger can inject a call at almost any PC, which causes significant complications with stack scanning and growth. Currently, the runtime solves this using precise stack maps and register maps at nearly all PCs, but these extra maps require roughly 5% of the binary. These extra maps were originally considered worth this space because they were intended to be used for non-cooperative preemption, but are now used only for debug call injection. This CL switches from using precise maps to instead using conservative frame scanning, much like how non-cooperative preemption works. When a call is injected, the runtime flushes all potential pointer registers to the stack, and then treats that frame as well as the interrupted frame conservatively. The limitation of conservative frame scanning is that we cannot grow the goroutine stack. That's doable because the previous CL switched to performing debug calls on a new goroutine, where they are free to grow the stack. With this CL, there are no remaining uses of precise register maps (though we still use the unsafe-point information that's encoded in the register map PCDATA stream), and stack maps are only used at call sites. For #36365. Change-Id: Ie217b6711f3741ccc437552d8ff88f961a73cee0 Reviewed-on: https://go-review.googlesource.com/c/go/+/229300 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
PanicBounds and PanicExtend are lowered to runtime calls (with a non-Go ABI), but are not currently marked as calls. Since liveness analysis only emits stack maps at calls in the runtime, this means these panic call sites in the runtime won't get a stack map. These almost immediately turn into throws in the runtime, but there's still a chance they'll try to grow the stack first, which would lead to a different panic. To fix this, mark these operations as calls. Outside the runtime, we currently emit stack maps for everything that isn't an unsafe-point, so these panic calls get stack maps by default. However, we're about to move to emitting stack maps only at call sites, at which point this will start to matter outside the runtime as well. I confirmed that this has no effect on anything but PCDATA/FUNCDATA in runtime and net/http. For #36365. Change-Id: Ic5bb463fd152cc320c815dc04cf62005261ae169 Reviewed-on: https://go-review.googlesource.com/c/go/+/230539 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently, this function conflates two (easily conflated!) concepts: whether a Value is a safe-point and whether it has a stack map. In particular, call Values may not be a safe-point, but may need a stack map anyway in case the called function grows the stack. Hence, rename this function to "hasStackMap", since that's really what it represents. For #36365. Change-Id: I89839de0be8db3be3f0d3a7fb5fcf0b0b6ebc98a Reviewed-on: https://go-review.googlesource.com/c/go/+/230540 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
The compiler currently conflates whether a Value has a stack map with whether it's an unsafe point. For the most part, unsafe-points don't have stack maps, so this is mostly fine, but call instructions can be both an unsafe-point *and* have a stack map. For example, none of the instructions in a nosplit function should be preemptible, but calls must still have stack maps in case the called function grows the stack or get preempted. Currently, the compiler can't distinguish this case, so calls in nosplit functions are marked as safe-points just because they have stack maps. This is particularly problematic if a nosplit function calls another nosplit function, since this can introduce a preemption point where there should be none. We realized this was a problem for split-stack prologues a while back, and CL 207349 changed the encoding of unsafe-points to use the register map index instead of the stack map index so we could record both a stack map and an unsafe-point at the same instruction. But this was never extended into the compiler. This CL fixes this problem in the compiler. We make LivenessIndex slightly more abstract by separating unsafe-point marks from stack and register map indexes. We map this to the PCDATA encoding later when producing Progs. This isn't enough to fix the whole problem for nosplit functions, because obj still adds prologues and marks those as preemptible, but it's a step in the right direction. I checked this CL by comparing maps before and after this change in the runtime and net/http. In net/http, unsafe-points match exactly; at anything that isn't an unsafe-point, both the stack and register maps are unchanged by this CL. In the runtime, at every point that was a safe-point before this change, the stack maps agree (and mostly the runtime doesn't have register maps at all now). In both, all CALLs (except write barrier calls) have stack maps. For #36365. Change-Id: I066628938b02e78be5c81a6614295bcf7cc566c2 Reviewed-on: https://go-review.googlesource.com/c/go/+/230541 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
These are necessarily deeply non-preemptible, so there's no point in emitting stack maps for them. We already mark them as unsafe points, so this only affects the runtime, since user code does not emit stack maps at unsafe points. SSAGenState.PrepareCall also excludes them when it's sanity checking call stack maps. Right now this only drops a handful of unnecessary stack maps from the runtime, but we're about to start emitting stack maps only at calls for user code, too. At that point, this will matter much more. For #36365. Change-Id: Ib3abfedfddc8e724d933a064fa4d573500627990 Reviewed-on: https://go-review.googlesource.com/c/go/+/230542 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
We're about to switch to having significantly fewer maps in the liveness map, so switch from a dense representation to a sparse representation. Passes toolstash-check. For #36365. Change-Id: Icb17bd6ace17667a280bc5fba4039cae3020a8d1 Reviewed-on: https://go-review.googlesource.com/c/go/+/230543 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Currently, we emit stack maps and register maps at almost every instruction. This was originally intended to support non-cooperative preemption, but was only ever used for debug call injection. Now debug call injection also uses conservative frame scanning. As a result, stack maps are only needed at call sites and register maps aren't needed at all except that we happen to also encode unsafe-point information in the register map PCDATA stream. This CL reduces stack maps to only appear at calls, and replace full register maps with just safe/unsafe-point information. This is all protected by the go115ReduceLiveness feature flag, which is defined in both runtime and cmd/compile. This CL significantly reduces binary sizes and also speeds up compiles and links: name old exe-bytes new exe-bytes delta BinGoSize 15.0MB ± 0% 14.1MB ± 0% -5.72% name old pcln-bytes new pcln-bytes delta BinGoSize 3.14MB ± 0% 2.48MB ± 0% -21.08% name old time/op new time/op delta Template 178ms ± 7% 172ms ±14% -3.59% (p=0.005 n=19+19) Unicode 71.0ms ±12% 69.8ms ±10% ~ (p=0.126 n=18+18) GoTypes 655ms ± 8% 615ms ± 8% -6.11% (p=0.000 n=19+19) Compiler 3.27s ± 6% 3.15s ± 7% -3.69% (p=0.001 n=20+20) SSA 7.10s ± 5% 6.85s ± 8% -3.53% (p=0.001 n=19+20) Flate 124ms ±15% 116ms ±22% -6.57% (p=0.024 n=18+19) GoParser 156ms ±26% 147ms ±34% ~ (p=0.070 n=19+19) Reflect 406ms ± 9% 387ms ±21% -4.69% (p=0.028 n=19+20) Tar 163ms ±15% 162ms ±27% ~ (p=0.370 n=19+19) XML 223ms ±13% 218ms ±14% ~ (p=0.157 n=20+20) LinkCompiler 503ms ±21% 484ms ±23% ~ (p=0.072 n=20+20) ExternalLinkCompiler 1.27s ± 7% 1.22s ± 8% -3.85% (p=0.005 n=20+19) LinkWithoutDebugCompiler 294ms ±17% 273ms ±11% -7.16% (p=0.001 n=19+18) (https://perf.golang.org/search?q=upload:20200428.8) The binary size improvement is even slightly better when you include the CLs leading up to this. Relative to the parent of "cmd/compile: mark PanicBounds/Extend as calls": name old exe-bytes new exe-bytes delta BinGoSize 15.0MB ± 0% 14.1MB ± 0% -6.18% name old pcln-bytes new pcln-bytes delta BinGoSize 3.22MB ± 0% 2.48MB ± 0% -22.92% (https://perf.golang.org/search?q=upload:20200428.9) For #36365. Change-Id: I69448e714f2a44430067ca97f6b78e08c0abed27 Reviewed-on: https://go-review.googlesource.com/c/go/+/230544 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Is this still a 1.15 issue? |
Moving to 1.16 for the remaining tasks. |
Currently, newproc1 allocates, initializes, and schedules a new goroutine. We're about to change debug call injection in a way that will need to create a new goroutine without immediately scheduling it. To prepare for that, make scheduling the responsibility of newproc1's caller. Currently, there's exactly one caller (newproc), so this simply shifts that responsibility. For golang#36365. Change-Id: Idacd06b63e738982e840fe995d891bfd377ce23b Reviewed-on: https://go-review.googlesource.com/c/go/+/229298 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently, when a debugger injects a call, that call happens on the goroutine where the debugger injected it. However, this requires significant runtime complexity that we're about to remove. To prepare for this, this CL switches to a different approach that leaves the interrupted goroutine parked and runs the debug call on a new goroutine. When the debug call returns, it resumes the original goroutine. This should be essentially transparent to debuggers. It follows the exact same call injection protocol and ensures the whole protocol executes indivisibly on a single OS thread. The only difference is that the current G and stack now change part way through the protocol. For golang#36365. Change-Id: I68463bfd73cbee06cfc49999606410a59dd8f653 Reviewed-on: https://go-review.googlesource.com/c/go/+/229299 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
A debugger can inject a call at almost any PC, which causes significant complications with stack scanning and growth. Currently, the runtime solves this using precise stack maps and register maps at nearly all PCs, but these extra maps require roughly 5% of the binary. These extra maps were originally considered worth this space because they were intended to be used for non-cooperative preemption, but are now used only for debug call injection. This CL switches from using precise maps to instead using conservative frame scanning, much like how non-cooperative preemption works. When a call is injected, the runtime flushes all potential pointer registers to the stack, and then treats that frame as well as the interrupted frame conservatively. The limitation of conservative frame scanning is that we cannot grow the goroutine stack. That's doable because the previous CL switched to performing debug calls on a new goroutine, where they are free to grow the stack. With this CL, there are no remaining uses of precise register maps (though we still use the unsafe-point information that's encoded in the register map PCDATA stream), and stack maps are only used at call sites. For golang#36365. Change-Id: Ie217b6711f3741ccc437552d8ff88f961a73cee0 Reviewed-on: https://go-review.googlesource.com/c/go/+/229300 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
The compiler currently conflates whether a Value has a stack map with whether it's an unsafe point. For the most part, unsafe-points don't have stack maps, so this is mostly fine, but call instructions can be both an unsafe-point *and* have a stack map. For example, none of the instructions in a nosplit function should be preemptible, but calls must still have stack maps in case the called function grows the stack or get preempted. Currently, the compiler can't distinguish this case, so calls in nosplit functions are marked as safe-points just because they have stack maps. This is particularly problematic if a nosplit function calls another nosplit function, since this can introduce a preemption point where there should be none. We realized this was a problem for split-stack prologues a while back, and CL 207349 changed the encoding of unsafe-points to use the register map index instead of the stack map index so we could record both a stack map and an unsafe-point at the same instruction. But this was never extended into the compiler. This CL fixes this problem in the compiler. We make LivenessIndex slightly more abstract by separating unsafe-point marks from stack and register map indexes. We map this to the PCDATA encoding later when producing Progs. This isn't enough to fix the whole problem for nosplit functions, because obj still adds prologues and marks those as preemptible, but it's a step in the right direction. I checked this CL by comparing maps before and after this change in the runtime and net/http. In net/http, unsafe-points match exactly; at anything that isn't an unsafe-point, both the stack and register maps are unchanged by this CL. In the runtime, at every point that was a safe-point before this change, the stack maps agree (and mostly the runtime doesn't have register maps at all now). In both, all CALLs (except write barrier calls) have stack maps. For golang#36365. Change-Id: I066628938b02e78be5c81a6614295bcf7cc566c2 Reviewed-on: https://go-review.googlesource.com/c/go/+/230541 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
These are necessarily deeply non-preemptible, so there's no point in emitting stack maps for them. We already mark them as unsafe points, so this only affects the runtime, since user code does not emit stack maps at unsafe points. SSAGenState.PrepareCall also excludes them when it's sanity checking call stack maps. Right now this only drops a handful of unnecessary stack maps from the runtime, but we're about to start emitting stack maps only at calls for user code, too. At that point, this will matter much more. For golang#36365. Change-Id: Ib3abfedfddc8e724d933a064fa4d573500627990 Reviewed-on: https://go-review.googlesource.com/c/go/+/230542 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
We're about to switch to having significantly fewer maps in the liveness map, so switch from a dense representation to a sparse representation. Passes toolstash-check. For golang#36365. Change-Id: Icb17bd6ace17667a280bc5fba4039cae3020a8d1 Reviewed-on: https://go-review.googlesource.com/c/go/+/230543 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Currently, we emit stack maps and register maps at almost every instruction. This was originally intended to support non-cooperative preemption, but was only ever used for debug call injection. Now debug call injection also uses conservative frame scanning. As a result, stack maps are only needed at call sites and register maps aren't needed at all except that we happen to also encode unsafe-point information in the register map PCDATA stream. This CL reduces stack maps to only appear at calls, and replace full register maps with just safe/unsafe-point information. This is all protected by the go115ReduceLiveness feature flag, which is defined in both runtime and cmd/compile. This CL significantly reduces binary sizes and also speeds up compiles and links: name old exe-bytes new exe-bytes delta BinGoSize 15.0MB ± 0% 14.1MB ± 0% -5.72% name old pcln-bytes new pcln-bytes delta BinGoSize 3.14MB ± 0% 2.48MB ± 0% -21.08% name old time/op new time/op delta Template 178ms ± 7% 172ms ±14% -3.59% (p=0.005 n=19+19) Unicode 71.0ms ±12% 69.8ms ±10% ~ (p=0.126 n=18+18) GoTypes 655ms ± 8% 615ms ± 8% -6.11% (p=0.000 n=19+19) Compiler 3.27s ± 6% 3.15s ± 7% -3.69% (p=0.001 n=20+20) SSA 7.10s ± 5% 6.85s ± 8% -3.53% (p=0.001 n=19+20) Flate 124ms ±15% 116ms ±22% -6.57% (p=0.024 n=18+19) GoParser 156ms ±26% 147ms ±34% ~ (p=0.070 n=19+19) Reflect 406ms ± 9% 387ms ±21% -4.69% (p=0.028 n=19+20) Tar 163ms ±15% 162ms ±27% ~ (p=0.370 n=19+19) XML 223ms ±13% 218ms ±14% ~ (p=0.157 n=20+20) LinkCompiler 503ms ±21% 484ms ±23% ~ (p=0.072 n=20+20) ExternalLinkCompiler 1.27s ± 7% 1.22s ± 8% -3.85% (p=0.005 n=20+19) LinkWithoutDebugCompiler 294ms ±17% 273ms ±11% -7.16% (p=0.001 n=19+18) (https://perf.golang.org/search?q=upload:20200428.8) The binary size improvement is even slightly better when you include the CLs leading up to this. Relative to the parent of "cmd/compile: mark PanicBounds/Extend as calls": name old exe-bytes new exe-bytes delta BinGoSize 15.0MB ± 0% 14.1MB ± 0% -6.18% name old pcln-bytes new pcln-bytes delta BinGoSize 3.22MB ± 0% 2.48MB ± 0% -22.92% (https://perf.golang.org/search?q=upload:20200428.9) For golang#36365. Change-Id: I69448e714f2a44430067ca97f6b78e08c0abed27 Reviewed-on: https://go-review.googlesource.com/c/go/+/230544 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Change https://golang.org/cl/265120 mentions this issue: |
Now that we have the G register saved, we can enable asynchronous preemption for pure Go programs on darwin/arm64. Updates #38485, #36365. Change-Id: Ic654fa4dce369efe289b38d59cf1a184b358fe9e Reviewed-on: https://go-review.googlesource.com/c/go/+/265120 Trust: Cherry Zhang <cherryyz@google.com> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
@cherrymui Anything to be done here for 1.17? Thanks. |
No, I don't think we did or plan to do anything on this for 1.17. |
@golang/runtime, does the checklist on this issue need to be updated? I see that https://go.dev/cl/366735 and https://go.dev/cl/366734 were merged in 2021 (for |
Go 1.14 introduces asynchronous preemption so that tight loops can be preempted (#10958). However, there are still several loose ends to tie up. This is a general bug to keep track of remaining work:
darwin/arm(no longer supported)The text was updated successfully, but these errors were encountered: