path: root/src/cmd/internal/obj/arm64/obj7.go
Age  Commit message  Author
2025-10-07  Revert "cmd/compile: redo arm64 LR/FP save and restore"  Keith Randall
This reverts commit 719dfcf8a8478d70360bf3c34c0e920be7b32994. Reason for revert: Causing crashes. Change-Id: I0b8526dd03d82fa074ce4f97f1789eeac702b3eb Reviewed-on: https://go-review.googlesource.com/c/go/+/709755 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-10-06  cmd/compile: redo arm64 LR/FP save and restore  Keith Randall
Instead of storing LR (the return address) at 0(SP) and the FP (parent's frame pointer) at -8(SP), store them at framesize-8(SP) and framesize-16(SP), respectively. We push and pop data onto the stack such that we're never accessing anything below SP.

The prolog/epilog lengths are unchanged (3 insns for a typical prolog, 2 for a typical epilog). We use 8 bytes more per frame.

Typical prologue:
    STP.W (FP, LR), -16(SP)
    MOVD SP, FP
    SUB $C, SP
Typical epilogue:
    ADD $C, SP
    LDP.P 16(SP), (FP, LR)
    RET

The previous word where we stored LR, at 0(SP), is now unused. We could repurpose that slot for storing a local variable.

The new prolog and epilog instructions are recognized by libunwind, so pc-sampling tools like perf should now be accurate. (TODO: except maybe after the first RET instruction? Have to look into that.)

Update #73753 (fixes, for arm64)
Update #57302 (Quim thinks this will help on that issue)
Change-Id: I4800036a9a9a08aaaf35d9f99de79a36cf37ebb8
Reviewed-on: https://go-review.googlesource.com/c/go/+/674615
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
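The slot arithmetic in the commit message above can be sketched in a few lines of Go. This is only an illustration, not code from the toolchain: the `slots` type and the three helper functions are hypothetical names, and the sketch assumes that `framesize` already includes the 16 bytes used for the FP/LR pair.

```go
package main

import "fmt"

// slots holds SP-relative byte offsets of the two saved registers.
// (Hypothetical type, for illustration only.)
type slots struct {
	LR, FP int64
}

// oldLayout models the previous convention: LR at 0(SP), FP at -8(SP),
// independent of the frame size.
func oldLayout(framesize int64) slots {
	return slots{LR: 0, FP: -8}
}

// newLayout models the convention described in the commit message:
// LR at framesize-8(SP), FP at framesize-16(SP).
func newLayout(framesize int64) slots {
	return slots{LR: framesize - 8, FP: framesize - 16}
}

// belowSP reports whether either slot lies below the stack pointer.
func belowSP(s slots) bool {
	return s.LR < 0 || s.FP < 0
}

func main() {
	for _, fs := range []int64{16, 48, 512} {
		o, n := oldLayout(fs), newLayout(fs)
		fmt.Printf("framesize=%d old=%+v belowSP=%v new=%+v belowSP=%v\n",
			fs, o, belowSP(o), n, belowSP(n))
	}
}
```

Under these assumptions the old layout always has one slot below SP, while the new layout never does for frames of 16 bytes or more.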
2025-08-15  runtime: remove duff support for arm64  Keith Randall
Change-Id: Ib290079a77a746a8512cd4638310b24164f6a930 Reviewed-on: https://go-review.googlesource.com/c/go/+/679456 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-24  cmd/internal/obj: rip out argp adjustment for wrapper frames  Keith Randall
The previous CL made this adjustment unnecessary. The argp field is no longer used by the runtime. Change-Id: I3491eeef4103c6653ec345d604c0acd290af9e8f Reviewed-on: https://go-review.googlesource.com/c/go/+/685356 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-07-21  cmd/compile: for arm64 epilog, do SP increment with a single instruction  Keith Randall
That way, the frame is atomically popped. Previously, for big frames the SP was unwound in two steps (because arm64 can only add constants up to 1<<12 in a single instruction). Fixes #73259 Change-Id: I382c249194ad7bc9fc19607c27487c58d90d49e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/689235 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com>
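The immediate-size limit mentioned above can be illustrated with a small Go sketch. It relies on the architectural rule that an arm64 ADD/SUB immediate is a 12-bit unsigned value, optionally shifted left by 12 bits. The function names are hypothetical, and the two-step split only models the old behaviour the message describes; it does not show how the CL itself pops the frame in one instruction.

```go
package main

import "fmt"

// fitsAddImm reports whether c can be encoded as the immediate of a
// single arm64 ADD/SUB: a 12-bit unsigned value, optionally shifted
// left by 12.
func fitsAddImm(c uint64) bool {
	return c < 1<<12 || (c&0xfff == 0 && c < 1<<24)
}

// splitAddImm splits c (assumed < 1<<24) into a high and a low part,
// each of which fits a single ADD immediate. Adding them in two steps
// is the kind of two-step SP adjustment the commit message refers to.
func splitAddImm(c uint64) (hi, lo uint64) {
	return c &^ 0xfff, c & 0xfff
}

func main() {
	for _, c := range []uint64{0x340, 0x1000, 0x12340} {
		hi, lo := splitAddImm(c)
		fmt.Printf("c=%#x single=%v hi=%#x lo=%#x\n", c, fitsAddImm(c), hi, lo)
	}
}
```

A frame size such as 0x12340 does not fit one immediate, so between the two ADDs the SP would point into a half-popped frame.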
2025-05-05  Revert "cmd/compile: allow all of the preamble to be preemptible"  Keith Randall
This reverts commits 3f3782feed6e0726ddb08afd32dad7d94fbb38c6 (CL 648518) b386b628521780c048af14a148f373c84e687b26 (CL 668475) Fixes #73542 Change-Id: I218851c5c0b62700281feb0b3f82b6b9b97b910d Reviewed-on: https://go-review.googlesource.com/c/go/+/670055 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-25  cmd/compile: allow all of the preamble to be preemptible  Keith Randall
We currently make some parts of the preamble unpreemptible because it confuses morestack. See comments in the code. Instead, have morestack handle those weird cases so we can remove unpreemptible marks from most places. This CL makes user functions preemptible everywhere if they have no write barriers (at least, on x86). In cmd/go the fraction of functions that need preemptible markings drops from 82% to 36%. Makes the cmd/go binary 0.3% smaller. Update #35470 Change-Id: Ic83d5eabfd0f6d239a92e65684bcce7e67ff30bb Reviewed-on: https://go-review.googlesource.com/c/go/+/648518 Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-10-31  cmd/internal/obj/arm64: make sure prologue and epilogue are pattern matched for small frames  Hao Liu
CL 379075 implemented function prologue/epilogue with STP/LDP. To fix issue #53374, CL 412474 reverted the prologue STP change for small frames, but the LDP in the epilogue was kept. The current instructions are:

prologue:
    MOVD.W R30, -offset(RSP)
    MOVD R29, -8(RSP)
epilogue:
    LDP -8(RSP), (R29, R30)
    ADD $offset, RSP, RSP

This seems a bit strange, because:
1) The prologue and epilogue do not follow the same pattern (either STR-LDR or STP-LDP).
2) The Go internal ABI defines that R30 is saved at 0(RSP) and R29 is saved at -8(RSP), so we cannot use a single STP.W/LDP.P to save/restore LR and FP and adjust SP. Changing the ABI causes too much complexity, and the benefit is not that big.

This patch reverts the small frames' epilogue change in CL 379075. It converts the LDP in the epilogue to LDR-LDR. Another solution is to re-apply the STP change in the prologue, which requires fixing #53609. Reverting seems the easier and safer solution in the meantime. The new instructions are:

prologue:
    MOVD.W R30, -offset(RSP)
    MOVD R29, -8(RSP)
epilogue:
    MOVD -8(RSP), R29
    MOVD.P offset(RSP), R30

The current pattern may cause performance issues with store forwarding on micro-architectures like AmpereOne. If a function call in the middle of such code is short enough that the stores are still in the store buffer when the LDP executes, the LDP may wait longer to get its result from two separate stores than it would from a single STP.

Store forwarding aims to improve the efficiency of the processor by allowing data to be forwarded directly from a store operation to a subsequent load operation when certain conditions are met. See the paper "Memory Barriers: a Hardware View for Software Hackers" (chapter 3.2: Store Forwarding).

The following ARM64 Linux servers were tested:
1) AmpereOne (ARM v8.6+) from Ampere Computing.
2) Ampere Altra (ARM Neoverse N1) from Ampere Computing.
3) Graviton2 (ARM Neoverse N1) from AWS.
The effect of this change depends on the hardware implementation of store forwarding. It clearly improves AmpereOne, especially for small functions that are called frequently and return quickly. E.g., JSON Marshal/Unmarshal benchmarks on AmpereOne:

goos: linux
goarch: arm64
pkg: encoding/json
                         │ ampere-one.base │ ampere-one.new                     │
                         │ sec/op          │ sec/op          vs base            │
CodeMarshal-8              882.1µ ± 1%       779.6µ ± 1%     -11.62% (p=0.000 n=10)
CodeMarshalError-8         961.5µ ± 0%       855.7µ ± 1%     -11.01% (p=0.000 n=10)
MarshalBytes/32-8          207.6n ± 1%       187.8n ± 0%      -9.52% (p=0.000 n=10)
MarshalBytes/256-8         501.0n ± 1%       482.6n ± 1%      -3.68% (p=0.000 n=10)
MarshalBytes/4096-8        5.336µ ± 1%       5.074µ ± 1%      -4.92% (p=0.000 n=10)
MarshalBytesError/32-8     242.3µ ± 2%       205.7µ ± 3%     -15.08% (p=0.000 n=10)
MarshalBytesError/256-8    242.4µ ± 1%       205.2µ ± 2%     -15.35% (p=0.000 n=10)
MarshalBytesError/4096-8   247.9µ ± 0%       210.1µ ± 1%     -15.24% (p=0.000 n=10)
MarshalMap-8               150.8n ± 1%       145.7n ± 0%      -3.35% (p=0.000 n=10)
EncodeMarshaler-8          50.30n ± 26%      54.48n ± 6%           ~ (p=0.739 n=10)
CodeUnmarshal-8            4.796m ± 2%       4.055m ± 1%     -15.45% (p=0.000 n=10)
CodeUnmarshalReuse-8       4.260m ± 1%       3.496m ± 1%     -17.94% (p=0.000 n=10)
UnmarshalString-8          73.89n ± 1%       65.83n ± 1%     -10.91% (p=0.000 n=10)
UnmarshalFloat64-8         60.63n ± 1%       58.66n ± 25%          ~ (p=0.143 n=10)
UnmarshalInt64-8           55.62n ± 1%       53.25n ± 22%          ~ (p=0.468 n=10)
UnmarshalMap-8             255.3n ± 1%       230.3n ± 1%      -9.77% (p=0.000 n=10)
UnmarshalNumber-8          467.2n ± 1%       367.0n ± 0%     -21.43% (p=0.000 n=10)
geomean                    6.224µ            5.605µ           -9.94%

Other ARM64 micro-architectures may not be affected as much by this issue. E.g., benchmarks on Ampere Altra and Graviton2 show slight improvements:

                         │ altra.base      │ altra.new                          │
                         │ sec/op          │ sec/op          vs base            │
CodeMarshal-8              980.1µ ± 1%       977.3µ ± 1%           ~ (p=0.912 n=10)
CodeMarshalError-8         1.109m ± 3%       1.096m ± 5%           ~ (p=0.971 n=10)
MarshalBytes/32-8          246.8n ± 1%       245.4n ± 0%      -0.55% (p=0.002 n=10)
MarshalBytes/256-8         590.9n ± 1%       606.6n ± 1%      +2.67% (p=0.000 n=10)
MarshalBytes/4096-8        6.351µ ± 1%       6.376µ ± 1%           ~ (p=0.183 n=10)
MarshalBytesError/32-8     245.3µ ± 2%       246.1µ ± 2%           ~ (p=0.684 n=10)
MarshalBytesError/256-8    245.5µ ± 1%       248.7µ ± 2%           ~ (p=0.218 n=10)
MarshalBytesError/4096-8   254.2µ ± 1%       254.9µ ± 1%           ~ (p=0.481 n=10)
MarshalMap-8               152.7n ± 2%       151.5n ± 3%           ~ (p=0.782 n=10)
EncodeMarshaler-8          45.95n ± 7%       42.88n ± 5%      -6.70% (p=0.014 n=10)
CodeUnmarshal-8            5.121m ± 4%       5.125m ± 3%           ~ (p=0.579 n=10)
CodeUnmarshalReuse-8       4.616m ± 3%       4.634m ± 2%           ~ (p=0.529 n=10)
UnmarshalString-8          72.12n ± 2%       72.20n ± 2%           ~ (p=0.912 n=10)
UnmarshalFloat64-8         64.44n ± 5%       63.20n ± 4%           ~ (p=0.393 n=10)
UnmarshalInt64-8           61.49n ± 2%       58.14n ± 4%      -5.45% (p=0.002 n=10)
UnmarshalMap-8             263.6n ± 2%       266.2n ± 1%           ~ (p=0.196 n=10)
UnmarshalNumber-8          464.7n ± 1%       464.0n ± 0%           ~ (p=0.566 n=10)
geomean                    6.617µ            6.575µ           -0.64%

                         │ graviton2.base  │ graviton2.new                      │
                         │ sec/op          │ sec/op          vs base            │
CodeMarshal-8              1.122m ± 0%       1.118m ± 1%           ~ (p=0.052 n=10)
CodeMarshalError-8         1.216m ± 1%       1.214m ± 0%           ~ (p=0.631 n=10)
MarshalBytes/32-8          289.9n ± 0%       280.8n ± 0%      -3.17% (p=0.000 n=10)
MarshalBytes/256-8         675.9n ± 0%       664.7n ± 0%      -1.66% (p=0.000 n=10)
MarshalBytes/4096-8        6.884µ ± 0%       6.885µ ± 0%           ~ (p=0.565 n=10)
MarshalBytesError/32-8     293.1µ ± 2%       288.9µ ± 2%           ~ (p=0.123 n=10)
MarshalBytesError/256-8    296.0µ ± 3%       289.0µ ± 1%      -2.36% (p=0.019 n=10)
MarshalBytesError/4096-8   300.4µ ± 1%       295.6µ ± 0%      -1.60% (p=0.000 n=10)
MarshalMap-8               168.8n ± 1%       168.8n ± 1%           ~ (p=1.000 n=10)
EncodeMarshaler-8          53.77n ± 8%       50.05n ± 12%          ~ (p=0.579 n=10)
CodeUnmarshal-8            5.875m ± 2%       5.882m ± 1%           ~ (p=0.796 n=10)
CodeUnmarshalReuse-8       5.383m ± 1%       5.366m ± 0%           ~ (p=0.631 n=10)
UnmarshalString-8          74.59n ± 1%       73.99n ± 0%      -0.80% (p=0.001 n=10)
UnmarshalFloat64-8         68.52n ± 7%       64.19n ± 18%          ~ (p=0.868 n=10)
UnmarshalInt64-8           65.32n ± 13%      62.24n ± 8%           ~ (p=0.138 n=10)
UnmarshalMap-8             290.1n ± 0%       291.3n ± 0%      +0.43% (p=0.010 n=10)
UnmarshalNumber-8          514.4n ± 0%       499.4n ± 0%      -2.93% (p=0.000 n=10)
geomean                    7.459µ            7.317µ           -1.91%

Change-Id: If27386fc5f514b76bdaf2012c2ce86cc65f7ca5b
Reviewed-on: https://go-review.googlesource.com/c/go/+/621775
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-06-25  cmd/internal/obj/arm64: fix return with register  Joel Sing
ARM64 allows for a register to be specified with a return instruction. While the assembler parsing and encoding currently supports this, the preprocess function uses LR unconditionally. Correct this such that if a register is specified, the register is used. Change-Id: I708f6c7e910d141559b60d2d5ee76ae2e1dc3a0e Reviewed-on: https://go-review.googlesource.com/c/go/+/592796 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-08  cmd/internal/obj/arm64: fix frame pointer restore in epilogue  Keith Randall
For leaf but nonzero-frame functions. Currently we're not restoring it properly. We also need to restore it before popping the stack frame, so that the frame won't get clobbered by a signal handler in the meantime. Fixes #63830 Needs a test, but I'm not at all sure how we would actually do that. Leaving for inspiration. Change-Id: I273a25f2a838f05a959c810145cccc5428eaf164 Reviewed-on: https://go-review.googlesource.com/c/go/+/538635 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Eric Fang <eric.fang@arm.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2023-10-18  cmd/internal/obj/arm64: replace the migrated url address  cui fliter
Change-Id: I36a0f0989d37bef45ea8778da799b56a7e9a0c30 Reviewed-on: https://go-review.googlesource.com/c/go/+/529515 Run-TryBot: shuang cui <imcusg@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
2023-09-08  cmd/internal/obj: mark unspill code in prologue preemptible  zhouguangyuan
The UnspillReg code should always be preemptible because all the arg registers will be saved by runtime.asyncpreempt. Change-Id: Ie36b5d0cdd1275efcb95661354d83be2e1b00a86 Reviewed-on: https://go-review.googlesource.com/c/go/+/526235 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-08-25  cmd/internal/obj/arm64: load large constants into vector registers from rodata  Joel Sing
Load large constants into vector registers from rodata, instead of placing them in the literal pool. This treats VMOVQ/VMOVD/VMOVS the same as FMOVD/FMOVS and makes use of the existing mechanism for storing values in rodata. Two additional instructions are required for a load, however these instructions are used infrequently and already have a high latency. Updates #59615 Change-Id: I54226730267689963d73321e548733ae2d66740e Reviewed-on: https://go-review.googlesource.com/c/go/+/515617 Reviewed-by: Eric Fang <eric.fang@arm.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-04-21  internal/abi, runtime, cmd: merge StackSmall, StackBig consts into internal/abi  Austin Clements
For #59670. Change-Id: I91448363be2fc678964ce119d85cd5fae34a14da Reviewed-on: https://go-review.googlesource.com/c/go/+/486975 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Auto-Submit: Austin Clements <austin@google.com>
2023-04-21  internal/abi, runtime, cmd: merge funcFlag_* consts into internal/abi  Austin Clements
For #59670. Change-Id: Ie784ba4dd2701e4f455e1abde4a6bfebee4b1387 Reviewed-on: https://go-review.googlesource.com/c/go/+/485496 Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Austin Clements <austin@google.com> Auto-Submit: Austin Clements <austin@google.com>
2023-04-20  Revert "internal/abi, runtime, cmd: merge StackSmall, StackBig consts into internal/abi"  Austin Clements
This reverts commit CL 486379. Submitted out of order and breaks bootstrap. Change-Id: Ie20a61cc56efc79a365841293ca4e7352b02d86b Reviewed-on: https://go-review.googlesource.com/c/go/+/486917 TryBot-Bypass: Austin Clements <austin@google.com> Reviewed-by: David Chase <drchase@google.com>
2023-04-20  internal/abi, runtime, cmd: merge StackSmall, StackBig consts into internal/abi  Austin Clements
For #59670. Change-Id: I04a17079b351b9b4999ca252825373c17afb8a88 Reviewed-on: https://go-review.googlesource.com/c/go/+/486379 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-10-05  cmd/compile: add late lower pass for last rules to run  eric fang
Optimization rules usually have priorities: some need to run first, some next, and some last, in order to produce the best code. Currently our optimization rules have no priority. This CL adds a late lower pass that runs the rules that need to run last, such as splitting unreasonable constant folding. This pass can be seen as a second round of the lower pass.

For example:

    func foo(a, b uint64) uint64 {
        d := a+0x1234568
        d1 := b+0x1234568
        return d&d1
    }

The code generated by the master branch:

    0x0004 00004 ADD $19088744, R0, R2 // movz+movk+add
    0x0010 00016 ADD $19088744, R1, R1 // movz+movk+add
    0x001c 00028 AND R1, R2, R0

This is because the current constant folding optimization rules do not take the range of constants into account, causing the constant to be loaded repeatedly. This CL splits these unreasonable constant foldings in the late lower pass. With this CL the generated code is:

    0x0004 00004 MOVD $19088744, R2 // movz+movk
    0x000c 00012 ADD R0, R2, R3
    0x0010 00016 ADD R1, R2, R1
    0x0014 00020 AND R1, R3, R0

This CL also adds constant folding optimization for the ADDS instruction. In addition, in order not to introduce a codegen regression, an optimization rule is added to change the addition of a negative number into the subtraction of a positive number.

go1 benchmarks:

name                     old time/op    new time/op    delta
BinaryTree17-8           1.22s ± 1%     1.24s ± 0%     +1.56%  (p=0.008 n=5+5)
Fannkuch11-8             1.54s ± 0%     1.53s ± 0%     -0.69%  (p=0.016 n=4+5)
FmtFprintfEmpty-8        14.1ns ± 0%    14.1ns ± 0%      ~     (p=0.079 n=4+5)
FmtFprintfString-8       26.0ns ± 0%    26.1ns ± 0%    +0.23%  (p=0.008 n=5+5)
FmtFprintfInt-8          32.3ns ± 0%    32.9ns ± 1%    +1.72%  (p=0.008 n=5+5)
FmtFprintfIntInt-8       54.5ns ± 0%    55.5ns ± 0%    +1.83%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8  61.5ns ± 0%    62.0ns ± 0%    +0.93%  (p=0.008 n=5+5)
FmtFprintfFloat-8        72.0ns ± 0%    73.6ns ± 0%    +2.24%  (p=0.008 n=5+5)
FmtManyArgs-8            221ns ± 0%     224ns ± 0%     +1.22%  (p=0.008 n=5+5)
GobDecode-8              1.91ms ± 0%    1.93ms ± 0%    +0.98%  (p=0.008 n=5+5)
GobEncode-8              1.40ms ± 1%    1.39ms ± 0%    -0.79%  (p=0.032 n=5+5)
Gzip-8                   115ms ± 0%     117ms ± 1%     +1.17%  (p=0.008 n=5+5)
Gunzip-8                 19.4ms ± 1%    19.3ms ± 0%    -0.71%  (p=0.016 n=5+4)
HTTPClientServer-8       27.0µs ± 0%    27.3µs ± 0%    +0.80%  (p=0.008 n=5+5)
JSONEncode-8             3.36ms ± 1%    3.33ms ± 0%      ~     (p=0.056 n=5+5)
JSONDecode-8             17.5ms ± 2%    17.8ms ± 0%    +1.71%  (p=0.016 n=5+4)
Mandelbrot200-8          2.29ms ± 0%    2.29ms ± 0%      ~     (p=0.151 n=5+5)
GoParse-8                1.35ms ± 1%    1.36ms ± 1%      ~     (p=0.056 n=5+5)
RegexpMatchEasy0_32-8    24.5ns ± 0%    24.5ns ± 0%      ~     (p=0.444 n=4+5)
RegexpMatchEasy0_1K-8    131ns ±11%     118ns ± 6%       ~     (p=0.056 n=5+5)
RegexpMatchEasy1_32-8    22.9ns ± 0%    22.9ns ± 0%      ~     (p=0.905 n=4+5)
RegexpMatchEasy1_1K-8    126ns ± 0%     127ns ± 0%       ~     (p=0.063 n=4+5)
RegexpMatchMedium_32-8   486ns ± 5%     483ns ± 0%       ~     (p=0.381 n=5+4)
RegexpMatchMedium_1K-8   15.4µs ± 1%    15.5µs ± 0%      ~     (p=0.151 n=5+5)
RegexpMatchHard_32-8     687ns ± 0%     686ns ± 0%       ~     (p=0.103 n=5+5)
RegexpMatchHard_1K-8     20.7µs ± 0%    20.7µs ± 1%      ~     (p=0.151 n=5+5)
Revcomp-8                175ms ± 2%     176ms ± 3%       ~     (p=1.000 n=5+5)
Template-8               20.4ms ± 6%    20.1ms ± 2%      ~     (p=0.151 n=5+5)
TimeParse-8              112ns ± 0%     113ns ± 0%     +0.97%  (p=0.016 n=5+4)
TimeFormat-8             156ns ± 0%     145ns ± 0%     -7.14%  (p=0.029 n=4+4)

Change-Id: I3ced26e89041f873ac989586514ccc5ee09f13da
Reviewed-on: https://go-review.googlesource.com/c/go/+/425134
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
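To make the "movz+movk" annotation in the example above concrete, here is a runnable version of `foo` together with a rough cost model. The `nonzeroHalfwords` helper is a hypothetical simplification: it counts the non-zero 16-bit chunks of a constant, which corresponds to one MOVZ plus one MOVK per additional chunk, and it ignores other encodings an assembler may pick (such as MOVN or bitmask immediates).

```go
package main

import "fmt"

// foo is the example function from the commit message.
func foo(a, b uint64) uint64 {
	d := a + 0x1234568
	d1 := b + 0x1234568
	return d & d1
}

// nonzeroHalfwords counts the non-zero 16-bit chunks of c. In this
// simplified model, that is the number of MOVZ/MOVK instructions
// needed to materialize c in a register.
func nonzeroHalfwords(c uint64) int {
	n := 0
	for i := uint(0); i < 4; i++ {
		if (c>>(16*i))&0xffff != 0 {
			n++
		}
	}
	return n
}

func main() {
	const c = 0x1234568
	fmt.Println(c)                   // 19088744, the decimal form in the listing
	fmt.Println(nonzeroHalfwords(c)) // 2, i.e. movz+movk
	fmt.Println(foo(1, 2) == c)      // true
}
```

Because the constant needs two instructions to materialize, folding it into each ADD separately repeats that work, which is what the late lower pass avoids by loading it once.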
2022-09-20  cmd/compile: Add some CMP and CMN optimization rules on arm64  eric fang
This CL adds some optimization rules: 1. Convert CMP to CMN, or vice versa, when comparing with a negative number. 2. For equal and not-equal comparisons, CMP can be converted to CMN in some cases. In theory we could do the same optimization for LT, LE, GT and GE, but that requires accounting for overflow, so this CL doesn't handle them. There are no noticeable performance changes. Change-Id: Ia49266c019ab7908ebc9510c2f02e121b1607869 Reviewed-on: https://go-review.googlesource.com/c/go/+/429795 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Eric Fang <eric.fang@arm.com>
2022-08-31  cmd/internal/obj/arm64: allow transition from $0 to ZR for MSR  eric fang
Previously the first operand of MSR could be $0, which would be converted to the ZR register. CL 404316 prohibited this; this CL restores that instruction format. Change-Id: I5b5be59e76aa58423a0fb96942d1b2a9de62e311 Reviewed-on: https://go-review.googlesource.com/c/go/+/426198 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com>
2022-08-29  cmd/compile/obj/arm64: fix encoding error of FMOVD/FMOVS $0|ZR  eric fang
Previously the first operand of FMOVD and FMOVS could be $0, which would be converted to the ZR register. CL 404316 prohibited this and also broke the encoding of "FMOVD/FMOVS ZR, Rn". This CL restores the instruction format and fixes the encoding issue. Fixes #54655. Fixes #54729. Change-Id: I9c42cd41296bed7ffd601609bd8ecaa27d11e659 Reviewed-on: https://go-review.googlesource.com/c/go/+/425188 Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-23  cmd/internal/obj/arm64: remove the transition from $0 to ZR  eric fang
Previously we converted $0 to the ZR register, which caused two problems: 1. Confusion: the special case of the ZR register has to be considered when dealing with constants. For encoding, in some places we encode ZR and in others we encode $0, even though $0 has already been converted to ZR. 2. Unexpected instruction formats: in every instruction that supports a ZR register operand, ZR can be replaced by $0. This patch removes this conversion. Note that this patch may cause instruction formats that were previously supported unintentionally to no longer be supported. Change-Id: I3d8d2c06711b7614a38191397da7776417f1861c Reviewed-on: https://go-review.googlesource.com/c/go/+/404316 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-06-29  cmd/internal/obj/arm64: save LR and SP in one instruction for small frames  Cherry Mui
We create a thread with signals blocked, but glibc's pthread_sigmask doesn't really allow us to block SIGSETXID. So we may get a signal early on, before the signal stack is set. If we get a signal on the current stack, it will clobber anything below the SP.

This CL saves LR and decrements SP in a single MOVD.W instruction for small frames, so we don't write below the SP. We used to use a single MOVD.W instruction before CL 379075. CL 379075 changed this to use an STP instruction to save the LR and FP, then decrement the SP. This CL changes it back, just this part (epilogues and large frame prologues are unchanged). For small frames, it is the same number of instructions either way.

This decreases the size of a "small" frame from 0x1f0 to 0xf0. Frame sizes in between could benefit from using an STP instruction instead of the prologue for the "large" frame case. We don't bother with that for now, as this is a stop-gap solution anyway.

This only addresses the issue with small frames. Luckily, all functions from thread entry to setting up the signal stack have small frames.

Other possible ideas:
- Expand the unwind info metadata to separate the SP delta from the location of the return address, so we can express "SP is decremented but the return address is in the LR register". Then we can always create the frame first and then write the LR, without writing anything below the SP (except the frame pointer at SP-8, which is minor because it doesn't really affect program execution).
- Set up the signal stack immediately in mstart in assembly.

For Go 1.19 we do this simple fix. We plan to do the metadata fix in Go 1.20 (#53609). Other LR architectures are addressed in CL 413428.

Fix #53374.
Change-Id: I9d6582ab14ccb06ac61ad43852943d9555e22ae5 Reviewed-on: https://go-review.googlesource.com/c/go/+/412474 Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Eric Fang <eric.fang@arm.com>
2022-06-28  cmd/internal/obj/arm64: fix BITCON constant printing error  eric fang
For some 32-bit instructions whose first operand is a constant, we copy the lower 32 bits of the constant into the upper 32 bits in progedit, which leads to the wrong value being printed in -S output. The purpose of this is that we don't need to distinguish between 32-bit and 64-bit constants when checking C_BITCON, this CL puts the modified value in a temporary variable, so that the constant operand of the instruction will not be modified. Fixes #53551 Change-Id: I40ee9223b4187bff1c0a1bab7eb508fcb30325f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/414374 Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com>
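The fix described above amounts to computing the replicated value into a temporary instead of overwriting the operand. A minimal sketch, with a hypothetical helper name and no claim to mirror the assembler's actual code:

```go
package main

import "fmt"

// replicateLow32 copies the lower 32 bits of c into the upper 32 bits
// and returns the result, leaving the caller's value untouched.
func replicateLow32(c uint64) uint64 {
	lo := c & 0xffffffff
	return lo | lo<<32
}

func main() {
	operand := uint64(0xff)         // the constant as written in the source
	tmp := replicateLow32(operand)  // used only for the constant-class check
	fmt.Printf("operand=%#x tmp=%#x\n", operand, tmp)
}
```

Since only `tmp` carries the replicated bits, anything that later prints the operand (such as -S output) still sees the original constant.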
2022-03-15  cmd/internal/obj/arm64: refactor the handling of shifted RSP  eric fang
Some arithmetic operation instructions such as ADD and SUB support two formats of left shift (<<) operation, namely shifted register format and extended register format. And the encoding, supported registers and shifted amount are both different. The assembly parser doesn't distinguish them and parses them into TYPE_SHIFT type, because the parser can't tell them apart and in most cases extended left-shift can be replaced by shifted left-shift. The only exception is when the second source register or the destination register is RSP. This CL converts this case into the extended format in the preprocess stage, which helps to simplify some of the logic of the new assembler implementation and also makes this situation look more reasonable. Change-Id: I2cd7d2d663b38a7ba77a9fef1092708b8cb9bc3d Reviewed-on: https://go-review.googlesource.com/c/go/+/311709 Trust: Eric Fang <eric.fang@arm.com> Run-TryBot: Eric Fang <eric.fang@arm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-03-09  cmd/internal/obj/arm64: optimize function prologue/epilogue with STP/LDP  eric fang
In function prologue and epilogue, we save and restore FP and LR registers, and adjust RSP. The current instruction sequence is as follows.

For frame size <= 240B,
prologue:
    MOVD.W R30, -offset(RSP)
    MOVD R29, -8(RSP)
epilogue:
    MOVD -8(RSP), R29
    MOVD.P offset(RSP), R30

For frame size > 240B,
prologue:
    SUB $offset, RSP, R27
    MOVD R30, (R27)
    MOVD R27, RSP
    MOVD R29, -8(RSP)
epilogue:
    MOVD -8(RSP), R29
    MOVD (RSP), R30
    ADD $offset, RSP

Each sequence uses two load or store instructions, but we can actually load or store two registers with one LDP or STP instruction. This CL changes the sequences as follows.

For frame size <= 496B,
prologue:
    STP (R29, R30), -(offset+8)(RSP)
    SUB $offset, RSP, RSP
epilogue:
    LDP -8(RSP), (R29, R30)
    ADD $offset, RSP, RSP

For frame size > 496B,
prologue:
    SUB $offset, RSP, R20
    STP (R29, R30), -8(R20)
    MOVD R20, RSP
epilogue:
    LDP -8(RSP), (R29, R30)
    ADD $offset, RSP, RSP

Change-Id: Ia58af85fc81cce9b7c393dc38df43bffb203baad
Reviewed-on: https://go-review.googlesource.com/c/go/+/379075
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Eric Fang <eric.fang@arm.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
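The 496-byte threshold above appears consistent with the encoding range of the 64-bit STP/LDP offset, which is a signed multiple of 8 between -512 and 504. The sketch below checks that arithmetic. The function names are hypothetical, and the assumption that frame sizes are multiples of 16 is mine, not something the commit message states.

```go
package main

import "fmt"

// stpOffsetOK reports whether off is encodable as the immediate of a
// 64-bit STP/LDP: a multiple of 8 in the range [-512, 504].
func stpOffsetOK(off int64) bool {
	return off%8 == 0 && off >= -512 && off <= 504
}

// maxSTPFrame returns the largest 16-byte-aligned frame size for which
// the prologue's "STP (R29, R30), -(offset+8)(RSP)" is still encodable.
// The 16-byte alignment is an assumption of this sketch.
func maxSTPFrame() int64 {
	var best int64
	for offset := int64(16); offset <= 1024; offset += 16 {
		if stpOffsetOK(-(offset + 8)) {
			best = offset
		}
	}
	return best
}

func main() {
	fmt.Println(maxSTPFrame()) // 496
}
```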
2022-03-08  cmd/internal/obj/arm64: optimize stacksplit prologue for small stack  eric fang
When framesize <= objabi.StackSmall (128B), the stacksplit prologue is:

    MOVD 16(g), R16
    MOVD SP, R17
    CMP R16, R17
    BLS morestack_label

The second instruction is not necessary. We can compare R16 with SP directly, so the sequence becomes:

    MOVD 16(g), R16
    CMP R16, SP
    BLS morestack_label

This CL removes this instruction.

Change-Id: I0567ac52e9be124880957271951e1186da203612
Reviewed-on: https://go-review.googlesource.com/c/go/+/379076
Trust: Eric Fang <eric.fang@arm.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
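In higher-level terms, the three-instruction sequence above performs one unsigned comparison. The following Go sketch models it; `needsMoreStack` is a hypothetical name, and the sketch assumes that the value loaded from 16(g) is the goroutine's stack guard.

```go
package main

import "fmt"

// needsMoreStack models "CMP R16, SP; BLS morestack_label": BLS is an
// unsigned lower-or-same branch, so morestack is taken when SP is at
// or below the guard value (assumed here to come from 16(g)).
func needsMoreStack(sp, stackguard uintptr) bool {
	return sp <= stackguard
}

func main() {
	const guard = uintptr(0x1000)
	fmt.Println(needsMoreStack(0x2000, guard)) // false: enough room
	fmt.Println(needsMoreStack(0x1000, guard)) // true: at the guard
	fmt.Println(needsMoreStack(0x0800, guard)) // true: below the guard
}
```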
2021-11-05  cmd/{asm,compile,internal/obj}: add "maymorestack" support  Austin Clements
This adds a debugging hook for optionally calling a "maymorestack" function in the prologue of any function that might call morestack (whether it does at run time or not). The maymorestack function will let us improve lock checking and add debugging modes that stress function preemption and stack growth. Passes toolstash-check -all (except on js/wasm, where toolstash appears to be broken) Fixes #48297. Change-Id: I27197947482b329af75dafb9971fc0d3a52eaf31 Reviewed-on: https://go-review.googlesource.com/c/go/+/359795 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-07  cmd/compile: add prefetch intrinsic support  Ruslan Andreev
This CL provides new intrinsics to emit prefetch instructions for AMD64 and ARM64 platforms:
- Prefetch: prefetches data from a memory address to cache.
- PrefetchStreamed: prefetches data from a memory address, with a hint that this data is being streamed.

This patch also adds the prefetch calls pointed out by RSC inside scanobject and greyobject of the GC mark logic.

Performance results provided by Michael: https://perf.golang.org/search?q=upload:20210901.9

Benchmark parameters:
    tree2 -heapsize=1000000000 -cpus=8
    tree -n=18
    parser
    peano

Benchmarks AMD64 (Xeon - Cascade Lake):
name        old time/op   new time/op   delta
Tree2-8     36.1ms ± 6%   33.4ms ± 5%   -7.65%  (p=0.000 n=9+9)
Tree-8      326ms ± 1%    324ms ± 1%    -0.44%  (p=0.006 n=9+10)
Parser-8    2.75s ± 1%    2.71s ± 1%    -1.47%  (p=0.008 n=5+5)
Peano-8     63.1ms ± 1%   63.0ms ± 1%     ~     (p=0.730 n=9+9)
[Geo mean]  213ms         207ms         -2.45%

Benchmarks ARM64 (Kunpeng 920):
name        old time/op   new time/op   delta
Tree2-8     50.3ms ± 8%   44.1ms ± 5%   -12.24% (p=0.000 n=10+9)
Tree-8      494ms ± 1%    493ms ± 1%      ~     (p=0.684 n=10+10)
Parser-8    3.99s ± 1%    3.93s ± 1%    -1.37%  (p=0.016 n=5+5)
Peano-8     84.4ms ± 0%   84.1ms ± 1%     ~     (p=0.068 n=8+10)
[Geo mean]  302ms         291ms         -3.67%

Change-Id: I43e10bc2f9512dc49d7631dd8843a79036fa43d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/328289
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-08-18  cmd/internal/obj/arm64: fix the encoding error when operating with ZR  eric fang
Some arm64 instructions, such as MOVD, AND and ADD, accept ZR as their destination register. Although this doesn't seem to make much sense, we should make sure the encoding is correct. However, there are some encoding mistakes in the current assembler:
1. 'MOVD $1, ZR' is incorrectly encoded as 'MOVD $1, ZR' + '0x00000000'.
2. 'AND $1, R2, ZR' is incorrectly encoded as 'MOVD $1, R27' + 'AND R27, R2, ZR' + '0x00000000'.
3. 'AND $1, ZR' is incorrectly encoded as 'AND $1, ZR, RSP'.

The first two encoding errors can cause SIGILL, and the third one will overwrite RSP. At the same time, I found some odd encodings that don't cause errors:
4. 'MOVD $0x0001000100010001, ZR' is encoded as 'MOVW $1, ZR' + 'MOVKW $(1<<16), ZR'.
5. 'AND $0x0001000100010001, R2, ZR' is encoded as 'MOVD $1, R27' + 'MOVK $(1<<16), R27' + 'MOVK $(1<<32), R27'.

Some of these issues also apply to the 32-bit versions of these instructions.

These problems are not very complicated. They are basically caused by improperly matching the class of the constant to the entry in the optab. The relationship between these constant classes is a bit complicated, though, and since issues 4 and 5 don't cause errors, this CL does not deal with them. This CL fixes the first three issues.

Issue 1:
    before: 'MOVD $1, ZR' => 'MOVD $1, ZR' + '0x00000000'.
    after:  'MOVD $1, ZR' => 'MOVD $1, ZR'.
Issue 2:
    before: 'AND $1, R2, ZR' => 'MOVD $1, R27' + 'AND R27, R2, ZR' + '0x00000000'.
    after:  'AND $1, R2, ZR' => 'ORR $1, ZR, R27' + 'AND R27, R2, ZR'.
Issue 3:
    before: 'AND $1, ZR' => 'AND $1, ZR, RSP'.
    after:  'AND $1, ZR' => 'ORR $1, ZR, R27' + 'AND R27, ZR, ZR'.
Change-Id: I3c889079229f847b863ad56c88966be12d947202 Reviewed-on: https://go-review.googlesource.com/c/go/+/329750 Reviewed-by: eric fang <eric.fang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Trust: eric fang <eric.fang@arm.com> Run-TryBot: eric fang <eric.fang@arm.com> TryBot-Result: Go Bot <gobot@golang.org>
2021-06-02[dev.typeparams] runtime: mark assembly functions called directly from compiler ABIInternalCherry Mui
For functions such as gcWriteBarrier and panicIndexXXX, the compiler generates ABIInternal calls directly. They must not use wrappers, because they follow a special calling convention or the caller's PC is used. Mark them as ABIInternal. Note that even though they are marked as ABIInternal, they don't actually use the internal ABI, i.e. regabiargs is not honored for now. Now all.bash passes with GOEXPERIMENT=regabiwrappers (at least on macOS). Change-Id: I87e41964e6dc4efae03e8eb636ae9fa1d99285bb Reviewed-on: https://go-review.googlesource.com/c/go/+/323934 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-05-26[dev.typeparams] cmd/compile: add morestack arg spilling code on ARM64Cherry Mui
Spill arg registers before calling morestack, and reload after. Change-Id: I09404def321b8f935d5e8836a46ccae8256d0d55 Reviewed-on: https://go-review.googlesource.com/c/go/+/322853 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2021-05-26[dev.typeparams] cmd/internal/obj/arm64: use ABI-compatible registers in function prologueCherry Mui
Avoid using R1, R2, etc. in function prologue, which may carry live argument values. Change-Id: I80322b3f7e8fda7aaff622aaa99bc76d02e09727 Reviewed-on: https://go-review.googlesource.com/c/go/+/322852 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: David Chase <drchase@google.com>
2021-04-16internal/buildcfg: move build configuration out of cmd/internal/objabiRuss Cox
The go/build package needs access to this configuration, so move it into a new package available to the standard library. Change-Id: I868a94148b52350c76116451f4ad9191246adcff Reviewed-on: https://go-review.googlesource.com/c/go/+/310731 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Jay Conrod <jayconrod@google.com>
2021-04-05cmd/internal/obj/arm64: simplify huge frame prologueAustin Clements
CL 307010 for arm64. Change-Id: I6c6e1bd6065df059e50c3632a9eb669b64fce899 Reviewed-on: https://go-review.googlesource.com/c/go/+/307050 Trust: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-22cmd/internal/obj/arm64: mark functions with small stacks NOSPLITeric fang
This change omits the stack check on arm64 when the size of a stack frame is less than obj.StackSmall. The effect is not very significant, because CL 92040 already marks leaf functions with a frame size of 0 as NOFRAME, which makes the code prologue on arm64 much closer to other architectures. But it is not without effect: for example, it helps standard library functions such as runtime.usleep, fmt.isSpace, etc. Since this CL is very simple, I think this optimization is worthwhile.
compilecmp results on linux/arm64:
name                      old time/op   new time/op   delta
Template                   284ms ± 1%    283ms ± 1%   -0.29%  (p=0.000 n=50+50)
Unicode                    125ms ± 2%    125ms ± 1%     ~     (p=0.445 n=49+49)
GoTypes                    1.70s ± 1%    1.69s ± 1%   -0.36%  (p=0.000 n=50+50)
Compiler                   124ms ± 1%    124ms ± 1%   -0.31%  (p=0.003 n=48+48)
SSA                        12.7s ± 1%    12.7s ± 1%     ~     (p=0.117 n=50+50)
Flate                      172ms ± 1%    171ms ± 1%   -0.55%  (p=0.000 n=50+50)
GoParser                   265ms ± 1%    264ms ± 1%   -0.23%  (p=0.000 n=47+48)
Reflect                    653ms ± 1%    646ms ± 1%   -1.12%  (p=0.000 n=48+50)
Tar                        246ms ± 1%    245ms ± 1%   -0.41%  (p=0.000 n=46+47)
XML                        328ms ± 1%    327ms ± 1%   -0.18%  (p=0.020 n=46+50)
LinkCompiler               599ms ± 1%    598ms ± 1%     ~     (p=0.237 n=50+49)
ExternalLinkCompiler       1.87s ± 1%    1.87s ± 1%   -0.18%  (p=0.000 n=50+50)
LinkWithoutDebugCompiler   365ms ± 1%    364ms ± 2%     ~     (p=0.131 n=50+50)
[Geo mean]                 490ms         488ms        -0.32%
name                      old alloc/op  new alloc/op  delta
Template                  38.8MB ± 1%   38.8MB ± 1%   +0.16%  (p=0.013 n=47+49)
Unicode                   28.4MB ± 0%   28.4MB ± 0%     ~     (p=0.512 n=46+44)
GoTypes                    169MB ± 1%    169MB ± 1%     ~     (p=0.628 n=50+50)
Compiler                  23.2MB ± 1%   23.2MB ± 1%     ~     (p=0.424 n=46+44)
SSA                       1.55GB ± 0%   1.55GB ± 0%     ~     (p=0.603 n=48+50)
Flate                     23.7MB ± 1%   23.8MB ± 1%     ~     (p=0.797 n=50+50)
GoParser                  35.3MB ± 1%   35.3MB ± 1%     ~     (p=0.932 n=49+49)
Reflect                   85.0MB ± 0%   84.9MB ± 0%   -0.05%  (p=0.038 n=45+40)
Tar                       34.4MB ± 1%   34.5MB ± 1%     ~     (p=0.288 n=50+50)
XML                       43.8MB ± 2%   43.9MB ± 2%     ~     (p=0.798 n=46+49)
LinkCompiler               136MB ± 0%    136MB ± 0%     ~     (p=0.750 n=50+50)
ExternalLinkCompiler       127MB ± 0%    127MB ± 0%     ~     (p=0.852 n=50+50)
LinkWithoutDebugCompiler  84.1MB ± 0%   84.1MB ± 0%     ~     (p=0.890 n=50+50)
[Geo mean]                70.4MB        70.4MB        +0.01%
file       before     after      Δ     %
addr2line  4006004    4006012    +8    +0.000%
asm        4936863    4936919    +56   +0.001%
buildid    2594947    2594859    -88   -0.003%
cgo        4399702    4399806    +104  +0.002%
compile    22233139   22233107   -32   -0.000%
cover      4443681    4443785    +104  +0.002%
dist       3365902    3365806    -96   -0.003%
doc        3776175    3776231    +56   +0.001%
fix        3218624    3218552    -72   -0.002%
nm         3923345    3923329    -16   -0.000%
objdump    4295473    4295673    +200  +0.005%
pack       2390561    2390497    -64   -0.003%
pprof      12866419   12866275   -144  -0.001%
test2json  2587113    2587129    +16   +0.001%
trace      9609814    9609710    -104  -0.001%
vet        6790272    6791048    +776  +0.011%
total      106832751  106833455  +704  +0.001%
Updates #13379 (for arm64)
Change-Id: I07664ab0b978c66c0b18b8482222e9ba3772290d Reviewed-on: https://go-review.googlesource.com/c/go/+/302853 Reviewed-by: eric fang <eric.fang@arm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Trust: eric fang <eric.fang@arm.com> Run-TryBot: eric fang <eric.fang@arm.com> TryBot-Result: Go Bot <gobot@golang.org>
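The rule stated in this commit message can be sketched as a small predicate. This is only an illustration: stackSmall and needsStackCheck are hypothetical names, the value 128 is an assumption standing in for obj.StackSmall, and the real assembler may apply further conditions that the message does not spell out.

```go
package main

import "fmt"

// stackSmall stands in for obj.StackSmall. The value is an assumption
// chosen for illustration; check the toolchain source for the real one.
const stackSmall = 128

// needsStackCheck is a hypothetical helper that encodes only the rule from
// the commit message: a frame smaller than stackSmall skips the stack
// check, as does a function that is already marked NOSPLIT.
func needsStackCheck(frameSize int64, nosplit bool) bool {
	if nosplit {
		return false
	}
	return frameSize >= stackSmall
}

func main() {
	for _, size := range []int64{0, 64, 127, 128, 4096} {
		fmt.Printf("frame %4d bytes: stack check = %v\n", size, needsStackCheck(size, false))
	}
}
```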
2021-03-04cmd/internal/obj/asm64: add support for moving BITCON to RSPeric fang
A constant of BITCON type can be moved into RSP directly by the MOVD or MOVW instructions; this CL enables this format of these two instructions.
For 32-bit ADDWop instructions with a constant, rewrite the high 32 bits to be a repetition of the low 32 bits, just as ANDWop instructions do, so that we can optimize ADDW $bitcon, Rn, Rt as:
MOVW $bitcon, Rtmp
ADDW Rtmp, Rn, Rt
The original code is:
MOVZ $bitcon_low, Rtmp
MOVK $bitcon_high,Rtmp
ADDW Rtmp, Rn, Rt
Change-Id: I30e71972bcfd6470a8b6e6ffbacaee79d523805a Reviewed-on: https://go-review.googlesource.com/c/go/+/289649 Trust: eric fang <eric.fang@arm.com> Run-TryBot: eric fang <eric.fang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: eric fang <eric.fang@arm.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
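The "repetition of the low 32 bits" step can be sketched in a few lines of Go. This is a minimal illustration of the arithmetic only, not the assembler's code; replicate32 is a hypothetical name.

```go
package main

import "fmt"

// replicate32 is a hypothetical helper: it discards the high 32 bits of v
// and returns a 64-bit value whose high and low halves both equal the low
// 32 bits of v.
func replicate32(v uint64) uint64 {
	lo := uint64(uint32(v))
	return lo<<32 | lo
}

func main() {
	fmt.Printf("0x%016x\n", replicate32(0x00ff00ff)) // prints 0x00ff00ff00ff00ff
}
```

Since a 32-bit operation ignores the upper half of its operand, replicating the low half leaves the result unchanged while giving the constant a repeating 64-bit pattern.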
2021-02-19runtime: enable framepointer on all arm64Russ Cox
Frame pointers were already enabled on linux, darwin, ios, but not freebsd, android, openbsd, netbsd. But the space was reserved on all platforms, leading to two different arm64 framepointer conditions in different parts of the code, one of which had no name (framepointer_enabled || GOARCH == "arm64", which might have been "framepointer_space_reserved"). So on the disabled systems, the stack layouts were still set up for frame pointers and the only difference was not actually maintaining the FP register in the generated code. Reduce complexity by just enabling the frame pointer completely on all the arm64 systems. This commit passes on freebsd, android, netbsd. I have not been able to try it on openbsd. This CL is part of a stack adding windows/arm64 support (#36439), intended to land in the Go 1.17 cycle. This CL is, however, not windows/arm64-specific. It is cleanup meant to make the port (and future ports) easier. Change-Id: I83bd23369d24b76db4c6a648fa74f6917819a093 Reviewed-on: https://go-review.googlesource.com/c/go/+/288814 Trust: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-02-19cmd/asm, cmd/link, runtime: introduce FuncInfo flag bitsRuss Cox
The runtime traceback code has its own definition of which functions mark the top frame of a stack, separate from the TOPFRAME bits that exist in the assembly and are passed along in DWARF information. It's error-prone and redundant to have two different sources of truth. This CL provides the actual TOPFRAME bits to the runtime, so that the runtime can use those bits instead of reinventing its own category. This CL also adds a new bit, SPWRITE, which marks functions that write directly to SP (anything but adding and subtracting constants). Such functions must stop a traceback, because the traceback has no way to rederive the SP on entry. Again, the runtime has its own definition which is mostly correct, but also missing some functions. During ordinary goroutine context switches, such functions do not appear on the stack, so the incompleteness in the runtime usually doesn't matter. But profiling signals can arrive at any moment, and the runtime may crash during traceback if it attempts to unwind an SP-writing frame and gets out-of-sync with the actual stack. The runtime contains code to try to detect likely candidates but again it is incomplete. Deriving the SPWRITE bit automatically from the actual assembly code provides the complete truth, and passing it to the runtime lets the runtime use it. This CL is part of a stack adding windows/arm64 support (#36439), intended to land in the Go 1.17 cycle. This CL is, however, not windows/arm64-specific. It is cleanup meant to make the port (and future ports) easier. Change-Id: I227f53b23ac5b3dabfcc5e8ee3f00df4e113cf58 Reviewed-on: https://go-review.googlesource.com/c/go/+/288800 Trust: Russ Cox <rsc@golang.org> Trust: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
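The idea of carrying per-function flag bits and consulting them during traceback can be sketched as follows. All names here (funcFlag, flagTopFrame, flagSPWrite, stopsTraceback) are hypothetical and chosen for illustration; they are not the identifiers used by the assembler, linker, or runtime.

```go
package main

import "fmt"

// funcFlag is a hypothetical per-function bit set.
type funcFlag uint8

const (
	flagTopFrame funcFlag = 1 << iota // function marks the top frame of a stack
	flagSPWrite                       // function writes SP other than by adding or subtracting constants
)

// stopsTraceback reports whether an unwinder should stop at a function with
// the given flags: either it is the top frame, or the SP on entry cannot be
// rederived.
func stopsTraceback(f funcFlag) bool {
	return f&(flagTopFrame|flagSPWrite) != 0
}

func main() {
	fmt.Println(stopsTraceback(0))                          // false
	fmt.Println(stopsTraceback(flagTopFrame))               // true
	fmt.Println(stopsTraceback(flagSPWrite))                // true
	fmt.Println(stopsTraceback(flagTopFrame | flagSPWrite)) // true
}
```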
2020-10-16cmd/internal/obj: move LSym.Func into LSym.ExtraRuss Cox
This creates space for a different kind of extension field in LSym without making the struct any larger. (There are many LSym, so we care about keeping the struct small.) Change-Id: Ib16edb9e15f54c2a7351c8b875e19684058711e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/243943 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-06runtime: use sigaltstack on macOS/ARM64Cherry Zhang
Currently we don't use sigaltstack on darwin/arm64, as it is not supported on iOS. However, it is supported on macOS. Use it. (iOS remains unchanged.) Change-Id: Icc154c5e2edf2dbdc8ca68741ad9157fc15a72ee Reviewed-on: https://go-review.googlesource.com/c/go/+/256917 Trust: Cherry Zhang <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-08-31cmd/compile,cmd/asm: simplify recording of branch targets, take 2Keith Randall
We currently use two fields to store the targets of branches. Some phases use p.To.Val, some use p.Pcond. Rewrite so that every branch instruction uses p.To.Val. p.From.Val is also used in rare instances. Introduce a Pool link for use by arm/arm64, instead of repurposing Pcond. This is a cleanup CL in preparation for some stack frame CLs. Change-Id: If8239177e4b1ea2bccd0608eb39553d23210d405 Reviewed-on: https://go-review.googlesource.com/c/go/+/251437 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-28Revert "cmd/compile,cmd/asm: simplify recording of branch targets"Keith Randall
This reverts CL 243318. Reason for revert: Seems to be crashing some builders. Change-Id: I2ffc59bc5535be60b884b281c8d0eff4647dc756 Reviewed-on: https://go-review.googlesource.com/c/go/+/251169 Reviewed-by: Bryan C. Mills <bcmills@google.com>
2020-08-27cmd/compile,cmd/asm: simplify recording of branch targetsKeith Randall
We currently use two fields to store the targets of branches. Some phases use p.To.Val, some use p.Pcond. Rewrite so that every branch instruction uses p.To.Val. p.From.Val is also used in rare instances. Introduce a Pool link for use by arm/arm64, instead of repurposing Pcond. This is a cleanup CL in preparation for some stack frame CLs. Change-Id: I9055bf0a1d986aff421e47951a1dedc301c846f8 Reviewed-on: https://go-review.googlesource.com/c/go/+/243318 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-27runtime: framepointers are no longer an experiment - hard code themKeith Randall
I think they no longer have experimental status. Might as well promote them to permanent. Change-Id: Id1259601b3dd2061dd60df86ee48080bfb575d2f Reviewed-on: https://go-review.googlesource.com/c/go/+/249857 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2020-08-20cmd/internal/obj: stop removing NOPs from instruction streamKeith Randall
This has already been done for s390x, ppc64. This CL is for all the other architectures. Fixes #40796 Change-Id: Idd1816e057df63022d47e99fa06617811d8c8489 Reviewed-on: https://go-review.googlesource.com/c/go/+/248684 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-05-13cmd/internal/obj: add stmt prologueEnd to DWARF for arm64David Chase
Change-Id: I7e9ec2835f1a7d9821dff3e868aebf07fece8137 Reviewed-on: https://go-review.googlesource.com/c/go/+/223297 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-27cmd/internal/obj: mark split-stack prologue nonpreemptibleCherry Zhang
When there are both a synchronous preemption request (by clobbering the stack guard) and an asynchronous one (by signal), the running goroutine may observe the synchronous request first in stack bounds check, and go to the path of calling morestack. If the preemption signal arrives at this point before the call to morestack, the goroutine will be asynchronously preempted, entering the scheduler. When it is resumed, the scheduler clears the preemption request, unclobbers the stack guard. But the resumed goroutine will still call morestack, as it is already on its way. morestack will, as there is no preemption request, double the stack unnecessarily. If this happens multiple times, the stack may grow too big, although only a small amount is actually used. To fix this, we mark the stack bounds check and the call to morestack async-nonpreemptible, starting after the memory instruction (mostly a load, on x86 CMP with memory). Not done for Wasm as it does not support async preemption. Fixes #35470. Change-Id: Ibd7f3d935a3649b80f47539116ec9b9556680cf2 Reviewed-on: https://go-review.googlesource.com/c/go/+/207350 Reviewed-by: David Chase <drchase@google.com>
2019-11-12cmd/internal/obj/arm64: save LR after decrementing SP on darwinCherry Zhang
iOS does not support SA_ONSTACK. The signal handler runs on the G stack. Any writes below the SP may be clobbered by the signal handler (even without call injection). So we save LR after decrementing SP on iOS. Updates #35439. Change-Id: Ia6d7a0669e0bcf417b44c031d2e26675c1184165 Reviewed-on: https://go-review.googlesource.com/c/go/+/206418 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2019-11-08cmd/internal/obj/arm64: make function epilogue async-signal safeCherry Zhang
When the frame size is large, we generate
MOVD.P 0xf0(SP), LR
ADD $(framesize-0xf0), SP
This is problematic: after the first instruction, we have a partial frame of size (framesize-0xf0). If we try to unwind the stack at this point, we'll try to read the LR from the stack at 0(SP) (the new SP) as the frame size is not 0. But this slot does not contain a valid LR.
Fix this by not changing SP in two instructions. Instead, generate
MOVD (SP), LR
ADD $framesize, SP
This affects not only async preemption but also profiling. So we change the generated instructions, instead of marking an unsafe point.
Change-Id: I4e78c62d50ffc4acff70ccfbfec16a5ccae17f24 Reviewed-on: https://go-review.googlesource.com/c/go/+/206057 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
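The failure described in this commit message can be reproduced with a toy model. This is a sketch under simplifying assumptions: the unwinder is reduced to "if the recorded frame size is non-zero, read the return address from 0(SP), otherwise use the LR register", and every name, address, and constant below is hypothetical.

```go
package main

import "fmt"

const (
	frameSize = 0x200  // hypothetical large frame
	savedLR   = 0x4000 // hypothetical return address stored at 0(SP)
	junk      = 0xdead // hypothetical local variable stored at 0xf0(SP)
)

// machine is a toy model of the SP and LR registers plus stack memory.
type machine struct {
	sp, lr uint64
	mem    map[uint64]uint64
}

func newMachine() machine {
	const base = 0x1000
	return machine{sp: base, mem: map[uint64]uint64{base: savedLR, base + 0xf0: junk}}
}

// unwindReturnAddr models the unwinder: with a non-zero recorded frame size
// it trusts the slot at 0(SP); with a zero frame size it uses LR.
func unwindReturnAddr(m machine, recordedFrame uint64) uint64 {
	if recordedFrame != 0 {
		return m.mem[m.sp]
	}
	return m.lr
}

// afterOldFirstInsn returns the state after "MOVD.P 0xf0(SP), LR":
// LR is loaded from 0(SP), then SP moves up by 0xf0.
func afterOldFirstInsn() (machine, uint64) {
	m := newMachine()
	m.lr = m.mem[m.sp]
	m.sp += 0xf0
	return m, frameSize - 0xf0
}

// afterNewFirstInsn returns the state after "MOVD (SP), LR":
// LR is loaded and SP is left untouched, so the frame is still whole.
func afterNewFirstInsn() (machine, uint64) {
	m := newMachine()
	m.lr = m.mem[m.sp]
	return m, frameSize
}

func main() {
	m, rec := afterOldFirstInsn()
	fmt.Printf("old epilogue, mid-sequence: %#x\n", unwindReturnAddr(m, rec)) // 0xdead
	m, rec = afterNewFirstInsn()
	fmt.Printf("new epilogue, mid-sequence: %#x\n", unwindReturnAddr(m, rec)) // 0x4000
}
```

In the old sequence the unwinder reads a local variable as if it were the return address; in the new sequence the slot at 0(SP) still holds the saved LR until SP is adjusted in a single step.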