aboutsummaryrefslogtreecommitdiff
path: root/test
AgeCommit message (Collapse)Author
2025-06-26cmd/compile/internal/escape: evaluate any side effects when rewriting with ↵thepudds
literals CL 649035 and CL 649079 updated escape analysis to rewrite certain operands in OMAKE and OCONVIFACE nodes from non-constant expressions to basic literals that evaluate to the same value. However, when doing that rewriting, we need to evaluate any side effects prior to replacing the expression, which is what this CL now does. Issue #74379 reported a problem with OCONVIFACE nodes due to CL 649079. CL 649035 has essentially the same issue with OMAKE nodes. To illustrate that, we add a test for the OMAKE case in fixedbugs/issue74379b.go, which fails without this change. To avoid introducing an unnecessary temporary for OMAKE nodes, we also conditionalize the main work of CL 649035 on whether the OMAKE operand is already an OLITERAL. CL 649555 and CL 649078 were related changes that created read-only global storage for composite literals used in an interface conversion. This CL adds a test in fixedbugs/issue74379c.go to illustrate that they do not have the same problem. Updates #71359 Fixes #74379 Change-Id: I6645575ef34f1fe2b0241a22dc205875d66b7ada Reviewed-on: https://go-review.googlesource.com/c/go/+/684116 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-06-09cmd/compile/internal/ssa: fix PPC64 merging of (AND (S[RL]Dconst ...)Paul Murphy
CL 622236 forgot to check the mask was also a 32 bit rotate mask. Add a modified version of isPPC64WordRotateMask which valids the mask is contiguous and fits inside a uint32. I don't this is possible when merging SRDconst, the first check should always reject such combines. But, be extra careful and do it there too. Fixes #73153 Change-Id: Ie95f74ec5e7d89dc761511126db814f886a7a435 Reviewed-on: https://go-review.googlesource.com/c/go/+/679775 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-06-04Revert "cmd/compile: Enable inlining of tail calls"Cherry Mui
This reverts CL 650455 and CL 655816. Reason for revert: it causes #73747. Properly fixing it gets into trickiness with defer/recover, wrapper, and inlining. We're late in the Go 1.25 release cycle. Fixes #73747. Change-Id: Ifb343d522b18fec3fec73a7c886678032ac8e4df Reviewed-on: https://go-review.googlesource.com/c/go/+/678575 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-06-04test: add another regression test for issue 73309Cuong Manh Le
Fixed #73309 Change-Id: Id715b9c71c95c92143a7fdb5a66b24305346dd3b Reviewed-on: https://go-review.googlesource.com/c/go/+/678415 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-28cmd/compile: do nil check before calling duff functions, on arm64 and amd64Keith Randall
On these platforms, we set up a frame pointer record below the current stack pointer, so when we're in duffcopy or duffzero, we get a reasonable traceback. See #73753. But because this frame pointer record is below SP, it is vulnerable. Anything that adds a new stack frame to the stack might clobber it. Which actually happens in #73748 on amd64. I have not yet come across a repro on arm64, but might as well be safe here. The only real situation this could happen is when duffzero or duffcopy is passed a nil pointer. So we can just avoid the problem by doing the nil check outside duffzero/duffcopy. That way we never add a frame below duffzero/duffcopy. (Most other ways to get a new frame below the current one, like async preempt or debugger-generated calls, don't apply to duffzero/duffcopy because they are runtime functions; we're not allowed to preempt there.) Longer term, we should stop putting stuff below SP. #73753 will include that as part of its remit. But that's not for 1.25, so we'll do the simple thing for 1.25 for this issue. Fixes #73748 Change-Id: I913c49ee46dcaee8fb439415a4531f7b59d0f612 Reviewed-on: https://go-review.googlesource.com/c/go/+/676916 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-05-27cmd/compile/internal/walk: use original type for composite literals in addrTempthepudds
When creating a new *ir.Name or *ir.LinksymOffsetExpr to represent a composite literal stored in the read-only data section, we should use the original type of the expression that was found via ir.ReassignOracle.StaticValue. (This is needed because the StaticValue method can traverse through OCONVNOP operations to find its final result.) Otherwise, the compilation may succeed, but the linker might erroneously conclude that a type is not used and prune an itab when it should not, leading to a call at execution-time to runtime.unreachableMethod, which throws "fatal error: unreachable method called. linker bug?". The tests exercise both the case of a zero value struct literal that can be represented by the read-only runtime.zeroVal, which was the case of the simplified example from #73888, and also modifies that example to test the non zero value struct literal case. This CL makes two similar changes for those two cases. We can get either of the tests we are adding to fail independently if we only make a single corresponding change. Fixes #73888 Updates #71359 Change-Id: Ifd91f445cc168ab895cc27f7964a6557d5cc32e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/676517 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-22cmd/compile: do not shapify when reading reshaping exprCuong Manh Le
Fixes #71184 Change-Id: I22e7ae5203311e86a90502bfe155b0597007887d Reviewed-on: https://go-review.googlesource.com/c/go/+/641955 Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>
2025-05-22cmd/compile: fix ICE with recursive alias type parameterCuong Manh Le
CL 585399 fixed an initialization loop during IR contruction that involving alias type, by avoiding publishing alias declarations until the RHS type expression has been constructed. There's an assertion to ensure that the alias's type must be the same during the initialization. However, that assertion is too strict, since we may construct different instances of the same type, if the type is an instantination of generic type. To fix this, we could use types.IdenticalStrict to ensure that these types matching exactly. Updates #66873. Updates #73309. Change-Id: I2559bed37e21615854333fb1057d7349406e6a1b Reviewed-on: https://go-review.googlesource.com/c/go/+/668175 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2025-05-22cmd/compile: fix ICE when transforming loopvarCuong Manh Le
When transforming for loop variables, the compiler does roughly following steps: (1) prebody = {z := z' for z in leaked} ... (4) init' = (init : s/z/z' for z in leaked) However, the definition of z is not updated to `z := z'` statement, causing ReassignOracle incorrectly use the new init statement with z' instead of z, trigger the ICE. Fixing this by updating the correct/new definition statement for z during the prebody initialization. Fixes #73823 Change-Id: Ice2a6741be7478506c58f4000f591d5582029136 Reviewed-on: https://go-review.googlesource.com/c/go/+/675475 Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>
2025-05-21cmd/compile/internal/ssa: eliminate string copies for calls to unique.MakeJake Bailey
unique.Make always copies strings passed into it, so it's safe to not copy byte slices converted to strings either. Handle this just like map accesses with string(b) as keys. This CL only handles unique.Make(string(b)), not nested cases like unique.Make([2]string{string(b1), string(b2)}); this could be done in a followup CL but the map lookup code in walk is sufficiently different than the call handling code that I didn't attempt it. (SSA is much easier). Fixes #71926 Change-Id: Ic2f82f2f91963d563b4ddb1282bd49fc40da8b85 Reviewed-on: https://go-review.googlesource.com/c/go/+/672135 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21cmd/compile, unique: model data flow of non-string pointersCherry Mui
Currently, hash/maphash.Comparable escapes its parameter if it contains non-string pointers, but does not escape strings or types that contain strings but no other pointers. This is achieved by a compiler intrinsic. unique.Make does something similar: it stores its parameter to a central map, with strings cloned. So from the escape analysis's perspective, the non-string pointers are passed through, whereas string pointers are not. We currently cannot model this type of type-dependent data flow directly in Go. So we do this with a compiler intrinsic. In fact, we can unify this and the intrinsic above. Tests are from Jake Bailey's CL 671955 (thanks!). Fixes #73680. Change-Id: Ia6a78e09dee39f8d9198a16758e4b5322ee2c56a Reviewed-on: https://go-review.googlesource.com/c/go/+/675156 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Jake Bailey <jacob.b.bailey@gmail.com>
2025-05-21cmd/compile/internal/walk: use global zeroVal in interface conversions for ↵thepudds
zero values This is a small-ish adjustment to the change earlier in our stack in CL 649555, which started creating read-only global storage for a composite literal used in an interface conversion and setting the interface data pointer to point to that global storage. In some cases, there are execution-time performance benefits to point to runtime.zeroVal in particular. In reflect, pointer checks against the runtime.zeroVal memory address are used to side-step some work, such as in reflect.Value.Set and reflect.Value.IsZero. In this CL, we therefore dig up the zeroVal symbol, and we use the machinery from earlier in our stack to use a pointer to zeroVal for the interface data pointer if we see examples like: sink = S{} or: s := S{} sink = s CL 649076 (also earlier in our stack) added most of the tests along with debug diagnostics in convert.go to make it easier to test this change. We add a benchmark in reflect to show examples of performance benefit. The left column is our immediately prior CL 649555, and the right is this CL. (The arrays of structs here do not seem to benefit, which we attempt to address in our next CL). goos: linux goarch: amd64 pkg: reflect cpu: Intel(R) Xeon(R) CPU @ 2.80GHz │ cl-649555 │ new │ │ sec/op │ sec/op vs base │ Zero/IsZero/ByteArray/size=16-4 4.176n ± 0% 4.171n ± 0% ~ (p=0.151 n=20) Zero/IsZero/ByteArray/size=64-4 6.921n ± 0% 3.864n ± 0% -44.16% (p=0.000 n=20) Zero/IsZero/ByteArray/size=1024-4 21.210n ± 0% 3.878n ± 0% -81.72% (p=0.000 n=20) Zero/IsZero/BigStruct/size=1024-4 25.505n ± 0% 5.061n ± 0% -80.15% (p=0.000 n=20) Zero/IsZero/SmallStruct/size=16-4 4.188n ± 0% 4.191n ± 0% ~ (p=0.106 n=20) Zero/IsZero/SmallStructArray/size=64-4 8.639n ± 0% 8.636n ± 0% ~ (p=0.973 n=20) Zero/IsZero/SmallStructArray/size=1024-4 79.99n ± 0% 80.06n ± 0% ~ (p=0.213 n=20) Zero/IsZero/Time/size=24-4 7.232n ± 0% 3.865n ± 0% -46.56% (p=0.000 n=20) Zero/SetZero/ByteArray/size=16-4 13.47n ± 0% 13.09n ± 0% -2.78% (p=0.000 n=20) Zero/SetZero/ByteArray/size=64-4 14.14n ± 0% 13.70n ± 0% -3.15% (p=0.000 n=20) Zero/SetZero/ByteArray/size=1024-4 24.22n ± 0% 20.18n ± 0% -16.68% (p=0.000 n=20) Zero/SetZero/BigStruct/size=1024-4 24.24n ± 0% 20.18n ± 0% -16.73% (p=0.000 n=20) Zero/SetZero/SmallStruct/size=16-4 13.45n ± 0% 13.10n ± 0% -2.60% (p=0.000 n=20) Zero/SetZero/SmallStructArray/size=64-4 14.12n ± 0% 13.69n ± 0% -3.05% (p=0.000 n=20) Zero/SetZero/SmallStructArray/size=1024-4 24.62n ± 0% 21.61n ± 0% -12.26% (p=0.000 n=20) Zero/SetZero/Time/size=24-4 13.59n ± 0% 13.40n ± 0% -1.40% (p=0.000 n=20) geomean 14.06n 10.19n -27.54% Finally, here are results from the benchmark example from #71323. Note however that almost all the benefit shown here is from our earlier CL 649555, which is a more general purpose change and eliminates the allocation using a different read-only global than this CL. │ go1.24 │ new │ │ sec/op │ sec/op vs base │ InterfaceAny 112.6000n ± 5% 0.8078n ± 3% -99.28% (p=0.000 n=20) ReflectValue 11.63n ± 2% 11.59n ± 0% ~ (p=0.330 n=20) │ go1.24.out │ new.out │ │ B/op │ B/op vs base │ InterfaceAny 224.0 ± 0% 0.0 ± 0% -100.00% (p=0.000 n=20) ReflectValue 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=20) ¹ │ go1.24.out │ new.out │ │ allocs/op │ allocs/op vs base │ InterfaceAny 1.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=20) ReflectValue 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=20) ¹ Updates #71359 Updates #71323 Change-Id: I64d8cf1a7900f011d2ec59b948388aeda1150676 Reviewed-on: https://go-review.googlesource.com/c/go/+/649078 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>
2025-05-21cmd/compile/internal/walk: convert composite literals to interfaces without ↵thepudds
allocating Today, this interface conversion causes the struct literal to be heap allocated: var sink any func example1() { sink = S{1, 1} } For basic literals like integers that are directly used in an interface conversion that would otherwise allocate, the compiler is able to use read-only global storage (see #18704). This CL extends that to struct and array literals as well by creating read-only global storage that is able to represent for example S{1, 1}, and then using a pointer to that storage in the interface when the interface conversion happens. A more challenging example is: func example2() { v := S{1, 1} sink = v } In this case, the struct literal is not directly part of the interface conversion, but is instead assigned to a local variable. To still avoid heap allocation in cases like this, in walk we construct a cache that maps from expressions used in interface conversions to earlier expressions that can be used to represent the same value (via ir.ReassignOracle.StaticValue). This is somewhat analogous to how we avoided heap allocation for basic literals in CL 649077 earlier in our stack, though here we also need to do a little more work to create the read-only global. CL 649076 (also earlier in our stack) added most of the tests along with debug diagnostics in convert.go to make it easier to test this change. See the writeup in #71359 for details. Fixes #71359 Fixes #71323 Updates #62653 Updates #53465 Updates #8618 Change-Id: I8924f0c69ff738ea33439bd6af7b4066af493b90 Reviewed-on: https://go-review.googlesource.com/c/go/+/649555 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-05-21cmd/compile: fix offset calculation error in memcombineJunyang Shao
Fixes #73812 Change-Id: If7a6e103ae9e1442a2cf4a3c6b1270b6a1887196 Reviewed-on: https://go-review.googlesource.com/c/go/+/675175 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21cmd/compile/internal/escape: propagate constants to interface conversions to ↵thepudds
avoid allocs Currently, the integer value in the following interface conversion gets heap allocated: v := 1000 fmt.Println(v) In contrast, this conversion does not currently cause the integer value to be heap allocated: fmt.Println(1000) The second example is able to avoid heap allocation because of an optimization in walk (by Josh in #18704 and related issues) that recognizes a literal is being used. In the first example, that optimization is currently thwarted by the literal getting assigned to a local variable prior to use in the interface conversion. This CL propagates constants to interface conversions like in the first example to avoid heap allocations, instead using a read-only global. The net effect is roughly turning the first example into the second. One place this comes up in practice currently is with logging or debug prints. For example, if we have something like: func conditionalDebugf(format string, args ...interface{}) { if debugEnabled { fmt.Fprintf(io.Discard, format, args...) } } Prior to this CL, this integer is heap allocated, even when the debugEnabled flag is false, and even when the compiler inlines conditionalDebugf: v := 1000 conditionalDebugf("hello %d", v) With this CL, the integer here is no longer heap allocated, even when the debugEnabled flag is enabled, because the compiler can now see that it can use a read-only global. See the writeup in #71359 for more details. CL 649076 (earlier in our stack) added most of the tests along with debug diagnostics in convert.go to make it easier to test this change. Updates #71359 Updates #62653 Updates #53465 Updates #8618 Change-Id: I19a51e74b36576ebb0b9cf599267cbd2bd847ce4 Reviewed-on: https://go-review.googlesource.com/c/go/+/649079 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-05-21cmd/compile: add rules about ORN and ANDNXiaolin Zhao
Reduce the number of go toolchain instructions on loong64 as follows. file before after Δ % addr2line 279880 279776 -104 -0.0372% asm 556638 556410 -228 -0.0410% buildid 272272 272072 -200 -0.0735% cgo 481522 481318 -204 -0.0424% compile 2457788 2457580 -208 -0.0085% covdata 323384 323280 -104 -0.0322% cover 518450 518234 -216 -0.0417% dist 340790 340686 -104 -0.0305% distpack 282456 282252 -204 -0.0722% doc 789932 789688 -244 -0.0309% fix 324332 324228 -104 -0.0321% link 704622 704390 -232 -0.0329% nm 277132 277028 -104 -0.0375% objdump 507862 507758 -104 -0.0205% pack 221774 221674 -100 -0.0451% pprof 1469816 1469552 -264 -0.0180% test2json 254836 254732 -104 -0.0408% trace 1100002 1099738 -264 -0.0240% vet 781078 780874 -204 -0.0261% go 1529116 1528848 -268 -0.0175% gofmt 318556 318448 -108 -0.0339% total 13792238 13788566 -3672 -0.0266% Change-Id: I23fb3ebd41309252c7075e57ea7094e79f8c4fef Reviewed-on: https://go-review.googlesource.com/c/go/+/674335 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn>
2025-05-20cmd/compile: fix the implementation of NORconst on loong64Xiaolin Zhao
In the loong64 instruction set, there is no NORI instruction, so the immediate value in NORconst need to be stored in register and then use the three-register NOR instruction. Change-Id: I5ef697450619317218cb3ef47fc07e238bdc2139 Reviewed-on: https://go-review.googlesource.com/c/go/+/673836 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20cmd/compile: memcombine different size storesJunyang Shao
This CL implements the TODO in combineStores to allow combining stores of different sizes, as long as the total size aligns to 2, 4, 8. Fixes #72832. Change-Id: I6d1d471335da90d851ad8f3b5a0cf10bdcfa17c4 Reviewed-on: https://go-review.googlesource.com/c/go/+/661855 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Junyang Shao <shaojunyang@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20cmd/compile: fold negation into addition/subtraction on arm64Julian Zhu
Fold negation into addition/subtraction and avoid double negation. platform: linux/arm64 file before after Δ % addr2line 3628108 3628116 +8 +0.000% asm 6208353 6207857 -496 -0.008% buildid 3460682 3460418 -264 -0.008% cgo 5572988 5572492 -496 -0.009% compile 26042159 26041039 -1120 -0.004% cover 6304328 6303472 -856 -0.014% dist 4139330 4139098 -232 -0.006% doc 9429305 9428065 -1240 -0.013% fix 3997189 3996733 -456 -0.011% link 8212128 8210280 -1848 -0.023% nm 3620056 3619696 -360 -0.010% objdump 5920289 5919233 -1056 -0.018% pack 2892250 2891778 -472 -0.016% pprof 17094569 17092745 -1824 -0.011% test2json 3335825 3335529 -296 -0.009% trace 15842080 15841456 -624 -0.004% vet 9472194 9471106 -1088 -0.011% go 19081541 19081509 -32 -0.000% total 154253374 154240622 -12752 -0.008% platform: darwin/arm64 file before after Δ % compile 27152002 27135490 -16512 -0.061% link 8372914 8356402 -16512 -0.197% go 19154802 19154778 -24 -0.000% total 157734180 157701132 -33048 -0.021% Change-Id: I15a349bfbaf7333ec3e4a62ae4d06f3f371dfb1d Reviewed-on: https://go-review.googlesource.com/c/go/+/673715 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20cmd/compile/internal/escape: additional constant and zero value tests and ↵thepudds
logging This adds additional logging for the work that walk does to reduce how often an interface conversion results in an allocation. Also, as part of #71359, we will be updating how escape analysis and walk handle basic literals, composite literals, and zero values, so add some tests that uses this new logging. By the end of our CL stack, we address all of these tests. Updates #71359 Change-Id: I43fde8343d9aacaec1e05360417908014a86c8bd Reviewed-on: https://go-review.googlesource.com/c/go/+/649076 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-19cmd/compile: allocate backing store for append on the stackKeith Randall
When appending, if the backing store doesn't escape and a constant-sized backing store is big enough, use a constant-sized stack-allocated backing store instead of allocating it from the heap. cmd/go is <0.1% bigger. As an example of how this helps, if you edit strings/strings.go:FieldsFunc to replace spans := make([]span, 0, 32) with var spans []span then this CL removes the first 2 allocations that are part of the growth sequence: │ base │ exp │ │ allocs/op │ allocs/op vs base │ FieldsFunc/ASCII/16-24 3.000 ± ∞ ¹ 2.000 ± ∞ ¹ -33.33% (p=0.008 n=5) FieldsFunc/ASCII/256-24 7.000 ± ∞ ¹ 5.000 ± ∞ ¹ -28.57% (p=0.008 n=5) FieldsFunc/ASCII/4096-24 11.000 ± ∞ ¹ 9.000 ± ∞ ¹ -18.18% (p=0.008 n=5) FieldsFunc/ASCII/65536-24 18.00 ± ∞ ¹ 16.00 ± ∞ ¹ -11.11% (p=0.008 n=5) FieldsFunc/ASCII/1048576-24 30.00 ± ∞ ¹ 28.00 ± ∞ ¹ -6.67% (p=0.008 n=5) FieldsFunc/Mixed/16-24 2.000 ± ∞ ¹ 2.000 ± ∞ ¹ ~ (p=1.000 n=5) FieldsFunc/Mixed/256-24 7.000 ± ∞ ¹ 5.000 ± ∞ ¹ -28.57% (p=0.008 n=5) FieldsFunc/Mixed/4096-24 11.000 ± ∞ ¹ 9.000 ± ∞ ¹ -18.18% (p=0.008 n=5) FieldsFunc/Mixed/65536-24 18.00 ± ∞ ¹ 16.00 ± ∞ ¹ -11.11% (p=0.008 n=5) FieldsFunc/Mixed/1048576-24 30.00 ± ∞ ¹ 28.00 ± ∞ ¹ -6.67% (p=0.008 n=5) (Of course, people have spotted and fixed a bunch of allocation sites like this, but now we're ~automatically doing it everywhere going forward.) No significant increases in frame sizes in cmd/go. Change-Id: I301c4d9676667eacdae0058960321041d173751a Reviewed-on: https://go-review.googlesource.com/c/go/+/664299 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org>
2025-05-19cmd/compile: derive bounds on signed %N for N a power of 2Keith Randall
-N+1 <= x % N <= N-1 This is useful for cases like: func setBit(b []byte, i int) { b[i/8] |= 1<<(i%8) } The shift does not need protection against larger-than-7 cases. (It does still need protection against <0 cases.) Change-Id: Idf83101386af538548bfeb6e2928cea855610ce2 Reviewed-on: https://go-review.googlesource.com/c/go/+/672995 Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-19cmd/compile: fold negation into addition/subtraction on mipsxJulian Zhu
Fold negation into addition/subtraction and avoid double negation. file before after Δ % addr2line 3742022 3741986 -36 -0.001% asm 6668616 6668628 +12 +0.000% buildid 3583786 3583630 -156 -0.004% cgo 6020370 6019634 -736 -0.012% compile 29416016 29417336 +1320 +0.004% cover 6801903 6801675 -228 -0.003% dist 4485916 4485816 -100 -0.002% doc 10652787 10652251 -536 -0.005% fix 4115988 4115560 -428 -0.010% link 9002328 9001616 -712 -0.008% nm 3733148 3732780 -368 -0.010% objdump 6163292 6163068 -224 -0.004% pack 2944768 2944604 -164 -0.006% pprof 18909973 18908773 -1200 -0.006% test2json 3394662 3394778 +116 +0.003% trace 17350911 17349751 -1160 -0.007% vet 10077727 10077527 -200 -0.002% go 19118769 19118609 -160 -0.001% total 166182982 166178022 -4960 -0.003% Change-Id: Id55698800fd70f3cb2ff48393584456b87208921 Reviewed-on: https://go-review.googlesource.com/c/go/+/673556 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-19go/types, types2: improve error message for init without bodyMark Freeman
Change-Id: I8a684965e88e0e33a6ff33a16e08d136e3267f7e Reviewed-on: https://go-review.googlesource.com/c/go/+/663636 TryBot-Bypass: Mark Freeman <mark@golang.org> Auto-Submit: Mark Freeman <mark@golang.org> Reviewed-by: Robert Griesemer <gri@google.com>
2025-05-16cmd/compile: fold negation into addition/subtraction on mips64xJulian Zhu
Fold negation into addition/subtraction and avoid double negation. file before after Δ % addr2line 4007310 4007470 +160 +0.004% asm 7007636 7007436 -200 -0.003% buildid 3839268 3838972 -296 -0.008% cgo 6353466 6352738 -728 -0.011% compile 30426920 30426896 -24 -0.000% cover 7005408 7004744 -664 -0.009% dist 4651192 4650872 -320 -0.007% doc 10606050 10606034 -16 -0.000% fix 4446414 4446390 -24 -0.001% link 9237736 9237024 -712 -0.008% nm 3999107 3999323 +216 +0.005% objdump 6762424 6762144 -280 -0.004% pack 3270757 3270493 -264 -0.008% pprof 19428299 19361939 -66360 -0.342% test2json 3717345 3717217 -128 -0.003% trace 17382273 17381657 -616 -0.004% vet 10689481 10688985 -496 -0.005% go 19118769 19118609 -160 -0.001% total 171949855 171878943 -70912 -0.041% Change-Id: I35c1f264d216c214ea3f56252a9ddab8ea850fa6 Reviewed-on: https://go-review.googlesource.com/c/go/+/673555 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-15cmd/compile: allow load-op merging in additional situationsKeith Randall
x += *p We want to do this with a single load+add operation on amd64. The tricky part is that we don't want to combine if there are other uses of x after this instruction. Implement a simple detector that seems to capture a common situation - x += *p is in a loop, and the other use of x is after loop exit. In that case, it does not hurt to do the load+add combo. Change-Id: I466174cce212e78bde83f908cc1f2752b560c49c Reviewed-on: https://go-review.googlesource.com/c/go/+/672957 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-15cmd/compile: schedule induction variable increments lateKeith Randall
for ..; ..; i++ { ... } We want to schedule the i++ late in the block, so that all other uses of i in the block are scheduled first. That way, i++ can happen in place in a register instead of requiring a temporary register. Change-Id: Id777407c7e67a5ddbd8e58251099b0488138c0df Reviewed-on: https://go-review.googlesource.com/c/go/+/672998 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-05-15cmd/compile: create LSym for closures with type conversionYongyue Sun
Follow-up to #54959 with another failing case. The linker needs FuncInfo metadata for all inlined functions. CL 436240 explicitly creates LSym for direct closure calls to ensure we keep the FuncInfo metadata. However, CL 436240 won't work if the direct closure call is wrapped by a no-effect type conversion, even if that closure could be inlined. This commit should fix such case. Fixes #73716 Change-Id: Icda6024da54c8d933f87300e691334c080344695 GitHub-Last-Rev: e9aed02eb6ef343e4ed2d8a79f6426abf917ab0e GitHub-Pull-Request: golang/go#73718 Reviewed-on: https://go-review.googlesource.com/c/go/+/672855 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-14cmd/compile: fold negation into addition/subtraction on loong64Xiaolin Zhao
This change also avoid double negation, and add loong64 codegen for arithmetic tests. Reduce the number of go toolchain instructions on loong64 as follows. file before after Δ % addr2line 279972 279896 -76 -0.0271% asm 556390 556310 -80 -0.0144% buildid 272376 272300 -76 -0.0279% cgo 481534 481550 +16 +0.0033% compile 2457992 2457396 -596 -0.0242% covdata 323488 323404 -84 -0.0260% cover 518630 518490 -140 -0.0270% dist 340894 340814 -80 -0.0235% distpack 282568 282484 -84 -0.0297% doc 790224 789984 -240 -0.0304% fix 324408 324348 -60 -0.0185% link 704910 704666 -244 -0.0346% nm 277220 277144 -76 -0.0274% objdump 508026 507878 -148 -0.0291% pack 221810 221786 -24 -0.0108% pprof 1470284 1469880 -404 -0.0275% test2json 254896 254852 -44 -0.0173% trace 1100390 1100074 -316 -0.0287% vet 781398 781142 -256 -0.0328% go 1529668 1529128 -540 -0.0353% gofmt 318668 318568 -100 -0.0314% total 13795746 13792094 -3652 -0.0265% Change-Id: I88d1f12cfc4be0e92687c48e06a57213aa484aca Reviewed-on: https://go-review.googlesource.com/c/go/+/672555 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-12cmd/link: use >4GB base address for 64-bit PE binariesqmuntal
Windows prefers 64-bit binaries to be loaded at an address above 4GB. Having a preferred base address below this boundary triggers a compatibility mode in Address Space Layout Randomization (ASLR) on recent versions of Windows that reduces the number of locations to which ASLR may relocate the binary. The Go internal linker was using a smaller base address due to an issue with how dynamic cgo symbols were relocated, which has been fixed in this CL. Fixes #73561. Cq-Include-Trybots: luci.golang.try:gotip-windows-amd64-longtest Change-Id: Ia8cb35d57d921d9be706a8975fa085af7996f124 Reviewed-on: https://go-review.googlesource.com/c/go/+/671515 Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-05-08runtime: schedule cleanups across multiple goroutinesMichael Anthony Knyszek
This change splits the finalizer and cleanup queues and implements a new lock-free blocking queue for cleanups. The basic design is as follows: The cleanup queue is organized in fixed-sized blocks. Individual cleanup functions are queued, but only whole blocks are dequeued. Enqueuing cleanups places them in P-local cleanup blocks. These are flushed to the full list as they get full. Cleanups can only be enqueued by an active sweeper. Dequeuing cleanups always dequeues entire blocks from the full list. Cleanup blocks can be dequeued and executed at any time. The very last active sweeper in the sweep phase is responsible for flushing all local cleanup blocks to the full list. It can do this without any synchronization because the next GC can't start yet, so we can be very certain that nobody else will be accessing the local blocks. Cleanup blocks are stored off-heap because the need to be allocated by the sweeper, which is called from heap allocation paths. As a result, the GC treats cleanup blocks as roots, just like finalizer blocks. Flushes to the full list signal to the scheduler that cleanup goroutines should be awoken. Every time the scheduler goes to wake up a cleanup goroutine and there were more signals than goroutines to wake, it then forwards this signal to runtime.AddCleanup, so that it creates another goroutine the next time it is called, up to gomaxprocs goroutines. The signals here are a little convoluted, but exist because the sweeper and the scheduler cannot safely create new goroutines. For #71772. For #71825. Change-Id: Ie839fde2b67e1b79ac1426be0ea29a8d923a62cc Reviewed-on: https://go-review.googlesource.com/c/go/+/650697 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Michael Knyszek <mknyszek@google.com>
2025-05-08cmd/compile: add 2 phiopt casesJakub Ciolek
Add 2 more cases: if a { x = value } else { x = a } => x = a && value if a { x = a } else { x = value } => x = a || value AND case goes from: 00006 (8) TESTB AX, AX 00007 (8) JNE 9 00008 (13) MOVL AX, BX 00009 (13) MOVL BX, AX 00010 (13) RET to: 00006 (13) ANDL BX, AX 00007 (13) RET OR goes from: 00006 (19) TESTB AX, AX 00007 (19) JNE 9 00008 (24) MOVL BX, AX 00009 (24) RET to: 00006 (24) ORL BX, AX 00007 (24) RET compilecmp linux/amd64: runtime runtime.lock2 847 -> 869 (+2.60%) runtime.addspecial 542 -> 517 (-4.61%) runtime.tracebackPCs changed runtime.scanstack changed runtime.mallocinit changed runtime.traceback2 2238 -> 2206 (-1.43%) runtime [cmd/compile] runtime.lock2 860 -> 882 (+2.56%) runtime.scanstack changed runtime.addspecial 542 -> 517 (-4.61%) runtime.traceback2 2238 -> 2206 (-1.43%) runtime.lockWithRank 870 -> 890 (+2.30%) runtime.tracebackPCs changed runtime.mallocinit changed strconv strconv.ryuFtoaFixed32 changed strconv.ryuFtoaFixed64 639 -> 638 (-0.16%) strconv.readFloat changed strconv.ryuFtoaShortest changed strings strings.(*Replacer).build changed strconv [cmd/compile] strconv.readFloat changed strconv.ryuFtoaFixed64 639 -> 638 (-0.16%) strconv.ryuFtoaFixed32 changed strconv.ryuFtoaShortest changed strings [cmd/compile] strings.(*Replacer).build changed regexp regexp.makeOnePass.func1 changed regexp [cmd/compile] regexp.makeOnePass.func1 changed encoding/json encoding/json.indirect changed database/sql database/sql.driverArgsConnLocked changed vendor/golang.org/x/text/unicode/norm vendor/golang.org/x/text/unicode/norm.Form.transform changed go/doc/comment go/doc/comment.parseSpans changed internal/diff internal/diff.tgs changed log/slog log/slog.(*handleState).appendNonBuiltIns 1898 -> 1877 (-1.11%) testing/fstest testing/fstest.(*fsTester).checkGlob changed runtime/pprof runtime/pprof.(*profileBuilder).build changed cmd/internal/dwarf cmd/internal/dwarf.isEmptyInlinedCall 254 -> 244 (-3.94%) go/printer go/printer.keepTypeColumn 302 -> 270 (-10.60%) go/printer.(*printer).binaryExpr changed cmd/compile/internal/syntax cmd/compile/internal/syntax.(*scanner).rune changed cmd/compile/internal/syntax.(*scanner).number 2137 -> 2153 (+0.75%) Change-Id: I7f95f54b03a35d0b616c40f38b415a7feb71be73 Reviewed-on: https://go-review.googlesource.com/c/go/+/666835 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Jakub Ciolek <jakub@ciolek.dev> TryBot-Bypass: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01cmd/compile: improve multiplication strength reductionKeith Randall
Use an automatic algorithm to generate strength reduction code. You give it all the linear combination (a*x+b*y) instructions in your architecture, it figures out the rest. Just amd64 and arm64 for now. Fixes #67575 Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/626998 Reviewed-by: Jakub Ciolek <jakub@ciolek.dev> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01cmd/compile,internal/cpu,runtime: intrinsify math/bits.OnesCount on riscv64Joel Sing
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount using the CPOP/CPOPW machine instructions. Since the native Go implementation of OnesCount is relatively expensive, it is also worth emitting a check for Zbb support when compiled for rva20u64. On a Banana Pi F3, with GORISCV64=rva22u64: │ oc.1 │ oc.2 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 4.389n ± 0% -74.08% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.016n ± 0% -11.10% (p=0.000 n=10) OnesCount16-8 9.404n ± 0% 5.015n ± 0% -46.67% (p=0.000 n=10) OnesCount32-8 13.165n ± 0% 4.388n ± 0% -66.67% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 4.388n ± 0% -73.08% (p=0.000 n=10) geomean 11.40n 4.629n -59.40% On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb detection enabled: │ oc.3 │ oc.4 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 5.643n ± 0% -66.67% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.642n ± 0% ~ (p=0.447 n=10) OnesCount16-8 10.030n ± 0% 6.896n ± 0% -31.25% (p=0.000 n=10) OnesCount32-8 13.170n ± 0% 5.642n ± 0% -57.16% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 5.642n ± 0% -65.39% (p=0.000 n=10) geomean 11.55n 5.873n -49.16% On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb detection disabled: │ oc.3 │ oc.5 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.93n ± 0% 29.47n ± 0% +74.07% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.643n ± 0% ~ (p=0.191 n=10) OnesCount16-8 10.03n ± 0% 15.05n ± 0% +50.05% (p=0.000 n=10) OnesCount32-8 13.17n ± 0% 18.18n ± 0% +38.04% (p=0.000 n=10) OnesCount64-8 16.30n ± 0% 21.94n ± 0% +34.60% (p=0.000 n=10) geomean 11.55n 15.84n +37.16% For hardware without Zbb, this adds ~5ns overhead, while for hardware with Zbb we achieve a performance gain up of up to 11ns. It is worth noting that OnesCount8 is cheap enough that it is preferable to stick with the generic version in this case. Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5 Reviewed-on: https://go-review.googlesource.com/c/go/+/660856 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01cmd/compile: intrinsify math/bits.Bswap on riscv64Joel Sing
For riscv64/rva22u64 and above, we can intrinsify math/bits.Bswap using the REV8 machine instruction. On a StarFive VisionFive 2 with GORISCV64=rva22u64: │ rb.1 │ rb.2 │ │ sec/op │ sec/op vs base │ ReverseBytes-4 18.790n ± 0% 4.026n ± 0% -78.57% (p=0.000 n=10) ReverseBytes16-4 6.710n ± 0% 5.368n ± 0% -20.00% (p=0.000 n=10) ReverseBytes32-4 13.420n ± 0% 5.368n ± 0% -60.00% (p=0.000 n=10) ReverseBytes64-4 17.450n ± 0% 4.026n ± 0% -76.93% (p=0.000 n=10) geomean 13.11n 4.649n -64.54% Change-Id: I26eee34270b1721f7304bb1cddb0fda129b20ece Reviewed-on: https://go-review.googlesource.com/c/go/+/660855 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-04-24cmd/compile: put constant value on node inside parenthesesKeith Randall
That's where the unified IR writer expects it. Fixes #73476 Change-Id: Ic22bd8dee5be5991e6d126ae3f6eccb2acdc0b19 Reviewed-on: https://go-review.googlesource.com/c/go/+/667415 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com>
2025-04-24cmd/compile: add cast in range loop final value computationKeith Randall
When replacing a loop where the iteration variable has a named type, we need to compute the last iteration value as i = T(len(a)-1), not just i = len(a)-1. Fixes #73491 Change-Id: Ic1cc3bdf8571a40c10060f929a9db8a888de2b70 Reviewed-on: https://go-review.googlesource.com/c/go/+/667815 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-04-23runtime: use precise bounds of Go data/bss for race detectorKeith Randall
We only want to call into the race detector for Go global variables. By rounding up the region bounds, we can include some C globals. Even worse, we can include only *part* of a C global, leading to race{read,write}range calls which straddle the end of shadow memory. That causes the race detector to barf. Fix some off-by-one errors in the assembly comparisons. We want to skip calling the race detector when addr == racedataend. Fixes #73483 Change-Id: I436b0f588d6165b61f30cb7653016ba9b7cbf585 Reviewed-on: https://go-review.googlesource.com/c/go/+/667655 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2025-04-22cmd/compile: constant fold 128-bit multipliesKeith Randall
The full 64x64->128 multiply comes up when using bits.Mul64. The 64x64->64+overflow multiply comes up in unsafe.Slice when using a constant length. Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc Reviewed-on: https://go-review.googlesource.com/c/go/+/667175 Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-22sync: use atomic.Bool for Once.donePrabhav Dogra
Updated the use of atomic.Uint32 to atomic.Bool for sync package. Change-Id: Ib8da66fea86ef06e1427ac5118016b96fbcda6b1 GitHub-Last-Rev: d36e0f431fcde988f90badf86bbf04a18a411947 GitHub-Pull-Request: golang/go#73447 Reviewed-on: https://go-review.googlesource.com/c/go/+/666895 Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
2025-04-21cmd/compile: ensure we evaluate side effects of len() argKeith Randall
For any len() which requires the evaluation of its arg (according to the spec). Update #72844 Change-Id: Id2b0bcc78073a6d5051abd000131dafdf65e7f26 Reviewed-on: https://go-review.googlesource.com/c/go/+/658097 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2025-04-21cmd/compile: don't evaluate side effects of range over arrayKeith Randall
If the thing we're ranging over is an array or ptr to array, and it doesn't have a function call or channel receive in it, then we shouldn't evaluate it. Typecheck the ranged-over value as a constant in that case. That makes the unified exporter replace the range expression with a constant int. Change-Id: I0d4ea081de70d20cf6d1fa8d25ef6cb021975554 Reviewed-on: https://go-review.googlesource.com/c/go/+/659317 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>
2025-04-09cmd/compile: set unalignedOK to make memcombine work properly on loong64limeidan
goos: linux goarch: loong64 pkg: unicode/utf8 cpu: Loongson-3A6000-HV @ 2500.00MHz │ old │ new │ │ sec/op │ sec/op vs base │ ValidTenASCIIChars 7.604n ± 0% 6.805n ± 0% -10.51% (p=0.000 n=10) Valid100KASCIIChars 37.41µ ± 0% 16.58µ ± 0% -55.67% (p=0.000 n=10) ValidTenJapaneseChars 60.84n ± 0% 58.62n ± 0% -3.64% (p=0.000 n=10) ValidLongMostlyASCII 113.5µ ± 0% 113.5µ ± 0% ~ (p=0.303 n=10) ValidLongJapanese 204.6µ ± 0% 206.8µ ± 0% +1.07% (p=0.000 n=10) ValidStringTenASCIIChars 7.604n ± 0% 6.803n ± 0% -10.53% (p=0.000 n=10) ValidString100KASCIIChars 38.05µ ± 0% 17.14µ ± 0% -54.97% (p=0.000 n=10) ValidStringTenJapaneseChars 60.58n ± 0% 59.48n ± 0% -1.82% (p=0.000 n=10) ValidStringLongMostlyASCII 113.5µ ± 0% 113.4µ ± 0% -0.10% (p=0.000 n=10) ValidStringLongJapanese 205.9µ ± 0% 207.3µ ± 0% +0.67% (p=0.000 n=10) geomean 3.324µ 2.756µ -17.08% Change-Id: Id43b6e2e41907bd4b92f421dacde31f048db47d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/662495 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Keith Randall <khr@google.com>
2025-04-07cmd/compile: add additional flag constant folding rulesKeith Randall
Fixes #73200 Change-Id: I77518d37acd838acf79ed113194bac5e2c30897f Reviewed-on: https://go-review.googlesource.com/c/go/+/663535 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-04-06cmd/compile: be more conservative about arm64 insns that can take zero registerKeith Randall
It's really only needed for stores and store-like instructions (atomic exchange, compare-and-swap, ...). Fixes #73180 Change-Id: I8ecd833a301355adf0fa4bff43250091640c6226 Reviewed-on: https://go-review.googlesource.com/c/go/+/663155 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
2025-04-04cmd/compile: allow pointer-containing elements in stack allocationsKeith Randall
For variable-sized allocations. Turns out that we already implement the correct escape semantics for this case. Even when the result of the "make" does not escape, everything assigned into it does. Change-Id: Ia123c538d39f2f1e1581c24e4135a65af3821c5e Reviewed-on: https://go-review.googlesource.com/c/go/+/657937 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>
2025-04-04cmd/compile: stack allocate variable-sized makesliceKeith Randall
Instead of always allocating variable-sized "make" calls on the heap, allocate a small, constant-sized array on the stack and use that array as the backing store if it is big enough. Requires the result of the "make" doesn't escape. if cap <= K { var arr [K]E slice = arr[:len:cap] } else { slice = makeslice(E, len, cap) } Pretty conservatively for now, K = 32/sizeof(E). The slice header is already 24 bytes, so wasting 32 bytes of stack if the requested size is too big isn't that bad. Larger would waste more stack space but maybe avoid more allocations. This CL also requires the element type be pointer-free. Maybe we could relax that at some point, but it is hard. If the element type has pointers we can get heap->stack pointers (in the case where the requested size is too big and the slice is heap allocated). Note that this only handles the case of makeslice called directly from compiler-generated code. It does not handle slices built in the runtime on behalf of the program (e.g. in growslice). Some of those are currently handled by passing in a tmpBuf (e.g. concatstrings), but we could probably do more. Change-Id: I8378efad527cd00d25948a80b82a68d88fbd93a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/653856 Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-04-04cmd/compile: improve store-to-load forwarding with compatible typesAlexander Musman
Improve the compiler's store-to-load forwarding optimization by relaxing the type comparison condition. Instead of requiring exact type equality (CMPeq), we now use copyCompatibleType which allows forwarding between compatible types where safe. Fix several size comparison bugs in the nested store patterns. Previously, we were comparing the size of the outer store with the load type, rather than comparing with the size of the actual store being forwarded from. Skip OpConvert in dead store elimination to help get rid of dead stores such as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory. This optimization is particularly beneficial for code that creates slices with computed pointers, such as the runtime's heapBitsSlice function, where intermediate calculations were previously causing the compiler to miss store-to-load forwarding opportunities. Local sweet run result on an x86_64 laptop: │ Orig.res │ Hopt.res │ │ sec/op │ sec/op vs base │ BiogoIgor-8 5.303 ± 1% 5.322 ± 1% ~ (p=0.190 n=10) BiogoKrishna-8 7.894 ± 1% 7.828 ± 2% ~ (p=0.190 n=10) BleveIndexBatch100-8 2.257 ± 1% 2.248 ± 2% ~ (p=0.529 n=10) EtcdPut-8 30.12m ± 1% 30.03m ± 1% ~ (p=0.796 n=10) EtcdSTM-8 127.1m ± 1% 126.2m ± 0% -0.74% (p=0.023 n=10) GoBuildKubelet-8 52.21 ± 0% 52.05 ± 1% ~ (p=0.063 n=10) GoBuildKubeletLink-8 4.342 ± 1% 4.305 ± 0% -0.85% (p=0.000 n=10) GoBuildIstioctl-8 43.33 ± 0% 43.24 ± 0% -0.22% (p=0.015 n=10) GoBuildIstioctlLink-8 4.604 ± 1% 4.598 ± 0% ~ (p=0.063 n=10) GoBuildFrontend-8 15.33 ± 0% 15.29 ± 0% ~ (p=0.143 n=10) GoBuildFrontendLink-8 740.0m ± 1% 737.7m ± 1% ~ (p=0.912 n=10) GopherLuaKNucleotide-8 9.590 ± 1% 9.656 ± 1% ~ (p=0.165 n=10) MarkdownRenderXHTML-8 96.97m ± 1% 97.26m ± 2% ~ (p=0.105 n=10) Tile38QueryLoad-8 335.9µ ± 1% 335.6µ ± 1% ~ (p=0.481 n=10) geomean 1.336 1.333 -0.22% Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f Reviewed-on: https://go-review.googlesource.com/c/go/+/662075 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Carlos Amedee <carlos@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2025-03-28cmd/compile/internal/ssa: optimise more branches with zero on riscv64Joel Sing
Optimise more branches with zero on riscv64. In particular, BLTU with zero occurs with IsInBounds checks for index zero. This currently results in two instructions and requires an additional register: li t2, 0 bltu t2, t1, 0x174b4 This is equivalent to checking if the bounds is not equal to zero. With this change: bnez t1, 0x174c0 This removes more than 500 instructions from the Go binary on riscv64. Change-Id: I6cd861d853e3ef270bd46dacecdfaa205b1c4644 Reviewed-on: https://go-review.googlesource.com/c/go/+/606715 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-27cmd/compile: rename some test packages in codegenMark Freeman
All other files here use the codegen package. Change-Id: I714162941b9fa9051dacc29643e905fe60b9304b Reviewed-on: https://go-review.googlesource.com/c/go/+/661135 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org>