path: root/src/cmd/compile/internal/ssa/_gen
Age  Commit message  Author
2026-01-28  cmd/compile, simd: capture VAES instructions and fix AVX512VAES feature  Junyang Shao
The code previously filtered out VAES-only instructions; this CL adds them back.

This CL adds the VAES feature check following the Intel XED data:

XED_ISA_SET_VAES: vaes.7.0.ecx.9 # avx.1.0.ecx.28

This CL also found that the old AVX512VAES feature check was not checking the correct bits, and fixes it:

XED_ISA_SET_AVX512_VAES_128: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31
XED_ISA_SET_AVX512_VAES_256: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31
XED_ISA_SET_AVX512_VAES_512: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16

It restricts the check to the strictest common set: avx512vl is included even for 512-bit, although 512-bit does not require it.

Change-Id: I4e2f72b312fd2411589fbc12f9ee5c63c09c2e9a
Reviewed-on: https://go-review.googlesource.com/c/go/+/738500
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-01-28  cmd/compile: (loong64) optimize float32(abs|sqrt64(float64(x)))  Xiaolin Zhao
Ref: #733621
Updates #75463

Change-Id: Idd8821d1713754097a2fe83a050c25d9ec5b17eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/735540
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-01-28  cmd/compile: remove the NORconst op on mips{,64}  Xiaolin Zhao
In the mips{,64} instruction sets and their extensions, there is no NORI instruction.

Change-Id: If008442c792297d011b3d0c1e8501e62e32ab175
Reviewed-on: https://go-review.googlesource.com/c/go/+/735900
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2026-01-27  cmd/compile, runtime: avoid improper control transfer instruction hints on riscv64  wangboyao
On RISC-V the JAL and JALR instructions provide Return Address Stack (RAS) prediction hints based on the registers used (as per section 2.5.1 of the RISC-V ISA manual).

When a JALR instruction uses X1 or X5 as the source register, it hints that a pop should occur.

When making a function call, avoid the use of X5 as a source register since this results in the RAS performing a pop-then-push instead of a push, breaking call/return pairing and significantly degrading front-end branch prediction performance.

Based on test results from golang.org/x/benchmarks/json on a SpacemiT K1, the fixed version shows a performance improvement of about 7%.

Fixes #76654

Change-Id: I867c8d7cfb54f5decbe176f3ab3bb3d78af1cf64
Reviewed-on: https://go-review.googlesource.com/c/go/+/726760
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
2026-01-23  cmd/compile: on amd64 use 32bits copies for 64bits copies of 32bits values  Jorropo
Fixes #76449

This saves a single byte for the REX prefix per OpCopy it triggers on.

Change-Id: I1eab364d07354555ba2f23ffd2f9c522d4a04bd0
Reviewed-on: https://go-review.googlesource.com/c/go/+/731640
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-01-23  cmd/compile: cleanup isUnsignedPowerOfTwo  Jorropo
Merge the signed and unsigned generic functions.

The only implementation difference between the two is the n > 0 vs n != 0 check.

For unsigned numbers, n > 0 is equivalent to n != 0, and we in fact optimize the first to the second.

Change-Id: Ia2f6c3e3d4eb098d98f85e06dc2e81baa60bad4e
Reviewed-on: https://go-review.googlesource.com/c/go/+/726720
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
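The merge described in the commit above can be illustrated with a single generic helper. This is only a sketch: the constraint name `integer` and the function name `isPowerOfTwo` are hypothetical and are not taken from the compiler source. It relies on the point made in the message, namely that `n > 0` is correct for signed types and is the same test as `n != 0` for unsigned types.

```go
package main

import "fmt"

// integer is a hypothetical constraint covering the fixed-width integer types.
type integer interface {
	~int8 | ~int16 | ~int32 | ~int64 | ~uint8 | ~uint16 | ~uint32 | ~uint64
}

// isPowerOfTwo reports whether n is a positive power of two.
// For unsigned T, n > 0 is the same test as n != 0, so one body serves both.
func isPowerOfTwo[T integer](n T) bool {
	return n > 0 && n&(n-1) == 0
}

func main() {
	fmt.Println(isPowerOfTwo(int32(64)))  // signed, power of two
	fmt.Println(isPowerOfTwo(int32(-64))) // signed, negative
	fmt.Println(isPowerOfTwo(uint8(128))) // unsigned, top bit only
	fmt.Println(isPowerOfTwo(uint64(0)))  // zero is not a power of two
}
```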
2026-01-23  cmd/compile: avoid extending when already sufficiently shifted on loong64  Xiaolin Zhao
This reduces 744 instructions from the go toolchain binary on loong64.

file          before     after      Δ      %
asm           599282     599222     -60    -0.0100%
cgo           513606     513534     -72    -0.0140%
compile       2939250    2939146    -104   -0.0035%
cover         564136     564056     -80    -0.0142%
fix           895622     895546     -76    -0.0085%
link          759460     759376     -84    -0.0111%
preprofile    264960     264916     -44    -0.0166%
vet           869964     869888     -76    -0.0087%
go            1712990    1712890    -100   -0.0058%
gofmt         346416     346368     -48    -0.0139%
total         9465686    9464942    -744   -0.0079%

Change-Id: I32dfa7506d0458ca0b6de83b030c330cd2b82176
Reviewed-on: https://go-review.googlesource.com/c/go/+/725720
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2026-01-22  cmd/compile: ensure ops have the expected argument widths  Keith Randall
The generic SSA representation uses explicit extension and truncation operations to change widths of values. The map intrinsics were playing somewhat fast and loose with this requirement. Fix that, and add a check to make sure we don't regress.

I don't think there is a triggerable bug here, but I ran into this with some prove pass modifications, where cmd/compile/internal/ssa/prove.go:isCleanExt (and/or its uses) is actually wrong when this invariant is not maintained.

Change-Id: Idb7be6e691e2dbf6d7af6584641c3227c5c64bf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/731300
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2026-01-22  cmd/compile: optimize small constant-sized MemEq  Alexander Musman
Add optimization patterns for MemEq with small constant sizes (3-32 bytes). These patterns help to avoid runtime calls for small sizes.

For sizes 3-16, load and compare two chunks and combine the results. For sizes 17-32, combine a 16-byte comparison with a comparison of the remaining bytes.

This change may increase binary size slightly due to inline expansion, but improves performance for code with many small memequals, e.g. the DecodehealingTracker benchmark on arm64:

shortname: minio
pkg: github.com/minio/minio/cmd
                                │  Orig.res   │              Uexp.res               │
                                │   sec/op    │   sec/op     vs base                │
DecodehealingTracker-4            842.5n ± 1%   794.0n ± 3%  -5.75% (p=0.000 n=10)
AppendMsgResyncTargetsInfo-4      8.472n ± 0%   8.472n ± 0%       ~ (p=0.582 n=10)
DataUpdateTracker-4               2.856µ ± 2%   2.804µ ± 3%       ~ (p=0.210 n=10)
MarshalMsgdataUsageCacheInfo-4    131.2n ± 1%   131.6n ± 2%       ~ (p=0.494 n=10)
geomean                           227.4n        223.2n       -1.86%

                                │   Orig.res   │              Uexp.res               │
                                │     B/s      │     B/s       vs base               │
DecodehealingTracker-4            352.0Mi ± 1%   373.5Mi ± 3%  +6.10% (p=0.000 n=10)
AppendMsgResyncTargetsInfo-4      1.099Gi ± 0%   1.099Gi ± 0%       ~ (p=0.183 n=10)
DataUpdateTracker-4               341.8Ki ± 3%   351.6Ki ± 3%       ~ (p=0.286 n=10)
geomean                           50.95Mi        52.46Mi       +2.96%

Change-Id: If3d7e7395656d5f36e3ab303a71044293d17bc3e
Reviewed-on: https://go-review.googlesource.com/c/go/+/688195
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
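One plausible reading of the "two chunks" idea in the commit above is a pair of fixed-width loads whose ranges overlap when the size is not a multiple of the chunk width. The sketch below models that at the source level for sizes 4 through 8 with 4-byte chunks. The function name `equal4to8` is hypothetical, and the overlap scheme is an assumption for illustration, not a transcription of the SSA rules added by the CL.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// equal4to8 compares the first n bytes of a and b for 4 <= n <= 8
// using two 4-byte loads per operand: one at offset 0 and one at
// offset n-4. The two ranges overlap unless n == 8, and together
// they cover every byte in [0, n).
func equal4to8(a, b []byte, n int) bool {
	lo := binary.LittleEndian.Uint32(a[:4]) ^ binary.LittleEndian.Uint32(b[:4])
	hi := binary.LittleEndian.Uint32(a[n-4:n]) ^ binary.LittleEndian.Uint32(b[n-4:n])
	// Any differing byte leaves a nonzero bit in lo or hi.
	return (lo | hi) == 0
}

func main() {
	fmt.Println(equal4to8([]byte("hello!!"), []byte("hello!!"), 7))
	fmt.Println(equal4to8([]byte("hello!!"), []byte("hellO!!"), 7))
}
```

The point of the shape is that a single branch-free expression replaces a call, at the cost of a few extra inline instructions, which matches the size/speed trade-off the message mentions.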
2026-01-13  simd/archsimd: 128- and 256-bit FMA operations do not require AVX-512  Austin Clements
Currently, all FMA operations are marked as requiring AVX512, even on smaller vector widths. This is happening because the narrower FMA operations are marked as extension "FMA" in the XED. Since this extension doesn't start with "AVX", we filter them out very early in the XED process. However, this is just a quirk of naming: the FMA feature depends on the AVX feature, so it is part of AVX, even if it doesn't say so on the tin.

Fix this by accepting the FMA extension and adding FMA to the table of CPU features. We also tweak internal/cpu slightly so it correctly enforces that the logical FMA feature depends on both the FMA and AVX CPUID flags.

This actually *deletes* a lot of generated code because we no longer need the AVX-512 encoding of these 128- and 256-bit operations.

Change-Id: I744a18d0be888f536ac034fe88b110347622be7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/736160
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/736201
Reviewed-by: Austin Clements <austin@google.com>
2026-01-08  simd/archsimd: rename Broadcast methods  Cherry Mui
Currently the Broadcast128/256/512 methods broadcast the lowest element of the input vector to a vector of the corresponding width. There are also variations of broadcast operations that broadcast the whole (128- or 256-bit) vector to a larger vector, which we don't yet support. Our current naming does not make clear which version it is, though. Rename the current ones to Broadcast1ToN, to be clear that they broadcast one element. The vector version probably will be named BroadcastAllToN (not included in this CL).

Change-Id: I47a21e367f948ec0b578d63706a40d20f5a9f46d
Reviewed-on: https://go-review.googlesource.com/c/go/+/734840
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2026-01-05  simd/archsimd: use V(P)MOVMSK for mask ToBits if possible  Cherry Mui
VPMOVMSKB, VMOVMSKPS, and VMOVMSKPD move AVX1/2-style masks to integer registers, similar to VPMOV[BWDQ]2M (which moves to mask registers). The former is available on AVX1/2, the latter requires AVX512. So use the former if it is supported, i.e. for 128- and 256-bit vectors with 8-, 32-, and 64-bit elements (16-bit elements always require AVX512).

Change-Id: I972195116617ed2faaf95cee5cd6b250e671496c
Reviewed-on: https://go-review.googlesource.com/c/go/+/734060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
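As a scalar illustration of what a MOVMSK-style instruction computes in the commit above: an AVX1/2-style mask keeps each lane as all ones or all zeros, and the instruction gathers the top (sign) bit of every lane into the low bits of an integer. The sketch models the 8-lane, 32-bit-element case. The name `moveMask32x8` is hypothetical and is not an archsimd API.

```go
package main

import "fmt"

// moveMask32x8 models a VMOVMSKPS-like operation on a 256-bit vector of
// eight 32-bit lanes: bit i of the result is the sign bit of lane i.
func moveMask32x8(v [8]int32) uint8 {
	var m uint8
	for i, lane := range v {
		if lane < 0 { // sign bit set
			m |= uint8(1) << i
		}
	}
	return m
}

func main() {
	// Lanes 0, 2 and 7 are "true" (all ones); the rest are "false".
	mask := [8]int32{-1, 0, -1, 0, 0, 0, 0, -1}
	fmt.Printf("%#08b\n", moveMask32x8(mask))
}
```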
2026-01-02  cmd/compile: optimize SIMD IsNaN.Or(IsNaN)  Cherry Mui
IsNaN's underlying instruction, VCMPPS (or VCMPPD), takes two inputs, and computes whether either of them is NaN. Optimize the Or pattern to generate the two-operand form.

This implements the optimization mentioned in CL 733660.

Change-Id: I13943b377ee384864c913eed320763f333a03e41
Reviewed-on: https://go-review.googlesource.com/c/go/+/733680
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-01-02  simd/archsimd: make IsNaN unary  Cherry Mui
Currently, the IsNan API is defined as x.IsNan(y), which returns a mask to represent, for each element, whether either x or y is NaN. Albeit closer to the machine instruction, this is a weird API, as IsNaN is a unary operation. This CL changes it to unary, x.IsNaN(). It compiles to VCMPPS $3, x, x (or VCMPPD). For the two-operand version, we can optimize x.IsNaN().Or(y.IsNaN()) to VCMPPS $3, x, y (not done in this CL).

While here, change the name to IsNaN (uppercase both Ns), which matches math.IsNaN.

Tests in the next CL.

Change-Id: Ib6e7afc2635e6c3c606db5ea16420ee673a6c6d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/733660
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-29  simd/archsimd: remove redundant suffix of ExtendLo operations  Cherry Mui
For methods like ExtendLo2ToInt64x2, the last "x2" is redundant, as it is already mentioned in "Lo2". Remove it, so it is just ExtendLo2ToInt64.

Change-Id: I490afd818c40bb7a4ef15c249723895735bd6488
Reviewed-on: https://go-review.googlesource.com/c/go/+/733100
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-12-24  simd/archsimd: fix "go generate" command  Cherry Mui
Correct the generate command for test helpers. There is no longer a genfiles.go.

Also correct the generated file headers to match the current generator layout.

Change-Id: Ifb9a8c394477359020ff44290dbaabe7a2d59aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/732280
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: David Chase <drchase@google.com>
2025-12-22  simd/archsimd: correct documentation for pairwise operations  Cherry Mui
For Add/SubPairs(Saturated?), the documented result element order is wrong. Corrected.

Also, for 256-bit vectors, this is a grouped operation. So name it with the Grouped suffix to be clear.

Change-Id: Idfd0975cb4a332b2e28c898613861205d26f75b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/732020
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
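To make "grouped" in the commit above concrete, the sketch below models horizontal pairwise addition the way the x86 VPHADDD instruction behaves: within each 128-bit group, the low half of the result holds the pair sums of the first operand and the high half holds the pair sums of the second. The function names are hypothetical, and the assumption that the archsimd methods map to this instruction semantics is an illustration, not something stated in the commit.

```go
package main

import "fmt"

// addPairs128 models pairwise addition on one 128-bit group of four
// 32-bit lanes: pair sums of a first, then pair sums of b.
func addPairs128(a, b [4]int32) [4]int32 {
	return [4]int32{a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]}
}

// addPairsGrouped256 applies addPairs128 independently to each 128-bit
// group of a 256-bit vector, so sums never cross the group boundary.
func addPairsGrouped256(a, b [8]int32) [8]int32 {
	var r [8]int32
	for g := 0; g < 2; g++ {
		lo, hi := 4*g, 4*g+4
		group := addPairs128([4]int32(a[lo:hi]), [4]int32(b[lo:hi]))
		copy(r[lo:hi], group[:])
	}
	return r
}

func main() {
	a := [8]int32{1, 2, 3, 4, 5, 6, 7, 8}
	b := [8]int32{10, 20, 30, 40, 50, 60, 70, 80}
	fmt.Println(addPairsGrouped256(a, b))
}
```

Note how the result interleaves a and b per group rather than placing all of a's pair sums first, which is the kind of element-order detail the documentation fix is about.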
2025-12-19  simd/archsimd: delete DotProductQuadruple methods for now  Cherry Mui
The DotProductQuadruple methods are currently defined on Int8 vectors. There are some problems with that.

1. We defined a DotProductQuadrupleSaturated method, but the dot product part does not need saturation, as it cannot overflow. It is the addition part of VPDPBUSDS that does the saturation. Currently we have optimization rules like

   x.DotProductQuadrupleSaturated(y).Add(z) -> VPDPBUSDS

   which is incorrect, in that the dot product doesn't do (or need) saturation, and the Add is a regular Add, but we rewrite it to a saturated add. The correct rule should be something like

   x.DotProductQuadruple(y).AddSaturated(z) -> VPDPBUSDS

2. There are multiple flavors of DotProductQuadruple: signed/unsigned × signed/unsigned, which cannot be completely disambiguated by the type. The current naming may preclude adding all the flavors.

For these reasons, remove the methods for now. We can add them later with the issues addressed.

Change-Id: I549c0925afaa68c7e2cc956105619f2c1b46b325
Reviewed-on: https://go-review.googlesource.com/c/go/+/731441
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-12-19  simd/archsimd: add Grouped for 256- and 512-bit SaturateTo(U)Int16Concat, and fix type  Cherry Mui
They operate on 128-bit groups, so name them Grouped to be clear, and consistent with other grouped operations.

Reword the documentation, mention the grouping only for grouped versions.

Also, SaturateToUint16Concat(Grouped) is a signed int32 to unsigned uint16 saturated conversion. The receiver and the parameter should be signed. The result remains unsigned.

Change-Id: I30e28bc05e07f5c28214c9c6d9d201cbbb183468
Reviewed-on: https://go-review.googlesource.com/c/go/+/731501
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-19  simd/archsimd: correct type and instruction for SaturateToUint8  Cherry Mui
It should be defined on unsigned types, not signed types, and use unsigned conversion instructions.

Change-Id: I49694ccdf1d331cfde88591531c358d9886e83e6
Reviewed-on: https://go-review.googlesource.com/c/go/+/731500
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-12-16  cmd/compile: use unsigned constant when folding loads for SIMD ops with constants  Cherry Mui
When folding loads into a SIMD op with a constant, in the SSA rules we use makeValAndOff to create an AuxInt for the constant and the offset. For the SIMD ops of concern (for now), the constants are always unsigned. So pass the constant unsigned.

Fixes #76756.

Change-Id: Ia5910e689ff510ce54d3a0c2ed0e950bc54f8862
Reviewed-on: https://go-review.googlesource.com/c/go/+/730420
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-08  [dev.simd] cmd/compile: zero only low 128-bit of X15  Cherry Mui
Zeroing the upper part of X15 may make the CPU think it is "dirty" and slow down SSE operations. For now, just do not zero the upper part, and construct a zero value on the fly if we need a 256- or 512-bit zero value.

Maybe VZEROUPPER works better than explicitly zeroing X15, but we need to evaluate. Long term, we probably want to move more things from SSE to AVX.

This essentially undoes CL 698237 and CL 698238, except that it keeps using X15 for 128-bit zeroing for SIMD.

Change-Id: I1564e6332c4c57f9721397c92c7c734c5497534c
Reviewed-on: https://go-review.googlesource.com/c/go/+/728240
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-12-08  [dev.simd] simd: add carryless multiply  David Chase
Now with comments, and also a test. The choice of data types, method names, etc., is all up for comment. It's NOT commutative, because of the immediate operand (unless we swap the bits of the immediate).

Change-Id: I730a6938c6803d0b93544445db65eadc51783e42
Reviewed-on: https://go-review.googlesource.com/c/go/+/726963
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-03  [dev.simd] all: merge master (5945fc0) into dev.simd  David Chase
Merge List: + 2025-12-03 5945fc02fc doc/next: delete + 2025-12-03 dcc5fe0c62 api: promote next to go1.26 + 2025-12-03 7991da1161 crypto/hpke: remove unused hybridKEM field + 2025-12-03 2729e87aa5 doc/next: pluralize 'result' + 2025-12-03 6e72f526cd doc/next/6-stdlib/99-minor/go/ast/76031.md: add BasicLit caveat + 2025-12-03 fa30b68767 go/{ast,doc}: update BasicLit.ValueEnd as well as ValuePos + 2025-12-03 32a9804c7b cmd/link: don't update offset of existing ELF section name + 2025-12-02 509ddf3868 cmd/compile: ensure bloop only kept alive addressable nodes + 2025-12-02 7cab1b1b26 doc: pre-announce removal of gotypesalias and asynctimerchan GODEBUG flags + 2025-12-02 1a64db3a4b spec: remove restriction on channel element types for close built-in (bug fix) + 2025-12-02 2e06fa6b68 doc/next: release note for scheduler metrics + 2025-12-02 77c795011b doc/next: document cgo call overhead improvement + 2025-12-02 6e4abe8cef doc: mention stack allocation of slices + 2025-12-02 88c24de8b5 doc/next: add section for Green Tea + 2025-12-02 043b9de658 net: parse addresses without separators in ParseMac + 2025-12-02 e432b4f3a1 cmd/compile: more generated equality function tests + 2025-12-02 c1acdcb345 crypto/x509: prevent HostnameError.Error() from consuming excessive resource + 2025-12-02 8ae5d408ed spec: more precise prose for built-in function new + 2025-12-02 c5c05a0e43 cmd/go: add test checking version with experiment is valid + 2025-12-01 f22d37d574 runtime/internal/testprog: log initial SchedMetrics GOMAXPROCS + 2025-12-01 8b5db48db1 net/http: deflake TestClientConnReserveAndConsume + 2025-12-01 94616dad42 internal/runtime/cgroup: remove duplicate readString definition + 2025-12-01 67851547d8 internal/runtime/cgroup: lineReader fuzz test + 2025-12-01 ac3e0ae51a doc: document go tool pprof -http default change + 2025-12-01 42e03bbd27 debug/elf: correct case of DWARF in comment + 2025-12-01 18015e8c36 doc/next: clean up some Go 1.26 release notes + 2025-12-01 4be545115c 
cmd/pprof: update vendored github.com/google/pprof + 2025-12-01 16c0f7e152 cmd/compile: run go generate for internal/ir + 2025-12-01 dc913c316a all: update vendored dependencies + 2025-12-01 1555fad47d vendor/golang.org/x/tools: update to 1ad6f3d + 2025-12-01 eec1afeb28 debug/elf: make check for empty symbol section consistent for 64-bit and 32-bit binaries + 2025-11-28 3f94f3d4b2 test/codegen: fix shift tests on riscv64 + 2025-11-28 2ac1f9cbc3 cmd/compile: avoid unnecessary interface conversion in bloop + 2025-11-28 de456450e7 runtime/secret: disable tests under memory validating modes + 2025-11-27 67d4a28707 fmt: document space behavior of Append + 2025-11-27 c079dd13c0 runtime/secret: reorganize tests to fix -buildmode=shared + 2025-11-27 2947cb0469 runtime/_mkmalloc: fix log.Fatal formatting directive + 2025-11-26 cead111a77 internal/runtime/cgroup: stricter unescapePath + 2025-11-26 c2af9f14b4 internal/runtime/cgroup: fix path on non-root mount point + 2025-11-26 6be5de4bc4 internal/runtime/cgroup: simplify escapePath in test + 2025-11-26 481c6df7b9 io: reduce intermediate allocations in ReadAll and have a smaller final result + 2025-11-26 cec4d4303f os: allow direntries to have zero inodes on Linux + 2025-11-26 f1bbc66a10 cmd/link: test that moduledata is in its own section + 2025-11-26 003f52407a cmd/link: test that findfunctab is in gopclntab section + 2025-11-26 21b6ab57d5 cmd/link: test that funcdata values are in gopclntab section + 2025-11-26 c03e25a263 cmd/link: always run current linker in tests + 2025-11-26 9f5cd43fe6 cmd/link: put moduledata in its own .go.module section + 2025-11-26 43cfd785e7 cmd/link, runtime, debug/gosym: move pclntab magic to internal/abi + 2025-11-26 312b2034a4 cmd/link: put runtime.findfunctab in the .gopclntab section + 2025-11-26 b437d5bf36 cmd/link: put funcdata symbols in .gopclntab section + 2025-11-26 4bc3410b6c cmd/link: build shstrtab from ELF sections + 2025-11-26 b0c278be40 cmd/link: use shdr as a slice rather than 
counting in elfhdr.Shnum + 2025-11-26 0ff323143d cmd/link: sort allocated ELF section headers by address + 2025-11-26 4879151d1d cmd/compile: introduce alias analysis and automatically free non-aliased memory after growslice + 2025-11-26 d8269ab0d5 cmd/link, cmd/internal/obj: fix a remote call failure issue + 2025-11-26 c6d64f8556 cmd/internal/obj/loong64: remove the incorrect unsigned instructions + 2025-11-26 c048a9a11f go/types, types2: remove InvalidTypeCycle from literals.go + 2025-11-26 ff2fd6327e go/types, types2: remove setDefType and most def plumbing + 2025-11-26 3531ac23d4 go/types, types2: replace setDefType with pending type check + 2025-11-26 2b8dbb35b0 crypto,testing/cryptotest: ignore random io.Reader params, add SetGlobalRandom + 2025-11-26 21ebed0ac0 runtime: update mkmalloc to make generated code look nicer + 2025-11-26 a3fb92a710 runtime/secret: implement new secret package + 2025-11-26 0c747b7aa7 go/build/constraint: use strings.Builder instead of for { str+=str } + 2025-11-26 0f6397384b go/types: relax NewSignatureType for append(slice, str...) 
+ 2025-11-26 992ad55e3d crypto/tls: support crypto.MessageSigner private keys + 2025-11-26 3fd9cb1895 cmd/compile: fix bloop get name logic + 2025-11-26 3353c100bb cmd/go: remove experiment checks for compile -c + 2025-11-26 301d9f9b52 doc/next: document broken freebsd/riscv64 port + 2025-11-26 de39282332 cmd/compile, runtime: guard X15 zeroing with GOEXPERIMENT=simd + 2025-11-26 86bbea0cfa crypto/fips140: add WithoutEnforcement + 2025-11-26 e2cae9ecdf crypto/x509: add ExtKeyUsage.OID method + 2025-11-26 623ef28135 cmd/go: limit total compile -c backend concurrency using a pool + 2025-11-26 3c6bf6fbf3 cmd/compile: handle loops better during stack allocation of slices + 2025-11-26 efe9ad501d go/types, types2: improve printing of []*operand lists (debugging support) + 2025-11-26 ac3369242d runtime: merge all the linux 32 and 64 bits files into one for each + 2025-11-26 fb5156a098 testing: fix bloop doc + 2025-11-26 b194f5d24a os,internal/syscall/windows: support O_* flags in Root.OpenFile + 2025-11-26 e0a4dffb0c cmd/internal/obj/loong64: add {,x}vmadd series instructions support + 2025-11-26 c0f02c11ff cmd/internal/obj/loong64: add aliases to 32-bit arithmetic instructions + 2025-11-26 37ce4adcd4 cmd/compile: add tests bruteforcing limit complement + 2025-11-26 437d2362ce os,internal/poll: don't call IsNonblock for consoles and Stdin + 2025-11-26 71f8f031b2 crypto/internal/fips140/aes: optimize ctrBlocks8Asm on amd64 + 2025-11-26 03fcb33c0e cmd/compile: add tests bruteforcing limit negation and improve limit addition + 2025-11-26 dda7c8253d cmd/compile,internal/bytealg: add MemEq intrinsic for runtime.memequal + 2025-11-26 4976606a2f cmd/go: remove final references to modfetch.Fetcher_ + 2025-11-26 08bf23cb97 cmd/go/internal/toolchain: remove references to modfetch.Fetcher_ + 2025-11-26 46d5e3ea0e cmd/go/internal/modget: remove references to modfetch.Fetcher_ + 2025-11-26 a3a6c9f62a cmd/go/internal/load: remove references to modfetch.Fetcher_ + 2025-11-26 c1ef3d5881 
cmd/go/internal/modcmd: remove references to modfetch.Fetcher_ + 2025-11-26 ab2829ec06 cmd/compile: adjust start heap size + 2025-11-26 54b82e944e internal/trace: support event constructor for testing + 2025-11-25 eb63ef9d66 runtime: panic if cleanup function closes over cleanup pointer + 2025-11-25 06412288cf runtime: panic on AddCleanup with self pointer + 2025-11-25 03f499ec46 cmd/go/internal/modfetch: remove references to Fetcher_ in test file + 2025-11-25 da31fd4177 cmd/go/internal/modload: replace references to modfetch.Fetcher_ + 2025-11-25 07b10e97d6 cmd/go/internal/modcmd: inject modfetch.Fetcher_ into DownloadModule + 2025-11-25 e96094402d cmd/go/internal/modload: inject modfetch.Fetcher_ into commitRequirements + 2025-11-25 47baf48890 cmd/go/internal/modfetch: inject Fetcher_ into TidyGoSum + 2025-11-25 272df5f6ba crypto/internal/fips140/aes/gcm: add more GCM nonce modes + 2025-11-25 1768cb40b8 crypto/tls: add SecP256r1/SecP384r1MLKEM1024 hybrid post-quantum key exchanges + 2025-11-25 a9093067ee cmd/internal/obj/loong64: add {,X}V{ADD,SUB}W{EV,OD}.{H.B,W.H,D.W,Q.D}{,U} instructions support + 2025-11-25 7b904c25a2 cmd/go/internal/modfetch: move global goSum to Fetcher_ + 2025-11-25 e7358c6cf4 cmd/go: remove fips140 dependency on global Fetcher_ + 2025-11-25 89f6dba7e6 internal/strconv: add testbase tests + 2025-11-25 6954be0baa internal/strconv: delete ftoaryu + 2025-11-25 8d6d14f5d6 compress/flate: move big non-pointer arrays to end of compressor + 2025-11-25 4ca048cc32 cmd/internal/obj/riscv: document compressed instructions + 2025-11-25 a572d571fa path: add more examples for path.Clean + 2025-11-25 eec40aae45 maps: use strings.EqualFold in example + 2025-11-25 113eb42efc strconv: replace Ryu ftoa with Dragonbox + 2025-11-25 6e5cfe94b0 crypto: fix dead links and correct SHA-512 algorithm comment + 2025-11-25 2c7c62b972 crypto/internal/fips140/sha512: interleave scheduling with rounds for 10.3% speed-up + 2025-11-25 5b34354bd3 
crypto/internal/fips140/sha256: interleave scheduling and rounds for 11.2% speed-up + 2025-11-25 1cc1337f0a internal/runtime/cgroup: allow more tests to run on all OSes + 2025-11-25 6e4a0d8e44 crypto/internal/fips140/bigmod: vector implementation of addMulVVWx on s390x + 2025-11-25 657b331ff5 net/url: fix example of Values.Encode + 2025-11-25 bd9222b525 crypto/sha3: reduce cSHAKE allocations + 2025-11-25 e3088d6eb8 crypto/hpke: expose crypto/internal/hpke + 2025-11-25 a5ebc6b67c crypto/ecdsa: clean up ECDSA parsing and serialization paths + 2025-11-25 e8fdfeb72b reflect: add iterator equivalents for NumField, NumIn, NumOut and NumMethod + 2025-11-25 12d437c09a crypto/x509: sub-quadratic name constraint checking + 2025-11-25 ed4deb157e crypto/x509: cleanup name constraint tests + 2025-11-25 0d2baa808c crypto/rsa: add EncryptOAEPWithOptions + 2025-11-25 09e377b599 internal/poll: replace t.Sub(time.Now()) with time.Until in test + 2025-11-25 4fb7e083a8 crypto/tls: expose HelloRetryRequest state + 2025-11-24 31d373534e doc: pre-announce removal of 1.23 and earlier crypto GODEBUGs + 2025-11-24 aa093eed83 crypto/fips140: add Version + 2025-11-24 1dc1505d4a cmd/go/internal/modfetch: rename State to Fetcher + 2025-11-24 d3e11b3f90 cmd/go/internal/modload: make State.modfetchState a pointer + 2025-11-24 2f7fd5714f cmd/go: add setters for critical State fields + 2025-11-24 6851795fb6 runtime: add GODEBUG=tracebacklabels=1 to include pprof labels in tracebacks + 2025-11-24 0921e1db83 net/http: add Transport.NewClientConn + 2025-11-24 6465818435 all: update to x/net@bff14c52567061031b9761881907c39e24792736 + 2025-11-24 1a53ce9734 context: don't return the wrong error when Cause races cancellation + 2025-11-24 c6f882f6c5 crypto/x509: add ExtKeyUsage.String and KeyUsage.String methods + 2025-11-24 97d5295f6f crypto/internal/fips140test: add ML-DSA coverage + 2025-11-24 62cd044a79 cmd/compile: add cases for StringLen to prove + 2025-11-24 f1e376f342 cmd/go/internal/auth: fix typo 
+ 2025-11-24 7fbd141de5 runtime: use m.profStack in traceStack + 2025-11-24 0bc192368a runtime: don't write unique string to trace if it's length zero + 2025-11-24 d4f5650cc5 all: REVERSE MERGE dev.simd (7d65463) into master Change-Id: I4273ac3987ae2d0bc1df0051d752d8ef6c5e9af5
2025-12-03  [dev.simd] simd: make "best" instruction choice also depend on commutativity  David Chase
The compare-based-on-immediate instructions are sometimes commutative, sometimes not. In this case, that means the instruction cannot be commutative.

Also improve the comments for comparisons.

Change-Id: I83a55fa5ffbd6cbbaf5cb23b3e8a68a5da8aae2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/726440
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2025-11-26  cmd/compile: introduce alias analysis and automatically free non-aliased memory after growslice  thepudds
This CL is part of a set of CLs that attempt to reduce how much work the GC must do. See the design in https://go.dev/design/74299-runtime-freegc

This CL updates the compiler to examine append calls to prove whether or not the slice is aliased. If proven unaliased, the compiler automatically inserts a call to a new runtime function introduced with this CL, runtime.growsliceNoAlias, which frees the old backing memory immediately after slice growth is complete and the old storage is logically dead.

Two append benchmarks below show promising results, executing up to ~2x faster and up to a factor of ~3 memory reduction with this CL.

The approach works with multiple append calls for the same slice, including inside loops, and the final slice memory can be escaping, such as in a classic pattern of returning a slice from a function after the slice is built. (The final slice memory is never freed with this CL, though we have other work that tackles that.)

An example target for this CL is we automatically free the intermediate memory for the appends in the loop in this function:

    func f1(input []int) []int {
        var s []int
        for _, x := range input {
            s = append(s, g(x)) // s cannot be aliased here
            if h(x) {
                s = append(s, x) // s cannot be aliased here
            }
        }
        return s // slice escapes at end
    }

In this case, the compiler and the runtime collaborate so that the heap allocated backing memory for s is automatically freed after a successful grow. (For the first grow, there is nothing to free, but for the second and subsequent growths, the old heap memory is freed automatically.)

The new runtime.growsliceNoAlias is primarily implemented by calling runtime.freegc, which we introduced in CL 673695.
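For readers who want to run the f1 pattern from the commit message, here is a self-contained version. The bodies of `g` and `h` are hypothetical placeholders, since the message does not define them. The program's output is the same with or without the optimization: the CL changes only when old backing arrays are released, not the slice contents.

```go
package main

import "fmt"

// g and h are hypothetical stand-ins for the helpers named in the
// commit message; any pure functions would do.
func g(x int) int  { return x * 2 }
func h(x int) bool { return x%2 == 0 }

// f1 builds a slice with appends in a loop and returns it. s is never
// copied to another variable or passed elsewhere inside the loop, which
// is the situation the commit describes as provably non-aliased.
func f1(input []int) []int {
	var s []int
	for _, x := range input {
		s = append(s, g(x))
		if h(x) {
			s = append(s, x)
		}
	}
	return s // the final slice escapes to the caller
}

func main() {
	fmt.Println(f1([]int{1, 2, 3}))
}
```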
The high-level approach here is we step through the IR starting from a slice declaration and look for any operations that either alias the slice or might do so, and treat any IR construct we don't specifically handle as a potential alias (and therefore conservatively fall back to treating the slice as aliased when encountering something not understood).

For loops, some additional care is required. We arrange the analysis so that an alias in the body of a loop causes all the appends in that same loop body to be marked aliased, even if the aliasing occurs after the append in the IR:

    func f2() {
        var s []int
        for i := range 10 {
            s = append(s, i) // aliased due to next line
            alias = s
        }
    }

For nested loops, we analyse the nesting appropriately so that for example this append is still proven as non-aliased in the inner loop even though it is aliased for the outer loop:

    func f3() {
        for range 10 {
            var s []int
            for i := range 10 {
                s = append(s, i) // append using non-aliased slice
            }
            alias = s
        }
    }

A good starting point is the beginning of the test/escape_alias.go file, which starts with ~10 introductory examples with brief comments that attempt to illustrate the high-level approach.

For more details, see the new .../internal/escape/alias.go file, especially the (*aliasAnalysis).analyze method.

In the first benchmark, an append in a loop builds up a slice from nothing, where the slice elements are each 64 bytes. In the table below, 'count' is the number of appends. With 1 append, there is no opportunity for this CL to free memory. Once there are 2 appends, the growth from 1 element to 2 elements means the compiler-inserted growsliceNoAlias frees the 1-element array, and we see a ~33% reduction in memory use and a small reported speed improvement.

As the number of appends increases for example to 5, we are at a ~20% speed improvement and ~45% memory reduction, and so on until we reach ~40% faster and ~50% less memory allocated at the end of the table.
There can be variation in the reported numbers based on -randlayout, so this table is for 30 different values of -randlayout with a total n=150. (Even so, there is still some variation, so we probably should not read too much into small changes.) This is with GOAMD64=v3 on a VM that gcc reports is cascadelake.

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                           │ old-1bb1f2bf0c │        freegc-8ba7421-ps16         │
                           │     sec/op     │    sec/op      vs base             │
Append64Bytes/count=1-4       31.09n ± 2%      31.69n ± 1%    +1.95% (n=150)
Append64Bytes/count=2-4       73.31n ± 1%      70.27n ± 0%    -4.15% (n=150)
Append64Bytes/count=3-4       142.7n ± 1%      124.6n ± 1%   -12.68% (n=150)
Append64Bytes/count=4-4       149.6n ± 1%      127.7n ± 0%   -14.64% (n=150)
Append64Bytes/count=5-4       277.1n ± 1%      213.6n ± 0%   -22.90% (n=150)
Append64Bytes/count=6-4       280.7n ± 1%      216.5n ± 1%   -22.87% (n=150)
Append64Bytes/count=10-4      544.3n ± 1%      386.6n ± 0%   -28.97% (n=150)
Append64Bytes/count=20-4     1058.5n ± 1%      715.6n ± 1%   -32.39% (n=150)
Append64Bytes/count=50-4      2.121µ ± 1%      1.404µ ± 1%   -33.83% (n=150)
Append64Bytes/count=100-4     4.152µ ± 1%      2.736µ ± 1%   -34.11% (n=150)
Append64Bytes/count=200-4     7.753µ ± 1%      4.882µ ± 1%   -37.03% (n=150)
Append64Bytes/count=400-4    15.163µ ± 2%      9.273µ ± 1%   -38.84% (n=150)
geomean                       601.8n           455.0n        -24.39%

                           │ old-1bb1f2bf0c │        freegc-8ba7421-ps16         │
                           │      B/op      │     B/op       vs base             │
Append64Bytes/count=1-4        64.00 ± 0%       64.00 ± 0%         ~ (n=150)
Append64Bytes/count=2-4        192.0 ± 0%       128.0 ± 0%   -33.33% (n=150)
Append64Bytes/count=3-4        448.0 ± 0%       256.0 ± 0%   -42.86% (n=150)
Append64Bytes/count=4-4        448.0 ± 0%       256.0 ± 0%   -42.86% (n=150)
Append64Bytes/count=5-4        960.0 ± 0%       512.0 ± 0%   -46.67% (n=150)
Append64Bytes/count=6-4        960.0 ± 0%       512.0 ± 0%   -46.67% (n=150)
Append64Bytes/count=10-4     1.938Ki ± 0%     1.000Ki ± 0%   -48.39% (n=150)
Append64Bytes/count=20-4     3.938Ki ± 0%     2.001Ki ± 0%   -49.18% (n=150)
Append64Bytes/count=50-4     7.938Ki ± 0%     4.005Ki ± 0%   -49.54% (n=150)
Append64Bytes/count=100-4   15.938Ki ± 0%     8.021Ki ± 0%   -49.67% (n=150)
Append64Bytes/count=200-4    31.94Ki ± 0%     16.08Ki ± 0%   -49.64% (n=150)
Append64Bytes/count=400-4    63.94Ki ± 0%     32.33Ki ± 0%   -49.44% (n=150)
geomean                      1.991Ki          1.124Ki        -43.54%

                           │ old-1bb1f2bf0c │        freegc-8ba7421-ps16         │
                           │   allocs/op    │   allocs/op    vs base             │
Append64Bytes/count=1-4        1.000 ± 0%       1.000 ± 0%         ~ (n=150)
Append64Bytes/count=2-4        2.000 ± 0%       1.000 ± 0%   -50.00% (n=150)
Append64Bytes/count=3-4        3.000 ± 0%       1.000 ± 0%   -66.67% (n=150)
Append64Bytes/count=4-4        3.000 ± 0%       1.000 ± 0%   -66.67% (n=150)
Append64Bytes/count=5-4        4.000 ± 0%       1.000 ± 0%   -75.00% (n=150)
Append64Bytes/count=6-4        4.000 ± 0%       1.000 ± 0%   -75.00% (n=150)
Append64Bytes/count=10-4       5.000 ± 0%       1.000 ± 0%   -80.00% (n=150)
Append64Bytes/count=20-4       6.000 ± 0%       1.000 ± 0%   -83.33% (n=150)
Append64Bytes/count=50-4       7.000 ± 0%       1.000 ± 0%   -85.71% (n=150)
Append64Bytes/count=100-4      8.000 ± 0%       1.000 ± 0%   -87.50% (n=150)
Append64Bytes/count=200-4      9.000 ± 0%       1.000 ± 0%   -88.89% (n=150)
Append64Bytes/count=400-4     10.000 ± 0%       1.000 ± 0%   -90.00% (n=150)
geomean                        4.331            1.000        -76.91%

The second benchmark is similar, but instead uses an 8-byte integer for the slice element. The first 4 appends in the loop never call into the runtime thanks to the excellent CL 664299 introduced by Keith in Go 1.25 that allows some <= 32 byte dynamically-sized slices to be on the stack, so this CL is neutral for <= 32 bytes. Once the 5th append occurs at count=5, a grow happens via the runtime and heap allocates as normal, but freegc does not yet have anything to free, so we see a small ~1.4ns penalty reported there. But once the second growth happens, the older heap memory is now automatically freed by freegc, so we start to see some benefit in memory reductions and speed improvements, starting at a tiny speed improvement (close to a wash, or maybe noise) by the second growth before count=10, and building up to ~2x faster with ~68% fewer allocated bytes reported.

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU @ 2.80GHz
                         │ old-1bb1f2bf0c │           freegc-8ba7421-ps16            │
                         │     sec/op     │    sec/op      vs base                   │
AppendInt/count=1-4         2.978n ± 0%      2.969n ± 0%    -0.30% (p=0.000 n=150)
AppendInt/count=4-4         4.292n ± 3%      4.163n ± 3%         ~ (p=0.528 n=150)
AppendInt/count=5-4         33.50n ± 0%      34.93n ± 0%    +4.25% (p=0.000 n=150)
AppendInt/count=10-4        76.21n ± 1%      75.67n ± 0%    -0.72% (p=0.000 n=150)
AppendInt/count=20-4        150.6n ± 1%      133.0n ± 0%   -11.65% (n=150)
AppendInt/count=50-4        284.1n ± 1%      225.6n ± 0%   -20.59% (n=150)
AppendInt/count=100-4       544.2n ± 1%      392.4n ± 1%   -27.89% (n=150)
AppendInt/count=200-4      1051.5n ± 1%      702.3n ± 0%   -33.21% (n=150)
AppendInt/count=400-4       2.041µ ± 1%      1.312µ ± 1%   -35.70% (n=150)
AppendInt/count=1000-4      5.224µ ± 2%      2.851µ ± 1%   -45.43% (n=150)
AppendInt/count=2000-4     11.770µ ± 1%      6.010µ ± 1%   -48.94% (n=150)
AppendInt/count=3000-4     17.747µ ± 2%      8.264µ ± 1%   -53.44% (n=150)
geomean                     331.8n           246.4n        -25.72%

                         │ old-1bb1f2bf0c │           freegc-8ba7421-ps16            │
                         │      B/op      │     B/op       vs base                   │
AppendInt/count=1-4          0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=150)
AppendInt/count=4-4          0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=150)
AppendInt/count=5-4          64.00 ± 0%       64.00 ± 0%         ~ (p=1.000 n=150)
AppendInt/count=10-4         192.0 ± 0%       128.0 ± 0%   -33.33% (n=150)
AppendInt/count=20-4         448.0 ± 0%       256.0 ± 0%   -42.86% (n=150)
AppendInt/count=50-4         960.0 ± 0%       512.0 ± 0%   -46.67% (n=150)
AppendInt/count=100-4      1.938Ki ± 0%     1.000Ki ± 0%   -48.39% (n=150)
AppendInt/count=200-4      3.938Ki ± 0%     2.001Ki ± 0%   -49.18% (n=150)
AppendInt/count=400-4      7.938Ki ± 0%     4.005Ki ± 0%   -49.54% (n=150)
AppendInt/count=1000-4     24.56Ki ± 0%     10.05Ki ± 0%   -59.07% (n=150)
AppendInt/count=2000-4     58.56Ki ± 0%     20.31Ki ± 0%   -65.32% (n=150)
AppendInt/count=3000-4     85.19Ki ± 0%     27.30Ki ± 0%   -67.95% (n=150)
geomean                                 ²                  -42.81%

                         │ old-1bb1f2bf0c │           freegc-8ba7421-ps16            │
                         │   allocs/op    │   allocs/op    vs base                   │
AppendInt/count=1-4          0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=150)
AppendInt/count=4-4          0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=150)
AppendInt/count=5-4          1.000 ± 0%       1.000 ± 0%         ~ (p=1.000 n=150)
AppendInt/count=10-4         2.000 ± 0%       1.000 ± 0%   -50.00% (n=150)
AppendInt/count=20-4         3.000 ± 0%       1.000 ± 0%   -66.67% (n=150)
AppendInt/count=50-4         4.000 ± 0%       1.000 ± 0%   -75.00% (n=150)
AppendInt/count=100-4        5.000 ± 0%       1.000 ± 0%   -80.00% (n=150)
AppendInt/count=200-4        6.000 ± 0%       1.000 ± 0%   -83.33% (n=150)
AppendInt/count=400-4        7.000 ± 0%       1.000 ± 0%   -85.71% (n=150)
AppendInt/count=1000-4       9.000 ± 0%       1.000 ± 0%   -88.89% (n=150)
AppendInt/count=2000-4      11.000 ± 0%       1.000 ± 0%   -90.91% (n=150)
AppendInt/count=3000-4      12.000 ± 0%       1.000 ± 0%   -91.67% (n=150)
geomean                                 ²                  -72.76%               ²

Of course, these are just microbenchmarks, but likely indicate there are some opportunities here.

The immediately following CL 712422 tackles inlining and is able to get runtime.freegc working automatically with iterators such as used by slices.Collect, which becomes able to automatically free the intermediate memory from its repeated appends (which earlier in this work required a temporary hand edit to the slices package).

For now, we only use the NoAlias version for element types without pointers while waiting on additional runtime support in CL 698515.

Updates #74299

Change-Id: I1b9d286aa97c170dcc2e203ec0f8ca72d84e8221
Reviewed-on: https://go-review.googlesource.com/c/go/+/710015
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-11-26cmd/compile,internal/bytealg: add MemEq intrinsic for runtime.memequalAlexander Musman
Introduce a new MemEq SSA operation for runtime.memequal. The operation is initially implemented for arm64. The change adds opt rules (following existing rules for calls to runtime.memequal), working with MemEq, and a later op version LoweredMemEq which may be lowered differently for more constant size cases in future (for other targets as well as for arm64). The new MemEq SSA operation does not have a memory result, allowing CSE of load operations around it.

Code size difference (for arm64 linux):

Executable    Old .text    New .text    Change
-------------------------------------------------------
asm           1970420      1969668      -0.04%
cgo           1741220      1740212      -0.06%
compile       8956756      8959428      +0.03%
cover         1879332      1878772      -0.03%
link          2574116      2572660      -0.06%
preprofile     867124       866820      -0.04%
vet           2890404      2888596      -0.06%

Change-Id: I6ab507929b861884d17d5818cfbd152cf7879751
Reviewed-on: https://go-review.googlesource.com/c/go/+/686655
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
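For orientation, here is a minimal sketch of source code whose equality test is the kind of operation that may be compiled into a call to runtime.memequal. The names block and sameBlock are hypothetical, and it is an assumption that a 64-byte array comparison takes the memequal path; that depends on the compiler version, the target, and the operand size.

```go
package main

import "fmt"

// block is a hypothetical fixed-size, pointer-free value. Comparing two
// such values is a plain byte-wise equality check.
type block [64]byte

// sameBlock compares the pointed-to arrays with ==. The comparison only
// reads memory, which is why an op without a memory result can represent it.
func sameBlock(a, b *block) bool {
	return *a == *b
}

func main() {
	var x, y block
	fmt.Println(sameBlock(&x, &y)) // both arrays are all zero
	y[63] = 1
	fmt.Println(sameBlock(&x, &y)) // last byte now differs
}
```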
2025-11-25[dev.simd] simd, cmd/compile: add float -> float conversionsJunyang Shao
This should mark the end of the conversion table, except for float16, which does not exist in Go yet. The rounding logic documentation of float64 -> float32 is based on the abi-internal default MXCSR: | RC | 14/13 | 0 (RN) | Round to nearest | Change-Id: I27a86560e8d74d20f21350bf78314b4eada20ec0 Reviewed-on: https://go-review.googlesource.com/c/go/+/724440 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-11-25[dev.simd] simd, cmd/compile: add int -> fp conversionsJunyang Shao
Change-Id: Iadfa2dd982d7156d60fb6977ed9afb7894d6e8a0 Reviewed-on: https://go-review.googlesource.com/c/go/+/724321 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-11-25[dev.simd] simd, cmd/compile: add float -> int conversionsJunyang Shao
This CL also fixed some documentation errors in existing APIs. Go defaults MXCSR to mask exceptions; the documentation is based on this fact. Change-Id: I745083b82b4bef93126a4b4e41f8698956963704 Reviewed-on: https://go-review.googlesource.com/c/go/+/724320 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-11-24[dev.simd] all: merge master (02d1f3a) into dev.simdCherry Mui
Merge List:

+ 2025-11-24 02d1f3a06b runtime: respect GOTRACEBACK for user-triggered runtime panics
+ 2025-11-24 a593ca9d65 runtime/cgo: add support for `any` param and return type
+ 2025-11-24 89552911b3 cmd/compile, internal/buildcfg: enable regABI on s390x, and add s390x
+ 2025-11-24 2fe0ba8d52 internal/bytealg: port bytealg functions to reg ABI on s390x
+ 2025-11-24 4529c8fba6 runtime: port memmove, memclr to register ABI on s390x
+ 2025-11-24 58a48a3e3b internal/runtime/syscall: Syscall changes for s390x regabi
+ 2025-11-24 2a185fae7e reflect, runtime: add reflect support for regabi on s390x
+ 2025-11-24 e92d2964fa runtime: mark race functions on s390x as ABIInternal
+ 2025-11-24 41af98eb83 runtime: add runtime changes for register ABI on s390x
+ 2025-11-24 85e6080089 cmd/internal/obj: set morestack arg spilling and regabi prologue on s390x
+ 2025-11-24 24697419c5 cmd/compile: update s390x CALL* ops
+ 2025-11-24 81242d034c cmd/compile/internal/s390x: add initial spill support
+ 2025-11-24 73b6aa0fec cmd/compile/internal: add register ABI information for s390x
+ 2025-11-24 1036f6f485 internal/abi: define s390x ABI constants
+ 2025-11-24 2e5d12a277 cmd/compile: document register-based ABI for s390x

Change-Id: I57b4ae6f9b65d99958b9fe5974205770e18f7788
2025-11-24cmd/compile: update s390x CALL* opsSrinivas Pokala
This CL allows the CALL ops to take a variable number of arguments. Update #40724 Change-Id: Ibfa2e98c5051684cae69200c396dfa1edb2878e4 Reviewed-on: https://go-review.googlesource.com/c/go/+/719464 Reviewed-by: Vishwanatha HD <vishwanatha.hd@ibm.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-24cmd/compile/internal: add register ABI information for s390xSrinivas Pokala
Update #40724 Change-Id: If8f2574259560b097db29347b2aecb098acef863 Reviewed-on: https://go-review.googlesource.com/c/go/+/719462 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Vishwanatha HD <vishwanatha.hd@ibm.com> Reviewed-by: Keith Randall <khr@google.com>
2025-11-24[dev.simd] all: merge master (8dd5b13) into dev.simdCherry Mui
Merge List: + 2025-11-24 8dd5b13abc cmd/compile: relax stmtline_test on amd64 + 2025-11-23 feae743bdb cmd/compile: use 32x32->64 multiplies on loong64 + 2025-11-23 e88be8a128 runtime: fix stale comment for mheap/malloc + 2025-11-23 a318843a2a cmd/internal/obj/loong64: optimize duplicate optab entries + 2025-11-23 a18294bb6a cmd/internal/obj/arm64, image/gif, runtime, sort: use math/bits to calculate log2 + 2025-11-23 437323ef7b slices: fix incorrect comment in slices.Insert function documentation + 2025-11-23 1993dca400 doc/next: pre-announce end of support for macOS 12 in Go 1.27 + 2025-11-22 337f7b1f5d cmd/go: update default go directive in mod or work init + 2025-11-21 3c26aef8fb cmd/internal/obj/riscv: improve large branch/call/jump tests + 2025-11-21 31aa9f800b crypto/tls: use inner hello for earlyData when using QUIC and ECH + 2025-11-21 d68aec8db1 runtime: replace trace seqlock with write flag + 2025-11-21 8d9906cd34 runtime/trace: add Log benchmark + 2025-11-21 6aeacdff38 cmd/go: support sha1 repos when git default is sha256 + 2025-11-21 9570036ca5 crypto/sha3: make the zero value of SHAKE useable + 2025-11-21 155efbbeeb crypto/sha3: make the zero value of SHA3 useable + 2025-11-21 6f16669e34 database/sql: don't ignore ColumnConverter for unknown input count + 2025-11-21 121bc3e464 runtime/pprof: remove hard-coded sleep in CPU profile reader + 2025-11-21 b604148c4e runtime: fix double wakeup in CPU profile buffer + 2025-11-21 22f24f90b5 cmd/compile: change testing.B.Loop keep alive semantic + 2025-11-21 cfb9d2eb73 net: remove unused linknames + 2025-11-21 65ef314f89 net/http: remove unused linknames + 2025-11-21 0f32fbc631 net/http: populate Response.Request when using NewFileTransport + 2025-11-21 3e0a8e7867 net/http: preserve original path encoding in redirects + 2025-11-21 831af61120 net/http: use HTTP 307 redirects in ServeMux + 2025-11-21 87269224cb net/http: update Response.Request.URL after redirects on GOOS=js + 2025-11-21 7aa9ca729f 
net/http/cookiejar: treat localhost as secure origin + 2025-11-21 f870a1d398 net/url: warn that JoinPath arguments should be escaped + 2025-11-21 9962d95fed crypto/internal/fips140/mldsa: unroll NTT and inverseNTT + 2025-11-21 f821fc46c5 crypto/internal/fisp140test: update acvptool, test data + 2025-11-21 b59efc38a0 crypto/internal/fips140/mldsa: new package + 2025-11-21 62741480b8 runtime: remove linkname for gopanic + 2025-11-21 7db2f0bb9a crypto/internal/hpke: separate KEM and PublicKey/PrivateKey interfaces + 2025-11-21 e15800c0ec crypto/internal/hpke: add ML-KEM and hybrid KEMs, and SHAKE KDFs + 2025-11-21 7c985a2df4 crypto/internal/hpke: modularize API and support more ciphersuites + 2025-11-21 e7d47ac33d cmd/compile: simplify negative on multiplication + 2025-11-21 35d2712b32 net/http: fix typo in Transport docs + 2025-11-21 90c970cd0f net: remove unnecessary loop variable copies in tests + 2025-11-21 9772d3a690 cmd/cgo: strip top-level const qualifier from argument frame struct + 2025-11-21 1903782ade errors: add examples for custom Is/As matching + 2025-11-21 ec92bc6d63 cmd/compile: rewrite Rsh to RshU if arguments are proved positive + 2025-11-21 3820f94c1d cmd/compile: propagate unsigned relations for Rsh if arguments are positive + 2025-11-21 d474f1fd21 cmd/compile: make dse track multiple shadowed ranges + 2025-11-21 d0d0a72980 cmd/compile/internal/ssa: correct type of ARM64 conditional instructions + 2025-11-21 a9704f89ea internal/runtime/gc/scan: add AVX512 impl of filterNil. 
+ 2025-11-21 ccd389036a cmd/internal/objabi: remove -V=goexperiment internal special case + 2025-11-21 e7787b9eca runtime: go fmt + 2025-11-21 17b3b98796 internal/strconv: go fmt + 2025-11-21 c851827c68 internal/trace: go fmt + 2025-11-21 f87aaec53d cmd/compile: fix integer overflow in prove pass + 2025-11-21 dbd2ab9992 cmd/compile/internal: fix typos + 2025-11-21 b9d86baae3 cmd/compile/internal/devirtualize: fix typos + 2025-11-20 4b0e3cc1d6 cmd/link: support loading R_LARCH_PCREL20_S2 and R_LARCH_CALL36 relocs + 2025-11-20 cdba82c7d6 cmd/internal/obj/loong64: add {,X}VSLT.{B/H/W/V}{,U} instructions support + 2025-11-20 bd2b117c2c crypto/tls: add QUICErrorEvent + 2025-11-20 3ad2e113fc net/http/httputil: wrap ReverseProxy's outbound request body so Close is a noop + 2025-11-20 d58b733646 runtime: track goroutine location until actual STW + 2025-11-20 1bc54868d4 cmd/vendor: update to x/tools@68724af + 2025-11-20 8c3195973b runtime: disable stack allocation tests on sanitizers + 2025-11-20 ff654ea100 net/url: permit colons in the host of postgresql:// URLs + 2025-11-20 a662badab9 encoding/json: remove linknames + 2025-11-20 5afe237d65 mime: add missing path for mime types in godoc + 2025-11-20 c1b7112af8 os/signal: make NotifyContext cancel the context with a cause Change-Id: Ib93ef643be610dfbdd83ff45095a7b1ca2537b8b
2025-11-23cmd/compile: use 32x32->64 multiplies on loong64Xiaolin Zhao
Gets rid of some sign extensions, like arm64. Change-Id: I9fc37e15a82718bfcf53db8cab0c4e7baaa0a747 Reviewed-on: https://go-review.googlesource.com/c/go/+/721522 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-21cmd/compile: simplify negative on multiplicationMeng Zhuo
goos: linux
goarch: amd64
pkg: cmd/compile/internal/test
cpu: AMD EPYC 7532 32-Core Processor
               │ simplify_base │            simplify_new             │
               │    sec/op     │   sec/op     vs base                │
SimplifyNegMul    623.0n ± 0%    319.3n ± 1%  -48.75% (p=0.000 n=10)

goos: linux
goarch: riscv64
pkg: cmd/compile/internal/test
cpu: Spacemit(R) X60
               │ simplify.base │            simplify.new             │
               │    sec/op     │   sec/op     vs base                │
SimplifyNegMul   10.928µ ± 0%    6.432µ ± 0%  -41.14% (p=0.000 n=10)

Change-Id: I1d9393cd19a0b948a5d3a512d627cdc0cf0b38be
Reviewed-on: https://go-review.googlesource.com/c/go/+/721520
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2025-11-21cmd/compile/internal/ssa: correct type of ARM64 conditional instructionsCh1n-ch1nless
The CCMP, CCMN, CCMPconst, and related instructions in ARM64Ops.go were incorrectly set to type "Flag". This non-existent type caused compilation failures during the "lower" and "late lower" passes. Change them to the correct type, "Flags". Change-Id: I4fbf96b8c7b051be901711948028a717ce953e5e Reviewed-on: https://go-review.googlesource.com/c/go/+/722780 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-11-21[dev.simd] cmd/compile, simd: update conversion API namesJunyang Shao
This CL is to address some API audit discussion decisions. Change-Id: Iaa206832c41852fec8fa25c23da12f65df736098 Reviewed-on: https://go-review.googlesource.com/c/go/+/721780 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-11-21[dev.simd] cmd/compile: fix incorrect mapping of SHA256MSG2128Neal Patel
Change-Id: Iff00fdb5cfc83c546ad564fa7618ec34d0352fdc Reviewed-on: https://go-review.googlesource.com/c/go/+/722640 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: David Chase <drchase@google.com>
2025-11-20[dev.simd] simd, cmd/compile: add more element types for Select128FromPairDavid Chase
Also includes a comment cleanup pass. Fixed NAME processing for additional documentation. Change-Id: Ide5b60c17ddbf3c6eafd20147981c59493fc8133 Reviewed-on: https://go-review.googlesource.com/c/go/+/722180 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-20[dev.simd] simd: fix signatures for PermuteConstant* methodsDavid Chase
This moves the packed-immediate methods to package-private, and adds exported versions with four parameters.

Rename PermuteConstant to PermuteScalars
Rename VPSHUFB Permute to PermuteOrZero
Rename Permute2 to ConcatPermute

Comments were repaired/enhanced.

Modified the generator to support an additional tag "hideMaskMethods : true" to suppress method, intrinsic, generic, and generic translation generation for said mask-modified versions of such methods (this is already true for exported methods).

Change-Id: I91e208c1fff1f28ebce4edb4e73d26003715018c
Reviewed-on: https://go-review.googlesource.com/c/go/+/721342
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-11-20[dev.simd] all: merge master (ca37d24) into dev.simdCherry Mui
Conflicts: - src/cmd/compile/internal/typecheck/builtin.go Merge List: + 2025-11-20 ca37d24e0b net/http: drop unused "broken" field from persistConn + 2025-11-20 4b740af56a cmd/internal/obj/x86: handle global reference in From3 in dynlink mode + 2025-11-20 790384c6c2 spec: adjust rule for type parameter on RHS of alias declaration + 2025-11-20 a49b0302d0 net/http: correctly close fake net.Conns + 2025-11-20 32f5aadd2f cmd/compile: stack allocate backing stores during append + 2025-11-20 a18aff8057 runtime: select GC mark workers during start-the-world + 2025-11-20 829779f4fe runtime: split findRunnableGCWorker in two + 2025-11-20 ab59569099 go/version: use "custom" as an example of a version suffix + 2025-11-19 c4bb9653ba cmd/compile: Implement LoweredZeroLoop with LSX Instruction on loong64 + 2025-11-19 7f2ae21fb4 cmd/internal/obj/loong64: add MULW.D.W[U] instructions + 2025-11-19 a2946f2385 crypto: add Encapsulator and Decapsulator interfaces + 2025-11-19 6b83bd7146 crypto/ecdh: add KeyExchanger interface + 2025-11-19 4fef9f8b55 go/types, types2: fix object path for grouped declaration statements + 2025-11-19 33529db142 spec: escape double-ampersands + 2025-11-19 dc42565a20 cmd/compile: fix control flow for unsigned divisions proof relations + 2025-11-19 e64023dcbf cmd/compile: cleanup useless if statement in prove + 2025-11-19 2239520d1c test: go fmt prove.go tests + 2025-11-19 489d3dafb7 math: switch s390x math.Pow to generic implementation + 2025-11-18 8c41a482f9 runtime: add dlog.hexdump + 2025-11-18 e912618bd2 runtime: add hexdumper + 2025-11-18 2cf9d4b62f Revert "net/http: do not discard body content when closing it within request handlers" + 2025-11-18 4d0658bb08 cmd/compile: prefer fixed registers for values + 2025-11-18 ba634ca5c7 cmd/compile: fold boolean NOT into branches + 2025-11-18 8806d53c10 cmd/link: align sections, not symbols after DWARF compress + 2025-11-18 c93766007d runtime: do not print recovered when double panic with the same value + 
2025-11-18 9859b43643 cmd/asm,cmd/compile,cmd/internal/obj/riscv: use compressed instructions on riscv64 + 2025-11-17 b9ef0633f6 cmd/internal/sys,internal/goarch,runtime: enable the use of compressed instructions on riscv64 + 2025-11-17 a087dea869 debug/elf: sync new loong64 relocation types up to LoongArch ELF psABI v20250521 + 2025-11-17 e1a12c781f cmd/compile: use 32x32->64 multiplies on arm64 + 2025-11-17 6caab99026 runtime: relax TestMemoryLimit on darwin a bit more + 2025-11-17 eda2e8c683 runtime: clear frame pointer at thread entry points + 2025-11-17 6919858338 runtime: rename findrunnable references to findRunnable + 2025-11-17 8e734ec954 go/ast: fix BasicLit.End position for raw strings containing \r + 2025-11-17 592775ec7d crypto/mlkem: avoid a few unnecessary inverse NTT calls + 2025-11-17 590cf18daf crypto/mlkem/mlkemtest: add derandomized Encapsulate768/1024 + 2025-11-17 c12c337099 cmd/compile: teach prove about subtract idioms + 2025-11-17 bc15963813 cmd/compile: clean up prove pass + 2025-11-17 1297fae708 go/token: add (*File).End method + 2025-11-17 65c09eafdf runtime: hoist invariant code out of heapBitsSmallForAddrInline + 2025-11-17 594129b80c internal/runtime/maps: update doc for table.Clear + 2025-11-15 c58d075e9a crypto/rsa: deprecate PKCS#1 v1.5 encryption + 2025-11-14 d55ecea9e5 runtime: usleep before stealing runnext only if not in syscall + 2025-11-14 410ef44f00 cmd: update x/tools to 59ff18c + 2025-11-14 50128a2154 runtime: support runtime.freegc in size-specialized mallocs for noscan objects + 2025-11-14 c3708350a4 cmd/go: tests: rename git-min-vers->git-sha256 + 2025-11-14 aea881230d std: fix printf("%q", int) mistakes + 2025-11-14 120f1874ef runtime: add more precise test of assist credit handling for runtime.freegc + 2025-11-14 fecfcaa4f6 runtime: add runtime.freegc to reduce GC work + 2025-11-14 5a347b775e runtime: set GOEXPERIMENT=runtimefreegc to disabled by default + 2025-11-14 1a03d0db3f runtime: skip tests for 
GOEXPERIMENT=arenas that do not handle clobberfree=1 + 2025-11-14 cb0d9980f5 net/http: do not discard body content when closing it within request handlers + 2025-11-14 03ed43988f cmd/compile: allow multi-field structs to be stored directly in interfaces + 2025-11-14 1bb1f2bf0c runtime: put AddCleanup cleanup arguments in their own allocation + 2025-11-14 9fd2e44439 runtime: add AddCleanup benchmark + 2025-11-14 80c91eedbb runtime: ensure weak handles end up in their own allocation + 2025-11-14 7a8d0b5d53 runtime: add debug mode to extend _Grunning-without-P windows + 2025-11-14 710abf74da internal/runtime/cgobench: add Go function call benchmark for comparison + 2025-11-14 b24aec598b doc, cmd/internal/obj/riscv: document the riscv64 assembler + 2025-11-14 a0e738c657 cmd/compile/internal: remove incorrect riscv64 SLTI rule + 2025-11-14 2cdcc4150b cmd/compile: fold negation into multiplication + 2025-11-14 b57962b7c7 bytes: fix panic in bytes.Buffer.Peek + 2025-11-14 0a569528ea cmd/compile: optimize comparisons with single bit difference + 2025-11-14 1e5e6663e9 cmd/compile: remove unnecessary casts and types from riscv64 rules + 2025-11-14 ddd8558e61 go/types, types2: swap object.color for Checker.objPathIdx + 2025-11-14 9daaab305c cmd/link/internal/ld: make runtime.buildVersion with experiments valid + 2025-11-13 d50a571ddf test: fix tests to work with sizespecializedmalloc turned off + 2025-11-13 704f841eab cmd/trace: annotation proc start/stop with thread and proc always + 2025-11-13 17a02b9106 net/http: remove unused isLitOrSingle and isNotToken + 2025-11-13 ff61991aed cmd/go: fix flaky TestScript/mod_get_direct + 2025-11-13 129d0cb543 net/http/cgi: accept INCLUDED as protocol for server side includes + 2025-11-13 77c5130100 go/types: minor simplification + 2025-11-13 7601cd3880 go/types: generate cycles.go + 2025-11-13 7a372affd9 go/types, types2: rename definedType to declaredType and clarify docs Change-Id: Ibaa9bdb982364892f80e511c1bb12661fcd5fb86
2025-11-20cmd/compile: stack allocate backing stores during appendkhr@golang.org
We can already stack allocate the backing store during append if the resulting backing store doesn't escape. See CL 664299.

This CL enables us to often stack allocate the backing store during append *even if* the result escapes. Typically, for code like:

    func f(n int) []int {
        var r []int
        for i := range n {
            r = append(r, i)
        }
        return r
    }

the backing store for r escapes, but only by returning it. Could we operate with r on the stack for most of its lifetime, and only move it to the heap at the return point?

The current implementation of append will need to do an allocation each time it calls growslice. This will happen on the 1st, 2nd, 4th, 8th, etc. append calls. The allocations done by all but the last growslice call will then immediately be garbage.

We'd like to avoid doing some of those intermediate allocations if possible. We rewrite the above code by introducing a move2heap operation:

    func f(n int) []int {
        var r []int
        for i := range n {
            r = append(r, i)
        }
        r = move2heap(r)
        return r
    }

Using the move2heap runtime function, which does:

    move2heap(r):
        If r is already backed by heap storage, return r.
        Otherwise, copy r to the heap and return the copy.

Now we can treat the backing store of r allocated at the append site as not escaping. Previous stack allocation optimizations now apply, which can use a fixed-size stack-allocated backing store for r when appending.

See the description in cmd/compile/internal/slice/slice.go for how we ensure that this optimization is safe.

Change-Id: I81f36e58bade2241d07f67967d8d547fff5302b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/707755
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-19cmd/compile: Implement LoweredZeroLoop with LSX Instruction on loong64Guoqi Chen
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
                   |   old.txt    |               new.txt               |
                   |    sec/op    |   sec/op     vs base                |
ClearFat256          6.406n ± 0%    3.329n ± 1%  -48.03% (p=0.000 n=10)
ClearFat512         12.810n ± 0%    7.607n ± 0%  -40.62% (p=0.000 n=10)
ClearFat1024         25.62n ± 0%    14.01n ± 0%  -45.32% (p=0.000 n=10)
ClearFat1032         26.02n ± 0%    14.28n ± 0%  -45.14% (p=0.000 n=10)
ClearFat1040         26.02n ± 0%    14.41n ± 0%  -44.62% (p=0.000 n=10)
MemclrKnownSize192   4.804n ± 0%    2.827n ± 0%  -41.15% (p=0.000 n=10)
MemclrKnownSize248   6.561n ± 0%    4.371n ± 0%  -33.38% (p=0.000 n=10)
MemclrKnownSize256   6.406n ± 0%    3.335n ± 0%  -47.94% (p=0.000 n=10)
geomean              11.41n         6.453n       -43.45%

goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3C5000 @ 2200.00MHz
                   |   old.txt    |               new.txt               |
                   |    sec/op    |   sec/op     vs base                |
ClearFat256         14.570n ± 0%    7.284n ± 0%  -50.01% (p=0.000 n=10)
ClearFat512          29.13n ± 0%    14.57n ± 0%  -49.98% (p=0.000 n=10)
ClearFat1024         58.26n ± 0%    29.15n ± 0%  -49.97% (p=0.000 n=10)
ClearFat1032         58.73n ± 0%    29.15n ± 0%  -50.36% (p=0.000 n=10)
ClearFat1040         59.18n ± 0%    29.26n ± 0%  -50.56% (p=0.000 n=10)
MemclrKnownSize192  10.930n ± 0%    5.466n ± 0%  -49.99% (p=0.000 n=10)
MemclrKnownSize248  14.110n ± 0%    6.772n ± 0%  -52.01% (p=0.000 n=10)
MemclrKnownSize256  14.570n ± 0%    7.285n ± 0%  -50.00% (p=0.000 n=10)
geomean              25.75n         12.78n       -50.36%

Change-Id: I88d7b6ae2f6fc3f095979f24fb83ff42a9d2d42e
Reviewed-on: https://go-review.googlesource.com/c/go/+/720940
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-11-18cmd/compile: fold boolean NOT into branchesKeith Randall
Gets rid of an EOR $1 instruction. Change-Id: Ib032b0cee9ac484329c978af9b1305446f8d5dac Reviewed-on: https://go-review.googlesource.com/c/go/+/721501 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com>
2025-11-18[dev.simd] cmd/compile, simd: change DotProductQuadruple and add peepholesJunyang Shao
This CL addressed some API change decisions in the API audit. Instead of exposing the Intel format, we hide the add part of the instructions under the peephole, and rename the API as DotProdQuadruple Change-Id: I471c0a755174bc15dd83bdc0f757d6356b92d835 Reviewed-on: https://go-review.googlesource.com/c/go/+/721420 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-18[dev.simd] cmd/compile, simd: change SHA ops names and typesJunyang Shao
This CL addressed some naming changes decided in API audit.

Before           After
SHA1Msg1         SHA1Message1, Remove signed
SHA1Msg2         SHA1Message2, Remove signed
SHA1NextE        SHA1NextE, Remove signed
SHA1Round4       SHA1FourRounds, Remove signed
SHA256Msg1       SHA256Message1, Remove signed
SHA256Msg2       SHA256Message2, Remove signed
SHA256Rounds2    SHA256TwoRounds, Remove signed

Change-Id: If2cead113f37a9044bc5c65e78fa9d124e318005
Reviewed-on: https://go-review.googlesource.com/c/go/+/721003
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-17cmd/compile: use 32x32->64 multiplies on arm64Keith Randall
Gets rid of some sign extensions. Change-Id: Ie67ef36b4ca1cd1a2cd9fa5d84578db553578a22 Reviewed-on: https://go-review.googlesource.com/c/go/+/721241 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com>
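As a sketch of the source pattern a 32x32->64 multiply applies to: both operands start as 32-bit values and are widened only for the multiplication, so a single widening multiply can produce the 64-bit result without separately extending each operand first. The function names are hypothetical, and which instruction a given compiler version selects (for instance a widening multiply such as arm64's SMULL or UMULL) is an assumption rather than something the commit message states.

```go
package main

import "fmt"

// mulWide multiplies two signed 32-bit values into a 64-bit product.
// Written naively, each int64(...) conversion is a sign extension.
func mulWide(a, b int32) int64 {
	return int64(a) * int64(b)
}

// mulWideU is the unsigned counterpart, using zero extension.
func mulWideU(a, b uint32) uint64 {
	return uint64(a) * uint64(b)
}

func main() {
	fmt.Println(mulWide(-3, 1<<30)) // product does not fit in 32 bits
	fmt.Println(mulWideU(1<<31, 4)) // product does not fit in 32 bits
}
```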
2025-11-17[dev.simd] cmd/compile, simd: change AES op names and add missing sizeJunyang Shao
This CL changed AESEncryptRound and AESDecryptRound to AESEncryptOneRound and AESDecryptOneRound. This CL also adds the 512-bit version of some AES instructions. Change-Id: Ia851a008cce2145b1ff193a89e172862060a725d Reviewed-on: https://go-review.googlesource.com/c/go/+/721280 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-11-17[dev.simd] cmd/compile, simd: add VPALIGNRJunyang Shao
This CL named VPALIGNR ConcatShiftBytes[Grouped]. Change-Id: I46c6703085efb0613deefa512de9911b4fdf6bc4 Reviewed-on: https://go-review.googlesource.com/c/go/+/714440 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>