path: root/src/simd/archsimd/_gen
Age | Commit message | Author
2026-01-28 | simd/archsimd: add missing cpufeature to generated mask/merge methods | David Chase
Change-Id: I34678f4ef17fe1b8b7657a2c3d39685b4a5951f2 Reviewed-on: https://go-review.googlesource.com/c/go/+/739981 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
2026-01-28 | cmd/compile, simd: capture VAES instructions and fix AVX512VAES feature | Junyang Shao
Previously the code filtered out VAES-only instructions; this CL adds them back.

This CL also adds the VAES feature check, following the Intel XED data:

    XED_ISA_SET_VAES: vaes.7.0.ecx.9 # avx.1.0.ecx.28

This CL also found that the old AVX512VAES feature check was not checking the correct bits, and fixes it:

    XED_ISA_SET_AVX512_VAES_128: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31
    XED_ISA_SET_AVX512_VAES_256: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31
    XED_ISA_SET_AVX512_VAES_512: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16

The check is restricted to the strictest common set: it includes avx512vl even for 512-bit operations, although they do not require it.

Change-Id: I4e2f72b312fd2411589fbc12f9ee5c63c09c2e9a Reviewed-on: https://go-review.googlesource.com/c/go/+/738500 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
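The CPUID bits quoted above can be read as plain bit tests. The following is a minimal sketch, not the code in internal/cpu or the simd package: the `cpuidRegs` struct and the function names are hypothetical, OS-support checks (XGETBV) are omitted, and the AVX bit is tested for VAES even though it appears after a `#` in the quoted XED line, which is an assumption.

```go
package main

import "fmt"

// cpuidRegs holds the raw CPUID registers named in the XED lines.
// The struct and its field names are hypothetical.
type cpuidRegs struct {
	leaf1ECX uint32 // CPUID.(EAX=1,ECX=0):ECX
	leaf7EBX uint32 // CPUID.(EAX=7,ECX=0):EBX
	leaf7ECX uint32 // CPUID.(EAX=7,ECX=0):ECX
}

// bit reports whether bit n of r is set.
func bit(r uint32, n uint) bool { return r>>n&1 == 1 }

// hasVAES follows "vaes.7.0.ecx.9" and, as an assumption, "avx.1.0.ecx.28".
func hasVAES(r cpuidRegs) bool {
	return bit(r.leaf7ECX, 9) && bit(r.leaf1ECX, 28)
}

// hasAVX512VAES uses the strictest common set from the commit message:
// vaes, aes, avx512f, and avx512vl, regardless of vector width.
func hasAVX512VAES(r cpuidRegs) bool {
	return bit(r.leaf7ECX, 9) && // vaes.7.0.ecx.9
		bit(r.leaf1ECX, 25) && // aes.1.0.ecx.25
		bit(r.leaf7EBX, 16) && // avx512f.7.0.ebx.16
		bit(r.leaf7EBX, 31) // avx512vl.7.0.ebx.31
}

func main() {
	r := cpuidRegs{leaf1ECX: 1<<28 | 1<<25, leaf7EBX: 1 << 16, leaf7ECX: 1 << 9}
	fmt.Println(hasVAES(r), hasAVX512VAES(r)) // avx512vl is not set here
}
```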
2026-01-27 | simd/archsimd: fix typo in the SHA256Message1 documentation string | Neal Patel
Change-Id: I8bc5fec0475bfaebc0469d0efb2ba89af4b3f150 Reviewed-on: https://go-review.googlesource.com/c/go/+/738640 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2026-01-22 | simd/archsimd/_gen/simdgen: fix typos in error messages | jjpinto
simdgen: fix typos in error messages Change-Id: I921eea63c4847b2af43a1d5a1ea075e86f58aa77 GitHub-Last-Rev: 8c9dae51fd906aee04f52a5d44c6d4c923fc52d0 GitHub-Pull-Request: golang/go#77012 Reviewed-on: https://go-review.googlesource.com/c/go/+/732880 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org>
2026-01-13 | simd/archsimd: 128- and 256-bit FMA operations do not require AVX-512 | Austin Clements
Currently, all FMA operations are marked as requiring AVX512, even on smaller vector widths. This is happening because the narrower FMA operations are marked as extension "FMA" in the XED. Since this extension doesn't start with "AVX", we filter them out very early in the XED process. However, this is just a quirk of naming: the FMA feature depends on the AVX feature, so it is part of AVX, even if it doesn't say so on the tin. Fix this by accepting the FMA extension and adding FMA to the table of CPU features. We also tweak internal/cpu slightly so it correctly enforces that the logical FMA feature depends on both the FMA and AVX CPUID flags. This actually *deletes* a lot of generated code because we no longer need the AVX-512 encoding of these 128- and 256-bit operations. Change-Id: I744a18d0be888f536ac034fe88b110347622be7e Reviewed-on: https://go-review.googlesource.com/c/go/+/736160 Auto-Submit: Austin Clements <austin@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/736201 Reviewed-by: Austin Clements <austin@google.com>
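The two parts of this fix can be sketched as follows. This is an illustration only: `acceptExtension` and `logicalFMA` are hypothetical names, not functions from simdgen or internal/cpu. The bit positions (FMA is bit 12 and AVX is bit 28 of CPUID leaf 1 ECX) are the standard CPUID assignments.

```go
package main

import (
	"fmt"
	"strings"
)

// acceptExtension is a hypothetical stand-in for the early XED filter:
// it keeps AVX-prefixed extensions and, after the fix, also "FMA".
func acceptExtension(ext string) bool {
	return strings.HasPrefix(ext, "AVX") || ext == "FMA"
}

// logicalFMA reports the logical FMA feature, which needs both the
// FMA bit (12) and the AVX bit (28) of CPUID.(EAX=1):ECX.
func logicalFMA(leaf1ECX uint32) bool {
	const fmaBit, avxBit = 1 << 12, 1 << 28
	return leaf1ECX&fmaBit != 0 && leaf1ECX&avxBit != 0
}

func main() {
	fmt.Println(acceptExtension("FMA"), acceptExtension("SSE4"))
	fmt.Println(logicalFMA(1<<12), logicalFMA(1<<12|1<<28))
}
```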
2026-01-13 | simd/archsimd/_gen/simdgen: feature implications | Austin Clements
This simplifies our handling of XED features, adds a table of which features imply which other features, and adds this information to the documentation of the CPU features APIs. As part of this we fix an issue around the "AVXAES" feature. AVXAES is defined as the combination of the AVX and AES CPUID flags. Several other features also work like this, but have hand-written logic in internal/cpu to compute logical feature flags from the underlying CPUID bits. For these, we expose a single feature check function from the SIMD API. AVXAES currently doesn't work like this: it requires the user to check both features. However, this forces the SIMD API to expose an "AES" feature check, which really has nothing to do with SIMD. To make this consistent, we introduce an AVXAES feature check function and use it in feature requirement docs. Unlike the other combo features, this is implemented in the simd package, but the difference is invisible to the user. Change-Id: I2985ebd361f0ecd45fd428903efe4c981a5ec65d Reviewed-on: https://go-review.googlesource.com/c/go/+/736100 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/736200 Reviewed-by: Austin Clements <austin@google.com>
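One way to picture a feature-implication table is as a map that is closed transitively. The sketch below is hypothetical: the table contents and the `impliedFeatures` helper are illustrative and are not taken from simdgen or from the generated documentation.

```go
package main

import (
	"fmt"
	"sort"
)

// implies is an illustrative table; the real generator's data may differ.
var implies = map[string][]string{
	"AVX2":   {"AVX"},
	"FMA":    {"AVX"},
	"AVXAES": {"AVX"},
	"AVX512": {"AVX2", "FMA"},
}

// impliedFeatures returns, in sorted order, every feature that the
// given feature implies directly or transitively.
func impliedFeatures(feature string) []string {
	seen := map[string]bool{}
	var visit func(string)
	visit = func(f string) {
		for _, g := range implies[f] {
			if !seen[g] {
				seen[g] = true
				visit(g)
			}
		}
	}
	visit(feature)
	out := make([]string, 0, len(seen))
	for f := range seen {
		out = append(out, f)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(impliedFeatures("AVX512"))
}
```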
2026-01-13 | all: fix misspellings in comments | cuishuang
Change-Id: I121847e7f68c602dd8e9ecddfc41b547f8a86f10 Reviewed-on: https://go-review.googlesource.com/c/go/+/734361 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
2026-01-08 | simd/archsimd: rename Broadcast methods | Cherry Mui
Currently the Broadcast128/256/512 methods broadcast the lowest element of the input vector to a vector of the corresponding width. There are also variations of broadcast operations that broadcast the whole (128- or 256-bit) vector to a larger vector, which we don't yet support. Our current naming does not make clear which version it is, though. Rename the current ones to Broadcast1ToN, to be clear that they broadcast one element. The vector version will probably be named BroadcastAllToN (not included in this CL). Change-Id: I47a21e367f948ec0b578d63706a40d20f5a9f46d Reviewed-on: https://go-review.googlesource.com/c/go/+/734840 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
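The difference between the two broadcast flavors can be modeled with plain arrays. This is a scalar sketch, not the archsimd API: the function names and the choice of four input and eight output 32-bit lanes are assumptions made for illustration.

```go
package main

import "fmt"

// broadcast1To8 models the element broadcast: lane 0 of the input
// is copied to every lane of the wider result.
func broadcast1To8(x [4]int32) [8]int32 {
	var out [8]int32
	for i := range out {
		out[i] = x[0]
	}
	return out
}

// broadcastAllTo8 models the whole-vector broadcast that the commit
// says is not yet supported: the input is repeated to fill the result.
func broadcastAllTo8(x [4]int32) [8]int32 {
	var out [8]int32
	for i := range out {
		out[i] = x[i%len(x)]
	}
	return out
}

func main() {
	x := [4]int32{7, 8, 9, 10}
	fmt.Println(broadcast1To8(x))
	fmt.Println(broadcastAllTo8(x))
}
```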
2026-01-05 | simd/archsimd: use V(P)MOVMSK for mask ToBits if possible | Cherry Mui
VPMOVMSKB, VMOVMSKPS, and VMOVMSKPD move AVX1/2-style masks to integer registers, similar to VPMOV[BWDQ]2M (which move to mask registers). The former are available on AVX1/2; the latter require AVX512. So use the former if they are supported, i.e. for 128- and 256-bit vectors with 8-, 32-, and 64-bit elements (16-bit elements always require AVX512). Change-Id: I972195116617ed2faaf95cee5cd6b250e671496c Reviewed-on: https://go-review.googlesource.com/c/go/+/734060 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com>
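As a scalar model of what the MOVMSK family computes, the sketch below packs the sign bit of each element into an integer and encodes the width rule stated in the commit message. The names `toBits` and `canUseMovmsk` are hypothetical and the eight 32-bit lanes are just one example shape.

```go
package main

import "fmt"

// toBits models a VMOVMSKPS-style extraction on eight 32-bit lanes:
// bit i of the result is the sign bit of element i. In an AVX2-style
// mask, a selected element is all ones, so its sign bit is set.
func toBits(mask [8]int32) uint8 {
	var bits uint8
	for i, e := range mask {
		if e < 0 {
			bits |= 1 << i
		}
	}
	return bits
}

// canUseMovmsk encodes the rule from the commit message: 128- and
// 256-bit vectors with 8-, 32-, or 64-bit elements avoid AVX512.
func canUseMovmsk(vecBits, elemBits int) bool {
	widthOK := vecBits == 128 || vecBits == 256
	elemOK := elemBits == 8 || elemBits == 32 || elemBits == 64
	return widthOK && elemOK
}

func main() {
	m := [8]int32{-1, 0, -1, 0, 0, 0, 0, -1}
	fmt.Printf("%08b\n", toBits(m))
	fmt.Println(canUseMovmsk(256, 32), canUseMovmsk(256, 16), canUseMovmsk(512, 32))
}
```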
2026-01-02 | simd/archsimd: add tests for IsNaN | Cherry Mui
Change-Id: I374ce84fd21c41a04e2d5964d8aa872545c6a8a7 Reviewed-on: https://go-review.googlesource.com/c/go/+/733661 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2026-01-02 | simd/archsimd: make IsNaN unary | Cherry Mui
Currently, the IsNan API is defined as x.IsNan(y), which returns a mask representing, for each element, whether either x or y is NaN. Although this is closer to the machine instruction, it is a weird API, as IsNaN is a unary operation. This CL changes it to unary, x.IsNaN(). It compiles to VCMPPS $3, x, x (or VCMPPD). For the two-operand version, we can optimize x.IsNaN().Or(y.IsNaN()) to VCMPPS $3, x, y (not done in this CL). While here, change the name to IsNaN (uppercase both Ns), which matches math.IsNaN. Tests in the next CL. Change-Id: Ib6e7afc2635e6c3c606db5ea16420ee673a6c6d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/733660 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
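The equivalence behind the proposed optimization can be checked with a scalar model. In this sketch the helper names and the four-lane shape are assumptions; `unordered` models comparison predicate 3 (unordered), which is true for a lane when either operand is NaN.

```go
package main

import (
	"fmt"
	"math"
)

// isNaN models the unary x.IsNaN(): a lane is NaN exactly when it
// compares unequal to itself.
func isNaN(x [4]float32) [4]bool {
	var m [4]bool
	for i, v := range x {
		m[i] = v != v
	}
	return m
}

// or combines two masks lane by lane.
func or(a, b [4]bool) [4]bool {
	var m [4]bool
	for i := range m {
		m[i] = a[i] || b[i]
	}
	return m
}

// unordered models the two-operand unordered compare: a lane is set
// when either input lane is NaN.
func unordered(x, y [4]float32) [4]bool {
	var m [4]bool
	for i := range m {
		m[i] = x[i] != x[i] || y[i] != y[i]
	}
	return m
}

func main() {
	nan := float32(math.NaN())
	x := [4]float32{1, nan, 3, 4}
	y := [4]float32{nan, 2, 3, 4}
	fmt.Println(or(isNaN(x), isNaN(y)) == unordered(x, y))
}
```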
2026-01-02 | simd/archsimd: correct documentation of Mask types | Cherry Mui
The documentation of Mask types currently describes vector types, not masks. Correct it. Change-Id: Ib2723310842c6d10cfdd772c7abb8d4c1e63b130 Reviewed-on: https://go-review.googlesource.com/c/go/+/733342 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-30 | simd/archsimd: adjust documentation slightly | Cherry Mui
- Reword the documentation of Scale to mention parameter names.
- Correct the parameter name in Merge.
- Use proper a/an articles in some documentation.
- Add punctuation.
- Format code blocks for long expressions.

Change-Id: I8a31721503c1b155862255619a835895f3d5123a Reviewed-on: https://go-review.googlesource.com/c/go/+/731560 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-29 | simd/archsimd: add tests for ExtendLo operations | Cherry Mui
Change-Id: I77a5f0dc58e068882a177dc32d162821b38f34ef Reviewed-on: https://go-review.googlesource.com/c/go/+/733101 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-29 | simd/archsimd: remove redundant suffix of ExtendLo operations | Cherry Mui
For methods like ExtendLo2ToInt64x2, the last "x2" is redundant, as it is already mentioned in "Lo2". Remove it, so it is just ExtendLo2ToInt64. Change-Id: I490afd818c40bb7a4ef15c249723895735bd6488 Reviewed-on: https://go-review.googlesource.com/c/go/+/733100 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-12-29 | simd/archsimd: add more tests for Truncate operations | Cherry Mui
Now include operations with input and output with different lengths. Change-Id: I5c9759e31ffae2d621a13f9cb3f5dd64e87a1c44 Reviewed-on: https://go-review.googlesource.com/c/go/+/732920 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-12-27 | simd/archsimd: add more tests for Extend operations | Cherry Mui
The operations that extend only low elements, ExtendLoNTo..., are not yet included. Change-Id: I93168889b92c56720344b443c1cff238f8cc096a Reviewed-on: https://go-review.googlesource.com/c/go/+/732661 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-24 | simd/archsimd: fix "go generate" command | Cherry Mui
Correct the generate command for test helpers. There is no longer a genfiles.go. Also correct the generated file headers to match the current generator layout. Change-Id: Ifb9a8c394477359020ff44290dbaabe7a2d59aca Reviewed-on: https://go-review.googlesource.com/c/go/+/732280 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: David Chase <drchase@google.com>
2025-12-24 | simd/archsimd: guard test helpers with amd64 tag | Cherry Mui
The test helpers load vectors. Currently the load functions are only available on AMD64, so guard them with the tag. Now GOEXPERIMENT=simd go test simd/... doesn't fail on a non-AMD64 machine. Change-Id: Ie75f1fbb3b91629bc477b3140630bc47a4ef5b63 Reviewed-on: https://go-review.googlesource.com/c/go/+/732380 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-12-22 | simd/archsimd: correct documentation for pairwise operations | Cherry Mui
For Add/SubPairs(Saturated?), the documented result element order is wrong. Corrected. Also, for 256-bit vectors, this is a grouped operation. So name it with the Grouped suffix to be clear. Change-Id: Idfd0975cb4a332b2e28c898613861205d26f75b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/732020 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
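The commit message does not spell out the corrected element order. The sketch below models the order defined by the x86 horizontal-add instruction (VPHADDD), on the assumption that the receiver is the first source operand; the function names and lane counts are illustrative, not the archsimd API. It also shows why the 256-bit form is "grouped": each 128-bit half is processed independently.

```go
package main

import "fmt"

// addPairs models a 128-bit horizontal add of 32-bit lanes: the sums
// of adjacent pairs of x fill the low half, those of y the high half.
func addPairs(x, y [4]int32) [4]int32 {
	return [4]int32{x[0] + x[1], x[2] + x[3], y[0] + y[1], y[2] + y[3]}
}

// addPairsGrouped models the 256-bit form: the 128-bit operation is
// applied separately to each 128-bit group of the inputs.
func addPairsGrouped(x, y [8]int32) [8]int32 {
	var out [8]int32
	for g := 0; g < 8; g += 4 {
		var xs, ys [4]int32
		copy(xs[:], x[g:g+4])
		copy(ys[:], y[g:g+4])
		r := addPairs(xs, ys)
		copy(out[g:g+4], r[:])
	}
	return out
}

func main() {
	x := [8]int32{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]int32{10, 20, 30, 40, 50, 60, 70, 80}
	fmt.Println(addPairsGrouped(x, y))
}
```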
2025-12-19 | simd/archsimd: delete DotProductQuadruple methods for now | Cherry Mui
The DotProductQuadruple methods are currently defined on Int8 vectors. There are some problems with that.

1. We defined a DotProductQuadrupleSaturated method, but the dot product part does not need saturation, as it cannot overflow. It is the addition part of VPDPBUSDS that does the saturation. Currently we have optimization rules like

       x.DotProductQuadrupleSaturated(y).Add(z) -> VPDPBUSDS

   which is incorrect, in that the dot product doesn't do (or need) saturation, and the Add is a regular Add, but we rewrite it to a saturated add. The correct rule should be something like

       x.DotProductQuadruple(y).AddSaturated(z) -> VPDPBUSDS

2. There are multiple flavors of DotProductQuadruple: signed/unsigned × signed/unsigned, which cannot be completely disambiguated by the type. The current naming may preclude adding all the flavors.

For these reasons, remove the methods for now. We can add them later with the issues addressed.

Change-Id: I549c0925afaa68c7e2cc956105619f2c1b46b325 Reviewed-on: https://go-review.googlesource.com/c/go/+/731441 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
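The first problem can be illustrated with a scalar model of one 32-bit lane. This sketch assumes the unsigned-by-signed flavor that VPDPBUSDS uses; the function names are hypothetical. The four-term dot product is bounded well inside the int32 range, so only the accumulation step can saturate, and a regular Add differs from a saturated one near the limits.

```go
package main

import (
	"fmt"
	"math"
)

// dotProductQuadruple models one lane: four unsigned bytes times four
// signed bytes, summed. The result lies in [-130560, 129540], so it
// cannot overflow an int32.
func dotProductQuadruple(x [4]uint8, y [4]int8) int32 {
	var sum int32
	for i := range x {
		sum += int32(x[i]) * int32(y[i])
	}
	return sum
}

// addSaturated clamps the sum to the int32 range instead of wrapping.
func addSaturated(a, b int32) int32 {
	s := int64(a) + int64(b)
	if s > math.MaxInt32 {
		return math.MaxInt32
	}
	if s < math.MinInt32 {
		return math.MinInt32
	}
	return int32(s)
}

func main() {
	d := dotProductQuadruple([4]uint8{255, 255, 255, 255}, [4]int8{127, 127, 127, 127})
	z := int32(math.MaxInt32)
	fmt.Println(d)                  // largest possible dot product
	fmt.Println(z + d)              // regular Add wraps around
	fmt.Println(addSaturated(z, d)) // saturated Add clamps
}
```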
2025-12-19 | simd/archsimd: add Grouped for 256- and 512-bit SaturateTo(U)Int16Concat, and fix type | Cherry Mui
They operate on 128-bit groups, so name them Grouped to be clear, and consistent with other grouped operations. Reword the documentation, mention the grouping only for grouped versions. Also, SaturateToUint16Concat(Grouped) is a signed int32 to unsigned uint16 saturated conversion. The receiver and the parameter should be signed. The result remains unsigned. Change-Id: I30e28bc05e07f5c28214c9c6d9d201cbbb183468 Reviewed-on: https://go-review.googlesource.com/c/go/+/731501 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
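A scalar model may help show both points: the conversion takes signed 32-bit inputs and clamps them to the unsigned 16-bit range, and the wider form works on each 128-bit group separately. The lane layout follows the x86 VPACKUSDW instruction; the function names and the assumption that the receiver is the first source are illustrative, not the archsimd API.

```go
package main

import "fmt"

// saturateToUint16 clamps a signed 32-bit value to [0, 65535].
func saturateToUint16(v int32) uint16 {
	if v < 0 {
		return 0
	}
	if v > 65535 {
		return 65535
	}
	return uint16(v)
}

// concat models the 128-bit form: converted x fills the low four
// lanes of the result and converted y fills the high four.
func concat(x, y [4]int32) [8]uint16 {
	var out [8]uint16
	for i := 0; i < 4; i++ {
		out[i] = saturateToUint16(x[i])
		out[4+i] = saturateToUint16(y[i])
	}
	return out
}

// concatGrouped models the 256-bit form: the 128-bit operation is
// applied independently to each 128-bit group of x and y.
func concatGrouped(x, y [8]int32) [16]uint16 {
	var out [16]uint16
	for g := 0; g < 2; g++ {
		var xs, ys [4]int32
		copy(xs[:], x[4*g:4*g+4])
		copy(ys[:], y[4*g:4*g+4])
		r := concat(xs, ys)
		copy(out[8*g:8*g+8], r[:])
	}
	return out
}

func main() {
	x := [8]int32{-1, 0, 65535, 70000, 1, 2, 3, 4}
	y := [8]int32{5, 6, 7, 8, -5, 100000, 9, 10}
	fmt.Println(concatGrouped(x, y))
}
```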
2025-12-19 | simd/archsimd: correct type and instruction for SaturateToUint8 | Cherry Mui
It should be defined on unsigned types, not signed types, and use unsigned conversion instructions. Change-Id: I49694ccdf1d331cfde88591531c358d9886e83e6 Reviewed-on: https://go-review.googlesource.com/c/go/+/731500 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-12-19 | simd/archsimd: reword documentation for some operations | Cherry Mui
- Min/Max: make it clear it is elementwise.
- RoundToEven: clarify that it rounds ties to even.
- MulEvenWiden: use mathematical form of the index.
- CopySign: use parameter names directly.
- ConcatShiftBytesRight: rename the parameter.

Change-Id: I4cf0773c4daf3e3bf7b26e79d84ac5c2a9145c88 Reviewed-on: https://go-review.googlesource.com/c/go/+/731421 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-19 | simd/archsimd: reword documentation of comparison operations | Cherry Mui
The wording for the emulated ones is more precise. Use that everywhere. Change-Id: Iab64e0bb1fb6b19178ebf30ba8e82360b5882fd3 Reviewed-on: https://go-review.googlesource.com/c/go/+/731420 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-18 | simd/archsimd: reword documentation for conversion ops | Cherry Mui
Use more compact wording for extension, truncation, and saturation. Say that we pack the results to the low elements and zero the high elements if and only if the result has more elements. Change-Id: Iae98d3c6ea6b5b5fa0acd548471e8d6c70a26d2d Reviewed-on: https://go-review.googlesource.com/c/go/+/730940 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-12-17 | simd/archsimd: reword documentation of shift operations | Cherry Mui
Say the name of the parameter, instead of "the immediate". Don't say "Emptied bits are zeroed", which is implied by the shift operation in Go (and perhaps many other languages). For right shifts, say signed or unsigned shifts instead. Change-Id: I29c9c0e218bfaeef55b03d92d44762e34f006654 Reviewed-on: https://go-review.googlesource.com/c/go/+/730720 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-17 | simd/archsimd: reword documentation of As methods | Cherry Mui
Change-Id: Ifd6d3e5386383908435dd622e280edb6aa13fdab Reviewed-on: https://go-review.googlesource.com/c/go/+/730660 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2025-12-16 | cmd/compile: use unsigned constant when folding loads for SIMD ops with constants | Cherry Mui
When folding loads into a SIMD op with a constant, in the SSA rules we use makeValAndOff to create an AuxInt for the constant and the offset. For the SIMD ops of concern (for now), the constants are always unsigned. So pass the constant unsigned. Fixes #76756. Change-Id: Ia5910e689ff510ce54d3a0c2ed0e950bc54f8862 Reviewed-on: https://go-review.googlesource.com/c/go/+/730420 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
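Why the signedness of the constant matters can be seen in a small model. The `pack` helper below is hypothetical; it only imitates a layout with the value in the high 32 bits and the offset in the low 32 bits of a 64-bit AuxInt, and it does not reproduce the actual rewrite rules or the failure reported in the issue. The same 8-bit immediate gives a different packed value depending on whether it is widened as signed or unsigned.

```go
package main

import "fmt"

// pack is a hypothetical model of combining a constant and an offset
// into one 64-bit value: constant in the high half, offset in the low.
func pack(val, off int32) int64 {
	return int64(val)<<32 + int64(uint32(off))
}

func main() {
	imm := uint8(0xF0) // an 8-bit immediate with the top bit set

	signed := pack(int32(int8(imm)), 8) // widened as signed: -16
	unsigned := pack(int32(imm), 8)     // widened as unsigned: 240

	fmt.Println(signed>>32, unsigned>>32) // the recovered constants differ
	fmt.Println(signed == unsigned)
}
```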
2025-12-11 | simd/archsimd: rename Mask.AsIntMxN to Mask.ToIntMxN | Cherry Mui
To be more consistent with vector.ToMask and mask.ToBits. Cherry-pick CL 729022 from the dev.simd branch, for Go 1.26. Change-Id: I4ea4dfd0059d256f39a93d1fe2ce1de158049800 Reviewed-on: https://go-review.googlesource.com/c/go/+/729223 Auto-Submit: David Chase <drchase@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-12-11 | simd/archsimd: define ToMask only on integer vectors | Cherry Mui
The ToMask method is for converting an AVX2-style mask represented in a vector to the Mask type. The AVX2-style mask is a (signed) integer, so define ToMask only on integer vectors. Cherry-pick CL 729020 from the dev.simd branch, for Go 1.26. Change-Id: I0c541eb28e945bfaebf2a2feb940bdd438fb6e99 Reviewed-on: https://go-review.googlesource.com/c/go/+/729222 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: David Chase <drchase@google.com>
2025-12-08 | [dev.simd] simd, cmd/compile: move "simd" to "simd/archsimd" | David Chase
Also removes a few leftover TODOs and scraps of commented-out code from simd development. Updated etetest.sh to make it behave whether amd64 implies the experiment, or not. Fixes #76473. Change-Id: I6d9792214d7f514cb90c21b101dbf7d07c1d0e55 Reviewed-on: https://go-review.googlesource.com/c/go/+/728220 TryBot-Bypass: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>