| Age | Commit message | Author |
|
The code previously filtered out VAES-only instructions; this CL adds
them back.
This CL adds the VAES feature check, following the Intel XED data:
XED_ISA_SET_VAES: vaes.7.0.ecx.9 # avx.1.0.ecx.28
This CL also finds that the old AVX512VAES feature check was not
checking the correct bits, and fixes it:
XED_ISA_SET_AVX512_VAES_128: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31
XED_ISA_SET_AVX512_VAES_256: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16 avx512vl.7.0.ebx.31
XED_ISA_SET_AVX512_VAES_512: vaes.7.0.ecx.9 aes.1.0.ecx.25 avx512f.7.0.ebx.16
It restricts the check to the strictest common set: avx512vl is
required even for 512-bit vectors, although they do not need it.
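As a rough illustration of the bit positions listed above, the checks can be sketched over raw CPUID register values. The cpuidRegs type and the function names are hypothetical, not the internal/cpu API. The sketch treats the avx bit in the VAES line as required, and it leaves out reading CPUID and the OS-support (XGETBV) checks a real implementation needs.

```go
package main

import "fmt"

// cpuidRegs holds the raw CPUID register values this sketch needs.
// It is a hypothetical type; filling it from hardware is out of scope.
type cpuidRegs struct {
	leaf1ECX uint32 // CPUID.(EAX=1):ECX
	leaf7EBX uint32 // CPUID.(EAX=7,ECX=0):EBX
	leaf7ECX uint32 // CPUID.(EAX=7,ECX=0):ECX
}

// bit reports whether bit n of reg is set.
func bit(reg uint32, n uint) bool { return reg&(1<<n) != 0 }

// hasVAES checks vaes.7.0.ecx.9 together with avx.1.0.ecx.28.
func hasVAES(r cpuidRegs) bool {
	return bit(r.leaf7ECX, 9) && bit(r.leaf1ECX, 28)
}

// hasAVX512VAES checks the strictest common set: vaes, aes, avx512f,
// and avx512vl, regardless of vector width.
func hasAVX512VAES(r cpuidRegs) bool {
	return bit(r.leaf7ECX, 9) && // vaes.7.0.ecx.9
		bit(r.leaf1ECX, 25) && // aes.1.0.ecx.25
		bit(r.leaf7EBX, 16) && // avx512f.7.0.ebx.16
		bit(r.leaf7EBX, 31) // avx512vl.7.0.ebx.31
}

func main() {
	// AVX, AES, AVX512F, and VAES set; AVX512VL missing.
	r := cpuidRegs{leaf1ECX: 1<<28 | 1<<25, leaf7EBX: 1 << 16, leaf7ECX: 1 << 9}
	fmt.Println(hasVAES(r), hasAVX512VAES(r)) // prints: true false
}
```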
Change-Id: I4e2f72b312fd2411589fbc12f9ee5c63c09c2e9a
Reviewed-on: https://go-review.googlesource.com/c/go/+/738500
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: I8bc5fec0475bfaebc0469d0efb2ba89af4b3f150
Reviewed-on: https://go-review.googlesource.com/c/go/+/738640
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
simdgen: fix typos in error messages
Change-Id: I921eea63c4847b2af43a1d5a1ea075e86f58aa77
GitHub-Last-Rev: 8c9dae51fd906aee04f52a5d44c6d4c923fc52d0
GitHub-Pull-Request: golang/go#77012
Reviewed-on: https://go-review.googlesource.com/c/go/+/732880
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
|
|
Currently, all FMA operations are marked as requiring AVX512, even on
smaller vector widths. This is happening because the narrower FMA
operations are marked as extension "FMA" in the XED. Since this
extension doesn't start with "AVX", we filter them out very early in
the XED process. However, this is just a quirk of naming: the FMA
feature depends on the AVX feature, so it is part of AVX, even if it
doesn't say so on the tin.
Fix this by accepting the FMA extension and adding FMA to the table of
CPU features. We also tweak internal/cpu slightly so it correctly
enforces that the logical FMA feature depends on both the FMA and AVX
CPUID flags.
This actually *deletes* a lot of generated code because we no longer
need the AVX-512 encoding of these 128- and 256-bit operations.
Change-Id: I744a18d0be888f536ac034fe88b110347622be7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/736160
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/736201
Reviewed-by: Austin Clements <austin@google.com>
|
|
This simplifies our handling of XED features, adds a table of which
features imply which other features, and adds this information to the
documentation of the CPU features APIs.
As part of this we fix an issue around the "AVXAES" feature. AVXAES is
defined as the combination of the AVX and AES CPUID flags. Several
other features also work like this, but have hand-written logic in
internal/cpu to compute logical feature flags from the underlying
CPUID bits. For these, we expose a single feature check function from
the SIMD API.
AVXAES currently doesn't work like this: it requires the user to check
both features. However, this forces the SIMD API to expose an "AES"
feature check, which really has nothing to do with SIMD. To make this
consistent, we introduce an AVXAES feature check function and use it
in feature requirement docs. Unlike the other combo features, this is
implemented in the simd package, but the difference is invisible to
the user.
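A minimal sketch of the two ideas, assuming a simple map-based table. The implies map, its entries, and the function names are illustrative only; they are not the generated table or the simd package API.

```go
package main

import (
	"fmt"
	"sort"
)

// implies is a hypothetical, deliberately incomplete implication table:
// each key lists the features it directly implies.
var implies = map[string][]string{
	"AVX2":   {"AVX"},
	"FMA":    {"AVX"},
	"AVXAES": {"AVX"},
}

// closure returns f plus everything f transitively implies, sorted.
func closure(f string) []string {
	seen := map[string]bool{}
	var visit func(string)
	visit = func(g string) {
		if seen[g] {
			return
		}
		seen[g] = true
		for _, h := range implies[g] {
			visit(h)
		}
	}
	visit(f)
	out := make([]string, 0, len(seen))
	for g := range seen {
		out = append(out, g)
	}
	sort.Strings(out)
	return out
}

// hasAVXAES folds two raw CPUID flags into one logical feature, so a
// caller does not need a separate AES check.
func hasAVXAES(avx, aes bool) bool { return avx && aes }

func main() {
	fmt.Println(closure("FMA"))         // prints: [AVX FMA]
	fmt.Println(hasAVXAES(true, false)) // prints: false
}
```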
Change-Id: I2985ebd361f0ecd45fd428903efe4c981a5ec65d
Reviewed-on: https://go-review.googlesource.com/c/go/+/736100
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/736200
Reviewed-by: Austin Clements <austin@google.com>
|
|
Change-Id: I121847e7f68c602dd8e9ecddfc41b547f8a86f10
Reviewed-on: https://go-review.googlesource.com/c/go/+/734361
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
Currently the Broadcast128/256/512 methods broadcast the lowest
element of the input vector to a vector of the corresponding width.
There are also variations of broadcast operations that broadcast
the whole (128- or 256-bit) vector to a larger vector, which we
don't yet support. Our current naming does not make clear which
version it is, though. Rename the current ones to Broadcast1ToN, to be
clear that they broadcast one element. The vector version probably
will be named BroadcastAllToN (not included in this CL).
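The difference between the two flavors can be shown with a scalar model on plain arrays. The function names below are hypothetical and only mirror the naming idea; they are not the simd API.

```go
package main

import "fmt"

// broadcast1To8 models "broadcast one element": lane 0 of a 4-lane
// input is copied to all 8 lanes of the result.
func broadcast1To8(x [4]int32) [8]int32 {
	var r [8]int32
	for i := range r {
		r[i] = x[0]
	}
	return r
}

// broadcastAllTo8 models "broadcast the whole vector": the 4-lane
// input is repeated to fill the 8-lane result.
func broadcastAllTo8(x [4]int32) [8]int32 {
	var r [8]int32
	for i := range r {
		r[i] = x[i%len(x)]
	}
	return r
}

func main() {
	x := [4]int32{7, 8, 9, 10}
	fmt.Println(broadcast1To8(x))   // prints: [7 7 7 7 7 7 7 7]
	fmt.Println(broadcastAllTo8(x)) // prints: [7 8 9 10 7 8 9 10]
}
```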
Change-Id: I47a21e367f948ec0b578d63706a40d20f5a9f46d
Reviewed-on: https://go-review.googlesource.com/c/go/+/734840
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
VPMOVMSKB, VMOVMSKPS, and VMOVMSKPD move AVX1/2-style masks to
integer registers, similar to VPMOV[BWDQ]2M (which move to mask
registers). The former are available on AVX1/2; the latter require
AVX512. So use the former where they are supported, i.e. for 128- and
256-bit vectors with 8-, 32-, and 64-bit elements (16-bit elements
always require AVX512).
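For reference, a scalar model of what VPMOVMSKB computes on a 128-bit vector: bit i of the result is the top bit of byte i. The moveMask8 name is hypothetical.

```go
package main

import "fmt"

// moveMask8 models VPMOVMSKB on 16 byte lanes. An AVX1/2-style mask
// lane is all ones (-1) when set and 0 when clear, so its top bit
// carries the whole lane's truth value.
func moveMask8(v [16]int8) uint16 {
	var m uint16
	for i, b := range v {
		if b < 0 { // top bit set
			m |= 1 << i
		}
	}
	return m
}

func main() {
	var v [16]int8
	v[0], v[3], v[15] = -1, -1, -1
	fmt.Printf("%#x\n", moveMask8(v)) // prints: 0x8009
}
```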
Change-Id: I972195116617ed2faaf95cee5cd6b250e671496c
Reviewed-on: https://go-review.googlesource.com/c/go/+/734060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
|
|
Currently, the IsNan API is defined as x.IsNan(y), which returns
a mask representing, for each element, whether either x or y is NaN.
Although closer to the machine instruction, this is a weird API, as
IsNaN is a unary operation. This CL changes it to unary, x.IsNaN().
It compiles to VCMPPS $3, x, x (or VCMPPD). For the two-operand
version, we can optimize x.IsNaN().Or(y.IsNaN()) to VCMPPS $3, x,
y (not done in this CL).
While here, change the name to IsNaN (uppercase both Ns), which
matches math.IsNaN.
Tests in the next CL.
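A scalar model of both forms, to show why the two-operand comparison equals the OR of two unary checks. Predicate 3 of VCMPPS is "unordered", which is true when either input is NaN. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// isNaN models the unary x.IsNaN() on 4 float32 lanes. Comparing a
// lane with itself is unordered exactly when the lane is NaN.
func isNaN(x [4]float32) [4]bool {
	var m [4]bool
	for i, v := range x {
		m[i] = v != v
	}
	return m
}

// unordered models the two-operand form: a lane is set when either
// x or y is NaN in that lane.
func unordered(x, y [4]float32) [4]bool {
	var m [4]bool
	for i := range x {
		m[i] = x[i] != x[i] || y[i] != y[i]
	}
	return m
}

func main() {
	nan := float32(math.NaN())
	x := [4]float32{1, nan, 3, 4}
	y := [4]float32{1, 2, nan, 4}
	fmt.Println(isNaN(x))        // prints: [false true false false]
	fmt.Println(unordered(x, y)) // prints: [false true true false]
}
```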
Change-Id: Ib6e7afc2635e6c3c606db5ea16420ee673a6c6d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/733660
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The documentation of Mask types currently describes vector types,
not masks. Correct it.
Change-Id: Ib2723310842c6d10cfdd772c7abb8d4c1e63b130
Reviewed-on: https://go-review.googlesource.com/c/go/+/733342
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
- Reword the documentation of Scale to mention parameter names.
- Correct the parameter name in Merge.
- Use proper a/an articles in some documentation.
- Add punctuation.
- Format code blocks for long expressions.
Change-Id: I8a31721503c1b155862255619a835895f3d5123a
Reviewed-on: https://go-review.googlesource.com/c/go/+/731560
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
For methods like ExtendLo2ToInt64x2, the last "x2" is redundant, as
it is already mentioned in "Lo2". Remove it, so it is just
ExtendLo2ToInt64.
Change-Id: I490afd818c40bb7a4ef15c249723895735bd6488
Reviewed-on: https://go-review.googlesource.com/c/go/+/733100
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Correct the generate command for test helpers. There is no longer
a genfiles.go. Also correct the generated file headers to match
the current generator layout.
Change-Id: Ifb9a8c394477359020ff44290dbaabe7a2d59aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/732280
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: David Chase <drchase@google.com>
|
|
For Add/SubPairs(Saturated?), the documented result element order
is wrong. Correct it.
Also, for 256-bit vectors, this is a grouped operation. So name it
with the Grouped suffix to be clear.
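A scalar sketch of the element order, assuming these methods follow the x86 horizontal-add instructions (e.g. VPHADDD); that mapping is an assumption here, and the function names are hypothetical.

```go
package main

import "fmt"

// addPairs models a 128-bit horizontal add on 4 int32 lanes:
// pair sums of x fill the low half, pair sums of y the high half.
func addPairs(x, y [4]int32) [4]int32 {
	return [4]int32{x[0] + x[1], x[2] + x[3], y[0] + y[1], y[2] + y[3]}
}

// addPairsGrouped models the 256-bit form: the same operation is
// applied independently to each 128-bit group of x and y.
func addPairsGrouped(x, y [8]int32) [8]int32 {
	var r [8]int32
	for g := 0; g < 2; g++ {
		lo := g * 4 // first lane of this 128-bit group
		r[lo+0] = x[lo] + x[lo+1]
		r[lo+1] = x[lo+2] + x[lo+3]
		r[lo+2] = y[lo] + y[lo+1]
		r[lo+3] = y[lo+2] + y[lo+3]
	}
	return r
}

func main() {
	x := [8]int32{1, 2, 3, 4, 5, 6, 7, 8}
	y := [8]int32{10, 20, 30, 40, 50, 60, 70, 80}
	fmt.Println(addPairsGrouped(x, y)) // prints: [3 7 30 70 11 15 110 150]
}
```

Note that the grouped result interleaves x and y per group; it is not "all pair sums of x, then all pair sums of y".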
Change-Id: Idfd0975cb4a332b2e28c898613861205d26f75b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/732020
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
The DotProductQuadruple methods are currently defined on Int8
vectors. There are some problems with that.
1. We defined a DotProductQuadrupleSaturated method, but the dot
product part does not need saturation, as it cannot overflow. It
is the addition part of VPDPBUSDS that does the saturation.
Currently we have optimization rules like
x.DotProductQuadrupleSaturated(y).Add(z) -> VPDPBUSDS
which is incorrect, in that the dot product doesn't do (or need)
saturation, and the Add is a regular Add, but we rewrite it to a
saturated add. The correct rule should be something like
x.DotProductQuadruple(y).AddSaturated(z) -> VPDPBUSDS
2. There are multiple flavors of DotProductQuadruple:
signed/unsigned × signed/unsigned, which cannot be completely
disambiguated by the type. The current naming may preclude adding
all the flavors.
For these reasons, remove the methods for now. We can add them
later with the issues addressed.
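Point 1 can be checked with a scalar model of one 32-bit lane of VPDPBUSDS (unsigned bytes times signed bytes, summed, then added to the accumulator with signed saturation). The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// dot4 models the dot-product part for one lane: four unsigned bytes
// times four signed bytes, summed. The magnitude is at most
// 4*255*128 = 130560, so it cannot overflow int32.
func dot4(u [4]uint8, s [4]int8) int32 {
	var sum int32
	for i := range u {
		sum += int32(u[i]) * int32(s[i])
	}
	return sum
}

// addSaturated models the accumulate part: a signed add that clamps
// to the int32 range instead of wrapping.
func addSaturated(a, b int32) int32 {
	s := int64(a) + int64(b)
	if s > math.MaxInt32 {
		return math.MaxInt32
	}
	if s < math.MinInt32 {
		return math.MinInt32
	}
	return int32(s)
}

func main() {
	u := [4]uint8{255, 255, 255, 255}
	s := [4]int8{127, 127, 127, 127}
	d := dot4(u, s)
	fmt.Println(d)                              // prints: 129540
	fmt.Println(addSaturated(math.MaxInt32, d)) // prints: 2147483647
}
```

Only the second step can clamp, which is why the saturation belongs on the Add rather than on the dot product.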
Change-Id: I549c0925afaa68c7e2cc956105619f2c1b46b325
Reviewed-on: https://go-review.googlesource.com/c/go/+/731441
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
and fix type
They operate on 128-bit groups, so name them Grouped to be clear,
and consistent with other grouped operations. Reword the
documentation, mentioning the grouping only for grouped versions.
Also, SaturateToUint16Concat(Grouped) is a signed int32 to unsigned
uint16 saturated conversion. The receiver and the parameter should
be signed. The result remains unsigned.
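A scalar sketch of the conversion for the 128-bit case, modeled on VPACKUSDW (signed int32 inputs, unsigned uint16 outputs, x in the low half and y in the high half). The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// saturateToUint16 clamps a signed 32-bit value to the uint16 range.
// Negative inputs become 0, which only makes sense for signed inputs.
func saturateToUint16(v int32) uint16 {
	if v < 0 {
		return 0
	}
	if v > math.MaxUint16 {
		return math.MaxUint16
	}
	return uint16(v)
}

// saturateToUint16Concat converts x into the low four lanes and y
// into the high four lanes of the result.
func saturateToUint16Concat(x, y [4]int32) [8]uint16 {
	var r [8]uint16
	for i := range x {
		r[i] = saturateToUint16(x[i])
		r[i+4] = saturateToUint16(y[i])
	}
	return r
}

func main() {
	x := [4]int32{-1, 0, 65535, 65536}
	y := [4]int32{100000, -100000, 1, 2}
	fmt.Println(saturateToUint16Concat(x, y)) // prints: [0 0 65535 65535 65535 0 1 2]
}
```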
Change-Id: I30e28bc05e07f5c28214c9c6d9d201cbbb183468
Reviewed-on: https://go-review.googlesource.com/c/go/+/731501
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
It should be defined on unsigned types, not signed types, and use
unsigned conversion instructions.
Change-Id: I49694ccdf1d331cfde88591531c358d9886e83e6
Reviewed-on: https://go-review.googlesource.com/c/go/+/731500
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
- Min/Max: make it clear it is elementwise.
- RoundToEven: clarify that it rounds ties to even.
- MulEvenWiden: use mathematical form of the index.
- CopySign: use parameter names directly.
- ConcatShiftBytesRight: rename the parameter.
Change-Id: I4cf0773c4daf3e3bf7b26e79d84ac5c2a9145c88
Reviewed-on: https://go-review.googlesource.com/c/go/+/731421
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
The wording for the emulated ones is more precise. Use that
everywhere.
Change-Id: Iab64e0bb1fb6b19178ebf30ba8e82360b5882fd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/731420
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Use more compact wording for extension, truncation, and
saturation. Say that we pack the results to the low elements and
zero the high elements if and only if the result has more elements.
Change-Id: Iae98d3c6ea6b5b5fa0acd548471e8d6c70a26d2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/730940
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
Say the name of the parameter, instead of "the immediate".
Don't say "Emptied bits are zeroed", which is implied by the shift
operation in Go (and perhaps many other languages). For right
shifts, say signed or unsigned shifts instead.
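For reference, the Go shift semantics the wording relies on, shown on plain arrays: a right shift of a signed type fills with the sign bit, and a right shift of an unsigned type fills with zeros. The function names are hypothetical.

```go
package main

import "fmt"

// shiftRightSigned models a signed (arithmetic) right shift per lane.
func shiftRightSigned(x [4]int32, n uint) [4]int32 {
	for i := range x {
		x[i] >>= n
	}
	return x
}

// shiftRightUnsigned models an unsigned (logical) right shift per lane.
func shiftRightUnsigned(x [4]uint32, n uint) [4]uint32 {
	for i := range x {
		x[i] >>= n
	}
	return x
}

func main() {
	fmt.Println(shiftRightSigned([4]int32{-8, 8, -1, 1}, 1))
	// prints: [-4 4 -1 0]
	fmt.Println(shiftRightUnsigned([4]uint32{0xFFFFFFF8, 8, 0xFFFFFFFF, 1}, 1))
	// prints: [2147483644 4 2147483647 0]
}
```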
Change-Id: I29c9c0e218bfaeef55b03d92d44762e34f006654
Reviewed-on: https://go-review.googlesource.com/c/go/+/730720
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Change-Id: Ifd6d3e5386383908435dd622e280edb6aa13fdab
Reviewed-on: https://go-review.googlesource.com/c/go/+/730660
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
|
|
constants
When folding loads into a SIMD op with a constant, in the SSA
rules we use makeValAndOff to create an AuxInt for the constant
and the offset. For the SIMD ops of concern (for now), the
constants are always unsigned. So pass the constant unsigned.
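A sketch of how signedness can matter when a small constant is packed next to an offset. The pack and unpackVal helpers are hypothetical stand-ins for a value-and-offset encoding (value in the high 32 bits, offset in the low 32 bits); they are not the compiler's code, and the 8-bit constant is only an example.

```go
package main

import "fmt"

// pack stores val in the high 32 bits and off in the low 32 bits.
func pack(val, off int32) int64 {
	return int64(val)<<32 | int64(uint32(off))
}

// unpackVal recovers the value half of a packed pair.
func unpackVal(p int64) int32 { return int32(p >> 32) }

// unpackOff recovers the offset half of a packed pair.
func unpackOff(p int64) int32 { return int32(p) }

func main() {
	imm := uint8(0xF0) // an 8-bit constant with the top bit set

	asSigned := pack(int32(int8(imm)), 8) // sign-extended: becomes -16
	asUnsigned := pack(int32(imm), 8)     // zero-extended: stays 240

	fmt.Println(unpackVal(asSigned), unpackVal(asUnsigned)) // prints: -16 240
	fmt.Println(unpackOff(asSigned), unpackOff(asUnsigned)) // prints: 8 8
}
```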
Fixes #76756.
Change-Id: Ia5910e689ff510ce54d3a0c2ed0e950bc54f8862
Reviewed-on: https://go-review.googlesource.com/c/go/+/730420
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
To be more consistent with vector.ToMask and mask.ToBits.
Cherry-pick CL 729022 from the dev.simd branch, for Go 1.26.
Change-Id: I4ea4dfd0059d256f39a93d1fe2ce1de158049800
Reviewed-on: https://go-review.googlesource.com/c/go/+/729223
Auto-Submit: David Chase <drchase@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
|
Also removes a few leftover TODOs and scraps of commented-out code
from simd development.
Updated etetest.sh so that it behaves correctly whether or not amd64
implies the experiment.
Fixes #76473.
Change-Id: I6d9792214d7f514cb90c21b101dbf7d07c1d0e55
Reviewed-on: https://go-review.googlesource.com/c/go/+/728220
TryBot-Bypass: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
|