<feed xmlns='http://www.w3.org/2005/Atom'>
<title>go/test/codegen/mathbits.go, branch fix-runtime-test-GOMAXPROCS</title>
<subtitle>Fork of Go programming language with my patches.</subtitle>
<id>http://git.kilabit.info/go/atom?h=fix-runtime-test-GOMAXPROCS</id>
<link rel='self' href='http://git.kilabit.info/go/atom?h=fix-runtime-test-GOMAXPROCS'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/'/>
<updated>2025-05-01T12:57:41Z</updated>
<entry>
<title>cmd/compile,internal/cpu,runtime: intrinsify math/bits.OnesCount on riscv64</title>
<updated>2025-05-01T12:57:41Z</updated>
<author>
<name>Joel Sing</name>
<email>joel@sing.id.au</email>
</author>
<published>2025-03-19T12:57:23Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=4d10d4ad849467f12a1a16a5ade26cc03d8f1a1f'/>
<id>urn:sha1:4d10d4ad849467f12a1a16a5ade26cc03d8f1a1f</id>
<content type='text'>
For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount
using the CPOP/CPOPW machine instructions. Since the native Go
implementation of OnesCount is relatively expensive, it is also
worth emitting a check for Zbb support when compiled for rva20u64.

On a Banana Pi F3, with GORISCV64=rva22u64:

              │     oc.1     │                oc.2                 │
              │    sec/op    │   sec/op     vs base                │
OnesCount-8     16.930n ± 0%   4.389n ± 0%  -74.08% (p=0.000 n=10)
OnesCount8-8     5.642n ± 0%   5.016n ± 0%  -11.10% (p=0.000 n=10)
OnesCount16-8    9.404n ± 0%   5.015n ± 0%  -46.67% (p=0.000 n=10)
OnesCount32-8   13.165n ± 0%   4.388n ± 0%  -66.67% (p=0.000 n=10)
OnesCount64-8   16.300n ± 0%   4.388n ± 0%  -73.08% (p=0.000 n=10)
geomean          11.40n        4.629n       -59.40%

On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb
detection enabled:

              │     oc.3     │                oc.4                 │
              │    sec/op    │   sec/op     vs base                │
OnesCount-8     16.930n ± 0%   5.643n ± 0%  -66.67% (p=0.000 n=10)
OnesCount8-8     5.642n ± 0%   5.642n ± 0%        ~ (p=0.447 n=10)
OnesCount16-8   10.030n ± 0%   6.896n ± 0%  -31.25% (p=0.000 n=10)
OnesCount32-8   13.170n ± 0%   5.642n ± 0%  -57.16% (p=0.000 n=10)
OnesCount64-8   16.300n ± 0%   5.642n ± 0%  -65.39% (p=0.000 n=10)
geomean          11.55n        5.873n       -49.16%

On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb
detection disabled:

              │    oc.3     │                oc.5                 │
              │   sec/op    │   sec/op     vs base                │
OnesCount-8     16.93n ± 0%   29.47n ± 0%  +74.07% (p=0.000 n=10)
OnesCount8-8    5.642n ± 0%   5.643n ± 0%        ~ (p=0.191 n=10)
OnesCount16-8   10.03n ± 0%   15.05n ± 0%  +50.05% (p=0.000 n=10)
OnesCount32-8   13.17n ± 0%   18.18n ± 0%  +38.04% (p=0.000 n=10)
OnesCount64-8   16.30n ± 0%   21.94n ± 0%  +34.60% (p=0.000 n=10)
geomean         11.55n        15.84n       +37.16%

For hardware without Zbb, this adds ~5ns overhead, while for hardware
with Zbb we achieve a performance gain up of up to 11ns. It is worth
noting that OnesCount8 is cheap enough that it is preferable to stick
with the generic version in this case.

Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5
Reviewed-on: https://go-review.googlesource.com/c/go/+/660856
Reviewed-by: Carlos Amedee &lt;carlos@golang.org&gt;
Reviewed-by: Meng Zhuo &lt;mengzhuo1203@gmail.com&gt;
Reviewed-by: Mark Ryan &lt;markdryan@rivosinc.com&gt;
Reviewed-by: Dmitri Shuralyov &lt;dmitshur@google.com&gt;
LUCI-TryBot-Result: Go LUCI &lt;golang-scoped@luci-project-accounts.iam.gserviceaccount.com&gt;
</content>
</entry>
<entry>
<title>cmd/compile: intrinsify math/bits.Bswap on riscv64</title>
<updated>2025-05-01T12:57:13Z</updated>
<author>
<name>Joel Sing</name>
<email>joel@sing.id.au</email>
</author>
<published>2025-03-19T14:09:23Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=90e8b8cdaeb76a57604a461a138c59340daed7ef'/>
<id>urn:sha1:90e8b8cdaeb76a57604a461a138c59340daed7ef</id>
<content type='text'>
For riscv64/rva22u64 and above, we can intrinsify math/bits.Bswap
using the REV8 machine instruction.

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                 │     rb.1     │                rb.2                 │
                 │    sec/op    │   sec/op     vs base                │
ReverseBytes-4     18.790n ± 0%   4.026n ± 0%  -78.57% (p=0.000 n=10)
ReverseBytes16-4    6.710n ± 0%   5.368n ± 0%  -20.00% (p=0.000 n=10)
ReverseBytes32-4   13.420n ± 0%   5.368n ± 0%  -60.00% (p=0.000 n=10)
ReverseBytes64-4   17.450n ± 0%   4.026n ± 0%  -76.93% (p=0.000 n=10)
geomean             13.11n        4.649n       -64.54%

Change-Id: I26eee34270b1721f7304bb1cddb0fda129b20ece
Reviewed-on: https://go-review.googlesource.com/c/go/+/660855
Reviewed-by: Mark Ryan &lt;markdryan@rivosinc.com&gt;
LUCI-TryBot-Result: Go LUCI &lt;golang-scoped@luci-project-accounts.iam.gserviceaccount.com&gt;
Reviewed-by: Meng Zhuo &lt;mengzhuo1203@gmail.com&gt;
Reviewed-by: Carlos Amedee &lt;carlos@golang.org&gt;
Reviewed-by: Junyang Shao &lt;shaojunyang@google.com&gt;
</content>
</entry>
<entry>
<title>cmd/compile: constant fold 128-bit multiplies</title>
<updated>2025-04-22T17:24:18Z</updated>
<author>
<name>Keith Randall</name>
<email>khr@golang.org</email>
</author>
<published>2025-04-21T19:44:24Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=7d0cb2a2adec493b8ad9d79ef35354c8e20f0213'/>
<id>urn:sha1:7d0cb2a2adec493b8ad9d79ef35354c8e20f0213</id>
<content type='text'>
The full 64x64-&gt;128 multiply comes up when using bits.Mul64.
The 64x64-&gt;64+overflow multiply comes up in unsafe.Slice when using
a constant length.

Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc
Reviewed-on: https://go-review.googlesource.com/c/go/+/667175
Reviewed-by: Junyang Shao &lt;shaojunyang@google.com&gt;
Reviewed-by: Keith Randall &lt;khr@google.com&gt;
LUCI-TryBot-Result: Go LUCI &lt;golang-scoped@luci-project-accounts.iam.gserviceaccount.com&gt;
</content>
</entry>
<entry>
<title>cmd/compile: intrinsify math/bits.Len on riscv64</title>
<updated>2025-03-22T01:21:44Z</updated>
<author>
<name>Joel Sing</name>
<email>joel@sing.id.au</email>
</author>
<published>2025-02-23T13:27:34Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=b70244ff7a043786c211775b68259de6104ff91c'/>
<id>urn:sha1:b70244ff7a043786c211775b68259de6104ff91c</id>
<content type='text'>
For riscv64/rva22u64 and above, we can intrinsify math/bits.Len using the
CLZ/CLZW machine instructions.

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                 │   clz.b.1   │               clz.b.2               │
                 │   sec/op    │   sec/op     vs base                │
LeadingZeros-4     28.89n ± 0%   12.08n ± 0%  -58.19% (p=0.000 n=10)
LeadingZeros8-4    18.79n ± 0%   14.76n ± 0%  -21.45% (p=0.000 n=10)
LeadingZeros16-4   25.27n ± 0%   14.76n ± 0%  -41.59% (p=0.000 n=10)
LeadingZeros32-4   25.12n ± 0%   12.08n ± 0%  -51.92% (p=0.000 n=10)
LeadingZeros64-4   25.89n ± 0%   12.08n ± 0%  -53.35% (p=0.000 n=10)
geomean            24.55n        13.09n       -46.70%

Change-Id: I0dda684713dbdf5336af393f5ccbdae861c4f694
Reviewed-on: https://go-review.googlesource.com/c/go/+/652321
Reviewed-by: David Chase &lt;drchase@google.com&gt;
Reviewed-by: Meng Zhuo &lt;mengzhuo1203@gmail.com&gt;
LUCI-TryBot-Result: Go LUCI &lt;golang-scoped@luci-project-accounts.iam.gserviceaccount.com&gt;
Reviewed-by: Mark Ryan &lt;markdryan@rivosinc.com&gt;
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
</content>
</entry>
<entry>
<title>cmd/compile: intrinsify math/bits.TrailingZeros on riscv64</title>
<updated>2025-03-16T02:07:53Z</updated>
<author>
<name>Joel Sing</name>
<email>joel@sing.id.au</email>
</author>
<published>2025-02-23T11:17:53Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=6fb7bdc96d0398fab313586fba6fdc89cc14c679'/>
<id>urn:sha1:6fb7bdc96d0398fab313586fba6fdc89cc14c679</id>
<content type='text'>
For riscv64/rva22u64 and above, we can intrinsify math/bits.TrailingZeros
using the CTZ/CTZW machine instructions.

On a StarFive VisionFive 2 with GORISCV64=rva22u64:

                  │   ctz.b.1    │               ctz.b.2               │
                  │    sec/op    │   sec/op     vs base                │
TrailingZeros-4     25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
TrailingZeros8-4     14.76n ± 0%   10.74n ± 0%  -27.24% (p=0.000 n=10)
TrailingZeros16-4    26.84n ± 0%   10.74n ± 0%  -59.99% (p=0.000 n=10)
TrailingZeros32-4   25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
TrailingZeros64-4   25.500n ± 0%   8.052n ± 0%  -68.42% (p=0.000 n=10)
geomean              23.09n        9.035n       -60.88%

Change-Id: I71edf2b988acb7a68e797afda4ee66d7a57d587e
Reviewed-on: https://go-review.googlesource.com/c/go/+/652320
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
Reviewed-by: Mark Ryan &lt;markdryan@rivosinc.com&gt;
Reviewed-by: David Chase &lt;drchase@google.com&gt;
LUCI-TryBot-Result: Go LUCI &lt;golang-scoped@luci-project-accounts.iam.gserviceaccount.com&gt;
Reviewed-by: Meng Zhuo &lt;mengzhuo1203@gmail.com&gt;
</content>
</entry>
<entry>
<title>test/codegen: tighten the TrailingZeros64 test on 386</title>
<updated>2025-03-14T22:04:38Z</updated>
<author>
<name>Joel Sing</name>
<email>joel@sing.id.au</email>
</author>
<published>2025-02-26T10:03:15Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=c1c7e5902fda622d5d5870ed045407a9acd5666b'/>
<id>urn:sha1:c1c7e5902fda622d5d5870ed045407a9acd5666b</id>
<content type='text'>
Make the TrailingZeros64 code generation check more specific for 386.
Just checking for BSFL will match both the generic 64 bit decomposition
and the custom 386 lowering.

Change-Id: I62076f1889af0ef1f29704cba01ab419cae0c6e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/656996
LUCI-TryBot-Result: Go LUCI &lt;golang-scoped@luci-project-accounts.iam.gserviceaccount.com&gt;
Reviewed-by: David Chase &lt;drchase@google.com&gt;
Reviewed-by: Keith Randall &lt;khr@google.com&gt;
Auto-Submit: Keith Randall &lt;khr@google.com&gt;
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</content>
</entry>
<entry>
<title>cmd/compile: simplify intrinsification of TrailingZeros16 and TrailingZeros8</title>
<updated>2025-02-27T11:45:44Z</updated>
<author>
<name>Joel Sing</name>
<email>joel@sing.id.au</email>
</author>
<published>2025-02-22T12:26:21Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=927fdb7843ce96b42791912b42d0d3e6735e8dde'/>
<id>urn:sha1:927fdb7843ce96b42791912b42d0d3e6735e8dde</id>
<content type='text'>
Decompose Ctz16 and Ctz8 within the SSA rules for LOONG64, MIPS, PPC64
and S390X, rather than having a custom intrinsic. Note that for PPC64 this
actually allows the existing Ctz16 and Ctz8 rules to be used.

Change-Id: I27a5e978f852b9d75396d2a80f5d7dfcb5ef7dd4
Reviewed-on: https://go-review.googlesource.com/c/go/+/651816
Reviewed-by: Paul Murphy &lt;murp@ibm.com&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
LUCI-TryBot-Result: Go LUCI &lt;golang-scoped@luci-project-accounts.iam.gserviceaccount.com&gt;
Reviewed-by: Michael Pratt &lt;mpratt@google.com&gt;
Run-TryBot: Joel Sing &lt;joel@sing.id.au&gt;
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
Reviewed-by: Keith Randall &lt;khr@google.com&gt;
</content>
</entry>
<entry>
<title>cmd/compile: wire up math/bits.TrailingZeros intrinsics for loong64</title>
<updated>2024-11-13T00:57:25Z</updated>
<author>
<name>Xiaolin Zhao</name>
<email>zhaoxiaolin@loongson.cn</email>
</author>
<published>2024-11-01T08:09:32Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=ab55465098a0cd33007684091b573717a6ea54cf'/>
<id>urn:sha1:ab55465098a0cd33007684091b573717a6ea54cf</id>
<content type='text'>
Micro-benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
                |  bench.old   |              bench.new               |
                |    sec/op    |    sec/op     vs base                |
TrailingZeros     1.7240n ± 0%   0.8120n ± 0%  -52.90% (p=0.000 n=20)
TrailingZeros8    1.0530n ± 0%   0.8015n ± 0%  -23.88% (p=0.000 n=20)
TrailingZeros16    2.072n ± 0%    1.015n ± 0%  -51.01% (p=0.000 n=20)
TrailingZeros32   1.7160n ± 0%   0.8122n ± 0%  -52.67% (p=0.000 n=20)
TrailingZeros64   2.0060n ± 0%   0.8125n ± 0%  -59.50% (p=0.000 n=20)
geomean            1.669n        0.8470n       -49.25%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
                |  bench.old   |              bench.new               |
                |    sec/op    |    sec/op     vs base                |
TrailingZeros     2.6275n ± 0%   0.9120n ± 0%  -65.29% (p=0.000 n=20)
TrailingZeros8     1.451n ± 0%    1.163n ± 0%  -19.85% (p=0.000 n=20)
TrailingZeros16    3.069n ± 0%    1.201n ± 0%  -60.87% (p=0.000 n=20)
TrailingZeros32   2.9060n ± 0%   0.9115n ± 0%  -68.63% (p=0.000 n=20)
TrailingZeros64   2.6305n ± 0%   0.9115n ± 0%  -65.35% (p=0.000 n=20)
geomean            2.456n         1.011n       -58.83%

This patch is a copy of CL 479498.
Co-authored-by: WANG Xuerui &lt;git@xen0n.name&gt;

Change-Id: I1a5b2114a844dc0d02c8e68f41ce2443ac3b5fda
Reviewed-on: https://go-review.googlesource.com/c/go/+/624356
Reviewed-by: abner chenc &lt;chenguoqi@loongson.cn&gt;
Reviewed-by: David Chase &lt;drchase@google.com&gt;
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
LUCI-TryBot-Result: Go LUCI &lt;golang-scoped@luci-project-accounts.iam.gserviceaccount.com&gt;
Reviewed-by: Keith Randall &lt;khr@google.com&gt;
</content>
</entry>
<entry>
<title>cmd/compile: optimize math/bits.OnesCount{16,32,64} implementation on loong64</title>
<updated>2024-11-12T00:48:04Z</updated>
<author>
<name>Guoqi Chen</name>
<email>chenguoqi@loongson.cn</email>
</author>
<published>2024-10-18T08:31:29Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=fb9b946adcc8389aafaa43866f3cc26b12411439'/>
<id>urn:sha1:fb9b946adcc8389aafaa43866f3cc26b12411439</id>
<content type='text'>
Use Loong64's LSX instruction VPCNT to implement math/bits.OnesCount{16,32,64}
and make it intrinsic.

Benchmark results on loongson 3A5000 and 3A6000 machines:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000-HV @ 2500.00MHz
            |   bench.old   |   bench.new                          |
            |    sec/op     |    sec/op       vs base               |
OnesCount      4.413n ± 0%     1.401n ± 0%   -68.25% (p=0.000 n=10)
OnesCount8     1.364n ± 0%     1.363n ± 0%         ~ (p=0.130 n=10)
OnesCount16    2.112n ± 0%     1.534n ± 0%   -27.37% (p=0.000 n=10)
OnesCount32    4.533n ± 0%     1.529n ± 0%   -66.27% (p=0.000 n=10)
OnesCount64    4.565n ± 0%     1.531n ± 1%   -66.46% (p=0.000 n=10)
geomean        3.048n          1.470n        -51.78%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
            |   bench.old   |   bench.new                          |
            |    sec/op     |    sec/op       vs base              |
OnesCount       3.553n ± 0%     1.201n ± 0%  -66.20% (p=0.000 n=10)
OnesCount8     0.8021n ± 0%    0.8004n ± 0%   -0.21% (p=0.000 n=10)
OnesCount16     1.216n ± 0%     1.000n ± 0%  -17.76% (p=0.000 n=10)
OnesCount32     3.006n ± 0%     1.035n ± 0%  -65.57% (p=0.000 n=10)
OnesCount64     3.503n ± 0%     1.035n ± 0%  -70.45% (p=0.000 n=10)
geomean         2.053n          1.006n       -51.01%

Change-Id: I07a5b8da2bb48711b896387ec7625145804affc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/620978
Reviewed-by: David Chase &lt;drchase@google.com&gt;
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
Reviewed-by: Meidan Li &lt;limeidan@loongson.cn&gt;
LUCI-TryBot-Result: Go LUCI &lt;golang-scoped@luci-project-accounts.iam.gserviceaccount.com&gt;
</content>
</entry>
<entry>
<title>cmd/compile: wire up bits.Reverse intrinsics for loong64</title>
<updated>2024-11-11T00:08:45Z</updated>
<author>
<name>Xiaolin Zhao</name>
<email>zhaoxiaolin@loongson.cn</email>
</author>
<published>2024-11-02T07:40:13Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=583d750fa119d504686c737be6a898994b674b69'/>
<id>urn:sha1:583d750fa119d504686c737be6a898994b674b69</id>
<content type='text'>
Micro-benchmark results on Loongson 3A5000 and 3A6000:

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
          |  CL 624576   |               this CL                |
          |    sec/op    |    sec/op     vs base                |
Reverse     2.8130n ± 0%   0.8008n ± 0%  -71.53% (p=0.000 n=20)
Reverse8    0.7014n ± 0%   0.4040n ± 0%  -42.40% (p=0.000 n=20)
Reverse16   1.2975n ± 0%   0.6632n ± 1%  -48.89% (p=0.000 n=20)
Reverse32   2.7520n ± 0%   0.4042n ± 0%  -85.31% (p=0.000 n=20)
Reverse64   2.8970n ± 0%   0.4041n ± 0%  -86.05% (p=0.000 n=20)
geomean      1.828n        0.5116n       -72.01%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
          |  CL 624576   |               this CL                |
          |    sec/op    |    sec/op     vs base                |
Reverse     4.0050n ± 0%   0.8011n ± 0%  -80.00% (p=0.000 n=20)
Reverse8    0.8010n ± 0%   0.5210n ± 1%  -34.96% (p=0.000 n=20)
Reverse16   1.6160n ± 0%   0.6008n ± 0%  -62.82% (p=0.000 n=20)
Reverse32   3.8550n ± 0%   0.5179n ± 0%  -86.57% (p=0.000 n=20)
Reverse64   3.8050n ± 0%   0.5177n ± 0%  -86.40% (p=0.000 n=20)
geomean      2.378n        0.5828n       -75.49%

Updates #59120

This patch is a copy of CL 483656.
Co-authored-by: WANG Xuerui &lt;git@xen0n.name&gt;

Change-Id: I98681091763279279c8404bd0295785f13ea1c8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/624276
Reviewed-by: abner chenc &lt;chenguoqi@loongson.cn&gt;
LUCI-TryBot-Result: Go LUCI &lt;golang-scoped@luci-project-accounts.iam.gserviceaccount.com&gt;
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
Reviewed-by: David Chase &lt;drchase@google.com&gt;
</content>
</entry>
</feed>
