<feed xmlns='http://www.w3.org/2005/Atom'>
<title>go/test/codegen/mathbits.go, branch json-isValidNumber</title>
<subtitle>Fork of Go programming language with my patches.</subtitle>
<id>http://git.kilabit.info/go/atom?h=json-isValidNumber</id>
<link rel='self' href='http://git.kilabit.info/go/atom?h=json-isValidNumber'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/'/>
<updated>2023-02-08T03:43:23Z</updated>
<entry>
<title>cmd/compile: intrinsify math/bits/ReverseBytes{32|64} for 386</title>
<updated>2023-02-08T03:43:23Z</updated>
<author>
<name>Wayne Zuo</name>
<email>wdvxdr@golangcn.org</email>
</author>
<published>2023-02-06T13:21:52Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=d7ac5d148039822159601ebbf3512431d81e204f'/>
<id>urn:sha1:d7ac5d148039822159601ebbf3512431d81e204f</id>
<content type='text'>
The BSWAPL instruction is supported on the i486 and newer.
https://github.com/golang/go/wiki/MinimumRequirements#386 says we
support "All Pentium MMX or later". The Pentium is also known as the
i586, so it is safe to use this instruction.
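
As a rough illustration of what the intrinsic computes (not the
compiler's implementation), the following Python sketch models the
byte reversal that ReverseBytes32 performs. The function name
reverse_bytes32 is a hypothetical helper chosen for this example.

```python
def reverse_bytes32(x):
    """Model of a 32-bit byte swap: reverse the order of the four bytes."""
    # Serialize as little-endian, then reinterpret the same bytes as big-endian.
    return int.from_bytes(x.to_bytes(4, "little"), "big")


print(hex(reverse_bytes32(0x11223344)))  # 0x44332211
```

Applying the swap twice returns the original value.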

Change-Id: I6dea1f9d864a45bb07c8f8f35a81cfe16cca216c
Reviewed-on: https://go-review.googlesource.com/c/go/+/465515
Run-TryBot: Wayne Zuo &lt;wdvxdr@golangcn.org&gt;
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
Reviewed-by: Matthew Dempsky &lt;mdempsky@google.com&gt;
Reviewed-by: Keith Randall &lt;khr@google.com&gt;
</content>
</entry>
<entry>
<title>cmd/compile: intrinsify math/bits/ReverseBytes{16|32|64} for ppc64/power10</title>
<updated>2023-02-03T19:01:06Z</updated>
<author>
<name>Archana R</name>
<email>aravind5@in.ibm.com</email>
</author>
<published>2022-10-31T16:47:17Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=cd1fc871568e9947e84377f82c8d7a4882a07067'/>
<id>urn:sha1:cd1fc871568e9947e84377f82c8d7a4882a07067</id>
<content type='text'>
This change intrinsifies ReverseBytes{16|32|64} by generating the
corresponding new Power10 instructions (brh, brw and brd) and
adds a verification test for them.
On Power9 and Power8, the existing Go code already performs optimally.

Performance improvement seen on Power10:
name            old time/op  new time/op  delta
ReverseBytes32  1.38ns ± 0%  1.18ns ± 0%  -14.2%
ReverseBytes64  1.52ns ± 0%  1.11ns ± 0%  -26.87%
ReverseBytes16  1.41ns ± 1%  1.18ns ± 0%  -16.47%

Change-Id: I88f127f3ab9ba24a772becc21ad90acfba324b37
Reviewed-on: https://go-review.googlesource.com/c/go/+/446675
Reviewed-by: Lynn Boger &lt;laboger@linux.vnet.ibm.com&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
Reviewed-by: Dmitri Shuralyov &lt;dmitshur@google.com&gt;
Run-TryBot: Lynn Boger &lt;laboger@linux.vnet.ibm.com&gt;
Reviewed-by: Michael Knyszek &lt;mknyszek@google.com&gt;
</content>
</entry>
<entry>
<title>test/codegen: merge identical ppc64 and ppc64le tests</title>
<updated>2023-01-27T19:03:02Z</updated>
<author>
<name>Paul E. Murphy</name>
<email>murp@ibm.com</email>
</author>
<published>2023-01-24T17:38:29Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=15405317460b57ff9ef605d0ca63795477c79b05'/>
<id>urn:sha1:15405317460b57ff9ef605d0ca63795477c79b05</id>
<content type='text'>
Manually consolidate the remaining ppc64/ppc64le tests, which
are not trivial to merge automatically.

The remaining ppc64le tests are limited to cases where load/stores are
merged (this only happens on ppc64le) and the race detector (only
supported on ppc64le).

Change-Id: I1f9c0f3d3ddbb7fbbd8c81fbbd6537394fba63ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/463217
Reviewed-by: Dmitri Shuralyov &lt;dmitshur@google.com&gt;
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
Run-TryBot: Paul Murphy &lt;murp@ibm.com&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
Reviewed-by: Lynn Boger &lt;laboger@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>test/codegen: combine trivial PPC64 tests into ppc64x</title>
<updated>2023-01-27T18:24:12Z</updated>
<author>
<name>Paul E. Murphy</name>
<email>murp@ibm.com</email>
</author>
<published>2023-01-25T17:53:10Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=0301c6c3512561b85b48d0e167f3e405484f496f'/>
<id>urn:sha1:0301c6c3512561b85b48d0e167f3e405484f496f</id>
<content type='text'>
Use a small Python script to consolidate duplicate
ppc64/ppc64le tests into a single ppc64x codegen test.

This makes the small assumption that any time two tests
for different arch/variant combos exist, those tests
can be combined into a single ppc64x test.

E.g.:

  // ppc64le: foo
  // ppc64le/power9: foo
into
  // ppc64x: foo

or

  // ppc64: foo
  // ppc64le: foo
into
  // ppc64x: foo

import glob
import re

# A ppc64 or ppc64le check, optionally restricted to power8 or power9.
check = re.compile(r"\s*// ?ppc64(le)?(/power[89])?:(.*)")

for file in glob.glob("codegen/*.go"):
    with open(file) as f:
        text = list(f)
    i = 0
    while i &lt; len(text):
        first = check.match(text[i])
        if first:
            j = i + 1
            # Scan the checks that directly follow line i.
            while j &lt; len(text):
                second = check.match(text[j])
                if not second:
                    break
                # Same check text, and the first line has no variant or the same variant.
                if (not first.group(2) or first.group(2) == second.group(2)) and first.group(3) == second.group(3):
                    # Fold line j into line i as a single ppc64x check.
                    text[i] = re.sub(" ?ppc64(le|x)?", " ppc64x", text[i])
                    del text[j]
                else:
                    j += 1
        i += 1
    with open(file, 'w') as f:
        f.write("".join(text))

Change-Id: Ic6b009b54eacaadc5a23db9c5a3bf7331b595821
Reviewed-on: https://go-review.googlesource.com/c/go/+/463220
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
Reviewed-by: Lynn Boger &lt;laboger@linux.vnet.ibm.com&gt;
Reviewed-by: Bryan Mills &lt;bcmills@google.com&gt;
Run-TryBot: Paul Murphy &lt;murp@ibm.com&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
</content>
</entry>
<entry>
<title>cmd/compile/internal/ssa: re-adjust CarryChainTail scheduling priority</title>
<updated>2022-11-03T19:59:19Z</updated>
<author>
<name>Paul E. Murphy</name>
<email>murp@ibm.com</email>
</author>
<published>2022-10-28T20:59:43Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=d031e9e07a07afef8d16576fd7079a739a7e4394'/>
<id>urn:sha1:d031e9e07a07afef8d16576fd7079a739a7e4394</id>
<content type='text'>
This needs to be as low as possible while not breaking priority
assumptions of other scores to correctly schedule carry chains.

Prior to the arm64 changes, it was set below ReadTuple. At the time,
this prevented the MulHiLo implementation on PPC64 from occluding
the scheduling of a full carry chain.

Memory scores can also prevent better scheduling, as can be observed
with crypto/internal/edwards25519/field.feMulGeneric.

Fixes #56497

Change-Id: Ia4b54e6dffcce584faf46b1b8d7cea18a3913887
Reviewed-on: https://go-review.googlesource.com/c/go/+/447435
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
Reviewed-by: Keith Randall &lt;khr@google.com&gt;
Run-TryBot: Paul Murphy &lt;murp@ibm.com&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
Reviewed-by: Bryan Mills &lt;bcmills@google.com&gt;
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</content>
</entry>
<entry>
<title>cmd/compile: intrinsify Sub64 on loong64</title>
<updated>2022-10-07T18:16:26Z</updated>
<author>
<name>Wayne Zuo</name>
<email>wdvxdr@golangcn.org</email>
</author>
<published>2022-09-06T14:29:31Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=90a352742754cd8731302f23bfa4a47f0b3f5f40'/>
<id>urn:sha1:90a352742754cd8731302f23bfa4a47f0b3f5f40</id>
<content type='text'>
This is a follow-up to CL 420095 on loong64.

file                                    before    after     Δ       %
compile/internal/ssa.a                  35649482  35653274  +3792   +0.011%
compile/internal/ssagen.a               4099858   4098728   -1130   -0.028%
ecdh.a                                  227896    226896    -1000   -0.439%
internal/nistec/fiat.a                  1212254   1128184   -84070  -6.935%
tls.a                                   3256800   3256802   +2      +0.000%
big.a                                   1708518   1702496   -6022   -0.352%
bits.a                                  106762    105734    -1028   -0.963%
math.a                                  578762    577288    -1474   -0.255%
netip.a                                 555922    555610    -312    -0.056%
net.a                                   3286528   3286530   +2      +0.000%
golang.org/x/crypto/internal/poly1305.a 109546    107686    -1860   -1.698%
total                                   260392768 260299668 -93100  -0.036%

Change-Id: Ieffca705aae5666501f284502d986ca179dde494
Reviewed-on: https://go-review.googlesource.com/c/go/+/428557
Reviewed-by: Carlos Amedee &lt;carlos@golang.org&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
Reviewed-by: David Chase &lt;drchase@google.com&gt;
Run-TryBot: Wayne Zuo &lt;wdvxdr@golangcn.org&gt;
</content>
</entry>
<entry>
<title>cmd/compile: intrinsify Add64 on loong64</title>
<updated>2022-10-07T18:16:10Z</updated>
<author>
<name>Wayne Zuo</name>
<email>wdvxdr@golangcn.org</email>
</author>
<published>2022-09-06T14:12:16Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=97760ed651f88341bcf15aa4980863c199b6f3dc'/>
<id>urn:sha1:97760ed651f88341bcf15aa4980863c199b6f3dc</id>
<content type='text'>
This is a follow-up to CL 420094 on loong64.

This slightly reduces the Go toolchain size on linux/loong64.

compilecmp HEAD~1 -&gt; HEAD
HEAD~1 (8a32354219): internal/trace: use strings.Builder
HEAD (1767784ac3): cmd/compile: intrinsify Add64 on loong64
platform: linux/loong64

file      before    after     Δ       %
addr2line 3882616   3882536   -80     -0.002%
api       5528866   5528450   -416    -0.008%
asm       5133780   5133796   +16     +0.000%
cgo       4668787   4668491   -296    -0.006%
compile   25163409  25164729  +1320   +0.005%
cover     4658055   4658007   -48     -0.001%
dist      3437783   3437727   -56     -0.002%
doc       3883069   3883205   +136    +0.004%
fix       3383254   3383070   -184    -0.005%
link      6747559   6747023   -536    -0.008%
nm        3793923   3793939   +16     +0.000%
objdump   4256628   4256812   +184    +0.004%
pack      2356328   2356144   -184    -0.008%
pprof     14233370  14131910  -101460 -0.713%
test2json 2638668   2638476   -192    -0.007%
trace     13392065  13360781  -31284  -0.234%
vet       7456388   7455588   -800    -0.011%
total     132498256 132364392 -133864 -0.101%

file                                    before    after     Δ       %
compile/internal/ssa.a                  35644590  35649482  +4892   +0.014%
compile/internal/ssagen.a               4101250   4099858   -1392   -0.034%
internal/edwards25519/field.a           226064    201718    -24346  -10.770%
internal/nistec/fiat.a                  1689922   1212254   -477668 -28.266%
tls.a                                   3256798   3256800   +2      +0.000%
big.a                                   1718552   1708518   -10034  -0.584%
bits.a                                  107786    106762    -1024   -0.950%
cmplx.a                                 169434    168214    -1220   -0.720%
math.a                                  581302    578762    -2540   -0.437%
netip.a                                 556096    555922    -174    -0.031%
net.a                                   3286526   3286528   +2      +0.000%
runtime.a                               8644786   8644510   -276    -0.003%
strconv.a                               519098    518374    -724    -0.139%
golang.org/x/crypto/internal/poly1305.a 115398    109546    -5852   -5.071%
total                                   260913122 260392768 -520354 -0.199%

Change-Id: I75b2bb7761fa5a0d0d032d4ebe3582d092ea77be
Reviewed-on: https://go-review.googlesource.com/c/go/+/428556
Reviewed-by: Carlos Amedee &lt;carlos@golang.org&gt;
Run-TryBot: Wayne Zuo &lt;wdvxdr@golangcn.org&gt;
Reviewed-by: David Chase &lt;drchase@google.com&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
</content>
</entry>
<entry>
<title>cmd/compile: optimize RotateLeft8/16 on arm64</title>
<updated>2022-09-02T17:46:31Z</updated>
<author>
<name>ruinan</name>
<email>ruinan.sun@arm.com</email>
</author>
<published>2022-07-13T09:00:57Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=121344ac338ef21d87eee4f64a60d0ae8a7f6fe3'/>
<id>urn:sha1:121344ac338ef21d87eee4f64a60d0ae8a7f6fe3</id>
<content type='text'>
This CL optimizes RotateLeft8/16 on arm64.

For 16 bits, we form a 32-bit value by duplicating the 16-bit value into
both halves of a register, then use the RORW instruction to do the rotate.

For 8 bits, we just use LSR and LSL instead of RORW because the code is
simpler.
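
The arithmetic behind the 16-bit case can be sketched in Python as
follows. This only models the idea described above under the
assumption that a left rotate by k is done as a 32-bit right rotate;
it is not the compiler's rewrite rule, and the function names are
hypothetical.

```python
def rotate_left16_via32(x, k):
    """Rotate a 16-bit value left by k using a 32-bit right rotate."""
    k %= 16
    # Duplicate the 16-bit value into both halves of a 32-bit word.
    dup = x * 0x10001
    # A left rotate by k equals a 32-bit right rotate by (32 - k).
    r = (32 - k) % 32
    rotated = dup // 2**r + (dup * 2**(32 - r)) % 2**32
    # The low 16 bits hold the rotated 16-bit result.
    return rotated % 2**16


def rotate_left16_reference(x, k):
    """Straightforward 16-bit rotate used to cross-check the sketch."""
    k %= 16
    return (x * 2**k) % 2**16 + x // 2**(16 - k)


print(hex(rotate_left16_via32(0x1234, 4)))  # 0x2341
```

Because the duplicated word repeats every 16 bits, any 32-bit rotation
leaves a correctly rotated copy of the value in the low half.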

Benchmark          Old          ThisCL       delta
RotateLeft8-46     2.16 ns/op   1.73 ns/op   -19.70%
RotateLeft16-46    2.16 ns/op   1.54 ns/op   -28.53%

Change-Id: I09cde4383d12e31876a57f8cdfd3bb4f324fadb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/420976
Reviewed-by: Keith Randall &lt;khr@google.com&gt;
Auto-Submit: Keith Randall &lt;khr@golang.org&gt;
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
Reviewed-by: Heschi Kreinick &lt;heschi@google.com&gt;
Run-TryBot: Keith Randall &lt;khr@golang.org&gt;
</content>
</entry>
<entry>
<title>cmd/compile: intrinsify Sub64 on riscv64</title>
<updated>2022-08-27T05:43:59Z</updated>
<author>
<name>Wayne Zuo</name>
<email>wdvxdr@golangcn.org</email>
</author>
<published>2022-07-29T14:14:53Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=a6219737e3eb062282e6483a915c395affb30c69'/>
<id>urn:sha1:a6219737e3eb062282e6483a915c395affb30c69</id>
<content type='text'>
After this CL, the performance differences in crypto/elliptic
benchmarks on linux/riscv64 are:

name                 old time/op    new time/op    delta
ScalarBaseMult/P256    1.64ms ± 1%    1.60ms ± 1%   -2.36%  (p=0.008 n=5+5)
ScalarBaseMult/P224    1.53ms ± 1%    1.47ms ± 2%   -4.24%  (p=0.008 n=5+5)
ScalarBaseMult/P384    5.12ms ± 2%    5.03ms ± 2%     ~     (p=0.095 n=5+5)
ScalarBaseMult/P521    22.3ms ± 2%    13.8ms ± 1%  -37.89%  (p=0.008 n=5+5)
ScalarMult/P256        4.49ms ± 2%    4.26ms ± 2%   -5.13%  (p=0.008 n=5+5)
ScalarMult/P224        4.33ms ± 1%    4.09ms ± 1%   -5.59%  (p=0.008 n=5+5)
ScalarMult/P384        16.3ms ± 1%    15.5ms ± 2%   -4.78%  (p=0.008 n=5+5)
ScalarMult/P521         101ms ± 0%      47ms ± 2%  -53.36%  (p=0.008 n=5+5)

Change-Id: I31cf0506e27f9d85f576af1813630a19c20dda8a
Reviewed-on: https://go-review.googlesource.com/c/go/+/420095
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
Reviewed-by: Joel Sing &lt;joel@sing.id.au&gt;
Reviewed-by: David Chase &lt;drchase@google.com&gt;
Run-TryBot: Wayne Zuo &lt;wdvxdr@golangcn.org&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
</content>
</entry>
<entry>
<title>cmd/compile: intrinsify Add64 on riscv64</title>
<updated>2022-08-27T05:43:32Z</updated>
<author>
<name>Wayne Zuo</name>
<email>wdvxdr@golangcn.org</email>
</author>
<published>2022-07-29T06:24:26Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/go/commit/?id=969f48a3a24032c3dd1ec351302b5b62407dfb88'/>
<id>urn:sha1:969f48a3a24032c3dd1ec351302b5b62407dfb88</id>
<content type='text'>
According to the RISC-V instruction set manual v2.2, Sec. 2.4, we can
implement an overflow check for unsigned addition cheaply using the
SLTU instruction.
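
A minimal Python sketch of that check, assuming no carry-in: after a
wrapping 64-bit add, the sum is smaller than an operand exactly when
the addition overflowed, which is the comparison SLTU evaluates. The
name add64 and the constant WORD are hypothetical and only model the
arithmetic, not the generated code.

```python
WORD = 2**64  # size of the 64-bit unsigned value space


def add64(a, b):
    """Return (sum, carry) for a wrapping 64-bit unsigned addition."""
    s = (a + b) % WORD
    # Overflow happened exactly when the wrapped sum is below an operand;
    # SLTU sets its result to 1 under that same unsigned comparison.
    carry = int(a > s)
    return s, carry


print(add64(WORD - 1, 1))  # (0, 1)
```

A full Add64 with carry-in would need a second comparison for the
extra addend; that case is left out here.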

After this CL, the performance differences in crypto/elliptic
benchmarks on linux/riscv64 are:

name                 old time/op    new time/op    delta
ScalarBaseMult/P256    1.93ms ± 1%    1.64ms ± 1%  -14.96%  (p=0.008 n=5+5)
ScalarBaseMult/P224    1.80ms ± 2%    1.53ms ± 1%  -14.89%  (p=0.008 n=5+5)
ScalarBaseMult/P384    6.15ms ± 2%    5.12ms ± 2%  -16.73%  (p=0.008 n=5+5)
ScalarBaseMult/P521    25.9ms ± 1%    22.3ms ± 2%  -13.78%  (p=0.008 n=5+5)
ScalarMult/P256        5.59ms ± 1%    4.49ms ± 2%  -19.79%  (p=0.008 n=5+5)
ScalarMult/P224        5.42ms ± 1%    4.33ms ± 1%  -20.01%  (p=0.008 n=5+5)
ScalarMult/P384        19.9ms ± 2%    16.3ms ± 1%  -18.15%  (p=0.008 n=5+5)
ScalarMult/P521        97.3ms ± 1%   100.7ms ± 0%   +3.48%  (p=0.008 n=5+5)

Change-Id: Ic4c82ced4b072a4a6575343fa9f29dd09b0cabc4
Reviewed-on: https://go-review.googlesource.com/c/go/+/420094
Reviewed-by: David Chase &lt;drchase@google.com&gt;
Reviewed-by: Cherry Mui &lt;cherryyz@google.com&gt;
Run-TryBot: Wayne Zuo &lt;wdvxdr@golangcn.org&gt;
Reviewed-by: Joel Sing &lt;joel@sing.id.au&gt;
TryBot-Result: Gopher Robot &lt;gobot@golang.org&gt;
</content>
</entry>
</feed>
