aboutsummaryrefslogtreecommitdiff
path: root/src/unicode/utf8/utf8_test.go
AgeCommit message (Collapse)Author
2025-09-03unicode/utf8: make DecodeRune{,InString} inlineableJulien Cretel
This change makes the fast path for ASCII characters inlineable in DecodeRune and DecodeRuneInString and removes most instances of manual inlining at call sites. Here are some benchmark results (no change to allocations): goos: darwin goarch: amd64 pkg: unicode/utf8 cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz │ old │ new │ │ sec/op │ sec/op vs base │ DecodeASCIIRune-8 2.4545n ± 2% 0.6253n ± 2% -74.52% (p=0.000 n=20) DecodeJapaneseRune-8 3.988n ± 1% 4.023n ± 1% +0.86% (p=0.050 n=20) DecodeASCIIRuneInString-8 2.4675n ± 1% 0.6264n ± 2% -74.61% (p=0.000 n=20) DecodeJapaneseRuneInString-8 3.992n ± 1% 4.001n ± 1% ~ (p=0.625 n=20) geomean 3.134n 1.585n -49.43% Note: when #61502 gets resolved, DecodeRune and DecodeRuneInString should be reverted to their idiomatic implementations. Fixes #31666 Updates #48195 Change-Id: I4be25c4f52417dc28b3a7bd72f1b04018470f39d GitHub-Last-Rev: 2e352a0045027e059be79cdb60241b5cf35fec71 GitHub-Pull-Request: golang/go#75181 Reviewed-on: https://go-review.googlesource.com/c/go/+/699675 Reviewed-by: Sean Liao <sean@liao.dev> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Pratt <mpratt@google.com>
2025-07-24unicode/utf8: skip ahead during ascii runs in Valid/ValidStringKeith Randall
When we see an ASCII character, we will probably see many. Grab & check increasingly large chunks of the string for ASCII-only-ness. Also redo some of the non-ASCII code to make it more optimizer friendly. goos: linux goarch: amd64 pkg: unicode/utf8 cpu: 12th Gen Intel(R) Core(TM) i7-12700 │ base │ exp │ │ sec/op │ sec/op vs base │ ValidTenASCIIChars-20 3.596n ± 3% 2.522n ± 1% -29.86% (p=0.000 n=10) Valid100KASCIIChars-20 6.094µ ± 2% 2.115µ ± 1% -65.29% (p=0.000 n=10) ValidTenJapaneseChars-20 21.02n ± 0% 18.61n ± 2% -11.44% (p=0.000 n=10) ValidLongMostlyASCII-20 51.774µ ± 0% 3.836µ ± 1% -92.59% (p=0.000 n=10) ValidLongJapanese-20 102.40µ ± 1% 50.95µ ± 1% -50.24% (p=0.000 n=10) ValidStringTenASCIIChars-20 2.640n ± 3% 2.526n ± 1% -4.34% (p=0.000 n=10) ValidString100KASCIIChars-20 5.585µ ± 7% 2.118µ ± 1% -62.07% (p=0.000 n=10) ValidStringTenJapaneseChars-20 21.29n ± 2% 18.67n ± 1% -12.31% (p=0.000 n=10) ValidStringLongMostlyASCII-20 52.431µ ± 1% 3.841µ ± 0% -92.67% (p=0.000 n=10) ValidStringLongJapanese-20 102.66µ ± 1% 50.90µ ± 1% -50.42% (p=0.000 n=10) geomean 1.152µ 454.8n -60.53% This is an attempt to see if we can get enough performance that we don't need to consider assembly like that in CL 681695. Change-Id: I8250feb797a6b4e7d335c23929f6e3acc8b24840 Reviewed-on: https://go-review.googlesource.com/c/go/+/682778 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-06unicode/utf8: remove init from utf8_testLuka Krmpotic
TestConstants and init test the same thing, remove init, it does not exist in utf16_test.go either. Fixes #71579 Change-Id: Ie0afd640bebde822733b6eac0bf98a17872f4e5f GitHub-Last-Rev: d7224c18376e00038261279abdfa954abc3a8303 GitHub-Pull-Request: golang/go#71582 Reviewed-on: https://go-review.googlesource.com/c/go/+/647335 Commit-Queue: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: qiu laidongfeng2 <2645477756@qq.com> Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-09-15unicode/utf8: add test that RuneCount does zero allocationsCuong Manh Le
See disccusion in CL 612955. Change-Id: I2de582321648f1798929ffb80d2f087e41146ead Reviewed-on: https://go-review.googlesource.com/c/go/+/613315 Commit-Queue: Ian Lance Taylor <iant@google.com> Reviewed-by: Tim King <taking@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2024-09-03all: omit unnecessary 0 in slice expressionnlwkobe30
All changes are related to the code, except for the comments in src/regexp/syntax/parse.go and src/slices/slices.go. Change-Id: I73c5d3c54099749b62210aa7f3182c5eb84bb6a6 GitHub-Last-Rev: 794aa9b0539811d00e1cd42be1e8d9fe9afe0281 GitHub-Pull-Request: golang/go#69170 Reviewed-on: https://go-review.googlesource.com/c/go/+/609678 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>
2024-07-22unicode/utf8: AppendRune and EncodeRune performance improvementDiego Augusto Molina
- Prefer the evaluation of the valid higher byte-width runes branches over the one for invalid ones - Avoid the evaluation of the bytes of the RuneError constant, and instead hard code its byte values - EncodeRune only: inline for fast handling of ASCII goos: linux goarch: amd64 pkg: unicode/utf8 cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz │ baseline.test-append.txt │ append-rune-invalid-case-last.test.txt │ │ sec/op │ sec/op vs base │ AppendASCIIRune-8 0.2135n ± 0% 0.2135n ± 0% ~ (p=0.578 n=20) AppendSpanishRune-8 1.645n ± 1% 1.509n ± 2% -8.30% (p=0.000 n=20) AppendJapaneseRune-8 2.196n ± 1% 2.004n ± 1% -8.74% (p=0.000 n=20) AppendMaxRune-8 2.670n ± 1% 2.349n ± 3% -12.01% (p=0.000 n=20) AppendInvalidRuneMaxPlusOne-8 2.214n ± 2% 1.798n ± 3% -18.77% (p=0.000 n=20) AppendInvalidRuneSurrogate-8 2.258n ± 1% 1.793n ± 2% -20.59% (p=0.000 n=20) AppendInvalidRuneNegative-8 2.171n ± 2% 1.767n ± 2% -18.61% (p=0.000 n=20) geomean 1.559n 1.361n -12.69% goos: linux goarch: amd64 pkg: unicode/utf8 cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz │ baseline.test-encode.txt │ encode-rune-invalid-last.txt │ │ sec/op │ sec/op vs base │ EncodeASCIIRune-8 1.0950n ± 1% 0.2140n ± 0% -80.46% (p=0.000 n=20) EncodeSpanishRune-8 1.499n ± 0% 1.414n ± 2% -5.64% (p=0.000 n=20) EncodeJapaneseRune-8 1.960n ± 2% 1.716n ± 4% -12.43% (p=0.000 n=20) EncodeMaxRune-8 2.145n ± 2% 2.227n ± 1% +3.78% (p=0.000 n=20) EncodeInvalidRuneMaxPlusOne-8 1.955n ± 2% 1.802n ± 2% -7.80% (p=0.000 n=20) EncodeInvalidRuneSurrogate-8 1.946n ± 3% 1.777n ± 2% -8.68% (p=0.000 n=20) EncodeInvalidRuneNegative-8 1.968n ± 2% 1.766n ± 2% -10.29% (p=0.000 n=20) geomean 1.757n 1.308n -25.57% Fixes #68131 Change-Id: Ibcafa75d63cca07a2e78cd06f6f1e382cb8c716e Reviewed-on: https://go-review.googlesource.com/c/go/+/594115 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2022-09-07unicode/utf8: use strings.Buildercuiweixie
Change-Id: I88b55f61eccb5764cac2a9397fd99a62f8735a9a Reviewed-on: https://go-review.googlesource.com/c/go/+/428281 Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Benny Siegert <bsiegert@gmail.com>
2022-03-02unicode/utf8: optimize Valid to parity with ValidStringAlan Donovan
The benchmarks added in this change revealed that ValidString runs ~17% faster than Valid([]byte) on the ASCII prefix of the input. Inspection of the assembly revealed that the code generated for p[8:] required recomputing the slice capacity to handle the cap=0 special case, which added an ADD -8 instruction. By making len=cap, the capacity becomes a common subexpression with the length, saving the ADD instruction. (Thanks to khr for the tip.) Incidentally, I tried a number of other optimizations but was unable to make consistent gains across all benchmarks. The most promising was to retain the bitmask of non-ASCII bytes from the fast loop; the slow loop would shift it, and when it becomes zero, return to the fast loop. This made the MostlyASCII benchmark 4x faster, but made the other cases slower by up to 10%. cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz benchmark old ns/op new ns/op delta BenchmarkValidTenASCIIChars-16 4.09 4.06 -0.85% BenchmarkValid100KASCIIChars-16 9325 7747 -16.92% BenchmarkValidTenJapaneseChars-16 27.0 27.2 +0.85% BenchmarkValidLongMostlyASCII-16 57277 58361 +1.89% BenchmarkValidLongJapanese-16 94002 93131 -0.93% BenchmarkValidStringTenASCIIChars-16 4.15 4.07 -1.74% BenchmarkValidString100KASCIIChars-16 7980 8019 +0.49% BenchmarkValidStringTenJapaneseChars-16 26.0 25.9 -0.38% BenchmarkValidStringLongMostlyASCII-16 58550 58006 -0.93% BenchmarkValidStringLongJapanese-16 97964 100038 +2.12% Change-Id: Ic9d585dedd9af83c27dd791ecd805150ac949f15 Reviewed-on: https://go-review.googlesource.com/c/go/+/375594 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Trust: Alex Rakoczy <alex@golang.org>
2021-11-05unicode/utf8: add AppendRune Examplejiahua wang
Also, correct TestAppendRune error message. Change-Id: I3ca3ac7051af1ae6d449381b78efa86c2f6be8ac Reviewed-on: https://go-review.googlesource.com/c/go/+/354529 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> Trust: Robert Findley <rfindley@google.com> Trust: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org>
2021-08-28unicode/utf8: add AppendRuneJoe Tsai
AppendRune appends the UTF-8 encoding of a rune to a []byte. It is a generally more user friendly than EncodeRune. EncodeASCIIRune-4 2.35ns ± 2% EncodeJapaneseRune-4 4.60ns ± 2% AppendASCIIRune-4 0.30ns ± 3% AppendJapaneseRune-4 4.70ns ± 2% The ASCII case is written to be inlineable. Fixes #47609 Change-Id: If4f71eedffd2bd4ef0d7f960cb55b41c637eec54 Reviewed-on: https://go-review.googlesource.com/c/go/+/345571 Trust: Joe Tsai <joetsai@digital-static.net> Reviewed-by: Rob Pike <r@golang.org> Run-TryBot: Rob Pike <r@golang.org> TryBot-Result: Go Bot <gobot@golang.org>
2020-09-10unicode/utf8: refactor benchmarks for FullRune functioneric fang
BenchmarkFullASCIIRune tests the performance of function utf8.FullRune, which will be inlined in BenchmarkFullASCIIRune. Since the return value of FullRune is not referenced, it will be removed as dead code. This CL makes the FullRune functions return value referenced by a global variable to avoid this point. In addition, this CL adds one more benchmark to cover more code paths, and puts them together as sub benchmarks of BenchmarkFullRune. Change-Id: I6e79f4c087adf70e351498a4b58d7482dcd1ec4a Reviewed-on: https://go-review.googlesource.com/c/go/+/233979 Run-TryBot: eric fang <eric.fang@arm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-05-06cmd/compile: optimize len([]rune(string))Martin Möhrmann
Adds a new runtime function to count runes in a string. Modifies the compiler to detect the pattern len([]rune(string)) and replaces it with the new rune counting runtime function. RuneCount/lenruneslice/ASCII 27.8ns ± 2% 14.5ns ± 3% -47.70% (p=0.000 n=10+10) RuneCount/lenruneslice/Japanese 126ns ± 2% 60ns ± 2% -52.03% (p=0.000 n=10+10) RuneCount/lenruneslice/MixedLength 104ns ± 2% 50ns ± 1% -51.71% (p=0.000 n=10+9) Fixes #24923 Change-Id: Ie9c7e7391a4e2cca675c5cdcc1e5ce7d523948b9 Reviewed-on: https://go-review.googlesource.com/108985 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-10-17runtime: speed up non-ASCII rune decodingMartin Möhrmann
Copies utf8 constants and EncodeRune implementation from unicode/utf8. Adds a new decoderune implementation that is used by the compiler in code generated for ranging over strings. It does not handle ASCII runes since these are handled directly before calls to decoderune. The DecodeRuneInString implementation from unicode/utf8 is not used since it uses a lookup table that would increase the use of cpu caches. Adds more tests that check decoding of valid and invalid utf8 sequences. name old time/op new time/op delta RuneIterate/range2/ASCII-4 7.45ns ± 2% 7.45ns ± 1% ~ (p=0.634 n=16+16) RuneIterate/range2/Japanese-4 53.5ns ± 1% 49.2ns ± 2% -8.03% (p=0.000 n=20+20) RuneIterate/range2/MixedLength-4 46.3ns ± 1% 41.0ns ± 2% -11.57% (p=0.000 n=20+20) new: "".decoderune t=1 size=423 args=0x28 locals=0x0 old: "".charntorune t=1 size=666 args=0x28 locals=0x0 Change-Id: I1df1fdb385bb9ea5e5e71b8818ea2bf5ce62de52 Reviewed-on: https://go-review.googlesource.com/28490 Run-TryBot: Martin Möhrmann <martisch@uos.de> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-12-01unicode/utf8: add test for FullRuneMarcel van Lohuizen
Check that it now properly handles \xC0 and \xC1. Fixes #11733. Change-Id: I66cfe0d43f9d123d4c4509a3fa18b9b6380dfc39 Reviewed-on: https://go-review.googlesource.com/17225 Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-16unicode/utf8: table-based algorithm for decodingMarcel van Lohuizen
This simplifies covering all cases, reducing the number of branches and making unrolling for simpler functions manageable. This significantly improves performance of non-ASCII input. This change will also allow addressing Issue #11733 in an efficient manner. RuneCountTenASCIIChars-8 13.7ns ± 4% 13.5ns ± 2% ~ (p=0.116 n=7+8) RuneCountTenJapaneseChars-8 153ns ± 3% 74ns ± 2% -51.42% (p=0.000 n=8+8) RuneCountInStringTenASCIIChars-8 13.5ns ± 2% 12.5ns ± 3% -7.13% (p=0.000 n=8+7) RuneCountInStringTenJapaneseChars-8 145ns ± 2% 68ns ± 2% -53.21% (p=0.000 n=8+8) ValidTenASCIIChars-8 14.1ns ± 3% 12.5ns ± 5% -11.38% (p=0.000 n=8+8) ValidTenJapaneseChars-8 147ns ± 3% 71ns ± 4% -51.72% (p=0.000 n=8+8) ValidStringTenASCIIChars-8 12.5ns ± 3% 12.3ns ± 3% ~ (p=0.095 n=8+8) ValidStringTenJapaneseChars-8 146ns ± 4% 70ns ± 2% -51.62% (p=0.000 n=8+7) DecodeASCIIRune-8 5.91ns ± 2% 4.83ns ± 3% -18.28% (p=0.001 n=7+7) DecodeJapaneseRune-8 12.2ns ± 7% 8.5ns ± 3% -29.79% (p=0.000 n=8+7) FullASCIIRune-8 5.95ns ± 3% 4.27ns ± 1% -28.23% (p=0.000 n=8+7) FullJapaneseRune-8 12.0ns ± 6% 4.3ns ± 3% -64.39% (p=0.000 n=8+8) Change-Id: Iea1d6b0180cbbee1739659a0a38038126beecaca Reviewed-on: https://go-review.googlesource.com/16940 Reviewed-by: Russ Cox <rsc@golang.org>
2015-10-26unicode/utf8: added benchmarksMarcel van Lohuizen
Cover some functions that weren't benched before and add InString variants if the underlying implementation is different. Note: compare (Valid|RuneCount)InString* to their (Valid|RuneCount)* counterparts. It shows, somewhat unexpectedly, that ranging over a string is *much* slower than using calls to DecodeRune. Results: In order to avoid a discrepancy in measuring the performance of core we could leave the names of the string-based measurements unchanged and suffix the added alternatives with Bytes. Compared to old: BenchmarkRuneCountTenASCIIChars-8 44.3 12.4 -72.01% BenchmarkRuneCountTenJapaneseChars-8 167 67.1 -59.82% BenchmarkEncodeASCIIRune-8 3.37 3.44 +2.08% BenchmarkEncodeJapaneseRune-8 7.19 7.24 +0.70% BenchmarkDecodeASCIIRune-8 5.41 5.53 +2.22% BenchmarkDecodeJapaneseRune-8 8.17 8.41 +2.94% All benchmarks: BenchmarkRuneCountTenASCIIChars-8 100000000 12.4 ns/op BenchmarkRuneCountTenJapaneseChars-8 20000000 67.1 ns/op BenchmarkRuneCountInStringTenASCIIChars-8 30000000 44.5 ns/op BenchmarkRuneCountInStringTenJapaneseChars-8 10000000 165 ns/op BenchmarkValidTenASCIIChars-8 100000000 12.5 ns/op BenchmarkValidTenJapaneseChars-8 20000000 71.1 ns/op BenchmarkValidStringTenASCIIChars-8 30000000 50.0 ns/op BenchmarkValidStringTenJapaneseChars-8 10000000 161 ns/op BenchmarkEncodeASCIIRune-8 500000000 3.44 ns/op BenchmarkEncodeJapaneseRune-8 200000000 7.24 ns/op BenchmarkDecodeASCIIRune-8 300000000 5.53 ns/op BenchmarkDecodeJapaneseRune-8 200000000 8.41 ns/op BenchmarkFullASCIIRune-8 500000000 3.91 ns/op BenchmarkFullJapaneseRune-8 300000000 4.22 ns/op Change-Id: I674d2ee4917b975a37717bbfa1082cc84dcd275e Reviewed-on: https://go-review.googlesource.com/14431 Reviewed-by: Russ Cox <rsc@golang.org>
2014-09-08build: move package sources from src/pkg to srcRuss Cox
Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.