aboutsummaryrefslogtreecommitdiff
path: root/src/regexp/syntax
AgeCommit message (Collapse)Author
2025-07-30regexp: fix compiling alternate patterns of different fold case literalsitchyny
Fixing Equal method in regexp/syntax package fixes compilation of some alternate patterns like "0A|0[aA]". Fixes #59007 Change-Id: Idd519c6841167f932899b0ada347fb90a38a765e GitHub-Last-Rev: 6f43cbca6361c0d084a12abeec65d7c659a4fe61 GitHub-Pull-Request: golang/go#66165 Reviewed-on: https://go-review.googlesource.com/c/go/+/569735 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Mark Freeman <mark@golang.org> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-18regexp/syntax: recognize category aliases like \p{Letter}Russ Cox
The Unicode specification defines aliases for some of the general category names. For example the category "L" has alias "Letter". The regexp package supports \p{L} but not \p{Letter}, because there was nothing in the Unicode tables that lets regexp know about Letter. Now that package unicode provides CategoryAliases (see #70780), we can use it to provide \p{Letter} as well. This is the only feature missing from making package regexp suitable for use in a JSON-API Schema implementation. (The official test suite includes usage of aliases like \p{Letter} instead of \p{L}.) For better conformity with Unicode TR18, also accept case-insensitive matches for names and ignore underscores, hyphens, and spaces; and add Any, ASCII, and Assigned. Fixes #70781. Change-Id: I50ff024d99255338fa8d92663881acb47f1e92a5 Reviewed-on: https://go-review.googlesource.com/c/go/+/641377 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>
2024-09-03all: omit unnecessary 0 in slice expressionnlwkobe30
All changes are related to the code, except for the comments in src/regexp/syntax/parse.go and src/slices/slices.go. Change-Id: I73c5d3c54099749b62210aa7f3182c5eb84bb6a6 GitHub-Last-Rev: 794aa9b0539811d00e1cd42be1e8d9fe9afe0281 GitHub-Pull-Request: golang/go#69170 Reviewed-on: https://go-review.googlesource.com/c/go/+/609678 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>
2024-04-02regexp/syntax: cleanup code generation in perl_groups.goOlivier Mengué
Cleanup code generation of perl_groups.go: * Fix the generated code header to follow the standard https://go.dev/s/generatedcode * Apply gofmt as last step of code generation * Add //go:generate lines in parse.go to trigger code generation * Adapt make_perl_groups.pl to handle writing directly to the output file (as we can't use shell redirection in go:generate lines) * use strict; use warnings; Change-Id: I675241da03dd3f6facc9eb9de00999bd9203ad70 Reviewed-on: https://go-review.googlesource.com/c/go/+/555995 Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-29crypto/tls,regexp: remove always-nil error resultsDaniel Martí
These were harmless, but added unnecessary verbosity to the code. This can happen as a result of refactors: for example, the method sessionState used to return errors in some cases. Change-Id: I4e6dacc01ae6a49b528c672979f95cbb86795a85 Reviewed-on: https://go-review.googlesource.com/c/go/+/528995 Reviewed-by: Leo Isla <islaleo93@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Olivier Mengué <olivier.mengue@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: qiulaidongfeng <2645477756@qq.com> Reviewed-by: Quim Muntal <quimmuntal@gmail.com>
2024-03-29regexp/syntax: use the Regexp.Equal static method directlyDaniel Martí
A follow-up for the recent https://go.dev/cl/573978. Change-Id: I0e75ca0b37d9ef063bbdfb88d4d2e34647e0ee50 Reviewed-on: https://go-review.googlesource.com/c/go/+/574677 Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: Than McIntosh <thanm@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-03-26regexp/syntax: simplify the codeapocelipes
Use the slices package and the built-in max to simplify the code. There's no noticeable performance change in this modification. Change-Id: I96e46ba8ab1323f1ba0b8c9b827836e217772cf2 GitHub-Last-Rev: f0111ac7e220f7dac03290125a3a83831012f235 GitHub-Pull-Request: golang/go#66511 Reviewed-on: https://go-review.googlesource.com/c/go/+/573978 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: qiulaidongfeng <2645477756@qq.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Than McIntosh <thanm@google.com>
2024-03-19regexp/syntax: use standard generated code headerOleksandr Redko
Updates doc by running these commands: - mksyntaxgo from the google/re2 repo, which changes comment according to https://golang.org/s/generatedcode - gofmt -w regexp/syntax/doc.go Change-Id: I66a9dd9fa841cbce899ab3aa32d7face798d2920 Reviewed-on: https://go-review.googlesource.com/c/go/+/572275 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
2024-02-26regexp: add available godoc linkcui fliter
Signed-off-by: cui fliter <imcusg@gmail.com> Change-Id: I85339293d4cfb691125f991ec7162e9be186efdc Reviewed-on: https://go-review.googlesource.com/c/go/+/539599 Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com>
2024-02-20regexp/syntax: regenerate docs with mksyntaxgoMauri de Souza Meneguzzo
This makes the docs up-to-date by running doc/mksyntaxgo from the google/re2 repo. Change-Id: I80358eed071e7566c85edaeb1cc5514a6d8c37a7 GitHub-Last-Rev: 0f8c8df4f213ce89fbea89e81f0ea3babd59d38f GitHub-Pull-Request: golang/go#65249 Reviewed-on: https://go-review.googlesource.com/c/go/+/558136 Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-03regexp/syntax: use min funcqiulaidongfeng
Change-Id: I679c906057577d4a795c07a2f572b969c3ee14d5 GitHub-Last-Rev: fba371d2d6bfc7fbf3a93a53bb22039acf65d7cf GitHub-Pull-Request: golang/go#63350 Reviewed-on: https://go-review.googlesource.com/c/go/+/532218 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-09-09all: calculate the median uniformlyJes Cok
Like sort.Search, use "h := int(uint(i+j) >> 1)" style code to calculate the median. Change-Id: Ifb1a19dde1c6ed6c1654bc642fc9565a8b6c5fc4 GitHub-Last-Rev: e2213b738832f1674948d6507f40e2c0b98cb972 GitHub-Pull-Request: golang/go#62503 Reviewed-on: https://go-review.googlesource.com/c/go/+/526496 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2023-08-16regexp/syntax: use more compact Regexp.String outputRuss Cox
Compact the Regexp.String output. It was only ever intended for debugging, but there are at least some uses in the wild where regexps are built up using regexp/syntax and then formatted using the String method. Compact the output to help that use case. Specifically: - Compact 2-element character class ranges: [a-b] -> [ab]. - Aggregate flags: (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E) -> (?i:AB*C|D?E). Fixes #57950. Change-Id: I1161d0e3aa6c3ae5a302677032bb7cd55caae5fb Reviewed-on: https://go-review.googlesource.com/c/go/+/507015 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Rob Pike <r@golang.org> Auto-Submit: Russ Cox <rsc@golang.org>
2023-07-31regexp/syntax: accept (?<name>...) syntax as valid captureMauri de Souza Meneguzzo
Currently the only named capture supported by regexp is (?P<name>a). The syntax (?<name>a) is also widely used and there is currently an effort from the Rust regex and RE2 teams to also accept this syntax. Fixes #58458 Change-Id: If22d44d3a5c4e8133ec68238ab130c151ca7c5c5 GitHub-Last-Rev: 31b50e6ab40cfb0f36df6f570525657d4680017f GitHub-Pull-Request: golang/go#61624 Reviewed-on: https://go-review.googlesource.com/c/go/+/513838 Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-04-11all: re-run stringerIan Lance Taylor
Re-run all go:generate stringer commands. This mostly adds checks that the constant values did not change, but does add new strings for the debug/dwarf and internal/pkgbits packages. Change-Id: I5fc41f20da47338152c183d45d5ae65074e2fccf Reviewed-on: https://go-review.googlesource.com/c/go/+/483717 Reviewed-by: Bryan Mills <bcmills@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org>
2023-03-14regexp/syntax: test for lowercase letters first in IsWordCharLudi Rehak
Lowercase letters occur more frequently than uppercase letters in English text. In IsWordChar, evaluate the most common case (lowercase letters) first to minimize the expected value of its execution time. Code clarity does not suffer by rearranging the order of the checks. Add a benchmark on a sentence demonstrating the performance improvement. name old time/op new time/op delta IsWordChar-10 122ns ± 0% 114ns ± 1% -6.68% (p=0.000 n=8+10) Change-Id: Ieee8126a4bd8ee8703905b4f75724623029f6fa2 Reviewed-on: https://go-review.googlesource.com/c/go/+/404100 Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: thepudds <thepudds1460@gmail.com>
2022-11-02regexp: add ErrLarge errorcuiweixie
For #56041 Change-Id: I6c98458b5c0d3b3636a53ee04fc97221f3fd8bbc Reviewed-on: https://go-review.googlesource.com/c/go/+/444817 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com> Run-TryBot: xie cui <523516579@qq.com>
2022-10-05regexp: limit size of parsed regexpsRuss Cox
Set a 128 MB limit on the amount of space used by []syntax.Inst in the compiled form corresponding to a given regexp. Also set a 128 MB limit on the rune storage in the *syntax.Regexp tree itself. Thanks to Adam Korczynski (ADA Logics) and OSS-Fuzz for reporting this issue. Fixes CVE-2022-41715. Fixes #55949. Change-Id: Ia656baed81564436368cf950e1c5409752f28e1b Reviewed-on: https://go-review.googlesource.com/c/go/+/439356 Auto-Submit: Roland Shoemaker <roland@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Roland Shoemaker <roland@golang.org> Reviewed-by: Damien Neil <dneil@google.com>
2022-10-02regexp: fix a few function names on commentscui fliter
Change-Id: I192dd34c677e52e16f0ef78e1dae58a78f6d1aac GitHub-Last-Rev: 1638a7468951df72f13fea34855b6a4fcbb08226 GitHub-Pull-Request: golang/go#55967 Reviewed-on: https://go-review.googlesource.com/c/go/+/436885 Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>
2022-04-29regexp/syntax: fix typo in commentLudi Rehak
Fix typo in comment describing IsWordChar. Change-Id: Ia283813cf5662e218ee6d0411fb0c1b1ad1021f3 Reviewed-on: https://go-review.googlesource.com/c/go/+/393435 Auto-Submit: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-22regexp/syntax: rename ErrInvalidDepth to ErrNestingDepthIan Lance Taylor
The proposal accepted the name ErrNestingDepth. For #51684 Change-Id: I702365f19e5e1889dbcc3c971eecff68e0b08727 Reviewed-on: https://go-review.googlesource.com/c/go/+/401854 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2022-04-22regexp: change ErrInvalidDepth message to match proposalIan Lance Taylor
Also update the file in $GOROOT/api/next to use proposal number. For #51684 Change-Id: I28bfa6bc1cee98a17b13da196d41cda34d968bb0 Reviewed-on: https://go-review.googlesource.com/c/go/+/401076 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2022-04-11all: gofmt main repoRuss Cox
[This CL is part of a sequence implementing the proposal #51082. The design doc is at https://go.dev/s/godocfmt-design.] Run the updated gofmt, which reformats doc comments, on the main repository. Vendored files are excluded. For #51082. Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407 Reviewed-on: https://go-review.googlesource.com/c/go/+/384268 Reviewed-by: Jonathan Amsterdam <jba@google.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-05all: replace `` and '' with “ (U+201C) and ” (U+201D) in doc commentsRuss Cox
go/doc in all its forms applies this replacement when rendering the comments. We are considering formatting doc comments, including doing this replacement as part of the formatting. Apply it to our source files ahead of time. For #51082. Change-Id: Ifcc1f5861abb57c5d14e7d8c2102dfb31b7a3a19 Reviewed-on: https://go-review.googlesource.com/c/go/+/384262 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-04regexp/syntax: add and use ErrInvalidDepthRuss Cox
The fix for #51112 introduced a depth check but used ErrInternalError to avoid introduce new API in a CL that would be backported to earlier releases. New API accepted in proposal #51684. This CL adds a distinct error for this case. For #51112. Fixes #51684. Change-Id: I068fc70aafe4218386a06103d9b7c847fb7ffa65 Reviewed-on: https://go-review.googlesource.com/c/go/+/384617 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-01all: remove trailing blank doc comment linesRuss Cox
A future change to gofmt will rewrite // Doc comment. // func f() to // Doc comment. func f() Apply that change preemptively to all doc comments. For #51082. Change-Id: I4023e16cfb0729b64a8590f071cd92f17343081d Reviewed-on: https://go-review.googlesource.com/c/go/+/384259 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>
2022-02-10regexp/syntax: reject very deeply nested regexps in ParseRuss Cox
The regexp code assumes it can recurse over the structure of a regexp safely. Go's growable stacks make that reasonable for all plausible regexps, but implausible ones can reach the “infinite recursion?” stack limit. This CL limits the depth of any parsed regexp to 1000. That is, the depth of the parse tree is required to be ≤ 1000. Regexps that require deeper parse trees will return ErrInternalError. A future CL will change the error to ErrInvalidDepth, but using ErrInternalError for now avoids introducing new API in point releases when this is backported. Fixes #51112. Change-Id: I97d2cd82195946eb43a4ea8561f5b95f91fb14c5 Reviewed-on: https://go-review.googlesource.com/c/go/+/384616 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-10-11regexp: document and implement that invalid UTF-8 bytes are the same as U+FFFDRuss Cox
What should it mean to run a regexp match on invalid UTF-8 bytes? The coherent behavior options are: 1. Invalid UTF-8 does not match any character classes, nor a U+FFFD literal (nor \x{fffd}). 2. Each byte of invalid UTF-8 is treated identically to a U+FFFD in the input, as a utf8.DecodeRune loop might. RE2 uses Rule 1. Because it works byte at a time, it can also provide \C to match any single byte of input, which matches invalid UTF-8 as well. This provides the nice property that a match for a regexp without \C is guaranteed to be valid UTF-8. Unfortunately, today Go has an incoherent mix of these two, although mostly Rule 2. This is a deviation from RE2, and it gives up the nice property, but we probably can't correct that at this point. In particular .* already matches entire inputs today, valid UTF-8 or not, and I doubt we can break that. This CL adopts Rule 2 officially, fixing the few places that deviate from it. Fixes #48749. Change-Id: I96402527c5dfb1146212f568ffa09dde91d71244 Reviewed-on: https://go-review.googlesource.com/c/go/+/354569 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Rob Pike <r@golang.org>
2021-10-06all: use bytes.Cut, strings.CutRuss Cox
Many uses of Index/IndexByte/IndexRune/Split/SplitN can be written more clearly using the new Cut functions. Do that. Also rewrite to other functions if that's clearer. For #46336. Change-Id: I68d024716ace41a57a8bf74455c62279bde0f448 Reviewed-on: https://go-review.googlesource.com/c/go/+/351711 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2021-05-13regexp: fix repeat of preferred empty matchRuss Cox
In Perl mode, (|a)* should match an empty string at the start of the input. Instead it matches as many a's as possible. Because (|a)+ is handled correctly, matching only an empty string, this leads to the paradox that e* can match more text than e+ (for e = (|a)) and that e+ is sometimes different from ee*. This is a very old bug that ultimately derives from the picture I drew for e* in https://swtch.com/~rsc/regexp/regexp1.html. The picture is correct for longest-match (POSIX) regexps but subtly wrong for preferred-match (Perl) regexps in the case where e has a preferred empty match. Pointed out by Andrew Gallant in private mail. The current code treats e* and e+ as the same structure, with different entry points. In the case of e* the preference list ends up not quite in the right order, in part because the “before e” and “after e” states are the same state. Splitting them apart fixes the preference list, and that can be done by compiling e* as if it were (e+)?. Like with any bug fix, there is a very low chance of breaking a program that accidentally depends on the buggy behavior. RE2, Go, and Rust all have this bug, and we've all agreed to fix it, to keep the implementations in sync. Fixes #46123. Change-Id: I70e742e71e0a23b626593b16ddef3c1e73b413b0 Reviewed-on: https://go-review.googlesource.com/c/go/+/318750 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Rob Pike <r@golang.org> TryBot-Result: Go Bot <gobot@golang.org>
2020-11-25regexp/syntax: add note about Unicode character classesTom Payne
As proposed on golang-nuts: https://groups.google.com/g/golang-nuts/c/M3lmSUptExQ/m/hRySV9GsCAAJ Includes the latest updates from re2's mksyntaxgo: https://code.googlesource.com/re2/+/refs/heads/master/doc/mksyntaxgo Change-Id: Ib7b79aa6531f473feabd0a7f1d263cd65c4388e4 Reviewed-on: https://go-review.googlesource.com/c/go/+/264678 Reviewed-by: Russ Cox <rsc@golang.org> Trust: Emmanuel Odeke <emmanuel@orijtech.com>
2020-06-18regexp/syntax: append patchLists in constant timeAndy Balholm
By keeping a tail pointer, we can append to a patchList in constant time, rather than in time proportional to the length of the list. This gets rid of the quadratic compile times we were seeing for long series of alternations. This is basically the same change as https://github.com/google/re2/commit/e9d517989f66f2e0a24cde42f4d2424dd3e4a9b9. Fixes #39542. Change-Id: Ib4ca0ca9c55abd1594df1984653c7d311ccf7572 Reviewed-on: https://go-review.googlesource.com/c/go/+/238079 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-04-17regexp/syntax: fix comment on p.literal and simplifyRuss Cox
p.literal's doc comment said it returned a value but it doesn't. While we're here, p.newLiteral is only called from p.literal, so simplify the code by merging the two. Change-Id: Ia357937a99f4e7473f0f1ec837113a39eaeb83d4 Reviewed-on: https://go-review.googlesource.com/c/go/+/222659 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-05-22regexp/syntax: exclude full range from String negation caseKeegan Carruthers-Smith
If the char class is 0x0-0x10ffff we mistakenly would String that to `[^]`, which is not a valid regex. Fixes #31807 Change-Id: I9ceeaddc28b67b8e1de12b6703bcb124cc784556 Reviewed-on: https://go-review.googlesource.com/c/go/+/175679 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-11-02all: use "reports whether" consistently in the few places that didn'tBrad Fitzpatrick
Go documentation style for boolean funcs is to say: // Foo reports whether ... func Foo() bool (rather than "returns true if") This CL also replaces 4 uses of "iff" with the same "reports whether" wording, which doesn't lose any meaning, and will prevent people from sending typo fixes when they don't realize it's "if and only if". In the past I think we've had the typo CLs updated to just say "reports whether". So do them all at once. (Inspired by the addition of another "returns true if" in CL 146938 in fd_plan9.go) Created with: $ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor) $ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor) Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f Reviewed-on: https://go-review.googlesource.com/c/147037 Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-08-22regexp/syntax: don't do both linear and binary sesarch in MatchRunePosIan Lance Taylor
MatchRunePos is a significant element of regexp performance, so some attention to optimization is appropriate. Before this CL, a non-matching rune would do both a linear search in the first four entries, and a binary search over all the entries. Change the code to optimize for the common case of two runes, to only do a linear search when there are up to four entries, and to only do a binary search when there are more than four entries. Updates #26623 name old time/op new time/op delta Find-12 260ns ± 1% 275ns ± 7% +5.84% (p=0.000 n=8+10) FindAllNoMatches-12 144ns ± 9% 143ns ±12% ~ (p=0.187 n=10+10) FindString-12 256ns ± 4% 254ns ± 1% ~ (p=0.357 n=9+8) FindSubmatch-12 587ns ±12% 593ns ±11% ~ (p=0.516 n=10+10) FindStringSubmatch-12 534ns ±12% 525ns ±14% ~ (p=0.565 n=10+10) Literal-12 104ns ±14% 106ns ±11% ~ (p=0.145 n=10+10) NotLiteral-12 1.51µs ± 8% 1.47µs ± 2% ~ (p=0.508 n=10+9) MatchClass-12 2.47µs ± 1% 2.26µs ± 6% -8.55% (p=0.000 n=8+10) MatchClass_InRange-12 2.18µs ± 5% 2.25µs ±11% +2.85% (p=0.009 n=9+10) ReplaceAll-12 2.35µs ± 6% 2.08µs ±23% -11.59% (p=0.010 n=9+10) AnchoredLiteralShortNonMatch-12 93.2ns ± 9% 93.2ns ±11% ~ (p=0.716 n=10+10) AnchoredLiteralLongNonMatch-12 118ns ±10% 117ns ± 9% ~ (p=0.802 n=10+10) AnchoredShortMatch-12 142ns ± 1% 141ns ± 1% -0.53% (p=0.007 n=8+8) AnchoredLongMatch-12 303ns ± 9% 304ns ± 6% ~ (p=0.724 n=10+10) OnePassShortA-12 620ns ± 1% 618ns ± 9% ~ (p=0.162 n=8+10) NotOnePassShortA-12 599ns ± 8% 568ns ± 1% -5.21% (p=0.000 n=10+8) OnePassShortB-12 525ns ± 7% 489ns ± 1% -6.93% (p=0.000 n=10+8) NotOnePassShortB-12 449ns ± 9% 431ns ±11% -4.05% (p=0.033 n=10+10) OnePassLongPrefix-12 119ns ± 6% 114ns ± 0% -3.88% (p=0.006 n=10+9) OnePassLongNotPrefix-12 420ns ± 9% 410ns ± 7% ~ (p=0.645 n=10+9) MatchParallelShared-12 376ns ± 0% 375ns ± 0% -0.45% (p=0.003 n=8+10) MatchParallelCopied-12 39.4ns ± 1% 39.1ns ± 0% -0.55% (p=0.004 n=10+9) QuoteMetaAll-12 139ns ± 7% 142ns ± 7% ~ (p=0.445 n=10+10) QuoteMetaNone-12 56.7ns ± 0% 61.3ns ± 7% +8.03% (p=0.001 n=8+10) Match/Easy0/32-12 83.4ns ± 7% 83.1ns ± 8% ~ (p=0.541 n=10+10) Match/Easy0/1K-12 417ns ± 8% 394ns ± 6% ~ (p=0.059 n=10+9) Match/Easy0/32K-12 7.05µs ± 8% 7.30µs ± 9% ~ (p=0.190 n=10+10) Match/Easy0/1M-12 291µs ±17% 284µs ±10% ~ (p=0.481 n=10+10) Match/Easy0/32M-12 9.89ms ± 4% 10.27ms ± 8% ~ (p=0.315 n=10+10) Match/Easy0i/32-12 1.13µs ± 1% 1.14µs ± 1% +1.51% (p=0.000 n=8+8) Match/Easy0i/1K-12 35.7µs ±11% 36.8µs ±10% ~ (p=0.143 n=10+10) Match/Easy0i/32K-12 1.70ms ± 7% 1.72ms ± 7% ~ (p=0.776 n=9+6) name old alloc/op new alloc/op delta Find-12 0.00B 0.00B ~ (all equal) FindAllNoMatches-12 0.00B 0.00B ~ (all equal) FindString-12 0.00B 0.00B ~ (all equal) FindSubmatch-12 48.0B ± 0% 48.0B ± 0% ~ (all equal) FindStringSubmatch-12 32.0B ± 0% 32.0B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Find-12 0.00 0.00 ~ (all equal) FindAllNoMatches-12 0.00 0.00 ~ (all equal) FindString-12 0.00 0.00 ~ (all equal) FindSubmatch-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) FindStringSubmatch-12 1.00 ± 0% 1.00 ± 0% ~ (all equal) name old speed new speed delta QuoteMetaAll-12 101MB/s ± 8% 99MB/s ± 7% ~ (p=0.529 n=10+10) QuoteMetaNone-12 458MB/s ± 0% 425MB/s ± 8% -7.22% (p=0.003 n=8+10) Match/Easy0/32-12 385MB/s ± 7% 386MB/s ± 7% ~ (p=0.579 n=10+10) Match/Easy0/1K-12 2.46GB/s ± 8% 2.60GB/s ± 6% ~ (p=0.065 n=10+9) Match/Easy0/32K-12 4.66GB/s ± 7% 4.50GB/s ±10% ~ (p=0.190 n=10+10) Match/Easy0/1M-12 3.63GB/s ±15% 3.70GB/s ± 9% ~ (p=0.481 n=10+10) Match/Easy0/32M-12 3.40GB/s ± 4% 3.28GB/s ± 8% ~ (p=0.315 n=10+10) Match/Easy0i/32-12 28.4MB/s ± 1% 28.0MB/s ± 1% -1.50% (p=0.000 n=8+8) Match/Easy0i/1K-12 28.8MB/s ±10% 27.9MB/s ±11% ~ (p=0.143 n=10+10) Match/Easy0i/32K-12 19.0MB/s ±14% 19.1MB/s ± 8% ~ (p=1.000 n=10+6) Change-Id: I238a451b36ad84b0f5534ff0af5c077a0d52d73a Reviewed-on: https://go-review.googlesource.com/130417 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-06-01all: update comment URLs from HTTP to HTTPS, where possibleTim Cooper
Each URL was manually verified to ensure it did not serve up incorrect content. Change-Id: I4dc846227af95a73ee9a3074d0c379ff0fa955df Reviewed-on: https://go-review.googlesource.com/115798 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org>
2018-03-29regexp/syntax: update perl script to preserve \s behaviorIan Lance Taylor
Incorporate https://code-review.googlesource.com/#/c/re2/+/3050/ from the re2 repository. Description of that change: Preserve the original behaviour of \s. Prior to Perl 5.18, \s did not match vertical tab. Bake that into make_perl_groups.pl as an override so that perl_groups.cc retains its current definitions when rebuilt with newer versions of Perl. This fixes make_perl_groups.pl to generate an unchanged perl_groups.go with perl versions 5.18 and later. Fixes #22057 Change-Id: I9a56e9660092ed6c1ff1045b4a3847de355441a7 Reviewed-on: https://go-review.googlesource.com/103517 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-26all: use strings.Builder instead of bytes.Buffer where appropriateBrad Fitzpatrick
I grepped for "bytes.Buffer" and "buf.String" and mostly ignored test files. I skipped a few on purpose and probably missed a few others, but otherwise I think this should be most of them. Updates #18990 Change-Id: I5a6ae4296b87b416d8da02d7bfaf981d8cc14774 Reviewed-on: https://go-review.googlesource.com/102479 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-24all: remove some unused return parametersDaniel Martí
As found by unparam. Picked the low-hanging fruit, consisting only of errors that were always nil and results that were never used. Left out those that were useful for consistency with other func signatures. Change-Id: I06b52bbd3541f8a5d66659c909bd93cb3e172018 Reviewed-on: https://go-review.googlesource.com/102418 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-20regexp/syntax: make Op an fmt.StringerDaniel Martí
Using stringer. Fixes #22684. Change-Id: I62fbde5dcb337cf269686615616bd39a27491ac1 Reviewed-on: https://go-review.googlesource.com/95355 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-12-01Revert "go/printer: forbid empty line before first comment in block"Joe Tsai
This reverts commit 08f19bbde1b01227fdc2fa2d326e4029bb74dd96. Reason for revert: The changed transformation takes effect on a larger set of code snippets than expected. For example, this: func foo() { // Comment bar() } becomes: func foo() { // Comment bar() } This is an unintended consequence. Change-Id: Ifca88d6267dab8a8170791f7205124712bf8ace8 Reviewed-on: https://go-review.googlesource.com/81335 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Joe Tsai <joetsai@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-11-02go/printer: forbid empty line before first comment in blockJoe Tsai
To improve readability when exported fields are removed, forbid the printer from emitting an empty line before the first comment in a const, var, or type block. Also, when printing the "Has filtered or unexported fields." message, add an empty line before it to separate the message from the struct or interfact contents. Before the change: <<< type NamedArg struct { // Name is the name of the parameter placeholder. // // If empty, the ordinal position in the argument list will be // used. // // Name must omit any symbol prefix. Name string // Value is the value of the parameter. // It may be assigned the same value types as the query // arguments. Value interface{} // contains filtered or unexported fields } >>> After the change: <<< type NamedArg struct { // Name is the name of the parameter placeholder. // // If empty, the ordinal position in the argument list will be // used. // // Name must omit any symbol prefix. Name string // Value is the value of the parameter. // It may be assigned the same value types as the query // arguments. Value interface{} // contains filtered or unexported fields } >>> Fixes #18264 Change-Id: I9fe17ca39cf92fcdfea55064bd2eaa784ce48c88 Reviewed-on: https://go-review.googlesource.com/71990 Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2017-09-08regexp: Remove duplicated function wordRune()Sylvain Zimmer
Fixes #21742 Change-Id: Ib56b092c490c27a4ba7ebdb6391f1511794710b8 Reviewed-on: https://go-review.googlesource.com/61034 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com> Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2017-03-06regexp/syntax: remove unused flags parameterDaniel Martí
Found by github.com/mvdan/unparam. Change-Id: I186d2afd067e97eb05d65c4599119b347f82867d Reviewed-on: https://go-review.googlesource.com/37840 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-06-28unicode: upgrade to version 9.0.0Marcel van Lohuizen
Changes beyond generated tables: - Now supports aliases to handle deprecated property classes. - Some Mongolian letters are now modifiers. Other changes: - strconv: newly generated table to be in sync - regexp/syntax: updated maxFold Fixes #16191 Change-Id: I56bdf21ee2f775f2a82d0465b3772faf5c24cb61 Reviewed-on: https://go-review.googlesource.com/24496 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-05-18regexp/syntax: clarify that \Z means Perl's \ZBrad Fitzpatrick
Fixes #14793 Change-Id: I408056d096cd6a999fa5e349704b5ea8e26d2e4e Reviewed-on: https://go-review.googlesource.com/23201 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-02all: make copyright headers consistent with one space after periodEmmanuel Odeke
Follows suit with https://go-review.googlesource.com/#/c/20111. Generated by running $ grep -R 'Go Authors. All' * | cut -d":" -f1 | while read F;do perl -pi -e 's/Go Authors. All/Go Authors. All/g' $F;done The code in cmd/internal/unvendor wasn't changed. Fixes #15213 Change-Id: I4f235cee0a62ec435f9e8540a1ec08ae03b1a75f Reviewed-on: https://go-review.googlesource.com/21819 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02all: single space after period.Brad Fitzpatrick
The tree's pretty inconsistent about single space vs double space after a period in documentation. Make it consistently a single space, per earlier decisions. This means contributors won't be confused by misleading precedence. This CL doesn't use go/doc to parse. It only addresses // comments. It was generated with: $ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])') $ go test go/doc -update Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7 Reviewed-on: https://go-review.googlesource.com/20022 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dave Day <djd@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-01all: make copyright headers consistent with one space after periodBrad Fitzpatrick
This is a subset of https://golang.org/cl/20022 with only the copyright header lines, so the next CL will be smaller and more reviewable. Go policy has been single space after periods in comments for some time. The copyright header template at: https://golang.org/doc/contribute.html#copyright also uses a single space. Make them all consistent. Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0 Reviewed-on: https://go-review.googlesource.com/20111 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>