aboutsummaryrefslogtreecommitdiff
path: root/src/regexp/syntax/parse_test.go
AgeCommit message (Collapse)Author
2025-04-18regexp/syntax: recognize category aliases like \p{Letter}Russ Cox
The Unicode specification defines aliases for some of the general category names. For example the category "L" has alias "Letter". The regexp package supports \p{L} but not \p{Letter}, because there was nothing in the Unicode tables that lets regexp know about Letter. Now that package unicode provides CategoryAliases (see #70780), we can use it to provide \p{Letter} as well. This is the only feature missing from making package regexp suitable for use in a JSON-API Schema implementation. (The official test suite includes usage of aliases like \p{Letter} instead of \p{L}.) For better conformity with Unicode TR18, also accept case-insensitive matches for names and ignore underscores, hyphens, and spaces; and add Any, ASCII, and Assigned. Fixes #70781. Change-Id: I50ff024d99255338fa8d92663881acb47f1e92a5 Reviewed-on: https://go-review.googlesource.com/c/go/+/641377 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com>
2023-08-16regexp/syntax: use more compact Regexp.String outputRuss Cox
Compact the Regexp.String output. It was only ever intended for debugging, but there are at least some uses in the wild where regexps are built up using regexp/syntax and then formatted using the String method. Compact the output to help that use case. Specifically: - Compact 2-element character class ranges: [a-b] -> [ab]. - Aggregate flags: (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E) -> (?i:AB*C|D?E). Fixes #57950. Change-Id: I1161d0e3aa6c3ae5a302677032bb7cd55caae5fb Reviewed-on: https://go-review.googlesource.com/c/go/+/507015 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Rob Pike <r@golang.org> Auto-Submit: Russ Cox <rsc@golang.org>
2023-07-31regexp/syntax: accept (?<name>...) syntax as valid captureMauri de Souza Meneguzzo
Currently the only named capture supported by regexp is (?P<name>a). The syntax (?<name>a) is also widely used and there is currently an effort from the Rust regex and RE2 teams to also accept this syntax. Fixes #58458 Change-Id: If22d44d3a5c4e8133ec68238ab130c151ca7c5c5 GitHub-Last-Rev: 31b50e6ab40cfb0f36df6f570525657d4680017f GitHub-Pull-Request: golang/go#61624 Reviewed-on: https://go-review.googlesource.com/c/go/+/513838 Auto-Submit: Ian Lance Taylor <iant@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>
2022-10-05regexp: limit size of parsed regexpsRuss Cox
Set a 128 MB limit on the amount of space used by []syntax.Inst in the compiled form corresponding to a given regexp. Also set a 128 MB limit on the rune storage in the *syntax.Regexp tree itself. Thanks to Adam Korczynski (ADA Logics) and OSS-Fuzz for reporting this issue. Fixes CVE-2022-41715. Fixes #55949. Change-Id: Ia656baed81564436368cf950e1c5409752f28e1b Reviewed-on: https://go-review.googlesource.com/c/go/+/439356 Auto-Submit: Roland Shoemaker <roland@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Roland Shoemaker <roland@golang.org> Reviewed-by: Damien Neil <dneil@google.com>
2022-02-10regexp/syntax: reject very deeply nested regexps in ParseRuss Cox
The regexp code assumes it can recurse over the structure of a regexp safely. Go's growable stacks make that reasonable for all plausible regexps, but implausible ones can reach the “infinite recursion?” stack limit. This CL limits the depth of any parsed regexp to 1000. That is, the depth of the parse tree is required to be ≤ 1000. Regexps that require deeper parse trees will return ErrInternalError. A future CL will change the error to ErrInvalidDepth, but using ErrInternalError for now avoids introducing new API in point releases when this is backported. Fixes #51112. Change-Id: I97d2cd82195946eb43a4ea8561f5b95f91fb14c5 Reviewed-on: https://go-review.googlesource.com/c/go/+/384616 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-05-22regexp/syntax: exclude full range from String negation caseKeegan Carruthers-Smith
If the char class is 0x0-0x10ffff we mistakenly would String that to `[^]`, which is not a valid regex. Fixes #31807 Change-Id: I9ceeaddc28b67b8e1de12b6703bcb124cc784556 Reviewed-on: https://go-review.googlesource.com/c/go/+/175679 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-26all: use strings.Builder instead of bytes.Buffer where appropriateBrad Fitzpatrick
I grepped for "bytes.Buffer" and "buf.String" and mostly ignored test files. I skipped a few on purpose and probably missed a few others, but otherwise I think this should be most of them. Updates #18990 Change-Id: I5a6ae4296b87b416d8da02d7bfaf981d8cc14774 Reviewed-on: https://go-review.googlesource.com/102479 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-01all: make copyright headers consistent with one space after periodBrad Fitzpatrick
This is a subset of https://golang.org/cl/20022 with only the copyright header lines, so the next CL will be smaller and more reviewable. Go policy has been single space after periods in comments for some time. The copyright header template at: https://golang.org/doc/contribute.html#copyright also uses a single space. Make them all consistent. Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0 Reviewed-on: https://go-review.googlesource.com/20111 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-01-08regexp/syntax: fix factoring of common prefixes in alternationsPaul Wankadia
In the past, `a.*?c|a.*?b` was factored to `a.*?[bc]`. Thus, given "abc" as its input string, the automaton would consume "ab" and then stop (when unanchored) whereas it should consume all of "abc" as per leftmost semantics. Fixes #13812. Change-Id: I67ac0a353d7793b3d0c9c4aaf22d157621dfe784 Reviewed-on: https://go-review.googlesource.com/18357 Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-01regexp/syntax: fix handling of \Q...\ERuss Cox
It's not a group: must handle the inside as a sequence of literal chars, not a single literal string. That is, \Qab\E+ is the same as ab+, not (ab)+. Fixes #11187. Change-Id: I5406d05ccf7efff3a7f15395bdb0cfb2bd23a8ed Reviewed-on: https://go-review.googlesource.com/17233 Reviewed-by: David Crawshaw <crawshaw@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2014-09-30regexp/syntax: reject large repetitions created by nesting small onesRuss Cox
Fixes #7609. LGTM=r R=r CC=golang-codereviews https://golang.org/cl/150270043
2014-09-08build: move package sources from src/pkg to srcRuss Cox
Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.