regexp/syntax: recognize category aliases like \p{Letter}

The Unicode specification defines aliases for some of the general
category names. For example the category "L" has alias "Letter".

The regexp package supports \p{L} but not \p{Letter}, because there
was nothing in the Unicode tables that lets regexp know about Letter.
Now that package unicode provides CategoryAliases (see #70780),
we can use it to provide \p{Letter} as well.

This is the only feature missing from making package regexp suitable
for use in a JSON-API Schema implementation. (The official test suite
includes usage of aliases like \p{Letter} instead of \p{L}.)

For better conformity with Unicode TR18, also accept case-insensitive
matches for names and ignore underscores, hyphens, and spaces;
and add Any, ASCII, and Assigned.

Fixes #70781.

Change-Id: I50ff024d99255338fa8d92663881acb47f1e92a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/641377
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
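
The loose name matching described above (case-insensitive, ignoring underscores, hyphens, and spaces) can be illustrated with a minimal sketch. This is not the parser's actual code: normalizeName, lookupCategory, and the two-entry aliases map are hypothetical names introduced for illustration, and the real change draws its aliases from package unicode rather than a hand-written table.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeName is a hypothetical helper (not the regexp/syntax code).
// It lowercases the name and drops underscores, hyphens, and spaces,
// so that differently spelled forms of one name compare equal.
func normalizeName(name string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(name) {
		switch r {
		case '_', '-', ' ':
			continue // skip ignorable separators
		}
		b.WriteRune(r)
	}
	return b.String()
}

// aliases is a tiny hand-written sample keyed by normalized alias.
// It is an assumption for this sketch, not the full Unicode alias list.
var aliases = map[string]string{
	"letter":          "L",
	"uppercaseletter": "Lu",
}

// lookupCategory maps a loosely written alias to its short category name.
func lookupCategory(name string) (string, bool) {
	short, ok := aliases[normalizeName(name)]
	return short, ok
}

func main() {
	names := []string{
		"Uppercase_Letter",
		"upper case-let ter",
		"__upper case-let ter",
		"Letter",
	}
	for _, name := range names {
		short, ok := lookupCategory(name)
		fmt.Printf("%q -> %q (found: %v)\n", name, short, ok)
	}
}
```

Normalizing both the table keys and the query in the same way keeps the lookup a single map access, which is one plausible way to get the behavior the new test cases exercise.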
author: Russ Cox <rsc@golang.org> 2025-01-08 11:27:07 -0500
committer: Russ Cox <rsc@golang.org> 2025-04-18 14:13:38 -0700
commit: 930cf59ba8091bfd56c71357085bc7de74daf421 (patch)
tree: 844b9b05136e53cd788676b7b377ef9ef6f5ef86 /src/regexp/syntax/parse_test.go
parent: 28fd9fa8a6de5f5e75a3ca2eeaa55b5ae4a2722b (diff)
download: go-930cf59ba8091bfd56c71357085bc7de74daf421.tar.xz
1 files changed, 6 insertions, 0 deletions
diff --git a/src/regexp/syntax/parse_test.go b/src/regexp/syntax/parse_test.go
index 0f885bd5c8..9d2f698e25 100644
--- a/src/regexp/syntax/parse_test.go
+++ b/src/regexp/syntax/parse_test.go
@@ -107,10 +107,16 @@ var parseTests = []parseTest{
 	{`[\P{^Braille}]`, `cc{0x2800-0x28ff}`},
 	{`[\pZ]`, `cc{0x20 0xa0 0x1680 0x2000-0x200a 0x2028-0x2029 0x202f 0x205f 0x3000}`},
 	{`\p{Lu}`, mkCharClass(unicode.IsUpper)},
+	{`\p{Uppercase_Letter}`, mkCharClass(unicode.IsUpper)},
+	{`\p{upper case-let ter}`, mkCharClass(unicode.IsUpper)},
+	{`\p{__upper case-let ter}`, mkCharClass(unicode.IsUpper)},
 	{`[\p{Lu}]`, mkCharClass(unicode.IsUpper)},
 	{`(?i)[\p{Lu}]`, mkCharClass(isUpperFold)},
 	{`\p{Any}`, `dot{}`},
 	{`\p{^Any}`, `cc{}`},
+	{`(?i)\p{ascii}`, `cc{0x0-0x7f 0x17f 0x212a}`},
+	{`\p{Assigned}`, mkCharClass(func(r rune) bool { return !unicode.In(r, unicode.Cn) })},
+	{`\p{^Assigned}`, mkCharClass(func(r rune) bool { return unicode.In(r, unicode.Cn) })},
 
 	// Hex, octal.
 	{`[\012-\234]\141`, `cat{cc{0xa-0x9c}lit{a}}`},
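
The expected class for `(?i)\p{ascii}` in the test above contains two runes outside 0x0-0x7f. A rough sketch of why, assuming the extra runes come from simple case folding: the program below walks the unicode.SimpleFold orbit of every ASCII rune and collects any non-ASCII rune it reaches. foldedASCIIExtras is a hypothetical helper written for this illustration, not code from the change.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// foldedASCIIExtras is a hypothetical helper for illustration only.
// It returns, in ascending order, the non-ASCII runes that are
// case-fold equivalents of some ASCII rune.
func foldedASCIIExtras() []rune {
	seen := map[rune]bool{}
	var extras []rune
	for c := rune(0); c <= unicode.MaxASCII; c++ {
		// SimpleFold cycles through the fold orbit and returns to c.
		for f := unicode.SimpleFold(c); f != c; f = unicode.SimpleFold(f) {
			if f > unicode.MaxASCII && !seen[f] {
				seen[f] = true
				extras = append(extras, f)
			}
		}
	}
	sort.Slice(extras, func(i, j int) bool { return extras[i] < extras[j] })
	return extras
}

func main() {
	for _, r := range foldedASCIIExtras() {
		fmt.Printf("U+%04X %q\n", r, r)
	}
}
```

The runes it reports should be U+017F (long s, which folds with s) and U+212A (Kelvin sign, which folds with k), matching the 0x17f and 0x212a entries in the test's expected class.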