| author | Patrick Steinhardt <ps@pks.im> | 2026-03-19 06:33:23 +0100 |
|---|---|---|
| committer | Junio C Hamano <gitster@pobox.com> | 2026-03-19 06:40:09 -0700 |
| commit | 405c98a6a0e017f41f5de9c649a8f6f1b3fc4314 (patch) | |
| tree | 28d9cb4bd22a88b06532ec0c34965a085b0fae04 /tools | |
| parent | fe309664ea804d17812bab22927756dc35e5e955 (diff) | |
| download | git-405c98a6a0e017f41f5de9c649a8f6f1b3fc4314.tar.xz | |
contrib: move "update-unicode.sh" script into "tools/"

The "update-unicode.sh" script is used to update the unicode data
compiled into Git whenever a new version of the Unicode standard has
been released. As such, it is a natural part of our developer-facing
tooling, and its presence in "contrib/" is misleading.

Promote the script into the new "tools/" directory.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'tools')
| -rw-r--r-- | tools/update-unicode/.gitignore | 3 |
| -rw-r--r-- | tools/update-unicode/README | 20 |
| -rwxr-xr-x | tools/update-unicode/update_unicode.sh | 33 |
3 files changed, 56 insertions, 0 deletions
diff --git a/tools/update-unicode/.gitignore b/tools/update-unicode/.gitignore
new file mode 100644
index 0000000000..b0ebc6aad2
--- /dev/null
+++ b/tools/update-unicode/.gitignore
@@ -0,0 +1,3 @@
+uniset/
+UnicodeData.txt
+EastAsianWidth.txt
diff --git a/tools/update-unicode/README b/tools/update-unicode/README
new file mode 100644
index 0000000000..151a197041
--- /dev/null
+++ b/tools/update-unicode/README
@@ -0,0 +1,20 @@
+TL;DR: Run update_unicode.sh after the publication of a new Unicode
+standard and commit the resulting unicode-widths.h file.
+
+The long version
+================
+
+The Git source code ships the file unicode-widths.h which contains
+tables of zero and double width Unicode code points, respectively.
+These tables are generated using update_unicode.sh in this directory.
+update_unicode.sh itself uses a third-party tool, uniset, to query two
+Unicode data files for the interesting code points.
+
+On first run, update_unicode.sh clones uniset from Github and builds it.
+This requires a current-ish version of autoconf (2.69 works per December
+2016).
+
+On each run, update_unicode.sh checks whether more recent Unicode data
+files are available from the Unicode consortium, and rebuilds the header
+unicode-widths.h with the new data. The new header can then be
+committed.
diff --git a/tools/update-unicode/update_unicode.sh b/tools/update-unicode/update_unicode.sh
new file mode 100755
index 0000000000..aa90865bef
--- /dev/null
+++ b/tools/update-unicode/update_unicode.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+#See http://www.unicode.org/reports/tr44/
+#
+#Me Enclosing_Mark an enclosing combining mark
+#Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
+#Cf Format a format control character
+#
+cd "$(dirname "$0")"
+UNICODEWIDTH_H=$(git rev-parse --show-toplevel)/unicode-width.h
+
+wget -N http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt \
+	http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt &&
+if ! test -d uniset; then
+	git clone https://github.com/depp/uniset.git &&
+	( cd uniset && git checkout 4b186196dd )
+fi &&
+(
+	cd uniset &&
+	if ! test -x uniset; then
+		autoreconf -i &&
+		./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'
+	fi &&
+	make
+) &&
+UNICODE_DIR=. && export UNICODE_DIR &&
+cat >$UNICODEWIDTH_H <<-EOF
+static const struct interval zero_width[] = {
+	$(uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - U+00AD)
+};
+static const struct interval double_width[] = {
+	$(uniset/uniset --32 eaw:F,W)
+};
+EOF
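
For readers unfamiliar with the generated header, the following is a minimal sketch of how sorted interval tables like "zero_width" and "double_width" could be consulted to decide the display width of a code point. It is not taken from this commit: the field names "first" and "last", the helpers "in_table" and "sample_width", and the two sample tables are assumptions made for illustration, and the sample ranges are hand-picked rather than real uniset output.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: an inclusive range of code points (hypothetical field names). */
struct interval {
	uint32_t first;
	uint32_t last;
};

/* Hand-picked sample ranges, NOT the tables produced by update_unicode.sh. */
static const struct interval sample_zero_width[] = {
	{ 0x0300, 0x036F }, /* combining diacritical marks (Mn) */
	{ 0x200B, 0x200F }, /* zero width space, joiners, direction marks (Cf) */
};

static const struct interval sample_double_width[] = {
	{ 0x1100, 0x115F }, /* Hangul Jamo leading consonants (wide) */
	{ 0x4E00, 0x9FFF }, /* CJK unified ideographs (wide) */
};

#define SAMPLE_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Binary search over a table sorted by code point with
 * non-overlapping ranges. Returns 1 if ch falls in any range.
 */
static int in_table(uint32_t ch, const struct interval *table, size_t n)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ch > table[mid].last)
			lo = mid + 1;
		else if (ch < table[mid].first)
			hi = mid;
		else
			return 1;
	}
	return 0;
}

/* Hypothetical helper: 0 for zero width, 2 for double width, else 1. */
static int sample_width(uint32_t ch)
{
	if (in_table(ch, sample_zero_width, SAMPLE_COUNT(sample_zero_width)))
		return 0;
	if (in_table(ch, sample_double_width, SAMPLE_COUNT(sample_double_width)))
		return 2;
	return 1;
}
```

With these sample tables, sample_width(0x0301) yields 0, sample_width(0x4E2D) yields 2, and sample_width('A') yields 1. Binary search only works because the ranges are sorted and disjoint, which the sketch assumes of the generated tables as well.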
