annotate contrib/unicode/README @ 158:494b0b89df80 default tip

author Shinji KONO <kono@ie.u-ryukyu.ac.jp>
date Mon, 25 May 2020 18:13:55 +0900

This directory contains a mechanism for GCC to have its own internal
implementation of wcwidth functionality (cpp_wcwidth () in libcpp/charset.c).

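As background: wcwidth maps a character to the number of terminal columns it occupies, conventionally 0 for combining marks, 2 for East Asian wide and fullwidth characters, 1 for most other printable characters, and -1 for non-printable ones. The Python sketch below approximates that behaviour with the standard unicodedata module. It is only an illustration, not GCC's or glibc's logic: approx_wcwidth and approx_wcswidth are hypothetical names, the Unicode version used comes from the Python build rather than from the data files in this directory, and special cases that glibc handles are ignored.

```python
import unicodedata


def approx_wcwidth(ch: str) -> int:
    """Rough column width of one character (hypothetical helper, NOT glibc's rules)."""
    if ch == "\0":
        return 0  # the null character occupies no column
    category = unicodedata.category(ch)
    if category == "Cc":
        return -1  # other control characters are treated as non-printable
    if category in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters add no column
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide and Fullwidth characters take two columns
    return 1


def approx_wcswidth(text: str) -> int:
    """Total width of a string, or -1 if any character is non-printable."""
    total = 0
    for ch in text:
        width = approx_wcwidth(ch)
        if width < 0:
            return -1
        total += width
    return total
```

For example, "e" followed by U+0301 COMBINING ACUTE ACCENT and the CJK ideograph U+4E00 gives 1 + 0 + 2 columns.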
The idea is to produce the necessary lookup table
(../../libcpp/generated_cpp_wcwidth.h) in a reproducible way, starting from the
following files that are distributed by the Unicode Consortium:

ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt
ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt

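To illustrate what these inputs look like, EastAsianWidth.txt is a semicolon-separated data file: each non-comment line gives a code point or a range (first..last) followed by a width class such as W, F, Na, H, A or N, and comments start with #. The sketch below parses lines in that layout. The field layout is assumed from the file's usual format, parse_east_asian_width is a hypothetical name, and this is not the parser in glibc's unicode_utils.py.

```python
def parse_east_asian_width(lines):
    """Yield (first, last, width_class) tuples from EastAsianWidth.txt-style lines."""
    for line in lines:
        # Drop the trailing comment and surrounding whitespace.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        code_points, width_class = (field.strip() for field in data.split(";", 1))
        # A single code point has no ".." part; treat it as a one-element range.
        first, _, last = code_points.partition("..")
        yield int(first, 16), int(last or first, 16), width_class
```

Usage: `list(parse_east_asian_width(open("EastAsianWidth.txt", encoding="utf-8")))` yields one tuple per data line.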
These three files have been added to source control in this directory;
please see unicode-license.txt for the relevant copyright information.

To stay in sync with glibc's wcwidth as closely as possible, the logic that
processes the Unicode data should be the same as glibc's. To that end, we also
keep in this directory, under from_glibc/, the glibc Python code that
implements that logic. This code was copied verbatim from glibc and can be
updated at any time from the glibc source code repository. The files copied
from that repository are:

localedata/unicode-gen/unicode_utils.py
localedata/unicode-gen/utf8_gen.py

The versions most recently copied into GCC are from glibc git commit:
2a764c6ee848dfe92cb2921ed3b14085f15d9e79

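Because the files must be copied verbatim from a specific glibc commit, one way to do this is with `git show <commit>:<path>`, which prints a file exactly as it existed at that commit. The Python sketch below assumes a local glibc clone and assumes the two files are stored in from_glibc/ under their base names; GLIBC_FILES, git_show_command and copy_from_glibc are hypothetical names, not part of GCC.

```python
import subprocess
from pathlib import Path

# The glibc files listed in this README (paths relative to the glibc tree).
GLIBC_FILES = (
    "localedata/unicode-gen/unicode_utils.py",
    "localedata/unicode-gen/utf8_gen.py",
)


def git_show_command(glibc_checkout, commit, path):
    """Build a `git show <commit>:<path>` command run inside a glibc checkout."""
    return ["git", "-C", str(glibc_checkout), "show", f"{commit}:{path}"]


def copy_from_glibc(glibc_checkout, commit, unicode_dir):
    """Hypothetical helper: write each file, as of `commit`, into from_glibc/."""
    dest = Path(unicode_dir) / "from_glibc"
    dest.mkdir(parents=True, exist_ok=True)
    for path in GLIBC_FILES:
        result = subprocess.run(
            git_show_command(glibc_checkout, commit, path),
            check=True,
            capture_output=True,
        )
        # Write the raw bytes so the copy stays byte-for-byte verbatim.
        (dest / Path(path).name).write_bytes(result.stdout)
```

After copying, the commit hash recorded above would be updated to the `commit` value that was used.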
Finally, the script gen_wcwidth.py in this directory contains the GCC-specific
code that maps glibc's output to the lookup tables we require. This script
should not need to change unless the structure of the Unicode data files or of
the glibc code changes.

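A table with one entry per code point would be large. One common compact encoding for this kind of data is a sorted list of run end points with a parallel list of widths, queried by binary search. The sketch below illustrates that idea; the parallel-array layout and the names build_range_table and lookup_width are assumptions for illustration and are not taken from gen_wcwidth.py or generated_cpp_wcwidth.h.

```python
import bisect


def build_range_table(widths, default=1, max_cp=0x10FFFF):
    """Collapse a sparse {code point: width} map into parallel run tables.

    Run i covers code points range_ends[i-1] + 1 .. range_ends[i], all with
    width range_widths[i]. Code points missing from `widths` get `default`.
    """
    range_ends, range_widths = [], []

    def push(end, width):
        # Extend the previous run when the width does not change.
        if range_widths and range_widths[-1] == width:
            range_ends[-1] = end
        else:
            range_ends.append(end)
            range_widths.append(width)

    prev = -1
    for cp in sorted(widths):
        if cp > prev + 1:
            push(cp - 1, default)  # gap before this code point
        push(cp, widths[cp])
        prev = cp
    if prev < max_cp:
        push(max_cp, default)  # tail up to the last code point
    return range_ends, range_widths


def lookup_width(range_ends, range_widths, cp):
    """Binary-search for the first run whose end is >= cp."""
    i = bisect.bisect_left(range_ends, cp)
    if i == len(range_ends):
        raise ValueError(f"code point {cp:#x} is beyond the table")
    return range_widths[i]
```

Merging equal neighbours keeps the table proportional to the number of width changes rather than to the number of code points, and each lookup costs O(log n) comparisons.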
The procedure to update GCC's wcwidth tables is as follows:

1. Update the three Unicode data files from the above URLs.

2. Update the two glibc files in from_glibc/ from glibc's git, and update
   the commit hash recorded above in this README.

3. Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h
   (where X.Y is the version of the Unicode Standard that the data files
   correspond to; most recently, 12.1).

After that, GCC's wcwidth will match that of the most recent glibc.
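
In step 3 the shell truncates the target header before gen_wcwidth.py starts, so a failing run can leave it empty or partial. The Python sketch below is one hypothetical way to guard against that: it writes to a temporary file next to the header and renames it into place only on success. check_unicode_version and regenerate_header are hypothetical names; the sketch assumes, as step 3 implies, that gen_wcwidth.py takes the Unicode version as its only argument, is run from this directory, and writes the header to standard output. It invokes the script through the current Python interpreter rather than as ./gen_wcwidth.py.

```python
import os
import re
import subprocess
import sys
import tempfile
from pathlib import Path


def check_unicode_version(version: str) -> str:
    """Accept only versions of the form X.Y, e.g. "12.1"."""
    if not re.fullmatch(r"\d+\.\d+", version):
        raise ValueError(f"expected a Unicode version like 12.1, got {version!r}")
    return version


def regenerate_header(version: str, unicode_dir) -> Path:
    """Hypothetical wrapper for step 3: run gen_wcwidth.py, install its output."""
    unicode_dir = Path(unicode_dir)
    target = (unicode_dir / "../../libcpp/generated_cpp_wcwidth.h").resolve()
    check_unicode_version(version)
    # Temporary file in the same directory, so the final rename is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as out:
            subprocess.run(
                [sys.executable, "gen_wcwidth.py", version],
                cwd=unicode_dir,
                stdout=out,
                check=True,
            )
        os.chmod(tmp_name, 0o644)  # mkstemp creates owner-only files
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)  # leave any existing header untouched
        raise
    return target
```

If the script exits with a non-zero status, subprocess.run raises CalledProcessError, the temporary file is removed, and the previous header stays in place.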