CbC/CbC_gcc: libcpp/charset.c annotate

author	kent <kent@cr.ie.u-ryukyu.ac.jp>
date	Fri, 17 Jul 2009 14:47:48 +0900
parents
children	77e2b8dfacca

/* CPP Library - charsets
   Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2006, 2008, 2009
   Free Software Foundation, Inc.

   Broken out of c-lex.c Apr 2003, adding valid C99 UCN ranges.

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any
later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

#include "config.h"
#include "system.h"
#include "cpplib.h"
#include "internal.h"

/* Character set handling for C-family languages.

   Terminological note: In what follows, "charset" or "character set"
   will be taken to mean both an abstract set of characters and an
   encoding for that set.

   The C99 standard discusses two character sets: source and execution.
   The source character set is used for internal processing in translation
   phases 1 through 4; the execution character set is used thereafter.
   Both are required by 5.2.1.2p1 to be multibyte encodings, not wide
   character encodings (see 3.7.2, 3.7.3 for the standardese meanings
   of these terms).  Furthermore, the "basic character set" (listed in
   5.2.1p3) is to be encoded in each with values one byte wide, and is
   to appear in the initial shift state.

   It is not explicitly mentioned, but there is also a "wide execution
   character set" used to encode wide character constants and wide
   string literals; this is supposed to be the result of applying the
   standard library function mbstowcs() to an equivalent narrow string
   (6.4.5p5).  However, the behavior of hexadecimal and octal
   \-escapes is at odds with this; they are supposed to be translated
   directly to wchar_t values (6.4.4.4p5,6).

   The source character set is not necessarily the character set used
   to encode physical source files on disk; translation phase 1 converts
   from whatever that encoding is to the source character set.

   The presence of universal character names in C99 (6.4.3 et seq.)
   forces the source character set to be isomorphic to ISO 10646,
   that is, Unicode.  There is no such constraint on the execution
   character set; note also that the conversion from source to
   execution character set does not occur for identifiers (5.1.1.2p1#5).

   For convenience of implementation, the source character set's
   encoding of the basic character set should be identical to the
   execution character set OF THE HOST SYSTEM's encoding of the basic
   character set, and it should not be a state-dependent encoding.

   cpplib uses UTF-8 or UTF-EBCDIC for the source character set,
   depending on whether the host is based on ASCII or EBCDIC (see
   respectively Unicode section 2.3/ISO10646 Amendment 2, and Unicode
   Technical Report #16).  With limited exceptions, it relies on the
   system library's iconv() primitive to do charset conversion
   (specified in SUSv2).  */
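
The comment above says cpplib delegates most charset conversion to the system iconv() primitive. The sketch below shows one such conversion with the POSIX iconv_open/iconv/iconv_close calls. It assumes the host iconv recognises the names "UTF-8" and "UTF-32BE" (charset names are implementation-defined) and uses the POSIX char ** input argument; older systems that declare it const char ** are the reason cpplib carries the ICONV_CONST macro. The helper name utf8_to_utf32be is hypothetical and not part of cpplib.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated UTF-8 string IN to
   big-endian UTF-32 in OUT.  Returns the number of bytes written, or
   (size_t) -1 on failure with errno set by iconv_open or iconv.  */
static size_t
utf8_to_utf32be (const char *in, unsigned char *out, size_t outsize)
{
  /* Assumption: the host iconv knows both of these charset names.  */
  iconv_t cd = iconv_open ("UTF-32BE", "UTF-8");
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  char *inp = (char *) in;        /* POSIX iconv takes char ** input.  */
  size_t inleft = strlen (in);
  char *outp = (char *) out;
  size_t outleft = outsize;

  size_t r = iconv (cd, &inp, &inleft, &outp, &outleft);
  int saved_errno = errno;        /* iconv_close may clobber errno.  */
  iconv_close (cd);
  if (r == (size_t) -1)
    {
      errno = saved_errno;
      return (size_t) -1;
    }
  /* A successful call consumes the whole input.  */
  assert (inleft == 0);
  return outsize - outleft;
}
```

For example, the three-byte UTF-8 input "A\xC3\xA9" ("Aé") should come out as the eight bytes 00 00 00 41 00 00 00 E9.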

#if !HAVE_ICONV
/* Make certain that the uses of iconv(), iconv_open(), iconv_close()
   below, which are guarded only by if statements with compile-time
   constant conditions, do not cause link errors.  */
#define iconv_open(x, y) (errno = EINVAL, (iconv_t)-1)
#define iconv(a,b,c,d,e) (errno = EINVAL, (size_t)-1)
#define iconv_close(x)   (void)0
#define ICONV_CONST
#endif

#if HOST_CHARSET == HOST_CHARSET_ASCII
#define SOURCE_CHARSET "UTF-8"
#define LAST_POSSIBLY_BASIC_SOURCE_CHAR 0x7e
#elif HOST_CHARSET == HOST_CHARSET_EBCDIC
#define SOURCE_CHARSET "UTF-EBCDIC"
#define LAST_POSSIBLY_BASIC_SOURCE_CHAR 0xFF
#else
#error "Unrecognized basic host character set"
#endif

#ifndef EILSEQ
#define EILSEQ EINVAL
#endif

/* This structure is used for a resizable string buffer throughout.  */
/* Don't call it strbuf, as that conflicts with unistd.h on systems
   such as DYNIX/ptx where unistd.h includes stropts.h.  */
struct _cpp_strbuf
{
  uchar *text;
  size_t asize;
  size_t len;
};

/* This is enough to hold any string that fits on a single 80-column
   line, even if iconv quadruples its size (e.g. conversion from
   ASCII to UTF-32) rounded up to a power of two.  */
#define OUTBUF_BLOCK_SIZE 256
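
The buffer above pairs heap storage (text) with its allocated size (asize) and the number of bytes in use (len). The sketch below shows one way such a buffer can grow on append. It is an illustration under assumptions, not cpplib's code: the struct is re-declared as sketch_strbuf with unsigned char standing in for uchar (deliberately not named strbuf, per the note above), the functions sketch_strbuf_init and sketch_strbuf_append are hypothetical, and growing in whole OUTBUF_BLOCK_SIZE blocks is a simple policy chosen for the example; cpplib's own growth strategy may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OUTBUF_BLOCK_SIZE 256

/* Local stand-in for struct _cpp_strbuf.  */
struct sketch_strbuf
{
  unsigned char *text;  /* Heap storage.  */
  size_t asize;         /* Bytes allocated.  */
  size_t len;           /* Bytes in use.  */
};

/* Hypothetical: allocate one initial block.  Returns 0, or -1 if out
   of memory.  */
static int
sketch_strbuf_init (struct sketch_strbuf *sb)
{
  sb->text = malloc (OUTBUF_BLOCK_SIZE);
  sb->asize = sb->text ? OUTBUF_BLOCK_SIZE : 0;
  sb->len = 0;
  return sb->text ? 0 : -1;
}

/* Hypothetical: append N bytes from DATA, enlarging the allocation to
   the next multiple of OUTBUF_BLOCK_SIZE when it is too small.  */
static int
sketch_strbuf_append (struct sketch_strbuf *sb, const unsigned char *data,
                      size_t n)
{
  if (sb->asize - sb->len < n)
    {
      size_t need = sb->len + n;
      size_t newsize = (need + OUTBUF_BLOCK_SIZE - 1)
                       / OUTBUF_BLOCK_SIZE * OUTBUF_BLOCK_SIZE;
      unsigned char *p = realloc (sb->text, newsize);
      if (!p)
        return -1;      /* Old buffer is still valid and owned by SB.  */
      sb->text = p;
      sb->asize = newsize;
    }
  memcpy (sb->text + sb->len, data, n);
  sb->len += n;
  assert (sb->len <= sb->asize);
  return 0;
}
```

Appending 300 bytes to a fresh buffer, for instance, should leave len at 300 and asize at 512 (two blocks).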

/* Conversions between UTF-8 and UTF-16/32 are implemented by custom
   logic.  This is because a depressing number of systems lack iconv,
   or have iconv libraries that do not do these conversions, so
   we need a fallback implementation for them.  To ensure the fallback
   doesn't break due to neglect, it is used on all systems.

   UTF-32 encoding is nice and simple: a four-byte binary number,
   constrained to the range 00000000-7FFFFFFF to avoid questions of
   signedness.  We do have to cope with big- and little-endian
   variants.

   UTF-16 encoding uses two-byte binary numbers, again in big- and
   little-endian variants, for all values in the 00000000-0000FFFF
   range.  Values in the 00010000-0010FFFF range are encoded as pairs
   of two-byte numbers, called "surrogate pairs": given a number S in
   this range, it is mapped to a pair (H, L) as follows:

     H = (S - 0x10000) / 0x400 + 0xD800
     L = (S - 0x10000) % 0x400 + 0xDC00

   Two-byte values in the D800...DFFF range are ill-formed except as a
   component of a surrogate pair.  Even if the encoding within a
   two-byte value is little-endian, the H member of the surrogate pair
   comes first.

   There is no way to encode values in the 00110000-7FFFFFFF range,
   which is not currently a problem as there are no assigned code
   points in that range; however, the author expects that it will
   eventually become necessary to abandon UTF-16 due to this
   limitation.  Note also that, because of these pairs, UTF-16 does
   not meet the requirements of the C standard for a wide character
   encoding (see 3.7.3 and 6.4.4.4p11).

   UTF-8 encoding looks like this:

   value range         encoded as
   00000000-0000007F   0xxxxxxx
   00000080-000007FF   110xxxxx 10xxxxxx
   00000800-0000FFFF   1110xxxx 10xxxxxx 10xxxxxx
   00010000-001FFFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   00200000-03FFFFFF   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
   04000000-7FFFFFFF   1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

   Values in the 0000D800 ... 0000DFFF range (surrogates) are invalid,
   which means that three-byte sequences ED xx yy, with A0 <= xx <= BF,
   never occur.  Note also that any value that can be encoded by a
   given row of the table can also be encoded by all successive rows,
   but this is not done; only the shortest possible encoding for any
   given value is valid.  For instance, the character 07C0 could be
   encoded as any of DF 80, E0 9F 80, F0 80 9F 80, F8 80 80 9F 80, or
   FC 80 80 80 9F 80.  Only the first is valid.

   An implementation note: the transformation from UTF-16 to UTF-8, or
   vice versa, is easiest done by using UTF-32 as an intermediary.  */
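
The surrogate-pair formulas in the comment above can be checked with a small standalone sketch. The function names utf16_split and utf16_join are hypothetical (they are not cpplib functions), and uint32_t/uint16_t stand in for cppchar_t and a two-byte UTF-16 unit. utf16_join simply inverts the two formulas.

```c
#include <assert.h>
#include <stdint.h>

/* Split a supplementary code point S (0x10000..0x10FFFF) into the
   surrogate pair (H, L) using the H and L formulas above.  */
static void
utf16_split (uint32_t s, uint16_t *hi, uint16_t *lo)
{
  assert (s >= 0x10000 && s <= 0x10FFFF);
  *hi = (uint16_t) ((s - 0x10000) / 0x400 + 0xD800);
  *lo = (uint16_t) ((s - 0x10000) % 0x400 + 0xDC00);
}

/* Inverse: recombine a high surrogate (D800..DBFF) and a low
   surrogate (DC00..DFFF) into the original code point.  */
static uint32_t
utf16_join (uint16_t hi, uint16_t lo)
{
  assert (hi >= 0xD800 && hi <= 0xDBFF);
  assert (lo >= 0xDC00 && lo <= 0xDFFF);
  return (uint32_t) (hi - 0xD800) * 0x400 + (uint32_t) (lo - 0xDC00)
         + 0x10000;
}
```

As a worked case, S = 0x1F600 gives S - 0x10000 = 0xF600, so H = 0x3D + 0xD800 = 0xD83D and L = 0x200 + 0xDC00 = 0xDE00. In little-endian UTF-16 that pair is written as the bytes 3D D8 00 DE, with H still first.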

/* Internal primitives which go from a UTF-8 byte stream to native-endian
   UTF-32 in a cppchar_t, or vice versa; this avoids an extra marshal/unmarshal
   operation in several places below.  */
static inline int
one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
                     cppchar_t *cp)
{
  static const uchar masks[6] = { 0x7F, 0x1F, 0x0F, 0x07, 0x03, 0x01 };
  static const uchar patns[6] = { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };

  cppchar_t c;
  const uchar *inbuf = *inbufp;
  size_t nbytes, i;

  if (*inbytesleftp < 1)
    return EINVAL;

  c = *inbuf;
  if (c < 0x80)
    {
      *cp = c;
      *inbytesleftp -= 1;
      *inbufp += 1;
      return 0;
    }

  /* The number of leading 1-bits in the first byte indicates how many
     bytes follow.  */
  for (nbytes = 2; nbytes < 7; nbytes++)
    if ((c & ~masks[nbytes-1]) == patns[nbytes-1])
      goto found;
  return EILSEQ;
 found:

  if (*inbytesleftp < nbytes)
    return EINVAL;

  c = (c & masks[nbytes-1]);
  inbuf++;
  for (i = 1; i < nbytes; i++)
    {
      cppchar_t n = *inbuf++;
      if ((n & 0xC0) != 0x80)
        return EILSEQ;
      c = ((c << 6) + (n & 0x3F));
    }

  /* Make sure the shortest possible encoding was used.  */
  if (c <=      0x7F && nbytes > 1) return EILSEQ;
  if (c <=     0x7FF && nbytes > 2) return EILSEQ;
  if (c <=    0xFFFF && nbytes > 3) return EILSEQ;
  if (c <=  0x1FFFFF && nbytes > 4) return EILSEQ;
  if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;

  /* Make sure the character is valid.  */
  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;

  *cp = c;
  *inbufp = inbuf;
  *inbytesleftp -= nbytes;
  return 0;
}
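
one_utf8_to_cppchar validates in three stages: classify the lead byte, accumulate the continuation bytes, then reject overlong forms and surrogates. The sketch below restates those rules as a standalone function so they can be exercised outside cpplib. It is an alternative formulation, not cpplib's code: decode_utf8 is a hypothetical name, uint32_t stands in for cppchar_t, the lead byte's length is found by counting leading 1-bits directly, and the cascade of shortest-form tests is replaced by one lookup of the smallest value each length may carry (equivalent because those minima increase with length). It omits the c > 0x7FFFFFFF test, since six bytes carry at most 31 payload bits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EILSEQ
#define EILSEQ EINVAL
#endif

/* Decode one UTF-8 sequence from *IN (*LEFT bytes available) into *CP.
   Same contract as one_utf8_to_cppchar: 0 on success with *IN and *LEFT
   advanced, EINVAL for a truncated sequence, EILSEQ for ill-formed input.  */
static int
decode_utf8 (const unsigned char **in, size_t *left, uint32_t *cp)
{
  /* Smallest value that genuinely needs N bytes; below it is overlong.  */
  static const uint32_t min_for_len[7]
    = { 0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000 };
  const unsigned char *p = *in;
  size_t nbytes, i;
  uint32_t c;

  if (*left < 1)
    return EINVAL;
  c = p[0];
  if (c < 0x80)
    {
      *cp = c;
      *in += 1;
      *left -= 1;
      return 0;
    }

  /* Count leading 1-bits: 1 means a stray continuation byte, 7 or 8
     means FE or FF; only 2..6 start a valid sequence.  */
  for (nbytes = 0; nbytes < 8 && (c & (0x80u >> nbytes)); nbytes++)
    ;
  if (nbytes < 2 || nbytes > 6)
    return EILSEQ;
  if (*left < nbytes)
    return EINVAL;

  c &= 0x7Fu >> nbytes;           /* Payload bits of the lead byte.  */
  for (i = 1; i < nbytes; i++)
    {
      if ((p[i] & 0xC0) != 0x80)
        return EILSEQ;
      c = (c << 6) | (p[i] & 0x3F);
    }

  if (c < min_for_len[nbytes])     /* Overlong encoding.  */
    return EILSEQ;
  if (c >= 0xD800 && c <= 0xDFFF)  /* Surrogate code point.  */
    return EILSEQ;

  *cp = c;
  *in += nbytes;
  *left -= nbytes;
  return 0;
}
```

With this sketch, DF 80 should decode to 07C0, while the overlong E0 9F 80 and the surrogate encoding ED A0 80 should both return EILSEQ.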

static inline int
one_cppchar_to_utf8 (cppchar_t c, uchar **outbufp, size_t *outbytesleftp)
{
  static const uchar masks[6] =  { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
  static const uchar limits[6] = { 0x80, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE };
  size_t nbytes;
  uchar buf[6], *p = &buf[6];
  uchar *outbuf = *outbufp;

  nbytes = 1;
  if (c < 0x80)
    *--p = c;
  else
    {
      do
        {
          *--p = ((c & 0x3F) | 0x80);
          c >>= 6;
          nbytes++;
        }
      while (c >= 0x3F || (c & limits[nbytes-1]));
      *--p = (c | masks[nbytes-1]);
    }

  if (*outbytesleftp < nbytes)
    return E2BIG;

  while (p < &buf[6])
    *outbuf++ = *p++;
  *outbytesleftp -= nbytes;
  *outbufp = outbuf;
  return 0;
}
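
one_cppchar_to_utf8 discovers the sequence length while peeling off 6-bit groups. An equivalent way to read the UTF-8 table is to pick the length from the value-range column first, then fill the continuation bytes from the end. The sketch below does that; encode_utf8 is a hypothetical name and uint32_t stands in for cppchar_t. Like the table and the routine above, it accepts the full 00000000-7FFFFFFF range (up to six bytes) and does not reject surrogates; RFC 3629 later restricted UTF-8 to U+10FFFF, i.e. at most four bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Encode C (0..0x7FFFFFFF) as UTF-8 at *OUT, which has *LEFT bytes free.
   Returns 0 with *OUT and *LEFT advanced, or E2BIG if it does not fit
   (in which case nothing is written).  */
static int
encode_utf8 (uint32_t c, unsigned char **out, size_t *left)
{
  /* Lead-byte marker for each sequence length 1..6 (index 0 unused).  */
  static const unsigned char lead[7]
    = { 0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
  size_t nbytes, i;

  assert (c <= 0x7FFFFFFF);
  if (c < 0x80)             nbytes = 1;
  else if (c < 0x800)       nbytes = 2;
  else if (c < 0x10000)     nbytes = 3;
  else if (c < 0x200000)    nbytes = 4;
  else if (c < 0x4000000)   nbytes = 5;
  else                      nbytes = 6;

  if (*left < nbytes)
    return E2BIG;

  /* Continuation bytes carry 6 bits each, least significant last.  */
  for (i = nbytes - 1; i > 0; i--)
    {
      (*out)[i] = (unsigned char) (0x80 | (c & 0x3F));
      c >>= 6;
    }
  (*out)[0] = (unsigned char) (lead[nbytes] | c);

  *out += nbytes;
  *left -= nbytes;
  return 0;
}
```

For instance, 07C0 should encode as DF 80 and U+1F600 as F0 9F 98 80.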
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	261
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	262 /* The following four functions transform one character between the two
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	263 encodings named in the function name. All have the signature
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	264 int (*)(iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	265 uchar **outbufp, size_t *outbytesleftp)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	266
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	267 BIGEND must have the value 0 or 1, coerced to (iconv_t); it is
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	268 interpreted as a boolean indicating whether big-endian or
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	269 little-endian encoding is to be used for the member of the pair
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	270 that is not UTF-8.
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	271
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	272 INBUFP, INBYTESLEFTP, OUTBUFP, OUTBYTESLEFTP work exactly as they
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	273 do for iconv.
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	274
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	275 The return value is either 0 for success, or an errno value for
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	276 failure, which may be E2BIG (need more space), EILSEQ (ill-formed
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	277 input sequence), or EINVAL (incomplete input sequence). */
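The calling convention described in the comment above can be illustrated with a hypothetical one-step converter. The sketch below converts ISO-8859-1 to UTF-8 and follows the same rules: consume exactly one input character on success, leave every pointer untouched on failure, and report `EINVAL`, `E2BIG`, or `EILSEQ`. The name `one_latin1_to_utf8_sketch` is invented for illustration. Its first parameter stands in for the `(iconv_t)` BIGEND flag and is unused, because Latin-1 has no byte order.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical single-character converter following the iconv-style
   contract: ISO-8859-1 in, UTF-8 out.  */
static int
one_latin1_to_utf8_sketch (void *unused, const unsigned char **inbufp,
                           size_t *inbytesleftp, unsigned char **outbufp,
                           size_t *outbytesleftp)
{
  unsigned char b;
  (void) unused;

  if (*inbytesleftp < 1)
    return EINVAL;               /* No complete input character.  */

  b = **inbufp;
  if (b < 0x80)
    {
      if (*outbytesleftp < 1)
        return E2BIG;            /* Pointers left untouched on failure.  */
      *(*outbufp)++ = b;
      *outbytesleftp -= 1;
    }
  else
    {
      if (*outbytesleftp < 2)
        return E2BIG;
      *(*outbufp)++ = (unsigned char) (0xC0 | (b >> 6));
      *(*outbufp)++ = (unsigned char) (0x80 | (b & 0x3F));
      *outbytesleftp -= 2;
    }

  /* Success: consume exactly one input byte.  */
  *inbufp += 1;
  *inbytesleftp -= 1;
  return 0;
}
```

Because a failed call leaves the input position unchanged, a driver can enlarge the output buffer after `E2BIG` and simply retry the same character.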
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	278
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	279 static inline int
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	280 one_utf8_to_utf32 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	281 uchar **outbufp, size_t *outbytesleftp)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	282 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	283 uchar *outbuf;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	284 cppchar_t s = 0;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	285 int rval;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	286
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	287 /* Check for space first, since we know exactly how much we need. */
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	288 if (*outbytesleftp < 4)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	289 return E2BIG;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	290
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	291 rval = one_utf8_to_cppchar (inbufp, inbytesleftp, &s);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	292 if (rval)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	293 return rval;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	294
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	295 outbuf = *outbufp;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	296 outbuf[bigend ? 3 : 0] = (s & 0x000000FF);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	297 outbuf[bigend ? 2 : 1] = (s & 0x0000FF00) >> 8;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	298 outbuf[bigend ? 1 : 2] = (s & 0x00FF0000) >> 16;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	299 outbuf[bigend ? 0 : 3] = (s & 0xFF000000) >> 24;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	300
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	301 *outbufp += 4;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	302 *outbytesleftp -= 4;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	303 return 0;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	304 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	305
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	306 static inline int
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	307 one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	308 uchar **outbufp, size_t *outbytesleftp)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	309 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	310 cppchar_t s;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	311 int rval;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	312 const uchar *inbuf;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	313
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	314 if (*inbytesleftp < 4)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	315 return EINVAL;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	316
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	317 inbuf = *inbufp;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	318
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	319 s = (cppchar_t) inbuf[bigend ? 0 : 3] << 24;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	320 s += inbuf[bigend ? 1 : 2] << 16;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	321 s += inbuf[bigend ? 2 : 1] << 8;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	322 s += inbuf[bigend ? 3 : 0];
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	323
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	324 if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	325 return EILSEQ;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	326
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	327 rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	328 if (rval)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	329 return rval;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	330
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	331 *inbufp += 4;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	332 *inbytesleftp -= 4;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	333 return 0;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	334 }
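The two UTF-32 functions above place the four bytes of a value according to BIGEND and reject surrogates and out-of-range values. A minimal sketch of that byte placement in isolation follows. The helper names are hypothetical, and `uint32_t` stands in for `cppchar_t`. Each byte is cast to `uint32_t` before shifting so that an `unsigned char`, which would otherwise be promoted to `int`, is never shifted into the sign bit.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the BIGEND byte placement used by
   one_utf8_to_utf32 and one_utf32_to_utf8.  */
static void
put_utf32_sketch (uint32_t s, int bigend, unsigned char out[4])
{
  out[bigend ? 3 : 0] = (unsigned char) (s & 0xFF);
  out[bigend ? 2 : 1] = (unsigned char) ((s >> 8) & 0xFF);
  out[bigend ? 1 : 2] = (unsigned char) ((s >> 16) & 0xFF);
  out[bigend ? 0 : 3] = (unsigned char) ((s >> 24) & 0xFF);
}

static uint32_t
get_utf32_sketch (const unsigned char in[4], int bigend)
{
  /* Widen before shifting so a top byte >= 0x80 cannot overflow int.  */
  return ((uint32_t) in[bigend ? 0 : 3] << 24)
         | ((uint32_t) in[bigend ? 1 : 2] << 16)
         | ((uint32_t) in[bigend ? 2 : 1] << 8)
         | (uint32_t) in[bigend ? 3 : 0];
}

/* Same validity rule as one_utf32_to_utf8: reject values at or above
   0x7FFFFFFF and UTF-16 surrogate code points.  */
static int
utf32_value_ok_sketch (uint32_t s)
{
  return !(s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF));
}
```

For U+1F600, big-endian order yields the bytes `00 01 F6 00` and little-endian order yields `00 F6 01 00`.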
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	335
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	336 static inline int
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	337 one_utf8_to_utf16 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	338 uchar **outbufp, size_t *outbytesleftp)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	339 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	340 int rval;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	341 cppchar_t s = 0;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	342 const uchar *save_inbuf = *inbufp;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	343 size_t save_inbytesleft = *inbytesleftp;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	344 uchar *outbuf = *outbufp;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	345
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	346 rval = one_utf8_to_cppchar (inbufp, inbytesleftp, &s);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	347 if (rval)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	348 return rval;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	349
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	350 if (s > 0x0010FFFF)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	351 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	352 *inbufp = save_inbuf;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	353 *inbytesleftp = save_inbytesleft;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	354 return EILSEQ;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	355 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	356
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	357 if (s <= 0xFFFF)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	358 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	359 if (*outbytesleftp < 2)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	360 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	361 *inbufp = save_inbuf;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	362 *inbytesleftp = save_inbytesleft;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	363 return E2BIG;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	364 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	365 outbuf[bigend ? 1 : 0] = (s & 0x00FF);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	366 outbuf[bigend ? 0 : 1] = (s & 0xFF00) >> 8;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	367
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	368 *outbufp += 2;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	369 *outbytesleftp -= 2;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	370 return 0;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	371 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	372 else
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	373 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	374 cppchar_t hi, lo;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	375
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	376 if (*outbytesleftp < 4)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	377 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	378 *inbufp = save_inbuf;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	379 *inbytesleftp = save_inbytesleft;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	380 return E2BIG;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	381 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	382
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	383 hi = (s - 0x10000) / 0x400 + 0xD800;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	384 lo = (s - 0x10000) % 0x400 + 0xDC00;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	385
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	386 /* Even if we are little-endian, put the high surrogate first.
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	387 ??? Matches practice? */
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	388 outbuf[bigend ? 1 : 0] = (hi & 0x00FF);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	389 outbuf[bigend ? 0 : 1] = (hi & 0xFF00) >> 8;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	390 outbuf[bigend ? 3 : 2] = (lo & 0x00FF);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	391 outbuf[bigend ? 2 : 3] = (lo & 0xFF00) >> 8;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	392
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	393 *outbufp += 4;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	394 *outbytesleftp -= 4;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	395 return 0;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	396 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	397 }
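The surrogate arithmetic in one_utf8_to_utf16 splits the 20-bit offset `s - 0x10000` into a high half (top 10 bits, added to 0xD800) and a low half (bottom 10 bits, added to 0xDC00). The sketch below isolates that split and the per-unit byte order. The helper names are hypothetical. In UTF-16LE only the bytes within each 16-bit unit are swapped; the high surrogate still comes first, which is the order the code above produces.

```c
#include <stdint.h>

/* Hypothetical helper: split a supplementary code point (0x10000 to
   0x10FFFF) into a UTF-16 surrogate pair, using the same arithmetic
   as one_utf8_to_utf16.  */
static void
utf16_surrogates_sketch (uint32_t s, uint16_t *hi, uint16_t *lo)
{
  uint32_t v = s - 0x10000;                /* 20-bit offset.  */
  *hi = (uint16_t) (v / 0x400 + 0xD800);   /* Top 10 bits.  */
  *lo = (uint16_t) (v % 0x400 + 0xDC00);   /* Bottom 10 bits.  */
}

/* Write one 16-bit unit; BIGEND selects the byte order within it.  */
static void
put_utf16_unit_sketch (uint16_t u, int bigend, unsigned char out[2])
{
  out[bigend ? 1 : 0] = (unsigned char) (u & 0xFF);
  out[bigend ? 0 : 1] = (unsigned char) (u >> 8);
}
```

U+1F600, for instance, splits into `D83D DE00`. In little-endian order this is written as the bytes `3D D8 00 DE`.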
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	398
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	399 static inline int
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	400 one_utf16_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	401 uchar **outbufp, size_t *outbytesleftp)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	402 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	403 cppchar_t s;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	404 const uchar *inbuf = *inbufp;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	405 int rval;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	406
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	407 if (*inbytesleftp < 2)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	408 return EINVAL;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	409 s = inbuf[bigend ? 0 : 1] << 8;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	410 s += inbuf[bigend ? 1 : 0];
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	411
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	412 /* Low surrogate without immediately preceding high surrogate is invalid. */
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	413 if (s >= 0xDC00 && s <= 0xDFFF)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	414 return EILSEQ;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	415 /* High surrogate must have a following low surrogate. */
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	416 else if (s >= 0xD800 && s <= 0xDBFF)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	417 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	418 cppchar_t hi = s, lo;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	419 if (*inbytesleftp < 4)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	420 return EINVAL;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	421
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	422 lo = inbuf[bigend ? 2 : 3] << 8;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	423 lo += inbuf[bigend ? 3 : 2];
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	424
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	425 if (lo < 0xDC00 || lo > 0xDFFF)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	426 return EILSEQ;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	427
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	428 s = (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	429 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	430
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	431 rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	432 if (rval)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	433 return rval;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	434
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	435 /* Success - update the input pointers (one_cppchar_to_utf8 has done
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	436 the output pointers for us). */
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	437 if (s <= 0xFFFF)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	438 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	439 *inbufp += 2;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	440 *inbytesleftp -= 2;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	441 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	442 else
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	443 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	444 *inbufp += 4;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	445 *inbytesleftp -= 4;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	446 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	447 return 0;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	448 }
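The decoding side in one_utf16_to_utf8 enforces three rules:
- a lone low surrogate is `EILSEQ`;
- a high surrogate needs four available bytes (else `EINVAL`) and must be followed by a low surrogate (else `EILSEQ`);
- a valid pair recombines as `(hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000`.

The sketch below applies the same rules to a plain byte array and reports how many bytes were consumed, rather than advancing iconv-style pointers. The name `utf16_decode_sketch` and its parameters are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder mirroring one_utf16_to_utf8's validation:
   reads one character from IN (LEN bytes available), stores the code
   point in *CP and the number of bytes consumed in *USED.  */
static int
utf16_decode_sketch (const unsigned char *in, size_t len, int bigend,
                     uint32_t *cp, size_t *used)
{
  uint32_t s, lo;

  if (len < 2)
    return EINVAL;               /* Incomplete code unit.  */
  s = ((uint32_t) in[bigend ? 0 : 1] << 8) | in[bigend ? 1 : 0];

  if (s >= 0xDC00 && s <= 0xDFFF)
    return EILSEQ;               /* Unpaired low surrogate.  */
  if (s < 0xD800 || s > 0xDBFF)
    {
      *cp = s;                   /* Basic Multilingual Plane character.  */
      *used = 2;
      return 0;
    }

  if (len < 4)
    return EINVAL;               /* High surrogate cut off.  */
  lo = ((uint32_t) in[bigend ? 2 : 3] << 8) | in[bigend ? 3 : 2];
  if (lo < 0xDC00 || lo > 0xDFFF)
    return EILSEQ;               /* High surrogate not followed by a low one.  */

  *cp = (s - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
  *used = 4;
  return 0;
}
```

With little-endian input `3D D8 00 DE`, this yields U+1F600 and reports 4 bytes consumed.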
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	449
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	450 /* Helper routine for the next few functions. The 'const' on
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	451 one_conversion means that we promise not to modify what function is
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	452 pointed to, which lets the inliner see through it. */
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	453
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	454 static inline bool
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	455 conversion_loop (int (*const one_conversion)(iconv_t, const uchar **, size_t *,
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	456 uchar **, size_t *),
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	457 iconv_t cd, const uchar *from, size_t flen, struct _cpp_strbuf *to)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	458 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	459 const uchar *inbuf;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	460 uchar *outbuf;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	461 size_t inbytesleft, outbytesleft;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	462 int rval;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	463
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	464 inbuf = from;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	465 inbytesleft = flen;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	466 outbuf = to->text + to->len;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	467 outbytesleft = to->asize - to->len;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	468
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	469 for (;;)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	470 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	471 do
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	472 rval = one_conversion (cd, &inbuf, &inbytesleft,
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	473 &outbuf, &outbytesleft);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	474 while (inbytesleft && !rval);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	475
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	476 if (__builtin_expect (inbytesleft == 0, 1))
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	477 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	478 to->len = to->asize - outbytesleft;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	479 return true;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	480 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	481 if (rval != E2BIG)
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	482 {
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	483 errno = rval;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	484 return false;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	485 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	486
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	487 outbytesleft += OUTBUF_BLOCK_SIZE;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	488 to->asize += OUTBUF_BLOCK_SIZE;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	489 to->text = XRESIZEVEC (uchar, to->text, to->asize);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	490 outbuf = to->text + to->asize - outbytesleft;
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	491 }
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	492 }
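conversion_loop calls a one-step converter until the input is consumed. Whenever the converter reports `E2BIG`, it grows the output buffer by a fixed block and recomputes `outbuf`, because the reallocation may move the storage. Any other error is returned through `errno`.

A minimal standalone sketch of that loop follows, with several assumptions and differences:
- `struct strbuf_sketch`, `BLOCK_SKETCH`, and the ASCII-to-UTF-16LE converter are hypothetical stand-ins for `struct _cpp_strbuf`, `OUTBUF_BLOCK_SIZE`, and the real converters, and the `iconv_t` argument is dropped.
- It checks `inbytesleft` before each call instead of using a `do`/`while`.
- It resets `rval` after growing.
- It reports `ENOMEM` on allocation failure, whereas libiberty's `XRESIZEVEC` aborts.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct _cpp_strbuf and OUTBUF_BLOCK_SIZE.  */
struct strbuf_sketch
{
  unsigned char *text;
  size_t asize;                  /* Bytes allocated.  */
  size_t len;                    /* Bytes used.  */
};
#define BLOCK_SKETCH 4           /* Deliberately tiny to force growth.  */

typedef int (*one_conv_fn) (const unsigned char **, size_t *,
                            unsigned char **, size_t *);

/* Example one-step converter: ASCII to UTF-16LE.  */
static int
one_ascii_to_utf16le (const unsigned char **inp, size_t *inleft,
                      unsigned char **outp, size_t *outleft)
{
  if (**inp >= 0x80)
    return EILSEQ;
  if (*outleft < 2)
    return E2BIG;
  (*outp)[0] = **inp;
  (*outp)[1] = 0;
  *outp += 2;
  *outleft -= 2;
  *inp += 1;
  *inleft -= 1;
  return 0;
}

/* Convert FROM (FLEN bytes), appending to TO and growing it by a fixed
   block on E2BIG.  Returns false and sets errno on any other error.  */
static bool
conversion_loop_sketch (one_conv_fn const one, const unsigned char *from,
                        size_t flen, struct strbuf_sketch *to)
{
  const unsigned char *inbuf = from;
  size_t inbytesleft = flen;
  unsigned char *outbuf = to->text + to->len;
  size_t outbytesleft = to->asize - to->len;
  int rval;

  for (;;)
    {
      rval = 0;
      while (inbytesleft != 0 && rval == 0)
        rval = one (&inbuf, &inbytesleft, &outbuf, &outbytesleft);

      if (inbytesleft == 0)
        {
          to->len = to->asize - outbytesleft;
          return true;
        }
      if (rval != E2BIG)
        {
          errno = rval;
          return false;
        }

      /* Out of room: grow by one block, then recompute OUTBUF because
         realloc may have moved the storage.  */
      unsigned char *grown = realloc (to->text, to->asize + BLOCK_SKETCH);
      if (grown == NULL)
        {
          errno = ENOMEM;
          return false;
        }
      outbytesleft += BLOCK_SKETCH;
      to->asize += BLOCK_SKETCH;
      to->text = grown;
      outbuf = to->text + to->asize - outbytesleft;
    }
}
```

Because outbuf is rebuilt from `to->text` and the remaining-space count, the loop stays correct even when `realloc` relocates the buffer.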
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	493
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	494
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	495 /* These functions convert entire strings between character sets.
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	496 They all have the signature
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	497
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	498 bool (*)(iconv_t cd, const uchar *from, size_t flen, struct _cpp_strbuf *to);
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	499
a06113de4d67 first commit kent <kent@cr.ie.u-ryukyu.ac.jp> parents: diff changeset	500 The input string FROM is converted as specified by the function