.TH RUNE 3
.SH NAME
runetochar, chartorune, runelen, runenlen, fullrune, utfecpy, utflen, utfnlen, utfrune, utfrrune, utfutf \- rune/UTF conversion
.SH SYNOPSIS
.ta \w'\fLchar*xx'u
.B #include <u.h>
.br
.B #include <libc.h>
.PP
.B
int runetochar(char *s, Rune *r)
.PP
.B
int chartorune(Rune *r, char *s)
.PP
.B
int runelen(long r)
.PP
.B
int runenlen(Rune *r, int n)
.PP
.B
int fullrune(char *s, int n)
.PP
.B
char* utfecpy(char *s1, char *es1, char *s2)
.PP
.B
int utflen(char *s)
.PP
.B
int utfnlen(char *s, long n)
.PP
.B
char* utfrune(char *s, long c)
.PP
.B
char* utfrrune(char *s, long c)
.PP
.B
char* utfutf(char *s1, char *s2)
.SH DESCRIPTION
These routines convert between a
.SM UTF
byte stream and runes.
.PP
.I Runetochar
copies one rune at
.I r
to at most
.B UTFmax
bytes starting at
.I s
and returns the number of bytes copied.
.BR UTFmax ,
defined as
.B 3
in
.BR <libc.h> ,
is the maximum number of bytes required to represent a rune.
.PP
.I Chartorune
copies at most
.B UTFmax
bytes starting at
.I s
to one rune at
.I r
and returns the number of bytes consumed.
If the input is not a valid
.SM UTF
sequence,
.I chartorune
sets the rune to
.B Runeerror
(0xFFFD)
and returns 1.
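.PP
The decoding rule described above can be sketched in portable C.
This is an illustration only, not the library source: the names
.BR SketchRune ,
.BR SketchUTFmax ,
.B SketchRuneerror
and
.B sketch_chartorune
are hypothetical, and the sketch assumes a NUL-terminated input
and the three-byte limit stated above.
.IP
.EX
```c
typedef unsigned int SketchRune;   /* hypothetical stand-in for Rune */

enum {
    SketchUTFmax    = 3,           /* assumed limit, as stated in this page */
    SketchRuneerror = 0xFFFD
};

/*
 * Decode one rune from the NUL-terminated string s into *r.
 * Returns the number of bytes consumed (1 to SketchUTFmax).
 * Malformed input yields SketchRuneerror and consumes 1 byte.
 */
int sketch_chartorune(SketchRune *r, const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned int c0 = p[0], c1, c2;
    SketchRune v;

    if (c0 < 0x80) {               /* one byte: 0xxxxxxx */
        *r = c0;
        return 1;
    }
    c1 = p[1];
    if ((c1 & 0xC0) != 0x80)       /* second byte must be 10xxxxxx */
        goto bad;
    if ((c0 & 0xE0) == 0xC0) {     /* two bytes: 110xxxxx 10xxxxxx */
        v = ((c0 & 0x1F) << 6) | (c1 & 0x3F);
        if (v < 0x80)              /* reject overlong encodings */
            goto bad;
        *r = v;
        return 2;
    }
    c2 = p[2];
    if ((c2 & 0xC0) != 0x80)       /* third byte must be 10xxxxxx */
        goto bad;
    if ((c0 & 0xF0) == 0xE0) {     /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        v = ((c0 & 0x0F) << 12) | ((c1 & 0x3F) << 6) | (c2 & 0x3F);
        if (v < 0x800)             /* reject overlong encodings */
            goto bad;
        *r = v;
        return 3;
    }
bad:
    *r = SketchRuneerror;
    return 1;
}
```
.EE
.PP
Because a malformed byte consumes exactly one byte, a caller that
advances by the return value always makes progress through the input.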
.PP
.I Runelen
returns the number of bytes
required to convert
.I r
into
.SM UTF.
.PP
.I Runenlen
returns the number of bytes
required to convert the
.I n
runes pointed to by
.I r
into
.SM UTF.
.PP
.I Fullrune
returns 1 if the string
.I s
of length
.I n
is long enough to be decoded by
.I chartorune
and 0 otherwise.
This does not guarantee that the string
contains a legal
.SM UTF
encoding.
This routine is used by programs that
obtain input a byte at
a time and need to know when a full rune
has arrived.
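.PP
The length test can be sketched as follows.
This is an assumption-laden illustration, not the library source:
.B sketch_fullrune
is a hypothetical name, and the sketch assumes the three-byte limit
stated above.
It inspects only the first byte and the count, which is why a result of 1
says nothing about whether the bytes form a legal encoding.
.IP
.EX
```c
/*
 * Return 1 if the n bytes at s are enough to hold one complete
 * sequence, judged by the first byte alone; return 0 otherwise.
 */
int sketch_fullrune(const char *s, int n)
{
    unsigned char c;

    if (n <= 0)
        return 0;
    c = (unsigned char)s[0];
    if (c < 0x80)                  /* one-byte sequence */
        return 1;
    if (c < 0xE0)                  /* at most two bytes are needed */
        return n >= 2;
    return n >= 3;                 /* otherwise three bytes are needed */
}
```
.EE
.PP
A program reading a byte at a time could append each byte to a small
buffer and attempt decoding only once this test returns 1.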
.PP
The following routines are analogous to the
corresponding string routines with
.B utf
substituted for
.B str
and
.B rune
substituted for
.BR chr .
.PP
.I Utfecpy
copies UTF sequences until a null sequence has been copied, but writes no
sequences beyond
.IR es1 .
If any sequences are copied,
.I s1
is terminated by a null sequence, and a pointer to that sequence is returned.
Otherwise, the original
.I s1
is returned.
.PP
.I Utflen
returns the number of runes that
are represented by the
.SM UTF
string
.IR s .
.PP
.I Utfnlen
returns the number of complete runes that
are represented by the first
.I n
bytes of
.SM UTF
string
.IR s .
If the last few bytes of the string contain an incompletely coded rune,
.I utfnlen
will not count them; in this way, it differs from
.IR utflen ,
which includes every byte of the string.
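.PP
The difference between the two counts can be sketched as follows.
This is an illustration under stated assumptions, not the library source:
the
.B sketch_
names are hypothetical, the input is assumed to be well-formed
apart from a possibly truncated final sequence, and sequence lengths are
taken from the lead byte with the three-byte limit stated above.
.IP
.EX
```c
/* Bytes implied by a lead byte (hypothetical helper). */
static int sketch_seqlen(unsigned char c)
{
    if (c < 0xC0)
        return 1;                  /* ASCII, or a stray continuation byte */
    if (c < 0xE0)
        return 2;
    return 3;
}

/* Count runes up to the terminating NUL; a truncated tail still counts. */
int sketch_utflen(const char *s)
{
    int n = 0;

    while (*s != 0) {
        int k = sketch_seqlen((unsigned char)*s);
        while (k-- > 0 && *s != 0) /* never step past the NUL */
            s++;
        n++;
    }
    return n;
}

/* Count complete runes in the first m bytes, stopping early at a NUL. */
int sketch_utfnlen(const char *s, long m)
{
    const char *end = s + m;
    int n = 0;

    while (s < end && *s != 0) {
        int k = sketch_seqlen((unsigned char)*s);
        if (k > end - s)           /* incomplete trailing rune: not counted */
            break;
        s += k;
        n++;
    }
    return n;
}
```
.EE
.PP
For a six-byte string holding one one-byte, one two-byte and one
three-byte sequence, the first sketch counts three runes, while the
second counts only two when limited to the first five bytes.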
.PP
.I Utfrune
.RI ( utfrrune )
returns a pointer to the first (last)
occurrence of rune
.I c
in the
.SM UTF
string
.IR s ,
or 0 if
.I c
does not occur in the string.
The NUL byte terminating a string is considered to
be part of the string
.IR s .
.PP
.I Utfutf
returns a pointer to the first occurrence of
the
.SM UTF
string
.I s2
as a
.SM UTF
substring of
.IR s1 ,
or 0 if there is none.
If
.I s2
is the null string,
.I utfutf
returns
.IR s1 .
.SH SOURCE
.B \*9/src/lib9/utf/rune.c
.br
.B \*9/src/lib9/utf/utfrune.c
.SH SEE ALSO
.MR utf (7) ,
.MR tcs (1)