.TH RUNE 3
.SH NAME
runetochar, chartorune, runelen, runenlen, fullrune, utfecpy, utflen, utfnlen, utfrune, utfrrune, utfutf \- rune/UTF conversion
.SH SYNOPSIS
.ta \w'\fLchar*xx'u
.B #include <utf.h>
.PP
.B
int runetochar(char *s, Rune *r)
.PP
.B
int chartorune(Rune *r, char *s)
.PP
.B
int runelen(long r)
.PP
.B
int runenlen(Rune *r, int n)
.PP
.B
int fullrune(char *s, int n)
.PP
.B
char* utfecpy(char *s1, char *es1, char *s2)
.PP
.B
int utflen(char *s)
.PP
.B
int utfnlen(char *s, long n)
.PP
.B
char* utfrune(char *s, long c)
.PP
.B
char* utfrrune(char *s, long c)
.PP
.B
char* utfutf(char *s1, char *s2)
.SH DESCRIPTION
These routines convert between a
.SM UTF
byte stream and runes.
.PP
.I Runetochar
copies one rune at
.I r
to at most
.B UTFmax
bytes starting at
.I s
and returns the number of bytes copied.
.BR UTFmax ,
defined as
.B 3
in
.BR <utf.h> ,
is the maximum number of bytes required to represent a rune.
.PP
.I Chartorune
copies at most
.B UTFmax
bytes starting at
.I s
to one rune at
.I r
and returns the number of bytes copied.
If the input is not exactly in
.SM UTF
format,
.I chartorune
stores the rune 0x80 at
.I r
and returns 1.
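.PP
The encoding and decoding described above can be sketched as follows.
This is a minimal illustration that assumes the three-byte limit and the 0x80
error value stated on this page; it is not the library source.
The names SketchRune, sketch_runetochar and sketch_chartorune are hypothetical
stand-ins, and the decoder assumes a NUL-terminated input.
.EX
```c
#include <assert.h>

typedef unsigned int SketchRune;	/* hypothetical stand-in for Rune */

/* Encode *r into at most 3 bytes at s; return the number of bytes written. */
static int
sketch_runetochar(char *s, const SketchRune *r)
{
	SketchRune c = *r;

	assert(s != 0 && c <= 0xFFFF);
	if(c < 0x80){
		s[0] = (char)c;
		return 1;
	}
	if(c < 0x800){
		s[0] = (char)(0xC0 | (c >> 6));
		s[1] = (char)(0x80 | (c & 0x3F));
		return 2;
	}
	s[0] = (char)(0xE0 | (c >> 12));
	s[1] = (char)(0x80 | ((c >> 6) & 0x3F));
	s[2] = (char)(0x80 | (c & 0x3F));
	return 3;
}

/* Decode one rune from s into *r; on malformed input store 0x80 and return 1. */
static int
sketch_chartorune(SketchRune *r, const char *s)
{
	const unsigned char *p = (const unsigned char*)s;
	unsigned int c0 = p[0], c1, c2, v;

	assert(r != 0);
	if(c0 < 0x80){
		*r = c0;
		return 1;
	}
	/* c0 is not NUL, so p[1] is readable in a NUL-terminated string */
	c1 = p[1] ^ 0x80;
	if(c0 >= 0xC0 && c1 < 0x40){
		if(c0 < 0xE0){
			v = ((c0 & 0x1F) << 6) | c1;
			if(v >= 0x80){		/* reject overlong forms */
				*r = v;
				return 2;
			}
		}else if(c0 < 0xF0){
			/* p[1] was a continuation byte, so p[2] is readable */
			c2 = p[2] ^ 0x80;
			v = ((c0 & 0x0F) << 12) | (c1 << 6) | c2;
			if(c2 < 0x40 && v >= 0x800){
				*r = v;
				return 3;
			}
		}
	}
	*r = 0x80;
	return 1;
}
```
.EE
Because a malformed byte still consumes one byte of input, a decoding loop built
on this pattern always advances.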
.PP
.I Runelen
returns the number of bytes
required to convert
.I r
into
.SM UTF.
.PP
.I Runenlen
returns the number of bytes
required to convert the
.I n
runes pointed to by
.I r
into
.SM UTF.
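.PP
The length computation can be sketched as below, assuming the three-byte limit
stated on this page.
The functions sketch_runelen and sketch_runenlen are hypothetical names used
only for illustration; the array element type is also an assumption.
.EX
```c
#include <assert.h>

/* Bytes needed to encode rune r, assuming at most 3 bytes per rune. */
static int
sketch_runelen(long r)
{
	assert(r >= 0 && r <= 0xFFFF);
	if(r < 0x80)
		return 1;
	if(r < 0x800)
		return 2;
	return 3;
}

/* Total bytes needed to encode the n runes at r. */
static int
sketch_runenlen(const unsigned int *r, int n)
{
	int i, total = 0;

	assert(r != 0 || n == 0);
	for(i = 0; i < n; i++)
		total += sketch_runelen((long)r[i]);
	return total;
}
```
.EE
A caller can use the total to size an output buffer, adding one byte for the
terminating NUL.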
.PP
.I Fullrune
returns 1 if the string
.I s
of length
.I n
is long enough to be decoded by
.I chartorune
and 0 otherwise.
This does not guarantee that the string
contains a legal
.SM UTF
encoding.
This routine is used by programs that
obtain input a byte at
a time and need to know when a full rune
has arrived.
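.PP
One way to make this check is to look only at the first byte and the number of
bytes available, as in the sketch below.
It assumes the three-byte limit stated on this page, and sketch_fullrune is a
hypothetical name; it does not validate the continuation bytes.
.EX
```c
#include <assert.h>

/* Return 1 if the n bytes at s are enough to attempt decoding one rune. */
static int
sketch_fullrune(const char *s, int n)
{
	unsigned int c;

	assert(s != 0 || n <= 0);
	if(n <= 0)
		return 0;
	c = (unsigned char)s[0];
	if(c < 0x80)		/* single-byte rune */
		return 1;
	if(n > 1 && (c < 0xE0 || n > 2))
		return 1;	/* two bytes suffice below 0xE0, else three */
	return 0;
}
```
.EE
A reader that receives one byte at a time can append each byte to a small buffer
and decode only once this check returns 1.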
.PP
The following routines are analogous to the
corresponding string routines with
.B utf
substituted for
.B str
and
.B rune
substituted for
.BR chr .
.PP
.I Utfecpy
copies UTF sequences from
.I s2
to
.I s1
until a null sequence has been copied, but writes no
sequences beyond
.IR es1 .
If any sequences are copied,
.I s1
is terminated by a null sequence, and a pointer to that sequence is returned.
Otherwise, the original
.I s1
is returned.
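.PP
A bounded copy of this kind might look like the sketch below.
It is an illustration only, assuming well-formed input; sketch_utfecpy is a
hypothetical name, and the way a split sequence is dropped on truncation is one
possible choice rather than the library's documented behavior.
.EX
```c
#include <assert.h>

/* Copy from into [to, e), always NUL-terminating; return the terminator. */
static char*
sketch_utfecpy(char *to, char *e, const char *from)
{
	char *p = to;

	assert(to != 0 && e != 0 && from != 0);
	if(to >= e)
		return to;	/* no room: nothing is written */
	while(p < e - 1 && *from != 0)
		*p++ = *from++;
	/* If the copy stopped inside a sequence, drop the partial sequence. */
	if((*from & 0xC0) == 0x80){
		while(p > to && (p[-1] & 0xC0) == 0x80)
			p--;	/* skip copied continuation bytes */
		if(p > to)
			p--;	/* skip the lead byte */
	}
	*p = 0;
	return p;
}
```
.EE
Returning the terminator lets a caller chain copies by passing the result as the
next destination with the same end pointer.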
.PP
.I Utflen
returns the number of runes that
are represented by the
.SM UTF
string
.IR s .
.PP
.I Utfnlen
returns the number of complete runes that
are represented by the first
.I n
bytes of
.SM UTF
string
.IR s .
If the last few bytes of the string contain an incompletely coded rune,
.I utfnlen
will not count them; in this way, it differs from
.IR utflen ,
which includes every byte of the string.
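.PP
The difference between the two counts can be sketched as follows.
The code assumes well-formed input and the three-byte limit stated on this page;
sketch_utflen, sketch_utfnlen and sketch_seqlen are hypothetical names.
.EX
```c
#include <assert.h>

/* Sequence length implied by a lead byte, assuming well-formed input. */
static int
sketch_seqlen(unsigned int c)
{
	if(c < 0x80)
		return 1;
	if(c < 0xE0)
		return 2;
	return 3;
}

/* Count runes in a NUL-terminated string by counting non-continuation bytes. */
static int
sketch_utflen(const char *s)
{
	int n = 0;

	assert(s != 0);
	for(; *s != 0; s++)
		if((*s & 0xC0) != 0x80)
			n++;
	return n;
}

/* Count only the runes that fit completely within the first m bytes. */
static int
sketch_utfnlen(const char *s, long m)
{
	long i = 0;
	int k, n = 0;

	assert(s != 0);
	while(i < m && s[i] != 0){
		k = sketch_seqlen((unsigned char)s[i]);
		if(i + k > m)
			break;	/* incomplete rune at the end: not counted */
		i += k;
		n++;
	}
	return n;
}
```
.EE
With a string holding one ASCII byte followed by a three-byte sequence, a limit
of three bytes yields a count of 1, since the second rune is cut short.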
.PP
.I Utfrune
.RI ( utfrrune )
returns a pointer to the first (last)
occurrence of rune
.I c
in the
.SM UTF
string
.IR s ,
or 0 if
.I c
does not occur in the string.
The NUL byte terminating a string is considered to
be part of the string
.IR s .
.PP
.I Utfutf
returns a pointer to the first occurrence of
the
.SM UTF
string
.I s2
as a
.SM UTF
substring of
.IR s1 ,
or 0 if there is none.
If
.I s2
is the null string,
.I utfutf
returns
.IR s1 .
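.PP
For well-formed input, a byte-wise search gives the same answer as a rune-wise
one, because a lead byte can never be mistaken for a continuation byte.
The sketch below relies on that property and on the standard strstr;
sketch_utfutf is a hypothetical name, not the library routine.
.EX
```c
#include <assert.h>
#include <string.h>

/* Find s2 in s1 by bytes; valid as a UTF search when both are well formed. */
static const char*
sketch_utfutf(const char *s1, const char *s2)
{
	assert(s1 != 0 && s2 != 0);
	if(*s2 == 0)
		return s1;	/* the null string matches at the start */
	return strstr(s1, s2);
}
```
.EE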
.SH SOURCE
.B https://9fans.github.io/plan9port/unix
.SH SEE ALSO
.IR utf (7),
.IR tcs (1)