.TH TCS 1
.SH NAME
tcs \- translate character sets
.SH SYNOPSIS
.B tcs
[
.B -slcv
]
[
.B -f
.I ics
]
[
.B -t
.I ocs
]
[
.I file ...
]
.SH DESCRIPTION
.I Tcs
interprets the named
.I file(s)
(standard input by default) as a stream of characters from the
.I ics
character set or format, converts them to runes,
and then converts them into a stream of characters from the
.I ocs
character set or format on the standard output.
The default value for
.I ics
and
.I ocs
is
.BR utf ,
the
.SM UTF
encoding described in
.MR utf (7) .
The
.B -l
option lists the character sets known to
.IR tcs .
Processing continues in the face of conversion errors (the
.B -s
option prevents reporting of these errors).
The
.B -c
option forces the output to contain only correctly converted characters;
otherwise,
.B 0x80
characters will be substituted for
.SM UTF
encoding errors and
.B 0xFFFD
characters will be substituted for unknown characters.
.PP
The
.B -v
option generates various diagnostic and summary information on standard error,
or makes the
.B -l
output more verbose.
.PP
.I Tcs
recognizes an ever-changing list of character sets.
In particular, it supports a variety of Russian and Japanese encodings.
Some of the supported encodings are
.TF jis-kanji
.TP
.B utf
The Plan 9
.SM UTF
encoding, known by ISO as UTF-8
.TP
.B utf1
The deprecated original
.SM UTF
encoding from ISO 10646
.TP
.B ascii
7-bit ASCII
.TP
.B 8859-1
Latin-1 (Western European)
.TP
.B 8859-2
Latin-2 (Czech .. Slovak)
.TP
.B 8859-3
Latin-3 (Dutch .. Turkish)
.TP
.B 8859-4
Latin-4 (Scandinavian)
.TP
.B 8859-5
Part 5 (Cyrillic)
.TP
.B 8859-6
Part 6 (Arabic)
.TP
.B 8859-7
Part 7 (Greek)
.TP
.B 8859-8
Part 8 (Hebrew)
.TP
.B 8859-9
Latin-5 (Finnish .. Portuguese)
.TP
.B koi8
KOI-8 (GOST 19768-74)
.TP
.B jis-kanji
ISO 2022-JP
.TP
.B ujis
EUC-JX: JIS 0208
.TP
.B ms-kanji
Microsoft, or Shift-JIS
.TP
.B jis
(input only) guesses among ISO 2022-JP, EUC, and Shift-JIS
.TP
.B gb
Chinese national standard (GB2312-80)
.TP
.B big5
Big 5 (HKU version)
.TP
.B unicode
Unicode Standard 1.0
.TP
.B tis
Thai character set plus
.SM ASCII
(TIS 620-1986)
.TP
.B msdos
IBM PC: CP 437
.TP
.B atari
Atari-ST character set
.SH EXAMPLES
.TP
.B tcs -f 8859-1
Convert 8859-1 (Latin-1) characters into
.SM UTF
format.
.TP
.B tcs -s -f jis
Convert characters encoded in one of several JIS encodings
(ISO 2022-JP, EUC, or Shift-JIS) into
.SM UTF
format, without reporting conversion errors.
Unknown Kanji will be converted into
.B 0xFFFD
characters.
.TP
.B tcs -lv
Print an up-to-date list of the supported character sets.
.SH SOURCE
.B \*9/src/cmd/tcs
.SH SEE ALSO
.IR ascii (1),
.IR rune (3),
.MR utf (7) .