.TH AWK 1
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.B -F
.I fs
]
[
.B -d
]
[
.B -mf
.I n
]
[
.B -mr
.I n
]
[
.B -safe
]
[
.B -v
.I var=value
]
[
.B -f
.I progfile
|
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B -f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
.B -v
followed by
.I var=value
is an assignment to be done before the program
is executed;
any number of
.B -v
options may be present.
The
.B -F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
.PP
To compensate for inadequate implementation of storage management,
the
.B -mr
option can be used to set the maximum size of the input record,
and the
.B -mf
option to set the maximum number of fields.
.PP
The
.B -safe
option causes
.I awk
to run in
``safe mode,''
in which it is not allowed to
run shell commands or open files
and the environment is not made available
in the
.B ENVIRON
variable.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
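.PP
As an illustrative sketch (the array names
.B count
and
.B part
and the input layout are assumptions, not part of this manual),
the following program counts pairs of first and second fields and then
recovers the two parts of each subscript by splitting it on
.BR SUBSEP :
.EX
{ count[$1, $2]++ }
END {
	for (k in count) {
		split(k, part, SUBSEP)	# part[1] is $1, part[2] is $2
		print part[1], part[2], count[k]
	}
}
.EE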
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR fprintf (3)) .
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
If
.IR expr
is omitted or is a null string, all open files are flushed.
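.PP
A minimal sketch of output redirection, assuming that the first field of
each input line names the group the line belongs to (the
.B .out
suffix and the variable
.B out
are hypothetical):
.EX
{
	out = $1 ".out"	# hypothetical naming scheme
	print $0 >> out	# append the line to that file
	close(out)	# avoid holding many files open
}
.EE
.PP
Closing after each write keeps the number of open files small at the
cost of reopening the file for every line.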
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
If its argument is a string, the string's length is returned.
If its argument is an array, the number of subscripts in the array is returned.
If no argument, the length of
.B $0
is returned.
.TP
.B rand
random number on (0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.B utf
converts its numerical argument, a character number, to a
.SM UTF
string
.TP
.BI substr( s , " m" , " n\fL)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fL)
splits the string
.I s
into array elements
.IB a [1]\f1,
.IB a [2]\f1,
\&...,
.IB a [ n ]\f1,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fL)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fL)
the string resulting from formatting
.I expr ...
according to the
.I printf
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
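.PP
A short sketch combining several of these functions; the patterns and
the variable
.B n
are chosen only for illustration:
.EX
{
	n = gsub(/[ \et]+/, " ")	# rewrite $0; n counts replacements
	if (match($0, /[0-9]+/))	# first run of digits, if any
		print substr($0, RSTART, RLENGTH), n
}
.EE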
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
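.PP
For example, a sketch that reads the output of a command line by line
(the command
.B date
and the variable names are only illustrations; any command would do):
.EX
BEGIN {
	cmd = "date"
	while ((cmd | getline line) > 0)	# 1 while input remains
		print "got:", line
	close(cmd)	# so that cmd can be run again later
}
.EE
.PP
Comparing the result with
.B "> 0"
rather than testing it for truth keeps the loop from running forever
when
.B getline
returns \-1.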
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.MR regexp (7) .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\f1.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as file names
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.L
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
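.PP
A sketch of the idiom, with hypothetical names; by convention the extra
parameters are set off from the real ones with additional white space:
.EX
function max(arr, n,    i, m) {	# i and m are local
	m = arr[1]
	for (i = 2; i <= n; i++)
		if (arr[i] > m)
			m = arr[i]
	return m
}
{ v[NR] = $1 + 0 }	# array v is global
END { if (NR > 0) print max(v, NR) }
.EE
.PP
The array
.B v
is passed by reference, so the function reads the caller's elements
without copying them.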
.SH EXAMPLES
.TP
.L
length($0) > 72
Print lines longer than 72 characters.
.TP
.L
{ print $2, $1 }
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
    { s += $1 }
END { print "sum is", s, " average is", s/NR }
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.L
/start/, /stop/
Print all lines between start/stop pairs.
.PP
.EX
BEGIN {	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.EE
.SH SOURCE
.B \*9/src/cmd/awk
.SH SEE ALSO
.MR sed (1) ,
.MR regexp (7) ,
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988. ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\fL""\fP to it.
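.PP
For instance, a sketch of both idioms applied to the first two fields
(the messages are illustrative only):
.EX
{
	if (($1 + 0) == ($2 + 0)) print "equal as numbers"
	if (($1 "") == ($2 "")) print "equal as strings"
}
.EE
.PP
With the fields
.B 1.0
and
.BR 1 ,
the numeric comparison succeeds and the string comparison does not.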
.PP
The scope rules for variables in functions are a botch;
the syntax is worse.
.PP
UTF is not always dealt with correctly,
though
.I awk
does make an attempt to do so.
The
.I split
function with an empty string as final argument now copes
with UTF in the string being split.
558 .I split
559 function with an empty string as final argument now copes
560 with UTF in the string being split.