.TH REGEXP 3
.SH NAME
regcomp, regcomplit, regcompnl, regexec, regsub, rregexec, rregsub, regerror \- regular expression
.SH SYNOPSIS
.B #include <u.h>
.br
.B #include <libc.h>
.br
.B #include <regexp.h>
.PP
.ta \w'\fLRegprog 'u
.B
Reprog *regcomp(char *exp)
.PP
.B
Reprog *regcomplit(char *exp)
.PP
.B
Reprog *regcompnl(char *exp)
.PP
.nf
.B
int regexec(Reprog *prog, char *string, Resub *match, int msize)
.PP
.nf
.B
void regsub(char *source, char *dest, int dlen, Resub *match, int msize)
.PP
.nf
.B
int rregexec(Reprog *prog, Rune *string, Resub *match, int msize)
.PP
.nf
.B
void rregsub(Rune *source, Rune *dest, int dlen, Resub *match, int msize)
.PP
.B
void regerror(char *msg)
.SH DESCRIPTION
.I Regcomp
compiles a
regular expression and returns
a pointer to the generated description.
The space is allocated by
.MR malloc (3)
and may be released by
.IR free .
Regular expressions are exactly as in
.MR regexp (7) .
.PP
.I Regcomplit
is like
.I regcomp
except that all characters are treated literally.
.I Regcompnl
is like
.I regcomp
except that the
.B .
metacharacter matches all characters, including newlines.
.PP
.I Regexec
matches a null-terminated
.I string
against the compiled regular expression in
.IR prog .
If it matches,
.I regexec
returns
.B 1
and fills in the array
.I match
with character pointers to the substrings of
.I string
that correspond to the
parenthesized subexpressions of
.IR exp :
.BI match[ i ].s.sp
points to the beginning and
.BI match[ i ].e.ep
points just beyond
the end of the
.IR i th
substring.
(Subexpression
.I i
begins at the
.IR i th
left parenthesis, counting from 1.)
Pointers in
.B match[0]
pick out the substring that corresponds to
the whole regular expression.
Unused elements of
.I match
are filled with zeros.
Matches involving
.LR * ,
.LR + ,
and
.L ?
are extended as far as possible.
The number of array elements in
.I match
is given by
.IR msize .
The structure of elements of
.I match
is:
.IP
.EX
typedef struct {
	union {
		char *sp;
		Rune *rsp;
	} s;
	union {
		char *ep;
		Rune *rep;
	} e;
} Resub;
.EE
.LP
If
.B match[0].s.sp
is nonzero on entry,
.I regexec
starts matching at that point within
.IR string .
If
.B match[0].e.ep
is nonzero on entry,
the last character matched is the one
preceding that point.
.PP
.I Regsub
places in
.I dest
a substitution instance of
.I source
in the context of the last
.I regexec
performed using
.IR match .
Each instance of
.BI \e n\f1,
where
.I n
is a digit, is replaced by the
string delimited by
.BI match[ n ].s.sp
and
.BI match[ n ].e.ep\f1.
Each instance of
.L &
is replaced by the string delimited by
.B match[0].s.sp
and
.BR match[0].e.ep .
The substitution is always null-terminated and
truncated to fit in
.I dlen
bytes.
.PP
.IR Regerror ,
called whenever an error is detected in
.IR regcomp ,
writes the string
.I msg
on the standard error file and exits.
.I Regerror
can be replaced to perform
special error processing.
If the user-supplied
.I regerror
returns rather than exits,
.I regcomp
will return 0.
.PP
.I Rregexec
and
.I rregsub
are variants of
.I regexec
and
.I regsub
that use strings of
.B Runes
instead of strings of
.BR chars .
With these routines, the
.I s.rsp
and
.I e.rep
fields of the
.I match
array elements should be used.
.SH SOURCE
.B \*9/src/libregexp
.SH "SEE ALSO"
.MR grep (1)
.SH DIAGNOSTICS
.I Regcomp
returns
.B 0
for an illegal expression
or other failure.
.I Regexec
returns 0
if
.I string
is not matched.
.SH BUGS
There is no way to specify or match a NUL character; NULs terminate patterns and strings.