.TH VENTI 8
.SH NAME
venti.conf \- venti configuration
.SH DESCRIPTION
Venti is a SHA1-addressed archival storage server.
See
.IR venti (7)
for a full introduction to the system.
This page documents the structure and operation of the server.
.PP
A venti server requires multiple disks or disk partitions,
each of which must be properly formatted before the server
can be run.
.SS Disk
The venti server maintains three disk structures, typically
stored on raw disk partitions:
the append-only
.IR "data log" ,
which holds, in sequential order,
the contents of every block written to the server;
the
.IR index ,
which helps locate a block in the data log given its score;
and optionally the
.IR "bloom filter" ,
a concise summary of which scores are present in the index.
The data log is the primary storage.
To improve robustness, it should be stored on
a device that provides RAID functionality.
The index and the bloom filter are optimizations
employed to access the data log efficiently and can be rebuilt
if lost or damaged.
.PP
The data log is logically split into sections called
.IR arenas ,
typically sized for easy offline backup
(e.g., 500MB).
A data log may comprise many disks, each storing
one or more arenas.
Such disks are called
.IR "arena partitions" .
Arena partitions are filled in the order given in the configuration.
.PP
The index is logically split into block-sized pieces called
.IR buckets ,
each of which is responsible for a particular range of scores.
An index may be split across many disks, each storing many buckets.
Such disks are called
.IR "index sections" .
.PP
The index must be sized so that no bucket is full.
When a bucket fills, the server must be shut down and
the index made larger.
Since scores appear random, each bucket will contain
approximately the same number of entries.
Index entries are 40 bytes long. Assuming that a typical block
being written to the server is 8192 bytes and compresses to 4096
bytes, the active index is expected to be about 1% of
the active data log.
Storing smaller blocks increases the relative index footprint;
storing larger blocks decreases it.
To allow variation in both block size and the random distribution
of scores to buckets, the suggested index size is 5% of
the active data log.
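.PP
The index sizing rule can be checked with a short calculation.
The following Python sketch is illustrative only:
the function name, the 100GB log size, and the default stored block size
are assumptions chosen for the example and are not part of venti.
.IP
.EX
```python
INDEX_ENTRY_BYTES = 40  # index entry size stated in the text


def active_index_bytes(log_bytes, stored_block_bytes=4096):
    """Estimate the bytes of index needed for a data log of log_bytes."""
    entries = log_bytes // stored_block_bytes  # one entry per stored block
    return entries * INDEX_ENTRY_BYTES


log_bytes = 100 * 10**9                 # hypothetical 100GB active data log
active = active_index_bytes(log_bytes)  # close to 1% of the log
suggested = log_bytes * 5 // 100        # the suggested 5% allowance
print(active, suggested)
```
.EE
.PP
Halving the stored block size doubles the estimate,
which is why the 5% allowance leaves generous headroom.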
.PP
The (optional) bloom filter is a large bitmap that is stored on disk but
also kept completely in memory while the venti server runs.
It helps the venti server efficiently detect scores that are
.I not
already stored in the index.
The bloom filter starts out zeroed.
Each score recorded in the bloom filter is hashed to choose
.I nhash
bits to set in the bloom filter.
A score is definitely not stored in the index if any of its
.I nhash
bits is not set.
The bloom filter thus has two parameters:
.I nhash
(maximum 32)
and the total bitmap size
(maximum 512MB, 2\s-2\u32\d\s+2 bits).
.PP
The bloom filter should be sized so that
.I nhash
\(mu
.I nblock
\(<=
0.7 \(mu
.IR b ,
where
.I nblock
is the expected number of blocks stored on the server
and
.I b
is the bitmap size in bits.
The false positive rate of the bloom filter when sized
this way is approximately 2\s-2\u\-\fInhash\fR\d\s+2.
Values of
.I nhash
less than 10 are not very useful;
values greater than 24 are probably a waste of memory.
.I Fmtbloom
(see
.IR venti-fmt (8))
can be given either
.I nhash
or
.IR nblock ;
if given
.IR nblock ,
it will derive an appropriate
.IR nhash .
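.PP
The bloom filter sizing rule, nhash times nblock at most 0.7 times the
bitmap size in bits, can be explored numerically.
The Python sketch below is an illustration under stated assumptions:
the function names are hypothetical, and the derivation shown is simply
the largest
.I nhash
that satisfies the rule, which is not necessarily the calculation that
.I fmtbloom
performs.
.IP
.EX
```python
MAX_NHASH = 32      # maximum nhash stated in the text
MAX_BITS = 2**32    # maximum bitmap size (512MB) stated in the text


def derive_nhash(nblock, bits):
    """Largest nhash with nhash * nblock <= 0.7 * bits, capped at 32."""
    return max(1, min(MAX_NHASH, 7 * bits // (10 * nblock)))


def bits_needed(nblock, nhash):
    """Smallest bitmap size in bits that satisfies the same rule."""
    return -(-10 * nhash * nblock // 7)  # integer ceiling division


def false_positive_rate(nhash):
    """Approximate false positive rate for a filter sized by the rule."""
    return 2.0 ** -nhash


# Hypothetical server: 100 million blocks and a full-size bitmap.
print(derive_nhash(10**8, MAX_BITS), bits_needed(10**8, 16))
```
.EE
.PP
With these assumed numbers, a 512MB bitmap supports an
.I nhash
of 30 for 100 million blocks,
and an
.I nhash
of 16 needs a bitmap of a little under 273MB.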
.SS Memory
Venti can make effective use of large amounts of memory
for various caches.
.PP
The
.I "lump cache
holds recently-accessed venti data blocks, which the server refers to as
.IR lumps .
The lump cache should be at least 1MB but can profitably be much larger.
The lump cache can be thought of as the level-1 cache:
read requests handled by the lump cache can
be served instantly.
.PP
The
.I "block cache
holds recently-accessed
.I disk
blocks from the arena partitions.
The block cache needs to be able to simultaneously hold two blocks
from each arena plus four blocks for the currently-filling arena.
The block cache can be thought of as the level-2 cache:
read requests handled by the block cache are slower than those
handled by the lump cache, since the lump data must be extracted
from the raw disk blocks and possibly decompressed, but no
disk accesses are necessary.
.PP
The
.I "index cache
holds recently-accessed or prefetched
index entries.
The index cache needs to be able to hold index entries
for three or four arenas, at least, in order for prefetching
to work properly. Each index entry occupies 50 bytes in the cache.
Assuming 500MB arenas of
128,000 blocks that are 4096 bytes each after compression,
the entries for one arena occupy about 6MB,
so the minimum index cache size is about 20MB.
The index cache can be thought of as the level-3 cache:
read requests handled by the index cache must still go
to disk to fetch the arena blocks, but the costly random
access to the index is avoided.
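.PP
The index cache minimum follows from simple arithmetic.
This Python sketch is a worked example only;
the function name and its defaults are assumptions taken from the
figures in the text (500MB arenas, 4096-byte stored blocks,
50-byte cache entries, three arenas).
.IP
.EX
```python
CACHE_ENTRY_BYTES = 50  # size of one index entry in the cache, per the text


def min_index_cache_bytes(arenas=3, arena_bytes=500 * 2**20, block_bytes=4096):
    """Bytes of index cache needed to hold entries for the given arenas."""
    entries_per_arena = arena_bytes // block_bytes  # 128,000 by default
    return arenas * entries_per_arena * CACHE_ENTRY_BYTES


per_arena = min_index_cache_bytes(arenas=1)  # about 6MB
minimum = min_index_cache_bytes(arenas=3)    # about 19MB
print(per_arena, minimum)
```
.EE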
.PP
The size of the index cache determines how long venti
can sustain its `burst' write throughput, during which time
the only disk accesses on the critical path
are sequential writes to the arena partitions.
For example, if you want to be able to sustain 10MB/s
for an hour, you need enough index cache to hold entries
for 36GB of blocks. Assuming 8192-byte blocks,
you need room for almost five million index entries.
Since index entries are 50 bytes each, you need 250MB
of index cache.
If the background index update process can make a single
pass through the index in an hour, which is possible,
then you can sustain the 10MB/s indefinitely (at least until
the arenas are all filled).
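.PP
The burst-throughput estimate can be reproduced as follows.
This Python sketch is illustrative; the function name is hypothetical
and the inputs are the figures from the example in the text.
The exact results come out somewhat below the rounded figures quoted
there (five million entries, 250MB), which leave some margin.
.IP
.EX
```python
def burst_index_cache(rate_bytes_per_s, seconds, block_bytes=8192, entry_bytes=50):
    """Return (entries, cache_bytes) needed to absorb a write burst."""
    entries = rate_bytes_per_s * seconds // block_bytes  # blocks written
    return entries, entries * entry_bytes


# 10MB/s sustained for one hour, as in the example in the text.
entries, cache_bytes = burst_index_cache(10 * 10**6, 3600)
print(entries, cache_bytes)
```
.EE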
.PP
The
.I "bloom filter
requires memory equal to its size on disk,
as discussed above.
.PP
A reasonable starting allocation is to
divide memory equally (in thirds) among
the bloom filter, the index cache, and the lump and block caches;
the third of memory allocated to the lump and block caches
should be split unevenly, with more (say, two thirds)
going to the block cache.
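.PP
The starting allocation can be turned into configuration lines
mechanically.
The Python sketch below is one possible reading of the rule of thumb;
the function names are hypothetical, sizes are in whole megabytes,
and the bloom filter share is capped at the 512MB maximum,
with any excess simply left unallocated.
The bloom filter size is fixed when the filter is formatted,
so only the three cache sizes appear as configuration lines.
.IP
.EX
```python
MAX_BLOOM_MB = 512  # maximum bloom filter size stated in the text


def split_memory_mb(total_mb):
    """Split total_mb in thirds; give the block cache 2/3 of the last third."""
    third = total_mb // 3
    lump = third // 3
    return {
        "bloom": min(third, MAX_BLOOM_MB),
        "icmem": third,
        "mem": lump,
        "bcmem": third - lump,
    }


def config_lines(total_mb):
    """Format the cache sizes as venti configuration lines."""
    s = split_memory_mb(total_mb)
    return [f"mem {s['mem']}M", f"bcmem {s['bcmem']}M", f"icmem {s['icmem']}M"]


print(config_lines(1536))  # hypothetical machine with 1.5GB set aside
```
.EE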
.SS Network
The venti server announces two network services, one
(conventionally TCP port
.BR venti ,
17034) serving
the venti protocol as described in
.IR venti (7),
and one serving HTTP
(conventionally TCP port
.BR http ,
80).
.PP
The venti web server provides the following
URLs for accessing status information:
.TP
.B /index
A summary of the usage of the arenas and index sections.
.TP
.B /xindex
An XML version of
.BR /index .
.TP
.B /storage
Brief storage totals.
.TP
.BI /set/ variable
The current integer value of
.IR variable .
Variables are:
.BR compress ,
whether or not to compress blocks
(for debugging);
.BR logging ,
whether to write entries to the debugging logs;
.BR stats ,
whether to collect run-time statistics;
.BR icachesleeptime ,
the time in milliseconds between successive updates
of megabytes of the index cache;
.BR arenasumsleeptime ,
the time in milliseconds between reads while
checksumming an arena in the background.
The two sleep times should be (but are not) managed by venti;
they exist to provide more experience with their effects.
The other variables exist only for debugging and
performance measurement.
.TP
.BI /set/ variable / value
Set
.I variable
to
.IR value .
.TP
.BI /graph/ name / param / param / \fR...
A PNG image graphing the named run-time statistic over time.
The details of names and parameters are undocumented;
see
.B httpd.c
in the venti sources.
.TP
.B /log
A list of all debugging logs present in the server's memory.
.TP
.BI /log/ name
The contents of the debugging log with the given
.IR name .
.TP
.B /flushicache
Force venti to begin flushing the index cache to disk.
The response to the request is not sent until the flush
has completed.
.TP
.B /flushdcache
Force venti to begin flushing the arena block cache to disk.
The response to the request is not sent until the flush
has completed.
.PD
.PP
Requests for other files are served by consulting a
directory named in the configuration file
(see
.B webroot
below).
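.PP
The status URLs are plain paths, so they are easy to build from a script.
The Python sketch below only constructs URLs and contacts no server;
the function name and the host names are hypothetical.
.IP
.EX
```python
from urllib.parse import quote


def venti_url(host, *parts, port=80):
    """Build a URL for the venti web server from path components."""
    path = "/".join(quote(str(p), safe="") for p in parts)
    netloc = host if port == 80 else f"{host}:{port}"
    return f"http://{netloc}/{path}"


print(venti_url("localhost", "storage"))            # storage totals
print(venti_url("localhost", "set", "logging"))     # read a variable
print(venti_url("localhost", "set", "logging", 1))  # set a variable
```
.EE
.PP
The resulting strings can be fetched with any HTTP client.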
.SS Configuration File
A venti configuration file
enumerates the various index sections and
arenas that constitute a venti system.
The components are indicated by the name of the file, typically
a disk partition, in which they reside. The configuration
file is the only place where file names are used. Internally,
venti uses the names assigned when the components were formatted
with
.I fmtarenas
or
.I fmtisect
(see
.IR venti-fmt (8)).
In particular, only the configuration needs to be
changed if a component is moved to a different file.
.PP
The configuration file consists of lines in the form described below.
Lines starting with
.B #
are comments.
.TP
.BI index " name
Names the index for the system.
.TP
.BI arenas " file
.I File
is an arena partition, formatted using
.IR fmtarenas .
.TP
.BI isect " file
.I File
is an index section, formatted using
.IR fmtisect .
.PP
After formatting a venti system using
.IR fmtindex ,
the order of arenas and index sections should not be changed.
Additional arenas can be appended to the configuration;
run
.I fmtindex
with the
.B -a
flag to update the index.
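.PP
The line-oriented format is simple to read from a script.
The Python sketch below is a minimal reader written for illustration;
it is not venti's own parser, the function name is hypothetical,
and it assumes that a keyword is separated from its argument by white space.
Any line other than
.BR index ,
.BR arenas ,
or
.B isect
is kept verbatim as a server parameter.
.IP
.EX
```python
def parse_venti_conf(text):
    """Parse configuration text into index name, file lists, and parameters."""
    conf = {"index": None, "arenas": [], "isect": [], "params": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 1)
        key = fields[0]
        value = fields[1] if len(fields) > 1 else ""
        if key == "index":
            conf["index"] = value
        elif key in ("arenas", "isect"):
            conf[key].append(value)  # order matters, so keep a list
        else:
            conf["params"][key] = value
    return conf


sample = """
# hypothetical example
index main
isect /tmp/disks/isect0
isect /tmp/disks/isect1
arenas /tmp/disks/arenas
mem 10M
queuewrites
"""
print(parse_venti_conf(sample))
```
.EE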
.PP
The configuration file also holds configuration parameters
for the venti server itself.
These are:
.TF httpaddr netaddr
.TP
.BI mem " size
lump cache size
.TP
.BI bcmem " size
block cache size
.TP
.BI icmem " size
index cache size
.TP
.BI addr " netaddr
network address to announce venti service
(default
.BR tcp!*!venti )
.TP
.BI httpaddr " netaddr
network address to announce HTTP service
(default
.BR tcp!*!http )
.TP
.B queuewrites
queue writes in memory
(default is not to queue)
.TP
.BI webroot " dir
directory tree containing files for HTTP server
to consult for unrecognized URLs
.PD
.PP
The units for the various cache sizes above can be specified by appending a
.LR k ,
.LR m ,
or
.LR g
(case-insensitive)
to indicate kilobytes, megabytes, or gigabytes respectively.
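.PP
The size suffixes can be interpreted as in the following Python sketch.
It is an illustration, not venti's implementation:
the function name is hypothetical, and the sketch assumes that the
units are binary multiples (1024 bytes per kilobyte),
which the text above does not state.
.IP
.EX
```python
# Assumption: k, m and g denote powers of 1024.
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a size such as '10M' or '4096' to a number of bytes."""
    text = text.strip()
    suffix = text[-1:].lower()  # case-insensitive unit letter, if any
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)  # no suffix: plain byte count


print(parse_size("10M"), parse_size("2g"), parse_size("4096"))
```
.EE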
.SS Command Line
Options to
.I venti
are:
.TP
.BI -c " config
The server configuration file
(default
.BR venti.conf )
.TP
.BI -o " line
Set a server parameter, using the same syntax
as in the configuration file.
The
.B -o
options override the configuration file.
.TP
.B -d
Produce various debugging information on standard error.
Implies
.BR -s .
.TP
.B -L
Enable logging. By default all logging is disabled.
Logging slows server operation considerably.
.TP
.B -s
Do not run in the background.
Normally,
the foreground process will exit once the Venti server
is initialized and ready for connections.
.PD
.SH EXAMPLE
A simple configuration:
.IP
.EX
% cat venti.conf
index main
isect /tmp/disks/isect0
isect /tmp/disks/isect1
arenas /tmp/disks/arenas
mem 10M
bcmem 20M
icmem 30M
.EE
.PP
Format the index sections, the arena partition, and
finally the main index:
.IP
.EX
% venti/fmtisect isect0. /tmp/disks/isect0 &
% venti/fmtisect isect1. /tmp/disks/isect1 &
% venti/fmtarenas arenas0. /tmp/disks/arenas &
% wait
% venti/fmtindex venti.conf
.EE
.PP
Start the server and check the storage statistics:
.IP
.EX
% venti/venti
% hget http://$sysname/storage
.EE
.SH "SEE ALSO"
.IR venti (1),
.IR venti (3),
.IR venti (7),
.IR venti-backup (8),
.IR venti-fmt (8)
.br
Sean Quinlan and Sean Dorward,
``Venti: a new approach to archival storage'',
.I "Usenix Conference on File and Storage Technologies" ,
2002.
.SH BUGS
Setting up a venti server is too complicated.
.PP
Venti should not require the user to decide how to
partition its memory usage.