.TH VENTI 7
.SH NAME
venti \- archival storage server
.SH DESCRIPTION
Venti is a block storage server intended for archival data.
In a Venti server, the SHA1 hash of a block's contents acts
as the block identifier for read and write operations.
This approach enforces a write-once policy, preventing
accidental or malicious destruction of data.  In addition,
duplicate copies of a block are coalesced, reducing the
consumption of storage and simplifying the implementation
of clients.
.PP
This manual page documents the basic concepts of
block storage using Venti as well as the Venti network protocol.
.PP
.IR Venti (1)
documents some simple clients.
.IR Vac (1),
.IR vbackup (1),
.IR vacfs (4),
and
.IR vnfs (4)
are more complex clients.
.PP
.IR Venti (3)
describes a C library interface for accessing
Venti servers and manipulating Venti data structures.
.PP
.IR Venti.conf (7)
describes the Venti server configuration file.
.PP
.IR Venti (8)
describes the programs used to run a Venti server.
.PP
.SS "Scores
The SHA1 hash that identifies a block is called its
.IR score .
The score of the zero-length block is called the
.IR "zero score" .
.PP
Scores may have an optional
.IB label :
prefix, typically used to
describe the format of the data.
For example,
.IR vac (1)
uses a
.B vac:
prefix, while
.IR vbackup (1)
uses prefixes corresponding to the file system
types:
.BR ext2: ,
.BR ffs: ,
and so on.
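.PP
As an illustration of the definitions above, the following Python sketch
computes a score and splits an optional label prefix from a printed score.
The function names and the choice to print scores as 40 hexadecimal digits
are assumptions made for this example; they are not part of any Venti library.

```python
import hashlib

def score(block: bytes) -> str:
    """Return the score (SHA1 hash) of a block as 40 hex digits."""
    return hashlib.sha1(block).hexdigest()

# The zero score is the score of the zero-length block.
ZERO_SCORE = score(b"")

def split_label(text: str):
    """Split an optional 'label:' prefix from a printed score.

    Returns (label, score); label is None when there is no prefix.
    Hypothetical helper, written only for this example.
    """
    label, sep, rest = text.rpartition(":")
    return (label if sep else None), rest
```

For example, split_label applied to a score printed with a vac: prefix
yields the label vac and the bare score.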
.SS "Files and Directories
Venti accepts blocks up to 56 kilobytes in size.
By convention, Venti clients use hash trees of blocks to
represent arbitrary-size data
.IR files .
The data to be stored is split into fixed-size
blocks and written to the server, producing a list
of scores.
The resulting list of scores is split into fixed-size pointer
blocks (using only an integral number of scores per block)
and written to the server, producing a smaller list
of scores.
The process continues, eventually ending with the
score for the hash tree's top-most block.
Each file stored this way is summarized by
a
.B VtEntry
structure recording the top-most score, the depth
of the tree, the data block size, and the pointer block size.
One or more
.B VtEntry
structures can be concatenated
and stored as a special file called a
.IR directory .
In this
manner, arbitrary trees of files can be constructed
and stored.
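.PP
The hash tree construction can be sketched in a few lines of Python.
This is a minimal model, not a client implementation: a dictionary stands in
for the server, the names write_block and write_file are hypothetical,
and no VtEntry is built and no zero truncation is applied.

```python
import hashlib

SCORE_SIZE = 20  # a SHA1 hash is 20 bytes

def write_block(store: dict, block: bytes) -> bytes:
    """Stand-in for a Venti write: keep the block under its score."""
    s = hashlib.sha1(block).digest()
    store[s] = block  # duplicate blocks coalesce under one key
    return s

def write_file(store: dict, data: bytes, dsize: int, psize: int):
    """Store data as a hash tree; return (top-most score, depth)."""
    per_ptr = psize // SCORE_SIZE  # integral number of scores per block
    if dsize < 1 or per_ptr < 2:
        raise ValueError("block sizes too small to build a tree")
    # Level 0: fixed-size data blocks (an empty file is one empty block).
    scores = [write_block(store, data[i:i + dsize])
              for i in range(0, max(len(data), 1), dsize)]
    depth = 0
    # Pack each list of scores into pointer blocks until one score remains.
    while len(scores) > 1:
        scores = [write_block(store, b"".join(scores[i:i + per_ptr]))
                  for i in range(0, len(scores), per_ptr)]
        depth += 1
    return scores[0], depth
```

Because blocks are keyed by score, repeated data costs nothing extra:
a file made of the same four 64-byte blocks repeated four times needs only
four data blocks and two pointer blocks in this model.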
.PP
Scores passed between programs conventionally refer
to
.B VtRoot
blocks, which contain descriptive information
as well as the score of a block containing a small number
of
.BR VtEntries .
.SS "Block Types
To allow programs to traverse these structures without
needing to understand their higher-level meanings,
Venti tags each block with a type.  The types are:
.PP
.nf
.ft L
VtDataType     000  \f1data\fL
VtDataType+1   001  \fRscores of \fPVtDataType\fR blocks\fL
VtDataType+2   002  \fRscores of \fPVtDataType+1\fR blocks\fL
\fR\&...\fL
VtDirType      010  VtEntry\fR structures\fL
VtDirType+1    011  \fRscores of \fLVtDirType\fR blocks\fL
VtDirType+2    012  \fRscores of \fLVtDirType+1\fR blocks\fL
\fR\&...\fL
VtRootType     020  VtRoot\fR structure\fL
.fi
.PP
The octal numbers listed are the type numbers used
by the commands below.
(For historical reasons, the type numbers used on
disk and on the wire are different from the above.
They do not distinguish
.BI VtDataType+ n
blocks from
.BI VtDirType+ n
blocks.)
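.PP
The table can be read as a base type plus a tree depth.
The Python sketch below decodes the octal type numbers listed above
(not the disk or wire numbers, which this page does not list).
It assumes that the depth occupies the low three bits, as the octal
layout suggests; the function name is hypothetical.

```python
VT_DATA_TYPE = 0o00
VT_DIR_TYPE = 0o10
VT_ROOT_TYPE = 0o20

def describe_type(t: int) -> str:
    """Describe a block type number from the table above."""
    if t == VT_ROOT_TYPE:
        return "VtRoot structure"
    # Assumption: the low three bits hold the pointer depth n.
    base, depth = t & ~0o7, t & 0o7
    if base == VT_DATA_TYPE:
        name, leaf = "VtDataType", "data"
    elif base == VT_DIR_TYPE:
        name, leaf = "VtDirType", "VtEntry structures"
    else:
        raise ValueError(f"unknown block type {t:#o}")
    if depth == 0:
        return leaf
    below = name if depth == 1 else f"{name}+{depth - 1}"
    return f"scores of {below} blocks"
```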
.SS "Zero Truncation
To avoid storing the same short data blocks padded with
differing numbers of zeros, Venti clients working with fixed-size
blocks conventionally
`zero truncate' the blocks before writing them to the server.
For example, if a 1024-byte data block contains the
11-byte string
.RB ` hello " " world '
followed by 1013 zero bytes,
a client would store only the 11-byte block.
When the client later read the block from the server,
it would append zeros to the end as necessary to
reach the expected size.
.PP
When truncating pointer blocks
.RB ( VtDataType+ \fIn
and
.BI VtDirType+ n
blocks),
trailing zero scores are removed
instead of trailing zero bytes.
.PP
Because of the truncation convention,
any file consisting entirely of zero bytes,
no matter what the length, will be represented by the zero score:
the data blocks contain all zeros and are thus truncated
to the empty block, and the pointer blocks contain all zero scores
and are thus also truncated to the empty block,
and so on up the hash tree.
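.PP
The truncation convention can be sketched in Python as follows.
The function names are hypothetical, and the sketch takes `zero score'
to mean the score of the zero-length block, as defined under Scores.

```python
import hashlib

SCORE_SIZE = 20
ZERO_SCORE = hashlib.sha1(b"").digest()  # score of the empty block

def zero_truncate(block: bytes, pointer: bool = False) -> bytes:
    """Drop trailing zero bytes, or trailing zero scores for pointer blocks."""
    if not pointer:
        return block.rstrip(b"\x00")
    if len(block) % SCORE_SIZE:
        raise ValueError("pointer block is not a whole number of scores")
    n = len(block)
    while n > 0 and block[n - SCORE_SIZE:n] == ZERO_SCORE:
        n -= SCORE_SIZE
    return block[:n]

def zero_extend(block: bytes, size: int, pointer: bool = False) -> bytes:
    """Undo zero_truncate: pad a block read from the server back to size."""
    if len(block) > size:
        raise ValueError("block larger than expected size")
    if not pointer:
        return block + bytes(size - len(block))
    return block + ZERO_SCORE * ((size - len(block)) // SCORE_SIZE)
```

With these definitions, a block of any length holding only zero bytes,
and a pointer block holding only zero scores, both truncate to the empty
block, whose score is the zero score.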
.SS NETWORK PROTOCOL
A Venti session begins when a
.I client
connects to the network address served by a Venti
.IR server ;
the conventional address is
.BI tcp! server !venti
(the
.B venti
port is 17034).
Both client and server begin by sending a version
string of the form
.BI venti- versions - comment \en \fR.
The
.I versions
field is a list of acceptable versions separated by
colons.
The protocol described here is version
.BR 02 .
The client is responsible for choosing a common
version and sending it in the
.B VtThello
message, described below.
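.PP
A minimal Python sketch of the version exchange follows.
The function names, the comment text, and the second version number used
in the usage note are made up for illustration; only the string format and
version 02 come from this page.

```python
def format_version(versions, comment: str) -> bytes:
    """Build a version string of the form venti-versions-comment\\n."""
    return ("venti-" + ":".join(versions) + "-" + comment + "\n").encode("ascii")

def parse_version(line: bytes):
    """Return (list of versions, comment) from a peer's version string."""
    text = line.decode("ascii")
    if not text.endswith("\n"):
        raise ValueError("version string not newline-terminated")
    # Split at most twice so the comment may itself contain hyphens.
    prefix, versions, comment = text[:-1].split("-", 2)
    if prefix != "venti":
        raise ValueError("not a venti version string")
    return versions.split(":"), comment

def choose_version(ours, theirs) -> str:
    """Pick the first of our versions that the peer also accepts."""
    for v in ours:
        if v in theirs:
            return v
    raise ValueError("no common protocol version")
```

A client that speaks only version 02 and receives a string listing
hypothetical versions 04 and 02 would choose 02 and send it in VtThello.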
.PP
After the initial version exchange, the client transmits
.I requests
.RI ( T-messages )
to the server, which subsequently returns
.I replies
.RI ( R-messages )
to the client.
The combined act of transmitting (receiving) a request
of a particular type, and receiving (transmitting) its reply
is called a
.I transaction
of that type.
.PP
Each message consists of a sequence of bytes.
Two-byte fields hold unsigned integers represented
in big-endian order (most significant byte first).
Data items of variable lengths are represented by
a one-byte field specifying a count,
.IR n ,
followed by
.I n
bytes of data.
Text strings are represented similarly,
using a two-byte count with
the text itself stored as a UTF-8 encoded sequence
of Unicode characters (see
.IR utf (7)).
Text strings are not
.SM NUL\c
-terminated:
.I n
counts the bytes of UTF-8 data, which include no final
zero byte.
The
.SM NUL
character is illegal in text strings in the Venti protocol.
The maximum string length in Venti is 1024 bytes.
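.PP
The field encodings just described can be sketched with Python's
struct module.
The helper names are hypothetical; the limits (one-byte count for data,
two-byte count and 1024-byte maximum for strings, no NUL) are those stated above.

```python
import struct

MAX_STRING = 1024  # maximum string length in bytes

def pack_var(data: bytes) -> bytes:
    """Variable-length data: one-byte count n, then n bytes."""
    if len(data) > 255:
        raise ValueError("data too long for a one-byte count")
    return bytes([len(data)]) + data

def pack_string(text: str) -> bytes:
    """Text string: two-byte big-endian count, then UTF-8 bytes, no NUL."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("NUL is illegal in Venti strings")
    if len(data) > MAX_STRING:
        raise ValueError("string longer than 1024 bytes")
    return struct.pack(">H", len(data)) + data

def unpack_string(buf: bytes, off: int = 0):
    """Decode a string at off; return (text, offset just past it)."""
    (n,) = struct.unpack_from(">H", buf, off)
    end = off + 2 + n
    if n > MAX_STRING or end > len(buf):
        raise ValueError("bad string length")
    data = buf[off + 2:end]
    if b"\x00" in data:
        raise ValueError("NUL is illegal in Venti strings")
    return data.decode("utf-8"), end
```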
.PP
Each Venti message begins with a two-byte size field
specifying the length in bytes of the message,
not including the length field itself.
The next byte is the message type, one of the constants
in the enumeration in the include file
.BR <venti.h> .
The next byte is an identifying
.IR tag ,
used to match responses with requests.
The remaining bytes are parameters of different sizes.
In the message descriptions, the number of bytes in a field
is given in brackets after the field name.
The notation
.IR parameter [ n ]
where
.I n
is not a constant represents a variable-length parameter:
.IR n [1]
followed by
.I n
bytes of data forming the
.IR parameter .
The notation
.IR string [ s ]
(using a literal
.I s
character)
is shorthand for
.IR s [2]
followed by
.I s
bytes of UTF-8 text.
The notation
.IR parameter []
where
.I parameter
is the last field in the message represents a
variable-length field that comprises all remaining
bytes in the message.
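.PP
The framing described in this paragraph can be sketched in Python as follows.
The function names are hypothetical, and the numeric message type in the
usage note is a placeholder: real values come from the enumeration in
<venti.h> and are not listed on this page.

```python
import struct

def frame(msg_type: int, tag: int, params: bytes = b"") -> bytes:
    """Build size[2] type[1] tag[1] params; size excludes itself."""
    body = bytes([msg_type, tag]) + params  # each must fit in one byte
    if len(body) > 0xFFFF:
        raise ValueError("message too long for a two-byte size")
    return struct.pack(">H", len(body)) + body

def unframe(buf: bytes):
    """Split one message off buf; return (type, tag, params, rest)."""
    if len(buf) < 2:
        raise ValueError("short message")
    (size,) = struct.unpack_from(">H", buf, 0)
    if size < 2 or len(buf) < 2 + size:
        raise ValueError("short message")
    return buf[2], buf[3], buf[4:2 + size], buf[2 + size:]
```

For instance, framing a message of placeholder type 0x7E with tag 5 and
three parameter bytes yields a size field of 5: one byte of type, one of
tag, and three of parameters.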
.PP
All Venti RPC messages are prefixed with a field
.IR size [2]
giving the length of the message that follows
(not including the
.I size
field itself).
The message bodies are:
.ta \w'\fLVtTgoodbye 'u
.IP
.ne 2v
.B VtThello
.IR tag [1]
.IR version [ s ]
.IR uid [ s ]
.IR strength [1]
.IR crypto [ n ]
.IR codec [ n ]
.br
.B VtRhello
.IR tag [1]
.IR sid [ s ]
.IR rcrypto [1]
.IR rcodec [1]
.IP
.ne 2v
.B VtTping
.IR tag [1]
.br
.B VtRping
.IR tag [1]
.IP
.ne 2v
.B VtTread
.IR tag [1]
.IR score [20]
.IR type [1]
.IR pad [1]
.IR count [2]
.br
.B VtRread
.IR tag [1]
.IR data []
.IP
.ne 2v
.B VtTwrite
.IR tag [1]
.IR type [1]
.IR pad [3]
.IR data []
.br
.B VtRwrite
.IR tag [1]
.IR score [20]
.IP
.ne 2v
.B VtTsync
.IR tag [1]
.br
.B VtRsync
.IR tag [1]
.IP
.ne 2v
.B VtRerror
.IR tag [1]
.IR error [ s ]
.IP
.ne 2v
.B VtTgoodbye
.IR tag [1]
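.PP
As a worked example of the layouts above, the Python sketch below packs
the parameters that follow the
.I tag
byte in the read and write requests.
The function names are hypothetical, the pad bytes are written as zeros
(an assumption; this page does not say what they hold), and
block_type is the wire type supplied by the caller.

```python
import struct

SCORE_SIZE = 20

def pack_tread(score: bytes, block_type: int, count: int) -> bytes:
    """VtTread parameters after the tag: score[20] type[1] pad[1] count[2]."""
    if len(score) != SCORE_SIZE:
        raise ValueError("score must be 20 bytes")
    # ">BxH": one type byte, one zero pad byte, big-endian two-byte count.
    return score + struct.pack(">BxH", block_type, count)

def pack_twrite(block_type: int, data: bytes) -> bytes:
    """VtTwrite parameters after the tag: type[1] pad[3] data[]."""
    return struct.pack(">B3x", block_type) + data

def unpack_rwrite(params: bytes) -> bytes:
    """VtRwrite parameters after the tag: score[20]."""
    if len(params) != SCORE_SIZE:
        raise ValueError("expected a 20-byte score")
    return params
```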
.PP
Each T-message has a one-byte
.I tag
field, chosen and used by the client to identify the message.
The server will echo the request's
.I tag
field in the reply.
Clients should arrange that no two outstanding
messages have the same tag field so that responses
can be distinguished.
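.PP
One simple way for a client to keep outstanding tags distinct is a small
allocator, sketched here in Python.
The class is hypothetical and assumes that all 256 one-byte values are
usable as tags, which this page does not state.

```python
class TagPool:
    """Hand out one-byte tags so no two outstanding requests share one."""

    def __init__(self):
        self.in_use = set()

    def alloc(self) -> int:
        """Return the lowest tag not currently outstanding."""
        for tag in range(256):
            if tag not in self.in_use:
                self.in_use.add(tag)
                return tag
        raise RuntimeError("all 256 tags are outstanding")

    def release(self, tag: int) -> None:
        """Call when the reply carrying this tag has arrived."""
        self.in_use.remove(tag)
```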
.PP
The type of an R-message will either be one greater than
the type of the corresponding T-message or
.BR VtRerror ,
indicating that the request failed.
In the latter case, the
.I error
field contains a string describing the reason for failure.
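.PP
A client can apply this rule when a reply arrives, as in the Python
sketch below.
The names are hypothetical, and the numeric value of VtRerror is passed in
by the caller because the values are defined in <venti.h>, not on this page.

```python
class VentiError(Exception):
    """Raised when the server answers with VtRerror or an odd reply."""

def check_reply(t_type: int, r_type: int, params: bytes,
                rerror_type: int) -> bytes:
    """Validate a reply type; params are the bytes after the tag."""
    if r_type == rerror_type:
        # error[s]: two-byte big-endian count, then UTF-8 text.
        n = int.from_bytes(params[:2], "big")
        raise VentiError(params[2:2 + n].decode("utf-8"))
    if r_type != t_type + 1:
        raise VentiError(f"unexpected reply type {r_type}")
    return params
```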
.PP
Venti connections must begin with a
.B hello
transaction.
The
.B VtThello
message contains the protocol
.I version
that the client has chosen to use.
The fields
.IR strength ,
.IR crypto ,
and
.I codec
could be used to add authentication, encryption,
and compression to the Venti session
but are currently ignored.
The
.I rcrypto
and
.I rcodec
fields in the
.B VtRhello
response are similarly ignored.
The
.I uid
and
.I sid
fields are intended to be the identity
of the client and server but, given the lack of
authentication, should be treated only as advisory.
The initial
.B hello
should be the only
.B hello
transaction during the session.
.PP
The
.B ping
message has no effect and
is used mainly for debugging.
Servers should respond immediately to pings.
.PP
The
.B read
message requests a block with the given
.I score
and
.IR type .
Use
.I vttodisktype
and
.I vtfromdisktype
(see
.IR venti (3))
to convert between a block type enumeration value
.RB ( VtDataType ,
etc.)
and the
.I type
used on disk and in the protocol.
The
.I count
field specifies the maximum expected size
of the block.
The
.I data
in the reply is the block's contents.
.PP
The
.B write
message writes a new block of the given
.I type
with contents
.I data
to the server.
The response includes the
.I score
to use to read the block,
which should be the SHA1 hash of
.IR data .
.PP
The Venti server may buffer written blocks in memory,
waiting until after responding to the
.B write
message before writing them to
permanent storage.
The server will delay the response to a
.B sync
message until after all blocks in earlier
.B write
messages have been written to permanent storage.
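.PP
The relationship between write and sync can be pictured with a toy model.
The Python class below is purely illustrative: it is not how the Venti
server is implemented, and two dictionaries stand in for memory and
permanent storage.

```python
import hashlib

class ToyServer:
    """Toy model of write buffering; not the real server's design."""

    def __init__(self):
        self.buffered = {}  # acknowledged but not yet permanent
        self.stored = {}    # stands in for permanent storage

    def write(self, data: bytes) -> bytes:
        """Acknowledge a write at once; the block may still be in memory."""
        s = hashlib.sha1(data).digest()
        if s not in self.stored:
            self.buffered[s] = data
        return s

    def read(self, score: bytes) -> bytes:
        """Blocks are readable whether buffered or permanent."""
        if score in self.buffered:
            return self.buffered[score]
        return self.stored[score]

    def sync(self) -> None:
        """Return only after every earlier write is permanent."""
        self.stored.update(self.buffered)
        self.buffered.clear()
```

A client that needs its data to be durable would therefore issue a sync
after its last write and wait for the reply.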
.PP
The
.B goodbye
message ends a session.  There is no
.BR VtRgoodbye :
upon receiving the
.B VtTgoodbye
message, the server terminates the connection.
.SH SEE ALSO
.IR venti (1),
.IR venti (3),
.IR venti (8)
.br
Sean Quinlan and Sean Dorward,
``Venti: a new approach to archival storage'',
.I "Usenix Conference on File and Storage Technologies" ,
2002.