venti \- archival storage server
Venti is a block storage server intended for archival data.
In a Venti server, the SHA1 hash of a block's contents acts
as the block identifier for read and write operations.
This approach enforces a write-once policy, preventing
accidental or malicious destruction of data.  In addition,
duplicate copies of a block are coalesced, reducing the
consumption of storage and simplifying the implementation
of clients.
.PP
This manual page documents the basic concepts of
block storage using Venti as well as the Venti network protocol.
documents some simple clients.
are more complex clients.
describes a C library interface for accessing
Venti servers and manipulating Venti data structures.
describes the programs used to run a Venti server.
The SHA1 hash that identifies a block is called its
.IR score .
The score of the zero-length block is called the
.IR "zero score" .
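
The relationship between block contents, scores, and duplicate coalescing can be illustrated with a minimal in-memory sketch. The ToyStore class and its method names are hypothetical and are not part of any Venti library; scores are shown as hexadecimal strings for readability.

```python
import hashlib


def score(block: bytes) -> str:
    """Return the score of a block: the SHA1 hash of its contents, in hex."""
    return hashlib.sha1(block).hexdigest()


class ToyStore:
    """In-memory stand-in for a Venti server (hypothetical, illustration only)."""

    def __init__(self):
        self.blocks = {}

    def write(self, block: bytes) -> str:
        s = score(block)
        # Identical contents always map to the same key, so a second
        # write of the same block is coalesced and nothing is overwritten.
        self.blocks.setdefault(s, block)
        return s

    def read(self, s: str) -> bytes:
        return self.blocks[s]


# The zero score is the score of the zero-length block.
ZERO_SCORE = score(b"")

store = ToyStore()
a = store.write(b"hello world")
b = store.write(b"hello world")
print(a == b, len(store.blocks))
print(ZERO_SCORE)
```

Because the key is derived from the contents, a client can also check any block it reads by recomputing the hash and comparing it with the score it asked for.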
.PP
Scores may have an optional
prefix, typically used to
describe the format of the data.
uses prefixes corresponding to the file system
.SS "Files and Directories"
Venti accepts blocks up to 56 kilobytes in size.
By convention, Venti clients use hash trees of blocks to
represent arbitrary-size data
.IR files .
The data to be stored is split into fixed-size
blocks and written to the server, producing a list
of scores.
The resulting list of scores is split into fixed-size pointer
blocks (using only an integral number of scores per block)
and written to the server, producing a smaller list
of scores.
The process continues, eventually ending with the
score for the hash tree's top-most block.
Each file stored this way is summarized by a
.B VtEntry
structure recording the top-most score, the depth
of the tree, the data block size, and the pointer block size.
One or more
.B VtEntry
structures can be concatenated
and stored as a special file called a
.IR directory .
In this
manner, arbitrary trees of files can be constructed
and stored.
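
The hash tree construction described above can be sketched as follows. This is a simplified illustration, not the library's implementation: a Python dict stands in for the server, the function names are hypothetical, zero truncation is omitted, and the returned depth simply counts pointer levels, which may not match the exact field encoding of a real entry structure.

```python
import hashlib

SCORE_SIZE = 20  # a SHA1 digest is 20 bytes


def write_block(store: dict, block: bytes) -> bytes:
    """Store a block under its SHA1 score and return the score."""
    s = hashlib.sha1(block).digest()
    store[s] = block
    return s


def write_file(store: dict, data: bytes, dsize: int, psize: int):
    """Store data as a hash tree; return (top-most score, depth)."""
    per_block = psize // SCORE_SIZE  # integral number of scores per pointer block
    if per_block < 2:
        raise ValueError("pointer blocks must hold at least two scores")
    # Level 0: fixed-size data blocks.
    scores = [write_block(store, data[i:i + dsize])
              for i in range(0, len(data), dsize)]
    if not scores:
        scores = [write_block(store, b"")]
    depth = 0
    # Each pass packs the current list of scores into pointer blocks,
    # producing a smaller list, until a single score remains.
    while len(scores) > 1:
        groups = [scores[i:i + per_block]
                  for i in range(0, len(scores), per_block)]
        scores = [write_block(store, b"".join(g)) for g in groups]
        depth += 1
    return scores[0], depth


def read_file(store: dict, top: bytes, depth: int) -> bytes:
    """Walk the tree from the top-most score and reassemble the data."""
    block = store[top]
    if depth == 0:
        return block
    children = [block[i:i + SCORE_SIZE]
                for i in range(0, len(block), SCORE_SIZE)]
    return b"".join(read_file(store, c, depth - 1) for c in children)


store = {}
data = bytes(range(256)) * 4          # 1024 bytes of sample data
top, depth = write_file(store, data, dsize=100, psize=60)
print(depth, read_file(store, top, depth) == data)
```

With 100-byte data blocks and three scores per pointer block, the 1024-byte sample needs 11 data blocks, then 4, 2, and 1 pointer blocks, so the tree has three pointer levels.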
.PP
Scores passed between programs conventionally refer
to
.B VtRoot
blocks, which contain descriptive information
as well as the score of a directory block containing a small number
of directory entries.
.PP
Conventionally, programs do not mix data and directory entries
in the same file.  Instead, they keep two separate files, one with
directory entries and one with metadata referencing those
entries by position.
Keeping this parallel representation is a minor annoyance
but makes it possible for general programs
to traverse the block tree without knowing the specific details
of any particular program's data.
To allow programs to traverse these structures without
needing to understand their higher-level meanings,
Venti tags each block with a type.  The types are:
.PP
.nf
.ft L
    VtDataType     000  \f1data\fL
    VtDataType+1   001  \fRscores of \fPVtDataType\fR blocks\fL
    VtDataType+2   002  \fRscores of \fPVtDataType+1\fR blocks\fL
    \fR\&...\fL
    VtDirType      010  VtEntry\fR structures\fL
    VtDirType+1    011  \fRscores of \fLVtDirType\fR blocks\fL
    VtDirType+2    012  \fRscores of \fLVtDirType+1\fR blocks\fL
    \fR\&...\fL
    VtRootType     020  VtRoot\fR structure\fL
.fi
.PP
The octal numbers listed are the type numbers used
by the commands below.
(For historical reasons, the type numbers used on
disk and on the wire are different from the above.
They do not distinguish
.BI VtDataType+ n
blocks from
.BI VtDirType+ n
blocks.)
To avoid storing the same short data blocks padded with
differing numbers of zeros, Venti clients working with fixed-size
blocks conventionally
`zero truncate' the blocks before writing them to the server.
For example, if a 1024-byte data block contains the
11-byte string
.RB ` hello " " world '
followed by 1013 zero bytes,
a client would store only the 11-byte block.
When the client later read the block from the server,
it would append zero bytes to the end as necessary to
reach the expected size.
.PP
When truncating pointer blocks
.RB ( VtDataType+ \fIn
and
.BI VtDirType+ n
blocks),
trailing zero scores are removed
instead of trailing zero bytes.
.PP
Because of the truncation convention,
any file consisting entirely of zero bytes,
no matter what its length, will be represented by the zero score:
the data blocks contain all zeros and are thus truncated
to the empty block, and the pointer blocks contain all zero scores
and are thus also truncated to the empty block,
and so on up the hash tree.
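
A minimal sketch of the truncation convention follows. The helper names are hypothetical; the zero score is computed as the SHA1 hash of the empty block, as defined earlier in this page.

```python
import hashlib

SCORE_SIZE = 20
ZERO_SCORE = hashlib.sha1(b"").digest()  # score of the zero-length block


def truncate_data(block: bytes) -> bytes:
    """Remove trailing zero bytes from a data block before writing."""
    return block.rstrip(b"\x00")


def pad_data(block: bytes, size: int) -> bytes:
    """Restore a data block to its expected size after reading."""
    return block.ljust(size, b"\x00")


def truncate_pointers(block: bytes) -> bytes:
    """Remove trailing zero scores from a pointer block before writing."""
    end = len(block)
    while end >= SCORE_SIZE and block[end - SCORE_SIZE:end] == ZERO_SCORE:
        end -= SCORE_SIZE
    return block[:end]


# The example from the text: 11 bytes of text followed by 1013 zero bytes.
block = b"hello world" + bytes(1013)
stored = truncate_data(block)
print(len(block), len(stored))

# An all-zero data block truncates to the empty block, whose score is the
# zero score; a pointer block holding only zero scores does the same.
all_zero_pointers = ZERO_SCORE * 3
print(hashlib.sha1(truncate_pointers(all_zero_pointers)).digest() == ZERO_SCORE)
```

Padding on read is the inverse operation, so `pad_data(stored, 1024)` reproduces the original 1024-byte block.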
A Venti session begins when a
.I client
connects to the network address served by a Venti
.IR server ;
the conventional address is
.BI tcp! server !venti \fR.
Both client and server begin by sending a version
string of the form
.BI venti- versions - comment \en \fR.
The
.I versions
field is a list of acceptable versions separated by
colons.
The protocol described here is version
The client is responsible for choosing a common
version and sending it in the
message, described below.
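
The version exchange can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, the versions list is assumed to be colon-separated, and the version numbers and comment in the sample line are made up for the example.

```python
def parse_version_string(line: str):
    """Split 'venti-versions-comment\\n' into (list of versions, comment)."""
    if not line.endswith("\n"):
        raise ValueError("version string must end with a newline")
    # Split on the first two hyphens only, so the comment may contain hyphens.
    parts = line[:-1].split("-", 2)
    if len(parts) != 3 or parts[0] != "venti":
        raise ValueError("not a Venti version string")
    return parts[1].split(":"), parts[2]


def choose_version(preferred, server_line: str):
    """Client side: pick the first preferred version the server also accepts."""
    offered, _comment = parse_version_string(server_line)
    for v in preferred:
        if v in offered:
            return v
    return None


# Sample line with made-up version numbers and comment.
server_line = "venti-04:02-example-server\n"
print(parse_version_string(server_line))
print(choose_version(["02"], server_line))
```

Returning None when no version is shared leaves the decision about how to fail (for example, closing the connection) to the caller.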
.PP
After the initial version exchange, the client transmits
.I requests
.RI ( T-messages )
to the server, which subsequently returns
.I replies
.RI ( R-messages )
to the client.
The combined act of transmitting (receiving) a request
of a particular type, and receiving (transmitting) its reply
is called a
.I transaction
of that type.
.PP
Each message consists of a sequence of bytes.
Two-byte fields hold unsigned integers represented
in big-endian order (most significant byte first).
Data items of variable lengths are represented by
a one-byte field specifying a count,
.IR n ,
followed by
.I n
bytes of data.
Text strings are represented similarly,
using a two-byte count with
the text itself stored as a UTF-encoded sequence
of Unicode characters.
Text strings are not
.SM NUL\c
-terminated:
.I n
counts the bytes of UTF data, which include no final
zero byte.
The
.SM NUL
character is illegal in text strings in the Venti protocol.
The maximum string length in Venti is 1024 bytes.
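
The two variable-length encodings described above might be implemented along these lines. The function names are hypothetical; the limits (one-byte count, two-byte big-endian count, 1024-byte maximum, no NUL) are taken from the text.

```python
import struct

MAX_STRING = 1024  # maximum string length in bytes, per the text


def pack_bytes(data: bytes) -> bytes:
    """Variable-length data item: one-byte count followed by the data."""
    if len(data) > 255:
        raise ValueError("data item too long for a one-byte count")
    return bytes([len(data)]) + data


def pack_string(text: str) -> bytes:
    """Text string: two-byte big-endian count followed by UTF-8 bytes."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("NUL is illegal in Venti strings")
    if len(data) > MAX_STRING:
        raise ValueError("string too long")
    return struct.pack(">H", len(data)) + data


def unpack_string(buf: bytes, offset: int = 0):
    """Decode a string at offset; return (text, offset just past it)."""
    (n,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    if n > MAX_STRING or start + n > len(buf):
        raise ValueError("bad string length")
    data = buf[start:start + n]
    if b"\x00" in data:
        raise ValueError("NUL is illegal in Venti strings")
    return data.decode("utf-8"), start + n


encoded = pack_string("héllo")   # 'é' takes two bytes in UTF-8
print(encoded)
print(unpack_string(encoded))
```

The count measures encoded bytes, not characters, which is why the five-character sample string carries a count of six.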
.PP
Each Venti message begins with a two-byte size field
specifying the length in bytes of the message,
not including the length field itself.
The next byte is the message type, one of the constants
in the enumeration in the include file.
The next byte is an identifying
.IR tag ,
used to match responses to requests.
The remaining bytes are parameters of different sizes.
In the message descriptions, the number of bytes in a field
is given in brackets after the field name.
The notation
.IR parameter [ n ]
where
.I n
is not a constant represents a variable-length parameter:
.IR n [1]
followed by
.I n
bytes of data forming the
.IR parameter .
The notation
.IR parameter []
where
.I parameter
is the last field in the message represents a
variable-length field that comprises all remaining
bytes in the message.
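
The message layout just described (two-byte size, one-byte type, one-byte tag, then parameters) can be sketched as follows. The function names are hypothetical, and the type number used in the example is a placeholder rather than a value taken from the include file.

```python
import struct

# Placeholder message type for the example only; real values come from
# the enumeration in the Venti include file.
EXAMPLE_TYPE = 2


def pack_message(msg_type: int, tag: int, params: bytes = b"") -> bytes:
    """Frame a message: size[2] type[1] tag[1] followed by the parameters."""
    body = bytes([msg_type, tag]) + params
    # The size counts everything after the size field itself.
    return struct.pack(">H", len(body)) + body


def unpack_message(buf: bytes):
    """Return (type, tag, params, remaining bytes) for the first message."""
    (size,) = struct.unpack_from(">H", buf, 0)
    if size < 2 or len(buf) < 2 + size:
        raise ValueError("short or malformed message")
    msg_type, tag = buf[2], buf[3]
    return msg_type, tag, buf[4:2 + size], buf[2 + size:]


msg = pack_message(EXAMPLE_TYPE, 7)
print(msg)
print(unpack_message(msg + b"extra"))
```

Returning the remaining bytes lets a caller decode several messages from one buffer, since the size field marks where each message ends.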
.PP
All Venti RPC messages are prefixed with a field
.IR size [2]
giving the length of the message that follows
(not including the
.I size
field itself).
The message bodies are:
.ta \w'\fLVtTgoodbye 'u
.PP
Each T-message has a one-byte
.I tag
field, chosen and used by the client to identify the message.
The server will echo the request's
.I tag
field in the reply.
Clients should arrange that no two outstanding
messages have the same tag field so that responses
can be distinguished.
.PP
The type of an R-message will either be one greater than
the type of the corresponding T-message or
an error type
indicating that the request failed.
In the latter case, the
reply contains a string describing the reason for failure.
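
A client might track outstanding tags and classify replies roughly as sketched below. The TagPool class is hypothetical, the numeric value of R_ERROR is a placeholder for the error reply type, and the sketch assumes all 256 one-byte tag values are usable.

```python
R_ERROR = 1  # placeholder for the error reply type; assumed value


class TagPool:
    """Hands out tags so that no two outstanding requests share one."""

    def __init__(self):
        self.free = list(range(256))  # every value of a one-byte tag
        self.pending = {}             # tag -> type of the outstanding T-message

    def start(self, t_type: int) -> int:
        """Reserve a tag for a new request of the given type."""
        if not self.free:
            raise RuntimeError("too many outstanding requests")
        tag = self.free.pop(0)
        self.pending[tag] = t_type
        return tag

    def finish(self, r_type: int, tag: int) -> str:
        """Match a reply to its request and classify it."""
        t_type = self.pending.pop(tag)  # KeyError if the tag is not outstanding
        self.free.append(tag)
        if r_type == t_type + 1:
            return "ok"
        if r_type == R_ERROR:
            return "error"
        raise ValueError("reply type does not match request")


pool = TagPool()
t1 = pool.start(12)   # 12 is an arbitrary request type for the example
t2 = pool.start(12)
print(t1 != t2)
print(pool.finish(13, t2), pool.finish(R_ERROR, t1))
```

Because replies are matched by tag rather than by arrival order, the second request can complete before the first, as in the example.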
.PP
Venti connections must begin with a
message contains the protocol
.I version
that the client has chosen to use.
could be used to add authentication, encryption,
and compression to the Venti session
but are currently ignored.
response are similarly ignored.
fields are intended to be the identity
of the client and server but, given the lack of
authentication, should be treated only as advisory.
transaction during the session.
.PP
The ping message has no effect and
is used mainly for debugging.
Servers should respond immediately to pings.
.PP
message requests a block with the given
to convert a block type enumeration value
used on disk and in the protocol.
field specifies the maximum expected size
in the reply is the block's contents.
.PP
message writes a new block of the given
The response includes the
.I score
to use to read the block,
which should be the SHA1 hash of
the block's contents.
.PP
The Venti server may buffer written blocks in memory,
waiting until after responding to the
write message before writing them to
permanent storage.
The server will delay the response to a
sync message until after all blocks in earlier
write messages have been written to permanent storage.
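
The buffering behaviour can be modelled with a small sketch. The BufferingStore class is hypothetical: two dicts stand in for the server's memory buffer and its permanent storage, and the method names are chosen for the example only.

```python
import hashlib


class BufferingStore:
    """Toy model of a server that defers writes to permanent storage."""

    def __init__(self):
        self.memory = {}  # blocks acknowledged but not yet durable
        self.disk = {}    # stand-in for permanent storage

    def write(self, block: bytes) -> str:
        s = hashlib.sha1(block).hexdigest()
        if s not in self.disk:
            self.memory[s] = block
        # The reply carrying the score may be sent before the block is durable.
        return s

    def sync(self) -> None:
        # The reply to a sync is delayed until every earlier write is durable.
        self.disk.update(self.memory)
        self.memory.clear()

    def read(self, s: str) -> bytes:
        if s in self.memory:
            return self.memory[s]
        return self.disk[s]


st = BufferingStore()
s = st.write(b"abc")
before = s in st.disk
st.sync()
after = s in st.disk
print(s)
print(before, after)
```

A client that needs durability therefore treats the reply to a write as an acknowledgement only, and waits for the reply to a later sync before discarding its own copy of the data.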
.PP
message ends a session.  There is no
message, the server terminates the connection.
Sean Quinlan and Sean Dorward,
``Venti: a new approach to archival storage'',
.I "Usenix Conference on File and Storage Technologies" ,