commit b7fcec2b293376a394f9dd12165aaa9edd51dd9a
from: rsc
date: Thu Jul 14 14:55:52 2005 UTC
gone
commit - c6f92061c02e35c84a168edfc23466d56aa2543a
commit + b7fcec2b293376a394f9dd12165aaa9edd51dd9a
blob - 000d8aa4beabb01ad9ffc8a8a5f9fa17169f10ad (mode 644)
blob + /dev/null
--- man/man7/venti.conf.7
+++ /dev/null
@@ -1,360 +0,0 @@
-.TH VENTI.CONF 7
-.SH NAME
-venti.conf \- venti configuration
-.SH DESCRIPTION
-Venti is a SHA1-addressed archival storage server.
-See
-.IR venti (7)
-for a full introduction to the system.
-This page documents the structure and operation of the server.
-.PP
-A venti server requires multiple disks or disk partitions,
-each of which must be properly formatted before the server
-can be run.
-.SS Disk
-The venti server maintains three disk structures, typically
-stored on raw disk partitions:
-the append-only
-.IR "data log" ,
-which holds, in sequential order,
-the contents of every block written to the server;
-the
-.IR index ,
-which helps locate a block in the data log given its score;
-and optionally the
-.IR "bloom filter" ,
-a concise summary of which scores are present in the index.
-The data log is the primary storage.
-To improve the robustness, it should be stored on
-a device that provides RAID functionality.
-The index and the bloom filter are optimizations
-employed to access the data log efficiently and can be rebuilt
-if lost or damaged.
-.PP
-The data log is logically split into sections called
-.IR arenas ,
-typically sized for easy offline backup
-(e.g., 500MB).
-A data log may comprise many disks, each storing
-one or more arenas.
-Such disks are called
-.IR "arena partitions" .
-Arena partitions are filled in the order given in the configuration.
-.PP
-The index is logically split into block-sized pieces called
-.IR buckets ,
-each of which is responsible for a particular range of scores.
-An index may be split across many disks, each storing many buckets.
-Such disks are called
-.IR "index sections" .
-.PP
-The index must be sized so that no bucket is full.
-When a bucket fills, the server must be shut down and
-the index made larger.
-Since scores appear random, each bucket will contain
-approximately the same number of entries.
-Index entries are 40 bytes long. Assuming that a typical block
-being written to the server is 8192 bytes and compresses to 4096
-bytes, the active index is expected to be about 1% of
-the active data log.
-Storing smaller blocks increases the relative index footprint;
-storing larger blocks decreases it.
-To allow variation in both block size and the random distribution
-of scores to buckets, the suggested index size is 5% of
-the active data log.
-.PP
-The (optional) bloom filter is a large bitmap that is stored on disk but
-also kept completely in memory while the venti server runs.
-It helps the venti server efficiently detect scores that are
-.I not
-already stored in the index.
-The bloom filter starts out zeroed.
-Each score recorded in the bloom filter is hashed to choose
-.I nhash
-bits to set in the bloom filter.
-A score is definitely not stored in the index of any of its
-.I nhash
-bits are not set.
-The bloom filter thus has two parameters:
-.I nhash
-(maximum 32)
-and the total bitmap size
-(maximum 512MB, 2\s-2\u32\d\s+2 bits).
-.PP
-The bloom filter should be sized so that
-.I nhash
-\(ti
-.I nblock
-\(ti
-0.7
-\(<=
-0.7 \(ti
-.IR b ,
-where
-.I nblock
-is the expected number of blocks stored on the server
-and
-.I b
-is the bitmap size in bits.
-The false positive rate of the bloom filter when sized
-this way is approximately 2\s-2\u\-\fInblock\fR\d\s+2.
-.I Nhash
-less than 10 are not very useful;
-.I nhash
-greater than 24 are probably a waste of memory.
-.I Fmtbloom
-(see
-.IR venti-fmt (8))
-can be given either
-.I nhash
-or
-.IR nblock ;
-if given
-.IR nblock ,
-it will derive an appropriate
-.IR nhash .
-.SS Memory
-Venti can make effective use of large amounts of memory
-for various caches.
-.PP
-The
-.I "lump cache
-holds recently-accessed venti data blocks, which the server refers to as
-.IR lumps .
-The lump cache should be at least 1MB but can profitably be much larger.
-The lump cache can be thought of as the level-1 cache:
-read requests handled by the lump cache can
-be served instantly.
-.PP
-The
-.I "block cache
-holds recently-accessed
-.I disk
-blocks from the arena partitions.
-The block cache needs to be able to simultaneously hold two blocks
-from each arena plus four blocks for the currently-filling arena.
-The block cache can be thought of as the level-2 cache:
-read requests handled by the block cache are slower than those
-handled by the lump cache, since the lump data must be extracted
-from the raw disk blocks and possibly decompressed, but no
-disk accesses are necessary.
-.PP
-The
-.I "index cache
-holds recently-accessed or prefetched
-index entries.
-The index cache needs to be able to hold index entries
-for three or four arenas, at least, in order for prefetching
-to work properly. Each index entry is 50 bytes.
-Assuming 500MB arenas of
-128,000 blocks that are 4096 bytes each after compression,
-the minimum index cache size is about 6MB.
-The index cache can be thought of as the level-3 cache:
-read requests handled by the index cache must still go
-to disk to fetch the arena blocks, but the costly random
-access to the index is avoided.
-.PP
-The size of the index cache determines how long venti
-can sustain its `burst' write throughput, during which time
-the only disk accesses on the critical path
-are sequential writes to the arena partitions.
-For example, if you want to be able to sustain 10MB/s
-for an hour, you need enough index cache to hold entries
-for 36GB of blocks. Assuming 8192-byte blocks,
-you need room for almost five million index entries.
-Since index entries are 50 bytes each, you need 250MB
-of index cache.
-If the background index update process can make a single
-pass through the index in an hour, which is possible,
-then you can sustain the 10MB/s indefinitely (at least until
-the arenas are all filled).
-.PP
-The
-.I "bloom filter
-requires memory equal to its size on disk,
-as discussed above.
-.PP
-A reasonable starting allocation is to
-divide memory equally (in thirds) between
-the bloom filter, the index cache, and the lump and block caches;
-the third of memory allocated to the lump and block caches
-should be split unevenly, with more (say, two thirds)
-going to the block cache.
-.SS Network
-The venti server announces two network services, one
-(conventionally TCP port
-.BR venti ,
-17034) serving
-the venti protocol as described in
-.IR venti (7),
-and one serving HTTP
-(conventionally TCP port
-.BR venti ,
-80).
-.PP
-The venti web server provides the following
-URLs for accessing status information:
-.TP
-.B /index
-A summary of the usage of the arenas and index sections.
-.TP
-.B /xindex
-An XML version of
-.BR /index .
-.TP
-.B /storage
-Brief storage totals.
-.TP
-.BI /set/ variable
-The current integer value of
-.IR variable .
-Variables are:
-.BR compress ,
-whether or not to compress blocks
-(for debugging);
-.BR logging ,
-whether to write entries to the debugging logs;
-.BR stats ,
-whether to collect run-time statistics;
-.BR icachesleeptime ,
-the time in milliseconds between successive updates
-of megabytes of the index cache;
-.BR arenasumsleeptime ,
-the time in milliseconds between reads while
-checksumming an arena in the background.
-The two sleep times should be (but are not) managed by venti;
-they exist to provide more experience with their effects.
-The other variables exist only for debugging and
-performance measurement.
-.TP
-.BI /set/ variable / value
-Set
-.I variable
-to
-.IR value .
-.TP
-.BI /graph/ name / param / param / \fR...
-A PNG image graphing the named run-time statistic over time.
-The details of names and parameters are undocumented;
-see
-.B httpd.c
-in the venti sources.
-.TP
-.B /log
-A list of all debugging logs present in the server's memory.
-.TP
-.BI /log/ name
-The contents of the debugging log with the given
-.IR name .
-.TP
-.B /flushicache
-Force venti to begin flushing the index cache to disk.
-The request response will not be sent until the flush
-has completed.
-.TP
-.B /flushdcache
-Force venti to begin flushing the arena block cache to disk.
-The request response will not be sent until the flush
-has completed.
-.PD
-.PP
-Requests for other files are served by consulting a
-directory named in the configuration file
-(see
-.B webroot
-below).
-.SS Configuration File
-A venti configuration file
-enumerates the various index sections and
-arenas that constitute a venti system.
-The components are indicated by the name of the file, typically
-a disk partition, in which they reside. The configuration
-file is the only location that file names are used. Internally,
-venti uses the names assigned when the components were formatted
-with
-.I fmtarenas
-or
-.I fmtisect
-(see
-.IR venti-fmt (8)).
-In particular, only the configuration needs to be
-changed if a component is moved to a different file.
-.PP
-The configuration file consists of lines in the form described below.
-Lines starting with
-.B #
-are comments.
-.TP
-.BI index " name
-Names the index for the system.
-.TP
-.BI arenas " file
-.I File
-is an arena partition, formatted using
-.IR fmtarenas .
-.TP
-.BI isect " file
-.I File
-is an index section, formatted using
-.IR fmtisect .
-.PP
-After formatting a venti system using
-.IR fmtindex ,
-the order of arenas and index sections should not be changed.
-Additional arenas can be appended to the configuration;
-run
-.I fmtindex
-with the
-.B -a
-flag to update the index.
-.PP
-The configuration file also holds configuration parameters
-for the venti server itself.
-These are:
-.TF httpaddr netaddr
-.TP
-.BI mem " size
-lump cache size
-.TP
-.BI bcmem " size
-block cache size
-.TP
-.BI icmem " size
-index cache size
-.TP
-.BI addr " netaddr
-network address to announce venti service
-(default
-.BR tcp!*!venti )
-.TP
-.BI httpaddr " netaddr
-network address to announce HTTP service
-(default
-.BR tcp!*!http )
-.TP
-.B queuewrites
-queue writes in memory
-(default is not to queue)
-.PD
-See the server description in
-.IR venti (8)
-for explanations of these variables.
-.SH EXAMPLE
-.IP
-.EX
-index main
-isect /tmp/disks/isect0
-isect /tmp/disks/isect1
-arenas /tmp/disks/arenas
-mem 10M
-bcmem 20M
-icmem 30M
-.EE
-.SH "SEE ALSO"
-.IR venti (8),
-.IR venti-fmt (8)
-.SH BUGS
-Setting up a venti server is too complicated.
-.PP
-Venti should not require the user to decide how to
-partition its memory usage.