Commit Diff


commit - c6f92061c02e35c84a168edfc23466d56aa2543a
commit + b7fcec2b293376a394f9dd12165aaa9edd51dd9a
blob - 000d8aa4beabb01ad9ffc8a8a5f9fa17169f10ad (mode 644)
blob + /dev/null
--- man/man7/venti.conf.7
+++ /dev/null
@@ -1,360 +0,0 @@
-.TH VENTI.CONF 7
-.SH NAME
-venti.conf  \- venti configuration
-.SH DESCRIPTION
-Venti is a SHA1-addressed archival storage server.
-See 
-.IR venti (7)
-for a full introduction to the system.
-This page documents the structure and operation of the server.
-.PP
-A venti server requires multiple disks or disk partitions,
-each of which must be properly formatted before the server
-can be run.
-.SS Disk 
-The venti server maintains three disk structures, typically
-stored on raw disk partitions:
-the append-only
-.IR "data log" ,
-which holds, in sequential order,
-the contents of every block written to the server;
-the 
-.IR index ,
-which helps locate a block in the data log given its score;
-and optionally the 
-.IR "bloom filter" ,
-a concise summary of which scores are present in the index.
-The data log is the primary storage.
-To improve robustness, it should be stored on
-a device that provides RAID functionality.
-The index and the bloom filter are optimizations 
-employed to access the data log efficiently and can be rebuilt
-if lost or damaged.
-.PP
-The data log is logically split into sections called
-.IR arenas ,
-typically sized for easy offline backup
-(e.g., 500MB).
-A data log may comprise many disks, each storing
-one or more arenas.
-Such disks are called
-.IR "arena partitions" .
-Arena partitions are filled in the order given in the configuration.
-.PP
-The index is logically split into block-sized pieces called
-.IR buckets ,
-each of which is responsible for a particular range of scores.
-An index may be split across many disks, each storing many buckets.
-Such disks are called
-.IR "index sections" .
-.PP
-The index must be sized so that no bucket is full.
-When a bucket fills, the server must be shut down and
-the index made larger.
-Since scores appear random, each bucket will contain
-approximately the same number of entries.
-Index entries are 40 bytes long.  Assuming that a typical block
-being written to the server is 8192 bytes and compresses to 4096
-bytes, the active index is expected to be about 1% of
-the active data log.
-Storing smaller blocks increases the relative index footprint;
-storing larger blocks decreases it.
-To allow variation in both block size and the random distribution
-of scores to buckets, the suggested index size is 5% of
-the active data log.
-.PP
-The (optional) bloom filter is a large bitmap that is stored on disk but
-also kept completely in memory while the venti server runs.
-It helps the venti server efficiently detect scores that are
-.I not
-already stored in the index.
-The bloom filter starts out zeroed.
-Each score recorded in the bloom filter is hashed to choose
-.I nhash
-bits to set in the bloom filter.
-A score is definitely not stored in the index if any of its
-.I nhash 
-bits are not set.
-The bloom filter thus has two parameters: 
-.I nhash
-(maximum 32)
-and the total bitmap size 
-(maximum 512MB, 2\s-2\u32\d\s+2 bits).
-.PP
-The bloom filter should be sized so that
-.I nhash
-\(mu
-.I nblock
-\(<=
-0.7
-\(mu
-.IR b ,
-where
-.I nblock
-is the expected number of blocks
-stored on the server
-and
-.I b
-is the bitmap size in bits.
-The false positive rate of the bloom filter when sized
-this way is approximately 2\s-2\u\-\fInhash\fR\d\s+2.
-Values of
-.I nhash
-less than 10 are not very useful;
-values greater than 24 are probably a waste of memory.
-.I Fmtbloom
-(see
-.IR venti-fmt (8))
-can be given either
-.I nhash
-or
-.IR nblock ;
-if given
-.IR nblock ,
-it will derive an appropriate
-.IR nhash .
-.SS Memory
-Venti can make effective use of large amounts of memory
-for various caches.
-.PP
-The
-.I "lump cache
-holds recently-accessed venti data blocks, which the server refers to as 
-.IR lumps .
-The lump cache should be at least 1MB but can profitably be much larger.
-The lump cache can be thought of as the level-1 cache:
-read requests handled by the lump cache can
-be served instantly.
-.PP
-The
-.I "block cache
-holds recently-accessed
-.I disk
-blocks from the arena partitions.
-The block cache needs to be able to simultaneously hold two blocks
-from each arena plus four blocks for the currently-filling arena.
-The block cache can be thought of as the level-2 cache:
-read requests handled by the block cache are slower than those
-handled by the lump cache, since the lump data must be extracted
-from the raw disk blocks and possibly decompressed, but no
-disk accesses are necessary.
-.PP
-The
-.I "index cache
-holds recently-accessed or prefetched
-index entries.
-The index cache needs to be able to hold index entries
-for three or four arenas, at least, in order for prefetching
-to work properly.  Each index entry is 50 bytes.
-Assuming 500MB arenas of
-128,000 blocks that are 4096 bytes each after compression,
-each arena needs about 6MB of index cache, so the minimum is roughly 20MB.
-The index cache can be thought of as the level-3 cache:
-read requests handled by the index cache must still go
-to disk to fetch the arena blocks, but the costly random
-access to the index is avoided.
-.PP
-The size of the index cache determines how long venti
-can sustain its `burst' write throughput, during which time
-the only disk accesses on the critical path
-are sequential writes to the arena partitions.
-For example, if you want to be able to sustain 10MB/s
-for an hour, you need enough index cache to hold entries
-for 36GB of blocks.  Assuming 8192-byte blocks,
-you need room for almost five million index entries.
-Since index entries are 50 bytes each, you need 250MB
-of index cache.
-If the background index update process can make a single
-pass through the index in an hour, which is possible,
-then you can sustain the 10MB/s indefinitely (at least until
-the arenas are all filled).
-.PP
-The
-.I "bloom filter
-requires memory equal to its size on disk,
-as discussed above.
-.PP
-A reasonable starting allocation is to
-divide memory equally (in thirds) between
-the bloom filter, the index cache, and the lump and block caches;
-the third of memory allocated to the lump and block caches 
-should be split unevenly, with more (say, two thirds)
-going to the block cache.
-.SS Network
-The venti server announces two network services, one 
-(conventionally TCP port 
-.BR venti ,
-17034) serving
-the venti protocol as described in
-.IR venti (7),
-and one serving HTTP
-(conventionally TCP port 
-.BR http ,
-80).
-.PP
-The venti web server provides the following 
-URLs for accessing status information:
-.TP
-.B /index
-A summary of the usage of the arenas and index sections.
-.TP
-.B /xindex
-An XML version of
-.BR /index .
-.TP
-.B /storage
-Brief storage totals.
-.TP
-.BI /set/ variable
-The current integer value of
-.IR variable .
-Variables are:
-.BR compress ,
-whether or not to compress blocks
-(for debugging);
-.BR logging ,
-whether to write entries to the debugging logs;
-.BR stats ,
-whether to collect run-time statistics;
-.BR icachesleeptime ,
-the time in milliseconds between successive updates
-of megabytes of the index cache;
-.BR arenasumsleeptime ,
-the time in milliseconds between reads while
-checksumming an arena in the background.
-The two sleep times should be (but are not) managed by venti;
-they exist to provide more experience with their effects.
-The other variables exist only for debugging and
-performance measurement.
-.TP
-.BI /set/ variable / value
-Set
-.I variable
-to
-.IR value .
-.TP
-.BI /graph/ name / param / param / \fR...
-A PNG image graphing the named run-time statistic over time.
-The details of names and parameters are undocumented;
-see
-.B httpd.c
-in the venti sources.
-.TP
-.B /log
-A list of all debugging logs present in the server's memory.
-.TP
-.BI /log/ name
-The contents of the debugging log with the given
-.IR name .
-.TP
-.B /flushicache
-Force venti to begin flushing the index cache to disk.
-The request response will not be sent until the flush
-has completed.
-.TP
-.B /flushdcache
-Force venti to begin flushing the arena block cache to disk.
-The request response will not be sent until the flush
-has completed.
-.PD
-.PP
-Requests for other files are served by consulting a
-directory named in the configuration file
-(see
-.B webroot
-below).
-.SS Configuration File
-A venti configuration file 
-enumerates the various index sections and
-arenas that constitute a venti system.
-The components are indicated by the name of the file, typically
-a disk partition, in which they reside.  The configuration
-file is the only place where file names are used.  Internally,
-venti uses the names assigned when the components were formatted
-with 
-.I fmtarenas
-or 
-.I fmtisect
-(see
-.IR venti-fmt (8)).
-In particular, only the configuration needs to be
-changed if a component is moved to a different file.
-.PP
-The configuration file consists of lines in the form described below.
-Lines starting with
-.B #
-are comments.
-.TP
-.BI index " name
-Names the index for the system.
-.TP
-.BI arenas " file
-.I File
-is an arena partition, formatted using
-.IR fmtarenas .
-.TP
-.BI isect " file
-.I File
-is an index section, formatted using
-.IR fmtisect .
-.PP
-After formatting a venti system using
-.IR fmtindex ,
-the order of arenas and index sections should not be changed.
-Additional arenas can be appended to the configuration;
-run
-.I fmtindex
-with the
-.B -a
-flag to update the index.
-.PP
-The configuration file also holds configuration parameters
-for the venti server itself.
-These are:
-.TF httpaddr netaddr
-.TP
-.BI mem " size
-lump cache size
-.TP
-.BI bcmem " size
-block cache size
-.TP
-.BI icmem " size
-index cache size
-.TP
-.BI addr " netaddr
-network address to announce venti service
-(default
-.BR tcp!*!venti )
-.TP
-.BI httpaddr " netaddr
-network address to announce HTTP service
-(default
-.BR tcp!*!http )
-.TP
-.B queuewrites
-queue writes in memory
-(default is not to queue)
-.PD
-See the server description in
-.IR venti (8)
-for explanations of these variables.
-.SH EXAMPLE
-.IP
-.EX
-index main
-isect /tmp/disks/isect0
-isect /tmp/disks/isect1
-arenas /tmp/disks/arenas
-mem 10M
-bcmem 20M
-icmem 30M
-.EE
-.SH "SEE ALSO"
-.IR venti (8),
-.IR venti-fmt (8)
-.SH BUGS
-Setting up a venti server is too complicated.
-.PP
-Venti should not require the user to decide how to
-partition its memory usage.
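
The sizing rules in the removed page (index footprint, bloom filter hash count, index cache for burst writes) reduce to simple arithmetic. The Python sketch below is not part of this commit or of venti; the function names are hypothetical. It takes the 40-byte and 50-byte entry sizes from the page, and it assumes the bloom filter rule reads nhash x nblock <= 0.7 x b.

```python
import math

INDEX_ENTRY_DISK = 40    # bytes per index entry on disk (figure from the page)
INDEX_ENTRY_CACHE = 50   # bytes per index cache entry (figure from the page)


def index_fraction(stored_block_bytes):
    """Expected index size as a fraction of the active data log."""
    return INDEX_ENTRY_DISK / stored_block_bytes


def bloom_nhash(bitmap_bits, nblock, max_nhash=32):
    """Largest nhash with nhash * nblock <= 0.7 * bitmap_bits (assumed rule),
    capped at the page's stated maximum of 32."""
    return min(max_nhash, math.floor(0.7 * bitmap_bits / nblock))


def icache_bytes_for_burst(rate_bytes_per_s, seconds, block_bytes):
    """Index cache needed to hold entries for every block written in a burst."""
    entries = math.ceil(rate_bytes_per_s * seconds / block_bytes)
    return entries * INDEX_ENTRY_CACHE


if __name__ == "__main__":
    # 8192-byte blocks that compress to 4096 bytes on disk.
    print(f"index / data log: {index_fraction(4096):.2%}")
    # Largest bitmap (2^32 bits) with a hypothetical 100 million blocks.
    print("nhash:", bloom_nhash(2**32, 100_000_000))
    # 10MB/s sustained for one hour with 8192-byte blocks.
    mb = icache_bytes_for_burst(10_000_000, 3600, 8192) / 1e6
    print(f"index cache for burst: {mb:.0f} MB")
```

With the page's burst example (10MB/s for an hour, 8192-byte blocks) this gives about 4.4 million entries and roughly 220MB, which the page rounds up to "almost five million" entries and 250MB.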