commit bdf5b5cde1d463ce11b1e62eba68dab25f5b7422 from: rsc date: Wed Jul 13 13:23:47 2005 UTC new man pages commit - df03d60c047a3010f800b26883763764be8d5744 commit + bdf5b5cde1d463ce11b1e62eba68dab25f5b7422 blob - /dev/null blob + d707091bfb995efbf6f19552cc8358a44fbd72a2 (mode 644) --- /dev/null +++ man/man8/venti-backup.8 @@ -0,0 +1,106 @@ +.TH VENTI-BACKUP 8 +.SH NAME +rdarena, wrarena \- copy arenas between venti servers +.SH SYNOPSIS +.PP +.B venti/rdarena +[ +.B -v +] +.I arenapart +.I arenaname +.PP +.B venti/wrarena +[ +.B -o +.I fileoffset +] +[ +.B -h +.I host +] +.I arenafile +[ +.I clumpoffset +] +.SH DESCRIPTION +.PP +.I Rdarena +extracts the named +.I arena +from the arena partition +.I arenapart +and writes this arena to standard output. +This command is typically used to back up an arena to external media. +The +.B -v +option generates more verbose output on standard error. +.PP +.I Wrarena +writes the blocks contained in the arena +.I arenafile +(typically, the output of +.IR rdarena ) +to a Venti server. +It is typically used to reinitialize a Venti server from backups of the arenas. +For example, +.IP +.EX +venti/rdarena /dev/sdC0/arenas arena.0 >external.media +venti/wrarena -h venti2 external.media +.EE +.LP +writes the blocks contained in +.B arena.0 +to the Venti server +.B venti2 +(typically not the one using +.BR /dev/sdC0/arenas ). +.PP +The +.B -o +option specifies that the arena starts at byte +.I fileoffset +(default +.BR 0 ) +in +.I arenafile . +This is useful for reading directly from +the Venti arena partition: +.IP +.EX +venti/wrarena -h venti2 -o 335872 /dev/sdC0/arenas +.EE +.LP +(In this example, 335872 is the offset shown in the Venti +server's index list (344064) minus one block (8192). +You will need to substitute your own arena offsets +and block size.) +.PP +Finally, the optional +.I offset +argument specifies that the writing should begin with the +clump starting at +.I offset +within the arena. +.I Wrarena +prints the offset it stopped at (because there were no more data blocks). +This could be used to incrementally back up a Venti server +to another Venti server: +.IP +.EX +last=`{cat last} +venti/wrarena -h venti2 -o 335872 /dev/sdC0/arenas $last >output +awk '/^end offset/ { print $3 }' offset >last +.EE +.LP +Of course, one would need to add wrapper code to keep track +of which arenas have been processed. +See +.B /sys/src/cmd/venti/backup.example +for a version that does this. +.SH SOURCE +.B \*9/src/cmd/venti/srv +.SH SEE ALSO +.IR venti (7), +.IR venti (8) blob - /dev/null blob + 4c1303312f2dfdf6330293a1b58e19a703b96642 (mode 644) --- /dev/null +++ man/man8/venti-fmt.8 @@ -0,0 +1,346 @@ +.TH VENTI-FMT 8 +.SH NAME +buildindex, +checkarenas, +checkindex, +conf, +fmtarenas, +fmtindex, +fmtisect, +syncindex \- prepare and maintain a venti server +.SH SYNOPSIS +.PP +.B venti/fmtarenas +[ +.B -Z +] +[ +.B -a +.I arenasize +] +[ +.B -b +.I blocksize +] +.I name +.I file +.PP +.B venti/fmtisect +[ +.B -1Z +] +[ +.B -b +.I blocksize +] +.I name +.I file +.PP +.B venti/fmtindex +[ +.B -a +] +.I venti.conf +.PP +.B venti/conf +[ +.B -w +] +.I partition +[ +.I configfile +] +.if t .sp 0.5 +.PP +.B venti/buildindex +[ +.B -B +.I blockcachesize +] +[ +.B -Z +] +.I venti.conf +.I tmp +.PP +.B venti/checkindex +[ +.B -f +] +[ +.B -B +.I blockcachesize +] +.I venti.conf +.I tmp +.PP +.B venti/checkarenas +[ +.B -afv +] +.I file +.PP +.B venti/copy +[ +.B -f +] +.I src +.I dst +.I score +[ +.I type +] +.SH DESCRIPTION +These commands aid in the setup, maintenance, and debugging of +venti servers. +See +.IR venti (7) +for an overview of the venti system and +.IR venti (8) +for an overview of the data structures used by the venti server. +.PP +Note that the units for the various sizes in the following +commands can be specified by appending +.LR k , +.LR m , +or +.LR g +to indicate kilobytes, megabytes, or gigabytes respectively. +.SS Formatting +To prepare a server for its initial use, the arena partitions and +the index sections must be formatted individually, with +.I fmtarenas +and +.IR fmtisect . +Then the +collection of index sections must be combined into a venti +index with +.IR fmtindex . +.PP +.I Fmtarenas +formats the given +.IR file , +typically a disk partition, into an arena partition. +The arenas in the partition are given names of the form +.IR name%d , +where +.I %d +is replaced with a sequential number starting at 0. +.PP +Options to +.I fmtarenas +are: +.TP +.BI -a " arenasize +The arenas are of +.I arenasize +bytes. The default is +.BR 512M , +which was selected to provide a balance +between the number of arenas and the ability to copy an arena to external +media such as recordable CDs and tapes. +.TP +.BI -b " blocksize +The size, in bytes, for read and write operations to the file. +The size is recorded in the file, and is used by applications that access the arenas. +The default is +.BR 8k . +.TP +.B -4 +Create a `version 4' arena partition for backwards compatibility with old servers. +The default is version 5, used by the current venti server. +.TP +.B -Z +Do not zero the data sections of the arenas. +Using this option reduces the formatting time +but should only be used when it is known that the file was already zeroed. +(Version 4 only; version 5 sections are not and do not need to be zeroed.) +.PD +.PP +.I Fmtisect +formats the given +.IR file , +typically a disk partition, as a venti index section with the specified +.IR name . +Each of the index sections in a venti configuration must have a unique name. +.PP +Options to +.I fmtisect +are: +.TP +.BI -b " bucketsize +The size of an index bucket, in bytes. +All the index sections within a index must have the same bucket size. +The default is +.BR 8k . +.TP +.B -1 +Create a `version 1' index section for backwards compatibility with old servers. +The default is version 2, used by the current venti server. +.TP +.B -Z +Do not zero the index. +Using this option reduces the formatting time +but should only be used when it is known that the file was already zeroed. +(Version 1 only; version 2 sections are not and do not need to be zeroed.) +.PD +.I Fmtindex +reads the configuration file +.I venti.conf +and initializes the index sections to form a usable index structure. +The arena files and index sections must have previously been formatted +using +.I fmtarenas +and +.I fmtisect +respectively. +.PP +The function of a venti index is to map a SHA1 fingerprint to a location +in the data section of one of the arenas. The index is composed of +blocks, each of which contains the mapping for a fixed range of possible +fingerprint values. +.I Fmtindex +determines the mapping between SHA1 values and the blocks +of the collection of index sections. Once this mapping has been determined, +it cannot be changed without rebuilding the index. +The basic assumption in the current implementation is that the index +structure is sufficiently empty that individual blocks of the index will rarely +overflow. The total size of the index should be about 2% to 10% of +the total size of the arenas, but the exact percentage depends both on the +index block size and the compressed size of blocks stored. +See the discussion in +.IR venti (8) +for more. +.PP +.I Fmtindex +also computes a mapping between a linear address space and +the data section of the collection of arenas. The +.B -a +option can be used to add additional arenas to an index. +To use this feature, +add the new arenas to +.I venti.conf +after the existing arenas and then run +.I fmtindex +.BR -a . +.PP +A copy of the above mappings is stored in the header for each of the index sections. +These copies enable +.I buildindex +to restore a single index section without rebuilding the entire index. +.PP +To make it easier to bootstrap servers, the configuration +file can be stored in otherwise empty space +at the beginning of any venti partitions using +.IR conf . +A partition so branded with a configuration file can +be used in place of a configuration file when invoking any +of the venti commands. +By default, +.I conf +prints the configuration stored in +.IR partition . +When invoked with the +.B -w +flag, +.I conf +reads a configuration file from +.I configfile +(or else standard input) +and stores it in +.IR partition . +.SS Checking and Rebuilding +.PP +.I Buildindex +populates the index for the Venti system described in +.IR venti.conf . +The index must have previously been formatted using +.IR fmtindex . +This command is typically used to build a new index for a Venti +system when the old index becomes too small, or to rebuild +an index after media failure. +Small errors in an index can usually be fixed with +.IR checkindex . +.PP +The +.I tmp +file, usually a disk partition, must be large enough to store a copy of the index. +This temporary space is used to perform a merge sort of index entries +generated by reading the arenas. +.PP +Options to +.I buildindex +are: +.TP +.BI -B " blockcachesize +The amount of memory, in bytes, to use for caching raw disk accesses while running +.IR buildindex . +(This is not a property of the created index.) +The default is 8k. +.TP +.B -Z +Do not zero the index. +This option should only be used when it is known that the index was already zeroed. +(Version 1 indexes only; see the discussion in +.I fmtindex +above.) +.PD +.PP +.I Checkindex +examines the Venti index described in +.IR venti.conf . +The program detects various error conditions including: +blocks that are not indexed, index entries for blocks that do not exist, +and duplicate index entries. +If requested, an attempt can be made to fix errors that are found. +.PP +The +.I tmp +file, usually a disk partition, must be large enough to store a copy of the index. +This temporary space is used to perform a merge sort of index entries +generated by reading the arenas. +.PP +Options to +.I checkindex +are: +.TP +.BI -B " blockcachesize +The amount of memory, in bytes, to use for caching raw disk accesses while running +.IR checkindex . +The default is 8k. +.TP +.B -f +Attempt to fix any errors that are found. +.PD +.PP +.I Checkarenas +examines the Venti arenas contained in the given +.IR file . +The program detects various error conditions, and optionally attempts +to fix any errors that are found. +.PP +Options to +.I checkarenas +are: +.TP +.B -a +For each arena, scan the entire data section. +If this option is omitted, only the end section of +the arena is examined. +.TP +.B -f +Attempt to fix any errors that are found. +.TP +.B -v +Increase the verbosity of output. +.PD +.SH SOURCE +.B \*9/src/cmd/venti/srv +.SH SEE ALSO +.IR venti (7), +.IR venti (8) +.SH BUGS +.I Buildindex +should allow an individual index section to be rebuilt. +The merge sort could be performed in the space used to store the +index rather than requiring a temporary file. blob - /dev/null blob + 2327529fe60730984d4e5761ee91e3af21c4567a (mode 644) --- /dev/null +++ man/man8/venti.8 @@ -0,0 +1,431 @@ +.TH VENTI 8 +.SH NAME +venti.conf \- venti configuration +.SH DESCRIPTION +Venti is a SHA1-addressed archival storage server. +See +.IR venti (7) +for a full introduction to the system. +This page documents the structure and operation of the server. +.PP +A venti server requires multiple disks or disk partitions, +each of which must be properly formatted before the server +can be run. +.SS Disk +The venti server maintains three disk structures, typically +stored on raw disk partitions: +the append-only +.IR "data log" , +which holds, in sequential order, +the contents of every block written to the server; +the +.IR index , +which helps locate a block in the data log given its score; +and optionally the +.IR "bloom filter" , +a concise summary of which scores are present in the index. +The data log is the primary storage. +To improve the robustness, it should be stored on +a device that provides RAID functionality. +The index and the bloom filter are optimizations +employed to access the data log efficiently and can be rebuilt +if lost or damaged. +.PP +The data log is logically split into sections called +.IR arenas , +typically sized for easy offline backup +(e.g., 500MB). +A data log may comprise many disks, each storing +one or more arenas. +Such disks are called +.IR "arena partitions" . +Arena partitions are filled in the order given in the configuration. +.PP +The index is logically split into block-sized pieces called +.IR buckets , +each of which is responsible for a particular range of scores. +An index may be split across many disks, each storing many buckets. +Such disks are called +.IR "index sections" . +.PP +The index must be sized so that no bucket is full. +When a bucket fills, the server must be shut down and +the index made larger. +Since scores appear random, each bucket will contain +approximately the same number of entries. +Index entries are 40 bytes long. Assuming that a typical block +being written to the server is 8192 bytes and compresses to 4096 +bytes, the active index is expected to be about 1% of +the active data log. +Storing smaller blocks increases the relative index footprint; +storing larger blocks decreases it. +To allow variation in both block size and the random distribution +of scores to buckets, the suggested index size is 5% of +the active data log. +.PP +The (optional) bloom filter is a large bitmap that is stored on disk but +also kept completely in memory while the venti server runs. +It helps the venti server efficiently detect scores that are +.I not +already stored in the index. +The bloom filter starts out zeroed. +Each score recorded in the bloom filter is hashed to choose +.I nhash +bits to set in the bloom filter. +A score is definitely not stored in the index of any of its +.I nhash +bits are not set. +The bloom filter thus has two parameters: +.I nhash +(maximum 32) +and the total bitmap size +(maximum 512MB, 2\s-2\u32\d\s+2 bits). +.PP +The bloom filter should be sized so that +.I nhash +\(ti +.I nblock +\(ti +0.7 +\(<= +0.7 \(ti +.IR b , +where +.I nblock +is the expected number of blocks stored on the server +and +.I b +is the bitmap size in bits. +The false positive rate of the bloom filter when sized +this way is approximately 2\s-2\u\-\fInblock\fR\d\s+2. +.I Nhash +less than 10 are not very useful; +.I nhash +greater than 24 are probably a waste of memory. +.I Fmtbloom +(see +.IR venti-fmt (8)) +can be given either +.I nhash +or +.IR nblock ; +if given +.IR nblock , +it will derive an appropriate +.IR nhash . +.SS Memory +Venti can make effective use of large amounts of memory +for various caches. +.PP +The +.I "lump cache +holds recently-accessed venti data blocks, which the server refers to as +.IR lumps . +The lump cache should be at least 1MB but can profitably be much larger. +The lump cache can be thought of as the level-1 cache: +read requests handled by the lump cache can +be served instantly. +.PP +The +.I "block cache +holds recently-accessed +.I disk +blocks from the arena partitions. +The block cache needs to be able to simultaneously hold two blocks +from each arena plus four blocks for the currently-filling arena. +The block cache can be thought of as the level-2 cache: +read requests handled by the block cache are slower than those +handled by the lump cache, since the lump data must be extracted +from the raw disk blocks and possibly decompressed, but no +disk accesses are necessary. +.PP +The +.I "index cache +holds recently-accessed or prefetched +index entries. +The index cache needs to be able to hold index entries +for three or four arenas, at least, in order for prefetching +to work properly. Each index entry is 50 bytes. +Assuming 500MB arenas of +128,000 blocks that are 4096 bytes each after compression, +the minimum index cache size is about 6MB. +The index cache can be thought of as the level-3 cache: +read requests handled by the index cache must still go +to disk to fetch the arena blocks, but the costly random +access to the index is avoided. +.PP +The size of the index cache determines how long venti +can sustain its `burst' write throughput, during which time +the only disk accesses on the critical path +are sequential writes to the arena partitions. +For example, if you want to be able to sustain 10MB/s +for an hour, you need enough index cache to hold entries +for 36GB of blocks. Assuming 8192-byte blocks, +you need room for almost five million index entries. +Since index entries are 50 bytes each, you need 250MB +of index cache. +If the background index update process can make a single +pass through the index in an hour, which is possible, +then you can sustain the 10MB/s indefinitely (at least until +the arenas are all filled). +.PP +The +.I "bloom filter +requires memory equal to its size on disk, +as discussed above. +.PP +A reasonable starting allocation is to +divide memory equally (in thirds) between +the bloom filter, the index cache, and the lump and block caches; +the third of memory allocated to the lump and block caches +should be split unevenly, with more (say, two thirds) +going to the block cache. +.SS Network +The venti server announces two network services, one +(conventionally TCP port +.BR venti , +17034) serving +the venti protocol as described in +.IR venti (7), +and one serving HTTP +(conventionally TCP port +.BR venti , +80). +.PP +The venti web server provides the following +URLs for accessing status information: +.TP +.B /index +A summary of the usage of the arenas and index sections. +.TP +.B /xindex +An XML version of +.BR /index . +.TP +.B /storage +Brief storage totals. +.TP +.BI /set/ variable +The current integer value of +.IR variable . +Variables are: +.BR compress , +whether or not to compress blocks +(for debugging); +.BR logging , +whether to write entries to the debugging logs; +.BR stats , +whether to collect run-time statistics; +.BR icachesleeptime , +the time in milliseconds between successive updates +of megabytes of the index cache; +.BR arenasumsleeptime , +the time in milliseconds between reads while +checksumming an arena in the background. +The two sleep times should be (but are not) managed by venti; +they exist to provide more experience with their effects. +The other variables exist only for debugging and +performance measurement. +.TP +.BI /set/ variable / value +Set +.I variable +to +.IR value . +.TP +.BI /graph/ name / param / param / \fR... +A PNG image graphing the named run-time statistic over time. +The details of names and parameters are undocumented; +see +.B httpd.c +in the venti sources. +.TP +.B /log +A list of all debugging logs present in the server's memory. +.TP +.BI /log/ name +The contents of the debugging log with the given +.IR name . +.TP +.B /flushicache +Force venti to begin flushing the index cache to disk. +The request response will not be sent until the flush +has completed. +.TP +.B /flushdcache +Force venti to begin flushing the arena block cache to disk. +The request response will not be sent until the flush +has completed. +.PD +.PP +Requests for other files are served by consulting a +directory named in the configuration file +(see +.B webroot +below). +.SS Configuration File +A venti configuration file +enumerates the various index sections and +arenas that constitute a venti system. +The components are indicated by the name of the file, typically +a disk partition, in which they reside. The configuration +file is the only location that file names are used. Internally, +venti uses the names assigned when the components were formatted +with +.I fmtarenas +or +.I fmtisect +(see +.IR venti-fmt (8)). +In particular, only the configuration needs to be +changed if a component is moved to a different file. +.PP +The configuration file consists of lines in the form described below. +Lines starting with +.B # +are comments. +.TP +.BI index " name +Names the index for the system. +.TP +.BI arenas " file +.I File +is an arena partition, formatted using +.IR fmtarenas . +.TP +.BI isect " file +.I File +is an index section, formatted using +.IR fmtisect . +.PP +After formatting a venti system using +.IR fmtindex , +the order of arenas and index sections should not be changed. +Additional arenas can be appended to the configuration; +run +.I fmtindex +with the +.B -a +flag to update the index. +.PP +The configuration file also holds configuration parameters +for the venti server itself. +These are: +.TF httpaddr netaddr +.TP +.BI mem " size +lump cache size +.TP +.BI bcmem " size +block cache size +.TP +.BI icmem " size +index cache size +.TP +.BI addr " netaddr +network address to announce venti service +(default +.BR tcp!*!venti ) +.TP +.BI httpaddr " netaddr +network address to announce HTTP service +(default +.BR tcp!*!http ) +.TP +.B queuewrites +queue writes in memory +(default is not to queue) +.TP +.BI webroot " dir +directory tree containing files for HTTP server +to consult for unrecognized URLs +.PD +.PP +The units for the various cache sizes above can be specified by appending a +.LR k , +.LR m , +or +.LR g +(case-insensitive) +to indicate kilobytes, megabytes, or gigabytes respectively. +.SS Command Line +Options to +.I venti +are: +.TP +.BI -c " config +The server configuration file +(default +.BR venti.conf ) +.TP +.BI -o " line +Set a server parameter, using the same syntax +as in the configuration file. +The +.B -o +options override the configuration file. +.TP +.B -d +Produce various debugging information on standard error. +Implies +.BR -s . +.TP +.B -L +Enable logging. By default all logging is disabled. +Logging slows server operation considerably. +.TP +.B -s +Do not run in the background. +Normally, +the foreground process will exit once the Venti server +is initialized and ready for connections. +.PD +.SH EXAMPLE +A simple configuration: +.IP +.EX +% cat venti.conf +index main +isect /tmp/disks/isect0 +isect /tmp/disks/isect1 +arenas /tmp/disks/arenas +mem 10M +bcmem 20M +icmem 30M +% +.EE +.PP +Format the index sections, the arena partition, and +finally the main index: +.IP +.EX +% venti/fmtisect isect0. /tmp/disks/isect0 & +% venti/fmtisect isect1. /tmp/disks/isect1 & +% venti/fmtarenas arenas0. /tmp/disks/arenas & +% wait +% venti/fmtindex venti.conf +% +.EE +.PP +Start the server and check the storage statistics: +.IP +.EX +% venti/venti +% hget http://$sysname/storage +.EE +.SH "SEE ALSO" +.IR venti (1), +.IR venti (3), +.IR venti (7), +.IR venti-backup (8) +.IR venti-fmt (8) +.br +Sean Quinlan and Sean Dorward, +``Venti: a new approach to archival storage'', +.I "Usenix Conference on File and Storage Technologies" , +2002. +.SH BUGS +Setting up a venti server is too complicated. +.PP +Venti should not require the user to decide how to +partition its memory usage.