Blame


1 7c5190d2 2004-03-12 devnull /*
2 9ffbb5ad 2004-03-12 devnull * Index, mapping scores to log positions.
3 9ffbb5ad 2004-03-12 devnull *
4 9ffbb5ad 2004-03-12 devnull * The index is made up of some number of index sections, each of
5 9ffbb5ad 2004-03-12 devnull * which is typically stored on a different disk. The blocks in all the
6 9ffbb5ad 2004-03-12 devnull * index sections are logically numbered, with each index section
7 9ffbb5ad 2004-03-12 devnull * responsible for a range of blocks. Blocks are typically 8kB.
8 9ffbb5ad 2004-03-12 devnull *
9 9ffbb5ad 2004-03-12 devnull * Index Version 1:
10 7c5190d2 2004-03-12 devnull *
11 9ffbb5ad 2004-03-12 devnull * The N index blocks are treated as a giant hash table. The top 32 bits
12 9ffbb5ad 2004-03-12 devnull * of score are used as the key for a lookup. Each index block holds
13 9ffbb5ad 2004-03-12 devnull * one hash bucket, which is responsible for ceil(2^32 / N) of the key space.
14 7c5190d2 2004-03-12 devnull *
15 9ffbb5ad 2004-03-12 devnull * The index is sized so that a particular bucket is extraordinarily
16 9ffbb5ad 2004-03-12 devnull * unlikely to overflow: assuming compressed data blocks are 4kB
17 9ffbb5ad 2004-03-12 devnull * on disk, and assuming each block has a 40 byte index entry,
18 9ffbb5ad 2004-03-12 devnull * the index data will be 1% of the total data. Since scores are essentially
19 9ffbb5ad 2004-03-12 devnull * random, all buckets should be about the same fullness.
20 9ffbb5ad 2004-03-12 devnull * A factor of 5 gives us a wide comfort boundary to account for
21 9ffbb5ad 2004-03-12 devnull * random variation. So the index disk space should be 5% of the arena disk space.
22 9ffbb5ad 2004-03-12 devnull *
23 9ffbb5ad 2004-03-12 devnull * Problems with Index Version 1:
24 7c5190d2 2004-03-12 devnull *
25 9ffbb5ad 2004-03-12 devnull * Because the index size is chosen to handle the worst case load,
26 9ffbb5ad 2004-03-12 devnull * the index is very sparse, especially when the Venti server is mostly empty.
27 9ffbb5ad 2004-03-12 devnull * This has a few bad properties.
28 9ffbb5ad 2004-03-12 devnull *
29 9ffbb5ad 2004-03-12 devnull * Loading an index block (which typically requires a random disk seek)
30 9ffbb5ad 2004-03-12 devnull * is a very expensive operation, yet it yields only a couple index entries.
31 9ffbb5ad 2004-03-12 devnull * We're not making efficient use of the disk arm.
32 7c5190d2 2004-03-12 devnull *
33 9ffbb5ad 2004-03-12 devnull * Writing a block requires first checking to see if the block already
34 9ffbb5ad 2004-03-12 devnull * exists on the server, which in turn requires an index lookup. When
35 9ffbb5ad 2004-03-12 devnull * writing fresh data, these lookups will fail. The index entry cache
36 9ffbb5ad 2004-03-12 devnull * cannot serve these, so they must go to disk, which is expensive.
37 9ffbb5ad 2004-03-12 devnull *
38 9ffbb5ad 2004-03-12 devnull * Thus both the reading and the writing of blocks are held back by
39 9ffbb5ad 2004-03-12 devnull * the expense of accessing the index.
40 9ffbb5ad 2004-03-12 devnull *
41 9ffbb5ad 2004-03-12 devnull * Index Version 2:
42 9ffbb5ad 2004-03-12 devnull *
43 9ffbb5ad 2004-03-12 devnull * The index is sized to be exactly 2^M blocks. The index blocks are
44 9ffbb5ad 2004-03-12 devnull * logically arranged in a (not exactly balanced) binary tree of depth at
45 9ffbb5ad 2004-03-12 devnull * most M. The nodes are named by bit strings describing the path from
46 9ffbb5ad 2004-03-12 devnull * the root to the node. The root is . (dot). The left child of the root is .0,
47 9ffbb5ad 2004-03-12 devnull * the right child .1. The node you get to by starting at the root and going
48 9ffbb5ad 2004-03-12 devnull * left right right left is .0110. At the beginning, there is only the root block.
49 9ffbb5ad 2004-03-12 devnull * When a block with name .xxx fills, it splits into .xxx0 and .xxx1.
50 9ffbb5ad 2004-03-12 devnull * All the index data is kept in the leaves of the tree.
51 7c5190d2 2004-03-12 devnull *
52 9ffbb5ad 2004-03-12 devnull * Index leaf blocks are laid out on disk by interpreting the bitstring as a
53 9ffbb5ad 2004-03-12 devnull * binary fraction and multiplying by 2^M -- .0 is the first block, .1 is
54 9ffbb5ad 2004-03-12 devnull * the block halfway into the index, .0110 is at position 6/16, and
55 9ffbb5ad 2004-03-12 devnull * .xxx and .xxx0 map to the same block (but only one can be a leaf
56 9ffbb5ad 2004-03-12 devnull * node at any given time, so this is okay!). A cheap implementation of
57 9ffbb5ad 2004-03-12 devnull * this is to append zeros to the bit string to make it M bits long. That's
58 9ffbb5ad 2004-03-12 devnull * the integer index block number.
59 7c5190d2 2004-03-12 devnull *
60 9ffbb5ad 2004-03-12 devnull * To find the leaf block that should hold a score, use the bits of the
61 9ffbb5ad 2004-03-12 devnull * score one at a time to walk down the tree to a leaf node. If the tree
62 9ffbb5ad 2004-03-12 devnull * has leaf nodes .0, .10, and .11, then score 0110101... ends up in bucket
63 9ffbb5ad 2004-03-12 devnull * .0 while score 101110101... ends up in bucket .10. There is no leaf node
64 9ffbb5ad 2004-03-12 devnull * .1 because it has split into .10 and .11.
65 9ffbb5ad 2004-03-12 devnull *
66 9ffbb5ad 2004-03-12 devnull * If we know which disk blocks are in use, we can reconstruct the interior
67 9ffbb5ad 2004-03-12 devnull * of the tree: if .xxx1 is in use, then .xxx has been split. We keep an in-use
68 9ffbb5ad 2004-03-12 devnull * bitmap of all index disk blocks to aid in reconstructing the interior of the
69 9ffbb5ad 2004-03-12 devnull * tree. At one bit per index block, the bitmap is small enough to be kept
70 9ffbb5ad 2004-03-12 devnull * in memory even on the largest of Venti servers.
71 9ffbb5ad 2004-03-12 devnull *
72 9ffbb5ad 2004-03-12 devnull * After the root block splits, the index blocks being used will always be
73 9ffbb5ad 2004-03-12 devnull * at least half full (averaged across the entire index). So unlike Version 1,
74 9ffbb5ad 2004-03-12 devnull * Index Version 2 is quite dense, alleviating the two problems above.
75 9ffbb5ad 2004-03-12 devnull * Index block reads now return many index entries. More importantly,
76 9ffbb5ad 2004-03-12 devnull * small servers can keep most of the index in the disk cache, making them
77 9ffbb5ad 2004-03-12 devnull * more likely to handle negative lookups without going to disk.
78 9ffbb5ad 2004-03-12 devnull *
79 9ffbb5ad 2004-03-12 devnull * As the index becomes more full, Index Version 2's performance
80 9ffbb5ad 2004-03-12 devnull * degrades gracefully into Index Version 1. V2 is really an optimization
81 9ffbb5ad 2004-03-12 devnull * for little servers.
82 9ffbb5ad 2004-03-12 devnull *
83 9ffbb5ad 2004-03-12 devnull * Write Ordering for Index Version 2:
84 9ffbb5ad 2004-03-12 devnull *
85 9ffbb5ad 2004-03-12 devnull * Unlike Index Version 1, Version 2 must worry about write ordering
86 9ffbb5ad 2004-03-12 devnull * within the index. What happens if the in-use bitmap is out of sync
87 9ffbb5ad 2004-03-12 devnull * with the actual leaf nodes? What happens if .xxx splits into .xxx0 and
88 9ffbb5ad 2004-03-12 devnull * .xxx1 but only one of the new blocks gets written to disk?
89 9ffbb5ad 2004-03-12 devnull *
90 9ffbb5ad 2004-03-12 devnull * We arrange that when .xxx splits, the .xxx1 block is written first,
91 9ffbb5ad 2004-03-12 devnull * then the .xxx0 block, and finally the in-use bitmap entry for .xxx1.
92 9ffbb5ad 2004-03-12 devnull * The split is committed by the writing of .xxx0. This ordering leaves
93 9ffbb5ad 2004-03-12 devnull * two possible partial disk writes:
94 9ffbb5ad 2004-03-12 devnull *
95 9ffbb5ad 2004-03-12 devnull * (a) If .xxx1 is written but .xxx0 and the bitmap are not, then it's as if
96 9ffbb5ad 2004-03-12 devnull * the split never happened -- we won't think .xxx1 is in use, and we
97 9ffbb5ad 2004-03-12 devnull * won't go looking for it.
98 9ffbb5ad 2004-03-12 devnull *
99 9ffbb5ad 2004-03-12 devnull * (b) If .xxx1 and .xxx0 are written but the bitmap is not, then the first
100 9ffbb5ad 2004-03-12 devnull * time we try to load .xxx, we'll get .xxx0 instead, realize the bitmap is
101 9ffbb5ad 2004-03-12 devnull * out of date, and update the bitmap.
102 9ffbb5ad 2004-03-12 devnull *
103 9ffbb5ad 2004-03-12 devnull * Backwards Compatibility
104 9ffbb5ad 2004-03-12 devnull *
105 9ffbb5ad 2004-03-12 devnull * Because there are large Venti servers in use with Index V1, this code
106 9ffbb5ad 2004-03-12 devnull * will read either index version, but when asked to create a new index,
107 9ffbb5ad 2004-03-12 devnull * will only create V2.
108 7c5190d2 2004-03-12 devnull */
109 7c5190d2 2004-03-12 devnull
110 7a4ee46d 2003-11-23 devnull #include "stdinc.h"
111 7a4ee46d 2003-11-23 devnull #include "dat.h"
112 7a4ee46d 2003-11-23 devnull #include "fns.h"
113 7a4ee46d 2003-11-23 devnull
114 7a4ee46d 2003-11-23 devnull static int bucklook(u8int *score, int type, u8int *data, int n);
115 7a4ee46d 2003-11-23 devnull static int writebucket(ISect *is, u32int buck, IBucket *ib, DBlock *b);
116 7a4ee46d 2003-11-23 devnull static int okibucket(IBucket *ib, ISect *is);
117 9ffbb5ad 2004-03-12 devnull static int initindex1(Index*);
118 7a4ee46d 2003-11-23 devnull static ISect *initisect1(ISect *is);
119 333c1dcc 2004-03-13 devnull static int splitiblock(Index *ix, DBlock *b, ISect *is, u32int buck, IBucket *ib);
120 333c1dcc 2004-03-13 devnull
121 333c1dcc 2004-03-13 devnull #define KEY(k,d) ((d) ? (k)>>(32-(d)) : 0)
122 7a4ee46d 2003-11-23 devnull
123 7a4ee46d 2003-11-23 devnull //static QLock indexlock; //ZZZ
124 7a4ee46d 2003-11-23 devnull
125 7a4ee46d 2003-11-23 devnull static char IndexMagic[] = "venti index configuration";
126 7a4ee46d 2003-11-23 devnull
127 7a4ee46d 2003-11-23 devnull Index*
128 7a4ee46d 2003-11-23 devnull initindex(char *name, ISect **sects, int n)
129 7a4ee46d 2003-11-23 devnull {
130 7a4ee46d 2003-11-23 devnull IFile f;
131 7a4ee46d 2003-11-23 devnull Index *ix;
132 7a4ee46d 2003-11-23 devnull ISect *is;
133 7a4ee46d 2003-11-23 devnull u32int last, blocksize, tabsize;
134 9ffbb5ad 2004-03-12 devnull int i;
135 7a4ee46d 2003-11-23 devnull
136 7a4ee46d 2003-11-23 devnull if(n <= 0){
137 7a4ee46d 2003-11-23 devnull seterr(EOk, "no index sections to initialize index");
138 7a4ee46d 2003-11-23 devnull return nil;
139 7a4ee46d 2003-11-23 devnull }
140 7a4ee46d 2003-11-23 devnull ix = MKZ(Index);
141 7a4ee46d 2003-11-23 devnull if(ix == nil){
142 7a4ee46d 2003-11-23 devnull seterr(EOk, "can't initialize index: out of memory");
143 7a4ee46d 2003-11-23 devnull freeindex(ix);
144 7a4ee46d 2003-11-23 devnull return nil;
145 7a4ee46d 2003-11-23 devnull }
146 7a4ee46d 2003-11-23 devnull
147 7a4ee46d 2003-11-23 devnull tabsize = sects[0]->tabsize;
148 7a4ee46d 2003-11-23 devnull if(partifile(&f, sects[0]->part, sects[0]->tabbase, tabsize) < 0)
149 7a4ee46d 2003-11-23 devnull return nil;
150 7a4ee46d 2003-11-23 devnull if(parseindex(&f, ix) < 0){
151 7a4ee46d 2003-11-23 devnull freeifile(&f);
152 7a4ee46d 2003-11-23 devnull freeindex(ix);
153 7a4ee46d 2003-11-23 devnull return nil;
154 7a4ee46d 2003-11-23 devnull }
155 7a4ee46d 2003-11-23 devnull freeifile(&f);
156 7a4ee46d 2003-11-23 devnull if(namecmp(ix->name, name) != 0){
157 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "mismatched index name: found %s expected %s", ix->name, name);
158 7a4ee46d 2003-11-23 devnull return nil;
159 7a4ee46d 2003-11-23 devnull }
160 7a4ee46d 2003-11-23 devnull if(ix->nsects != n){
161 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "mismatched number index sections: found %d expected %d", n, ix->nsects);
162 7a4ee46d 2003-11-23 devnull freeindex(ix);
163 7a4ee46d 2003-11-23 devnull return nil;
164 7a4ee46d 2003-11-23 devnull }
165 7a4ee46d 2003-11-23 devnull ix->sects = sects;
166 7a4ee46d 2003-11-23 devnull last = 0;
167 7a4ee46d 2003-11-23 devnull blocksize = ix->blocksize;
168 7a4ee46d 2003-11-23 devnull for(i = 0; i < ix->nsects; i++){
169 7a4ee46d 2003-11-23 devnull is = sects[i];
170 7a4ee46d 2003-11-23 devnull if(namecmp(ix->name, is->index) != 0
171 7a4ee46d 2003-11-23 devnull || is->blocksize != blocksize
172 7a4ee46d 2003-11-23 devnull || is->tabsize != tabsize
173 7a4ee46d 2003-11-23 devnull || namecmp(is->name, ix->smap[i].name) != 0
174 7a4ee46d 2003-11-23 devnull || is->start != ix->smap[i].start
175 7a4ee46d 2003-11-23 devnull || is->stop != ix->smap[i].stop
176 7a4ee46d 2003-11-23 devnull || last != is->start
177 7a4ee46d 2003-11-23 devnull || is->start > is->stop){
178 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "inconsistent index sections in %s", ix->name);
179 7a4ee46d 2003-11-23 devnull freeindex(ix);
180 7a4ee46d 2003-11-23 devnull return nil;
181 7a4ee46d 2003-11-23 devnull }
182 7a4ee46d 2003-11-23 devnull last = is->stop;
183 7a4ee46d 2003-11-23 devnull }
184 7a4ee46d 2003-11-23 devnull ix->tabsize = tabsize;
185 7a4ee46d 2003-11-23 devnull ix->buckets = last;
186 7a4ee46d 2003-11-23 devnull
187 9ffbb5ad 2004-03-12 devnull if(initindex1(ix) < 0){
188 7a4ee46d 2003-11-23 devnull freeindex(ix);
189 7a4ee46d 2003-11-23 devnull return nil;
190 7a4ee46d 2003-11-23 devnull }
191 7a4ee46d 2003-11-23 devnull
192 7a4ee46d 2003-11-23 devnull ix->arenas = MKNZ(Arena*, ix->narenas);
193 7a4ee46d 2003-11-23 devnull if(maparenas(ix->amap, ix->arenas, ix->narenas, ix->name) < 0){
194 7a4ee46d 2003-11-23 devnull freeindex(ix);
195 7a4ee46d 2003-11-23 devnull return nil;
196 7a4ee46d 2003-11-23 devnull }
197 9ffbb5ad 2004-03-12 devnull
198 7a4ee46d 2003-11-23 devnull return ix;
199 7a4ee46d 2003-11-23 devnull }
200 7a4ee46d 2003-11-23 devnull
201 9ffbb5ad 2004-03-12 devnull static int
202 9ffbb5ad 2004-03-12 devnull initindex1(Index *ix)
203 9ffbb5ad 2004-03-12 devnull {
204 9ffbb5ad 2004-03-12 devnull u32int buckets;
205 9ffbb5ad 2004-03-12 devnull
206 9ffbb5ad 2004-03-12 devnull switch(ix->version){
207 9ffbb5ad 2004-03-12 devnull case IndexVersion1:
208 9ffbb5ad 2004-03-12 devnull ix->div = (((u64int)1 << 32) + ix->buckets - 1) / ix->buckets;
209 9ffbb5ad 2004-03-12 devnull buckets = (((u64int)1 << 32) - 1) / ix->div + 1;
210 9ffbb5ad 2004-03-12 devnull if(buckets != ix->buckets){
211 9ffbb5ad 2004-03-12 devnull seterr(ECorrupt, "inconsistent math for divisor and buckets in %s", ix->name);
212 9ffbb5ad 2004-03-12 devnull return -1;
213 9ffbb5ad 2004-03-12 devnull }
214 9ffbb5ad 2004-03-12 devnull break;
215 9ffbb5ad 2004-03-12 devnull
216 9ffbb5ad 2004-03-12 devnull case IndexVersion2:
217 9ffbb5ad 2004-03-12 devnull buckets = ix->buckets - ix->bitblocks;
218 9ffbb5ad 2004-03-12 devnull if(ix->buckets < ix->bitblocks || (buckets&(buckets-1)))
219 9ffbb5ad 2004-03-12 devnull seterr(ECorrupt, "bucket count not a power of two in %s", ix->name);
220 9ffbb5ad 2004-03-12 devnull ix->maxdepth = u64log2(buckets);
221 9ffbb5ad 2004-03-12 devnull ix->bitkeylog = u64log2(ix->blocksize*8);
222 9ffbb5ad 2004-03-12 devnull ix->bitkeymask = (1<<ix->bitkeylog)-1;
223 9ffbb5ad 2004-03-12 devnull break;
224 9ffbb5ad 2004-03-12 devnull }
225 9ffbb5ad 2004-03-12 devnull return 0;
226 9ffbb5ad 2004-03-12 devnull }
227 9ffbb5ad 2004-03-12 devnull
228 7a4ee46d 2003-11-23 devnull int
229 7a4ee46d 2003-11-23 devnull wbindex(Index *ix)
230 7a4ee46d 2003-11-23 devnull {
231 7a4ee46d 2003-11-23 devnull Fmt f;
232 7a4ee46d 2003-11-23 devnull ZBlock *b;
233 7a4ee46d 2003-11-23 devnull int i;
234 7a4ee46d 2003-11-23 devnull
235 7a4ee46d 2003-11-23 devnull if(ix->nsects == 0){
236 7a4ee46d 2003-11-23 devnull seterr(EOk, "no sections in index %s", ix->name);
237 7a4ee46d 2003-11-23 devnull return -1;
238 7a4ee46d 2003-11-23 devnull }
239 7a4ee46d 2003-11-23 devnull b = alloczblock(ix->tabsize, 1);
240 7a4ee46d 2003-11-23 devnull if(b == nil){
241 7a4ee46d 2003-11-23 devnull seterr(EOk, "can't write index configuration: out of memory");
242 7a4ee46d 2003-11-23 devnull return -1;
243 7a4ee46d 2003-11-23 devnull }
244 7a4ee46d 2003-11-23 devnull fmtzbinit(&f, b);
245 7a4ee46d 2003-11-23 devnull if(outputindex(&f, ix) < 0){
246 7a4ee46d 2003-11-23 devnull seterr(EOk, "can't make index configuration: table storage too small %d", ix->tabsize);
247 7a4ee46d 2003-11-23 devnull freezblock(b);
248 7a4ee46d 2003-11-23 devnull return -1;
249 7a4ee46d 2003-11-23 devnull }
250 7a4ee46d 2003-11-23 devnull for(i = 0; i < ix->nsects; i++){
251 7a4ee46d 2003-11-23 devnull if(writepart(ix->sects[i]->part, ix->sects[i]->tabbase, b->data, ix->tabsize) < 0){
252 7a4ee46d 2003-11-23 devnull seterr(EOk, "can't write index: %r");
253 7a4ee46d 2003-11-23 devnull freezblock(b);
254 7a4ee46d 2003-11-23 devnull return -1;
255 7a4ee46d 2003-11-23 devnull }
256 7a4ee46d 2003-11-23 devnull }
257 7a4ee46d 2003-11-23 devnull freezblock(b);
258 7a4ee46d 2003-11-23 devnull
259 7a4ee46d 2003-11-23 devnull for(i = 0; i < ix->nsects; i++)
260 7a4ee46d 2003-11-23 devnull if(wbisect(ix->sects[i]) < 0)
261 7a4ee46d 2003-11-23 devnull return -1;
262 7a4ee46d 2003-11-23 devnull
263 7a4ee46d 2003-11-23 devnull return 0;
264 7a4ee46d 2003-11-23 devnull }
265 7a4ee46d 2003-11-23 devnull
266 7a4ee46d 2003-11-23 devnull /*
267 9ffbb5ad 2004-03-12 devnull * index: IndexMagic '\n' version '\n' name '\n' blocksize '\n' [V2: bitblocks '\n'] sections arenas
268 7a4ee46d 2003-11-23 devnull * version, blocksize: u32int
269 7a4ee46d 2003-11-23 devnull * name: max. ANameSize string
270 7a4ee46d 2003-11-23 devnull * sections, arenas: AMap
271 7a4ee46d 2003-11-23 devnull */
272 7a4ee46d 2003-11-23 devnull int
273 7a4ee46d 2003-11-23 devnull outputindex(Fmt *f, Index *ix)
274 7a4ee46d 2003-11-23 devnull {
275 7a4ee46d 2003-11-23 devnull if(fmtprint(f, "%s\n%ud\n%s\n%ud\n", IndexMagic, ix->version, ix->name, ix->blocksize) < 0
276 9ffbb5ad 2004-03-12 devnull || (ix->version==IndexVersion2 && fmtprint(f, "%ud\n", ix->bitblocks) < 0)
277 7a4ee46d 2003-11-23 devnull || outputamap(f, ix->smap, ix->nsects) < 0
278 7a4ee46d 2003-11-23 devnull || outputamap(f, ix->amap, ix->narenas) < 0)
279 7a4ee46d 2003-11-23 devnull return -1;
280 7a4ee46d 2003-11-23 devnull return 0;
281 7a4ee46d 2003-11-23 devnull }
282 7a4ee46d 2003-11-23 devnull
283 7a4ee46d 2003-11-23 devnull int
284 7a4ee46d 2003-11-23 devnull parseindex(IFile *f, Index *ix)
285 7a4ee46d 2003-11-23 devnull {
286 7a4ee46d 2003-11-23 devnull AMapN amn;
287 7a4ee46d 2003-11-23 devnull u32int v;
288 7a4ee46d 2003-11-23 devnull char *s;
289 7a4ee46d 2003-11-23 devnull
290 7a4ee46d 2003-11-23 devnull /*
291 7a4ee46d 2003-11-23 devnull * magic
292 7a4ee46d 2003-11-23 devnull */
293 7a4ee46d 2003-11-23 devnull s = ifileline(f);
294 7a4ee46d 2003-11-23 devnull if(s == nil || strcmp(s, IndexMagic) != 0){
295 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "bad index magic for %s", f->name);
296 7a4ee46d 2003-11-23 devnull return -1;
297 7a4ee46d 2003-11-23 devnull }
298 7a4ee46d 2003-11-23 devnull
299 7a4ee46d 2003-11-23 devnull /*
300 7a4ee46d 2003-11-23 devnull * version
301 7a4ee46d 2003-11-23 devnull */
302 7a4ee46d 2003-11-23 devnull if(ifileu32int(f, &v) < 0){
303 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "syntax error: bad version number in %s", f->name);
304 7a4ee46d 2003-11-23 devnull return -1;
305 7a4ee46d 2003-11-23 devnull }
306 7a4ee46d 2003-11-23 devnull ix->version = v;
307 9ffbb5ad 2004-03-12 devnull if(ix->version != IndexVersion1 && ix->version != IndexVersion2){
308 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "bad version number in %s", f->name);
309 7a4ee46d 2003-11-23 devnull return -1;
310 7a4ee46d 2003-11-23 devnull }
311 7a4ee46d 2003-11-23 devnull
312 7a4ee46d 2003-11-23 devnull /*
313 7a4ee46d 2003-11-23 devnull * name
314 7a4ee46d 2003-11-23 devnull */
315 7a4ee46d 2003-11-23 devnull if(ifilename(f, ix->name) < 0){
316 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "syntax error: bad index name in %s", f->name);
317 7a4ee46d 2003-11-23 devnull return -1;
318 7a4ee46d 2003-11-23 devnull }
319 7a4ee46d 2003-11-23 devnull
320 7a4ee46d 2003-11-23 devnull /*
321 7a4ee46d 2003-11-23 devnull * block size
322 7a4ee46d 2003-11-23 devnull */
323 7a4ee46d 2003-11-23 devnull if(ifileu32int(f, &v) < 0){
324 9ffbb5ad 2004-03-12 devnull seterr(ECorrupt, "syntax error: bad block size number in %s", f->name);
325 7a4ee46d 2003-11-23 devnull return -1;
326 7a4ee46d 2003-11-23 devnull }
327 7a4ee46d 2003-11-23 devnull ix->blocksize = v;
328 7a4ee46d 2003-11-23 devnull
329 9ffbb5ad 2004-03-12 devnull if(ix->version == IndexVersion2){
330 9ffbb5ad 2004-03-12 devnull /*
331 9ffbb5ad 2004-03-12 devnull * free bitmap size
332 9ffbb5ad 2004-03-12 devnull */
333 9ffbb5ad 2004-03-12 devnull if(ifileu32int(f, &v) < 0){
334 9ffbb5ad 2004-03-12 devnull seterr(ECorrupt, "syntax error: bad bitmap size in %s", f->name);
335 9ffbb5ad 2004-03-12 devnull return -1;
336 9ffbb5ad 2004-03-12 devnull }
337 9ffbb5ad 2004-03-12 devnull ix->bitblocks = v;
338 9ffbb5ad 2004-03-12 devnull }
339 9ffbb5ad 2004-03-12 devnull
340 7a4ee46d 2003-11-23 devnull if(parseamap(f, &amn) < 0)
341 7a4ee46d 2003-11-23 devnull return -1;
342 7a4ee46d 2003-11-23 devnull ix->nsects = amn.n;
343 7a4ee46d 2003-11-23 devnull ix->smap = amn.map;
344 7a4ee46d 2003-11-23 devnull
345 7a4ee46d 2003-11-23 devnull if(parseamap(f, &amn) < 0)
346 7a4ee46d 2003-11-23 devnull return -1;
347 7a4ee46d 2003-11-23 devnull ix->narenas = amn.n;
348 7a4ee46d 2003-11-23 devnull ix->amap = amn.map;
349 7a4ee46d 2003-11-23 devnull
350 7a4ee46d 2003-11-23 devnull return 0;
351 7a4ee46d 2003-11-23 devnull }
352 7a4ee46d 2003-11-23 devnull
353 7a4ee46d 2003-11-23 devnull /*
354 7a4ee46d 2003-11-23 devnull * initialize an entirely new index
355 7a4ee46d 2003-11-23 devnull */
356 7a4ee46d 2003-11-23 devnull Index *
357 7a4ee46d 2003-11-23 devnull newindex(char *name, ISect **sects, int n)
358 7a4ee46d 2003-11-23 devnull {
359 7a4ee46d 2003-11-23 devnull Index *ix;
360 7a4ee46d 2003-11-23 devnull AMap *smap;
361 7a4ee46d 2003-11-23 devnull u64int nb;
362 9ffbb5ad 2004-03-12 devnull u32int div, ub, xb, fb, start, stop, blocksize, tabsize;
363 9ffbb5ad 2004-03-12 devnull int i, j, version;
364 7a4ee46d 2003-11-23 devnull
365 9ffbb5ad 2004-03-12 devnull version = IndexVersion2;
366 9ffbb5ad 2004-03-12 devnull
367 7a4ee46d 2003-11-23 devnull if(n < 1){
368 7a4ee46d 2003-11-23 devnull seterr(EOk, "creating index with no index sections");
369 7a4ee46d 2003-11-23 devnull return nil;
370 7a4ee46d 2003-11-23 devnull }
371 7a4ee46d 2003-11-23 devnull
372 7a4ee46d 2003-11-23 devnull /*
373 7a4ee46d 2003-11-23 devnull * compute the total buckets available in the index,
374 7a4ee46d 2003-11-23 devnull * and the total buckets which are used.
375 7a4ee46d 2003-11-23 devnull */
376 7a4ee46d 2003-11-23 devnull nb = 0;
377 7a4ee46d 2003-11-23 devnull blocksize = sects[0]->blocksize;
378 7a4ee46d 2003-11-23 devnull tabsize = sects[0]->tabsize;
379 7a4ee46d 2003-11-23 devnull for(i = 0; i < n; i++){
380 7a4ee46d 2003-11-23 devnull if(sects[i]->start != 0 || sects[i]->stop != 0
381 7a4ee46d 2003-11-23 devnull || sects[i]->index[0] != '\0'){
382 7a4ee46d 2003-11-23 devnull seterr(EOk, "creating new index using non-empty section %s", sects[i]->name);
383 7a4ee46d 2003-11-23 devnull return nil;
384 7a4ee46d 2003-11-23 devnull }
385 7a4ee46d 2003-11-23 devnull if(blocksize != sects[i]->blocksize){
386 7a4ee46d 2003-11-23 devnull seterr(EOk, "mismatched block sizes in index sections");
387 7a4ee46d 2003-11-23 devnull return nil;
388 7a4ee46d 2003-11-23 devnull }
389 7a4ee46d 2003-11-23 devnull if(tabsize != sects[i]->tabsize){
390 7a4ee46d 2003-11-23 devnull seterr(EOk, "mismatched config table sizes in index sections");
391 7a4ee46d 2003-11-23 devnull return nil;
392 7a4ee46d 2003-11-23 devnull }
393 7a4ee46d 2003-11-23 devnull nb += sects[i]->blocks;
394 7a4ee46d 2003-11-23 devnull }
395 7a4ee46d 2003-11-23 devnull
396 7a4ee46d 2003-11-23 devnull /*
397 7a4ee46d 2003-11-23 devnull * check for duplicate names
398 7a4ee46d 2003-11-23 devnull */
399 7a4ee46d 2003-11-23 devnull for(i = 0; i < n; i++){
400 7a4ee46d 2003-11-23 devnull for(j = i + 1; j < n; j++){
401 7a4ee46d 2003-11-23 devnull if(namecmp(sects[i]->name, sects[j]->name) == 0){
402 7a4ee46d 2003-11-23 devnull seterr(EOk, "duplicate section name %s for index %s", sects[i]->name, name);
403 7a4ee46d 2003-11-23 devnull return nil;
404 7a4ee46d 2003-11-23 devnull }
405 7a4ee46d 2003-11-23 devnull }
406 7a4ee46d 2003-11-23 devnull }
407 7a4ee46d 2003-11-23 devnull
408 7a4ee46d 2003-11-23 devnull if(nb >= ((u64int)1 << 32)){
409 7a4ee46d 2003-11-23 devnull seterr(EBug, "index too large");
410 7a4ee46d 2003-11-23 devnull return nil;
411 7a4ee46d 2003-11-23 devnull }
412 9ffbb5ad 2004-03-12 devnull
413 9ffbb5ad 2004-03-12 devnull div = 0;
414 9ffbb5ad 2004-03-12 devnull fb = 0;
415 9ffbb5ad 2004-03-12 devnull if(version == IndexVersion1){
416 9ffbb5ad 2004-03-12 devnull div = (((u64int)1 << 32) + nb - 1) / nb;
417 9ffbb5ad 2004-03-12 devnull ub = (((u64int)1 << 32) - 1) / div + 1;
418 9ffbb5ad 2004-03-12 devnull if(div < 100){
419 9ffbb5ad 2004-03-12 devnull seterr(EBug, "index divisor too coarse");
420 9ffbb5ad 2004-03-12 devnull return nil;
421 9ffbb5ad 2004-03-12 devnull }
422 9ffbb5ad 2004-03-12 devnull }else{
423 9ffbb5ad 2004-03-12 devnull fb = (nb+blocksize*8-1)/(blocksize*8);
424 9ffbb5ad 2004-03-12 devnull for(ub=1; ub<=((nb-fb)>>1); ub<<=1)
425 9ffbb5ad 2004-03-12 devnull ;
426 9ffbb5ad 2004-03-12 devnull ub += fb;
427 7a4ee46d 2003-11-23 devnull }
428 7a4ee46d 2003-11-23 devnull if(ub > nb){
429 7a4ee46d 2003-11-23 devnull seterr(EBug, "index initialization math wrong");
430 7a4ee46d 2003-11-23 devnull return nil;
431 7a4ee46d 2003-11-23 devnull }
432 9ffbb5ad 2004-03-12 devnull xb = nb - ub;
433 7a4ee46d 2003-11-23 devnull
434 7a4ee46d 2003-11-23 devnull /*
435 7a4ee46d 2003-11-23 devnull * initialize each of the index sections
436 7a4ee46d 2003-11-23 devnull * and the section map table
437 7a4ee46d 2003-11-23 devnull */
438 7a4ee46d 2003-11-23 devnull smap = MKNZ(AMap, n);
439 7a4ee46d 2003-11-23 devnull if(smap == nil){
440 7a4ee46d 2003-11-23 devnull seterr(EOk, "can't create new index: out of memory");
441 7a4ee46d 2003-11-23 devnull return nil;
442 7a4ee46d 2003-11-23 devnull }
443 7a4ee46d 2003-11-23 devnull start = 0;
444 7a4ee46d 2003-11-23 devnull for(i = 0; i < n; i++){
445 7a4ee46d 2003-11-23 devnull stop = start + sects[i]->blocks - xb / n;
446 7a4ee46d 2003-11-23 devnull if(i == n - 1)
447 7a4ee46d 2003-11-23 devnull stop = ub;
448 7a4ee46d 2003-11-23 devnull sects[i]->start = start;
449 7a4ee46d 2003-11-23 devnull sects[i]->stop = stop;
450 7a4ee46d 2003-11-23 devnull namecp(sects[i]->index, name);
451 7a4ee46d 2003-11-23 devnull
452 7a4ee46d 2003-11-23 devnull smap[i].start = start;
453 7a4ee46d 2003-11-23 devnull smap[i].stop = stop;
454 7a4ee46d 2003-11-23 devnull namecp(smap[i].name, sects[i]->name);
455 7a4ee46d 2003-11-23 devnull start = stop;
456 7a4ee46d 2003-11-23 devnull }
457 7a4ee46d 2003-11-23 devnull
458 7a4ee46d 2003-11-23 devnull /*
459 7a4ee46d 2003-11-23 devnull * initialize the index itself
460 7a4ee46d 2003-11-23 devnull */
461 7a4ee46d 2003-11-23 devnull ix = MKZ(Index);
462 7a4ee46d 2003-11-23 devnull if(ix == nil){
463 7a4ee46d 2003-11-23 devnull seterr(EOk, "can't create new index: out of memory");
464 7a4ee46d 2003-11-23 devnull free(smap);
465 7a4ee46d 2003-11-23 devnull return nil;
466 7a4ee46d 2003-11-23 devnull }
467 9ffbb5ad 2004-03-12 devnull ix->version = version;
468 7a4ee46d 2003-11-23 devnull namecp(ix->name, name);
469 7a4ee46d 2003-11-23 devnull ix->sects = sects;
470 7a4ee46d 2003-11-23 devnull ix->smap = smap;
471 7a4ee46d 2003-11-23 devnull ix->nsects = n;
472 7a4ee46d 2003-11-23 devnull ix->blocksize = blocksize;
473 7a4ee46d 2003-11-23 devnull ix->buckets = ub;
474 7a4ee46d 2003-11-23 devnull ix->tabsize = tabsize;
475 9ffbb5ad 2004-03-12 devnull ix->div = div;
476 9ffbb5ad 2004-03-12 devnull ix->bitblocks = fb;
477 9ffbb5ad 2004-03-12 devnull
478 9ffbb5ad 2004-03-12 devnull if(initindex1(ix) < 0){
479 9ffbb5ad 2004-03-12 devnull free(smap);
480 9ffbb5ad 2004-03-12 devnull return nil;
481 9ffbb5ad 2004-03-12 devnull }
482 9ffbb5ad 2004-03-12 devnull
483 7a4ee46d 2003-11-23 devnull return ix;
484 7a4ee46d 2003-11-23 devnull }
485 7a4ee46d 2003-11-23 devnull
486 7a4ee46d 2003-11-23 devnull ISect*
487 7a4ee46d 2003-11-23 devnull initisect(Part *part)
488 7a4ee46d 2003-11-23 devnull {
489 7a4ee46d 2003-11-23 devnull ISect *is;
490 7a4ee46d 2003-11-23 devnull ZBlock *b;
491 7a4ee46d 2003-11-23 devnull int ok;
492 7a4ee46d 2003-11-23 devnull
493 7a4ee46d 2003-11-23 devnull b = alloczblock(HeadSize, 0);
494 7a4ee46d 2003-11-23 devnull if(b == nil || readpart(part, PartBlank, b->data, HeadSize) < 0){
495 7a4ee46d 2003-11-23 devnull seterr(EAdmin, "can't read index section header: %r");
496 7a4ee46d 2003-11-23 devnull return nil;
497 7a4ee46d 2003-11-23 devnull }
498 7a4ee46d 2003-11-23 devnull
499 7a4ee46d 2003-11-23 devnull is = MKZ(ISect);
500 7a4ee46d 2003-11-23 devnull if(is == nil){
501 7a4ee46d 2003-11-23 devnull freezblock(b);
502 7a4ee46d 2003-11-23 devnull return nil;
503 7a4ee46d 2003-11-23 devnull }
504 7a4ee46d 2003-11-23 devnull is->part = part;
505 7a4ee46d 2003-11-23 devnull ok = unpackisect(is, b->data);
506 7a4ee46d 2003-11-23 devnull freezblock(b);
507 7a4ee46d 2003-11-23 devnull if(ok < 0){
508 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "corrupted index section header: %r");
509 7a4ee46d 2003-11-23 devnull freeisect(is);
510 7a4ee46d 2003-11-23 devnull return nil;
511 7a4ee46d 2003-11-23 devnull }
512 7a4ee46d 2003-11-23 devnull
513 7a4ee46d 2003-11-23 devnull if(is->version != ISectVersion){
514 7a4ee46d 2003-11-23 devnull seterr(EAdmin, "unknown index section version %d", is->version);
515 7a4ee46d 2003-11-23 devnull freeisect(is);
516 7a4ee46d 2003-11-23 devnull return nil;
517 7a4ee46d 2003-11-23 devnull }
518 7a4ee46d 2003-11-23 devnull
519 7a4ee46d 2003-11-23 devnull return initisect1(is);
520 7a4ee46d 2003-11-23 devnull }
521 7a4ee46d 2003-11-23 devnull
522 7a4ee46d 2003-11-23 devnull ISect*
523 7a4ee46d 2003-11-23 devnull newisect(Part *part, char *name, u32int blocksize, u32int tabsize)
524 7a4ee46d 2003-11-23 devnull {
525 7a4ee46d 2003-11-23 devnull ISect *is;
526 7a4ee46d 2003-11-23 devnull u32int tabbase;
527 7a4ee46d 2003-11-23 devnull
528 7a4ee46d 2003-11-23 devnull is = MKZ(ISect);
529 7a4ee46d 2003-11-23 devnull if(is == nil)
530 7a4ee46d 2003-11-23 devnull return nil;
531 7a4ee46d 2003-11-23 devnull
532 7a4ee46d 2003-11-23 devnull namecp(is->name, name);
533 7a4ee46d 2003-11-23 devnull is->version = ISectVersion;
534 7a4ee46d 2003-11-23 devnull is->part = part;
535 7a4ee46d 2003-11-23 devnull is->blocksize = blocksize;
536 7a4ee46d 2003-11-23 devnull is->start = 0;
537 7a4ee46d 2003-11-23 devnull is->stop = 0;
538 7a4ee46d 2003-11-23 devnull tabbase = (PartBlank + HeadSize + blocksize - 1) & ~(blocksize - 1);
539 7a4ee46d 2003-11-23 devnull is->blockbase = (tabbase + tabsize + blocksize - 1) & ~(blocksize - 1);
540 7a4ee46d 2003-11-23 devnull is->blocks = is->part->size / blocksize - is->blockbase / blocksize;
541 7a4ee46d 2003-11-23 devnull
542 7a4ee46d 2003-11-23 devnull is = initisect1(is);
543 7a4ee46d 2003-11-23 devnull if(is == nil)
544 7a4ee46d 2003-11-23 devnull return nil;
545 7a4ee46d 2003-11-23 devnull
546 7a4ee46d 2003-11-23 devnull return is;
547 7a4ee46d 2003-11-23 devnull }
548 7a4ee46d 2003-11-23 devnull
549 7a4ee46d 2003-11-23 devnull /*
550 9ffbb5ad 2004-03-12 devnull * initialize the computed parameters for an index
551 7a4ee46d 2003-11-23 devnull */
552 7a4ee46d 2003-11-23 devnull static ISect*
553 7a4ee46d 2003-11-23 devnull initisect1(ISect *is)
554 7a4ee46d 2003-11-23 devnull {
555 7a4ee46d 2003-11-23 devnull u64int v;
556 7a4ee46d 2003-11-23 devnull
557 7a4ee46d 2003-11-23 devnull is->buckmax = (is->blocksize - IBucketSize) / IEntrySize;
558 7a4ee46d 2003-11-23 devnull is->blocklog = u64log2(is->blocksize);
559 7a4ee46d 2003-11-23 devnull if(is->blocksize != (1 << is->blocklog)){
560 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "illegal non-power-of-2 bucket size %d\n", is->blocksize);
561 7a4ee46d 2003-11-23 devnull freeisect(is);
562 7a4ee46d 2003-11-23 devnull return nil;
563 7a4ee46d 2003-11-23 devnull }
564 7a4ee46d 2003-11-23 devnull partblocksize(is->part, is->blocksize);
565 7a4ee46d 2003-11-23 devnull is->tabbase = (PartBlank + HeadSize + is->blocksize - 1) & ~(is->blocksize - 1);
566 7a4ee46d 2003-11-23 devnull if(is->tabbase >= is->blockbase){
567 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "index section config table overlaps bucket storage");
568 7a4ee46d 2003-11-23 devnull freeisect(is);
569 7a4ee46d 2003-11-23 devnull return nil;
570 7a4ee46d 2003-11-23 devnull }
571 7a4ee46d 2003-11-23 devnull is->tabsize = is->blockbase - is->tabbase;
572 7a4ee46d 2003-11-23 devnull v = is->part->size & ~(u64int)(is->blocksize - 1);
573 7a4ee46d 2003-11-23 devnull if(is->blockbase + (u64int)is->blocks * is->blocksize != v){
574 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "invalid blocks in index section %s", is->name);
575 7a4ee46d 2003-11-23 devnull //ZZZZZZZZZ
576 7a4ee46d 2003-11-23 devnull // freeisect(is);
577 7a4ee46d 2003-11-23 devnull // return nil;
578 7a4ee46d 2003-11-23 devnull }
579 7a4ee46d 2003-11-23 devnull
580 7a4ee46d 2003-11-23 devnull if(is->stop - is->start > is->blocks){
581 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "index section overflows available space");
582 7a4ee46d 2003-11-23 devnull freeisect(is);
583 7a4ee46d 2003-11-23 devnull return nil;
584 7a4ee46d 2003-11-23 devnull }
585 7a4ee46d 2003-11-23 devnull if(is->start > is->stop){
586 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "invalid index section range");
587 7a4ee46d 2003-11-23 devnull freeisect(is);
588 7a4ee46d 2003-11-23 devnull return nil;
589 7a4ee46d 2003-11-23 devnull }
590 7a4ee46d 2003-11-23 devnull
591 7a4ee46d 2003-11-23 devnull return is;
592 7a4ee46d 2003-11-23 devnull }
593 7a4ee46d 2003-11-23 devnull
594 7a4ee46d 2003-11-23 devnull int
595 7a4ee46d 2003-11-23 devnull wbisect(ISect *is)
596 7a4ee46d 2003-11-23 devnull {
597 7a4ee46d 2003-11-23 devnull ZBlock *b;
598 7a4ee46d 2003-11-23 devnull
599 7a4ee46d 2003-11-23 devnull b = alloczblock(HeadSize, 1);
600 7a4ee46d 2003-11-23 devnull if(b == nil)
601 7a4ee46d 2003-11-23 devnull //ZZZ set error?
602 7a4ee46d 2003-11-23 devnull return -1;
603 7a4ee46d 2003-11-23 devnull
604 7a4ee46d 2003-11-23 devnull if(packisect(is, b->data) < 0){
605 7a4ee46d 2003-11-23 devnull seterr(ECorrupt, "can't make index section header: %r");
606 7a4ee46d 2003-11-23 devnull freezblock(b);
607 7a4ee46d 2003-11-23 devnull return -1;
608 7a4ee46d 2003-11-23 devnull }
609 7a4ee46d 2003-11-23 devnull if(writepart(is->part, PartBlank, b->data, HeadSize) < 0){
610 7a4ee46d 2003-11-23 devnull seterr(EAdmin, "can't write index section header: %r");
611 7a4ee46d 2003-11-23 devnull freezblock(b);
612 7a4ee46d 2003-11-23 devnull return -1;
613 7a4ee46d 2003-11-23 devnull }
614 7a4ee46d 2003-11-23 devnull freezblock(b);
615 7a4ee46d 2003-11-23 devnull
616 7a4ee46d 2003-11-23 devnull return 0;
617 7a4ee46d 2003-11-23 devnull }
618 7a4ee46d 2003-11-23 devnull
619 7a4ee46d 2003-11-23 devnull void
620 7a4ee46d 2003-11-23 devnull freeisect(ISect *is)
621 7a4ee46d 2003-11-23 devnull {
622 7a4ee46d 2003-11-23 devnull if(is == nil)
623 7a4ee46d 2003-11-23 devnull return;
624 7a4ee46d 2003-11-23 devnull free(is);
625 7a4ee46d 2003-11-23 devnull }
626 7a4ee46d 2003-11-23 devnull
627 7a4ee46d 2003-11-23 devnull void
628 7a4ee46d 2003-11-23 devnull freeindex(Index *ix)
629 7a4ee46d 2003-11-23 devnull {
630 7a4ee46d 2003-11-23 devnull int i;
631 7a4ee46d 2003-11-23 devnull
632 7a4ee46d 2003-11-23 devnull if(ix == nil)
633 7a4ee46d 2003-11-23 devnull return;
634 7a4ee46d 2003-11-23 devnull free(ix->amap);
635 7a4ee46d 2003-11-23 devnull free(ix->arenas);
636 7a4ee46d 2003-11-23 devnull if(ix->sects)
637 7a4ee46d 2003-11-23 devnull for(i = 0; i < ix->nsects; i++)
638 7a4ee46d 2003-11-23 devnull freeisect(ix->sects[i]);
639 7a4ee46d 2003-11-23 devnull free(ix->sects);
640 7a4ee46d 2003-11-23 devnull free(ix->smap);
641 7a4ee46d 2003-11-23 devnull free(ix);
642 7a4ee46d 2003-11-23 devnull }
643 7a4ee46d 2003-11-23 devnull
644 7a4ee46d 2003-11-23 devnull /*
645 7a4ee46d 2003-11-23 devnull * write a clump to an available arena in the index
646 7a4ee46d 2003-11-23 devnull * and return the address of the clump within the index.
647 7a4ee46d 2003-11-23 devnull ZZZ question: should this distinguish between an arena
648 7a4ee46d 2003-11-23 devnull filling up and real errors writing the clump?
649 7a4ee46d 2003-11-23 devnull */
650 7a4ee46d 2003-11-23 devnull u64int
651 7a4ee46d 2003-11-23 devnull writeiclump(Index *ix, Clump *c, u8int *clbuf)
652 7a4ee46d 2003-11-23 devnull {
653 7a4ee46d 2003-11-23 devnull u64int a;
654 7a4ee46d 2003-11-23 devnull int i;
655 7a4ee46d 2003-11-23 devnull
656 7a4ee46d 2003-11-23 devnull for(i = ix->mapalloc; i < ix->narenas; i++){
657 7a4ee46d 2003-11-23 devnull a = writeaclump(ix->arenas[i], c, clbuf);
658 7a4ee46d 2003-11-23 devnull if(a != TWID64)
659 7a4ee46d 2003-11-23 devnull return a + ix->amap[i].start;
660 7a4ee46d 2003-11-23 devnull }
661 7a4ee46d 2003-11-23 devnull
662 7a4ee46d 2003-11-23 devnull seterr(EAdmin, "no space left in arenas");
663 7c5190d2 2004-03-12 devnull return TWID64;
664 7a4ee46d 2003-11-23 devnull }
665 7a4ee46d 2003-11-23 devnull
666 7a4ee46d 2003-11-23 devnull /*
667 9ffbb5ad 2004-03-12 devnull * convert an arena index to an relative arena address
668 7a4ee46d 2003-11-23 devnull */
669 7a4ee46d 2003-11-23 devnull Arena*
670 7a4ee46d 2003-11-23 devnull amapitoa(Index *ix, u64int a, u64int *aa)
671 7a4ee46d 2003-11-23 devnull {
672 7a4ee46d 2003-11-23 devnull int i, r, l, m;
673 7a4ee46d 2003-11-23 devnull
674 7a4ee46d 2003-11-23 devnull l = 1;
675 7a4ee46d 2003-11-23 devnull r = ix->narenas - 1;
676 7a4ee46d 2003-11-23 devnull while(l <= r){
677 7a4ee46d 2003-11-23 devnull m = (r + l) / 2;
678 7a4ee46d 2003-11-23 devnull if(ix->amap[m].start <= a)
679 7a4ee46d 2003-11-23 devnull l = m + 1;
680 7a4ee46d 2003-11-23 devnull else
681 7a4ee46d 2003-11-23 devnull r = m - 1;
682 7a4ee46d 2003-11-23 devnull }
683 7a4ee46d 2003-11-23 devnull l--;
684 7a4ee46d 2003-11-23 devnull
685 7a4ee46d 2003-11-23 devnull if(a > ix->amap[l].stop){
686 7a4ee46d 2003-11-23 devnull for(i=0; i<ix->narenas; i++)
687 7a4ee46d 2003-11-23 devnull print("arena %d: %llux - %llux\n", i, ix->amap[i].start, ix->amap[i].stop);
688 7a4ee46d 2003-11-23 devnull print("want arena %d for %llux\n", l, a);
689 7a4ee46d 2003-11-23 devnull seterr(ECrash, "unmapped address passed to amapitoa");
690 7a4ee46d 2003-11-23 devnull return nil;
691 7a4ee46d 2003-11-23 devnull }
692 7a4ee46d 2003-11-23 devnull
693 7a4ee46d 2003-11-23 devnull if(ix->arenas[l] == nil){
694 7a4ee46d 2003-11-23 devnull seterr(ECrash, "unmapped arena selected in amapitoa");
695 7a4ee46d 2003-11-23 devnull return nil;
696 7a4ee46d 2003-11-23 devnull }
697 7a4ee46d 2003-11-23 devnull *aa = a - ix->amap[l].start;
698 7a4ee46d 2003-11-23 devnull return ix->arenas[l];
699 7a4ee46d 2003-11-23 devnull }
700 7a4ee46d 2003-11-23 devnull
701 7a4ee46d 2003-11-23 devnull int
702 7a4ee46d 2003-11-23 devnull iaddrcmp(IAddr *ia1, IAddr *ia2)
703 7a4ee46d 2003-11-23 devnull {
704 7a4ee46d 2003-11-23 devnull return ia1->type != ia2->type
705 7a4ee46d 2003-11-23 devnull || ia1->size != ia2->size
706 7a4ee46d 2003-11-23 devnull || ia1->blocks != ia2->blocks
707 7a4ee46d 2003-11-23 devnull || ia1->addr != ia2->addr;
708 7a4ee46d 2003-11-23 devnull }
709 7a4ee46d 2003-11-23 devnull
710 7a4ee46d 2003-11-23 devnull /*
711 7a4ee46d 2003-11-23 devnull * lookup the score in the partition
712 7a4ee46d 2003-11-23 devnull *
713 7a4ee46d 2003-11-23 devnull * nothing needs to be explicitly locked:
714 7a4ee46d 2003-11-23 devnull * only static parts of ix are used, and
715 7a4ee46d 2003-11-23 devnull * the bucket is locked by the DBlock lock.
716 7a4ee46d 2003-11-23 devnull */
717 7a4ee46d 2003-11-23 devnull int
718 7a4ee46d 2003-11-23 devnull loadientry(Index *ix, u8int *score, int type, IEntry *ie)
719 7a4ee46d 2003-11-23 devnull {
720 7a4ee46d 2003-11-23 devnull ISect *is;
721 7a4ee46d 2003-11-23 devnull DBlock *b;
722 7a4ee46d 2003-11-23 devnull IBucket ib;
723 7a4ee46d 2003-11-23 devnull u32int buck;
724 7a4ee46d 2003-11-23 devnull int h, ok;
725 7a4ee46d 2003-11-23 devnull
726 7a4ee46d 2003-11-23 devnull ok = -1;
727 7a4ee46d 2003-11-23 devnull
728 7c5190d2 2004-03-12 devnull qlock(&stats.lock);
729 7c5190d2 2004-03-12 devnull stats.indexreads++;
730 7c5190d2 2004-03-12 devnull qunlock(&stats.lock);
731 9ffbb5ad 2004-03-12 devnull b = loadibucket(ix, score, &is, &buck, &ib);
732 7c5190d2 2004-03-12 devnull if(b == nil)
733 7c5190d2 2004-03-12 devnull return -1;
734 7a4ee46d 2003-11-23 devnull
735 7c5190d2 2004-03-12 devnull if(okibucket(&ib, is) < 0)
736 7c5190d2 2004-03-12 devnull goto out;
737 7a4ee46d 2003-11-23 devnull
738 7c5190d2 2004-03-12 devnull h = bucklook(score, type, ib.data, ib.n);
739 7c5190d2 2004-03-12 devnull if(h & 1){
740 7c5190d2 2004-03-12 devnull h ^= 1;
741 7c5190d2 2004-03-12 devnull unpackientry(ie, &ib.data[h]);
742 7c5190d2 2004-03-12 devnull ok = 0;
743 7c5190d2 2004-03-12 devnull goto out;
744 7a4ee46d 2003-11-23 devnull }
745 7c5190d2 2004-03-12 devnull
746 7c5190d2 2004-03-12 devnull out:
747 7a4ee46d 2003-11-23 devnull putdblock(b);
748 7a4ee46d 2003-11-23 devnull return ok;
749 7a4ee46d 2003-11-23 devnull }
750 7a4ee46d 2003-11-23 devnull
751 7a4ee46d 2003-11-23 devnull /*
752 7a4ee46d 2003-11-23 devnull * insert or update an index entry into the appropriate bucket
753 7a4ee46d 2003-11-23 devnull */
754 7a4ee46d 2003-11-23 devnull int
755 7a4ee46d 2003-11-23 devnull storeientry(Index *ix, IEntry *ie)
756 7a4ee46d 2003-11-23 devnull {
757 7a4ee46d 2003-11-23 devnull ISect *is;
758 7a4ee46d 2003-11-23 devnull DBlock *b;
759 7a4ee46d 2003-11-23 devnull IBucket ib;
760 7a4ee46d 2003-11-23 devnull u32int buck;
761 7a4ee46d 2003-11-23 devnull int h, ok;
762 7a4ee46d 2003-11-23 devnull
763 7a4ee46d 2003-11-23 devnull ok = 0;
764 7a4ee46d 2003-11-23 devnull
765 7c5190d2 2004-03-12 devnull qlock(&stats.lock);
766 7c5190d2 2004-03-12 devnull stats.indexwreads++;
767 7c5190d2 2004-03-12 devnull qunlock(&stats.lock);
768 7a4ee46d 2003-11-23 devnull
769 333c1dcc 2004-03-13 devnull top:
770 9ffbb5ad 2004-03-12 devnull b = loadibucket(ix, ie->score, &is, &buck, &ib);
771 7c5190d2 2004-03-12 devnull if(b == nil)
772 7c5190d2 2004-03-12 devnull return -1;
773 7a4ee46d 2003-11-23 devnull
774 7c5190d2 2004-03-12 devnull if(okibucket(&ib, is) < 0)
775 7c5190d2 2004-03-12 devnull goto out;
776 7a4ee46d 2003-11-23 devnull
777 7c5190d2 2004-03-12 devnull h = bucklook(ie->score, ie->ia.type, ib.data, ib.n);
778 7c5190d2 2004-03-12 devnull if(h & 1){
779 7c5190d2 2004-03-12 devnull h ^= 1;
780 7c5190d2 2004-03-12 devnull dirtydblock(b, DirtyIndex);
781 7c5190d2 2004-03-12 devnull packientry(ie, &ib.data[h]);
782 7c5190d2 2004-03-12 devnull ok = writebucket(is, buck, &ib, b);
783 7c5190d2 2004-03-12 devnull goto out;
784 7c5190d2 2004-03-12 devnull }
785 7a4ee46d 2003-11-23 devnull
786 7c5190d2 2004-03-12 devnull if(ib.n < is->buckmax){
787 7c5190d2 2004-03-12 devnull dirtydblock(b, DirtyIndex);
788 7c5190d2 2004-03-12 devnull memmove(&ib.data[h + IEntrySize], &ib.data[h], ib.n * IEntrySize - h);
789 7c5190d2 2004-03-12 devnull ib.n++;
790 7c5190d2 2004-03-12 devnull
791 7c5190d2 2004-03-12 devnull packientry(ie, &ib.data[h]);
792 7c5190d2 2004-03-12 devnull ok = writebucket(is, buck, &ib, b);
793 333c1dcc 2004-03-13 devnull goto out;
794 333c1dcc 2004-03-13 devnull }
795 333c1dcc 2004-03-13 devnull
796 333c1dcc 2004-03-13 devnull /* block is full -- not supposed to happen in V1 */
797 333c1dcc 2004-03-13 devnull if(ix->version == IndexVersion1){
798 333c1dcc 2004-03-13 devnull seterr(EAdmin, "index bucket %ud on %s is full\n", buck, is->part->name);
799 333c1dcc 2004-03-13 devnull ok = -1;
800 7c5190d2 2004-03-12 devnull goto out;
801 7a4ee46d 2003-11-23 devnull }
802 333c1dcc 2004-03-13 devnull
803 333c1dcc 2004-03-13 devnull /* in V2, split the block and try again; splitiblock puts b */
804 333c1dcc 2004-03-13 devnull if(splitiblock(ix, b, is, buck, &ib) < 0)
805 333c1dcc 2004-03-13 devnull return -1;
806 333c1dcc 2004-03-13 devnull goto top;
807 7a4ee46d 2003-11-23 devnull
808 7c5190d2 2004-03-12 devnull out:
809 7a4ee46d 2003-11-23 devnull putdblock(b);
810 7a4ee46d 2003-11-23 devnull return ok;
811 7a4ee46d 2003-11-23 devnull }
812 7a4ee46d 2003-11-23 devnull
813 7a4ee46d 2003-11-23 devnull static int
814 7a4ee46d 2003-11-23 devnull writebucket(ISect *is, u32int buck, IBucket *ib, DBlock *b)
815 7a4ee46d 2003-11-23 devnull {
816 333c1dcc 2004-03-13 devnull assert(b->dirty == DirtyIndex || b->dirty == DirtyIndexSplit);
817 24998851 2004-03-11 devnull
818 24998851 2004-03-11 devnull if(buck >= is->blocks){
819 7a4ee46d 2003-11-23 devnull seterr(EAdmin, "index write out of bounds: %d >= %d\n",
820 7a4ee46d 2003-11-23 devnull buck, is->blocks);
821 24998851 2004-03-11 devnull return -1;
822 24998851 2004-03-11 devnull }
823 7a4ee46d 2003-11-23 devnull qlock(&stats.lock);
824 7a4ee46d 2003-11-23 devnull stats.indexwrites++;
825 7a4ee46d 2003-11-23 devnull qunlock(&stats.lock);
826 7a4ee46d 2003-11-23 devnull packibucket(ib, b->data);
827 24998851 2004-03-11 devnull // return writepart(is->part, is->blockbase + ((u64int)buck << is->blocklog), b->data, is->blocksize);
828 7c5190d2 2004-03-12 devnull return 0;
829 7c5190d2 2004-03-12 devnull }
830 7c5190d2 2004-03-12 devnull
831 7c5190d2 2004-03-12 devnull static int
832 7a4ee46d 2003-11-23 devnull okibucket(IBucket *ib, ISect *is)
833 7a4ee46d 2003-11-23 devnull {
834 9ffbb5ad 2004-03-12 devnull if(ib->n <= is->buckmax)
835 7a4ee46d 2003-11-23 devnull return 0;
836 7a4ee46d 2003-11-23 devnull
837 9ffbb5ad 2004-03-12 devnull seterr(EICorrupt, "corrupted disk index bucket: n=%ud max=%ud, depth=%lud range=[%lud,%lud)",
838 9ffbb5ad 2004-03-12 devnull ib->n, is->buckmax, ib->depth, is->start, is->stop);
839 7a4ee46d 2003-11-23 devnull return -1;
840 7a4ee46d 2003-11-23 devnull }
841 7a4ee46d 2003-11-23 devnull
842 7a4ee46d 2003-11-23 devnull /*
843 7a4ee46d 2003-11-23 devnull * look for score within data;
844 7a4ee46d 2003-11-23 devnull * return 1 | byte index of matching index,
845 7a4ee46d 2003-11-23 devnull * or 0 | index of least element > score
846 7a4ee46d 2003-11-23 devnull */
847 7a4ee46d 2003-11-23 devnull static int
848 7a4ee46d 2003-11-23 devnull bucklook(u8int *score, int otype, u8int *data, int n)
849 7a4ee46d 2003-11-23 devnull {
850 7a4ee46d 2003-11-23 devnull int i, r, l, m, h, c, cc, type;
851 7a4ee46d 2003-11-23 devnull
852 7a4ee46d 2003-11-23 devnull type = vttodisktype(otype);
853 7a4ee46d 2003-11-23 devnull l = 0;
854 7a4ee46d 2003-11-23 devnull r = n - 1;
855 7a4ee46d 2003-11-23 devnull while(l <= r){
856 7a4ee46d 2003-11-23 devnull m = (r + l) >> 1;
857 7a4ee46d 2003-11-23 devnull h = m * IEntrySize;
858 7a4ee46d 2003-11-23 devnull for(i = 0; i < VtScoreSize; i++){
859 7a4ee46d 2003-11-23 devnull c = score[i];
860 7a4ee46d 2003-11-23 devnull cc = data[h + i];
861 7a4ee46d 2003-11-23 devnull if(c != cc){
862 7a4ee46d 2003-11-23 devnull if(c > cc)
863 7a4ee46d 2003-11-23 devnull l = m + 1;
864 7a4ee46d 2003-11-23 devnull else
865 7a4ee46d 2003-11-23 devnull r = m - 1;
866 7a4ee46d 2003-11-23 devnull goto cont;
867 7a4ee46d 2003-11-23 devnull }
868 7a4ee46d 2003-11-23 devnull }
869 7a4ee46d 2003-11-23 devnull cc = data[h + IEntryTypeOff];
870 7a4ee46d 2003-11-23 devnull if(type != cc){
871 7a4ee46d 2003-11-23 devnull if(type > cc)
872 7a4ee46d 2003-11-23 devnull l = m + 1;
873 7a4ee46d 2003-11-23 devnull else
874 7a4ee46d 2003-11-23 devnull r = m - 1;
875 7a4ee46d 2003-11-23 devnull goto cont;
876 7a4ee46d 2003-11-23 devnull }
877 7a4ee46d 2003-11-23 devnull return h | 1;
878 7a4ee46d 2003-11-23 devnull cont:;
879 7a4ee46d 2003-11-23 devnull }
880 7a4ee46d 2003-11-23 devnull
881 7a4ee46d 2003-11-23 devnull return l * IEntrySize;
882 7a4ee46d 2003-11-23 devnull }
883 7a4ee46d 2003-11-23 devnull
884 7a4ee46d 2003-11-23 devnull /*
885 7a4ee46d 2003-11-23 devnull * compare two IEntries; consistent with bucklook
886 7a4ee46d 2003-11-23 devnull */
887 7a4ee46d 2003-11-23 devnull int
888 7a4ee46d 2003-11-23 devnull ientrycmp(const void *vie1, const void *vie2)
889 7a4ee46d 2003-11-23 devnull {
890 7a4ee46d 2003-11-23 devnull u8int *ie1, *ie2;
891 7a4ee46d 2003-11-23 devnull int i, v1, v2;
892 7a4ee46d 2003-11-23 devnull
893 7a4ee46d 2003-11-23 devnull ie1 = (u8int*)vie1;
894 7a4ee46d 2003-11-23 devnull ie2 = (u8int*)vie2;
895 7a4ee46d 2003-11-23 devnull for(i = 0; i < VtScoreSize; i++){
896 7a4ee46d 2003-11-23 devnull v1 = ie1[i];
897 7a4ee46d 2003-11-23 devnull v2 = ie2[i];
898 7a4ee46d 2003-11-23 devnull if(v1 != v2){
899 7a4ee46d 2003-11-23 devnull if(v1 < v2)
900 7a4ee46d 2003-11-23 devnull return -1;
901 7a4ee46d 2003-11-23 devnull return 0;
902 7a4ee46d 2003-11-23 devnull }
903 7a4ee46d 2003-11-23 devnull }
904 7a4ee46d 2003-11-23 devnull v1 = ie1[IEntryTypeOff];
905 7a4ee46d 2003-11-23 devnull v2 = ie2[IEntryTypeOff];
906 7a4ee46d 2003-11-23 devnull if(v1 != v2){
907 7a4ee46d 2003-11-23 devnull if(v1 < v2)
908 7a4ee46d 2003-11-23 devnull return -1;
909 7a4ee46d 2003-11-23 devnull return 0;
910 7a4ee46d 2003-11-23 devnull }
911 7a4ee46d 2003-11-23 devnull return -1;
912 7a4ee46d 2003-11-23 devnull }
913 9ffbb5ad 2004-03-12 devnull
914 9ffbb5ad 2004-03-12 devnull /*
915 9ffbb5ad 2004-03-12 devnull * find the number of the index section holding bucket #buck
916 9ffbb5ad 2004-03-12 devnull */
917 9ffbb5ad 2004-03-12 devnull static int
918 9ffbb5ad 2004-03-12 devnull indexsect0(Index *ix, u32int buck)
919 9ffbb5ad 2004-03-12 devnull {
920 9ffbb5ad 2004-03-12 devnull int r, l, m;
921 9ffbb5ad 2004-03-12 devnull
922 9ffbb5ad 2004-03-12 devnull l = 1;
923 9ffbb5ad 2004-03-12 devnull r = ix->nsects - 1;
924 9ffbb5ad 2004-03-12 devnull while(l <= r){
925 9ffbb5ad 2004-03-12 devnull m = (r + l) >> 1;
926 9ffbb5ad 2004-03-12 devnull if(ix->sects[m]->start <= buck)
927 9ffbb5ad 2004-03-12 devnull l = m + 1;
928 9ffbb5ad 2004-03-12 devnull else
929 9ffbb5ad 2004-03-12 devnull r = m - 1;
930 9ffbb5ad 2004-03-12 devnull }
931 9ffbb5ad 2004-03-12 devnull return l - 1;
932 9ffbb5ad 2004-03-12 devnull }
933 9ffbb5ad 2004-03-12 devnull
934 9ffbb5ad 2004-03-12 devnull /*
935 9ffbb5ad 2004-03-12 devnull * load the index block at bucket #buck
936 9ffbb5ad 2004-03-12 devnull */
937 9ffbb5ad 2004-03-12 devnull static DBlock*
938 333c1dcc 2004-03-13 devnull loadibucket0(Index *ix, u32int buck, ISect **pis, u32int *pbuck, IBucket *ib, int read)
939 9ffbb5ad 2004-03-12 devnull {
940 9ffbb5ad 2004-03-12 devnull ISect *is;
941 9ffbb5ad 2004-03-12 devnull DBlock *b;
942 9ffbb5ad 2004-03-12 devnull
943 9ffbb5ad 2004-03-12 devnull is = ix->sects[indexsect0(ix, buck)];
944 9ffbb5ad 2004-03-12 devnull if(buck < is->start || is->stop <= buck){
945 9ffbb5ad 2004-03-12 devnull seterr(EAdmin, "index lookup out of range: %ud not found in index\n", buck);
946 9ffbb5ad 2004-03-12 devnull return nil;
947 9ffbb5ad 2004-03-12 devnull }
948 9ffbb5ad 2004-03-12 devnull
949 9ffbb5ad 2004-03-12 devnull buck -= is->start;
950 333c1dcc 2004-03-13 devnull if((b = getdblock(is->part, is->blockbase + ((u64int)buck << is->blocklog), read)) == nil)
951 9ffbb5ad 2004-03-12 devnull return nil;
952 9ffbb5ad 2004-03-12 devnull
953 333c1dcc 2004-03-13 devnull if(pis)
954 333c1dcc 2004-03-13 devnull *pis = is;
955 333c1dcc 2004-03-13 devnull if(pbuck)
956 333c1dcc 2004-03-13 devnull *pbuck = buck;
957 333c1dcc 2004-03-13 devnull if(ib)
958 333c1dcc 2004-03-13 devnull unpackibucket(ib, b->data);
959 9ffbb5ad 2004-03-12 devnull return b;
960 9ffbb5ad 2004-03-12 devnull }
961 9ffbb5ad 2004-03-12 devnull
962 9ffbb5ad 2004-03-12 devnull /*
963 9ffbb5ad 2004-03-12 devnull * find the number of the index section holding score
964 9ffbb5ad 2004-03-12 devnull */
965 9ffbb5ad 2004-03-12 devnull static int
966 9ffbb5ad 2004-03-12 devnull indexsect1(Index *ix, u8int *score)
967 9ffbb5ad 2004-03-12 devnull {
968 9ffbb5ad 2004-03-12 devnull return indexsect0(ix, hashbits(score, 32) / ix->div);
969 9ffbb5ad 2004-03-12 devnull }
970 9ffbb5ad 2004-03-12 devnull
971 9ffbb5ad 2004-03-12 devnull /*
972 9ffbb5ad 2004-03-12 devnull * load the index block responsible for score.
973 9ffbb5ad 2004-03-12 devnull */
974 9ffbb5ad 2004-03-12 devnull static DBlock*
975 9ffbb5ad 2004-03-12 devnull loadibucket1(Index *ix, u8int *score, ISect **pis, u32int *pbuck, IBucket *ib)
976 9ffbb5ad 2004-03-12 devnull {
977 333c1dcc 2004-03-13 devnull return loadibucket0(ix, hashbits(score, 32)/ix->div, pis, pbuck, ib, 1);
978 9ffbb5ad 2004-03-12 devnull }
979 9ffbb5ad 2004-03-12 devnull
980 9ffbb5ad 2004-03-12 devnull static u32int
981 9ffbb5ad 2004-03-12 devnull keytobuck(Index *ix, u32int key, int d)
982 9ffbb5ad 2004-03-12 devnull {
983 333c1dcc 2004-03-13 devnull /* clear all but top d bits; can't depend on boundary case shifts */
984 333c1dcc 2004-03-13 devnull if(d == 0)
985 333c1dcc 2004-03-13 devnull key = 0;
986 333c1dcc 2004-03-13 devnull else if(d != 32)
987 9ffbb5ad 2004-03-12 devnull key &= ~((1<<(32-d))-1);
988 9ffbb5ad 2004-03-12 devnull
989 9ffbb5ad 2004-03-12 devnull /* truncate to maxdepth bits */
990 9ffbb5ad 2004-03-12 devnull if(ix->maxdepth != 32)
991 9ffbb5ad 2004-03-12 devnull key >>= 32 - ix->maxdepth;
992 9ffbb5ad 2004-03-12 devnull
993 9ffbb5ad 2004-03-12 devnull return ix->bitblocks + key;
994 9ffbb5ad 2004-03-12 devnull }
995 9ffbb5ad 2004-03-12 devnull
996 9ffbb5ad 2004-03-12 devnull /*
997 9ffbb5ad 2004-03-12 devnull * to check whether .xxx has split, check whether .xxx1 is in use.
998 9ffbb5ad 2004-03-12 devnull * it might be worth caching the block for future lookups, but for now
999 9ffbb5ad 2004-03-12 devnull * let's assume the dcache is good enough.
1000 9ffbb5ad 2004-03-12 devnull */
1001 9ffbb5ad 2004-03-12 devnull static int
1002 9ffbb5ad 2004-03-12 devnull bitmapop(Index *ix, u32int key, int d, int set)
1003 9ffbb5ad 2004-03-12 devnull {
1004 9ffbb5ad 2004-03-12 devnull DBlock *b;
1005 9ffbb5ad 2004-03-12 devnull int inuse;
1006 333c1dcc 2004-03-13 devnull u32int key1, buck1;
1007 9ffbb5ad 2004-03-12 devnull
1008 9ffbb5ad 2004-03-12 devnull if(d >= ix->maxdepth)
1009 9ffbb5ad 2004-03-12 devnull return 0;
1010 9ffbb5ad 2004-03-12 devnull
1011 9ffbb5ad 2004-03-12 devnull
1012 333c1dcc 2004-03-13 devnull /* construct .xxx1 in bucket number format */
1013 333c1dcc 2004-03-13 devnull key1 = key | (1<<(32-d-1));
1014 333c1dcc 2004-03-13 devnull buck1 = keytobuck(ix, key1, d+1);
1015 9ffbb5ad 2004-03-12 devnull
1016 333c1dcc 2004-03-13 devnull if(0) fprint(2, "key %d/%0*ub key1 %d/%0*ub buck1 %08ux\n",
1017 333c1dcc 2004-03-13 devnull d, d, KEY(key, d), d+1, d+1, KEY(key1, d+1), buck1);
1018 333c1dcc 2004-03-13 devnull
1019 333c1dcc 2004-03-13 devnull /* check whether buck1 is in use */
1020 333c1dcc 2004-03-13 devnull
1021 333c1dcc 2004-03-13 devnull if((b = loadibucket0(ix, buck1 >> ix->bitkeylog, nil, nil, nil, 1)) == nil){
1022 9ffbb5ad 2004-03-12 devnull seterr(ECorrupt, "cannot load in-use bitmap block");
1023 333c1dcc 2004-03-13 devnull fprint(2, "loadibucket: %r\n");
1024 9ffbb5ad 2004-03-12 devnull return -1;
1025 9ffbb5ad 2004-03-12 devnull }
1026 333c1dcc 2004-03-13 devnull if(0) fprint(2, "buck1 %08ux bitkeymask %08ux bitkeylog %d\n", buck1, ix->bitkeymask, ix->bitkeylog);
1027 333c1dcc 2004-03-13 devnull buck1 &= ix->bitkeymask;
1028 333c1dcc 2004-03-13 devnull inuse = ((u32int*)b->data)[buck1>>5] & (1<<(buck1&31));
1029 9ffbb5ad 2004-03-12 devnull if(set && !inuse){
1030 9ffbb5ad 2004-03-12 devnull dirtydblock(b, DirtyIndexBitmap);
1031 333c1dcc 2004-03-13 devnull ((u32int*)b->data)[buck1>>5] |= (1<<(buck1&31));
1032 9ffbb5ad 2004-03-12 devnull }
1033 9ffbb5ad 2004-03-12 devnull putdblock(b);
1034 9ffbb5ad 2004-03-12 devnull return inuse;
1035 9ffbb5ad 2004-03-12 devnull }
1036 9ffbb5ad 2004-03-12 devnull
1037 9ffbb5ad 2004-03-12 devnull static int
1038 9ffbb5ad 2004-03-12 devnull issplit(Index *ix, u32int key, int d)
1039 9ffbb5ad 2004-03-12 devnull {
1040 9ffbb5ad 2004-03-12 devnull return bitmapop(ix, key, d, 0);
1041 9ffbb5ad 2004-03-12 devnull }
1042 9ffbb5ad 2004-03-12 devnull
1043 9ffbb5ad 2004-03-12 devnull static int
1044 9ffbb5ad 2004-03-12 devnull marksplit(Index *ix, u32int key, int d)
1045 9ffbb5ad 2004-03-12 devnull {
1046 9ffbb5ad 2004-03-12 devnull return bitmapop(ix, key, d, 1);
1047 9ffbb5ad 2004-03-12 devnull }
1048 9ffbb5ad 2004-03-12 devnull
1049 9ffbb5ad 2004-03-12 devnull /*
1050 9ffbb5ad 2004-03-12 devnull * find the number of the index section holding score.
1051 9ffbb5ad 2004-03-12 devnull * it's not terrible to be wrong once in a while, so we just
1052 9ffbb5ad 2004-03-12 devnull * do what the bitmap tells us and don't worry about the
1053 9ffbb5ad 2004-03-12 devnull * bitmap being out of date.
1054 9ffbb5ad 2004-03-12 devnull */
1055 9ffbb5ad 2004-03-12 devnull static int
1056 9ffbb5ad 2004-03-12 devnull indexsect2(Index *ix, u8int *score)
1057 9ffbb5ad 2004-03-12 devnull {
1058 9ffbb5ad 2004-03-12 devnull u32int key;
1059 9ffbb5ad 2004-03-12 devnull int d, is;
1060 9ffbb5ad 2004-03-12 devnull
1061 9ffbb5ad 2004-03-12 devnull key = hashbits(score, 32);
1062 9ffbb5ad 2004-03-12 devnull for(d=0; d<=ix->maxdepth; d++){
1063 9ffbb5ad 2004-03-12 devnull is = issplit(ix, key, d);
1064 9ffbb5ad 2004-03-12 devnull if(is == -1)
1065 9ffbb5ad 2004-03-12 devnull return 0; /* must return something valid! */
1066 9ffbb5ad 2004-03-12 devnull if(!is)
1067 9ffbb5ad 2004-03-12 devnull break;
1068 9ffbb5ad 2004-03-12 devnull }
1069 9ffbb5ad 2004-03-12 devnull
1070 9ffbb5ad 2004-03-12 devnull if(d > ix->maxdepth){
1071 9ffbb5ad 2004-03-12 devnull seterr(EBug, "index bitmap inconsistent with maxdepth");
1072 9ffbb5ad 2004-03-12 devnull return 0; /* must return something valid! */
1073 9ffbb5ad 2004-03-12 devnull }
1074 9ffbb5ad 2004-03-12 devnull
1075 9ffbb5ad 2004-03-12 devnull return indexsect0(ix, keytobuck(ix, key, d));
1076 9ffbb5ad 2004-03-12 devnull }
1077 9ffbb5ad 2004-03-12 devnull
1078 9ffbb5ad 2004-03-12 devnull /*
1079 9ffbb5ad 2004-03-12 devnull * load the index block responsible for score.
1080 9ffbb5ad 2004-03-12 devnull * (unlike indexsect2, must be correct always.)
1081 9ffbb5ad 2004-03-12 devnull */
1082 9ffbb5ad 2004-03-12 devnull static DBlock*
1083 9ffbb5ad 2004-03-12 devnull loadibucket2(Index *ix, u8int *score, ISect **pis, u32int *pbuck, IBucket *ib)
1084 9ffbb5ad 2004-03-12 devnull {
1085 9ffbb5ad 2004-03-12 devnull u32int key;
1086 9ffbb5ad 2004-03-12 devnull int d, try, is;
1087 9ffbb5ad 2004-03-12 devnull DBlock *b;
1088 9ffbb5ad 2004-03-12 devnull
1089 9ffbb5ad 2004-03-12 devnull for(try=0; try<32; try++){
1090 9ffbb5ad 2004-03-12 devnull key = hashbits(score, 32);
1091 9ffbb5ad 2004-03-12 devnull for(d=0; d<=ix->maxdepth; d++){
1092 9ffbb5ad 2004-03-12 devnull is = issplit(ix, key, d);
1093 9ffbb5ad 2004-03-12 devnull if(is == -1)
1094 9ffbb5ad 2004-03-12 devnull return nil;
1095 9ffbb5ad 2004-03-12 devnull if(!is)
1096 9ffbb5ad 2004-03-12 devnull break;
1097 9ffbb5ad 2004-03-12 devnull }
1098 9ffbb5ad 2004-03-12 devnull if(d > ix->maxdepth){
1099 9ffbb5ad 2004-03-12 devnull seterr(EBug, "index bitmap inconsistent with maxdepth");
1100 9ffbb5ad 2004-03-12 devnull return nil;
1101 9ffbb5ad 2004-03-12 devnull }
1102 9ffbb5ad 2004-03-12 devnull
1103 333c1dcc 2004-03-13 devnull if((b = loadibucket0(ix, keytobuck(ix, key, d), pis, pbuck, ib, 1)) == nil)
1104 9ffbb5ad 2004-03-12 devnull return nil;
1105 9ffbb5ad 2004-03-12 devnull
1106 9ffbb5ad 2004-03-12 devnull if(ib->depth == d)
1107 9ffbb5ad 2004-03-12 devnull return b;
1108 9ffbb5ad 2004-03-12 devnull
1109 9ffbb5ad 2004-03-12 devnull if(ib->depth < d){
1110 333c1dcc 2004-03-13 devnull fprint(2, "loaded block %ud for %d/%0*ub got %d/%0*ub\n",
1111 333c1dcc 2004-03-13 devnull *pbuck, d, d, KEY(key,d), ib->depth, ib->depth, KEY(key, ib->depth));
1112 9ffbb5ad 2004-03-12 devnull seterr(EBug, "index block has smaller depth than expected -- cannot happen");
1113 9ffbb5ad 2004-03-12 devnull putdblock(b);
1114 9ffbb5ad 2004-03-12 devnull return nil;
1115 9ffbb5ad 2004-03-12 devnull }
1116 9ffbb5ad 2004-03-12 devnull
1117 9ffbb5ad 2004-03-12 devnull /*
1118 9ffbb5ad 2004-03-12 devnull * ib->depth > d, meaning the bitmap was out of date.
1119 333c1dcc 2004-03-13 devnull * fix the bitmap and try again. if we actually updated
1120 333c1dcc 2004-03-13 devnull * the bitmap in splitiblock, this would only happen if
1121 333c1dcc 2004-03-13 devnull * venti crashed at an inopportune moment. but this way
1122 333c1dcc 2004-03-13 devnull * the code gets tested more.
1123 9ffbb5ad 2004-03-12 devnull */
1124 333c1dcc 2004-03-13 devnull if(0) fprint(2, "update bitmap: %d/%0*ub is split\n", d, d, KEY(key,d));
1125 9ffbb5ad 2004-03-12 devnull putdblock(b);
1126 9ffbb5ad 2004-03-12 devnull if(marksplit(ix, key, d) < 0)
1127 9ffbb5ad 2004-03-12 devnull return nil;
1128 9ffbb5ad 2004-03-12 devnull }
1129 9ffbb5ad 2004-03-12 devnull seterr(EBug, "loadibucket2 failed to sync bitmap with disk!");
1130 9ffbb5ad 2004-03-12 devnull return nil;
1131 9ffbb5ad 2004-03-12 devnull }
1132 9ffbb5ad 2004-03-12 devnull
1133 333c1dcc 2004-03-13 devnull static int
1134 333c1dcc 2004-03-13 devnull splitiblock(Index *ix, DBlock *b, ISect *is, u32int buck, IBucket *ib)
1135 333c1dcc 2004-03-13 devnull {
1136 333c1dcc 2004-03-13 devnull int i, d;
1137 333c1dcc 2004-03-13 devnull u8int *score;
1138 333c1dcc 2004-03-13 devnull u32int buck0, buck1, key0, key1, key, dmask;
1139 333c1dcc 2004-03-13 devnull DBlock *b0, *b1;
1140 333c1dcc 2004-03-13 devnull IBucket ib0, ib1;
1141 333c1dcc 2004-03-13 devnull
1142 333c1dcc 2004-03-13 devnull if(ib->depth == ix->maxdepth){
1143 333c1dcc 2004-03-13 devnull if(0) fprint(2, "depth=%d == maxdepth\n", ib->depth);
1144 333c1dcc 2004-03-13 devnull seterr(EAdmin, "index bucket %ud on %s is full\n", buck, is->part->name);
1145 333c1dcc 2004-03-13 devnull putdblock(b);
1146 333c1dcc 2004-03-13 devnull return -1;
1147 333c1dcc 2004-03-13 devnull }
1148 333c1dcc 2004-03-13 devnull
1149 333c1dcc 2004-03-13 devnull buck = is->start+buck - ix->bitblocks;
1150 333c1dcc 2004-03-13 devnull d = ib->depth+1;
1151 333c1dcc 2004-03-13 devnull buck0 = buck;
1152 333c1dcc 2004-03-13 devnull buck1 = buck0 | (1<<(ix->maxdepth-d));
1153 333c1dcc 2004-03-13 devnull if(ix->maxdepth == 32){
1154 333c1dcc 2004-03-13 devnull key0 = buck0;
1155 333c1dcc 2004-03-13 devnull key1 = buck1;
1156 333c1dcc 2004-03-13 devnull }else{
1157 333c1dcc 2004-03-13 devnull key0 = buck0 << (32-ix->maxdepth);
1158 333c1dcc 2004-03-13 devnull key1 = buck1 << (32-ix->maxdepth);
1159 333c1dcc 2004-03-13 devnull }
1160 333c1dcc 2004-03-13 devnull buck0 += ix->bitblocks;
1161 333c1dcc 2004-03-13 devnull buck1 += ix->bitblocks;
1162 333c1dcc 2004-03-13 devnull USED(buck0);
1163 333c1dcc 2004-03-13 devnull USED(key1);
1164 333c1dcc 2004-03-13 devnull
1165 333c1dcc 2004-03-13 devnull if(d == 32)
1166 333c1dcc 2004-03-13 devnull dmask = TWID32;
1167 333c1dcc 2004-03-13 devnull else
1168 333c1dcc 2004-03-13 devnull dmask = TWID32 ^ ((1<<(32-d))-1);
1169 333c1dcc 2004-03-13 devnull
1170 333c1dcc 2004-03-13 devnull /*
1171 333c1dcc 2004-03-13 devnull * Since we hold the lock for b, the bitmap
1172 333c1dcc 2004-03-13 devnull * thinks buck1 doesn't exist, and the bit
1173 333c1dcc 2004-03-13 devnull * for buck1 can't be updated without first
1174 333c1dcc 2004-03-13 devnull * locking and splitting b, it's safe to try to
1175 333c1dcc 2004-03-13 devnull * acquire the lock on buck1 without dropping b.
1176 333c1dcc 2004-03-13 devnull * No one else will go after it too.
1177 333c1dcc 2004-03-13 devnull *
1178 333c1dcc 2004-03-13 devnull * Also, none of the rest of the code ever locks
1179 333c1dcc 2004-03-13 devnull * more than one block at a time, so it's okay if
1180 333c1dcc 2004-03-13 devnull * we do.
1181 333c1dcc 2004-03-13 devnull */
1182 333c1dcc 2004-03-13 devnull if((b1 = loadibucket0(ix, buck1, nil, nil, &ib1, 0)) == nil){
1183 333c1dcc 2004-03-13 devnull putdblock(b);
1184 333c1dcc 2004-03-13 devnull return -1;
1185 333c1dcc 2004-03-13 devnull }
1186 333c1dcc 2004-03-13 devnull b0 = b;
1187 333c1dcc 2004-03-13 devnull ib0 = *ib;
1188 333c1dcc 2004-03-13 devnull
1189 333c1dcc 2004-03-13 devnull /*
1190 333c1dcc 2004-03-13 devnull * Funny locking going on here -- see dirtydblock.
1191 333c1dcc 2004-03-13 devnull * We must putdblock(b1) before putdblock(b0).
1192 333c1dcc 2004-03-13 devnull */
1193 333c1dcc 2004-03-13 devnull dirtydblock(b0, DirtyIndex);
1194 333c1dcc 2004-03-13 devnull dirtydblock(b1, DirtyIndexSplit);
1195 333c1dcc 2004-03-13 devnull
1196 333c1dcc 2004-03-13 devnull /*
1197 333c1dcc 2004-03-13 devnull * Split the block contents.
1198 333c1dcc 2004-03-13 devnull * The block is already sorted, so it's pretty easy:
1199 333c1dcc 2004-03-13 devnull * the first part stays, the second part goes to b1.
1200 333c1dcc 2004-03-13 devnull */
1201 333c1dcc 2004-03-13 devnull ib0.n = 0;
1202 333c1dcc 2004-03-13 devnull ib0.depth = d;
1203 333c1dcc 2004-03-13 devnull ib1.n = 0;
1204 333c1dcc 2004-03-13 devnull ib1.depth = d;
1205 333c1dcc 2004-03-13 devnull for(i=0; i<ib->n; i++){
1206 333c1dcc 2004-03-13 devnull score = ib->data+i*IEntrySize;
1207 333c1dcc 2004-03-13 devnull key = hashbits(score, 32);
1208 333c1dcc 2004-03-13 devnull if((key&dmask) != key0)
1209 333c1dcc 2004-03-13 devnull break;
1210 333c1dcc 2004-03-13 devnull }
1211 333c1dcc 2004-03-13 devnull ib0.n = i;
1212 333c1dcc 2004-03-13 devnull ib1.n = ib->n - ib0.n;
1213 333c1dcc 2004-03-13 devnull memmove(ib1.data, ib0.data+ib0.n*IEntrySize, ib1.n*IEntrySize);
1214 333c1dcc 2004-03-13 devnull if(0) fprint(2, "splitiblock %d in %d/%0*ub => %d in %d/%0*ub + %d in %d/%0*ub\n",
1215 333c1dcc 2004-03-13 devnull ib->n, d-1, d-1, key0>>(32-d+1),
1216 333c1dcc 2004-03-13 devnull ib0.n, d, d, key0>>(32-d),
1217 333c1dcc 2004-03-13 devnull ib1.n, d, d, key1>>(32-d));
1218 333c1dcc 2004-03-13 devnull
1219 333c1dcc 2004-03-13 devnull packibucket(&ib0, b0->data);
1220 333c1dcc 2004-03-13 devnull packibucket(&ib1, b1->data);
1221 333c1dcc 2004-03-13 devnull
1222 333c1dcc 2004-03-13 devnull /* order matters! see comment above. */
1223 333c1dcc 2004-03-13 devnull putdblock(b1);
1224 333c1dcc 2004-03-13 devnull putdblock(b0);
1225 333c1dcc 2004-03-13 devnull
1226 333c1dcc 2004-03-13 devnull /*
1227 333c1dcc 2004-03-13 devnull * let the recovery code take care of updating the bitmap.
1228 333c1dcc 2004-03-13 devnull */
1229 333c1dcc 2004-03-13 devnull
1230 333c1dcc 2004-03-13 devnull qlock(&stats.lock);
1231 333c1dcc 2004-03-13 devnull stats.indexsplits++;
1232 333c1dcc 2004-03-13 devnull qunlock(&stats.lock);
1233 333c1dcc 2004-03-13 devnull
1234 333c1dcc 2004-03-13 devnull return 0;
1235 333c1dcc 2004-03-13 devnull }
1236 333c1dcc 2004-03-13 devnull
1237 9ffbb5ad 2004-03-12 devnull int
1238 9ffbb5ad 2004-03-12 devnull indexsect(Index *ix, u8int *score)
1239 9ffbb5ad 2004-03-12 devnull {
1240 9ffbb5ad 2004-03-12 devnull if(ix->version == IndexVersion1)
1241 9ffbb5ad 2004-03-12 devnull return indexsect1(ix, score);
1242 9ffbb5ad 2004-03-12 devnull return indexsect2(ix, score);
1243 9ffbb5ad 2004-03-12 devnull }
1244 9ffbb5ad 2004-03-12 devnull
1245 9ffbb5ad 2004-03-12 devnull DBlock*
1246 9ffbb5ad 2004-03-12 devnull loadibucket(Index *ix, u8int *score, ISect **pis, u32int *pbuck, IBucket *ib)
1247 9ffbb5ad 2004-03-12 devnull {
1248 9ffbb5ad 2004-03-12 devnull if(ix->version == IndexVersion1)
1249 9ffbb5ad 2004-03-12 devnull return loadibucket1(ix, score, pis, pbuck, ib);
1250 9ffbb5ad 2004-03-12 devnull return loadibucket2(ix, score, pis, pbuck, ib);
1251 9ffbb5ad 2004-03-12 devnull }
1252 9ffbb5ad 2004-03-12 devnull
1253 9ffbb5ad 2004-03-12 devnull