CS321 Lecture: B-Trees Last revised 11/23/99
Introduction
------------
A. We have been devoting considerable attention recently to the subject
of search structures: Data structures that can be used to store and
retrieve information associated with a certain key. The operations
on such a structure can be pictured as follows:
Insertion:
__________________
Key, value | Search |
----------> | structure |
| (key,value pairs)|
|__________________|
Lookup:
___________
Key | Search | Value
----------> | structure | ------->
|___________|
Deletion: ____________________________
| Search |
----------> | structure |
| (key and its value removed)|
|____________________________|
B. Let's briefly review the structures we have considered thus far:
Structure Insert Lookup Delete
"Pile" O(1) O(n) O(1) if we know where victim is
Ordered array O(n) O(logn) O(n)
Linked list O(1) if we know O(n) O(1) if we know where victim is
where it goes
Binary search
tree O(logn)-O(n) O(logn)-O(n) O(logn)-O(n)
AVL, 234 trees O(logn) O(logn) O(logn)
Hash table O(1)-O(n) O(1)-O(n) O(1) if we know where victim is
C. All of these structures have one thing in common: they were designed for
use with search tables stored in main memory, where access to any item
is equally fast (true random access). However, it is often the case
that we must build search structures on disk rather than in primary
memory, for two reasons:
1. Size. Large structures cannot be kept in their entirety in Mp.
2. Permanency. Structures in Mp are volatile and need to be created
(or read in from disk) whenever a program using them is run.
In particular, when we considered indexed files in CS122, we saw that
the indexed file organization requires an index structure on the primary
key and each of the secondary keys. These are, in fact, search structures
in which the values in the key value pairs are records in the file (or
pointers to them.)
D. When we build structures on disk, we must deal with certain realities of
access and transfer time:
1. Random access to disk typically requires on the order of 10-20 ms
access time to position the head and wait for data to come up under
it.
2. Once the head position is right, data can be transferred at rates in
excess of 1 million bytes/sec.
3. Observe, then, how total transfer times behave for different size
blocks (assuming a fairly fast 10 ms access time, and 1 megabyte/sec
transfer rate)
Size of block Access time Transfer time Total time
1 byte 10 ms 1 us 10.001 ms
10 bytes 10 ms 10 us 10.01 ms
100 bytes 10 ms 100 us 10.1 ms
1000 bytes 10 ms 1 ms 11 ms
10000 bytes 10 ms 10 ms 20 ms
Clearly, then, small transfers have very high overheads for access
time, and one would prefer to organize a search structure in such a
way as to allow fairly large transfers to/from disk (1000 bytes +)
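The arithmetic behind this table is simple enough to check mechanically. A
Python sketch using the same assumptions (10 ms access, 1 megabyte/sec
transfer; the function name is ours):

```python
# Total disk time = fixed access time + size-proportional transfer time.
# Assumptions from the table above: 10 ms access, 1,000,000 bytes/sec.
ACCESS_MS = 10.0
BYTES_PER_MS = 1000.0     # 1 megabyte/sec = 1000 bytes per millisecond

def total_time_ms(block_bytes):
    return ACCESS_MS + block_bytes / BYTES_PER_MS

for size in (1, 10, 100, 1000, 10000):
    print(size, total_time_ms(size))   # 10.001, 10.01, 10.1, 11, 20 ms
```

Note how the access time dominates until the block size approaches the
number of bytes transferable in one access time.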
4. Most of the structures we have considered do not lend themselves well
to on-disk implementation because they require accesses to too many
different parts of the structure.
a. E.g. binary search tree of 1000 nodes has minimum height 10, so
would require 10 disk accesses / item.
b. The exception is hashing, where we can use a hash function to
select a block potentially containing many items, then search that
using some other means to find the item wanted.
E. Today, we consider a search structure specifically designed for use with
disk files: the B-Tree.
1. A B-Tree is a form of search tree in which breadth is traded for depth.
Each node contains multiple keys (instead of 1 as in a binary search
tree), and so has multi-way branching rather than 2-way branching.
2. The result is a very "bushy" sort of tree, with typical heights in the
range 2-4, thus requiring only 2-4 disk accesses per operation.
I. Definitions
- -----------
A. Preliminary: An m-way search tree.
1. An m-way search tree is a tree of degree m in which:
a. Each node having s children (2 <= s <= m) contains s-1 keys.
   Let the children be called t_1 .. t_s and the keys k_1 .. k_(s-1).
b. The keys and children obey the following properties:
   i.   k_1 < k_2 < k_3 < ... < k_(s-1)
   ii.  All the keys in child t_1 are < k_1
   iii. All the keys in child t_i (1 < i < s) lie between k_(i-1) and k_i
   iv.  All the keys in child t_s are > k_(s-1)
2. Observe that a binary search tree is simply an m-way search tree
with m = 2.
3. Examples of 4-way search trees:

         C  F  K                    A
       /  |  |  \                    \
     AB   DE  HIJ  L                  B
    /|\  /|\  /||\ /\                  \
                                        C
    (where the empty children            \
    are failure nodes)                    D
                                           \
                                            E
                                             \
                                              F
                                               \
                                               etc.
a. Clearly, the first is much more desirable than the second!
b. Note that if the first example were implemented as a binary search
tree instead, it would have height 4 instead of 2, a sizable cost
increase at 10ms per disk access. (The savings become even larger
as m increases. For example, the first example could be implemented
as a 12-way search tree with only 1 level.)
4. Observe, then, that a 2-3-4 tree is simply a variant of a 4-way search
tree. In fact, B-Trees generalize some of the ideas we used with
2-3-4 trees, though the standard algorithms for maintaining B-Trees
are slightly different from those used with 2-3-4 trees.
B. Definition of a B-Tree
1. As was true with binary search trees, we recognize that m-way
search trees can be very efficient if well-balanced, but have
undesirable degenerate cases. With binary search trees, we defined
variants such as AVL trees and Red-Black trees that avoid the
degenerate behavior. We do the same here.
2. A B-Tree of order m is an m-way search tree in which:
a. All the failure nodes are on the same level.
b. Each internal (non-failure) node, except the root, has at least
   ceil(m/2) children.
c. The root, if it is not a failure node (which it is only when the
tree is totally empty), has at least 2 children.
d. Of our two examples, only the first is a B-Tree
3. Examples: which of the following is/are B-Tree(s) of order 5?

       E J O            E J O            E J O            E J O
      / / \ \          / / \ \          / / \ \          / / \ \
   ABC F KLM PQRS   ABF GH KL PQRS   AB FI KL PQRS    AB HI KL PQRS
                                                         /
                                                       FG

   no: node "F"     no: the tree is  yes              no: all the failure
   has only 2       not a search tree,                nodes are not on the
   children         since F > E                       same level
4. Note: because all the failure nodes of a B-Tree are on the same
(bottom) level, we normally do not bother to draw them. Thus,
we will draw the one good tree in the above example as follows from
now on:
      E J O
     / / \ \
   AB FI KL PQRS
C. Some properties of a B-Tree
1. What is the MAXIMUM number of KEYS in a B-Tree of order m of
height h?
a. In such a tree, each non-failure node would have the maximal
number of children (m), and thus the maximal number of keys (m-1).
Thus, we would have:
1 node m-1 keys at level 1
m nodes m * (m-1) keys at level 2
m**2 nodes m * m * (m-1) keys at level 3
...
m**(h-1) nodes m**(h-1) * (m-1) keys at level h
--------------------
m**h - 1 keys total
b. Compare our result for complete binary trees of height h -
2**h - 1 nodes.
2. What is the MINIMUM number of KEYS in a B-Tree of order m of
height h?
a. In such a tree, the root would have only 2 children (1 key), since
this is the minimum allowed for a root. All other nodes would
have ceil(m/2) children, and ceil(m/2) - 1 keys.
b. For convenience, let c = ceil(m/2).
1 node 1 key at level 1
2 nodes 2 * (c-1) keys at level 2
2*c nodes 2 * c * (c-1) keys at level 3
2*c**2 nodes 2 * c**2 * (c-1) keys at level 4
...
2*c**(h-2) nodes 2 * c**(h-2)*(c-1) keys at level h
-----------
2 * [c**(h-1)-1] + 1 =
2 * c**(h-1) - 1 keys total
3. To determine the height of a B-Tree of order m containing n keys, we
solve each of the above for h, as follows:
a. From the equation for the maximum number of keys, we know:
       n <= m**h - 1
   or, solving for h:
       n+1 <= m**h
       log_m(n+1) <= h
   Now, since h must be an integer, we can take the ceiling of the
   log to obtain:
       ceil(log_m(n+1)) <= h
b. From the equation for the minimum number of keys, we know
   (with c = ceil(m/2), as before):
       n >= 2 * c**(h-1) - 1
   or, solving for h:
       (n+1)/2 >= c**(h-1)
       log_c((n+1)/2) >= h-1
       h <= 1 + log_c((n+1)/2)
   Now, since h must be an integer, we can use the floor of the log
   to obtain:
       h <= 1 + floor(log_c((n+1)/2))
c. Combining the above results from minimal and maximal trees, we
   obtain the following bounds for h:
       ceil(log_m(n+1)) <= h <= 1 + floor(log_ceil(m/2)((n+1)/2))
4. Some examples:
   a. 1 million keys - B-Tree of order 200: height is 3
      - Lower bound is ceil(log_200(1,000,001)) = 3
        (Note that a maximal tree of height 2, order 200, contains
        39,999 keys - so the tree must have height at least 3)
      - Upper bound is 1 + floor(log_100(500,001)) = 3
        (Note that a minimal tree of height 4, order 200, contains
        1,999,999 keys, so the tree must have height no greater than 3)
   b. 2 million keys - B-Tree of order 200:
      - Lower bound is still 3
      - Upper bound is now 4
      so the height could be 3 or 4.
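These bounds are easy to verify numerically. The following Python sketch
(the function name height_bounds is ours) reproduces both examples above:

```python
import math

def height_bounds(n, m):
    """Bounds on the height h of a B-Tree of order m holding n keys:
       ceil(log_m(n+1)) <= h <= 1 + floor(log_c((n+1)/2)), c = ceil(m/2)"""
    c = math.ceil(m / 2)
    lower = math.ceil(math.log(n + 1, m))
    upper = 1 + math.floor(math.log((n + 1) / 2, c))
    return lower, upper

print(height_bounds(1_000_000, 200))   # (3, 3): height must be exactly 3
print(height_bounds(2_000_000, 200))   # (3, 4): height could be 3 or 4
```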
D. An important note: In our discussion, we have talked only about nodes
containing KEYS. In practice, we build search structures to allow us
to associate keys with VALUES (e.g. name = key; phone number and address
= value). In the form of B-Tree we are discussing, then, a node actually
contains s pointers to children, s-1 keys, and s-1 values. These can be
stored in one of two ways:
1. The actual value can be stored in the node. This, however, can reduce
m, and thus the branching factor of the tree, if the size of the value
is large compared to that of the key (as it often is).
Example: node size 8000, key length 12, pointer size 4 bytes, allows
m = 500 if we don't store any value with the key
if we also have to store a value of size 36, however, we
would reduce m to 125
2. The node can contain a pointer to another disk block that stores the
actual value. (This additional pointer adds minimally to the size of
the node.) However, this means that successful searches require an
additional access to get the data.
3. Indexed files often use B-Trees, or one of the variants we will discuss
later. The index may be a B-Tree containing pointers to another area
where the records proper are found; or the whole file may be a B-Tree.
II. Operations on B-Trees:
-- ---------- -- -------
A. For our examples, we will use the following B-Tree of order 3 (sometimes
called a 2-3 tree, since each node has 2 or 3 children and 1 or 2 keys).
We use order 3 to keep the size of the examples down:
               J T
        /       |       \
     C F       M P        Y
    / | \     / | \      / \
  AB  DE GH  KL NO R    UW  Z
B. Locate a given key k in a tree rooted at t:
Locate(k: key; t: BTree);
if t = nil then
search fails
else
begin
i := 1;
while (i < Number of children of t) and (key[i] < k) do
i := i + 1;
if (i < Number of children of t) and (key[i] = k) then
return associated information
else
locate(k, child[i])
end
Example: Locate J succeeds immediately at the root
Locate L: at the root, we end the while loop with i = 2,
since key[2] = 'T', so we go to the second child of
the root
in that child, the while loop exits with i = 1,
(since key[1] = 'M' >'L'), so we go to the first
child.
In that node, we find what we are looking for.
Locate Z: at the root, we end the while loop with i = 3,
since i = Number of children, so we go to the third
child of the root
in that child, the while loop exits with i = 2, for
the same reason, so we go to the second child.
In that node, we find Z
Locate X: at the root, we end the while loop with i = 3
In the third child, the while loop exits with i = 1,
since key[1] = 'Y' > 'X', so we go to the first
child.
In that node, we exit the while loop with i = 3.
Since the third child (and all children, in fact) of
this node is a failure node, the search fails.
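The pseudocode above can be rendered as a small in-memory sketch in Python
(0-based indexing rather than the 1-based indexing of the pseudocode; the
node representation and names are ours, and a real implementation would
read each node from disk with one access):

```python
# A node is a dict of sorted keys plus a list of children; the children
# list is empty at the leaves (the "failure nodes" are represented by None).
def make_node(keys, children=()):
    return {"keys": keys, "children": list(children)}

def locate(k, t):
    """Return True if key k occurs in the subtree rooted at t."""
    if t is None:
        return False                      # ran off the bottom: search fails
    keys, children = t["keys"], t["children"]
    i = 0
    while i < len(keys) and keys[i] < k:  # find the first key >= k
        i += 1
    if i < len(keys) and keys[i] == k:
        return True                       # found in this node
    child = children[i] if i < len(children) else None
    return locate(k, child)               # descend into the i-th child

# The order-3 example tree used throughout this section:
tree = make_node(["J", "T"], [
    make_node(["C", "F"],
              [make_node(["A", "B"]), make_node(["D", "E"]),
               make_node(["G", "H"])]),
    make_node(["M", "P"],
              [make_node(["K", "L"]), make_node(["N", "O"]),
               make_node(["R"])]),
    make_node(["Y"],
              [make_node(["U", "W"]), make_node(["Z"])]),
])

for k in "JLZX":
    print(k, locate(k, tree))   # J, L, Z succeed; X fails
```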
C. Inserting a new node. (Assume that we disallow duplicate keys).
1. We first proceed as in locate, until we get to a leaf - that is,
a node whose children are empty. (Of course, if we find the key we
are looking for along the way, we declare an error and quit.)
2. If the leaf node we have arrived at contains fewer than the maximum
number of keys, then we simply insert the new key at the appropriate
point, and add an extra "nil" child pointer.
Example: Insert S: We work our way down to the node containing
R. Since it contains only one key, and can
hold two, we add S, and our tree becomes:
J T
C F M P Y
AB DE GH KL NO RS UW Z
Note that inserting a new key in a leaf may require moving other
keys over.
Example: Insert Q in the original tree. Result:
J T
C F M P Y
AB DE GH KL NO QR UW Z
^
|__ R has been moved over one place
3. Life becomes more interesting if the leaf we reach is already
full. (E.g. consider trying to insert "X" in the above.) In this
case, we cannot add a new node on a lower level, since this would
violate one of the B-Tree constraints. Instead, we proceed as
follows:
a. Allocate a new node from a free list, or extend the file by one
node.
b. Redistribute the keys in the original node, plus the new key,
so that:
- The first half remain in the original node
- The middle key (in sorted order) is held out for a use to
be explained shortly.
- The second half go into the new node.
Note:
The key we were inserting can go into either of the nodes, or
it might be the middle key. (e.g. if we are inserting X, it
will go into the new node; but if we are inserting V into
the same node, it would be the middle key.)
c. Insert the middle key we saved out, plus a pointer to the newly
created node, into the parent at an appropriate point, just after
the pointer that we followed to go down to the node we split. Of
course, this means we move keys and pointers into the parent to
make room.
Example: insert X into the original tree
__________ this key was promoted into parent
J T |
v
C F M P WY
AB DE GH KL NO R U X Z
^ ^
| |___ new node now contains X
|______ original node now contains only U
4. Observe that this strategy guarantees that the resulting tree will
still meet all the tests for a B-Tree.
a. Clearly, all the leaves are still on the same level.
b. What about the number of children of the new nodes?
- If we are forced to split a node, it is because it contained
the maximum number of keys before insertion - m-1. With the
new key, this gives m keys, to be divided two ways plus the
promoted key. This leaves m-1 keys to be divided.
- If m is odd, then each node gets (m-1)/2 keys, and has
(m+1)/2 children, which is exactly ceil(m/2), as required.
- If m is even, then one node gets (m-2)/2 keys, and the other
gets m/2 keys. The smaller node then has (m-2)/2 + 1 children =
m/2 children, which is exactly ceil(m/2), as required. (The
larger node has more than the minimum, which is fine.)
5. Now what if there is no room for the promoted key in the parent?
(Example: insert I into the original tree. Node GH splits, with
H to be promoted to node CF. But this node has no room for another
key and child.)
a. Solution: split the parent as before, creating a new parent to
hold half of the keys and pointers to half of the children.
Again, promote one key and a pointer to the new node up one
level.
b. Note that, if carried to its ultimate, this can result in splitting
the root, which is how the tree gains height. At this point, the
single middle key resulting from the splitting of the root becomes
the one key in the new root. (This is why we allow the root of a
B-Tree to have as few as 2 children.)
Example: insert I:
J
/ \
F T
/ \ / \
C H M P Y
/ \ / \ / | \ / \
AB DE G I KL NO R UW Z
6. You will note that the approach we have taken to splitting nodes in
a B-Tree is somewhat different from the one we used with 2-3-4 trees.
Here, we have postponed splitting a node until absolutely necessary.
The price is more complex code; but given the time required for disk
accesses this policy makes some sense, since it postpones height
increases until the absolute last moment possible. (One could use the
2-3-4 tree approach of anticipating the need for splits on the way down
the tree if one wanted to, however.)
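The redistribution performed when a leaf splits (step 3.b above) can be
sketched in isolation. Assuming keys are kept in sorted order within each
node (the function name is ours):

```python
def split_leaf(keys, new_key):
    """Split a full leaf (m-1 keys) plus one incoming key.
    Returns (left_keys, promoted_middle_key, right_keys)."""
    all_keys = sorted(keys + [new_key])   # m keys, in order
    mid = len(all_keys) // 2
    return all_keys[:mid], all_keys[mid], all_keys[mid + 1:]

# Inserting X into the full leaf UW of the order-3 example tree:
print(split_leaf(["U", "W"], "X"))   # (['U'], 'W', ['X']) - W is promoted
# Inserting V into the same leaf instead:
print(split_leaf(["U", "W"], "V"))   # (['U'], 'V', ['W']) - here the new
                                     # key itself is the middle key
```

The promoted middle key and a pointer to the new right node then go into
the parent, exactly as described in step 3.c.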
D. Deletion from a B-Tree
1. As we have seen in other kinds of trees, deleting a key from a
leaf will be much simpler than deleting a key from an interior node.
As before, then, we use the trick of converting a deletion from an
interior node into a deletion from a leaf: we promote the key's
in-order successor - the smallest key in the subtree rooted at the
child just after the key, which always lies in a leaf - and then
delete that key from its leaf.
Example: to delete J from the root of our original tree, we would
promote K to take its place, and delete K from the leaf.
2. Deleting a key from a leaf is basically trivial - we simply slide
other keys over as necessary to fill in the gap.
Example: Delete N from our original tree:
J T
C F M P Y
AB DE GH KL O R UW Z
3. However, we can run into a problem if the leaf we are deleting from
already contains the minimal number of keys.
Example: Delete R from our original tree.
4. In this case, we essentially reverse the process we used to deal with
an over-full node on insert.
a. We find one of the brothers of the node from which we are deleting
(we can use either side.)
b. We rearrange keys between the node we are working on and the brother
so as to give each the minimal number, if possible. This will mean
changing the divider key between them in the parent.
Example: When deleting R from the original tree, we can combine R's
node with NO and rearrange as follows:
J T
C F M O Y
AB DE GH KL N P UW Z
c. If, as a result, we do not have enough keys to make two legal
nodes (i.e. if the brother we are using also contains the minimal
number of keys), then we combine the two nodes into one, also
removing a key and child pointer from the parent.
Example: working with the above (not the original tree), we can now
try to delete P. Since the only node we can combine with
is N, and it has the minimal number of keys already, we must
pull O down from the parent and combine everything into one
node, recycling the other:
J T
C F M Y
AB DE GH KL NO UW Z
d. Of course, removing a key from the parent may get us into trouble
as well. (e.g. suppose that, in succession, we removed L, N, and
then O from the above). In this case, the parent may have to
"borrow" keys and children from a brother. In an extreme case,
we may even have to merge the parent with its brother, and could
ultimately even reduce the height of the tree if we had to merge
two children of the root.
Example: Recall the tree we got by splitting the root:
J
/ \
F T
/ \ / \
C H M P Y
/ \ / \ / | \ / \
AB DE G I KL NO R UW Z
Suppose we now try to delete Z:
- We have to merge UW with the now vacated node.
J
/ \
F T
/ \ / \
C H M P W
/ \ / \ / | \ / \
AB DE G I KL NO R U Y
Suppose we now delete Y: We must merge with U, but this pulls W
out of the parent, leaving it with too few keys. (Zero, in this
case; but in a higher degree tree we're in trouble when the
number of keys drops below ceil(m/2) - 1.)
- We therefore rearrange keys and children with MP. The pooled keys
  M, P, T are re-divided: P moves up to replace the pulled-down
  separator, leaving M and T as the two children:
                   J
                /     \
             F           P
            / \         / \
           C   H       M   T
          / \ / \     / \ / \
        AB  DE G I  KL  NO R UW
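The borrowing step in 4.b above is essentially the reverse of a split:
pool the underfull node's keys, its brother's keys, and the divider key
from the parent, then re-divide them. A minimal sketch (the function name
is ours, and child pointers are omitted for brevity):

```python
def redistribute(left_keys, separator, right_keys):
    """Pool two sibling nodes and the separator key between them in the
    parent, then re-divide. Returns (new_left, new_separator, new_right)."""
    pooled = left_keys + [separator] + right_keys   # already in order
    mid = len(pooled) // 2
    return pooled[:mid], pooled[mid], pooled[mid + 1:]

# Deleting R from the original order-3 tree: R's leaf empties, so we pool
# it with its brother NO and the separator P from the parent:
print(redistribute(["N", "O"], "P", []))   # (['N'], 'O', ['P'])
# O becomes the new separator in the parent, as in the example above.
```

If the pool is too small to make two legal nodes, the same pooled keys
are instead kept as a single merged node (step 4.c), removing the
separator from the parent.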
III. Variants of the B-Tree
--- -------- -- --- ------
A. We have seen that a B-Tree of order m containing n keys has height
       ceil(log_m(n+1)) <= h <= 1 + floor(log_ceil(m/2)((n+1)/2))
1. We have used this formula to calculate h given n and m. However, it
   can also be used during design of the tree to calculate the m value
   needed to achieve a given target value of h for some n. Let m_min be
   the smallest m that does the job. Then we have
       h_desired <= 1 + floor(log_ceil(m_min/2)((n+1)/2))
   Since the log shrinks as m grows, at m_min the log will be just barely
   less than the next largest integer, so we can approximate this by:
       h_desired = log_ceil(m_min/2)((n+1)/2)
       ceil(m_min/2) ~ ((n+1)/2)**(1/h_desired)
       m_min ~ 2 * ((n+1)/2)**(1/h_desired) - 1
2. In practice, one can rarely achieve a height of one (since that would
mean the entire "tree" is a single node), and even a height of 2 is
seldom obtainable for applications of any size. However, keeping the
height of the tree to 3 is a reasonable goal for most applications.
3. Example - suppose we desire to limit h to 3. We require
       m_min = ceil(2 * cuberoot((n+1)/2)) - 1
Then we obtain the following lower bounds for m for different values
of n:
n Minimum m
1000 15
10,000 34
100,000 73
1,000,000 158
10,000,000 341
100,000,000 736
Clearly, m grows rather slowly with n!
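Given the formula above, the table can be regenerated mechanically. A
Python sketch (the helper name is ours):

```python
import math

def m_min_for_height_3(n):
    # m_min = ceil(2 * cuberoot((n+1)/2)) - 1
    return math.ceil(2 * ((n + 1) / 2) ** (1 / 3)) - 1

for n in (10**3, 10**4, 10**5, 10**6, 10**7, 10**8):
    print(n, m_min_for_height_3(n))   # 15, 34, 73, 158, 341, 736
```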
B. Nonetheless, considerations of node size may make it difficult to achieve
the necessary branching factor. Recall that, in a B-Tree, each node
contains up to m pointers, m-1 keys, AND m-1 VALUES ASSOCIATED WITH THE
m-1 KEYS.
1. Suppose we need an m-value of 20 for some application in which the
keys are 10 bytes long and the associated values are 100 bytes long.
If a pointer is 4 bytes, then the minimum node size is
4*20 + 19*(10+100) = 2170 bytes (over 4 disk blocks if blocks are 512
bytes each)
2. For efficient performance, we must guarantee that each node can be
read or written with a single disk access - which requires that the
entire node reside in contiguous blocks on disk. Further, we must
have an internal buffer big enough to store the node when it is
read or written. Even with a node size of 4+ blocks, the former
condition may be difficult to achieve on a disk with a smaller
cluster size.
C. Two techniques can be used to hold the node size down while still
achieving a desired branching factor. These techniques lead to two
variants of the B-Tree.
D. A B* Tree of order m is an m-way search tree in which each node (save
the root) has a minimum of ceil((2m-1)/3) children and a maximum of m
children.
1. Nodes in a B* Tree of order m have the same size as those in a
B Tree of order m, but their minimum branching factor is greater.
2. This is achieved by using the following strategy on insertion of
a new key:
a. If the leaf in which the key belongs has room for the new key,
then it is put there (as with the basic B-Tree.)
b. However, if the leaf is full, then instead of splitting the
leaf we choose one of its brothers and attempt to redistribute
keys between the two leaves. (This is sort of the reverse of what
we did when deleting a key from a B-Tree.)
Example: A B* Tree of order 5 has 4-5 children (3-4 keys) for each
node. Consider the following example of such a tree:
E K P V
/ | | | \
ABCD FGHI LMN RSTU XYZ
If we go to insert Q, we find that leaf RSTU is full. Instead
of splitting it (which would force a split of the root and a new
level in the tree), we combine RSTU with one of its brothers - say
LMN - and rearrange keys between them and the divider in the
parent to get
E K Q V
/ | | | \
ABCD FGHI LMNP RSTU XYZ
c. If the chosen brother is also full, then we combine the keys from
the two nodes and split the result to give three nodes. This
preserves the ceil((2m-1)/3) branching factor, as follows:
- The two full nodes each had m-1 keys. Together with the
divider key between them and the new key, we have 2m keys
to work with.
- We need two keys to separate 3 nodes, so we are left with
2m-2 keys to divide between the 3 new nodes. The following
shows how these might be distributed, depending on m mod 3:
m mod 3 Distribution of keys
0 (2m-3)/3, (2m-3)/3, 2m/3
1 (2m-2)/3, (2m-2)/3, (2m-2)/3
2 (2m-4)/3, (2m-1)/3, (2m-1)/3
Since each node has one more child than keys, this leads to
the following distribution of children
m mod 3 Distribution of children
0 (2m)/3, (2m)/3, (2m+3)/3
1 (2m+1)/3, (2m+1)/3, (2m+1)/3
2 (2m-1)/3, (2m+2)/3, (2m+2)/3
Since all these values are integers, it can be seen that, in
each case, the node with the smallest number of children gets at
least ceil((2m-1)/3) children, as required.
3. To see the advantage of the B* Tree, consider the following table
showing the minimum number of keys in a B Tree and a B* Tree of
height 3 for different values of m
m minimal B Tree height 3 minimal B* Tree height 3
5 17 17
10 49 97
20 199 337
50 1249 2177
100 4999 8977
200 19,999 35,377
(The advantage is even greater for higher trees)
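Both columns follow from the minimum-key count 2 * c**2 - 1 for a tree of
height 3, with c the minimum number of children per non-root node. A
Python sketch (names ours):

```python
import math

def min_keys_height_3(m, bstar=False):
    """Minimum keys in a height-3 B-Tree (or B* Tree) of order m.
    The root has 1 key and 2 children; every other node has the minimum
    number of children c (hence c-1 keys), giving 2*c**2 - 1 keys total."""
    c = math.ceil((2 * m - 1) / 3) if bstar else math.ceil(m / 2)
    return 2 * c ** 2 - 1

for m in (5, 10, 20, 50, 100, 200):
    print(m, min_keys_height_3(m), min_keys_height_3(m, bstar=True))
    # e.g. for m = 200: 19,999 vs 35,377
```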
E. Another way to get a high branching factor while keeping node size down
is the use of B+ Trees.
1. We saw that the total size of a node is generally the limiting factor
in terms of the value of "m" that can be used for a B-Tree. Here, of
course, the main villain is usually the value that is stored in the
node along with the key. If the value is - say - 10 times bigger than
the key, then its presence in the node reduces the potential branching
factor by a ratio of almost 10:1!
2. One way to address this would be to not store the values in the tree
at all.
a. Rather, each node would contain up to m child pointers, m-1 keys
and m-1 POINTERS TO VALUES STORED ELSEWHERE.
b. The difficulty with this scheme, though, is that once the tree has
been searched to find the desired key, an additional disk access is
needed to find the data. The effect on performance is the same as if
the height of the tree were increased by one, so this may undo the
gain obtained by using the higher branching factor.
3. A B+ tree addresses this problem as follows:
a. Values are only stored in the lowest level of the tree. Nodes at
higher levels contain keys, but not values.
b. This means that the branching factor in the upper levels is much
greater than the branching factor at the lowest level (where the
children are failure nodes.)
Example: assume nodes are 512 bytes, keys are 10 bytes, values are 90
bytes, and pointers are 4 bytes.
Each node in the lowest level of a B+ tree could store up to 5
key-value pairs, with 12 bytes to spare. (No pointers need
be stored, because the 6 children are all failure nodes.)
Each node at upper levels would have branching factor 37.
It would store up to 36 key-pointer pairs, plus one extra
pointer, with 4 bytes to spare.
We assume that we can distinguish between a leaf and a
non-leaf node in some way during our processing - perhaps by
keeping track of the height of the tree or by tagging the
node itself or the pointer to it in some special way.
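The branching factors in this example follow directly from the byte
budget. A quick check in Python (the variable names are ours):

```python
# Byte budget from the example: 512-byte nodes, 10-byte keys,
# 90-byte values, 4-byte pointers.
NODE, KEY, VALUE, PTR = 512, 10, 90, 4

# Leaf node: key-value pairs only (no child pointers need be stored,
# since all the children are failure nodes).
leaf_pairs = NODE // (KEY + VALUE)
print(leaf_pairs, NODE - leaf_pairs * (KEY + VALUE))   # 5 pairs, 12 spare

# Internal node: k keys and k+1 child pointers must fit, so we need
# k*(KEY+PTR) + PTR <= NODE; the branching factor is k+1.
internal_keys = (NODE - PTR) // (KEY + PTR)
print(internal_keys + 1,
      NODE - (internal_keys * (KEY + PTR) + PTR))      # 37-way, 4 spare
```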
c. Of course, this means that all keys must occur at the lowest level
of the tree, so that a value can be stored with them. The keys in
the upper levels, then, are copies of keys stored lower down; some
keys are stored twice. In particular, each upper level key is a copy
of the greatest key in the subtree to its left.
Example: given the above scenario, assume that we have a B-Tree
that holds the 26 letters of the alphabet as keys.
Since the maximum branching factor of a leaf is 6, the
minimum branching factor would be 3, and each leaf would
hold 2-5 keys. Thus, we would have 6-13 leaves, which could
easily be accommodated as children of a single root node.
Thus, our tree might look like this:
------------------------------------
B E I K N Q S V
-------------------------------------
/ | | | | | | | \
AB CDE FGHI JK LMN OPQ RS TUV WXYZ
Note that the separator keys in the root are copies of the
last key in each leaf to the left of the separator. We
could also have chosen to store the first key in the leaf
to the right.
d. For this particular set of characteristics, we might contrast:
i. This B+ tree of height 2, with plenty of room to grow without
gaining height.
ii. An ordinary B-Tree of order 6 (where all levels hold values) -
which would be of height 3 for this configuration.
iii. An ordinary B-Tree where all levels hold keys and POINTERS to
values. Here, since a key plus two pointers requires 18 bytes,
we would have a maximum degree of 29 (28 key + value-pointer
entries, plus 1 extra child pointer). This could actually be done
with height 1 for this data, but would grow to height 2 after just
a few more insertions. Note, though, that since getting to the data
requires one more disk access, it would behave like a tree of
height 2 or 3 as the case may be.
e. Just to illustrate the concept of a B+ tree further, we consider
what might happen with different assumptions about how many keys
would fit in a non-leaf node, so that we end up with a three-level
tree, like the following:
-----
| K |
-----
/ \
--------------- ------------------
| B E I | | N Q S V |
--------------- ------------------
/ | | | | | | | \
AB CDE FGHI JK LMN OPQ RS TUV WXYZ
IV. Implementing Various File Organizations with B-Trees
-- ------------ ------- ---- ------------- ---- -------
A. Variable-length records pose a special problem when storing files that
must support direct access to individual records. However B-Trees -
especially the B+ variant - can easily accommodate variable length records.
Instead of containing a fixed number of key-value pairs, each leaf would
hold a number of such pairs dependent on the size of each.
B. B-Trees are the most widely used structure for implementing indexed
files. Most commonly, the file is structured as a B+ tree on the basis
of the primary key.
C. Secondary indexes can be implemented by using additional B-Trees - one
per additional index.
1. These can share the same file as the primary tree; the file header
block simply will contain a pointer to the root of the primary tree
and each secondary tree.
2. One small question is how the secondary trees refer to the actual
data records in the primary tree, since they should contain only keys
and pointers to values to avoid duplicating whole records.
a. They might contain pointers to the block and offset in block of
the data. However, this would mean that splittings and
recombinations in the primary tree during insert/delete would
have to trigger updates in the secondary tree - not only for the
entries pertaining to the item inserted/deleted, but also for
others moved by a block split or recombination.
b. Or, the "pointer" might be a copy of the primary key of the
record - so a secondary key lookup would be followed by a lookup
of the primary key in the primary tree. This, however, slows
lookups.
c. Or, it might be expedient to use an ordinary B-Tree for the primary
tree, with pointers to values kept in an altogether separate area
NOT affected by splits and recombinations. (Note, then, that the
presence of secondary indexes might alter the decision as to the
optimal structure for the primary index.)
3. Another issue is the handling of duplicate keys. Recall that, in
most cases, we require the primary key for an indexed file to be
unique; but the secondary keys may allow several records to have the
same value. (Example: student file - primary key = ID, secondary
key = last name.) How shall we handle the duplicate keys in the
secondary index?
a. Multiple entries in the index for the same key represent one
possibility, but use extra space to store the key more than
once.
b. We can let the index point to the FIRST record in the file with
the specified key value, and let each record in the file include
a link pointing to the next record with the same secondary key
value. This provides fast retrieval of subsequent records with a
given key after finding the first, but means that each record must
contain a link field for each secondary key. (This link is stored
with the record, but not returned as part of it when the record is
accessed.)
c. We can make use of variable-length records in the index - each
secondary key index entry consists of a key value and 1 or more
pointers to records having that key value in them.
D. Finally, B-Trees can be used to implement relative files! Here, the
relative record number serves as the key on which the tree is
organized. This has two advantages.
1. No space need actually be allocated for nonexistent records. If
the search ends in a leaf that does not contain an entry for a
desired record number, then we conclude the record does not exist.
2. We can accommodate variable length records.
However, finding a record by record number now requires multiple
disk accesses, whereas before one access would do, so this may not
be a good idea unless we need the space flexibility.
Copyright ©1999 - Russell C. Bjork