## § Cache oblivious B trees

Central idea: assume a memory model where computation is free, only cost is pulling data from cache into memory. Cache has total size $M$, can hold blocks of size $B$. So it can hold $M/B$ blocks of main memory. Memory memory has infinite size. Cost is number of transfers. We assume that the algorithm does not know M or B. We assume that the cache replacement strategy is optimal (kick out block that is going to be used farthest in the future). This is an OK assumption to make since an LRU cache using twice the memory of a "oracular" cache performs equally well (citation?) These data structures are cool since they essentially "Adapt" to varying cache hierarchies and even multiple level cache hierarchies. We study how to build cache-oblivious B-trees.

#### § Building optimal cache-oblivious B trees to solve search

• We use a balanced BST. We want to find an order to store nodes in memorysuch that when we search for an element, we minimize number of blockswe need to pull in.
• All standard orders such as level order, pre-order, post-order fail.
• Corrrect order is "VEB (Van Em De Boas) order": carve a tree at the middlelevel of its edges. Layout a "triangle" or smaller collectionof nodes linearly. Then Recursively layout the trees, linearly in memory.
• Supposedly if the number of nodes is $N$, we wil have roughly $\sqrt(N)$nodes on the top, and then $\sqrt(N)$ triangles at the bottom.

#### § Analysis Claim: we need to pull $O(\log_B N)$ blocks for any $B$ for any search query

$N$ is the number of nodes in the BST. Note that in the analysis, we know what B is, even though the algorithm does not.
• We look at a particuar level of recursion. We will call it a "level of detail"straddling B.
• We will have large triangles of size $\geq B$, inside which there are smallertriangles of size $\leq B$ (reminds me of sierpinski).
• We know that the algorithm recursively lays it out, and triangle storeseverything "inside" it in a contiguous region. So we stop at therequisite size where we know that the tree's triangles themselvescontain triangles which fit into the block size.
• A little triangle of size less than B can live in at most two memory blocksby straddling a block boundary: by eg. having $(B-1)$ bits in one blockand a single bit in another block.
1 2 3 4 5 6 7 8 <- index
|     |       | <- boundary
|-xxxxxxx-----| <-  data
• The question is that on a root-to-leaf bpath, how many such triangles dowe need to visit. Since we repeatedly divide the nodes in half with respect to height until the little triangle has number of nodes lessthan $B$, the height is going to be $O(\log B)$ since it's still a binary tree.
• total height in $O(\log N)$.
• so height of "chunked tree" where we view each triangle as a single nodeis $\log N / \log B = \log_B n$.
• insight: ou data structure construction in some sense permits us to"binary search on $B$" since we divide the data structure into levelsbased on $B$. if $B = N$, then the full data structure fits into memoryand we're good.

#### § Black box: ordered file maintainince

We need a black box: ordered file maintainance (linked list for arrays)
• Store $n$ elements in specified order in an array of linear size $O(N)$.Array permits gaps.
• updates: delete element, insert elements between 2 elements.
• cannot do this in linear time, but we can move elements in an interval ofsize $\log^2(N)$ amortized.
• We need $O(1)$ scans for the data structure.

#### § Next: dynamic BST (inserts and deletes): layout

we take a VEB static tree on top of an ordered file. Tree is a segtree that has max of nodes. Leaves are the members of the ordered file.

• search for node.
• update ordered file.
• propogate updates into the tree. This will have to be done in post-orderbecause we need the leaves to be fixed before we can update the parentmax.

• look at level of detail that straddles $B$.
• Note that when we perform post-order inside a triangle that has 3 trianglesof size $\leq B$, we need to alternate between parent triangle and child triangle.Since the parent triangle is of size $\leq B$ and can therefore takeat most $2B$ blocks of memory, similarly the child can take at most $2B$blocks of memory.
• So if our cache can hold $4$ blocks of memory, we're done.We won't need to kick anything out when performing the post-ordertraversal.