Hypersuccinct Trees – New universal tree source codes for optimal compressed tree data structures and range minima

Apr 2021 (Written: Feb 2021)

J. Ian Munro, Patrick K. Nicholson, Louisa Seelbach Benkner, Sebastian Wild:

European Symposium on Algorithms (ESA) 2021

In this paper, we continue the study of distribution-succinct data structures, i.e., data structures that have (asymptotically) optimal expected space usage (as opposed to only optimal worst case) for a certain data-structuring task, and give a single tree data structure that serves as a universal source code for binary and ordinal trees.

Illustration of hypersuccinct trees: tree covering + Huffman code — Hypersuccinct trees use a tree-covering algorithm to partition the tree into *micro trees* and store their types with a Huffman code.

My video talk about hypersuccinct trees (long version of the talk for ESA). Key innovation and philosophy are covered in the first 2.5 minutes.

Entropy trees were a precursor to this work, in particular we therein introduced the notion of hypersuccinct trees (a tree-covering data structure with a Huffman code for micro-tree types) and noted that it achieves the optimal expected space of $1.736n + o(n)$ bits to represent a random binary search tree or (equivalently) a range-min data structures for a random permutation.

Here, we show that this same data structure actually achieves optimal compression up to lower order terms for a whole range of tree sources, and in particular all of following distributions of tree shapes:

binary search trees (BSTs) built from random insertions,
Cartesian trees of random arrays,
random fringe-balanced BSTs,
binary trees with a given number of binary/unary/leaf nodes,
random binary tries generated from memoryless sources,
(uniformly chosen) full binary trees (a.k.a. extended binary trees),
(uniformly chosen) unary paths,
(uniformly chosen) weight-balanced BSTs,
(uniformly chosen) AVL trees, and
(uniformly chosen) left-leaning red-black trees.

Hypersuccinct trees combine the compression achieved by the best known universal codes for binary trees with the operation support of the best succinct tree data structures, and they do so “automatically” (without tailoring to the tree family at hand).