All Publications

Rooting Out Entropy: Optimal Tree Extraction for Ultra-Succinct Graphs

Mar 2026 (Written: Mar 2026)

Ziad Ismaili Alaoui, Tamio-Vesa Nakajima, Namrata, and Sebastian Wild:

arXiv preprint

| read herePDFarXiv |

Encoding a graph (as adjacency lists or similar) requires choosing some names/indices for vertices, so we can describe the edges. Storing these for applications we where don’t care about vertex labels (for unlabeled graphs), is wasting up to $n\lg n$ bits of space. If we derive some canonical names from the structure of the graph itself, we can potentially avoid that; yet practical representations for general graphs like adjacency lists would not use less space.

We propose the generic framework of tree extraction to obtain canonical vertex names from a spanning tree of the graph and remove $n-1$ edges from the graph, which are stored with 2 bits per edge. Combining this idea with wavelet trees for storing the adjacency list of the remaining graph in entropy compressed form we obtain a practical, generic, compressed graph data structure.

Tree extraction is maximally effective if the resulting graph has low degree entropy. The question which tree to use is the minimum-entropy tree-extraction problem (MINETREX), which we study in this paper.

Results

We show that the MINETREX problem is NP-hard and indeed even hard to approximate with sublinear additive error. At the same time, a simple greedy algorithm yields an approximation with linear additive error (and small constants), thus mostly closing the gap of approximation algorithms for this problem.

Moreover, we show that for a large class of graphs, tree extraction gets close to the optimal space savings of $n \lg n$ bits.

Our results apply (with appropriate modifications) to both directed and undirected graphs.

Relation to other papers

This paper extends the data-structure part of our previous work on preferential-attachment graphs to arbitrary graphs.

Queries on the compressed graph, such as adjacency, degree, and iterating over neighbors all run in logarithmic time. For this, we store the extracted tree using LOUDS and the remaining adjacency list as a wavelet tree.