Adaptive sorting for large keys, strings, and database rows

Mar 2025 (Written: Oct 2024)

Marius Kuhrt, Bernhard Seeger, Sebastian Wild, and Goetz Graefe:

Conference on Database Systems for Business, Technology and Web

In the context of a (row-oriented) database system, sorting the rows of a table is a fundamental part of query processing and index creation. In this paper, we present a new adaptive sorting algorithm that is particularly well-suited for sorting large keys, strings, and database rows, based on Powersort.

Since comparisons are often very expensive – e.g., when sorting wrt. several columns, each potentially containing data with custom comparison functions such as international strings – a classic optimization is to first generate normalized keys, long binary strings that encode the sort order of the rows. Sorting rows is thus akin to sorting strings, and we can use some the same techniques, in particular remembering longest common prefixes of rows (or indeed, offset-value codes, which additionally contain the next few characters after the LCP) to avoids redundant character comparisons.

In this paper, we combine these techniques from string sorting with the run-adaptive Powersort, which avoids entire string comparisons when the input is partially sorted. We observe that the savings from both approaches are indeed orthogonal and nicely combine into overall performance improvements.