My current research revolves around two themes:
- Space-efficient data structures, computing over compressed data
Representing data with minimal storage requirements has a long tradition for communication, but in recent years it has gained importance for maximizing the size of data sets that can be kept in fast memory. Unlike for traditional compression, this requires algorithms that can run directly on the compressed representation to make use of the data. I develop such methods for various problems whose space requirements adapt to the data at hand.
- Analysis of algorithms, algorithm science, adaptive algorithms
I work on the design of efficient algorithms and data structures, and their mathematical analysis. By determining exact constant factors and analyzing how their performance depends on characteristics of the input (adaptive analysis), I develop improvements for fundamental building blocks of computer science like sorting, selecting, and searching in dictionaries.
Below is more detailed information for individual projects.
We found a practical drop-in replacement of Timsort’s merge rules that is provably better, can be implemented efficiently, and gives a principled solution of the underlying problem. Powersort has replaced Timsort’s merge policy in the list sorting methods of CPython and PyPy (covering the vast majority of Python installations).
Timsort is a widely used adaptive sorting method that exploits existing runs in the input. A correct, efficient implementation is not trivial as witnessed by the series of bugs in the Python and Java libraries.
Java nowadays uses a dual-pivot Quicksort, the Yaroslavskiy-Bentley-Bloch (YBB) Quicksort. I did the first average-case analysis of this algorithm, and an extension of the analysis of YBB Quicksort became my PhD project.
To see dual-pivot Quicksort in action, check out the animated visualization of the algorithm (thanks to Brad Lyon)!
Apart from the analysis of the new Quicksort variant, I was also able to fill some remaining gaps in the our understanding of standard Quicksort.
- We found the first precise and rigorous average-case analysis of branch mispredictions in Quicksort.
- I found a way to analyze median-of-k Quicksort on equal keys. The extension of Robert Sedgewick’s analysis was open since 1977. Even though my analysis applies only when there are many dulicates present, this is one of the results I am most proud of.
Rectification of Names. Note that I referred to YBB partitioning simply as Yaroslavskiy’s algorithm in my publications finished before 2016, but besides Vladimir Yaroslavskiy, also Jon Bentley and Joshua Bloch were involved in the development of the algorithm early on, so it is more appropriate to call their algorithm YBB Quicksort.
Apportionment and Stick Cutting
With a group of peers I’m exploring models for the evolution of cooperation.