Average Cost of QuickXsort with Pivot Sampling

The purpose of this paper is to simplify and improve
the analysis of a more distant relative of Quicksort: *QuickXsort*.
For that, I precisely solve a variation of the Quicksort recurrence
for which previous works proved upper bounds using ad-hoc arguments
specific to certain algorithms X.

## Results

The main result is a transfer theorem that allows us to obtain the costs of QuickXsort directly from the costs of X in isolation, including the option to choose median-of-$k$ pivots.

I demonstrate in the paper that the precise solution of the recurrence improves the prediction accuracy of comparison counts by one order of magnitude.

Stating the actual recurrence requires a good deal of notation, so for now, let me point to the arXiv paper for that.

## QuickXsort and QuickMergesort

QuickXsort is a scheme for turning many non-inplace sorting methods X into a constant-extra-space sorting method with almost the same number of comparisons, in fact with overall linear overhead. This sounds like magic, but the idea is simple.

For the method to work, we require X to use its extra space only to temporarily
offload some elements to facilitate their rearrangement.
The stereotypical example of such a method is *Mergesort:*
merging requires a buffer to hold one of the runs
(unless we use one of the comparatively complicated inplace merging methods,
which are not deemed competitive in terms of running time with standard out-of-place
merging methods).
It is enough to have a buffer for the first run and merge back into the position
it previously occupied.
Using bottom-up Mergesort, a buffer for $n/2$ elements is then the only
extra space we need to sort $n$ elements.

QuickMergesort then works as follows: We start with a partitioning step as in Quicksort. Then we sort one of the resulting segments by Mergesort, using the other segment as buffer. Whenever a merge operation has to move a run into the buffer, we do so using swap operations. That way the original content of the buffer is never lost, but only permuted (in some hard-to-predict fashion), and when we are done with Mergesorting one segment, the other segment contains the same elements as before (but in arbitrary order). The second segment is then sorted recursively.
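The swap-based merge at the heart of this scheme can be sketched as follows. This is a minimal illustration, not the paper's implementation; the name `swap_merge` and the layout (buffer and runs living in disjoint regions of one array) are assumptions for the example. Because every move is a swap, the buffer region ends up holding its original elements, merely permuted:

```python
def swap_merge(a, lo, mid, hi, buf):
    """Merge the sorted runs a[lo:mid] and a[mid:hi] in place,
    using a[buf:buf + (mid - lo)] as buffer space.
    The buffer's contents are preserved (only permuted),
    because every element move is a swap."""
    n1 = mid - lo
    # Move the first run into the buffer via swaps.
    for k in range(n1):
        a[buf + k], a[lo + k] = a[lo + k], a[buf + k]
    i, j, out = buf, mid, lo
    # Merge the buffered run and the second run back into a[lo:hi].
    while i < buf + n1 and j < hi:
        if a[i] <= a[j]:
            a[out], a[i] = a[i], a[out]
            i += 1
        else:
            a[out], a[j] = a[j], a[out]
            j += 1
        out += 1
    # Flush what remains of the buffered run.
    while i < buf + n1:
        a[out], a[i] = a[i], a[out]
        i += 1
        out += 1
    # Remaining elements of the second run are already in place.
```

One can check that the output position never overtakes the front of the second run, so the swaps only ever displace buffer elements, never unmerged data.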

Since we only have one recursive call in QuickXsort, we can use tail-recursion elimination to avoid any extra space for these calls.
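The resulting driver can be sketched as a simple loop. This is my own illustration, not code from the paper: `xsort` is a hypothetical stand-in for X (here it just sorts a slice, ignoring the buffer question for brevity), and a plain Lomuto partition stands in for the real partitioning step:

```python
def quick_x_sort(a, lo, hi, xsort):
    """QuickXsort driver with the single recursive call turned into a
    loop (tail-recursion elimination), so no call stack grows."""
    while hi - lo > 1:
        # Lomuto-style partition around the last element as pivot.
        pivot = a[hi - 1]
        i = lo
        for j in range(lo, hi - 1):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi - 1] = a[hi - 1], a[i]
        # Sort one segment with X; continue the loop on the other.
        xsort(a, i + 1, hi)
        hi = i
```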

### Which segment do we sort with X?

The general scheme has some degrees of freedom,
one of them is how to choose the segment to sort with X.
A natural choice is to sort *the largest segment with X for which the other segment
offers sufficient buffer space*.
This ultimately depends on the amount of space needed by X,
and on the sizes of the segments produced by the partitioning step.

For the example of Mergesort, we obtain two different regimes: for roughly balanced partitions, we can sort the larger segment with Mergesort, whereas for rather unbalanced partitions, we must sort the smaller segment with Mergesort to guarantee enough buffer space. This makes the recurrence for the costs more challenging to analyze.
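The decision rule can be written down concretely. The sketch below assumes Mergesort needs a buffer for $\lceil m/2 \rceil$ elements to sort $m$ elements, as discussed above; `choose_segment` is a hypothetical helper name for illustration:

```python
import math

def choose_segment(n1, n2):
    """Given the two segment sizes after partitioning, decide which
    segment to sort with Mergesort, assuming Mergesort needs a buffer
    for ceil(m/2) elements to sort m elements."""
    small, large = sorted((n1, n2))
    if small >= math.ceil(large / 2):
        return "larger"   # balanced: the other segment is a big-enough buffer
    return "smaller"      # unbalanced: only the smaller segment fits
```

Note that sorting the smaller segment is always possible, since the larger segment trivially offers enough buffer space for it.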

## QuickHeapsort

Another good option—and indeed the historically first one—for X is Heapsort. But wait; Heapsort? Isn’t Heapsort the one elementary method with reasonable performance that achieves constant extra space right away!?

### External Heapsort

Yes, at first sight Heapsort appears to be the candidate *least* likely to profit
from the QuickXsort scheme.
Indeed, the typical inplace textbook versions of Heapsort wouldn’t profit.
But those have to maintain their heap in the very rigid shape imposed by
storing it as an implicit heap in a contiguous region of the array.
And this rigid structure comes at the price of extra comparisons:

To extract the maximum, we have to move the last element in the heap part of the array
to the root of the heap and let it sink from there.
This requires up to *two* comparisons per level of the heap, since we have to find
the larger child (which is potentially promoted) and we have to compare this child
with the sinking element.
(Floyd’s optimization and more sophisticated tricks exist to bring this number
further down, but it remains more than the height of the heap.)

By using an output buffer to hold elements extracted from the heap, we can instead simply promote elements to fill the gap left behind by the extracted maximum. That way we need at most one comparison per level of the heap.
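One extraction step then might look as follows. This is a sketch under my own assumptions, not code from the paper: `extract_max` operates on a max-heap stored implicitly in `heap[0:n]`, and it fills the leaf-level hole with a $-\infty$ sentinel so the array size stays fixed:

```python
import math

def extract_max(heap, n):
    """Pop the maximum of a max-heap stored in heap[0:n].
    Instead of re-inserting the last element and sinking it
    (two comparisons per level), we promote the larger child
    into the gap on each level: one comparison per level."""
    top = heap[0]
    gap = 0
    while 2 * gap + 1 < n:                 # the gap still has a child
        left, right = 2 * gap + 1, 2 * gap + 2
        # The single comparison per level: find the larger child.
        child = right if right < n and heap[right] > heap[left] else left
        heap[gap] = heap[child]            # promote it into the gap
        gap = child
    heap[gap] = -math.inf                  # sentinel fills the leaf hole
    return top
```

Extracting repeatedly yields the elements in decreasing order, which in QuickHeapsort are written into the buffer segment via swaps, just as in QuickMergesort.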

This “external heapsort” thus uses $\sim n\lg n$ comparisons, but requires a buffer to hold $n$ elements. By using it as our X in QuickXsort, we can avoid the extra space requirement.