Contents
Preface
Curriculum Vitae
0 Appetizer
1 Introduction
1.1 History
1.2 Little Glossary of Quicksort Terms
1.3 Recent Developments
1.4 The Memory Wall
1.5 Multiway Quicksort
1.6 Aim and Scope of this Dissertation
Outline
1.7 Related Work
1.7.1 Towards Classic Quicksort
1.7.2 Sedgewick's Work
1.7.3 Practical Tweaks in the Nineties
1.7.4 Off-the-Shelf Theorems for Solving Recurrences
1.7.5 Analysis of Pivot Sampling
1.7.6 Quicksort and Search Trees
1.7.7 Limit Distributions
1.7.8 Multiway Quicksort
1.7.9 Equal Keys
1.7.10 Bit Complexity and Symbol Comparisons
1.7.11 Memory Hierarchy
1.7.12 Branch Misses
1.7.13 Academic Quicksort Variants
1.7.14 My Previous Work
2 Mathematical Tools
2.1 Continuous Basics
2.1.1 Asymptotic Notation
2.1.2 Special Functions
2.1.3 Convexity
2.1.4 Lipschitz- and Hölder-Continuity
2.2 Discrete Basics
2.2.1 Binomial Coefficients
2.2.2 Harmonic Numbers
2.3 Generating Functions
2.3.1 Singularity Analysis
2.3.2 Euler Differential Equations and the Operator Method
2.4 Random Basics
2.4.1 Categorical Distribution
2.4.2 Exponential Distribution
2.4.3 Gamma Distribution
2.4.4 Beta and Dirichlet Distributions
2.4.5 Dirichlet-Calculus
2.4.6 Binomial and Multinomial Distributions
2.4.7 Beta-Binomial and Dirichlet-Multinomial Distributions
2.4.8 Shannon Entropy
2.4.9 Entropy and Binary Search Trees
2.5 Discrete Entropy
2.5.1 Entropy of Dirichlet Vectors
2.5.2 Properties of the Harmonic-Difference Function
2.5.3 Properties of the Discrete Entropy
2.6 A Master Theorem for Distributional Recurrences
2.6.1 The Continuous Master Theorem
2.6.2 Distributional Recurrences
2.6.3 A Simple Distributional Master Theorem
2.6.4 Self-Contained Proof of an Educational DMT
2.6.5 The Distributional Master Theorem
3 Models and Assumptions
3.1 Models of Input Distributions
3.1.1 The Random-Permutation Model
3.1.2 The Expected-Profile Model
3.1.3 The Exact-Profile Model
3.1.4 Enforcing Random Order
3.2 Cost Models
3.2.1 Information Costs: The Comparison Model
3.2.2 Rearrangement Costs: Swaps and Write Accesses
3.2.3 Primitive Instruction Counts
3.2.4 The Memory Hierarchy: Scanned Elements and Cache Misses
3.2.5 Pipelined Execution: Branch Mispredictions
4 Quicksort
4.1 From Partitioning to Sorting
4.2 Famous Quicksort Variants
4.2.1 Classic Quicksort
4.2.2 Lomuto's Partitioning
4.2.3 Yaroslavskiy-Bentley-Bloch Quicksort
4.2.4 Sedgewick's Dual-Pivot Method and Kciwegdes Partitioning
4.2.5 Waterloo Quicksort
4.2.6 Algorithmic Improvements
4.3 Generic One-Pass Partitioning
4.3.1 Master-Pivot Schemes
Algorithm 9: Partition
4.3.2 Master-Segment Schemes
4.3.3 Known Methods as Generic One-Pass Instances
4.4 Choosing Pivots From a Sample
4.4.1 Generalized Pivot Sampling
4.4.2 Random-Parameter Pivot Sampling
4.5 Randomness Preservation in the Presence of Pivot Sampling
4.6 Other Partitioning Methods
4.6.1 Removing Duplicates
4.6.2 Linear Memory Or Several Passes
4.6.3 Hennequin's Generalized Quicksort
4.6.4 Relation to Concurrent Work of Aumüller et al.
5 Analysis of Generic Partitioning
5.1 Toll-Function Notation
5.2 Stochastic Description of Partitioning Under Pivot Sampling
5.2.1 Class Probabilities
5.2.2 Segment Sizes and Subproblem Sizes
5.2.3 Distribution of Pivot Values
5.2.4 Random-Parameter Pivot Sampling
5.3 Generic Model: Element-wise Charging Schemes
5.3.1 General Charging Schemes
5.3.2 Relation to Classification Strategies
5.3.3 Separating State Frequencies From Element Costs
5.3.4 FSM-based Element-Wise Charging Schemes
5.4 Charging Schemes for Our Cost Measures
5.4.1 The State Automaton For One-Pass Partitioning
5.4.2 Key Comparisons
5.4.3 Scanned Elements
5.4.4 Write Accesses
5.4.5 Swaps resp. Cyclic Shifts
5.5 Execution Frequencies of Inner Loops
5.5.1 Master-Pivot Algorithm
5.5.2 Master-Segment Algorithms
5.5.3 A Note on Limit Distributions
5.6 Class-State Co-Occurrences
5.7 Comparison-Optimal Partitioning
5.8 Branch Misses
5.8.1 Conditional Independence
5.8.2 A Generic Comparison-Branch Location
6 The Quicksort Recurrence
6.1 Recursive Description of Costs
6.1.1 Nitpicks on Generalized Pivot Sampling
6.1.2 Recurrence for Expected Costs
6.1.3 Relation to Combinatorial Form for Subproblem Size Probabilities
6.2 Back-of-the-Envelope Approaches
6.2.1 Van Emden's Entropy-Reduction Argument
6.2.2 A Continuous Recurrence
6.2.3 Explicit Solution of Continuous Recurrence
6.2.4 A Master-Theorem-Style Argument
6.2.5 Leading Term by Ansatz
6.3 Solution with the Distributional Master Theorem
6.4 Random-Parameter Pivot Sampling
6.5 Solution with Generating Functions
6.5.1 A Differential Equation for the Generating Function
6.5.2 Differential-Operator Transformation
6.5.3 A General Solution to the Differential Equation
6.5.4 Coefficient Asymptotics
7 Results and Discussion
7.1 Average Costs
7.2 Disclaimer
7.3 Simulating Multiway Partitioning by Binary Partitioning
7.3.1 Waterloo vs. Classic
7.3.2 Entropy-Equivalent Sampling
7.3.3 s-Way vs. Binary Partitioning
7.4 The Optimal Meeting Point for Indices
7.4.1 Comparisons
7.4.2 Scanned Elements
7.4.3 Summary
7.5 Optimal Pivot Sampling
7.5.1 Optimal Sample Sizes?
7.5.2 Entropy-Tight Pairs
7.5.3 Linear Costs With Exact-Quantile Pivots
7.5.4 Scanned Elements
7.5.5 An Optimality Criterion for Finite Sample Sizes and Gibb's Gap
7.5.6 Two Comparisons Trees: Non-Linear Costs
7.5.7 Optimal t: Heuristics and Tables
7.6 The Optimal Number of Pivots
7.6.1 Key Comparisons
7.6.2 Sampling with a Limit Density
7.6.3 Scanned Elements
7.6.4 Exponential-Decay Sampling
7.6.5 Optimal s for fixed k
7.7 Optimal Comparison Trees
7.7.1 The Single-Tree Case
7.7.2 Two Comparison Trees
7.7.3 The Optimal Alphabetic Tree Problem
7.7.4 Entropy-Bounds for Comparison Trees
7.7.5 Optimal Choices for A Single Tree
7.7.6 The Benefit of Two Trees
7.7.7 Concluding Remarks on Optimal Comparison Trees
7.8 Interactions between Parameters: The Jellyfish Paradox
8 Equal Keys
8.1 Introduction
8.1.1 Ignoring Equals
8.2 Input Models with Equal Keys
8.3 Related Work
8.4 Fat-Pivot Partitioning
8.4.1 Existing Methods
8.4.2 New Concepts and Notations
8.4.3 Generic Multiway Fat-Pivot Partitioning
8.4.4 Fat-Separator Methods
8.4.5 Fat-Outsides Methods
8.5 Partitioning Costs
8.5.1 Stateless Cost Measures
8.5.2 Stochastic Model
8.5.3 (Ternary) Comparisons
8.5.4 Scanned Elements (Partitioning)
8.5.5 Scanned Elements (Cleanup)
8.6 Recurrence for Expected-Profile Inputs
8.6.1 Distributional Recurrence
8.7 Quicksort Recursion Trees
8.7.1 Recursion Trees
8.7.2 t-Fringe-Balanced s-Ary Search Trees with Leaf-Buffer Size w
8.7.3 Recursion Trees and stwl-Trees
8.7.4 Generic Search Costs
8.7.5 Search Costs in Expected-Profile Models
8.7.6 Distributional Recurrence for Weighted Path Length
8.7.7 Expected Weighted Path Length for Random U-ary Words
8.8 Quicksort Under Stateless Cost Measures
8.8.1 Random u-Ary Words
8.8.2 Quicksort Costs Value-Wise
8.8.3 Many Duplicates and Degeneracy
8.8.4 Proof of Theorem 8.16
8.9 Discussion
9 Conclusion
9.1 The Optimal Quicksort?
9.2 Open Problems and Future Work
A Index of Notation
A.1 Generic Mathematical Notation
A.2 Stochastics-related Notation
A.3 Input to the Algorithm
A.4 Notation for the Algorithm
A.5 Notation Specific to the Analysis (Uniform Model)
A.6 Notation for the Analysis with Equal Keys
B Bibliography