Pipelined execution in modern processors is slowed down by conditional branches unless their outcome is easy to predict. In sorting, branches corresponding to key comparisons are usually hard to predict because they are neither mostly taken nor mostly not taken; their outcome depends on the input.
In this paper, we give the first rigorous analysis of the expected number of branch misses in Quicksort, for both classic Quicksort and Quicksort with Yaroslavskiy-Bentley-Bloch (YBB) partitioning, in each case including pivot sampling. Pivot sampling is particularly important here: it reduces the overall number of comparisons, which necessarily makes each of the remaining comparisons less predictable.
In fact, the overall result is that median-of-$k$ Quicksort incurs more branch misses the larger $k$ gets.
A second interesting result is that YBB Quicksort and classic Quicksort incur almost the same number of branch misses. In particular, branch misses cannot be the reason for the favorable running time of YBB Quicksort; it is most likely a memory-hierarchy effect instead (see my dissertation).
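The effect of pivot sampling on classic Quicksort can be illustrated with a small back-of-the-envelope computation. The sketch below is not the paper's derivation; it assumes a 1-bit predictor (predict the previous outcome), comparison outcomes that are independent given the pivot's relative rank $P$, and a pivot chosen as the median of $k = 2t+1$ samples, so that $P$ follows a Beta$(t+1, t+1)$ distribution. Under these assumptions a comparison is mispredicted with probability $2P(1-P)$, and the leading term of the miss count is this expected rate multiplied by the well-known comparison count $n \ln n / (H_{k+1} - H_{t+1})$. All function names are hypothetical.

```python
from fractions import Fraction


def harmonic(n: int) -> Fraction:
    """Exact n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))


def miss_rate_1bit(k: int) -> Fraction:
    """Expected miss probability per comparison for a 1-bit predictor.

    Assumption: the pivot's relative rank P is Beta(t+1, t+1) with
    k = 2t+1, and outcomes are i.i.d. given P. Then the miss
    probability is E[2 P (1 - P)] = (t+1) / (2t+3).
    """
    t = (k - 1) // 2
    return Fraction(t + 1, 2 * t + 3)


def comparisons_coeff(k: int) -> Fraction:
    """Coefficient c such that median-of-k Quicksort uses ~ c * n ln n comparisons."""
    t = (k - 1) // 2
    return 1 / (harmonic(k + 1) - harmonic(t + 1))


def misses_coeff_1bit(k: int) -> Fraction:
    """Coefficient of n ln n for branch misses under the assumptions above."""
    return miss_rate_1bit(k) * comparisons_coeff(k)


if __name__ == "__main__":
    for k in (1, 3, 5, 7):
        print(k, comparisons_coeff(k), miss_rate_1bit(k), misses_coeff_1bit(k))
```

In this simplified model the comparison coefficient drops from 2 ($k=1$) to 12/7 ($k=3$), while the miss rate per comparison rises from 1/3 to 2/5, so the miss coefficient grows from 2/3 to 24/35 and then to 180/259 for $k=5$. Fewer comparisons are outweighed by each comparison being harder to predict.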
Relation to Other Papers
In my dissertation, I sketch how to extend the analysis of this paper to generic $s$-way one-pass partitioning, and also give a rigorous proof that we can safely use the steady state of the predictor automaton to determine expected branch-miss counts.
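To make the notion of a predictor's steady state concrete, here is a minimal sketch for one common textbook model, a saturating-counter predictor fed with independent branch outcomes that are taken with a fixed probability $p$. Which automata the paper and dissertation actually treat is not stated here, so the model, the function name, and its parameters are assumptions for illustration only. The counter forms a birth-death Markov chain whose stationary probabilities are proportional to $(p/(1-p))^i$, which gives the steady-state miss probability in closed form.

```python
from fractions import Fraction


def steady_state_miss_rate(p: Fraction, states: int = 4) -> Fraction:
    """Steady-state miss probability of a saturating-counter predictor.

    Hypothetical model: the counter has states 0..states-1, moves up on a
    taken branch (probability p) and down otherwise, saturating at both
    ends; it predicts 'taken' whenever it is in the upper half.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    r = p / (1 - p)
    # Detailed balance of the birth-death chain: pi_{i+1} = r * pi_i.
    weights = [r ** i for i in range(states)]
    total = sum(weights)
    pi = [w / total for w in weights]
    half = states // 2
    predicts_not_taken = sum(pi[:half])  # lower half: miss if branch is taken
    predicts_taken = sum(pi[half:])      # upper half: miss if branch is not taken
    return predicts_not_taken * p + predicts_taken * (1 - p)
```

With `states=2` this reduces to a 1-bit predictor with miss probability $2p(1-p)$; with the default four states (a 2-bit counter) it equals $q/(1-2q)$ for $q = p(1-p)$. For example, `steady_state_miss_rate(Fraction(1, 3))` evaluates to `Fraction(2, 5)`.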