My PyCon US 2023 Powersort Talk

I presented Powersort and its story at PyCon US 2023, the largest Python community conference. Here are some resources and impressions from the last couple of days here in Salt Lake City.

View on downtown Salt Lake City (with the conference venue!) and Utah State Capitol, from Ensign Peak.

Resources

Official Talk recording
My reupload with the antiphase audio fixed
Talk slides
(Speakerdeck seems to have an issue with my fancy transparency patterns, but you can download the pdf there.)
Colab notebook with the (educational) implementations of Timsort and Powersort
How-to for donating data to the Adaptive Sorting Benchmark

Here’s the blurb from the talk submission:

Quicksort, Timsort, Powersort - Algorithmic ideas, engineering tricks, and trivia behind CPython’s new sorting algorithm
Writing a sorting function is easy - coding a fast and reliable reference implementation less so. In this talk, I tell the story behind CPython’s latest updates of the list sort function.

Aims: entertain people with twists of history and algorithmic puzzles, which tell a lovely story of how a seemingly useless piece of theory lead to the fastest and most elegant solution of a practical challenge.

Target audience: geeks believing in the power of solid algorithmic thinking; programmers interested in engineering performance-critical code; all Python enthusiast curious about what makes (sorting lists in) Python fast.

Content: After using Quicksort for a long while, Tim Peters invented Timsort, a clever Mergesort variant, for the CPython reference implementation of Python. Timsort is both effective in Python and a popular export product: it is used in many languages and frameworks, notably OpenJDK, the Android runtime, and the V8 JavaScript engine.

Despite this success, algorithms researchers eventually pinpointed two flaws in Timsort’s underlying algorithm: The first could lead to a stack overflow in CPython (and Java); although it has meanwhile been fixed, it is curious that 10 years of widespread use didn’t bring it to surface. The second flaw is related to performance: the order in which detected sorted segments, the “runs” in the input, are merged, can be 50% more costly than necessary. Based on ideas from the little known puzzle of optimal alphabetic trees, the Powersort merge policy finds nearly optimal merging orders with negligible overhead, and is now (Python 3.11.0) part of the CPython implementation.

Impressions

PyCon US is huge; there were 2200 attendees on site, plus over 400 online participants; the industrial sponsors alone donated over $1 million (!) towards the event. (Cheers to JetBrains for bringing proper coffee to the US and giving it away for free (cappuccinos throughout the conference, yay ☕), and AWS & Superblocks for inviting the whole conference to their party 🍺).

Talk topics were very mixed and broad; some talks presented technical details on changes to the CPython implenentation (like the two talks by Mark and Brandt from the Faster CPython initiative, or the flurry of talks on PyScript, the attempts to make Python run in the browser(!)), whereas others provided an overview of an area or reported on particular projects (such as games on the micro:bit or algorithmic embroidery).

I was also deeply impressed by how much PyCon has going on in terms of community building at the conference (and outside of it), and how much personal appreciation people showed for each other. It is a remarkable achievement to not only get the technical aspects of the language right, but also the community around it. While one first needs a thing to gather around before a community can evolve, it makes me wonder whether a healthy community indeed doesn’t follow, but cause the success of Python.

I was very happy that Guido van Rossum made it to my talk. Of course, I couldn’t let the opportunity pass to interview him on sorting in CPython afterwards:

“I don’t remember the exact reasoning, but the mere fact that we used qsort for sorting lists shows you that I didn’t care much about sorting.”

Fair enough 😉
Had that been any different, who knows whether I would have had an excuse to enjoy PyCon US today?