Because of unresolved dependencies, installing pdf2htmlEX has become challenging on recent Ubuntu releases.

Update [2022-09]

Much of the complication below can now be avoided! A few developers – worthy of our collective Thanks! – revived pdf2htmlEX and ported it to new versions of poppler and fontforge. Their effort lives on and they offer various prepackaged releases, including AppImages.

pdf2htmlEX in docker

I use pdf2htmlEX to make PDFs nicely readable in the browser. pdf2htmlEX relies on a custom version of the poppler library, and support for more recent versions of poppler has not been built into it yet. Since no new maintainer has been found, people have started looking for ways to keep using pdf2htmlEX productively without being forced to stay on old libraries systemwide. Docker containers are a solution for precisely such use cases.

Here I describe the steps it took me to get pdf2htmlEX running on Ubuntu 18.04.1 LTS. I was fine with a certain overhead (in time and space) for running it, but I wanted direct command-line interaction on individual files. Since docker containers are isolated from the host system, this requires some extra steps.

First install docker; I used the snap version, so I ran:

snap install docker

Next, I pulled the prepackaged docker image by bwits:

sudo docker pull bwits/pdf2htmlex

To run pdf2htmlEX conveniently and (somewhat) securely, you should be able to run docker as a regular user. This is not possible out of the box, since the docker command talks to the docker daemon through a Unix socket owned by root. But if you create a group named docker and add yourself to it, the socket will be owned by that group instead. So:

sudo groupadd docker
sudo usermod -aG docker $USER

You will probably have to reboot (or at least log out and back in, and restart the docker daemon) before this takes effect. You can test it with docker run hello-world.
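The reason a fresh login is needed is that a running shell keeps the group list it was started with. As a rough sketch, the following check tells you whether the docker group is already active in the current session; the helper name in_group is my own invention, not part of docker.

```shell
# in_group GROUP "SPACE SEPARATED GROUP LIST"  (hypothetical helper)
# Succeeds if GROUP appears as a whole word in the list.
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Groups of the current shell process; these only change after a new login.
session_groups=$(id -nG)

if in_group docker "$session_groups"; then
    echo "docker group is active in this session"
else
    echo "docker group is not active here yet; log out and back in"
fi
```

By contrast, id -nG "$USER" looks the user up in the group database, so it should list docker right after the usermod call even when the current session has not picked up the change.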

If everything worked out, we can now run pdf2htmlEX as

docker run -ti --rm -v "$(pwd)":/pdf bwits/pdf2htmlex pdf2htmlEX [args] file.pdf

to convert file.pdf in the current working directory. Note that the application inside the container only gets access to the folder you map to /pdf using the -v option, i.e., in the above command, the current directory.
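Since typing the full docker command gets tedious, a small shell function can wrap it. This is only a sketch under a few assumptions: the function name pdf2html and the PDF2HTML_DOCKER variable are made up for illustration, the file comes first and any pdf2htmlEX options follow, and the container is assumed to work inside /pdf (as the command above implies). The function maps the directory that contains the PDF, so it also works on files outside the current directory.

```shell
# Hypothetical wrapper: pdf2html file.pdf [pdf2htmlEX options]
# The body runs in a subshell ( ... ), so its variables do not leak
# into the calling shell.
pdf2html() (
    if [ "$#" -lt 1 ]; then
        echo "usage: pdf2html file.pdf [pdf2htmlEX options]" >&2
        exit 2
    fi
    file=$1
    shift
    # Absolute path of the directory containing the PDF; this is the
    # only folder the container will be able to see.
    dir=$(cd "$(dirname "$file")" && pwd) || exit 1
    name=$(basename "$file")
    # PDF2HTML_DOCKER is an assumed override, e.g. "sudo docker" or,
    # for a dry run, "echo".
    ${PDF2HTML_DOCKER:-docker} run -ti --rm -v "$dir":/pdf \
        bwits/pdf2htmlex pdf2htmlEX "$@" "$name"
)
```

For example, pdf2html ~/papers/file.pdf --zoom 1.3 would map ~/papers to /pdf and convert file.pdf there. Setting PDF2HTML_DOCKER=echo prints the docker arguments instead of running them, which is handy for checking the mapping first.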