Install pdf2htmlEX on recent Ubuntu
Because of unresolved dependencies, installing pdf2htmlEX has become challenging on recent Ubuntu releases.
Update [2024-11]
For Ubuntu 24.04, the situation seems to have changed again. While the version from pdf2htmlex.github.io still works, it fails to convert some PDFs for me. I have not yet found a solution for this, but I will update this post when I do.
The old Docker image built by bwits is still available and works fine, including all the other steps described below. So for now (and, again, until the team at pdf2htmlex.github.io publishes an updated build), the Docker container is the way to go.
Update [2022-09]
Much of the complication below can now be avoided! A few developers (worthy of our collective thanks!) revived pdf2htmlEX and ported it to new versions of poppler and fontforge. Their effort lives on at pdf2htmlex.github.io, where they offer various prepackaged releases, including AppImages.
pdf2htmlEX in docker
I use pdf2htmlEX to make PDFs nicely readable in the browser. pdf2htmlEX relies on a custom version of the poppler library, and support for more recent versions of poppler has not been built into it yet. Since no new maintainer has been found, people started looking for ways to keep using pdf2htmlEX productively without being forced to stay on old libraries system-wide. Docker containers are a solution for precisely such use cases.
Here I describe the steps it took me to get pdf2htmlEX running on Ubuntu 18.04.1 LTS. I was fine with some overhead (in time and space) for running it, but I wanted direct command-line interaction on individual files. Since Docker containers are isolated from the host system, this requires some extra steps.
First, install Docker. I used the snap version, so I ran:
sudo snap install docker
Next, I pulled the prepackaged Docker image by bwits:
sudo docker pull bwits/pdf2htmlex
To run pdf2htmlEX conveniently, you should be able to run docker as a regular user. This is not possible out of the box, since the docker command talks to the Docker daemon through a Unix socket owned by root. But if you create a group docker and add yourself to it, the socket will be owned by that group instead. Be aware that membership in the docker group is effectively equivalent to root access on the host, so this is a convenience, not a security measure. So:
sudo groupadd docker
sudo usermod -aG docker $USER
You probably have to log out and restart the Docker daemon (or simply reboot) before this takes effect. You can test it with:
docker run hello-world
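Since the new group only applies to sessions started after the change, it can help to check whether the current shell already has it before rebooting. The following is a minimal bash sketch; has_group is a hypothetical helper name, and it relies only on the standard id, tr and grep tools.

```shell
# Hypothetical helper: succeed if the current process is a member of group $1.
# "id -nG" prints the names of all groups of the current process,
# separated by spaces; we put one per line and look for an exact match.
has_group() {
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if has_group docker; then
    echo "docker group is active in this session"
else
    echo "docker group not active yet: log out and back in, or try 'newgrp docker'"
fi
```

If the group is not active yet, logging out and back in is the reliable fix; newgrp docker starts a subshell with the group applied as a stopgap.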
If everything worked out, we can now run pdf2htmlEX as
docker run -ti --rm -v "$(pwd)":/pdf bwits/pdf2htmlex pdf2htmlEX [args] file.pdf
to convert file.pdf in the current working directory.
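To convert several files in one go, the same command can be wrapped in a loop over the current directory. This is a sketch for bash; pdf2htmlex_all and the DOCKER override variable are hypothetical names introduced for illustration. It leaves out -ti on the assumption that no interactive terminal is needed for a batch run (docker run -ti can fail when stdin is not a terminal, e.g. in a script or cron job).

```shell
# Hypothetical helper (bash): convert every *.pdf in the current directory
# with the bwits/pdf2htmlex image, one container per file.
# Any arguments are passed through to pdf2htmlEX as options.
# -ti is omitted because no interactive terminal is needed here.
# Set DOCKER=echo to print the docker commands instead of running them.
pdf2htmlex_all() {
    local f
    for f in *.pdf; do
        # Without matches the glob stays literal ("*.pdf"); skip that case.
        [ -e "$f" ] || continue
        "${DOCKER:-docker}" run --rm -v "$PWD":/pdf bwits/pdf2htmlex \
            pdf2htmlEX "$@" "$f"
    done
}
```

For example, pdf2htmlex_all --zoom 1.3 would convert each PDF in the current directory with the same options.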
Note that the application inside the container only gets access to the folder you map to /pdf using the -v option, i.e., in the above command, the current directory.
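As a consequence, a PDF outside the current directory cannot be converted with the command as written. A small bash function can hide this by mounting the directory that contains the PDF instead. The sketch below assumes the last argument is the input PDF (it does not handle pdf2htmlEX's optional output file name after it); the function name pdf2htmlex_docker and the DOCKER override variable are hypothetical.

```shell
# Hypothetical wrapper (bash): run pdf2htmlEX from the bwits/pdf2htmlex image
# on a PDF anywhere on the host. Usage: pdf2htmlex_docker [options] file.pdf
# The PDF's directory (not the current one) is mounted at /pdf.
# Set DOCKER=echo to print the docker command instead of running it.
pdf2htmlex_docker() {
    if [ "$#" -lt 1 ]; then
        echo "usage: pdf2htmlex_docker [pdf2htmlEX options] file.pdf" >&2
        return 1
    fi
    local pdf="${!#}"                  # last argument: the input PDF
    local dir
    # Absolute path of the PDF's directory (the cd runs in a subshell).
    dir="$(cd -- "$(dirname -- "$pdf")" >/dev/null && pwd)" || return 1
    # Options are all arguments except the last; the PDF is passed by name.
    "${DOCKER:-docker}" run -ti --rm -v "$dir":/pdf bwits/pdf2htmlex \
        pdf2htmlEX "${@:1:$#-1}" "$(basename -- "$pdf")"
}
```

Assuming the image's working directory is /pdf (which the original command implies, since file.pdf is found there), the generated HTML should end up next to the PDF rather than in the current directory.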