Because of unresolved dependencies, installing pdf2htmlEX has become challenging on recent Ubuntu releases.
I use pdf2htmlEX to make PDFs nicely readable in the browser. pdf2htmlEX relies on a custom version of the poppler library, and support for more recent poppler versions has not been added to it yet. Since no new maintainer has been found, people have started looking for ways to keep using pdf2htmlEX productively without being forced to keep old libraries installed system-wide. Docker containers are a solution for precisely such use cases.
Here I describe the steps it took me to get pdf2htmlEX running on Ubuntu 18.04.1 LTS. I was willing to accept some overhead (in time and disk space) for running it, but I wanted to work directly on the command line with individual files. Since Docker containers are isolated from the host system, this requires a few extra steps.
First, install Docker. I used the snap package, so I ran:
sudo snap install docker
Next, I pulled the prepackaged Docker image by bwits:
sudo docker pull bwits/pdf2htmlex
For running pdf2htmlEX conveniently and (somewhat) securely, you should be able to run Docker as a regular user. This is not possible out of the box, since the Docker client talks to the Docker daemon through a Unix socket owned by root. But if you create a group docker and add yourself to it, the socket will be owned by that group instead (keep in mind that membership in the docker group is effectively equivalent to root access):
sudo groupadd docker
sudo usermod -aG docker $USER
You will probably have to reboot (or log out and restart the Docker daemon) before this takes effect. You can then test it with:
docker run hello-world
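Group changes only apply to new login sessions, so if docker run hello-world still fails with a permission error, it may be worth checking whether the current shell has picked up the docker group yet. A minimal sketch, assuming a bash-like shell; the helper name session_has_group is my own invention:

```shell
# Hypothetical helper (name is my own): succeeds if the *current* shell
# session already has the given group, not merely the user account.
session_has_group() {
  # id -nG prints the group names of the running process, space-separated;
  # split them one per line and look for an exact match.
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

if session_has_group docker; then
  echo "docker group is active in this session"
else
  echo "docker group not active yet: log out and back in, or run 'newgrp docker'"
fi
```

For a quick test without logging out, newgrp docker starts a subshell in which the group is already active.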
If everything worked out, we can now run pdf2htmlEX on a file file.pdf in the current working directory with:
docker run -ti --rm -v "$(pwd)":/pdf bwits/pdf2htmlex pdf2htmlEX [args] file.pdf
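To avoid typing the full docker command every time, it can be wrapped in a shell function, for example in ~/.bashrc. This is a minimal sketch: naming the function pdf2htmlEX (so it feels like a native install) is my own choice, and the extra -w /pdf only makes the mounted folder the working directory inside the container explicitly, which may be redundant if the image already does that.

```shell
# Hypothetical wrapper, e.g. for ~/.bashrc: run the containerised pdf2htmlEX
# as if it were installed natively. All arguments are passed through unchanged.
pdf2htmlEX() {
  # -v mounts the current directory at /pdf (quoted so paths with spaces work);
  # -w /pdf makes that folder the working directory inside the container;
  # the "pdf2htmlEX" after the image name is the program run inside it.
  docker run -ti --rm -v "$PWD":/pdf -w /pdf bwits/pdf2htmlex pdf2htmlEX "$@"
}
```

After reloading the shell, pdf2htmlEX [args] file.pdf behaves like the long command above.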
Note that the application inside the container only has access to the folder you map into it, i.e., in the above command, the current directory.
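If the PDF lives somewhere else, one option is to mount the PDF's own directory instead of the current one. A minimal sketch, assuming a POSIX-like shell; the function name pdf2htmlex_file is hypothetical, and I am assuming pdf2htmlEX writes its output into its working directory by default, so the HTML should end up next to the PDF.

```shell
# Hypothetical helper: convert a PDF located anywhere on the host by
# mounting the PDF's directory (not the current one) at /pdf.
# Usage: pdf2htmlex_file path/to/file.pdf [pdf2htmlEX options...]
pdf2htmlex_file() {
  [ "$#" -ge 1 ] || { echo "usage: pdf2htmlex_file FILE.pdf [options...]" >&2; return 2; }
  local pdf dir base
  pdf=$1
  shift
  # Bind mounts with -v need an absolute host path, so resolve the directory.
  dir=$(cd "$(dirname -- "$pdf")" && pwd) || return 1
  base=$(basename -- "$pdf")
  # Options go before the input file, as in "pdf2htmlEX [args] file.pdf".
  docker run -ti --rm -v "$dir":/pdf -w /pdf bwits/pdf2htmlex pdf2htmlEX "$@" "$base"
}
```

The container still only sees one folder; this just picks the folder that contains the input file.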