Saturday, November 29, 2014

Converting SVG files in bulk and parallelizing for loops


What I learned about Inkscape today: it's easy and quick to export files from the command line, and for testing purposes it has a rather cool interactive command line mode, started with the command "inkscape --shell".

I found out about this by reading Inkscape's man page to find out how to export files from the command line.  There I found this:

--shell

With this parameter, Inkscape will enter an interactive command line shell mode. In this mode, you type in commands at the prompt and Inkscape executes them, without you having to run a new copy of Inkscape for each command. This feature is mostly useful for scripting and server uses: it adds no new capabilities but allows you to improve the speed and memory requirements of any script that repeatedly calls Inkscape to perform command line tasks (such as export or conversions). Each command in shell mode must be a complete valid Inkscape command line but without the Inkscape program name, for example "file.svg --export-pdf=file.pdf".


I also found out that to export the page area of a file to a pdf, I can use the command "inkscape file.svg --export-area-page --export-pdf=file.pdf" (or, a bit shorter, "inkscape file.svg -C -A=file.pdf").

OK, let's see how this works. First I pull my list of svg files into vim:

$ ls *svg | vim -

Then, in vim, I add the command line parameters to export the whole page to the respective pdf file, giving a list that I can copy and paste. That's a one-liner:

:%s/\(.*\)\.svg$/\1.svg -C -A=\1.pdf/
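
The same list can also be produced without an editor. This is only a minimal sketch: gen_export_cmds is a made-up helper name, and the usage line assumes that shell mode accepts commands piped to its standard input in the same way it accepts pasted ones, which the man page excerpt above does not spell out.

```shell
# Print one shell-mode command per SVG in the current directory,
# e.g. "logo.svg -C -A=logo.pdf". gen_export_cmds is a hypothetical name.
gen_export_cmds() {
    for f in *.svg; do
        [ -e "$f" ] || continue        # skip the literal "*.svg" when nothing matches
        # ${f%.svg} is the file name with the trailing ".svg" removed
        printf '%s -C -A=%s.pdf\n' "$f" "${f%.svg}"
    done
}

# Usage (assumption: inkscape --shell reads its commands from a pipe):
#   gen_export_cmds | inkscape --shell
```

File names containing spaces would probably need extra quoting before shell mode parses them correctly, so it is worth trying this on a couple of files first.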

In another xterm, I enter inkscape interactive mode and see this:

$ inkscape --shell
Inkscape 0.48.4 r9939 interactive shell mode. Type 'quit' to quit.
>


I paste the first command from my list and check that it works out OK. All good, so I paste the rest and all my files are converted within a few seconds.  If I weren't curious and procrastinatory, I'd have stopped here, but now I want to know what other options I have for bulk export, and how they stack up against each other.

First test: just string them together on the command line. Back to vim, type "vipJ" to join all my commands into one line, add the magic words "time inkscape" to the start of the line and paste into bash:

$ time inkscape egflitetext.svg -C -A=egflitetext.pdf EGFLogoMonoPathsBusinessCard.svg -C -A=EGFLogoMonoPathsBusinessCard.pdf [etc. etc. - long command line snipped here]

real    0m3.298s
user    0m3.132s
sys     0m0.144s
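
A command line of this shape can also be assembled by the shell itself instead of being pasted from vim. The sketch below is an illustration under assumptions: export_all_at_once is a hypothetical helper, the converter name is a parameter so that "echo" can be passed for a dry run, and it simply reproduces the "file.svg -C -A=file.pdf" pattern used above.

```shell
# Collect "file.svg -C -A=file.pdf" for every SVG into the positional
# parameters, then run a single command with all of them.
# export_all_at_once is a hypothetical helper name.
export_all_at_once() {
    cmd=${1:-inkscape}      # converter to call; pass "echo" for a dry run
    set --                  # start from an empty argument list
    for f in *.svg; do
        [ -e "$f" ] || continue
        set -- "$@" "$f" -C "-A=${f%.svg}.pdf"
    done
    if [ "$#" -gt 0 ]; then
        "$cmd" "$@"
    fi
}

# Usage:
#   export_all_at_once echo        # print the arguments without converting
#   time export_all_at_once        # in bash, run and time the real export
```

After a run like this it is worth confirming that one pdf per svg really appears, since how Inkscape pairs repeated -A options with input files is not something this sketch checks.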


So exporting 12 svgs to pdf took just over 3 seconds.  Not bad, but a bit clunky if I want to do this frequently. I spent some time trying to make find or xargs generate this command line, but I couldn't figure it out.  I could probably do it in sed, but for my purposes that's just a bit too conceptually complex, so instead I used this loop:

$ time (for i in *svg ; do inkscape "$i" -C -A="$i.pdf"; done)

real    0m5.512s
user    0m5.080s
sys     0m0.412s


That's 67% slower, but much more convenient. But wait, I have a 4 core processor in my laptop, so why not let them all work at once? xargs can run jobs in parallel.

$ time (find . -maxdepth 1 -name "*svg" -print0 | xargs -0 -P4 -I % inkscape % -C -A=%.pdf)

real    0m2.572s
user    0m8.360s
sys     0m0.460s


Wow, quite a lot quicker when we do things in parallel! But hang on, if I detach the jobs in the for loop, they'll also run in parallel.  Let's see how that performs:

$ time (for i in *svg ; do inkscape "$i" -C -A="$i.pdf" & done)

real    0m0.007s
user    0m0.000s
sys     0m0.004s


Oh, right: the loop only launches the jobs and returns immediately, so time measures the launching rather than the conversions themselves. Bright ideas to profile this are welcome! It felt really quick, and it's simpler to remember than the find | xargs solution, but of course if I had hundreds of files to convert instead of a dozen, starting them all at once might lock up my system, so then xargs would be the way to go.
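
One possible way to time the backgrounded version is to make the shell wait for its jobs before the timer stops. This is a sketch rather than a measured result: convert_all_bg is a hypothetical helper, and the converter is a parameter only so that it can be swapped out for testing.

```shell
# Start one background job per SVG, then block until all of them finish.
# convert_all_bg is a hypothetical helper name.
convert_all_bg() {
    cmd=${1:-inkscape}                  # converter to call; overridable for testing
    for i in *.svg; do
        [ -e "$i" ] || continue
        "$cmd" "$i" -C "-A=$i.pdf" &    # launch in the background, as in the loop above
    done
    wait    # with no arguments, waits for all of this shell's background jobs
}

# Usage in bash, so that the elapsed time covers the conversions themselves:
#   time convert_all_bg
```

Because the shell now waits for the jobs, their run time should show up in the figures that time reports. It still starts every job at once, so the caveat about hundreds of files applies here too.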

As we all know, Inkscape isn't the lightest svg converter out there: librsvg (based on the cairo library since 2005) has a utility called rsvg-convert which is specially built for this task. Let's give it a spin. For comparison, I'll run the jobs first in series and then in parallel:

$ time (for i in *svg ; do rsvg-convert "$i" -f pdf > "$i.pdf"; done)

real    0m2.976s
user    0m2.772s
sys     0m0.160s


$ time (find . -maxdepth 1 -name "*svg" -print0 | xargs -0 -P4 -I % rsvg-convert % -f pdf -o %.pdf)

real    0m1.341s
user    0m4.460s
sys     0m0.208s
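
One thing to watch with xargs: a ">" written on the command line is handled by the calling shell before xargs even starts, so it cannot name a different output file for each input. For a tool that only writes to standard output, a small sh -c wrapper can do the redirection once per file. The sketch below assumes GNU-style find and xargs (-maxdepth, -0, -r, -P); convert_parallel is a hypothetical helper, and the converter is a parameter so that a stand-in can be used for testing.

```shell
# Run a stdout-writing converter on every SVG, four jobs at a time, sending
# each job's output to its own PDF. The redirection sits inside sh -c, so it
# is evaluated once per file rather than once for the whole pipeline.
# convert_parallel is a hypothetical helper name.
convert_parallel() {
    conv=${1:-rsvg-convert}     # converter to call; overridable for testing
    find . -maxdepth 1 -name '*.svg' -print0 |
        xargs -0 -r -P4 -n1 \
            sh -c '"$1" "$2" -f pdf > "${2%.svg}.pdf"' sh "$conv"
    # inside sh -c: $1 is the converter, $2 is the file name appended by xargs
}

# Usage in bash:
#   time convert_parallel
```

The "${2%.svg}" expansion strips the extension, so this version writes file.pdf rather than file.svg.pdf.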


Even when called once for each file, rsvg-convert (2.98 s) is a little faster than Inkscape called as a single instance (3.30 s), and nearly twice as fast as Inkscape in the equivalent loop (5.51 s).  These svgs were originally made in Inkscape, so librsvg isn't guaranteed to give the same output, but when I compared the results, the pdfs from Inkscape's export and rsvg-convert had almost exactly the same file size, and the only difference I could see in the pdfs was the scaling: in evince, when the two pdfs were the same size on the screen, they had different zoom levels.

librsvg is faster, but I'm going to keep using Inkscape: although I couldn't spot any differences between the output files, I don't want to have to worry about it. This would be different if I were running a server.

Of course, in all my examples except the one with the long command line, the filenames end with .svg.pdf. If I want to fix that, it's a simple matter of typing "rename -f 's/\.svg\.pdf$/.pdf/' *.svg.pdf"
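
The double extension can also be avoided up front with the shell's suffix-stripping parameter expansion. A small sketch, with pdf_name as a made-up helper name:

```shell
# ${1%.svg} is the argument with a trailing ".svg" removed,
# so "logo.svg" becomes "logo" and the result is "logo.pdf".
# pdf_name is a hypothetical helper name.
pdf_name() {
    printf '%s\n' "${1%.svg}.pdf"
}

# Usage in the export loop:
#   for i in *.svg; do inkscape "$i" -C -A="$(pdf_name "$i")"; done
```

Only a trailing ".svg" is removed, so a name like "a.b.svg" becomes "a.b.pdf".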
