Parallel jobs on the command line

I’m only posting this because (A) it’s so ugly, (B) I’m certain it could be done more easily, and (C) it would have taken me no longer to just write a program to do it.

I wanted to get some data from one application into another. The source data was a TSV file, and I needed only the first and third columns. To import it into the second program, the data had to be passed as command-line arguments; the target database is just sqlite, so I’m sure I could have transformed and imported it directly, but it seemed straightforward to just run the command.

And it was. The trivial bash loop was:

<input.tsv while read -r line; do
  sparkle -a "$(echo "$line" | cut -f1)" $(echo "$line" | cut -f3)
done

However, there are 9,000 lines in the input, and sparkle takes about a second to process each command. I thought, “I should be able to parallelize this easily, shouldn’t I?”

Hah!

Ok, so the parallelization isn’t hard; just put an & at the end of the import line. But forking off 9,000 jobs in shell would make my laptop unhappy, so what I needed was a job pool. It turns out there are a couple of ways to do this, and I looked at three of them: GNU parallel, moreutils-parallel, and xargs.
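
To be fair, the pool itself isn’t hard to fake in plain shell. Here’s a minimal sketch of what I mean, assuming bash 4.3+ for wait -n and the same placeholder sparkle command as above:

max_jobs=10
while read -r line; do
  # </dev/null keeps the background job from eating the loop's stdin.
  sparkle -a "$(echo "$line" | cut -f1)" "$(echo "$line" | cut -f3)" </dev/null &
  # Once $max_jobs jobs are in flight, wait for one to exit before forking another.
  while (( $(jobs -rp | wc -l) >= max_jobs )); do
    wait -n
  done
done <input.tsv
wait   # let the last batch drain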

The solution ended up being far more difficult than I expected, mainly because shell expansion happens before the solutions perform argument replacement. So if you, e.g., try something like:

echo 1 2 3 | parallel -j 3 printf "%d %d\n" $(({} * {})) $(({} * 5))

you’ll get an error like:

zsh: bad math expression: illegal character {

If you need to mangle your input before executing a parallelized command on it, things start to get tricky.
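
The usual way around it, if I’m reading GNU parallel’s quoting rules right, is to single-quote the whole command so the calling shell never sees the arithmetic; parallel substitutes {} first and only then hands the string to a shell (I’m also feeding one number per line here, since parallel consumes whole lines):

printf '%s\n' 1 2 3 | parallel -j3 'printf "%d %d\n" $(( {} * {} )) $(( {} * 5 ))'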

BLUF: parallel #

The solution I ended up with used parallel, but it was way more difficult than it should have been.

parallel -j10  --link buku --nostdin -a ::: $(<.surf/bookmarks cut -f1) ::: $(<.surf/bookmarks cut -f3 | tr ' \n' ',\0' | xargs -0 -i echo x,{})

What I’m doing here is using parallel’s argument mixing. The ::: $(<.surf/bookmarks ...) is the first argument list; the second ::: $(...) makes the second list. The --link argument tells parallel to take one item from the first list and one from the second, and run the command with that pair. Without --link, parallel runs the command with every combination of the two lists, which is pretty cool, but not what I was looking for.
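
A tiny demo of the difference, assuming I’m remembering the flag correctly:

parallel --link echo ::: a b ::: 1 2   # runs: echo a 1, echo b 2
parallel echo ::: a b ::: 1 2          # runs all four combinations: a 1, a 2, b 1, b 2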

I had to make sure the second list never had any empty entries, though; frequently, the third column was missing in the input, so the echo x,{} ensures there’s always something for parallel to consume from the second list.
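
That matters because the ::: lists are built with command substitution, which word-splits its output: an empty third column simply vanishes, and every pairing after it shifts by one. A contrived illustration:

printf 'one\n\nthree\n' | wc -l           # 3 input lines
echo $(printf 'one\n\nthree\n') | wc -w   # only 2 items survive word splitting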

It took me obscenely long to figure this out. I might have been able to do something with bash -c instead, splitting the line apart inside the command; something like this may have worked (untested; note that bash -c puts the first following argument into $0, so a placeholder is needed, and parallel’s -q keeps the quoted script in one piece):

parallel -j10 -q bash -c 'buku --nostdin -a "$(echo "$1" | cut -f1)" "$(echo "$1" | cut -f3)"' _ :::: .surf/bookmarks

moreutils-parallel #

I didn’t play with this too much; I got stuck pretty quickly on how to break the arguments apart and focused on parallel instead. In retrospect, though, the bash -c trick might work with it.
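
For the record, and from memory (so treat the exact flags as an assumption): moreutils’ parallel takes the command first and the argument list after a -- separator, with -i substituting each argument for {}:

parallel -j 10 -i echo {} -- one two three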

xargs #

This was the most disappointing. First, the fact that xargs even has a -P option surprised me; but xargs behaved in odd ways, handling input differently depending on whether I used the -P flag or not. It was xargs that led me to the bash -c trick in the first place, but arguments and lines would get swallowed, or multiple lines would inexplicably get joined together. I did spend some time on it, because the fewer tools I have to remember the better, but I never did get it working.

The point I’d start with if I come back to this is (again, I emphasize that it doesn’t work):

<.surf/bookmarks cut -f1,3  --output-delimiter ' ' | tr '\n' '\0' | xargs -0 -L1 -P10 bash -c 'buku --nostdin -a $(echo $1 | cut -f1) $(echo $1 | cut -f3)'
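
If I do revisit it, my guesses at the fixes, all untested: pass the whole line through instead of pre-cutting it, use -n1 rather than -L1 (mixing -L with -0 seems to be where lines got glued together), give bash -c a placeholder argument so the line lands in $1 instead of $0, and quote $1 so the tabs survive. Something like:

<.surf/bookmarks tr '\n' '\0' | xargs -0 -n1 -P10 bash -c 'buku --nostdin -a "$(echo "$1" | cut -f1)" "$(echo "$1" | cut -f3)"' _

The usual cut caveat would still apply: a line with no tabs at all gets printed whole by cut -f3.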

Summary #

The most important thing I learned from this is that I spent way, way too long trying to figure this out. I could have hacked a solution together in Go in under a half-hour, and while there’s value to learning CLI tricks, I have other ways I’d rather spend my time.