I’m only posting it because (A) it’s so ugly, (B) I’m certain it could be done more easily, and (C) it would have taken me no longer to write a program to do this.

I wanted to get some data from one application into another. The source data was a tsv file, and I needed only the first and third columns. To import it into the second program, the data needed to be passed as command-line arguments; the target database is just sqlite, so I'm sure I could have transformed and imported the data directly, but running the command for each row seemed straightforward.

And it was. The trivial zsh loop was:

<input.tsv while read -r line; do
  sparkle -a "$(echo "$line" | cut -f1)" "$(echo "$line" | cut -f3)"
done

However, there are 9,000 lines in the input, and sparkle takes about a second to process each command. I thought, “I should be able to parallelize this easily, shouldn’t I?”

Hah!

Ok, so the parallelization isn’t hard; just put an & at the end of the import line. But forking off 9,000 jobs in shell would make my laptop unhappy, so what I needed was a job pool. It turns out there are several ways to do this, and I looked at three of them:

  • GNU parallel, a perl tool
  • moreutils parallel, a compiled program that comes with the most excellent moreutils.
  • xargs
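For what it’s worth, a job pool can also be hand-rolled in plain bash with wait -n (bash 4.3+). This is only a sketch, not what I ended up using; process is a hypothetical stand-in for the real import command, and the printf stands in for the real input file:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real import command.
process() { echo "imported: $1 $2"; }

max_jobs=10
while IFS=$'\t' read -r col1 _ col3 _; do
  # If the pool is full, block until any one background job exits
  # (wait -n needs bash 4.3+).
  while (( $(jobs -rp | wc -l) >= max_jobs )); do
    wait -n
  done
  process "$col1" "$col3" &
done < <(printf 'url1\ttitle1\ttag1\nurl2\ttitle2\ttag2\n')  # sample input
wait  # drain the remaining jobs
```

The inner while keeps at most max_jobs children alive; the final wait makes sure the script doesn’t exit while imports are still running.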

The solution ended up being far more difficult than I expected, mainly because shell expansion happens before the solutions perform argument replacement. So if you, e.g., try something like:

echo 1 2 3 | parallel -j 3 printf "%d %d\n" $(({} * {})) $(({} * 5))

you’ll get an error like:

zsh: bad math expression: illegal character {

If you need to mangle your input before executing a parallelized command on it, things start to get tricky.
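The usual escape hatch is to single-quote the whole command, so that parallel substitutes {} first and the arithmetic expansion happens later, in the shell that runs the job. A sketch (the numbers are arbitrary; GNU parallel reads one item per input line, hence the tr, and -k just keeps output in input order):

```shell
# Quoting defers expansion: parallel replaces {} first, then the
# job's shell evaluates the $(( )) expressions.
echo 1 2 3 | tr ' ' '\n' |
  parallel -k -j 3 'printf "%d %d\n" $(({} * {})) $(({} * 5))'
```

This helps when the mangling is pure shell; it gets less pleasant once you need to slice fields out of each line, as below.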

BLUF: parallel

The solution I ended up with used parallel, but it was way more difficult than it should have been.

parallel -j10 --link buku --nostdin -a ::: $(<.surf/bookmarks cut -f1) ::: $(<.surf/bookmarks cut -f3 | tr ' \n' ',\0' | xargs -0 -i echo x,{})

What I’m doing here is using parallel’s argument mixing. The ::: $(<.surf/bookmarks ...) is the first argument list; the second ::: $(...) makes the second list. The --link argument tells parallel to take one item from the first, and one item from the second, and run the command with them. Without --link, parallel will run the command with every combination of the two lists — the full cross product — which is pretty cool, but not what I was looking for.
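A toy illustration of the difference (assuming GNU parallel; -k just keeps output in input order):

```shell
parallel -k --link echo ::: a b ::: 1 2
# pairs up the lists:  a 1, b 2
parallel -k echo ::: a b ::: 1 2
# cross product:       a 1, a 2, b 1, b 2
```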

I had to make sure the second list never had any empty lines, though; frequently, the third column was missing in the input, so the echo x,{} ensures there’s always something for parallel to consume from the second list.

It took me an obscenely long time to figure this out. I might have been able to do something with bash -c and split the line up inside the command; something like this may have worked:

parallel -j10 bash -c 'buku --nostdin -a "$(echo "$1" | cut -f1)" "$(echo "$1" | cut -f3)"' _ :::: .surf/bookmarks

moreutils-parallel

I didn’t play with this too much; I got stuck pretty quickly on how to break the arguments apart and focused on parallel. However, in retrospect the bash -c trick might work with this.

xargs

This was the most disappointing. I was surprised that xargs even has a -P option, but it behaved in odd ways, handling input differently depending on whether the -P flag was present. It was xargs that led me to the bash -c trick, but I failed to get it working: arguments and lines would get swallowed, or multiple lines would inexplicably be joined together. I did spend some time on it, because the fewer tools I have to remember the better, but I never got it working.

The point I’d start with if I come back to this (cleaned up a little, but still unverified) is:

<.surf/bookmarks cut -f1,3 | tr '\n' '\0' | xargs -0 -n1 -P10 bash -c 'buku --nostdin -a "$(echo "$1" | cut -f1)" "$(echo "$1" | cut -f2)"' _

The trailing _ fills bash’s $0 so the line actually lands in $1, and quoting "$1" keeps the tab intact for the inner cuts.
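A different shape I’d also try skips bash -c entirely: emit each field NUL-terminated and let xargs take them two at a time. A sketch, untested against the real data (the printf stands in for .surf/bookmarks, and the echo makes it a dry run); note it inherits the empty-field problem — a missing third column would shift the -n2 pairing:

```shell
printf 'url1\ttitle1\ttag1\nurl2\ttitle2\ttag2\n' |
  cut -f1,3 |          # keep columns 1 and 3, tab-separated
  tr '\t\n' '\0\0' |   # one NUL-terminated item per field
  xargs -0 -n2 -P10 echo buku --nostdin -a
```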

Summary

The most important thing I learned from this is that I spent way, way too long trying to figure this out. I could have hacked a solution together in Go in under a half-hour, and while there’s value to learning CLI tricks, I have other ways I’d rather spend my time.