I’m only posting it because (A) it’s so ugly, (B) I’m certain it could be done more easily, and (C) it would have taken me no longer to write a program to do this.
I wanted to get some data from one application into another. The source data was a tsv file, and I needed only the first and third columns. To import it into the second program, the values had to be passed as command-line arguments; the target database is just sqlite, so I’m sure I could have transformed and imported the data directly, but it seemed straightforward to just run the command.
And it was. The trivial bash loop was:
while read -r line; do
    sparkle -a "$(echo "$line" | cut -f1)" $(echo "$line" | cut -f3)
done < input.tsv
However, there are 9,000 lines in the input, and sparkle takes about a second to process each command. I thought, “I should be able to parallelize this easily, shouldn’t I?”
Hah!
Ok, so the parallelization isn’t hard; just put an & at the end of the import line. But forking off 9,000 jobs at once in the shell would make my laptop unhappy, so what I needed was a job pool (a plain-bash version is sketched after the list below). It turns out there are a few ways to get one, and I looked at 3 of them:
- GNU parallel, a perl tool
- moreutils parallel, a compiled program that comes with the most excellent moreutils.
- xargs
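Before any of these, for the record: newer bash can fake a small job pool by itself with wait -n. A minimal sketch, assuming bash 4.3+ and the same sparkle/input.tsv setup as above (untested, so treat it as a starting point):

# cap concurrent jobs at $max using wait -n (bash 4.3+); untested sketch
max=10
while read -r line; do
    # block until a slot frees up once $max jobs are running
    while (( $(jobs -rp | wc -l) >= max )); do
        wait -n
    done
    sparkle -a "$(echo "$line" | cut -f1)" $(echo "$line" | cut -f3) &
done < input.tsv
wait    # let the stragglers finish

It works, but it’s exactly the bookkeeping the tools below are supposed to do for you.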
The solution ended up being far more difficult than I expected, mainly because shell expansion happens before any of these tools perform their argument replacement. So if, for example, you try something like:
echo 1 2 3 | parallel -j 3 printf "%d %d\n" $(({} * {})) $(({} * 5))
you’ll get an error like:
zsh: bad math expression: illegal character {
If you need to mangle your input before executing a parallelized command on it, things start to get tricky.
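The usual escape hatch, for what it’s worth, is to quote the whole command so your shell never sees the $(( )); parallel then does its {} replacement and hands the string to a fresh shell. Something like this (note it also feeds one number per line, since parallel splits input on newlines):

printf '%s\n' 1 2 3 | parallel -j 3 'printf "%d %d\n" $(({} * {})) $(({} * 5))'

which should print 1 5, 4 10, and 9 15, though not necessarily in that order without --keep-order.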
BLUF: parallel
The solution I ended up with used parallel, but it was way more difficult than it should have been.
parallel -j10 --link buku --nostdin -a ::: $(<.surf/bookmarks cut -f1) ::: $(<.surf/bookmarks cut -f3 | tr ' \n' ',\0' | xargs -0 -i echo x,{})
What I’m doing here is using parallel’s argument mixing. The ::: $(<.surf/bookmarks ...) is the first argument list; the second ::: $(...) makes the second list. The --link argument tells parallel to take one item from the first list and one item from the second, and run the command with that pair. Without --link, parallel runs the command with every combination of the two lists, which is pretty cool, but not what I was looking for.
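A toy illustration of the difference (output order may vary):

parallel --link echo ::: a b ::: 1 2    # a 1 / b 2
parallel echo ::: a b ::: 1 2           # a 1 / a 2 / b 1 / b 2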
I had to make sure the second list never had any empty lines, though; frequently, the third column was missing in the input, so the echo x,{} ensures there’s always something for parallel to consume from the second list.
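A quick way to see what the padding does (here with -I{}, the non-deprecated spelling of -i):

printf 'tag1\n\ntag2\n' | tr ' \n' ',\0' | xargs -0 -I{} echo x,{}
# x,tag1
# x,
# x,tag2

The blank line becomes a bare x, instead of disappearing, so the two lists stay the same length.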
It took me obscenely long to figure this out; I may have been able to do something with bash -c and split the line up inside the command. This may have worked (note that bash -c assigns the first argument after the script to $0, hence the _ placeholder, and $1 needs quoting so the tabs survive for cut):

parallel -j10 bash -c 'buku --nostdin -a "$(echo "$1" | cut -f1)" $(echo "$1" | cut -f3)' _ :::: .surf/bookmarks
moreutils-parallel
I didn’t play with this too much; I got stuck pretty quickly on how to break the arguments apart and focused on parallel. However, in retrospect, the bash -c trick might work with it.
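Purely a guess, but since moreutils parallel takes its argument list after a -- (one argument per job), loading the lines into an array first might do it. Untested, and mapfile is a bashism; the _ is the same $0 placeholder as above:

mapfile -t lines < .surf/bookmarks
parallel -j 10 bash -c 'buku --nostdin -a "$(echo "$1" | cut -f1)" $(echo "$1" | cut -f3)' _ -- "${lines[@]}"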
xargs
This was the most disappointing. First, I was surprised that xargs even has a -P option; beyond that, xargs behaved in odd ways, handling input differently depending on whether I used the -P flag or not. It was xargs that led me to the bash -c trick, but I failed to get it working here; arguments and lines would get swallowed, or multiple lines would inexplicably get joined together. I did spend some time on it, because the fewer tools I have to remember the better, but I never did get it working.
If I come back to this, the point I’d start from is this (again, I emphasize that it doesn’t work):
<.surf/bookmarks cut -f1,3 --output-delimiter ' ' | tr '\n' '\0' | xargs -0 -L1 -P10 bash -c 'buku --nostdin -a $(echo $1 | cut -f1) $(echo $1 | cut -f3)'
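If I had to guess at the culprits: -L1 counts newlines, which fights with -0 input; bash -c eats its first argument as $0; and once cut rewrites the delimiter to a space, the inner cut -f calls (which split on tabs) have nothing to cut on. An untested variant that keeps the whole line intact and might behave better:

<.surf/bookmarks tr '\n' '\0' | xargs -0 -n1 -P10 bash -c 'buku --nostdin -a "$(echo "$1" | cut -f1)" $(echo "$1" | cut -f3)' _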
Summary
The most important thing I learned from this is that I spent way, way too long trying to figure this out. I could have hacked a solution together in Go in under a half-hour, and while there’s value to learning CLI tricks, I have other ways I’d rather spend my time.