Ruby vs. Go... FIGHT!

Note this was written 9 years ago, with correspondingly old versions of both Go and Ruby. It’s probably interesting only for PL historians.

Only sorta-kinda.  I’ve been trying to use Go for some tasks for which I’d normally reach for Ruby; the most recent was grabbing some date elements from a large-ish XML file.  I know, I know… Ruby has the best XML library ever built into it, but I’m more aware than anybody of the performance issues it has, so I tend to use it only for very small files.  So when I needed to extract some information out of this fat XML, I thought I’d try Go.

Warning: Micro-Benchmarks ahead!  Keep this in mind as you read. Usually, micro-benchmarks are looked at with skepticism; however, I’m claiming this is useful information because it’s a real-world application, solving a real-world need… even if it is only a very tiny little program in a very large world, after all.

The file wasn’t so big that I was worried about memory use, but since I only needed a leaf from each branch of the tree, I chose the SAX-ish API anyway.  That’s going to make any code more bloated, but it wasn’t too bad; the results are in Version 1.

After I got everything working, I got to wondering about the performance, so I wrote it again in Ruby.  I did not try to use the same logic; instead, I did it in what, for me, is more idiomatic Ruby code.  I did not properly benchmark these, but I did run each program several times and grabbed the middlin’ looking run.  The timings, which may or may not be surprising, look like this:

Version    Language  Total time (s)  CPU usage
Version 1  Go        0.113           96%
Version R  Ruby      0.099           66%

“Hmmm”, I hear you say.  Well, the Go version is actually parsing the XML, and we all know XML for the bloated, expensive-to-parse format that it is.  OTOH, Ruby is doing regexp on every line, and is additionally reading the entire file into memory first and splitting it into an array on line endings.  Hmmm. Well, let’s try a Go version that is a little more like the Ruby version.  That’s Version 2:

Version    Language  Total time (s)  CPU usage
Version 2  Go        0.284           103%

Yowsa!  That’s going in the wrong direction.  Interestingly, it’s now using more than one core of my CPU, so it’s doing something thready underneath.  Maybe it’s because I’m reading the file line-by-line off the disk?  Let’s make it even more like the Ruby version; Version 3:

Version    Language  Total time (s)  CPU usage
Version 3  Go        0.292           105%

Definitely going in the wrong direction. Maybe it’s the sre2 library? Let’s try Version 4:

Version    Language  Total time (s)  CPU usage
Version 4  Go        0.037           89%

Ok, that’s better. Armed with this, I went back to not reading the file entirely into memory in Version 5.  Here is everything together, slowest first:

Version    Language  Total time (s)  CPU usage
Version 3  Go        0.292           105%
Version 2  Go        0.284           103%
Version 1  Go        0.113           96%
Version R  Ruby      0.099           96%
Version 4  Go        0.037           89%
Version 5  Go        0.034           91%

Not a lot of difference; I did see a couple of runs where the CPU use dropped to 89% without affecting the total time, but these are pretty small numbers and we could be seeing actual run time being overwhelmed by the program initialization and what-not.

Anyway, I thought it was interesting.  Ruby is slow as all get-out, but for micro-tasks where most of the heavy lifting is running in native C (regexp in Ruby is native, as is IO), it’s more than capable enough. It’s also worth noticing that this was with ca. 30 lines of Go code, vs. 8 lines of Ruby code.