Erlang + TokyoTyrant / TokyoCabinet / medici

My first real, significant observation about Erlang is a piece of advice for new Erlangers. Repeat this mantra:

Mailboxes are not queues!

Developers more experienced than I am get this wrong. In any case, I’ve been working with Erlang and TokyoCabinet/Tyrant at work (I’ve been using MongoDB, too; it isn’t clear yet which will best fit our needs), and I’ve been using medici by Jim McCoy, a set of libraries for interfacing with TokyoTyrant/Cabinet. Today I found a bug in its principe module, which the following code demonstrates:

-module(principe_test).
-export([runtest/0, loop/0]).

mongod() -> "mongod".  % Where is your mongod executable?

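runtest() ->
    start_mongo(),
    Inserter = spawn_link( fun ?MODULE:loop/0 ),
    % Put an unrelated message in the mailbox ahead of the one loop/0 waits for.
    Inserter ! quit,
    Inserter ! go,
    timer:sleep(500),
    % Stop the server using the PID it wrote at startup.
    {ok, Fd} = file:open("mongod.pid",[read]),
    {ok, Data} = file:read_line( Fd ),
    ok = file:close( Fd ),
    os:cmd(io_lib:format("kill -HUP ~s",[string:strip(Data,both,10)])).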

start_mongo() ->
    filelib:ensure_dir("test/file"),
    Cmd = io_lib:format(
        "~s --dbpath ~s --port 9999 --fork --logpath ~s --pidfilepath ~s", 
        [   mongod(),
            filename:absname("test"), 
            filename:absname("mongod.log"), 
            filename:absname("mongod.pid") ]),
    os:cmd(Cmd),
    timer:sleep(1000).


loop() ->
    {ok, P} = principe:connect( [{port, 9999}] ),
    receive
        go -> principe:put( P, "key", "value" )
    end.

The problem is that principe:put/3 and many other functions use receive to collect the messages that the socket library delivers to the calling process, and they assume those are the only messages they will ever see. This is a bad assumption. In effect, the functions hijack the mailbox of the calling process and treat whatever they find there as socket data. In the test above, the quit message is already waiting in the mailbox when put runs, so it is handed to the response handler as if it had come from the server. Happily, the fix is really small:

diff --git a/src/principe.erl b/src/principe.erl
--- a/src/principe.erl
+++ b/src/principe.erl
@@ -718,7 +718,7 @@
            {error, conn_closed};
         {tcp_error, _, _} -> 
            {error, conn_error};
-        Data -> 
+        {tcp, _, _} = Data ->
            ResponseHandler(Data)
     after ?TIMEOUT -> 
            {error, timeout}
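
The effect of that one-line change can be reproduced without a TokyoTyrant server. What follows is a minimal sketch of my own, not code from principe: the module name mailbox_demo, its functions, the 100 ms timeout, and the fake_socket atom are all made up for illustration, and only the shape of the receive clauses mirrors the patch. Both functions see the same mailbox, with quit sitting in front of a fake socket message.

```erlang
-module(mailbox_demo).
-export([run/0, greedy_wait/0, selective_wait/0]).

%% Unpatched shape: the catch-all clause treats ANY message as socket data.
greedy_wait() ->
    receive
        {tcp_closed, _} -> {error, conn_closed};
        Data -> {handled, Data}
    after 100 -> {error, timeout}
    end.

%% Patched shape: only {tcp, _, _} tuples count as socket data;
%% anything else stays in the mailbox for its real consumer.
selective_wait() ->
    receive
        {tcp_closed, _} -> {error, conn_closed};
        {tcp, _, _} = Data -> {handled, Data}
    after 100 -> {error, timeout}
    end.

%% Empty the calling process's mailbox so each half starts clean.
flush() ->
    receive _ -> flush() after 0 -> ok end.

run() ->
    flush(),
    self() ! quit,
    self() ! {tcp, fake_socket, <<0>>},   % fake_socket is a stand-in atom
    Greedy = greedy_wait(),               % swallows quit
    flush(),
    self() ! quit,
    self() ! {tcp, fake_socket, <<0>>},
    Selective = selective_wait(),         % skips quit, takes the tcp tuple
    %% After the selective version, quit is still waiting in the mailbox.
    Leftover = receive quit -> quit after 0 -> none end,
    {Greedy, Selective, Leftover}.
```

Calling mailbox_demo:run() returns {{handled,quit}, {handled,{tcp,fake_socket,<<0>>}}, quit}. The greedy version hands quit to the handler, while the selective version leaves it in the mailbox for whoever was meant to receive it. Note that run/0 flushes the mailbox of whichever process calls it, so try it from a throwaway shell.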

Incidentally, MongoDB is about 4x as fast as Riak, and TokyoTyrant is about 2x as fast as MongoDB. MongoDB and TokyoTyrant are both faster than opening a file directly on disk for each small record, while Riak is about as fast as direct filesystem access. That’s almost certainly due to the memory caching and bulk writes used by MongoDB and TokyoTyrant, versus the many, many inode create/write/close calls needed when each record gets its own file.