Back in the early 2000s, I was interested in distributed issue tracking. Before that, I used a variety of mostly native, custom tracking systems, and when the industry moved away from native GUI ticket trackers to online trackers, I moved with it. I’ve always found web interfaces clunky and slow, so I frequently used tools that talked to the trackers through an API. The greatest weakness of web-based trackers, though, shows up when you don’t have a reliable internet connection, and replacing the front end never helps with that.

So around 2010, I started using Ditz, a distributed issue tracker. It stored ticket information in plain-text files on the filesystem, which you’d check in with the rest of your source code. What was brilliant about this was the locality: when you moved around in the repository history, the state of the tickets stayed in sync. Ditz even had a web interface you could run to let non-developers submit tickets.

Prior Art

Ditz was not unique. There are a dozen other projects with distributed designs: Lentil, Artemis, BugsEverywhere, b, git-bug, ScmBug, Fossil, SD, ticgit, and ditrack; I looked at all of them. Many are tied to a single version control system; Ditz was not, so it fit my needs best. In fact, a number of articles discussing distributed ticketing had already been written.

Another approach

Ditz, like many of its ilk, keeps ticket information in a separate database, usually stored in the repository but sometimes kept separately under its own VCS. This architecture has pros and cons, but the cons are what eventually led me to look for alternatives.

The problem every ticketing system has to solve is this: how do you integrate the tickets with the source code? Non-distributed systems solve it by being tightly coupled to the VCS server, which reinforces an anti-pattern: the centralization of decentralized version control systems. If you think that hasn’t happened, consider how you’d react if someone on your team suggested hosting the source code somewhere other than GitHub. Much of an issue tracker’s functionality requires hooking into the VCS in an intimate way; you cannot use GitHub’s issue tracker with a repository that isn’t hosted on GitHub.

Most distributed issue trackers have the same challenge, and most solve it in one of two ways:

  1. Put the tickets in the same repo as the source code
  2. Put the tickets in their own VCS, but integrate them with the source code’s VCS tooling

The first is the most common, and the one Ditz uses. The problem is that every ticket change gets committed to the repository, which creates a lot of chatter in the version history. The second is somehow worse, IMO, because it makes assumptions about, and imposes limitations on, the VCS. I use Git when I have to (for collaboration on GitHub projects, or in one case because I took over stewardship of an established project), but I prefer Mercurial. A tool that requires one can’t be used with the other, and that just harshes my chill.

I think that, ironically, the biggest challenge for distributed issue trackers is the desire to be a full-fledged issue tracker, with comments, ticket history, complex workflows, and attachments. Trying to achieve this while staying distributed and integrated with a VCS gets messy pretty quickly. It also gets heavy, which runs counter to an implied feature of these systems: issue tracking without the bulk of a centralized server with attached storage and a sophisticated database. Ditz suffered from this; it wanted to be both simple and feature-complete, and I think those two goals competed with each other.

After using Ditz for a couple of years, I found Lentil, which took a simpler approach: it used code comments as its database. You didn’t add tickets with the tool; you added comments to your code. It was a very limited tracker, but by not trying to do everything a complex tool does (history, comments, attachments, and so on), it achieved a certain elegant simplicity.

Lentil was a liberating approach to ticketing. It was never going to replace an enterprise-ready application like Jira; to work in cross-functional teams that include not only QA but also business analysts and product owners, you need a system those people can use without editing code. But it excelled as a developer tool, because it was an intuitive extension of something developers have been doing for as long as people have been writing code.

Keeping notes in code is a venerable programmer tradition. Open any respectable-sized code base in any language, search for “TODO” or “FIXME”, and you are almost guaranteed to get results. That’s because, when reading code, developers notice things unrelated to what they’re in there for. So as not to lose track, they add a source code comment, like “TODO: this entire block could be replaced with the new system library Sort function,” or “FIXME: this nested loop could be optimized.” Often, when writing code, developers will also leave a note for future development: “TODO: add a filter feature.”

In-source notes retain locality: they are often located at the point of work, or at the point of entry for the work. Take the notes out of the code, and you either have to refer to the code from somewhere else (a reference that will inevitably go stale), or you have to dig around for where to start when you come back to it. TODO comments are warnings and reminders for code browsers: people reading the code see them, whereas in a ticket system the same notes would clog things up and eventually be purged in a spasm of “if it’s older than 2 years, we’re not going to fix it, so delete it.”

Comments are integral to the history of the code. They automatically get the benefits of version control with no extra effort on the part of developers, and they’re not tied to any specific VCS. Git? Mercurial? Subversion, Bazaar, Darcs, CVS, RCS, yea verily, even ClearCase and UCM! It doesn’t matter what the version control system is: if you keep metadata in code comments, it’ll be versioned and available as long as you can access the source code. And when a TODO is completed or a bug is fixed, it’s natural to delete the comment; when the code is committed, the ticket is as gone as the bug. With a good version control system, you can easily search the repository history for the issue, and finding it also tells you when the issue was closed and by which code changes. Systems like Redmine, Trac, Bugzilla, and Jira put a lot of effort into duplicating this linkage between issues and code changes; in systems like Lentil, it comes effortlessly and for free.

Predictably, Lentil had its own issues. It is written in Haskell, which, while a lovely formal language, has exceedingly slow compile times, and the Haskell build environment on Arch Linux is nearly 2 GB. Haskell also has a high cost of entry for developers, so attracting contributors can be difficult. Lentil’s execution was fairly slow, too. And by the time I found it, Lentil was no longer actively developed and lacked features I wanted, so in 2017 I authored a spiritual successor, Legume.

legume

Legume is a distributed issue tracker based on developer code comments such as TODO and FIXME. It has no separate metadata or database, and it understands several programming languages. It’s written in Go and is significantly faster than Lentil.

One of the main motivations for writing Legume was to add features I was missing from Lentil:

  • Performance.
  • Smart interpretation of diffs – it can tell you when a ticket was closed or added by looking at when the comment was removed or added. This works across version control systems, as long as they can generate diffs.
  • The ability to put todos in a separate file, for things that aren’t associated with any specific code.
  • Understanding of the todo.txt format (priorities, due dates, projects, and contexts), so a comment could be:
    // TODO (A) Add a cancel context to the request +client
    
  • Plug-ins for the Vim and Kakoune editors that open jump-to buffers

All in all, Legume understands this todo.txt syntax:

  • (A) priorities
  • 2017-03-20 “created” dates
  • due:2017-03-20 “tagged” dates
  • @contexts
  • +projects
  • key:value tags
  • Multi-line issues (a new issue, a blank line, or a non-comment line ends the current issue)
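The todo.txt pieces listed above can be recognized one word at a time. The following Go sketch shows one way to do it; it is not Legume’s parser. `Item` and `ParseTodoTxt` are hypothetical names, and the sketch assumes the comment marker and keyword (`// TODO`) have already been stripped, handles a single line only (no multi-line issues), accepts a priority and creation date only at the start of the text, and treats any word with exactly one inner colon as a key:value tag.

```go
package main

import (
	"fmt"
	"strings"
)

// Item holds the todo.txt fields found in one comment body (hypothetical type).
type Item struct {
	Priority string            // "A" for "(A)"
	Created  string            // leading YYYY-MM-DD date
	Contexts []string          // @context words, without the "@"
	Projects []string          // +project words, without the "+"
	Tags     map[string]string // key:value words, e.g. due:2017-03-20
	Text     string            // everything else, space-joined
}

// isDate reports whether s has the YYYY-MM-DD shape; it does not validate the calendar.
func isDate(s string) bool {
	if len(s) != 10 || s[4] != '-' || s[7] != '-' {
		return false
	}
	for i := 0; i < len(s); i++ {
		if i != 4 && i != 7 && (s[i] < '0' || s[i] > '9') {
			return false
		}
	}
	return true
}

// ParseTodoTxt splits a single-line comment body into todo.txt fields.
func ParseTodoTxt(body string) Item {
	it := Item{Tags: map[string]string{}}
	words := strings.Fields(body)
	// Priority and creation date are only recognized at the start of the item.
	if len(words) > 0 && len(words[0]) == 3 && words[0][0] == '(' && words[0][2] == ')' &&
		words[0][1] >= 'A' && words[0][1] <= 'Z' {
		it.Priority = words[0][1:2]
		words = words[1:]
	}
	if len(words) > 0 && isDate(words[0]) {
		it.Created = words[0]
		words = words[1:]
	}
	var text []string
	for _, w := range words {
		k, v, hasColon := strings.Cut(w, ":")
		switch {
		case len(w) > 1 && w[0] == '@':
			it.Contexts = append(it.Contexts, w[1:])
		case len(w) > 1 && w[0] == '+':
			it.Projects = append(it.Projects, w[1:])
		case hasColon && k != "" && v != "" && !strings.Contains(v, ":"):
			it.Tags[k] = v // exactly one colon, neither first nor last
		default:
			text = append(text, w)
		}
	}
	it.Text = strings.Join(text, " ")
	return it
}

func main() {
	it := ParseTodoTxt("(A) Add a cancel context to the request +client")
	fmt.Printf("priority=%s projects=%v text=%q\n", it.Priority, it.Projects, it.Text)
}
```

With the comment from the feature list, `main` prints `priority=A projects=[client] text="Add a cancel context to the request"`.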

Legume understands 42 programming and meta languages, and four commonly used keywords that it uses to classify the issue: TODO comments are presented as requirements (REQ), and FIXME comments show as bugs (BUG). Legume also recognizes XXX and BUG comments.

Because of its core philosophy, Legume has some specific non-features:

  • No unique ticket IDs, which can make referencing tickets difficult or cumbersome outside of the system.
  • Very simple tickets. No attachments, no cross-references or dependencies, no robust commenting; history tracking limited to what’s available in the VCS.
  • Limited efficiency. Without caching, collating tickets requires walking directory trees and parsing source files on every run.

The last point may eventually be addressed by caching, but the added complexity may not be worth the effort.

Use the built-in help for options. One workflow is:

$ leg       # to list all todo/fixmes in a project
$ leg 5     # to list the details of item # 5

The diffing feature is something I use frequently:

➜  legume hg diff -r 4:98 | leg  -
  1 REQ    NEW Include the time stamp in the report; removed use old version, add use new
  2 REQ    NEW filter on priority, category, and project; use meta-tags and -t
  3 BUG    NEW catch string lit escapes
  4 REQ CLOSED Support for STDIN
  5 REQ CLOSED Config file
  6 REQ CLOSED Support for unified diff
  7 REQ CLOSED Add test cases.
  8 REQ CLOSED Add test cases.
  9 REQ CLOSED Add test cases.
 10 REQ    NEW Add unit tests for Alias [component:ui]
 11 REQ    NEW Implement & add unit tests for Keywords [component:ui]
 12 REQ CLOSED Add test cases.
 13 REQ CLOSED Add test cases.
 14 REQ CLOSED Add test cases.
 15 REQ    NEW Parsing diffs seems to be broken
 16 REQ CLOSED Add test cases.
 17 INF    NEW refactor parsing to "consume-to-end"

NEW marks a TODO that was added in the changeset; CLOSED marks one that was removed.

The obvious limitations come from how diff reports information: a changed line is reported as a deletion plus an addition, which looks to Legume like a close plus an open. In practice, and over short spans, it works pretty well.

Performance samples

Legume already satisfies my original performance objectives; small projects have sub-second parse times.

Project with 108 files, 14k lines, 15 todos:

leg .  0.04s user 0.03s system 81% cpu 0.090 total

Project with 992 files, 564k lines, 244 todos:

leg .  1.34s user 0.07s system 95% cpu 1.464 total

The Linux 5.7 kernel source tree, 64,309 files, 28,136,537 lines, 9,697 todos:

leg .  49.60s user 0.96s system 99% cpu 50.713 total
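Dividing the line counts by the wall-clock (“total”) times above gives a rough throughput figure. These are single runs, so treat the results as ballpark numbers:

```latex
\begin{aligned}
\frac{564{,}000\ \text{lines}}{1.464\ \text{s}} &\approx 3.9 \times 10^{5}\ \text{lines/s} \\
\frac{28{,}136{,}537\ \text{lines}}{50.713\ \text{s}} &\approx 5.5 \times 10^{5}\ \text{lines/s}
\end{aligned}
```

Throughput holds up on the much larger tree, so the cost grows roughly in proportion to the amount of source scanned; the problem with very large projects is that every invocation pays that cost in full, close to a minute for the kernel.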

Summary

Legume is intended for a narrow problem space. Large projects such as the Linux kernel (and the boundary is surely much lower than that) are better served by a “real” issue tracking system. For large code bases, any tool that parses the entire code base on every invocation will be too slow, even if a tool like Legume didn’t lack most of the features of a sophisticated issue tracking system.

Most of my projects are small, and Legume works well for my use case. I always provide a public ticket tracker where end users can submit issues and requests, but all of my personal ticketing is done with Legume; I find it faster and easier to write a TODO note than to load a web page, navigate it, and fill out a form.

Maybe it’ll be a useful tool for you; I certainly hope so.