Every once in a while I find myself wasting a couple of hours re-trolling through the Wikipedia entry for lightweight markup languages, searching for the perfect markup system.  They are all deficient in some way, and it’s becoming increasingly evident that I’m either going to have to pick the least deficient one and write additional software for transforming it (ugh), or come up with my own (yeauck!).

As a preface, I’ll just say that TeX is a non-starter; any tool that requires TeX (or LaTeX, or any derivative thereof) is not an option. I’ve said it before: all TeX packages require about a gigabyte of disk space, and TeX doesn’t do anything significantly better than what Lout does in 11MB.  TeX is bloated, and I simply don’t want to have anything to do with it; it was fine when I was in college, when the only alternative was MS Word, but there are better alternatives out there now.

reStructuredText has a huge amount of software for converting it into HTML, DocBook, PDF, Slidy, S5, and other formats, and it has very few dependencies.  In particular, rst2pdf can generate PDF using only a small number of Python libraries.  It has an awesome table syntax.  It is easily extended to support dot diagrams, aafigure, and a host of other embedded images.  You can use SVG images in an RST document, and both rst2pdf and rst2html will do the right thing.  However, the markup is unintuitive, verbose, and sometimes awkward, and it lacks primitive support for basic things like underscore, superscript, subscript, and strike-through.  While you can do these mark-ups with roles, they all require verbose syntax, and neither rst2pdf nor rst2html handles whitespace in roles correctly (e.g., :strike:- blah).

AsciiDoc has better syntax, although it too has no primitive support for underscore and strike-out.  Its main failing is that writing AsciiDoc is a lot like programming: the conversion tools are extremely strict about syntax, and will fail outright on the slightest mistake (whereas the RST tools will normally emit a warning and continue on, trying their best).  AsciiDoc converts everything through DocBook, which isn’t itself so bad, but getting something into PDF requires either LaTeX (recommended) or FOP, and the FOP stylesheet needs tweaking to get it to work correctly.  Altogether, AsciiDoc is too fussy.

Markdown has OK syntax, although they stopped too short on the really useful heading syntax (only two levels with the underline style), and it requires extensions to get anything like table and fenced block syntax.  Most damning, there’s no way to convert Markdown to PDF without going through LaTeX.

Creole and Textile have dumb heading syntax – they’re pretty obviously designed to make editing wiki pages easier, not to produce plain ASCII documents that can also be converted into something else.  Creole also has ugly list syntax.  Textile’s heading and blockquote markup is useless for reading the documents in plain text.  That said, Textile’s inline markup is almost perfect; unfortunately, the only conversion tools target HTML, and the heading and blockquote markup really is unusable.

Setext has interesting markup, but is pretty spartan.  No table support, strike-through, or image embedding; and there are only two converters: one for converting to HTML, and one for converting to LaTeX.  So, really only one usable (IMO) converter.  It might be a good basis for extension into a more robust markup.

The Wikipedia page on Texy claims that it’s the “most complex” markup language, but it doesn’t support strike-out or underline.  Otherwise, its syntax is quite interesting, as is its table support.  However, there’s no software for converting to either PDF or Slidy.  The markup for adding style to text is tidy: .{color:red}

txt2tags, like RST, can be converted to a wide variety of output formats, and the syntax is interesting.  It has support for underscore and strike-out, but not superscript or subscript, and the table syntax is not as interesting as Texy’s.  Actually, as I re-review it, it may be the most interesting of the bunch; I’m a bit concerned about the implied dependency on tabs, which you can’t use in text fields on web pages, which would make it inappropriate for a wiki.  I don’t care for the heading markup, but I think I could live with that.

Not mentioned on the Wikipedia entry is AFT (Almost Free Text), which is one of the original markup languages, and has been around for years.  The source documents don’t read quite as cleanly as other markups; it’s pretty obvious that you’re dealing with a markup language.  There’s no support for strike-out, and it requires LaTeX to generate PDF.

I just want something that reads like something you’d have composed in an email prior to HTML multipart MIME messages: simple syntax covering the basic cases.  If I could do it with an old electric typewriter, the markup should be able to handle it as well; the whole purpose of the underscore key was to underscore words, and it baffles me how a modern mark-up language could leave that out.  A rich, but easy, table syntax is a bonus, and being able to create PDF without installing a gigabyte of software (LaTeX) is a must.  HTML is required, but they all generate HTML; Slidy output is more or less required, too.  Additional points are given for support for domain-specific markup, like aafigure (an awesome package for very simple diagrams) and dot.  An easy plug-in for passing blocks of text to an external program, and then being able to embed that output (which will probably be SVG, or some bitmap image format like PNG) into the resulting document, wins special mention.
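
To make the plug-in idea concrete, here is a rough Python sketch of how such a hook might work.  It is not taken from any of the tools above: the `{{{name ... }}}` block syntax, the `COMMANDS` table, and the function names are all made up for illustration.  The only real external piece is `dot -Tsvg`, the Graphviz invocation that reads a graph on standard input and writes SVG to standard output.

```python
import re
import subprocess

# Hypothetical block syntax, invented for this sketch:
#   {{{dot
#   digraph { a -> b }
#   }}}
BLOCK = re.compile(r"^\{\{\{(\w+)\n(.*?)^\}\}\}\n?", re.DOTALL | re.MULTILINE)

# Map a block name to the external command that renders it.
# Only "dot" is listed here; other tools would be added the same way.
COMMANDS = {"dot": ["dot", "-Tsvg"]}


def run_external(name, text):
    """Pipe the block text to the external program and return its output."""
    result = subprocess.run(
        COMMANDS[name], input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


def expand_blocks(source, render=run_external):
    """Replace every {{{name ... }}} block with whatever `render` returns."""
    def replace(match):
        name, body = match.group(1), match.group(2)
        return render(name, body)
    return BLOCK.sub(replace, source)
```

Calling `expand_blocks(text)` on a document would swap each block for the SVG that Graphviz emits, assuming `dot` is installed; the `render` parameter is there so a different back end (or a stub) can be dropped in without touching the parsing.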

No existing markup package does everything that I want, though.  Dare I implement a new one?