The awful state of Javascript documentation tools

Recently, I’ve been looking all over the net for the One True Way to document Javascript programs, and I must say I’m a little disappointed. Long story short, I have the following requirements for documenting Backgrid.js:

  • Automatic API summary based on function signature
  • Able to understand and document common JS idioms such as throw, extending a prototype with an object literal, triggering events… etc. (see the sketch right after this list)
  • Embed code examples in Markdown
  • Gives reasonable error messages
  • Saves parsed documentation info in some metadata JSON file
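
To make that second point concrete, here's a hypothetical snippet (made up for illustration, not actual Backgrid code, and it assumes Backbone and jQuery are loaded) showing the kind of idioms I want a tool to understand without me spelling everything out in tags:

    // A "class" defined by extending a prototype with an object literal,
    // a method that throws, and an event being triggered: all bread and
    // butter in Backbone-style code.
    var Cell = Backbone.View.extend({

      tagName: "td",

      // Renders the cell. Throws TypeError if no model is attached and
      // triggers a "render" event when done.
      render: function () {
        if (!this.model) throw new TypeError("Cell requires a model");
        this.$el.text(this.model.get("name"));
        this.trigger("render", this);
        return this;
      }

    });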

You would think that lots of tools do that (because they claim they can), but here’s the reality:

JSDoc-toolkit, JSDoc3

The JSDoc-related tools are the heavy-weights when it comes to documenting APIs, and by that I mean really heavy in memory usage and very slow. They are forever tied to Rhino and support dozens of tags and aliases, mostly because many tags have to be used in groups to be meaningful, e.g. @param, @type, @default, etc. The syntax for documenting anything is way more verbose than the actual JS code. There is no way to document default values for option hashes, and these tools seem to get utterly confused when multiple @default tags exist in a block. JSDoc3 also has a tendency to silently ignore a lot of questionable things, like a @method tag on a function value assigned inside an object literal when that object literal is not declared to be part of anything using @lends. I also had a lot of trouble getting an indented block of example code separated from another description paragraph below it. Is that even possible in JSDoc? JSDoc3 does seem to be able to generate a JSON data file though, so that’s good.
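
For the record, here is roughly what documenting a tiny constructor and property looks like with JSDoc3-style tags. I'm reconstructing this from memory, so treat it as a sketch rather than canonical syntax; note how the tags travel in packs and how the comments dwarf the code:

    /**
     * A single column definition.
     * @class
     * @constructor
     * @param {Object} options Initialization options.
     */
    function Column(options) {
      /**
       * The column name.
       * @type {string}
       * @default ""
       */
      this.name = options.name || "";
    }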

YUIDoc

This one actually came close to satisfying all my requirements, but sadly there are a number of deal-breakers. First, it doesn’t understand @throw. It has the same problem as JSDoc with example blocks, although using Markdown solves it. The JSDoc-like tags are also incompatible with other JSDoc implementations such as JSDoc3 and the Closure compiler, most notably @class and @constructor. YUIDoc also generates a reasonable JSON data file by default, albeit in a completely different format from anything else.
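
For comparison, a YUIDoc-flavored version of the same thing looks roughly like this (again from memory, so details may be slightly off); @class takes the class name here, which is part of the incompatibility I'm talking about:

    /**
     * A single column definition.
     *
     * @class Column
     * @constructor
     * @param {Object} options Initialization options.
     */
    function Column(options) {
      this.name = options.name || "";
    }

    /**
     * Renders the column header.
     *
     * @method render
     * @return {Column} Returns this.
     */
    Column.prototype.render = function () {
      // A @throws line describing this is exactly what I couldn't get
      // YUIDoc to pick up.
      if (!this.name) throw new TypeError("name is required");
      return this;
    };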

Doctrine

This is also a very promising tool that uses Esprima as the parser. It’s still quite alpha at the moment, but it claims to support Google’s Closure JSDoc syntax. For reasons beyond my understanding, Closure actually has the most elaborate and well-defined syntax for documenting API signatures… except that it never defined what comes after = and complains loudly when something follows it. So I can’t document default values; types are fine though. I would like to see Doctrine developed further into a CLI tool with meaningful error messages.
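
As I understand Closure's annotations, a fully typed signature looks something like the following; the = only marks a parameter as optional, and there's nowhere to say "defaults to 100", which is the part I'm missing:

    /**
     * @param {string} name The column name.
     * @param {number=} opt_width Optional width in pixels (no way to say it
     *     defaults to 100 here).
     * @param {?function(string): boolean} filter Nullable predicate used to
     *     decide whether a value is shown.
     * @return {!Object} A non-null column descriptor.
     */
    function makeColumn(name, opt_width, filter) {
      return {
        name: name,
        width: opt_width || 100,
        filter: filter
      };
    }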

Dox

Awful. Or maybe I’m just stupid, because many people recommend it and I find it the least helpful. Judging from its source code and the JSON output, it seems to understand only 8 JSDoc tags. Oh right, and it generates barely formatted Markdown. Meh.

Panino

Panino is a Cloud9 project and uses pdoc’s syntax for documenting signatures, which is very, very succinct and nice, but it also suffers from the problem of not being able to document default values easily. Having to prepend a * to every line is also not cool. It doesn’t install well under node because the executable doesn’t get installed into prefix/bin. Finally, it insists on linkifying every type symbol, which is also annoying. I mean, where do I point people to for Object? The ECMAScript PDF spec?
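
If I'm reading the pdoc/ndoc conventions correctly, a documented method looks something like this; the Name#method(args) -> ReturnType signature line is the succinct part I like, and the mandatory * gutter is the part I don't:

    /**
     *  Column#render() -> Column
     *
     *  Renders the column header and returns `this`.
     **/
    Column.prototype.render = function () {
      return this;
    };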

JFDoc

Another promising tool that uses Esprima to parse JS source code, but it’s also fairly incomplete and the master branch is very, very broken. Too bad the maintainer is writing a Ph.D. dissertation and doesn’t have time to maintain it at the moment.

Docco

I’m very sympathetic to the cause of Docco & friends after running into so many headaches. Good JS documentation tools are hard to write and get right. Docco is simple and annotates source code well; too bad it doesn’t suit my purpose.

Am I missing anything out there that I should take a look at?

Oh, BTW, I have one request for all the node.js CLI tool authors out there: please catch your errors at the right places and give the end users some reasonable feedback so they can amend their input. Thanks.
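
In case that sounds abstract, here's a minimal sketch of what I mean (the file and tool names are made up): catch the failures you can predict at the CLI entry point and print something the user can act on, instead of letting a raw stack trace bubble up.

    #!/usr/bin/env node
    // doctool.js: a hypothetical CLI entry point

    var fs = require("fs");

    var inputFile = process.argv[2];
    if (!inputFile) {
      console.error("Usage: doctool <file.js>");
      process.exit(1);
    }

    var source;
    try {
      source = fs.readFileSync(inputFile, "utf8");
    } catch (e) {
      // Tell the user which file couldn't be read and why, then stop.
      console.error("doctool: cannot read '" + inputFile + "': " + e.message);
      process.exit(1);
    }

    // ...hand `source` to the parser here, reporting syntax errors with the
    // file name and line number rather than an unhandled exception.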

Updated 2012-11-25

JSDuck

Since this post was published, a few people have told me about the awesome capabilities JSDuck possesses, so I decided to give it a try. At last, I’ve found The One True Way to write documentation for Javascript projects. It does everything I ask for. It is fast, generates useful warnings, and has succinct syntax because it uses Esprima to parse the source code (so it can infer a lot from the code itself instead of needing everything spelled out in tags). It also supports Closure’s type expressions with its own additions. It has syntax support for option hashes, and it can format examples and Markdown fairly well (yay!), best of breed I must say. It also generates a bunch of JSONP metadata files, and the generated HTML doc viewer is very nicely done. Overall, I don’t have much to complain about, except that JSDuck is a Sencha/ExtJS project and comes with a GPLv3 license, which presents a lot of legal headaches as to what terms it places on your documentation and source code, and under what conditions. I’m still very puzzled at this point, so just in case, I’ve re-licensed all the documents under GPLv3 and removed the JS source code that the generated API docs link to.
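
For the record, this is roughly the kind of comment block that won me over, in particular the bracketed name=value form for documenting defaults inside an option hash. I'm writing the tags from memory, so treat the exact syntax as approximate, and the method itself is illustrative rather than the real Backgrid API:

    /**
     * Renders a cell.
     *
     * @param {Object} options Rendering options.
     * @param {String} [options.format="text"] How to format the value.
     * @param {Number} [options.width=100] Column width in pixels.
     * @return {Backgrid.Cell} this
     */
    Backgrid.Cell.prototype.render = function (options) {
      // ...render using options.format and options.width...
      return this;
    };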

The Conservation of Hairiness

Recently, or perhaps for as long as there have been technical decisions to be made, I've been hearing endless arguments about this versus that that lead nowhere at all: Emacs vs Vi, PL X vs PL Y, NoSQL vs RDBMS, Django ORM vs SQLAlchemy... etc. I hope I can shed some light on what's fundamentally going on here. I haven't blogged in a long time; this is the first in what I hope will be a long line of semi-regular posts documenting my thoughts, philosophy and lessons from programming, and anything barely related.

Back in the days when I was a grad student at Tufts University, I attended a guest lecture by then Project Darkstar tech lead Jim Waldo (now Harvard's CTO). At the end of the lecture, when someone asked him why he made the technical decisions he did in Jini, he introduced the term: the Conservation of Hairiness. Lightning bolts struck. He didn't spend too much time explaining what he meant, but here's how I came to understand the term later (adjusted by my own interpretation, so please don't attribute all of it to Dr. Waldo):

For any given problem, there's usually more than one subproblem

Suppose you are to design a templating library. There are a number of smaller problems you need to solve: you need to be able to load the template files from somewhere, parse them, allow the users to do some manipulation to fill in the contents, and return the results in some representation. You can obviously break these subproblems into sub-subproblems; for example, you may want to be able to load the templates from the file system or from memory. You get my point. I will postulate that for any given problem that takes some input and produces some output, there's at least one subproblem that is either the problem itself or a set of smaller problems.
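
To make the decomposition concrete, here's one way (of many) the subproblems could fall out as separate, composable pieces; the names and the {{name}} placeholder syntax are made up for illustration:

    var fs = require("fs");

    // Subproblem: loading. Each loader solves the same sub-subproblem for a
    // different source.
    function loadFromFile(path) { return fs.readFileSync(path, "utf8"); }
    function loadFromMemory(store, name) { return store[name]; }

    // Subproblem: parsing. Turn raw text into something renderable; here,
    // the world's dumbest parser, which just splits on {{name}} placeholders.
    function parse(text) {
      return text.split(/(\{\{\s*\w+\s*\}\})/);
    }

    // Subproblem: filling in the contents and returning a representation.
    function render(tokens, context) {
      return tokens.map(function (token) {
        var m = token.match(/^\{\{\s*(\w+)\s*\}\}$/);
        return m ? context[m[1]] : token;
      }).join("");
    }

    // The whole solution is just the composition of the pieces:
    // render(parse(loadFromMemory({ t: "Hello, {{name}}!" }, "t")), { name: "world" })
    // => "Hello, world!"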

The solution of a problem is the sum of the solutions of its subproblems

Naturally, if you manage to solve all the subproblems, you have a complete solution to the problem as a whole. I assume you have heard of the Single Responsibility Principle ("do one thing only and do it well") and the Rule of Composition, so I'm not going to dwell on this. What I am going to tell you, though, is that there are solutions out there that attempt to solve more than one problem at the same time, and there are solutions out there whose subproblem solutions are pretty much the same, just arranged in different orders.

[Image: a Swiss Army knife]

If you have a different set of subproblems, you have a different problem, which requires a different solution

This is probably the most subtle point in the entire post, so I'm going to explain a bit more. Let's say you are to build your startup company's website. You have limited time and money, and you don't really know the constraints, so you should probably just go with the most popular technologies out there, right? Wrong! The correct approach is always going to be to understand the problem first, even if you have to guesstimate some of the numbers. How many users are going to be visiting your site? What's an acceptable response time? Does your data have a schema or is it free-form? Is it going to have lots of dynamic interactions, or is it mostly static with some dynamic content? These are all proper subproblems. Once you understand the constraints, then you can go out and choose a technology stack that enables that solution.

If your requirement is simply to be able to serve 1000 requests at a time, with less than 3 seconds of response time, with a predefined data model, and with pages that are mostly static with some dynamic content, spending days and weeks evaluating Django vs Pyramid or Django ORM vs SQLAlchemy is probably going to make very little difference, as they all solve the same problem with slightly different arrangements of complexity spread across basically the same layers.

However, if your data is free-form, using a schema-less datastore is probably a more attractive solution. Picking a framework that will enable you to use a different data access layer easily is probably a better idea. In this case, you may opt for Flask + MongoDB. By the same token, evaluating whether CherryPy + MongoDB or Bottle + Riak is better is going to make very little difference unless you have more nuanced requirements.

If you don't have such requirements, or don't know them yet, it's probably better to choose a stack that doesn't prevent you from solving other subproblems later. Fortunately, most tools out there, especially those that adhere to the "do one thing only and do it well" principle, fit this description.

The complexity of any given problem is a constant

I would argue that how you choose your technology stack depends largely on how well you understand the problem at hand. In reality, most of us, most of the time, simply don't know enough about the problem to completely enumerate all the subproblems, so sitting around arguing about the same nuanced points day after day over two different tools that solve pretty much the same problems with different tradeoffs is going to make very little difference. In the end, you still have to take your comfort zone into account: if you've found the perfect solution but it requires months of learning the pieces, is it worth it?

This post has gone on long enough. I hope it gives you some perspective on how to pick the right tool for the job. Now please feel free to comment!