Terseness at the Expense of Readability in JavaScript

You finish your function and stare back at it. A thought occurs: “could this not be turned into a single statement?” Your excitement builds as you fantasise about a giant chain of map/reduce calls that will accomplish the same thing. You know it’ll be satisfying to reduce those ten lines to just four. You also know that it’ll likely be harder to read afterwards.

Few programmers are immune to this.

I was once supervising a lab session for students attempting to write their own implementation of parallel coordinates, the go-to multidimensional visualisation technique, using the quite wonderful D3.js library. The plot itself is often referred to as a parallel coordinates plot (with the slightly unfortunate initialism PCP).

Parallel Coordinates, from Wikipedia

Many of these students had never worked with JavaScript before, and those who had were at varying levels of intermediate familiarity. Despite this, there were no major misunderstandings of the strange JavaScript quirks that tend to bite newcomers. Understanding parallel coordinates plots was easy for them too, because they are all smart.

In the lectures, we encouraged students wishing to begin a D3.js visualisation to find something similar to their end goal and use it as a guide for how that goal might be achieved in a D3-like way. The important thing with this method, of course, is not to blindly copy and paste code – external snippets should be understood in terms of their semantics and end goal before being reimplemented.

Look, I Found Some Code to do it!

The true challenge in implementing the plots lay in the dataset we provided: it contains both categorical and numerical dimensions. Most online examples of PCPs deal only with numerical dimensions. Therefore, it was not quite as simple as the standard workflow of using d3.extent() to define the domain of each dimension. Somewhere in there, a conditional was required to select the appropriate scale type (i.e., linear or ordinal) for the current dimension.
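That conditional can be sketched in plain JavaScript, independent of D3. This is a minimal sketch under two assumptions. First, a dimension counts as numerical only if every one of its values converts to a finite number. Second, the rows (made up here, not the lab dataset) arrive as strings, the way d3.csv() delivers them. Instead of building real D3 scales, the hypothetical helper describeScale returns a plain descriptor naming the scale type and its domain.

```javascript
// Made-up rows shaped like d3.csv() output: every value is a string.
const rows = [
  { name: "car A", origin: "Europe", mpg: "18" },
  { name: "car B", origin: "Japan", mpg: "30" },
  { name: "car C", origin: "Europe", mpg: "24" },
];

// Assumption: a dimension is numerical if every value is non-blank and
// converts to a finite number.
function isNumerical(rows, key) {
  return rows.every(
    (row) => row[key].trim() !== "" && Number.isFinite(Number(row[key]))
  );
}

// Hypothetical helper: choose the scale type per dimension.
// Numerical -> linear over [min, max]; categorical -> ordinal over the
// distinct values, in the order they first appear.
function describeScale(rows, key) {
  if (isNumerical(rows, key)) {
    const values = rows.map((row) => Number(row[key]));
    return { type: "linear", domain: [Math.min(...values), Math.max(...values)] };
  }
  return { type: "ordinal", domain: [...new Set(rows.map((row) => row[key]))] };
}

const scales = {};
for (const key of Object.keys(rows[0])) {
  scales[key] = describeScale(rows, key);
}

console.log(scales.mpg);    // { type: 'linear', domain: [ 18, 30 ] }
console.log(scales.origin); // { type: 'ordinal', domain: [ 'Europe', 'Japan' ] }
```

In a D3 v3 version, each descriptor would instead become a d3.scale.linear() or d3.scale.ordinal() call with the corresponding domain.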

So far, so good. During the class, a student asked me about a section of code found on bl.ocks that seemed to provide a good starting point for defining the scales. The code is from Mike Bostock’s example of a PCP created on a dataset representing attributes of cars such as MPG, horsepower, and so on. Here it is:

Here, x is an ordinal scale function for the x-axis. The x-axis will lay out the vertical axes from left to right using rangePoints as the output range and the names of the dimensions as the input domain.
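To experiment with this pattern without loading D3, the following plain-JavaScript sketch mirrors the single-expression structure that the rest of this post picks apart. Everything D3-specific is replaced by hypothetical stand-ins. linearScale() imitates a chainable d3.scale.linear(), extent() imitates d3.extent(), Object.keys() takes the place of d3.keys(), x merely records its domain, and the rows and plot height are made up.

```javascript
// Hypothetical stand-in for d3.scale.linear(): a chainable linear scale.
function linearScale() {
  let domain = [0, 1];
  let range = [0, 1];
  function scale(v) {
    const t = (v - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  }
  scale.domain = function (d) { if (d === undefined) return domain; domain = d; return scale; };
  scale.range = function (r) { if (r === undefined) return range; range = r; return scale; };
  return scale;
}

// Hypothetical stand-in for d3.extent(array, accessor): returns [min, max].
function extent(array, accessor) {
  const values = array.map(accessor);
  return [Math.min(...values), Math.max(...values)];
}

// Hypothetical stand-in for the ordinal x scale: it only remembers its domain.
const x = {
  _domain: [],
  domain(d) { if (d === undefined) return this._domain; this._domain = d; return this; },
};

// Made-up rows; values are strings, as d3.csv() would deliver them.
const cars = [
  { name: "car A", mpg: "18", cylinders: "8" },
  { name: "car B", mpg: "30", cylinders: "4" },
];
const height = 500; // hypothetical plot height in pixels
const y = {};
let dimensions;

// The terse version: one expression filters the keys, builds a y scale for
// each kept key as a side effect, assigns `dimensions`, and sets x's domain.
x.domain(dimensions = Object.keys(cars[0]).filter(function (d) {
  return d != "name" && (y[d] = linearScale()
      .domain(extent(cars, function (p) { return +p[d]; }))
      .range([height, 0]));
}));

console.log(dimensions); // [ 'mpg', 'cylinders' ]
console.log(y.mpg(30));  // 0 (the largest mpg maps to the top of the plot)
```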

The student was struggling to understand the above block of code, which, based on the comment at the top, promised to extract a list of dimensions and create a scale for each. However, the student did not see how the code accomplished this goal.

I could really empathise. The code, while compact (and quite beautiful, in my opinion, as most of Mike Bostock’s code is), is difficult to follow even for those familiar with JavaScript. It mixes multiple concerns, nests confusingly, and performs operations where those operations simply should not be performed. As an exercise in terseness, it’s a wonderful piece of engineering. Its readability however can be improved.

What is the key to understanding the code, and how could we rewrite it to be easier to understand?

What’s Going On in the Snippet?

There are a number of things going on in this block. Probably too many things, actually. Let’s tease them out!

Firstly, cars is a dataset read in by d3.csv() which smartly provides a nice array of dictionary objects from a CSV file. The first row is interpreted as the header row to create key names for the dictionaries. cars looks like this:

[Screenshot: the contents of the cars array]
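The shape of that output is easy to imitate. The sketch below is a deliberately naive parser (the hypothetical parseCsv, which handles no quoted fields or escaping), not d3.csv()'s actual implementation, run on made-up rows. It shows the two properties that matter later: the header row supplies the keys, and every value is still a string, so numbers have to be converted explicitly before computing extents.

```javascript
// Deliberately naive CSV parsing: assumes no quoted fields or embedded commas.
// It only imitates the shape of d3.csv()'s output.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split("\n");
  const keys = header.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    const row = {};
    keys.forEach((key, i) => { row[key] = cells[i]; });
    return row;
  });
}

// Made-up sample data, not the real cars dataset.
const cars = parseCsv(`name,mpg,cylinders
car A,18,8
car B,30,4`);

console.log(cars[0]);            // { name: 'car A', mpg: '18', cylinders: '8' }
console.log(typeof cars[0].mpg); // string
```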

The first step of creating our plot is to figure out what dimensions we have. In the code, this is accomplished by calling d3.keys on the first item in the cars array which returns an array of strings representing the key names (note: it doesn’t have to be the first – it’s arbitrary, since all dictionaries in the array contain the same keys). Here’s the code again, in full:
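The key-extraction step can be tried without D3. For plain row objects like these, Object.keys() should return the same list as d3.keys(). This sketch uses made-up rows and also checks the claim that the choice of row is arbitrary.

```javascript
// Made-up rows in the shape d3.csv() produces.
const cars = [
  { name: "car A", mpg: "18", cylinders: "8" },
  { name: "car B", mpg: "30", cylinders: "4" },
];

// The key names of the first row become the candidate dimensions.
const keys = Object.keys(cars[0]);
console.log(keys); // [ 'name', 'mpg', 'cylinders' ]

// Every row has the same keys, so reading them from cars[0] loses nothing.
const sameKeysEverywhere = cars.every(
  (row) => Object.keys(row).join(",") === keys.join(",")
);
console.log(sameKeysEverywhere); // true
```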

Filtering out the categorical dimension

Next, on this key names array, filter() is called to filter out the “name” dimension of the dataset so that only numerical dimensions are included in the visualisation. Fair enough. We can see that happening by the use of return d != "name" inside the filter which returns everything except the “name” item. This is the first of two boolean statements joined with a logical AND operator &&.
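Taken on its own, the first half of the callback is an ordinary filter. In this small sketch the key list is made up. Note that filter() returns a new array and leaves its input untouched.

```javascript
// Key names as they might come back from Object.keys(cars[0]) (made up).
const keys = ["name", "mpg", "cylinders"];

// A filter that only filters: keep every key except the categorical "name".
const numericalKeys = keys.filter(function (d) {
  return d != "name";
});

console.log(numericalKeys); // [ 'mpg', 'cylinders' ]
console.log(keys);          // [ 'name', 'mpg', 'cylinders' ] (input unchanged)
```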

Coercing an assignment to a boolean

The second boolean statement, after the &&, will in practice always be truthy, and is used to create a D3 scale for the dimension. If the current dimension fails the test d != "name", then the JavaScript engine does not evaluate the second statement at all, since false AND anything is always false (this is known as short-circuit evaluation). Therefore, the second statement is executed only for numerical dimensions. It creates the D3 scale for the dimension and inserts the scale into a dictionary y for later:
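Short-circuiting is easy to observe by recording which keys ever reach the right-hand side of &&. In this sketch, makeScale is a hypothetical factory standing in for d3.scale.linear(). It records each call and returns a fresh object.

```javascript
const y = {};
const created = []; // keys that reached the right-hand side of &&

// Hypothetical stand-in for d3.scale.linear(): records the call and
// returns a new object, which (like any object) is truthy.
function makeScale(key) {
  created.push(key);
  return { key };
}

const kept = ["name", "mpg", "cylinders"].filter(function (d) {
  return d != "name" && (y[d] = makeScale(d));
});

console.log(kept);           // [ 'mpg', 'cylinders' ]
console.log(created);        // [ 'mpg', 'cylinders' ] ("name" never got that far)
console.log(Object.keys(y)); // [ 'mpg', 'cylinders' ]
```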

At this point, you might be wondering “but how does this expression evaluate to a boolean in order to complete the logical AND?”. The value of the expression above is the value inserted into the dictionary; that is, the D3 scale object. If somehow D3 had a problem creating the scale and returned undefined, we’d end up with undefined as the result.
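This can be checked directly. In the sketch below, scale is a hypothetical stand-in for a D3 scale (D3 scales are functions). Both the assignment and the whole && expression evaluate to that same object, not to true.

```javascript
const y = {};
// Hypothetical stand-in for a D3 scale.
const scale = function standInScale(v) { return v; };

const d = "mpg";

// An assignment expression evaluates to the value that was assigned...
const assigned = (y[d] = scale);
console.log(assigned === scale); // true

// ...so the whole && expression evaluates to the scale object itself.
const whole = d != "name" && (y[d] = scale);
console.log(whole === scale); // true

// Had the right-hand side been undefined, the expression would be undefined too.
const failed = d != "name" && (y[d] = undefined);
console.log(failed); // undefined
```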

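In fact, the logical AND never converts this result into a boolean. When its left operand is truthy, && simply hands back its right operand, so the callback returns the scale object itself. It is filter() that then coerces the callback’s return value into a boolean to decide whether to keep the element. Coercing an object into a boolean always results in true, whereas values such as 0, the empty string, null, NaN, and undefined evaluate to false: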

[Screenshot: examples of values coerced to booleans]
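The same rules can be checked with Boolean(), which applies the standard truthiness conversion. A one-line filter() then shows them applied to callback results. The values below are illustrative.

```javascript
console.log(Boolean({}));             // true (any object)
console.log(Boolean(function () {})); // true (functions, including D3 scales)
console.log(Boolean(0));              // false
console.log(Boolean(""));             // false
console.log(Boolean(null));           // false
console.log(Boolean(undefined));      // false
console.log(Boolean(NaN));            // false

// filter() keeps an element whenever its callback returns a truthy value,
// whatever the type of that value.
const kept = [1, 2, 3].filter((n) => (n === 2 ? undefined : { n }));
console.log(kept); // [ 1, 3 ]
```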

So, provided that it’s not the dimension “name” and that D3 creates the scale without somehow returning undefined, this dimension will feature in our array.

Finishing off: Setting the domain

So let’s look at the overall code again. We now understand what the expression inside filter() is doing:

If the dimension is “name”, then false is returned immediately. Otherwise we evaluate the second statement which creates the D3 scale for the axis which we now know isn’t the “name” axis.

The result of filter(), despite everything going on inside it, is every non-“name” dimension, which is assigned to a variable dimensions and finally passed as the domain of the D3 scale x.
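The order of evaluation can be made visible with a logging stand-in. In this sketch x is hypothetical and only records when domain() is called. The key list is made up. Arguments are evaluated before a call, so the filter, including any side effects inside it, finishes before x.domain() runs. dimensions then points at the very array that was passed in.

```javascript
const log = []; // records the order in which things happen

// Hypothetical stand-in for the x scale: logs when domain() is called.
const x = {
  _domain: null,
  domain(d) { log.push("x.domain called"); this._domain = d; return this; },
};

let dimensions;
x.domain(dimensions = ["name", "mpg", "cylinders"].filter(function (d) {
  log.push("filter visits " + d);
  return d != "name";
}));

console.log(log.join(" -> "));
// filter visits name -> filter visits mpg -> filter visits cylinders -> x.domain called
console.log(x._domain === dimensions); // true: one array, two references
```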

So What are the Problems?

Phew! It’s straightforward to comprehend once you break it down into its individual components, isn’t it?

However, the fact that such a lengthy explanation was required for such a simple sequence of steps is perhaps an indication that this code should not be written this way. Let’s break down the problems with it.

Filter() shouldn’t mutate external state

The purpose of filter() is to simply create a new array based on the boolean-returning function that you provide. In our code snippet, we are mutating external state y within filter. This is confusing because people expect filter to just filter. They don’t expect it to have the side effect of mutating some state outside of the scope of the anonymous function. For that, we have functions like forEach which are intended to perform arbitrary operations on each element of an array.

Fundamentally, we’re trying to do too much in a place that wasn’t designed for it. The creation of y inside the filter confuses the reader’s understanding of where they are in the heavily-nested call.

Result of an assignment expression should be used sparingly

The second problem is the reliance upon the result of assignment in JavaScript. The result of assignment to an existing variable is the value being assigned, and is well defined in the JavaScript specification. No problems there.

The fact that it is well-defined, however, doesn’t make it acceptable in code intended to instruct, as it is not quite common knowledge. Even programmers who regularly write chained statements such as var a = b = c = 5; may not make the connection to something like the latter half of d != "name" && (y[d] = d3.scale.linear()), because there the expression appears in a boolean context.
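The chained form works precisely because each assignment yields the assigned value. The variables in this small sketch are declared up front. Written as var a = b = c = 5; on its own, the statement declares only a, leaving b and c as implicit globals (or a ReferenceError in strict mode).

```javascript
// Declared first, so no implicit globals are created.
let a, b, c;

// Parsed as a = (b = (c = 5)): each assignment passes its value outward.
a = b = c = 5;
console.log(a, b, c); // 5 5 5

// An assignment can appear wherever a value is expected, e.g. as an argument.
console.log((c = 7)); // 7
console.log(c);       // 7
```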

Doing too much in a function parameter

The assignment to dimensions also uses the result of a variable assignment:

When you look at the overall expression, it’s quite hard to figure out what is happening. This is not just because of the issues we’ve discussed above, but because everything is wrapped in the call to x.domain(), whose argument is itself the result of an assignment.

Not only is it hard to see that domain() actually takes just one value, with everything required to create that value done inline; we’re also mutating external state twice, in the form of y and dimensions.

How Could we Make it Clearer?

Lots of readers will completely disagree with my analysis. The code would be reasonably clear to experienced JavaScript developers, is not designed to be industrial-strength, and does benefit slightly from being terse as you can fit more of this boilerplate-like setup code onto the page.

However, as a teaching aid, I think we can justify making it more accessible to newcomers. Let’s do that now.

We can break the code block into two concerns. First, we need to determine which dimensions in our dataset to visualise, and save the dimension array into dimensions and pass it to x.domain():

These two lines accomplish this clearly. filter() now does only one job (providing a boolean to filter our keys array), and the assignment to dimensions and its passing to x.domain() are clearly delineated.
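A plain-JavaScript sketch of the same two steps, using made-up rows and a hypothetical x stand-in that only stores its domain (Object.keys() again stands in for d3.keys()):

```javascript
// Made-up rows in the shape d3.csv() produces.
const cars = [
  { name: "car A", mpg: "18", cylinders: "8" },
  { name: "car B", mpg: "30", cylinders: "4" },
];

// Hypothetical stand-in for the ordinal x scale: it only stores its domain.
const x = {
  _domain: [],
  domain(d) { if (d === undefined) return this._domain; this._domain = d; return this; },
};

// Step 1: decide which dimensions to visualise. filter() only filters.
const dimensions = Object.keys(cars[0]).filter(function (d) {
  return d != "name";
});

// Step 2: use them as the domain of the x scale.
x.domain(dimensions);

console.log(x.domain()); // [ 'mpg', 'cylinders' ]
```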

Finally, now that we know the list of dimensions, we need to create a y-scale for each dimension, and save each into a dictionary y:

If JavaScript supported dictionary comprehensions, à la Python, it’d be even clearer. In JavaScript, we can use Array’s forEach(), which is more suitable for mutating external state since it returns undefined rather than producing a new array.
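A plain-JavaScript sketch of that forEach() step follows. linearScale() and extent() are again hypothetical stand-ins for d3.scale.linear() and d3.extent(), and the rows and plot height are made up. For comparison, it also builds the same dictionary with Object.fromEntries() (available since ES2019), which is about as close as JavaScript gets to a dictionary comprehension.

```javascript
// Hypothetical stand-ins for d3.scale.linear() and d3.extent().
function linearScale() {
  let domain = [0, 1], range = [0, 1];
  const scale = (v) => range[0] + ((v - domain[0]) / (domain[1] - domain[0])) * (range[1] - range[0]);
  scale.domain = (d) => { domain = d; return scale; };
  scale.range = (r) => { range = r; return scale; };
  return scale;
}
function extent(array, accessor) {
  const values = array.map(accessor);
  return [Math.min(...values), Math.max(...values)];
}

const cars = [
  { name: "car A", mpg: "18", cylinders: "8" },
  { name: "car B", mpg: "30", cylinders: "4" },
];
const height = 500; // hypothetical plot height
const dimensions = ["mpg", "cylinders"];

// forEach() returns undefined, so its only purpose here is the side effect of filling y.
const y = {};
dimensions.forEach(function (d) {
  y[d] = linearScale()
    .domain(extent(cars, function (p) { return +p[d]; }))
    .range([height, 0]);
});

// Alternative: build the dictionary as a value, comprehension-style.
const yAlt = Object.fromEntries(
  dimensions.map((d) => [d, linearScale().domain(extent(cars, (p) => +p[d])).range([height, 0])])
);

console.log(y.cylinders(4), yAlt.cylinders(4)); // 500 500
```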

We can go on to make this block even clearer still by separating out the concern of computing the extent:

Making the code more implicitly clear in this way has the secondary benefit of allowing us to be more explicitly clear because it provides more opportunities for writing meaningful comments about each distinct operation being performed.
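As a sketch of what that separation might look like, here is the loop with the extent computed in its own step and each step commented. The stand-ins, made-up rows, plot height and the variable name dimensionExtent are all hypothetical.

```javascript
// Hypothetical stand-ins for d3.scale.linear() and d3.extent().
function linearScale() {
  let domain = [0, 1], range = [0, 1];
  const scale = (v) => range[0] + ((v - domain[0]) / (domain[1] - domain[0])) * (range[1] - range[0]);
  scale.domain = (d) => { domain = d; return scale; };
  scale.range = (r) => { range = r; return scale; };
  return scale;
}
function extent(array, accessor) {
  const values = array.map(accessor);
  return [Math.min(...values), Math.max(...values)];
}

const cars = [
  { name: "car A", mpg: "18", cylinders: "8" },
  { name: "car B", mpg: "30", cylinders: "4" },
];
const height = 500; // hypothetical plot height
const dimensions = ["mpg", "cylinders"];
const y = {};

dimensions.forEach(function (d) {
  // Find the smallest and largest value of this dimension
  // (the unary + converts the CSV strings to numbers).
  const dimensionExtent = extent(cars, function (p) { return +p[d]; });

  // Map that span of values onto the plot height, larger values towards
  // the top (SVG's y axis grows downwards).
  y[d] = linearScale().domain(dimensionExtent).range([height, 0]);
});

console.log(y.mpg(24)); // 250
```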

Wrapping Up

So, uh, this was a lot longer than I had expected, and probably far too in-depth for a relatively simple set of operations. The issue of lines-of-code (LoC) versus readability has been discussed to death, and a common question for beginners is “how many lines should my function be?”.

The right question should be “how can I make the steps I am trying to accomplish read like a sequence of sentences?”. In this respect, the original code does not quite read like a sequence of sentences (e.g., first do this, then do that, then take the result…) because there’s simply too much going on, in confusing orders of evaluation, and in places where those operations should not even be happening.

By breaking apart the individual steps to achieve the end result, the code can be read from start to finish in an ordered, imperative manner, and also understood in terms of how it will be evaluated at runtime. In this way, the code is far less daunting to newcomers and its intent can be quickly understood by almost any reasonably competent programmer.
