12/15/2011

Using the Async Module for Rational Flow Control

Joe McCann

TL;DR

The bottom line: the async module provides a way to write asynchronous code in a clean, readable and manageable fashion. It should be yet another tool in your kit for node programming.

Flow

Flow control is a hotly contested debate when it comes to node.js development. Many people from more traditional web development backgrounds have difficulty overcoming what can end up as "callback hell" or just plain ole spaghetti code. However, there is a module that I use for my blogging app (this blog) that makes waiting on other asynchronous things to complete much easier to manage. The module is called async.
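For the uninitiated, here is a contrived sketch of what that nesting looks like (the file names are made up): each step can only run inside the previous step's callback, and the indentation marches ever rightward.

var fs = require('fs')

fs.readFile('one.txt', 'utf8', function (err, one) {
  fs.readFile('two.txt', 'utf8', function (err, two) {
    fs.readFile('three.txt', 'utf8', function (err, three) {
      // three levels deep, and we haven't even handled the errors yet
    })
  })
})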

Async is a feature-rich node.js module that lets you do many fancy things, like executing asynchronous code in parallel or in series, and it even has convenience methods like forEach and map.
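For illustration, here is a rough sketch of a few of those helpers (the file names are made up; the callbacks follow node's standard (err, result) convention):

var async = require('async')
  , fs = require('fs')

// Run the reads one after another; results come back in order.
async.series([
    function (callback) { fs.readFile('a.txt', 'utf8', callback) }
  , function (callback) { fs.readFile('b.txt', 'utf8', callback) }
], function (err, results) {
  // results[0] is a.txt's contents, results[1] is b.txt's
})

// Or kick off both reads at once and wait for the pair to finish.
async.parallel([
    function (callback) { fs.readFile('a.txt', 'utf8', callback) }
  , function (callback) { fs.readFile('b.txt', 'utf8', callback) }
], function (err, results) {
  // same shape of results, but the reads overlapped
})

// forEach fires the iterator for every item and waits for all of them.
async.forEach(['a.txt', 'b.txt'], function (file, callback) {
  fs.readFile(file, 'utf8', callback)
}, function (err) {
  // called once, after every read has finished (or on the first error)
})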

Typically, flow control libraries show examples (if any) of how you can write to a file in series, but I tend to never find examples that are similar to what I'm trying to do in a real application. So I figured I'd share my experience of how I am using async in a real-world sense.

Dynamic Articles Listing
--

If you navigate to http://subprint.com/blog you'll notice a list of all of my articles with the latest article at the top. This list is generated dynamically based on the markdown files in my posts directory (I'm using dillinger.io to write the markdown files).

The following function is called in order to make this happen:

var fs = require('fs')
  , ejs = require('ejs')
  , async = require('async')  // npm install async

// watcher, scraper and blogPosts are app-level objects defined
// elsewhere in the blog app.

// This function grabs the list of post ids and creates a block of
// HTML that is stashed in the blogPosts object.
function stashBlogPostData(){

  // Grab the markdown ID objects.  One of them looks like this:
  /*
    {
      href: '/blog/using-the-async-module-for-rational-flow-control'
    , title: "Using The Async Module For Rational Flow Control"
    , teaser: "Flow control is a hotly contested debate..." (shortened for brevity)
    , timestamp: Date
    }
  */
  var ids = watcher.getListMdIds()
    , count = 0
    , list = ''

  // Snag the EJS template for rendering later on.
  var ejsTemplate = fs.readFileSync(__dirname + '/views/blog-post-list-item.ejs', 'utf8');

  var len = ids.length

  // We need to do this many times depending on number of posts.
  // Eventually this won't be super scalable with thousands of blog posts.
  async.whilst(
      function (){ return count < len },
      function (callback) {

        var htmlFilename = __dirname + "/public/posts/" + ids[count].name +".html"

        // scraper is a small module I wrote that uses jsdom to scrape out portions
        // of static HTML pages, namely these parts:

        /*
          var schema = {
            href: $('#blog h1 a').attr('href')
          , title: $('title').text()
          , teaser: $('.post p:first').html()
          , timestamp: ''
        }

        */

        // The key: in order to iterate over all the HTML files, each file is
        // read in asynchronously off the filesystem and the data is extracted
        // in the callback.

        scraper.getPostListItem(htmlFilename, function(err,data){
          if(err) {
            // Log the error, skip this post and keep the loop moving.
            // (Without calling callback() here, whilst() would stall.)
            console.error(err)
            count++
            return callback()
          }
          else{

            // We add the timestamp to each data object (the "schema" object from above)
            data.timestamp = ids[count].timestamp

            // Now we render against our ejs template to create a block of HTML
            list += ejs.render(ejsTemplate, data)

            // Increase the count
            count++

            // Do it again
            callback()
          }
        })  // end getPostListItem()

      },
      function (err) {
        // TODO: Eventually store this in Redis or a flat file or something else.
        blogPosts.previewList = list
        // console.log(blogPosts.previewList)
      }
  ) // end whilst()

} // end stash

If you read the comments in the source code, hopefully it is somewhat clear what's going on here. Essentially, I grab the list of IDs for the blog posts and then iterate over each one of them with the async#whilst() method. The iterator passed to whilst() calls out to a method on my scraper object (which uses jsdom); that method reads a file in off the filesystem, scrapes the data (text) out of it and returns, only to do it all again until the count < len condition is false.
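Stripped of the blogging specifics, the whilst() pattern boils down to something like the following sketch (the items array and the setTimeout are stand-ins for my post ids and the file reads):

var async = require('async')

var items = ['a', 'b', 'c']   // stand-ins for the post id objects
  , count = 0
  , len = items.length

async.whilst(
    function () { return count < len },      // keep looping while this is true
    function (callback) {
      // pretend this setTimeout is an async unit of work, like a file read
      setTimeout(function () {
        console.log('processed', items[count])
        count++
        callback()                           // hand control back to whilst()
      }, 10)
    },
    function (err) {
      console.log('all done')                // runs once the test returns false
    }
)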

This is super useful for a couple of reasons. First, as the number of blog posts increases, I don't have to modify any of the code (note: this is arguably not a truly scalable way of doing this, but it's great for the purposes of this exercise). In other words, I don't have to keep hand-writing more and more nested asynchronous callbacks, which leads to the second reason: no spaghetti code!

Because async handles all of the waiting, iterating over an array of ids feels like, well, simply iterating over an array of ids! If you are still wrapping your head around the art of programming in node, then trust me, this is a Godsend.