William John Bert

Loves to write words and code.

Using Github Pages to Hand Off a Legacy Site and Make Everyone Happier

Here’s how I turned over maintenance of a legacy site – built as a one-off project years ago using now-outdated technology – to my non-technical cofounder, with only a few hours of work. Best of all, it now uses evergreen technology that will make it easy for her to update for years to come, and everyone is happy with the outcome.

The problem

I created Call and Response when my friend Kira and I decided to co-curate an art show of that name in 2010. It became an annual event, and each year I updated the website with the new participants and details. This year, Kira is continuing the show with support from others because I’m out of capacity to help.

Kira needs to be able to update the website, which lists the concept behind the show, who’s participating, where the show is, when it opens, and more. I created the original website using Django as a learning exercise when I was fairly new to web development. I went all out by my standards of the time, creating a simple micro-CMS that helped me learn about the MVC pattern, among other things.

However, the project had no use outside of this particular website. It could not be generalized without a significant overhaul, and I had no need to generalize it. In fact, I had no interest in maintaining it or even working with it. I built it with Django 1.3, and hosted it with a service that I don’t use anymore.

Usually, in such a case, it’s tough luck for the web developer. Kira is not a web developer and has no interest in becoming one; she just wants to be able to update her website. The responsible thing is to step up to the task and keep the site going.

This time I had an idea for how to reach the end goal of the site being updated in a timely way yet still ditching the legacy code. I thought I saw a path forward to handing it off to her in a way that would let her easily edit text, links, and images, and create new pages.

The solution

Here’s how I converted the site from a Django project that only someone who knew Python could update into a flat site that anyone could edit by knowing only the most basic HTML:

1. Scrape.

This seemed like something I could do fairly easily with my current web programming tool of choice – Node.js. I just had to download and write to disk every page I could find by following links that pointed to the same host. But with things that seem straightforward, the devil’s often in the details, and anyway, I figured someone else must have wanted to do this before, so there had to be a tool for it – which would turn this from fairly easy into dead simple.
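For a sense of what I was avoiding, here’s roughly the crawler I would have written – a sketch only, and exactly where the details get devilish (file naming, redirects, non-HTML assets):

var http = require('http');
var fs = require('fs');
var url = require('url');

var seen = {};

// Fetch each same-host page once, save it to disk, follow its links.
function crawl(pageUrl, host) {
  if (seen[pageUrl]) return;
  seen[pageUrl] = true;
  http.get(pageUrl, function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      fs.writeFileSync(encodeURIComponent(pageUrl) + '.html', body);
      (body.match(/href="([^"]+)"/g) || []).forEach(function (attr) {
        var link = url.resolve(pageUrl, attr.slice(6, -1));
        if (url.parse(link).host === host) crawl(link, host);
      });
    });
  });
}

crawl('http://www.callandresponsedc.org/', 'www.callandresponsedc.org');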

The tool I found is called HTTrack.

$ brew install httrack

I ran it against Call and Response, selecting all the default options.

$ httrack "www.callandresponsedc.org" -v

Lo and behold, it did exactly what I wanted. Now I had a full static version of the site.

2. Host.

I needed to host this static site somewhere free, reliable, and easy to use. I created a Github organization, initialized a git repo, created a branch named gh-pages, and pushed it to Github. Now the site was up at callandresponsedc.github.io/callandresponsedc (albeit in a broken form, because the static assets’ root-relative paths resulted in 404s).

Then I added a CNAME file for callandresponsedc.org, switched the domain’s DNS away from my old hosting provider to a name service, and added the right DNS entries.
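On the Github side, the custom domain is just a CNAME file at the root of the gh-pages branch. The commands below sketch that step (the specific DNS records to add come from Github Pages’ documentation):

$ echo "callandresponsedc.org" > CNAME
$ git add CNAME
$ git commit -m "Point Github Pages at the custom domain"
$ git push origin gh-pages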

Once the DNS changes took effect, the new static site was up where it needed to be: http://callandresponsedc.org. And it looked exactly as it had before: to any visitor who saw it before the change and again after, no difference would have been apparent.

3. Enable.

While I waited for the DNS changes to take effect, I asked Kira to create a Github account, added her to the organization, and spent 15 minutes writing up a brief guide to editing the HTML pages using Github.

Thanks to Github’s editing tools, she now had the ability to make changes that would take effect instantly.

Technically, at this point, I was no longer needed. I could have handed the site over right then: Kira could do everything she needed to do. But I would have had to add a caveat: “By the way, when you want to change anything in the menu, or header, or footer, or to add a new logo or change colors, you have to do it on every single page.”

Because HTTrack, good as it is, is not good enough to produce DRY results. Each page of the static site repeated the same HTML structure for its header, footer, and body, and contained its own set of inline styles. Changing any part of that would have meant changing every page. There weren’t a lot of pages, but nonetheless, I didn’t want to force that burden on Kira.

I could do better.

4. Refactor.

Large parts of the pages were identical, so I could factor them out into includes, and make a couple of simple layout templates composed of these. Same idea with the stylesheet.

To make this work, I had to use something slightly more complicated than flat HTML files. I decided to convert the site to use (Github-flavored) Jekyll. I won’t repeat the instructions needed to get that up and running; just follow the link, and do what it says. It takes on the order of a few minutes.

To refactor, I just needed to figure out exactly what was common to all pages, extract it into includes, and reassemble the includes into templates. Then I’d specify which template to use via YAML front-matter on each page, and voilà, my Jekyll site would be done. Because I use Emacs, I used ediff to find the common parts, but you could do this with any diffing tool and text editor.
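The shared skeleton ended up in a layout roughly like this – a sketch, with illustrative file and include names rather than the site’s actual ones:

<!-- _layouts/page.html -->
<!DOCTYPE html>
<html lang="en">
  {% include head.html %}
  <body>
    {% include navbar.html %}
    <div class="container-fluid">
      <div class="row-fluid">
        {% include sidebar.html %}
        <div class="span9">
          {{ content }}
        </div>
      </div>
    </div>
  </body>
</html>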

A nice side effect of this was that the pages Kira would be editing got much simpler. They went from something like:

<!DOCTYPE HTML>
<html lang="en">
  <meta http-equiv="content-type" content="text/html;charset=utf-8" />
<head>
  <title>About - Call + Response
  </title>
  <script src="http://platform.twitter.com/anywhere.js?id=IrVoVLmkJDVw9Pagwdxsow&amp;v=1"
          type="text/javascript">
  </script>
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta name="description" content="Call + Response is an art show in Washington, DC, pairing writers and artists.">
  <meta name="author" content="William John Bert">

  <link rel="stylesheet" type="text/css" href="/static/fonts/fonts.css">
  <link rel="stylesheet" type="text/css" href="/static/css/bootstrap.css"/>
  <link rel="stylesheet" type="text/css" href="/static/css/bootstrap-responsive.css"/>
  <link rel="stylesheet" type="text/css" href="/static/css/style.css"/>

</head>

<div id="fb-root"></div>
<script>(function(d, s, id) {
    var js, fjs = d.getElementsByTagName(s)[0];
    if (d.getElementById(id)) return;
    js = d.createElement(s); js.id = id;
    js.src = "http://connect.facebook.net/en_US/all.js#xfbml=1&appId=398123750217250";
    fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'facebook-jssdk'));</script>


  <body>
    <div class="navbar navbar-fixed-top">
  <div class="navbar-inner">
    <div class="container-fluid">
      <a class="brand" href="/">Call + Response
      </a>
    </div>
  </div>
</div>

    <div class="container-fluid">
      <div class="row-fluid">
        <div class="span3">
  <div class="well sidebar-nav">
    <ul class="nav nav-list">
      <li class="">
        <a href="/about">About
        </a>
      </li>
      <li class="">
        <a href="/opening">Opening
        </a>
      </li>
      <li class="">
        <a href="/participants">Participants
        </a>
      </li>
      <li class="">
        <a href="/sponsors">Sponsors
        </a>
      </li>
      <li class="">
        <a href="/contact">Contact
        </a>
      </li>
      <li class="">
        <a href="/press">Press
        </a>
      </li>
    </ul>
  </div>
</div>

        <div class="span9">
          <div class="chiclet pull-right visible-desktop">
  <a href="https://twitter.com/share" class="twitter-share-button" data-count="none"
     data-hashtags="callandresponsedc">Tweet
  </a>
  <script>!function (d, s, id) {
    var js, fjs = d.getElementsByTagName(s)[0];
    if (!d.getElementById(id)) {
    js = d.createElement(s);
    js.id = id;
    js.src = "http://platform.twitter.com/widgets.js";
    fjs.parentNode.insertBefore(js, fjs);
    }
    }(document, "script", "twitter-wjs");
  </script>
</div>
<div class="chiclet pull-right visible-desktop">
  <div class="fb-like" data-send="true" data-width="100"
       data-show-faces="false" data-colorscheme=light>
  </div>
</div>

          <div class="swappable-content">
            <h1>About
</h1>
<p>
  Call + Response is an (almost) annual art show in the nation's capital that brings together writers and visual artists. The writers provide the call with an original piece of writing and then visual artists generate a new piece of work in response. The end result being two pieces that resonate with each other.
  [etc...]

to:

---
layout: page
title: About
---

<h1>About
</h1>
<p>
  Call + Response is an (almost) annual art show in the nation's capital that brings together writers and visual artists. The writers provide the call with an original piece of writing and then visual artists generate a new piece of work in response. The end result being two pieces that resonate with each other.
</p>

It took an hour or so to DRY out everything, and then…I was done. A site that had been a mess – one-off code written years ago for an outdated version of a framework that I don’t use anymore – had become so simple that I could hand it off to my non-technical cofounder and both of us would be happier for it!

And sure enough, since then, Kira has made numerous updates to the site. Under the old setup, each of those would have taken a round or two of emails back and forth, a context switch for me to make the changes, a deploy, and another round of emails to confirm that everything looked good. Repeat, repeat, repeat.

I did this with a Django site, but it’d be just as applicable with a lot of Wordpress sites, or any CMS, really. Obviously, there are many cases where this solution wouldn’t work for a variety of reasons. But if you can make it work, it’s a joy to do so!

Using a Node Repl in Emacs With Nvm and Npm

Running a repl inside Emacs is often convenient for evaluating code, checking syntax, and myriad other tasks. When I wanted to run a Node repl, I found that I needed to do a little setup to get everything working the way I wanted.

My first question was: which Node? With nvm, I’ve installed multiple versions on my machine, so I needed a way to specify which one to execute.

Another question was: where to run Node? Since Node’s module lookup checks node_modules directories starting with the current directory and working up the file system hierarchy, the current working directory matters. If I want access to the npm modules installed for project A, I need to start my repl’s Node process from path/to/projectA.
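For instance (the resolved path here is illustrative):

$ cd path/to/projectA && node
> require.resolve('express')
'path/to/projectA/node_modules/express/index.js'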

But that raises another question: what happens when I want to switch to project B? Do I need to use process.chdir() to switch the Node repl’s current working directory to path/to/projectB? That’s clumsy and annoying.

Here’s how I answered these questions:

nvm.el gives you nvm-use to activate a version of Node within Emacs. It’s basically a nice wrapper around setting the environment variables NVM_BIN and NVM_PATH and adding the path of the Node version you want to use to your PATH. Great!

Except for one problem: nvm-use isn’t interactive. It’s meant to be used programmatically. So I needed to write a small do-nvm-use wrapper that lets me specify a version and then activates it:

;; Depends on the nvm and exec-path-from-shell packages.
(require-package 'nvm)

(defun do-nvm-use (version)
  "Prompt for a Node VERSION and activate it via nvm."
  (interactive "sVersion: ")
  (nvm-use version)
  ;; Sync Emacs's PATH and exec-path so the chosen Node is found.
  (exec-path-from-shell-copy-env "PATH"))

To specify where to run Node, I wrote another small defun, named run-node, that prompts for a directory in which to start Node. Before it does this, though, it checks whether a program named node is in the exec-path, and if not, it runs do-nvm-use first. Once we have a Node to execute and a directory to execute it in, we can make a new comint buffer bound to the repl process.

To address the issue of different repls needing to be run for different projects, run-node adds the cwd to the buffer name. Repls for project A and project B will live in buffers named *node-repl-path/to/projectA* and *node-repl-path/to/projectB*, respectively—making switching to the right buffer with ido trivial.

(defun run-node (cwd)
  "Start a Node repl in directory CWD, activating a Node via nvm if needed."
  (interactive "DDirectory: ")
  (unless (executable-find "node")
    (call-interactively 'do-nvm-use))
  ;; Start node from CWD so its module lookup finds CWD's node_modules.
  (let ((default-directory cwd))
    (pop-to-buffer
     (make-comint (format "node-repl-%s" cwd) "node" nil "--interactive"))))

Now to start my Node repls, I just call run-node and I’m all set!
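If you do this often, a global keybinding saves the M-x (the key choice here is just an example):

(global-set-key (kbd "C-c n") 'run-node)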

How Legit HTTP (With an Async Io Assist) Massacred My Node Workers

An uncaught exception in our Node app was causing not only one, but two and then three workers to die. (Fortunately, we hardly ever encounter uncaught exceptions. Really, just this one since launch a few months ago. We’re Node studs! Right?)

The funny thing is that we’re using Express, which (via Connect) wraps each request / response in a try / catch. And we use Express’s error handler, which returns 500 on unhandled errors.

Another funny thing is that we use cluster, which isolates workers from each other. They live in separate, solipsistic processes.
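For context, the setup is the usual cluster pattern – roughly this generic sketch, not our exact code:

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  // One worker per CPU; respawn any worker that dies.
  os.cpus().forEach(function () { cluster.fork(); });
  cluster.on('exit', function () { cluster.fork(); });
} else {
  // Each worker runs its own copy of the Express app.
  require('./app');
}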

But instead of returning 500, our worker simply died. And, as if in sympathy, the rest immediately followed.

Time to get to the bottom of this. A Node stud like me can figure it out. No sweat. Right?

For a sanity check, I went to Chrome and Firefox’s network inspectors. Only one POST, the bad request that triggered the exception. Everything else looks normal. Sanity: verified.

Then it was on to the cluster module. That magical “OS load balancing” seemed highly suspicious. But nope, I asked in #nodejs and they said that only applies at the TCP connection level. Once a connection is assigned to a worker, it never goes to another worker. Meaning that the bad request was isolated—only the worker who received the initial connection could encounter it.

But the workers kept on dying.

These workers morted out fast. They didn’t even return 500, or any kind of response. The more I thought about it, the less right that seemed. Not right at all. Why no 500?

But I can only tackle one mystery at a time. I wanted to understand: why did so many workers die?

Furious googling ensued. My efforts were rewarded with this nugget:

If an HTTP/1.1 client sends a request which includes a request body, but which does not include an Expect request-header field with the “100-continue” expectation, and if the client is not directly connected to an HTTP/1.1 origin server, and if the client sees the connection close before receiving any status from the server, the client SHOULD retry the request.

(From the HTTP 1.1 spec, RFC 2616. Original hat tip, which links to this informative post about double HTTP requests.)

My mind was somewhat blown. The browsers were right after all. They were just following HTTP. And—helpfully!—hiding the resent POSTs from the network inspector.

But POSTs are dangerous. They mutate resources! I must only click the Order button once or I may get charged multiple times!

I had a thought. One I have often, yet each time, it seems new again: I have much to learn.

Back to the 500s. Or lack thereof. Which got funnier still when I realized that other errors in our controllers that threw exceptions did return 500s. Being a hands-on kind of guy, I added one right at the top of a route controller: throw new Error("uh-oh"). My dev server spat back: 500 Error: uh-oh.

So why did that one particular error never, ever return a 500, or any response of any kind?

It’s my fault, really. I’m still a Node newbie (I must never forget this). I had missed that Express’s try / catch can’t cover async IO callbacks: they run in a different call stack from the request / response cycle, one that originates from the event loop.
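A minimal illustration, with a hypothetical route that has the same shape as the real bug:

app.get('/boom', function (req, res) {
  // Connect's try / catch wraps this handler's synchronous code only.
  setTimeout(function () {
    // By the time this runs, that try / catch is long gone off the stack.
    throw new Error('uh-oh');
  }, 0);
});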

It makes total sense. I have much to learn.

So what to do? require('domain') to the rescue. I can write some middleware (a bit of this, a dash of that) to wrap the request / response in a domain.
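Something along these lines (a sketch; the middleware function name is my own):

var domain = require('domain');

function domainMiddleware(req, res, next) {
  var d = domain.create();
  // Tie the request and response into the domain so their errors route here.
  d.add(req);
  d.add(res);
  d.on('error', function (err) {
    // An async callback threw: return the 500 that Express never could.
    res.send(500);
  });
  d.run(next);
}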

But how do I get this domain into my controller? My solution was to attach it to res.locals._domain. Good solution? I don’t know. I suspect there’s a better way. Good enough? It solved my immediate problem:

Model.find({key: value}, res.locals._domain.bind(function(err, docs) {
  // This callback can throw all it wants. My domain will catch it.
}));

Sweet. Now, armed with a reference to res in the domain error handler, I can return a 500. Voila, the browser gets its response. No more helpful resent POSTs. The silent gratitude of the spared workers is its own reward.

Except, do I need to bind every mongoose and other kind of async IO operation in my app? There are many.

Many.

I have much to learn.

Allow CORS With Localhost in Chrome

Today I spent some time wrestling with the notorious same origin policy in order to get CORS (cross-origin resource sharing) working in Chrome for development work I was doing between two applications running on localhost. Setting the Access-Control-Allow-Origin header to * seemed to have no effect, and this bug report nearly led me to believe that was due to a bug in Chrome that made CORS with localhost impossible. It’s not. It turned out that I also needed some other CORS-related headers: Access-Control-Allow-Headers and Access-Control-Allow-Methods.

This (slightly generalized) snippet of Express.js middleware is what ended up working for me:

app.all("/api/*", function(req, res, next) {
  // Allow any origin, plus the headers and methods the client will send.
  res.header("Access-Control-Allow-Origin", "*");
  res.header("Access-Control-Allow-Headers", "Cache-Control, Pragma, Origin, Authorization, Content-Type, X-Requested-With");
  res.header("Access-Control-Allow-Methods", "GET, PUT, POST");
  return next();
});

With that, Chrome started making OPTIONS requests when I wanted to POST from localhost:3001 to localhost:2002. It seems that using contentType: application/json for POSTs forces a CORS preflight, which surprised me, since it seems like a common case for APIs, but no matter:

app.all("/api/*", function(req, res, next) {
  // Answer CORS preflights with an empty 204; let everything else through.
  if (req.method.toLowerCase() !== "options") {
    return next();
  }
  return res.send(204);
});

Emacs Cl-lib Madness

Emacs 24.3 renamed the Common Lisp emulation package from cl to cl-lib. The release notes say that cl in 24.3 is now “a bunch of aliases that provide the old, non-prefixed names”, but I encountered some problems with certain packages looking for – as best I can determine – function names that changed at some point and were not kept around as aliases. This was particularly problematic when trying to run 24.3 on OS X 10.6.8.

In case anyone else runs into this problem, here’s my solution:

;; Require Common Lisp. (cl in <=24.2, cl-lib in >=24.3.)
(if (require 'cl-lib nil t)
    (progn
      (defalias 'cl-block-wrapper 'identity)
      (defalias 'member* 'cl-member)
      (defalias 'adjoin 'cl-adjoin))
  ;; Else we're on an older version so require cl.
  (require 'cl))

We try to require cl-lib, and when that succeeds, define some aliases so that packages don’t complain about missing cl-block-wrapper, member*, and adjoin. If it doesn’t succeed, we’re on an older Emacs, so require the old cl.

Juxtaposition

A few days ago, I happened by chance to read these two articles one after the other:

The first is about how good Unix is at scaling the scheduling and distribution of work among processes. The second is about how Unix is the problem when it comes to the scheduling and distribution of work at scale.

The question, of course, is “What scale?” – just as the difference between a cure and a poison is sometimes a matter of dosage.

Zero to Node, Again

At NodeDC’s January meetup, I’ll be giving a reprise of my Zero to Node talk, about designing, coding, and launching my first web service using Node.js. The meetup is Wednesday, Jan 23, at Stetson’s (1610 U St NW). Hope to see you there!

Review of Requests 1.0

Author’s note: This piece was originally published in the excellent literary journal DIAGRAM, Issue 12.6. I’m re-publishing it here for formatting reasons.

Identification with another is addictive: some of my life’s most profound, memorable experiences have come when something bridged the gap between me and another human. Because I’m a reader, this can occur across the distance of space and time. It’s happened with minor Chekhov characters, and at the end of Katherine Mansfield stories. It happens again and again with Norman Rush and George Saunders. The author has pushed a character through the page and connected with me on a deep level: identification.

Identification happens with computer programming, too.

I say this as a reader, writer, and programmer: I experience identification when reading and programming, and I strive to create it when writing and programming.

Though they deal with the messiness of reality differently, several techniques common to both disciplines enable them to achieve this mental intimacy: navigating complexity; avoiding pitfalls that inhibit communication; choosing structure wisely; harnessing expressive power; and inhabiting other minds. The Requests library, a work of computer programming by Kenneth Reitz, illustrates this.