Service Workers - First Impressions



Successes, progress, and stumbling blocks encountered while exploring Service Workers.

Preface

The Apache Whimsy Board Agenda tool is designed to make ASF Board meetings run more smoothly. It does this by downloading all of the provided reports and collating them with comments, prior comments, action items, minutes, links to prior reports, links to committee information, and the like. It provides a UI to allow Directors and guests to enter comments. It provides a UI to allow the Secretary to take minutes.

The tool itself is built using React.JS. It starts by downloading all of the reports. Navigation between reports can be done via mouse clicks or cursor keys and doesn't involve any server interaction. As new data is received, the window is updated.

Finally, I'm new to Service Workers, so I may be doing things in a profoundly dumb way. Any pointers would be appreciated; I am capable of RTFM and following examples.

First step - caching JSON

Some of the data (e.g., the list of ASF JIRA projects) is fetched on demand. Generally the page is first rendered using an empty list, and then updated once the actual list is received.

This process could be improved by caching the results received and using that data until fresh data arrives. As the Cache API is built on promises, and therefore asynchronous, this generally means rendering three times: once with an empty list, then with the cached data, and finally with live data.

// retrieve a cached object.  Note: block may be dispatched twice,
// once with slightly stale data and once with current data
//
// Note: caches only work currently on Firefox and Chrome.  All
// other browsers fall back to XMLHttpRequest (AJAX).
JSONStorage.fetch = function(name, block) {
  if (typeof fetch !== 'undefined' && typeof caches !== 'undefined' && 
     (location.protocol == "https:" || location.hostname == "localhost")) {

    caches.open("board/agenda").then(function(cache) {
      var fetched = null;
      clock_counter++;

      // construct arguments to fetch
      var args = {
        method: "get",
        credentials: "include",
        headers: {Accept: "application/json"}
      };

      // dispatch request
      fetch("../json/" + name, args).then(function(response) {
        cache.put(name, response.clone());

        response.json().then(function(json) {
          if (!fetched || JSON.stringify(fetched) != JSON.stringify(json)) {
            if (!fetched) clock_counter--;
            fetched = json;
            if (json) block(json);
            Main.refresh()
          }
        })
      });

      // check cache
      cache.match(name).then(function(response) {
        if (response && !fetched) {
          response.json().then(function(json) {
            clock_counter--;
            fetched = json;
            if (json) block(json);
            Main.refresh()
          })
        }
      })
    })
  } else if (typeof XMLHttpRequest !== 'undefined') {
    // retrieve from the network only
    retrieve(name, "json", function(item) {block(item)})
  }
}

All in all, this was remarkably painless and completely transparent to the calling application. It doesn't involve the activation of Service Workers, but it doesn't have to.
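To make the calling convention concrete, here's a hypothetical caller (the "jira" name and the Main.projects property are made up for illustration):

// hypothetical usage: the block may be dispatched twice, first with
// slightly stale cached data, then again if the live data differs
JSONStorage.fetch("jira", function(projects) {
  Main.projects = projects
});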

Second step - caching HTML

What's true for JSON should also be true for HTML. Before the caching logic introduced above (a situation that persists for browsers that don't support the Service Worker caching interface), data that should appear on the page would briefly be missing and show up a second or two later. In the case of HTML, that data is the entire page. This would typically be seen both on the initial page load and any time a link is opened in a new tab.

The HTML case is both simpler and more difficult. Fetching the HTML from cache and then replacing it wholesale from the network, while possible, would be jarring. Fortunately, there already is logic in place to update the content of the pages based on updates received by XHR. So initially displaying where the user last left off, as well as updating the cache, is sufficient.

Unfortunately, it isn't quite so simple. I've included the current code below complete with log statements and dead ends.

// simple hashcode to prevent authorization from leaking
var hashcode = function(s) {
  return s && s.split("").reduce(
    function(a, b) {
      return ((a << 5) - a) + b.charCodeAt(0)
    },

    0
  )
};

var status = {auth: null};

this.addEventListener("fetch", function(event) {
  var scope = this.registration.scope;
  var url = event.request.url;
  var path = url.slice(scope.length);
  var auth = hashcode(event.request.headers.get("Authorization"));

  if (/^\d\d\d\d-\d\d-\d\d\//.test(path) && event.request.method == "GET") {
    console.log("Handling fetch event for", event.request.url);

    event.respondWith(caches.open("board/agenda").then(function(cache) {
      return cache.match(path).then(function(cached) {
        if (cached) console.log("matched");
        console.log("auth", auth, status.auth);

        if (!auth || auth != status.auth) {
          // the following doesn't work
          cached = new Response("Unauthorized", {
            status: 401,
            statusText: "Unauthorized",
            headers: {"WWW-Authenticate": "Basic realm="ASF Members and Officers""}
          });

          // fallback: ignore cache unless authorized
          cached = null
        };

        if (cached) console.log("serving from cache");

        var network = fetch(event.request).then(function(response) {
          if (!cached) console.log("fetching from network");
          if (cached) console.log("updating cache");
          console.log(response);
          if (response.ok) cache.put(path, response.clone());
          status.auth = auth;
          return response
        });

        return cached || network
      })
    }))
  } else if (auth) {
    // capture authorization from other pages, if provided
    status.auth = auth
  }
})

The primary problem is that the board agenda tool requires authentication, as the data presented may contain Apache Software Foundation confidential information.

Without accounting for this, what often would be placed into the cache would be the HTTP 401 challenge response. That's not desirable.

Attempting to force the return of a challenge when an Authorization header is not present results in the display of the challenge response. Again, not what we want.

Falling back to only providing the cached data when the Authorization header is present (and matches the one used for the cache) results in the cache being used sometimes with Firefox. And, unfortunately, never with Chrome.

A secondary problem, of lesser importance, is that the cache never gets updated if the service worker responds with a cached copy. Or if it does, the console.log messages aren't getting executed or aren't producing output.
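One possibility I haven't verified: once a cached response is returned, nothing keeps the service worker alive long enough for the network fetch to complete, so the cache update may be silently abandoned. The "stale-while-revalidate" recipe in Jake Archibald's offline cookbook addresses this with event.waitUntil; a minimal sketch of that shape, reusing the handler above:

// sketch: waitUntil keeps the worker alive until the network update
// completes, even when the cached copy is returned immediately
event.respondWith(caches.open("board/agenda").then(function(cache) {
  return cache.match(path).then(function(cached) {
    var network = fetch(event.request).then(function(response) {
      if (response.ok) cache.put(path, response.clone());
      return response
    });

    event.waitUntil(network.catch(function() {}));
    return cached || network
  })
}));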

Third step - monitoring

To help with debugging, it occurred to me that it would make sense to produce a page that shows Service Worker and Cache status.
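As a first cut, something along these lines (a sketch, assuming the same "board/agenda" cache name used above) could dump the status to the console:

// sketch: report Service Worker registrations and cache contents
navigator.serviceWorker.getRegistrations().then(function(registrations) {
  registrations.forEach(function(registration) {
    var worker = registration.active || registration.installing;
    console.log("scope:", registration.scope,
      "state:", worker && worker.state)
  })
});

caches.open("board/agenda").then(function(cache) {
  return cache.keys()
}).then(function(requests) {
  requests.forEach(function(request) {
    console.log("cached:", request.url)
  })
});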

Plans

One thing I haven't explored yet is replacing the fetch call with one that uses different values for the request mode and the credentials mode. I figured I would ask for guidance before proceeding down that path.
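For the record, the variation I have in mind would look something like the following; the mode and credentials values shown are guesses, which is exactly what I'd want guidance on:

// sketch only: explicit request and credentials modes; the values
// here are guesses, not recommendations
var network = fetch(event.request.url, {
  method: "get",
  mode: "same-origin",
  credentials: "same-origin",
  headers: {Accept: "text/html"}
});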

Once caching HTML is mastered, caching related artifacts like stylesheets and JavaScript files would be in order. An online fallback approach would likely be the best match.
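That would presumably start with pre-caching the assets at install time; a sketch, with hypothetical file names:

// sketch: pre-cache static assets at install time; the file names
// here are placeholders
this.addEventListener("install", function(event) {
  event.waitUntil(caches.open("board/agenda").then(function(cache) {
    return cache.addAll(["app.css", "app.js"])
  }))
});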

After that, the next order of business would be queuing updates while offline. While that is a hard problem in general, in this case user operations are limited by role, and generally to editing their own changes, so it should be manageable.
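A first approximation, purely a sketch (the endpoint and payload shape are made up), might queue updates in memory and flush them when connectivity returns:

// sketch: queue updates while offline, flush when back online;
// the endpoint and payload shape are placeholders
var pending = [];

function submit(update) {
  if (navigator.onLine) {
    fetch("../json/" + update.name, {
      method: "post",
      credentials: "include",
      headers: {"Content-Type": "application/json"},
      body: JSON.stringify(update.data)
    })
  } else {
    pending.push(update)
  }
}

window.addEventListener("online", function() {
  while (pending.length) submit(pending.shift())
});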