Generating RSS Feeds

June 07, 2011
[[post(150) Another]] bug report about this blog concerned various details of the RSS feed. I gave up on RSS readers soon after I started using them because of the overflow of information they generated. I find that Twitter now provides a good flow of links to interesting stuff, one that is auto-cleansing. (And it appears RSS might be going out of fashion in general.)

Anyway, that is my excuse for never having looked into the details of RSS. Based on a little study of the RSS 2.0 Specification, I refined the feed definition of my blog application. It should be quite reusable in other WebDSL applications, although I couldn’t figure out a way to make it completely automatic. Actually, while writing that last sentence I got an idea for improving abstraction, which I’ll discuss below.

While testing the improved RSS feed, I got back into RSS reading. The trick seems to be not subscribing to too many feeds, and in particular avoiding the high-volume feeds generated by news organizations.

RSS Auto Discovery

The first reported issue was RSS auto discovery. A browser recognizes the feed provided by a page when the head of that page contains a link of the following form:

<link rel="alternate" type="application/rss+xml" title="RSS"
      href="http://eelcovisser.org/feed/blog">

The mechanism to add information to the head of a WebDSL page is through the includeHead built-in template. A WebDSL application typically defines one (or a few) standard page layouts with a template called main or layout, which is then reused to build pages. (I’ll discuss page layout in a separate post.) This is the place to put the auto discovery link:

define main() {
  includeHead(rendertemplate(rssLink()))
  // ... reusable page layout for site with pageheader, sidebar ...
  elements // the parameter content
}

The rendertemplate function takes a template call and turns the result of rendering it into a string. Here rssLink is a template that we can redefine for a specific section of the site.

For the blog application, I further refined the page layout, with separate templates for wiki pages and blog pages. The blog layout is the perfect place to define the rssLink for the blog section of the site:

define bloglayout(b: Blog) {
  define rssLink() {
    <link rel="alternate" type="application/rss+xml" title="RSS"
    href=navigate(feed("blog")) /> 
  }
  // ... blog specific ingredients ...
  main{ elements }
}

Note that using navigate to refer to the feed page makes it unnecessary to hard-code an absolute URL for the feed, so the application can be used unchanged on different sites. Indeed, the same code is used for the DSL Engineering site.
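
To check from the outside that a page advertises its feed, one can mimic what a browser or feed reader does with the link element. The following is a minimal sketch in Python (not WebDSL), using only the standard library. The names FeedLinkFinder and discover_feeds are made up for this illustration, and real readers may apply looser matching rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkFinder(HTMLParser):
    """Collects the href of every RSS auto discovery link in a page."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also end up here by default.
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        if "alternate" in rel and mime == "application/rss+xml" and a.get("href"):
            self.feeds.append(a["href"])


def discover_feeds(page_url, html_text):
    """Return absolute feed URLs advertised by the given page."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    # Resolve relative hrefs against the URL of the page itself.
    return [urljoin(page_url, href) for href in finder.feeds]
```

Calling discover_feeds with the URL of a blog page and its HTML source returns the list of feed URLs a reader would offer for subscription.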

Generating the Feed

From the perspective of WebDSL, an RSS feed is just an ordinary page, defined using a page template. The blog application supports two feeds, one for blog posts, and one for wiki page updates.

define page feed(type: String) { 
  case(type) {
    "blog" { blogrss(mainBlog()) }
    "wiki" { wikifeed() }
  }
}

The feed for the blog simply takes a list of recent posts, creates an RSS item for each, and wraps the whole in a channel document:

define blogrss(b: Blog) { 
  rssWrapper(b.title, link(b,1), b.description as Text, b.modified){ 
    for(p: Post in b.recentPosts(1,20,false,false)) { rssPost(b, p) }
  }
}

An RSS item represents a story and should at least contain a title and a link. The description can be a short summary of the story, or the complete text. To support summaries, I added a description property to Posts, which is preferred over content when available. The blog is passed along as well, since the source element of the item refers to it.

define rssPost(b: Blog, p: Post) {
  <item> 
    <title>output(p.title)</title>
    <link>output(permalink(p))</link>
    <description>
      if(!isEmptyString(p.description)) { 
        output(p.description)
      } else {
        output(p.content)
      }
    </description>
    <pubDate>rssDateTime(p.created)</pubDate>
    <source url=link(b,1)>output(b.title)</source>
  </item> 
}
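
The mapping from a post to an RSS item, including the fallback from summary to full content, can also be sketched outside WebDSL. The Python below is only an illustration: the Post dataclass is a hypothetical stand-in for the entity in the blog application, and pubDate and source are left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class Post:
    """Hypothetical stand-in for the Post entity of the blog application."""
    title: str
    link: str
    content: str
    description: Optional[str] = None


def post_to_item(post):
    """Build an <item> element, preferring the summary over the full text."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = post.title
    ET.SubElement(item, "link").text = post.link
    has_summary = bool(post.description and post.description.strip())
    body = post.description if has_summary else post.content
    ET.SubElement(item, "description").text = body
    return item
```

A post with a non-empty description yields that summary in the item; any other post falls back to its content.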

Abstraction?

Now one could argue that the definition of rssPost above is not very abstract, since it exposes details of the RSS implementation. A more abstract approach might be to let Post implement an interface with RSS properties and ‘auto’ generate the feed based on such properties.

extend entity Post {
  rssTitle : String   := title
  pubDate  : DateTime := created
  // etc.
}

I’m not sure, however, whether such an approach would actually be preferable. Essentially, what the rssPost template does is define a mapping from Post objects to RSS items, and the template format leaves a lot of room for flexibility. The interface approach defines the same mapping. It probably requires just as much code and allows less wiggle room, but it might prevent some errors, such as the date format bug that I introduced.
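
To make the trade-off concrete, here is a rough Python sketch of the interface approach. It is not WebDSL and not how the blog application works: RssEntry, render_items, and the property names are all invented for the example. The point is only that the generic generator owns the item layout and the date format, while each entity supplies values.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Iterable, Protocol
import xml.etree.ElementTree as ET


class RssEntry(Protocol):
    """Hypothetical interface: anything exposing these can go in a feed."""
    rss_title: str
    rss_link: str
    pub_date: datetime


def render_items(entries: Iterable[RssEntry]) -> list:
    """Generic item generation; the date format lives in exactly one place."""
    items = []
    for entry in entries:
        item = ET.Element("item")
        ET.SubElement(item, "title").text = entry.rss_title
        ET.SubElement(item, "link").text = entry.rss_link
        ET.SubElement(item, "pubDate").text = format_datetime(entry.pub_date)
        items.append(item)
    return items


@dataclass
class Post:
    """Stand-in for the Post entity; field names are illustrative."""
    title: str
    url: str
    created: datetime

    # Derived properties play the role of rssTitle := title and friends.
    @property
    def rss_title(self) -> str:
        return self.title

    @property
    def rss_link(self) -> str:
        return self.url

    @property
    def pub_date(self) -> datetime:
        return self.created
```

The amount of code is indeed comparable to the template version, and any per-feed variation (extra elements, a different description policy) would have to be pushed into the interface.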

Date Details

The RSS specification suggests that RSS readers should take the pubDate as the date of publication of the item. However, when I migrated old posts with older dates from another blog app, Google Reader used the day of first appearance rather than the specified date.

A particular detail that I got wrong originally was the date format, which caused RSS readers to misread the date. This was easily fixed by defining the rssDateTime template as follows:

define rssDateTime(d: DateTime) {
  output(d.format("EEE, dd MMM yyyy HH:mm:ss zzz")) // HH: 24-hour clock, as RFC 822 dates require
}
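
RSS 2.0 dates follow RFC 822, which uses a 24-hour clock and has no AM/PM marker. Assuming the format string follows the pattern letters of Java's SimpleDateFormat, hh would be the 12-hour field and HH the 24-hour one, so the two are easy to mix up. The Python sketch below (not WebDSL; rss_date is a made-up helper name) shows what a valid date looks like and why a 12-hour field is ambiguous.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# A fixed +02:00 offset, standing in for Central European Summer Time.
cest = timezone(timedelta(hours=2))
created = datetime(2011, 6, 7, 15, 30, 0, tzinfo=cest)


def rss_date(d):
    """Format a timezone-aware datetime as an RFC 822 style date."""
    return format_datetime(d)


print(rss_date(created))              # Tue, 07 Jun 2011 15:30:00 +0200
print(created.strftime("%I:%M:%S"))   # 03:30:00, 12-hour field: wrong for RSS
print(created.strftime("%H:%M:%S"))   # 15:30:00, 24-hour field
```

With a 12-hour field, a post published at 15:30 would be reported as 03:30, which a reader can only interpret as early morning.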

Channel

The rssWrapper template is a reusable definition for RSS channels in my WebDSL library. It takes title, url, description, and publication date as parameters:

define rssWrapper(title: String, url: String, desc: Text, pubDate: DateTime) {
  mimetype("application/rss+xml")
  <rss version="2.0">
    <channel> 
      <title>output(title)</title>
      <link>output(url)</link>
      if(!isEmptyString(desc)){ <description>output(desc)</description> }
      if(pubDate != null) { <pubDate>rssDateTime(pubDate)</pubDate> }
      <docs>"http://www.rssboard.org/rss-specification"</docs>
      elements
    </channel>
  </rss>
}
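
For comparison, the same channel structure can be produced with an XML library, which makes it easy to parse the result back and check it. This is a Python sketch, not part of the WebDSL library; rss_channel and its parameters are invented here and simply mirror the parameters of rssWrapper.

```python
import xml.etree.ElementTree as ET


def rss_channel(title, url, desc=None, pub_date=None, items=()):
    """Wrap already-built <item> elements in an RSS 2.0 channel document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = url
    if desc:
        ET.SubElement(channel, "description").text = desc
    if pub_date:
        # pub_date is expected to be an RFC 822 formatted string.
        ET.SubElement(channel, "pubDate").text = pub_date
    ET.SubElement(channel, "docs").text = "http://www.rssboard.org/rss-specification"
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")
```

One caveat: the RSS 2.0 specification lists description among the required channel elements, so a strict validator may complain about a channel where it is omitted.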

Note that the template also sets the MIME type of the page.

Validation

The RSS feed appears to work fine with Google Reader, but not with NetNewsWire, which complains about RSS feed validation errors. Apparently raw HTML markup is not allowed within the ‘description’ element; the RSS 2.0 specification expects such markup to be entity-encoded.
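
Assuming the complaint is indeed about unescaped markup, the usual remedy is to entity-encode the HTML before placing it in the description. The Python sketch below illustrates the idea (description_xml is a made-up helper, and this is not a claim about how WebDSL's output behaves): after escaping, an XML parser sees plain text and hands the original HTML back to the reader.

```python
from html import escape
import xml.etree.ElementTree as ET


def description_xml(html_body):
    """Return a <description> element with the HTML entity-encoded."""
    # quote=False: only &, < and > need escaping in element content.
    return "<description>" + escape(html_body, quote=False) + "</description>"


fragment = description_xml("<p>Feeds & readers</p>")
print(fragment)
# <description>&lt;p&gt;Feeds &amp; readers&lt;/p&gt;</description>

# Parsing the fragment recovers the original markup as plain text.
print(ET.fromstring(fragment).text)
# <p>Feeds & readers</p>
```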