Importing Legacy Data in a WebDSL Application

April 30, 2011

WebDSL has been designed for greenfield development. That is, the database schema for an application is derived from the data model defined with entity declarations. There are no provisions for mapping such a data model to a legacy database schema. This has been done consciously so that we have full control over (optimization of) the database schema and do not have to deal with all sorts of corner cases, or schemas that do not map naturally to WebDSL data models. But once in a while, one comes across some legacy data.

My original blog was a PHP Serendipity application that I had running at a hosting service. Since it died because of some changes in the PHP setup I had meant to take out the posts and convert them to my new WebDSL blog application. Today, I bit the bullet and went ahead with the conversion.

Getting the data out of the mysql database of the hosting service turned out to be less problematic than I had foreseen. The result was a bunch of mysql tables, with only one that really mattered. The ‘serendipity_entries’ table has the following shape:

mysql> show columns from serendipity_entries;
+-------------------+----------------------+------+-----+---------+----------------+
| Field             | Type                 | Null | Key | Default | Extra          |
+-------------------+----------------------+------+-----+---------+----------------+
| id                | int(11)              | NO   | PRI | NULL    | auto_increment | 
| title             | varchar(200)         | YES  | MUL | NULL    |                | 
| timestamp         | int(10) unsigned     | YES  | MUL | NULL    |                | 
| body              | text                 | YES  |     | NULL    |                | 
| comments          | int(4) unsigned      | YES  |     | 0       |                | 
| trackbacks        | int(4) unsigned      | YES  |     | 0       |                | 
| extended          | text                 | YES  |     | NULL    |                | 
| exflag            | int(1)               | YES  |     | NULL    |                | 
| author            | varchar(20)          | YES  |     | NULL    |                | 
| authorid          | int(11)              | YES  | MUL | NULL    |                | 
| isdraft           | enum('true','false') | NO   | MUL | true    |                | 
| allow_comments    | enum('true','false') | NO   |     | true    |                | 
| last_modified     | int(10) unsigned     | YES  | MUL | NULL    |                | 
| moderate_comments | enum('true','false') | NO   |     | true    |                | 
+-------------------+----------------------+------+-----+---------+----------------+

In order to convert the entries in the legacy table to the ‘Post’ format of the new application, I used the following two stage procedure that I also used for importing DBLP data into researchr. The first step is to define a WebDSL entity that reflects the structure of the original table:

entity SerendipityEntry {
  key           :: Int
  title         :: String (name)
  timestamp     :: DateTime
  last_modified :: DateTime
  body          :: WikiText
  extended      :: WikiText 
  isdraft       :: Bool
  converted     :: Bool (default=false)
}

WebDSL generates a database table for the entity with the following schema:

mysql> show columns from _SerendipityEntry;
+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| DISCRIMINATOR    | varchar(255) | NO   |     | NULL    |       | 
| id               | varchar(32)  | NO   | PRI | NULL    |       | 
| _body            | longtext     | YES  |     | NULL    |       | 
| _converted       | bit(1)       | YES  |     | NULL    |       | 
| _extended        | longtext     | YES  |     | NULL    |       | 
| _key             | int(11)      | YES  |     | NULL    |       | 
| _last_modified   | datetime     | YES  |     | NULL    |       | 
| _timestamp       | datetime     | YES  |     | NULL    |       | 
| _title           | varchar(255) | YES  |     | NULL    |       | 
| version_opt_lock | int(11)      | YES  |     | NULL    |       | 
| _isdraft         | bit(1)       | YES  |     | NULL    |       | 
+------------------+--------------+------+-----+---------+-------+

The next step is to convert the rows in the old table to rows in the new table using an SQL query. This requires adding the discriminator, that is used to distinguish the subtype in a class hierarchy; a UUID as primary key, and converting the dates from timestamp format to DateTimes.

insert into _SerendipityEntry 
  select 
    'SerendipityEntry' as _DISCRIMINATOR, 
    replace(UUID(),"-","") as id, 
    body as _body, 
    false as _converted, 
    extended as _extended, 
    id as _key, 
    from_unixtime(last_modified) as _last_modified, 
    from_unixtime(timestamp) as _timestamp, 
    title as _title, 
    0 as version_opt_lock, 
    IF(isdraft = 'true', 1, 0) as _isdraft 
  from serendipity_entries;

Now the legacy data are in a form that they can be easily processed using WebDSL code, which makes it more convenient to define the conversion to the proper ‘Post’ type that is used in the blog application. To that end, I add a convert function to the SerendipityEntry entity:

extend entity SerendipityEntry {
  function convert(public: Bool): Post { 
    var p := mainBlog().addPost();
    p.title := title;
    p.created := timestamp; 
    p.modified := last_modified; 
    p.content := body;
    p.extended := extended;
    p.public := public;
    p.commentsAllowed := false;
    converted := true;
    return p;
  }
}

The ‘addPost’ function takes care of properly creating a new post to the main blog. (The application has been written to support multiple blogs.)

The conversion function can be run automatically on all SerendipityEntry objects. However, I chose to write a little UI to do this interactively, showing a list of all unconverted posts with a link to a page showing the data of the old entry with a convert button. This way I can decide which posts to include in the new blog.

define page serendipityIndex() {
  main{ 		
    list{
      for(e: SerendipityEntry order by e.timestamp asc) {
        listitem{ output(e) }
      }
    }
  }    
}
  
define page serendipityEntry(e: SerendipityEntry) {
  action convertPublic() { return post(e.convert(true),""); }
  action convertDraft() { return post(e.convert(false),""); }
  main{
    formEntry("Key"){ output(e.key) }
    formEntry("Title"){ output(e.title) }
    formEntry("Timestamp"){ output(e.timestamp) }
    formEntry("Last modified"){ output(e.last_modified) }
    formEntry("Body"){ output(e.body) }
    formEntry("Extended"){ output(e.extended) }
    formEntry("Is Draft"){ output(e.isdraft) }
    formEntry("Converted"){ output(e.converted) }
    submit convertPublic() { "Convert to Public Post" }
    submit convertDraft() { "Convert to Draft" }
  }
}