Preventing injection attacks with syntax embeddings

June 10, 2009

Our paper on StringBorg is being published in Science of Computer Programming:

M. Bravenboer, E. Dolstra, and E. Visser. Preventing Injection Attacks with Syntax Embeddings. A Host and Guest Language Independent Approach. Science of Computer Programming, 2009.

StringBorg is a technique for safely embedding ‘string’ languages in general-purpose languages, so as to avoid injection attacks.

The paradigmatic example is the embedding of SQL queries, which is typically done using string literals, as in the following example:

  String userName = getParam("userName");
  String password = getParam("password");
  String query = "SELECT id FROM users "
               + "WHERE name = '" + userName + "' "
               + "AND password = '" + password + "'";
  if (executeQuery(query).size() == 0)
    throw new Exception("bad user/password");

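With this approach it is very easy to forget to escape SQL meta-characters in the values obtained from the client. This opens the door to an attack: a client can supply a value containing a quote character that ends the string literal early, so that the rest of the value is interpreted as SQL rather than as data and the query escapes from the structure the programmer intended.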
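To make the attack concrete, here is a minimal sketch that builds the query exactly as in the example above and prints the result for a benign and a malicious password. The class name `InjectionDemo`, the method `buildQuery`, and the sample inputs are illustrative assumptions, not code from the paper.

```java
// Hypothetical demo class: reproduces the unsafe concatenation from the example above.
class InjectionDemo {
    // Builds the query by plain concatenation, with no escaping of the inputs.
    static String buildQuery(String userName, String password) {
        return "SELECT id FROM users "
             + "WHERE name = '" + userName + "' "
             + "AND password = '" + password + "'";
    }

    public static void main(String[] args) {
        // Benign input: the query has the structure the programmer intended.
        System.out.println(buildQuery("alice", "secret"));

        // Malicious input: the leading quote closes the password literal,
        // and the remainder is parsed as an extra OR condition.
        System.out.println(buildQuery("alice", "' OR '1'='1"));
    }
}
```

The second query ends in `AND password = '' OR '1'='1'`. Because AND binds more tightly than OR in SQL, the condition is true for every row, so the check for an empty result no longer rejects the login.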

StringBorg prevents such attacks by syntactically embedding the query language in the host language. For example, the query above can then be written as follows:

  SQL q = <| SELECT id FROM users
                    WHERE name = ${userName} AND password = ${password} |>;
  if (executeQuery(q.toString()).size() == 0) ...

Now the syntax of the query is checked statically. More importantly, at run time the query is constructed through a query API that ensures that the constructed query has the same syntactic structure as the one written by the programmer. Furthermore, the API enforces escaping of meta-characters in values spliced into the query, thus guaranteeing that no injection attacks can occur.
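The idea of a query API that fixes the structure and escapes spliced values can be sketched by hand as follows. This is only an illustration under stated assumptions: `SqlSketch`, `fragment`, `stringValue`, and `SqlSketchDemo` are hypothetical names, not the API that StringBorg generates, and the escaping rule (doubling single quotes inside a string literal) is the standard SQL convention, which a particular database may extend with further meta-characters.

```java
// Hypothetical stand-in for a generated query API; not StringBorg's actual output.
final class SqlSketch {
    private final StringBuilder text = new StringBuilder();

    // Appends a fixed fragment written by the programmer (trusted).
    SqlSketch fragment(String sql) {
        text.append(sql);
        return this;
    }

    // Splices an untrusted value as exactly one SQL string literal.
    // Assumption: doubling single quotes is sufficient for the target database.
    SqlSketch stringValue(String value) {
        text.append('\'').append(value.replace("'", "''")).append('\'');
        return this;
    }

    @Override
    public String toString() {
        return text.toString();
    }
}

class SqlSketchDemo {
    // The structure of the query is fixed here; inputs can only fill the two literals.
    static String loginQuery(String userName, String password) {
        return new SqlSketch()
            .fragment("SELECT id FROM users WHERE name = ")
            .stringValue(userName)
            .fragment(" AND password = ")
            .stringValue(password)
            .toString();
    }

    public static void main(String[] args) {
        // The quotes in the malicious input are doubled, so it stays inside the literal.
        System.out.println(loginQuery("alice", "' OR '1'='1"));
    }
}
```

With the malicious password, every quote in the input is doubled and the whole value remains a single string literal, so the query keeps the two-condition structure the programmer wrote. Unlike this hand-written sketch, the embedded syntax lets the query be written directly in SQL and have its syntax checked statically.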

The paper does not just provide a solution for embedding SQL in Java, but offers a generic approach for embedding any guest language in any host language with little more effort than providing syntax definitions for host and guest language.

Abstract: Software written in one language often needs to construct sentences in another language, such as SQL queries, XML output, or shell command invocations. This is almost always done using unhygienic string manipulation, the concatenation of constants and client-supplied strings. A client can then supply specially crafted input that causes the constructed sentence to be interpreted in an unintended way, leading to an injection attack. We describe a more natural style of programming that yields code that is impervious to injections by construction. Our approach embeds the grammars of the guest languages (e.g. SQL) into that of the host language (e.g. Java) and automatically generates code that maps the embedded language to constructs in the host language that reconstruct the embedded sentences, adding escaping functions where appropriate. This approach is generic, meaning that it can be applied with relative ease to any combination of context-free host and guest languages.