One of the ingredients of the Concepts of Programming Languages course I teach at TU Delft, is an introduction to the C language. In two lectures I went through the language, emphasizing functions, structs, pointers, and memory management based on Kernighan & Ritchie and a nice piece by Nick Parlante on Pointers and Memory. The students were supposed to be able to understand basics of expressions and control-flow from exposure to Java and Scala.
For the lab assignment, William Cook had given me the idea of expressing dynamic dispatch in a procedural language as a method (no pun intended) to understand both the OO mechanism and appreciate the difference (and hard work) of memory management in C. The idea seemed simple enough, but while understanding the idea conceptually, I had never actually done the exercise myself. Searching the web did not result in a whole lot of information, but for this example. In case anyone needs inspiration for their course, here is the text of the assignment I gave the students in which I explain the code in the example and provide a slightly larger Java program for them to translate. (Corrections of my poor understanding of dynamic dispatch or C are welcome as well.)
(Thanks to pandoc for helping convert LaTeX to Markdown.)
C is a procedural language. C programs consists, roughly, of data structure declarations and functions working on those data structures. C is not an object-oriented language. This means that functions are not associated with objects, as methods are in Java.
In this assignment we manually translate a small Java class hiearchy to Java. This will teach us something about Java and it will teach us many aspects of C.
The program below defines a small Java class hierarchy with two classes
Base
and Child
with the latter a subclass of the former. Class
Base
defines a method print
, which is overridden by class Child
.
Class BaseTest
creates a Base
and a Child
object, but both are
assigned to a variable of type Base
. When calling the method print
the program decides at run-time which implementation to run based on
the run-time type of the object.
class Base {
Integer x;
public Base(Integer v) {
x = v;
}
public void print() {
System.out.println("Base: " + x);
}
}
class Child extends Base {
Integer y;
public Child(Integer v1, Integer v2) {
super(v1);
y = v2;
}
public void print() {
System.out.println("Child: (" + x + "," + y + ")");
}
}
class BaseTest {
public static void main(String[] args) {
Base base1 = new Base(45);
Base base2 = new Child(567, 245);
base1.print();
base2.print();
}
}
The C program below contains a translation of the Java program. It uses a so called v-table or virtual method table to support run-time look up of a function depending on its ‘type’. Lets examine the code.
In C we declare data structures as struct
s:
struct Base {
int x;
};
This defines a new record type with field x
. Using a typedef
we
use the struct to declare the Base
type:
typedef struct Base {
int x;
} Base;
A variable of type Base
will thus contain a record with one field. The
type Base*
is a pointer to a Base
record.
The C language does not support the associations of methods with structs. Rather, functions are defined at top-level. In order to translate methods that belong to particular type, we adopt a naming convention, using the name of the class as prefix of the function implementing a method:
void Base_print(Base* obj) {
printf("Base: (%d)\n", obj->x);
}
Further, note that methods are applied to an object. C functions only
have regular parameters. Thus, the Base_print
function has one
argument, which corresponds to the object it is applied to.
In order to label a particular function as the implementation of a
method for a particular object we use a virtual method table or
vtable. This is enabled by the fact that C allows to store
function pointers. For example, given the declaration of function
Base_print
above, &Base_print
is a pointer to that
function. A vtable is simply an array of function pointers. The slots
in the array correspond to the methods of of a class. Since methods are
fixed per class, we do not have to maintain an array of functions
for each object. Rather, it suffices to declare one vtable
array
per class. Thus, the following global variable Base_Vtable
stores the vtable for the Base
class:
void (*Base_Vtable[])() = { &Base_print };
The expression { &Base_print }
constructs an array literal (of length 1).
In order to associate the vtable with an object, we extend the struct with a vtable field containing a pointer to an array of functions:
typedef struct Base {
void* (*vtable[])();
int x;
} Base;
C does not provide built-in support for constructing objects. Thus, we
have to define functions that correspond to the constructors of a class.
These functions have to do explicit memory allocation to create a
chunk of memory reserved for an object. We use the malloc
function to
allocate a chunk of memory the size of the Base
struct. As a
convention we create a constructor newC
for a class C
. Thus, for the
Base
class we define:
Base* newBase(int v) {
Base* obj = (Base*)malloc(sizeof(Base));
obj->vtable = Base_Vtable;
obj->x = v;
return obj;
}
The vtable for the Base
class is associated with the object by
assigning it to the vtable
field.
For the Child
subclass of the Base
class we go through the same
steps as we did for the Base
class, that is
define a typedef struct Child
to represent objects of type Child
include a vtable
field as first field to associate methods with
the objects of the type
implement the method(s) of the class as top-level functions with
Child_
as prefix for the function name
declare a global Child_Vtable
variable to contain an array
of function pointers with a pointer for each method
define a newChild
function representing the constructor for the
class
Note that the x
field is inlined in the Child
struct. Since C does
not support inheritance, we have to copy the fields of the superclass.
(We could try to deal with that with macros, but we will consider C
without macros in this course.)
Finally, we arrive at the magic moment. For each unique method in the
class hierarchy we define one general function that we can call for all
objects in the hierarchy. The function uses the vtable
field of the
object to dispatch to the concrete method implementation:
void print(Base* obj) {
obj->vtable[0](obj);
}
The expression obj->vtable[0]
is an access of the 0th position of the
vtable
field of the object obj
. The result is a function pointer,
which is applied to obj
. The main
function
puts this to the test:
int main() {
Base* base = newBase(45);
Base* child = (Base*)newChild(567, 245);
print(base);
print(child);
}
The function creates a base
and a child
object, of type Base
and
Child
, respectively. However, both variables are declared as type
Base*
. Then the same function print
is applied to both objects.
Since base
is constructed with newBase
, its vtable is
Base_Vtable
and print
dispatches to Base_print
.
The object child
is constructed with newChild
, thus its vtable is
Child_Vtable
and print
dispatches to Child_print
.
In general, the function called is determined at runt-time based on the concrete instance.
To abstract from the exact position in the vtable we can use an enumeration with symbolic names for the functions:
enum { Call_print };
void print(Base* obj) {
obj->vtable[Call_print](obj);
}
Virtual method table in C: http://www.daniweb.com/software-development/c/threads/228277
Dynamic dispatch: http://en.wikipedia.org/wiki/Dynamic_dispatch
#include <stdio.h>
#include <stdlib.h>
/* class Base */
typedef struct Base {
void (**vtable)();
int x;
} Base;
void Base_print(Base* obj) {
printf("Base: (%d)\n", obj->x);
}
void (*Base_Vtable[])() = { &Base_print };
Base* newBase(int v) {
Base* obj = (Base*)malloc(sizeof(Base));
obj->vtable = Base_Vtable;
obj->x = v;
return obj;
}
enum { Call_print };
void print(Base* obj) {
obj->vtable[Call_print](obj);
}
/* class Child */
typedef struct Child {
void (**vtable)();
int x; // inherited from Base
int y;
} Child;
void Child_print(Child const* obj) {
printf("Child: (%d,%d)\n", obj->x, obj->y);
}
void (*Child_Vtable[])() = { &Child_print };
Child* newChild(int v1, int v2) {
Child* obj = (Child*)malloc(sizeof(Child));
obj->vtable = Child_Vtable;
obj->x = v1;
obj->y = v2;
return obj;
}
/* test it */
int main() {
Base* base = newBase(45);
Base* child = (Base*)newChild(567, 245);
print(base);
print(child);
}
Translate the following Java program to a C program using the method above.
abstract class Expr {
abstract public String format();
public Integer precedence() { return 0; }
public Boolean higherPrecedence(Expr that) {
if(precedence() != 0 && that.precedence() != 0) {
return this.precedence() > that.precedence();
}
return false;
}
}
class Var extends Expr {
public String name;
public Var(String name) { this.name = name; }
public String format() { return name; }
}
class Number extends Expr {
public Integer value;
public Number(Integer value) { this.value = value; }
public String format() { return value.toString(); }
}
abstract class BinOp extends Expr {
Expr left;
Expr right;
public BinOp(Expr l, Expr r) {
left = l;
right = r;
}
abstract public String operator();
public String format() {
String l = left.format();
String r = right.format();
if(higherPrecedence(left) ) { l = "(" + l + ")"; }
if(higherPrecedence(right)) { r = "(" + r + ")"; }
return l + " " + operator() + " " + r;
}
}
class Plus extends BinOp {
public String operator() { return "+"; }
public Integer precedence() { return 10; }
public Plus(Expr l, Expr r){ super(l, r); }
}
class Mul extends BinOp {
public String operator() { return "*"; }
public Integer precedence() { return 20; }
public Mul(Expr l, Expr r) { super(l, r); }
}
class ExprTest {
public static void main(String[] args) {
Expr test = new Mul(new Number(42), new Plus(new Number(3), new Var("x")));
System.out.println(test.format());
}
}