It all starts with CONS

Introduction

My son wrote a post List Processing in C which he noted can be used down the road to implement a Lisp interpreter. Which got me thinking, how difficult would it be to write a list processor, a la Lisp, in C?

Writing in C

My first challenge was to write in C, since I haven’t done any C development in over 20 years. Things aren’t quite as bad as they sound though—I was an excellent C developer who transitioned from C to C++ and spent about 10 years writing C++ and then transitioned to Java roughly 10 years ago.

The challenge would be writing without objects—which as it turned out would have been very useful. But no objects for me. Back to manual memory management, which took some getting used to again especially since you can’t really on destructors or garbage collectors to take care of that for you. After working in C for a half hour or so it was amazing how all the idioms came back to me. I guess you really never forget a language once you’ve mastered it.

Creating a CONS

I hadn’t read John McCarthy’s seminal paper on Lisp prior to writing my List processor and I’m by no means a Lisp expert. But I’d already suspected that everything in Lisp starts with CONS.

So what is a CONS?

A CONS is a simple data structure containing two pointers, named CAR and CDR:

CONS Data Structure

See McCarthy’s paper for the explanation of how these pointers got to be so named.

In my first implementation my CAR pointed to a character and my CDR pointed to another CONS or NIL. I could create the string “Hello” like so:

CONS Hello

There’s a couple of things that bothered me with this implementation. The first problem is the only kind of complex data structure I could create is a list. In Lisp a list is a form of a CONS, but not all CONSes are lists. A CONS is merely the means for creating lists. In other words the CAR and CDR pointers aren’t restricted in what they can point to. The second problem is this isn’t very symbolic. Lisp is known to be a symbolic programming language, yet there are no symbols here. These problems are too large to ignore even for a toy implementation.

Symbols

A symbol can be thought of as a label affixed to a data structure containing slots for a value and for a function. A symbol therefore is a means for naming and referring to a value or function in Lisp. The symbol usage context determines which item is being referenced.

Lisp Symbol

So do the CAR and CDR pointers of the CONS point to symbols? No. In Lisp the CAR and CDR pointers can point to CONSes, characters, strings, functions, keywords and a host of other items that are not symbols.

A more intriguing question to me, from an implementation standpoint, was whether the value of a symbol could also be a symbol? If so then my implementation could have the CAR and CDR pointers referring to a value structure, which in turn could be a CONS, symbol or simple value type.

So I ran the following in CLISP:

(setq a "symbol a")
(setq b "symbol b")
(setf (symbol-function 'a) (lambda() (format t "Symbol A~%")))
(setf (symbol-function 'b) (lambda() (format t "Symbol B~%")))

Invoking (a) yields “Symbol A” as expected and invoking (b) yields “Symbol B” as expected.

Suppose we now do the following:

(setf (symbol-value 'a) 'b)
(funcall (symbol-function (symbol-value 'a)))

What will get printed? If you guessed “Symbol B” then you’re correct. This means we can assign a symbol value to be another symbol. A symbol is a kind-of value so-to-speak. Meaning my CAR and CDR pointers should be pointing to values, of which a symbol (and CONS) is merely a kind-of value.

Values

Implementation:

typedef struct _VALUE {
  union {
    long integer;
    double real;
    char character;
    STRING string;
    void* cons;
    void* symbol;
  } value;

  int discriminant;
} VALUE, *VALUE_PTR;

There’s several helper functions for creating new VALUES:

extern VALUE_PTR MAKE_NIL_VALUE();
extern VALUE_PTR MAKE_INTEGER_VALUE(long value);
extern VALUE_PTR MAKE_REAL_VALUE(double value);
extern VALUE_PTR MAKE_CHARACTER_VALUE(char value);
extern VALUE_PTR MAKE_STRING_VALUE(STRING value);
extern VALUE_PTR MAKE_CONS_VALUE(void* value);
extern VALUE_PTR MAKE_SYMBOL_VALUE(void* value);

Freeing a VALUE (remember in C we’re responsible for our own memory management):

extern void FREE_VALUE(VALUE_PTR v);

Printing a VALUE:

extern void PRINT_VALUE(VALUE_PTR v);

And for obtaining a VALUE and determining the type of a VALUE:

extern long GET_INTEGER_VALUE(VALUE_PTR v);
extern double GET_REAL_VALUE(VALUE_PTR v);
extern char GET_CHARACTER_VALUE(VALUE_PTR v);
extern STRING GET_STRING_VALUE(VALUE_PTR v);
extern void* GET_CONS_VALUE(VALUE_PTR v);
extern void* GET_SYMBOL_VALUE(VALUE_PTR v);

extern BOOL IS_VALUE_NIL(VALUE_PTR v);
extern BOOL IS_VALUE_INTEGER(VALUE_PTR v);
extern BOOL IS_VALUE_REAL(VALUE_PTR v);
extern BOOL IS_VALUE_CHARACTER(VALUE_PTR v);
extern BOOL IS_VALUE_STRING(VALUE_PTR v);
extern BOOL IS_VALUE_CONS(VALUE_PTR v);
extern BOOL IS_VALUE_SYMBOL(VALUE_PTR v);

There’s a couple of things to notice in this implementation. VALUES are created using the MAKE_*_VALUE functions. These functions handle the allocation of the VALUE object, setting the desired value, and setting the type discriminant. The type discriminant is how we subsequently determine the kind of value this is.

Since symbols have values and the CAR and CDR pointers of CONSes point to values, the definition of these items need the definition of VALUE. We therefore have a circular reference. I need a VALUE to define a SYMBOL but I need a SYMBOL to define a VALUE. The same issue is encountered when defining CONS. I circumvent these circular reference by using void* for the SYMBOL and CONS in implementation of VALUE.

Finally, note that a PRINT_VALUE function is provided for printing VALUES. This function has the responsibility for printing the values in such a manner that a REPL could process the output as input and reconstruct the VALUE. There will be more to say about this later.

Symbols

Once we have values, symbols are straightforward to implement:

typedef struct _SYMBOL {
  STRING symbolName;            // symbol name, a means for referencing the symbol
  VALUE_PTR symbolValue;        // symbol value, defaults to being same as name
} SYMBOL, *SYMBOL_PTR;

Notice that I haven’t defined a slot for a function. This implementation is a list processor at this time, should I desire at a future point in time to be able to define and execute functions then I’d have to come back and add a slot for a function in this structure.

I’ve defined several methods for creating and working with symbols:

extern VALUE_PTR MAKE_SYMBOL(STRING symbolName);
extern VALUE_PTR SETQ(VALUE_PTR symbol, VALUE_PTR symbolValue);
extern STRING SYMBOL_NAME(VALUE_PTR symbol);
extern VALUE_PTR SYMBOL_VALUE(VALUE_PTR symbol);
extern void PRINT_SYMBOL(VALUE_PTR symbol);
extern void FREE_SYMBOL(SYMBOL_PTR symbol);

Notice that since a SYMBOL is-a VALUE, MAKE_SYMBOL returns a VALUE_PTR, not a SYMBOL_PTR. Likewise all the other helper methods accept a VALUE_PTR and not a SYMBOL_PTR. The downside to this approach is these methods must check that an actual SYMBOL has been provided, however this is easy to accomplish using the IS_VALUE_SYMBOL method we saw earlier. Plus only the symbol implementation methods ever need to work with the innards of a SYMBOL, all usage outside of the implementation is working with a VALUE referencing a SYMBOL. This greatly simplifies the API and I believe leads to fewer usage errors.

CONS

Now we’re finally ready to implement the CONS:

typedef struct _CONS {
  VALUE_PTR car;
  VALUE_PTR cdr;
  BOOL _isList;
} CONS_CELL, *CONS_PTR;

That’s it!

One of the decisions I made that simplifies the implementation of CONS is to create a NIL value. This means a CAR or CDR are never NULL—they instead may refer to NIL values. This greatly simplifies things, you know that it’s always an error to encounter a NULL pointer. It also allows you to properly handle lists of NIL—which yes, Lisp allows.

The other item to note in this implementation is the _isList flag. A list is a CONS whose CDR points to another CONS or NIL, which is the last CONS in the list. For example we saw above how “Hello” was implemented as a list. But this also means “ello”, “llo”, “lo” and “o” are lists too. Rather than walk the list to determine if the criteria are met for being a list, a flag is kept instead. This makes the implementation faster.

This is important information to have because lists and CONSes are printed differently. Since the evaluator takes as input what the print puts as output we need to ensure we’re handling the printing of lists and CONSes appropriately.

CONS also has methods for printing and freeing:

extern void PRINT_CONS(VALUE_PTR cons);
extern void FREE_CONS(CONS_PTR cons);

Since a CONS is-a VALUE, when printing a VALUE if it determines the value is a CONS it invokes the PRINT_CONS method. The same is true when freeing a VALUE, if it’s a CONS it will free the CONS. This makes working with CONSes uniform and less error-prone: you’re always working with a VALUE.

In Lisp, the function CONS is used to create a CONS, the function CAR is used to retrieve the CAR of the CONS, and the function CDR is used to retrieve the CDR of the CONS. We’re now ready to implement these methods:

/*                                                                                                                
  Create a new CONS given both the CAR and CDR             
*/
VALUE_PTR CONS(VALUE_PTR car, VALUE_PTR cdr) {
  assert(car != NULL);
  assert(cdr != NULL);

  CONS_PTR cons = (CONS_PTR) malloc(sizeof(CONS_CELL));
  cons->car = car;
  cons->cdr = cdr;

  // determine if this CONS is a LIST                       
  if (IS_VALUE_NIL(cdr)) {
    // if CDR is NULL, then this CONS is a LIST             
    cons->_isList = TRUE;
  } else if (IS_VALUE_CONS(cdr)) {
    // appending a CONS to a LIST makes a LIST              
    cons->_isList = ((CONS_PTR)GET_CONS_VALUE(cdr))->_isList;
  } else {
    // CDR is not NIL and is not CONS, this is not a LIST    
    cons->_isList = FALSE;
  }
  return MAKE_CONS_VALUE(cons);
}


/*                                                                                                                
  Return the CAR of the CONS                                
*/
VALUE_PTR CAR(VALUE_PTR cons) {
  assert(cons != NULL);

  if (IS_VALUE_CONS(cons)) {
    return ((CONS_PTR)GET_CONS_VALUE(cons))->car;
  } else {
    return NULL;
  }
}


/*                                                                                                                
  Return the CDR of the CONS                                
*/
VALUE_PTR CDR(VALUE_PTR cons) {
  assert(cons != NULL);

  if (IS_VALUE_CONS(cons)) {
    return ((CONS_PTR)GET_CONS_VALUE(cons))->cdr;
  } else {
    return NULL;
  }
}

Putting it All Together

We now have the implementation of the basic Lisp functions CONS, SETQ, CAR and CDR along with the means for creating and printing SYMBOLS and VALUES. We’re now set to do a lot of list processing!

The following code samples demonstrate some simple list processing. The graph after each sample depicts what the CONSes look like in memory.

// (1 2 3)                                              
VALUE_PTR list = CONS(MAKE_SYMBOL("1"), CONS(MAKE_SYMBOL("2"), CONS(MAKE_SYMBOL("3"), MAKE_NIL_VALUE())));
PRINT_VALUE(list);
printf("\n");

(1 2 3)

// (2 3)                                                
PRINT_VALUE(CDR(list));
printf("\n");

(2 3)

// (3)                                                
PRINT_VALUE(CDR(CDR(list)));
printf("\n");

(3)

// NIL                                                
PRINT_VALUE(CDR(CDR(CDR(list))));
printf("\n");

Simply yields the value NIL.

We can retrieve any of the values in the list as the following code demonstrates:

// 1
PRINT_VALUE(CAR(list));
printf("\n");

// 2
PRINT_VALUE(CAR(CDR(list)));
printf("\n");

// 3      
PRINT_VALUE(CAR(CDR(CDR(list))));
printf("\n");

This is starting to look a lot like Lisp!

More Complex Expressions

We can get more advanced in our list processing as the following samples demonstrate. Again, the graph after each sample depicts what the CONSes look like in memory.

// (NIL)
PRINT_VALUE(CONS(MAKE_NIL_VALUE(), MAKE_NIL_VALUE()));
printf("\n");

(NIL)

Perhaps now you’re beginning to understand why using a NIL value rather than a NULL pointer simplified the implementation and usage?

// ((1))
PRINT_VALUE(CONS(CONS(MAKE_SYMBOL("1"), MAKE_NIL_VALUE()), MAKE_NIL_VALUE()));
printf("\n");

((1))

// (((1)))
PRINT_VALUE(CONS(CONS(CONS(MAKE_SYMBOL("1"), MAKE_NIL_VALUE()), MAKE_NIL_VALUE()), MAKE_NIL_VALUE()));
printf("\n");

(((1)))

// (1 (2 (3)))
VALUE_PTR embeddedList = CONS(MAKE_SYMBOL("1"), CONS(CONS(MAKE_SYMBOL("2"), CONS(CONS(MAKE_SYMBOL("3"), MAKE_NIL_VALUE()), MAKE_NIL_VALUE())), MAKE_NIL_VALUE()));
PRINT_VALUE(embeddedList);
printf("\n");

(1 (2 (3)))

This is a bit of a mind-bender but look at the code and look at the graph and you should start seeing how this is working.

See if you understand why the following code works. Hint: Use the definition of a list and the previous graph to follow the CAR and CDR pointers:

// (2 (3))
PRINT_VALUE(CAR(CDR(embeddedList)));
printf("\n");

// (3)
PRINT_VALUE(CAR(CDR(CAR(CDR(embeddedList)))));
printf("\n");

Our list processor is complete and it has the same behavior as the list processor in Lisp!

What’ Next?

It’d be nice to have a REPL (read-evaluate-print-loop) for entering expressions. This way you could enter CONSes in their printed form, which is really convenient and much less confusing than entering them programatically. For example it would be much easier to enter (1 (2 (3))) for embeddedList from the above samples. Look at how much simpler that is!

The second problem is a lack of expressiveness. Once a CONS is created there’s no means for modifying it. On the one hand this makes the data structures immutable, but on the other hand there’s a lack of flexibility. We’d really like to have both, as implemented for example by Clojure.

Of course there’s no way to define and evaluate functions, meaning this is no Lisp. I’ve been focusing solely on the list processor aspect of Lisp. Adding the ability to define and evaluate functions would not be a trivial undertaking, but you would have a fully-functional Lisp at that point. It’s something to think about.

Getting the Code

All the source for this list processor is available on GitHub.

Happy Hacking!

Advertisements

About taylodl

I'm an enterprise architect/developer who enjoys programming, math, music, physics, martial arts and beer
This entry was posted in Development, Lisp and tagged , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s