A (Programming) Language of My Own


I’ve long enjoyed learning programming languages and I try to learn at least one new language every year. Past years have seen mainstream choices like Kotlin or Rust, or just playing around with something esoteric like Rockstar.

This year I was going to (re)learn Lisp. It’s long held a fascination for me, with its legendary expressive flexibility and unusual syntax (at least, by C-family standards)! But aside from an AI course in college, I haven’t really used it.

Unfortunately - at least, for my plans - the more I learned about S-Expressions and Lisp’s code-as-data structuring, the more curious I became about writing a language of my own.

Thus: Weave - a new language I’m slowly building an interpreter for.

When I set out I had vague intentions of writing my own Lisp, but using a more familiar C-style syntax. In all wide-eyed innocence and with the grandest of visions, I investigated all the usual suspects - YACC and Bison, LLVM, SICP, the Dragon Book, etc. - and quickly learned a few lessons:

1. Don’t Bite Off More Than I Can Chew

Ha - turns out there’s an ocean of knowledge out there about writing languages. How are variables implemented? How about the call stack - or do you even have a call stack? Objects and Classes? A formal type system? How about Garbage Collection?

So I’m keeping the scope small! Weave is a small language, focused on a specific type of task: reading, transforming, and writing files in common data formats. Essentially, it exists to make local “ETL”-style tasks easy to write: open a couple of JSON files, combine their records into a new format, then write the resulting data as a CSV. All in a few lines of Weave!

… At least, that’s the intention!

While still letting Weave do useful work, this narrow scope also excludes a lot of language features that a more “grown up” language would need to implement. For instance, Weave has no Classes or user-defined types. Those would be kind of nice to have, but they add complexity that I don’t need to tackle right now.

2. Interpreters are easier than Compilers

The instruction sets of modern processors are vast and complex, with many details which won’t really help me as I set out to implement Baby’s First Programming Language. Instead, I can take advantage of a host-language that already has all the system-level hooks I need for Weave - for instance, file access - and focus on implementing the core functionality.

As it turns out, I’m implementing Weave as a bytecode VM, so I do still compile Weave source to bytecode. But that lets me build a very simple compiler targeted at a “cpu” whose instruction set is exactly what I need, no more, no less.
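To make the “cpu with only the instructions I need” idea concrete, here is a minimal sketch of a stack-based bytecode VM in Rust. The Op variants and the run function are hypothetical names made up for this illustration; they are not Weave’s actual instruction set or internals.

```rust
// Hypothetical instruction set for illustration only; not Weave's real opcodes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Push(i64), // push a constant onto the stack
    Add,       // pop two values, push their sum
    Mul,       // pop two values, push their product
    Return,    // stop and yield the top of the stack
}

/// Runs a chunk of bytecode on a simple value stack.
/// Returns None if the stack underflows or the program never returns.
fn run(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    for op in code {
        match *op {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a * b);
            }
            Op::Return => return stack.pop(),
        }
    }
    None
}

fn main() {
    // Bytecode a compiler might emit for the expression (1 + 2) * 4.
    let program = [
        Op::Push(1),
        Op::Push(2),
        Op::Add,
        Op::Push(4),
        Op::Mul,
        Op::Return,
    ];
    println!("{:?}", run(&program));
}
```

The whole “processor” is one match statement, and the host language handles memory and I/O, which is roughly why this route is gentler than emitting real machine code.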

3. Lisp syntax is what gives Lisp its power

Ha - this one is mostly a case study in hubris: thinking that the benefits of Lisp were somehow separable from its syntax. Rather, I have learned (as, I imagine, many before me also have) that the S-expression syntax of Lisp is a core part of its power. Those parentheses group code so unambiguously that parsing Lisp programs is, if not trivial, at least much more straightforward than parsing C-style expressions.
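As a rough illustration of how little machinery S-expressions need, here is a sketch of a complete reader in Rust. The Sexp type and the tokenize and parse functions are assumptions invented for this example (they are not taken from Weave or from any real Lisp), and the sketch ignores strings, quoting, and comments.

```rust
// A parsed S-expression: either an atom or a list of nested expressions.
#[derive(Debug, PartialEq)]
enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

/// Splits source text into "(", ")", and atom tokens.
fn tokenize(src: &str) -> Vec<String> {
    src.replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(String::from)
        .collect()
}

/// Parses one expression starting at tokens[pos].
/// Returns the expression and the position just past it,
/// or None on a stray ")" or an unclosed list.
fn parse(tokens: &[String], pos: usize) -> Option<(Sexp, usize)> {
    match tokens.get(pos)?.as_str() {
        "(" => {
            let mut items = Vec::new();
            let mut pos = pos + 1;
            // Collect nested expressions until the matching ")".
            while tokens.get(pos)?.as_str() != ")" {
                let (item, next) = parse(tokens, pos)?;
                items.push(item);
                pos = next;
            }
            Some((Sexp::List(items), pos + 1))
        }
        ")" => None,
        atom => Some((Sexp::Atom(atom.to_string()), pos + 1)),
    }
}

fn main() {
    let tokens = tokenize("(reduce add (1 2 3))");
    if let Some((tree, _)) = parse(&tokens, 0) {
        println!("{:?}", tree);
    }
}
```

There is no operator precedence or statement grammar to handle: the parentheses already spell out the tree, so the parser only has to follow them.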

Lisp code is also homoiconic - there is literally no difference in syntactic structure between Lisp code and Lisp data. If you want to pass a function around as a “first-class” piece of data, that’s trivial in Lisp. It’s not simply supported: the syntax of the language itself makes first-class functions a natural behavior. After all, there’s not so great a difference between this

(1 2 3)

and this

(add 1 2)

that it would feel unnatural to create a construction which uses a function as a piece of data:

(defun add (a b)
  (+ a b))

(reduce #'add '(1 2 3) :initial-value 0)

Compare this to something similar in C:

#include <stdio.h>

int add(int a, int b) {
  return a + b;
}

// C doesn't have a reduce function, so let's add one for ints
int reduce(int(*fn)(int, int), int* args, int n, int initial) {
  int acc = initial;
  int val;
  for(int i = 0; i<n; i++) {
    val = args[i];
    acc = fn(acc, val);
  }

  return acc;
}

int main() {
  int numbers[] = { 1, 2, 3};
  printf("Reduced to: %d\n", reduce(&add, numbers, 3, 0));
}

And there it is - Lisp syntax makes it possible to treat code as data without resorting to any special constructions. Aside from the quoting (#' to refer to the function named add, and ' to keep the interpreter from immediately trying to evaluate the list), it’s just normal Lisp. The add function gets passed to reduce like any other piece of data. Meanwhile, in C-town, we get to mess with function pointer syntax like int(*fn)(int, int).

It’s not terrible, just different enough from working with other data types that it’s not going to be reached for as often.

C-style languages are more common and thus more comfortable to the majority of the programming world. But their syntax does lose that direct expressive power to treat code as data.

C’est la vie!

How it’s Going

Pretty good, thanks!

I picked up one more crucial resource in my learning journey - Bob Nystrom’s Crafting Interpreters. It’s been my primary resource in developing Weave. Because I can’t make anything easy on myself, I’m not implementing the example Lox language, but following along and using the techniques in the book to implement Weave. Of course, I’m building Weave in Rust rather than the book’s Java and C, which requires an extra layer of translation and problem solving on top of the existing challenge of reading a book about developing one language and using it to implement a different one. But again - I can’t make anything easy on myself.

It’s a character flaw.

Still and all, I’m making pretty good progress. Weave is nearly fully implemented - and I’m nearly through the book! Another few weeks and I believe I’ll have Weave ready for production personal use!
