Weave Wrap-Up


Well - it’s been a few weeks now, and I think it’s time to finally call it. Weave is “done”. At least, as done as it’s going to get for a while.

Post Done Work

As it turned out, having a functional language made me want to add a few more features to it. After adding these, I think I’m satisfied and ready to move on:

Spools

It didn’t take long after I got Weave actually running that I realized I’d made a big mistake - early on, I’d specified “code import” as an explicit anti-goal. No code reuse in Weave, thank you! If your program doesn’t fit in one file, you can go elsewhere!

…Well, turns out that I ended up wanting code reuse after all.

“Spools” are Weave libraries (what? Python has ‘wheels’, Ruby has ‘gems’, I can have ‘spools’!). I took inspiration from Rust’s cargo, so Weave spools:

  • use semantic versions
  • use a TOML config file and provide a lock file
  • support remote and local spool locations
  • handle dependency resolution and downloading

I elected not to go for a full-on package repository, however - I’m not going to that much effort for a language with a single user and three libraries.

But - should the mood take you - you can write a spool, give it an accessible git URL, and then install it into another Weave program.
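To give a feel for what the resolution step does, here’s a rough Python sketch of cargo-style version matching. This is illustrative only - the names are mine, not Weave’s, and the caret rule is simplified (real cargo treats 0.x versions specially):

```python
# Illustrative sketch (not Weave's actual resolver): pick the newest
# available version compatible with a caret requirement, cargo-style.

def parse(v):
    """Turn '1.2.3' into a comparable tuple (1, 2, 3)."""
    return tuple(int(p) for p in v.split("."))

def caret_compatible(req, candidate):
    """Simplified caret semantics: same major version, candidate >= requirement."""
    r, c = parse(req), parse(candidate)
    return c[0] == r[0] and c >= r

def pick(requirement, available):
    """Return the newest version satisfying ^requirement, or None."""
    assert requirement.startswith("^")
    req = requirement[1:]
    matches = [v for v in available if caret_compatible(req, v)]
    return max(matches, key=parse) if matches else None

print(pick("^1.2.0", ["1.1.0", "1.4.2", "2.0.0"]))  # → 1.4.2
```

The real resolver also has to walk transitive dependencies and write the result into the lock file, but the version-matching core is about this simple.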

Tail Call Optimization

This is just something I’ve always thought was neat. If you’re unfamiliar, tail-call optimization (usually discussed in terms of recursion, though it applies to any tail call) lets a VM make certain function calls more efficient by reusing the current stack frame instead of pushing a new one.

It works like this: imagine you have a program that looks like this:

# Recursively calculate a^n - one multiplication at a time. Because we can.
fn bad_pow(a, n, acc: 1) {
  # When we hit the bottom, we've calculated our value - return it.
  if n <= 0 { return acc }
  return bad_pow(a, n - 1, acc * a)  # we're accumulating our value in acc as we descend, then returning it back up the stack
}

A tail call is when the last thing a function does is call another function and return its result unchanged.

Imagine our VM stack right before we call bad_pow. We’ll have something like this:

0x1234  PUSH  a
0x1238  PUSH  n
0x123C  PUSH  acc
0x1240  CALL  bad_pow

and bad_pow itself looks something like

0x4320  BIND  acc  # pops the value 'acc' off the stack and binds it to the local 'acc'
0x4324  BIND  n    # ditto
0x4328  BIND  a    # Of course, I'm eliding all the type management here. :D
0x432C  JMP_LEZ  n  0x4348   # If n is <= 0, jump to the offset and skip the rest.

0x4330  PUSH  a    a        # this looks familiar...
0x4334  PUSH  n    n - 1    # technically this would be like `PUSH n; PUSH 1; SUB` but we'll keep it simple
0x4338  PUSH  acc  a * acc  # same deal
0x433C  CALL  bad_pow

0x4340  POP   tmp    # pop to get the return value from bad_pow - if we don't, they'll accumulate. We could return something else and ignore this value.
0x4344  RETURN  tmp  # ...and put it right back so we can RETURN it.

# oh yeah, remember this? it's the base case where n <= 0.
0x4348  RETURN  acc

Now, without tail-call optimization, every level of recursion stacks another CALL -> <function> -> POP -> RETURN sequence on top of the last, and eventually the stack overflows.

0x1110  CALL bad_pow
0x1114  POP  tmp
0x1118  PUSH tmp
0x111C  RETURN
...
0x1150  CALL bad_pow
0x1154  POP  tmp
0x1158  PUSH tmp
0x115C  RETURN
...

If we note (at compile time) that the setup and calculations are going to be the same each time, we don’t have to keep filling up the stack - we can reuse the instructions we already have and jump back to the start of the function, as if we had a normal loop.

e.g. instead of emitting a CALL bad_pow - which pushes a return address and sets up a fresh frame - the compiler emits a plain JMP 0x4320 back to the top of bad_pow’s own code, after rebinding the arguments to their new values. This means we’re able to reuse the same code and the same frame we already have on the stack!
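The same transformation can be sketched in Python, which (notably) does not perform TCO itself - the names here just mirror the Weave example above, and the loop version is what the compiler effectively turns the recursion into:

```python
import sys

# `bad_pow` mirrors the recursive Weave example; `bad_pow_tco` is the
# shape TCO reduces it to: rebind the arguments and jump back to the
# top instead of pushing a new frame.

def bad_pow(a, n, acc=1):
    if n <= 0:
        return acc
    return bad_pow(a, n - 1, acc * a)  # tail call: one stack frame per step

def bad_pow_tco(a, n, acc=1):
    while True:                   # "JMP back to the top of the function"
        if n <= 0:
            return acc
        n, acc = n - 1, acc * a   # rebind the locals in place - no new frame

print(bad_pow_tco(2, 10))  # → 1024

# The recursive form exhausts the stack for large n; the loop form
# runs in constant stack space.
sys.setrecursionlimit(100)
try:
    bad_pow(2, 500)
except RecursionError:
    print("recursive version overflowed")
```

Both compute the same values - the difference only shows up when `n` gets large enough to matter.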

Various Performance Improvements

This has been an ongoing effort, but things like using #[inline] in Rust to get hot-path code, well, inlined instead of paying function-call overhead have squeezed a little more performance out of Weave. It’s still not going to set any speed records, but it’s pretty fast for a one-man interpreted-language project!

Documentation

As I worked on Weave, I kept a syntax.md file to help me remember what was supposed to work. Over time some of it changed (I had accumulators with default values as the first param in the ‘reduce’ functions! Why?!) but it’s mostly been added to. This also helped me begin writing sample Weave programs to act as my test cases once I had an actual interpreter and wanted to avoid regressions.

Eventually, syntax.md grew into a small set of (sloppy) documentation files, which in turn I used to create Weave-Lang.com, a hosted resource for Weave and its documentation. The site has been organized and themed (thanks, Hugo) and provides a much nicer view of the language than my original resources did. I’m honestly pretty happy with how it came out.

Weft: The Official Testing Framework for Weave

Weft is the only official test framework spool for Weave (because it’s the only one). :D It’s vaguely inspired by RSpec, but the thing I’m happiest about is a truly horrible hack in which I abuse Weave’s error handling to make describe work:

from weft import describe, is_true, is_false
from foo import make_foo

describe("Foo", [
  setup: ^(ctx) { ctx[:foo] = make_foo() },
  foo_is_amazing:   ^(ctx) { ctx.foo.is_amazing() |> is_true },
  foo_is_not_dull:  ^(ctx) { ctx.foo.is_dull()    |> is_false }
])

This doesn’t look too bad - the trick is that describe is actually reporting a :test_set condition, then waiting for a :test_run strategy to resume running the tests!

fn describe(name, tests) {
  report (:test_set, name) {
    test_run: ^(ctx) {
       # run the tests
    }
  }
}

This way, the test runner just has to load the test files inside a handle { test_set: ... } context, and the runner automatically receives a container with a .resume function pointer that will run the tests from that file.
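If you want the flavor of it without Weave, here’s a toy Python version of the same idea - all of these names are hypothetical, and this is not how Weft is implemented, just the shape of the trick: signaling a condition hands the runner a resumable container instead of running the tests on the spot.

```python
# Toy sketch of resumable test registration (hypothetical names, not
# Weft's internals). `describe` doesn't run tests - it signals a
# condition up a handler stack, and the handler captures a container
# whose `.resume` callback runs the tests later.

_handlers = []  # stack of {condition_name: callback}, innermost last

class TestSet:
    def __init__(self, name, tests):
        self.name = name
        # Each test is a callable taking a context dict.
        self.resume = lambda: {k: t({}) for k, t in tests.items()}

def signal(condition, payload):
    """Find the nearest handler for `condition` and invoke it."""
    for handlers in reversed(_handlers):
        if condition in handlers:
            return handlers[condition](payload)
    raise RuntimeError(f"unhandled condition: {condition}")

def describe(name, tests):
    signal("test_set", TestSet(name, tests))

# The runner installs a handler, then loads the test files:
collected = []
_handlers.append({"test_set": collected.append})

describe("Foo", {
    "one_is_one": lambda ctx: 1 == 1,
    "two_is_two": lambda ctx: 2 == 2,
})

results = collected[0].resume()  # the runner decides when tests run
print(results)  # → {'one_is_one': True, 'two_is_two': True}
```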

Can’t do that with exceptions. :D

Weave StdLib

Weave has a bunch of built-in functions for things that require OS access. Stuff like open to get at a file handle, or time to get the system clock.

Built-in functions are implemented in Rust, directly inside the VM.

On the other hand, the stdlib is just like… all the useful stuff that nobody really wants to write themselves and we pretty much all need to exist. Math functions like max/min or helpers like pad_left. A lot of these things make sense to ship with the language but don’t need to be built-in!

Among other reasons, built-in functions can’t be overridden. They’re down there bone deep. You can’t make your own version of print in Weave. It’s not quite a keyword, but it’s deep enough in the VM that it might as well be.

As I’ve run into things that aren’t available in the core language definition but would be really nice to have, I’ve taken to putting them into the stdlib.

That’s why things like dispatch aren’t in the actual language, but I can still use dispatch instead of switch statements:

from iter import dispatch

foo |> dispatch([
  ^(x) { x < 20 && x > -1 }, ^(x) { low_num(x) },
  ^(x) { x >=20 },           ^(x) { bigger(x)  },
  ^(_) { true },             ^(x) { smaller(x) }  # must be -1 or less
])

it’s not the most elegant system, but it does work.
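A rough Python analogue shows the shape (a hypothetical helper, not Weave’s stdlib code): dispatch just walks predicate/action pairs and fires the first action whose predicate matches.

```python
# Hypothetical analogue of stdlib `dispatch`: run the action paired
# with the first predicate that accepts the value.

def dispatch(value, pairs):
    for predicate, action in pairs:
        if predicate(value):
            return action(value)
    raise ValueError(f"no predicate matched {value!r}")

label = dispatch(7, [
    (lambda x: -1 < x < 20, lambda x: "low"),
    (lambda x: x >= 20,     lambda x: "big"),
    (lambda _: True,        lambda x: "small"),  # must be -1 or less
])
print(label)  # → low
```

The catch-all predicate at the end plays the role of a `default:` branch - without it, an unmatched value is an error.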

Learnings

So after all this… what’d I learn?

Computers are Hard

Even knowing how hard it is to write normal software, seeing how challenging it has been to get even a simple language (with a stupid type system and no classes!) to work correctly… It’s frankly amazing that anyone ever computes anything successfully.

Good Error Messaging is Not a Nice To Have

Once you pass a certain level of complexity - and that level is roughly “more than a 10-line script” - knowing why a program failed stops being a luxury you can implement later. If you want to use your language for “real” coding, you’ll need accurate, helpful error messages when things go wrong.

LISP’s “Weird” Syntax is What Gives it Power

Like many, in my ignorance and hubris I thought perhaps I could find a way to bring the mythical power of LISP to C-style language constructs.

Turns out that the source of that power is the syntax. Moving function names inside the parentheses means that there’s no difference between the syntax of code and the syntax of a list. Everything is lists. Of course, that’s the whole point of LISP, and it’s crazy obvious in retrospect.

You Have a Weirdness Budget

If you want other people to be able to read your language, you need to provide constructs they are familiar with. The more new concepts you throw at someone who sees your language for the first time, the less they’re going to want to invest in it. (Unless they’re a language nerd in which case, awesome, but those guys never get to use a cool language at work anyway.) It takes decades for new languages to “catch on”.

I’m not hoping for Weave to become “popular” with “the masses” - but even so, reflecting on my design, I think I made a few choices that make Weave less familiar to other programmers, but hopefully not so extreme as to put it out of reach.

Pipeline Syntax

I don’t think this is spending too much of my budget, honestly. Most programmers are exposed to piping shell commands if nothing else and the idea of “take the output from foo() and pass it to bar()” is pretty simple. Most coders are also familiar with map and reduce these days (they certainly weren’t when I was learning C++ in ‘04!) but having them as operators and not a list of functions with nested parentheses takes a little getting used to.

Error Handling

If I had to pick one thing I think would trip up new Weave programmers, it would be the error handling system. I’d never seen a condition/restart system before I started thinking about what to do for Weave’s error-handling, but now I love it so much I wish I could bring it into more languages.

Still, even though you can do things with conditions that are impossible with exceptions, I tried to keep the syntax familiar enough for programmers used to exception handling to find something to grab hold of.

# Handle blocks are like `catch` - but they declare handlers up front instead of afterward.
cfg = handle {

  # error handler functions tell the error'd code how to continue after error conditions occur.
  file_not_found: ^(src, c) { c.resume(:use_default, [foo: 1, bar: 2, baz: 3]) }

} for {
  # this code may raise a condition - like a thrown exception, but the handler can recover from it.
  read("config.toml", :toml)
}
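The non-unwinding part is the magic, and it can be sketched in Python too. This is a toy with invented names - Weave’s real machinery is richer - but it shows the key move: the failing code offers named restarts, the handler picks one, and execution continues from the point of the report instead of unwinding the stack.

```python
# Toy condition/restart sketch (hypothetical API, not Weave's): the
# code that hits the problem offers named restarts; the handler chooses
# one, and the original computation continues from where it reported
# the condition instead of unwinding like an exception.

_handlers = []  # stack of {condition_name: chooser}, innermost last

def report(condition, restarts):
    """Signal a condition; the nearest handler picks a restart to resume with."""
    for handlers in reversed(_handlers):
        if condition in handlers:
            choice, arg = handlers[condition]()
            return restarts[choice](arg)   # resume, don't unwind
    raise RuntimeError(f"unhandled condition: {condition}")

def read_config(path):
    # Pretend the file is missing; offer the caller a way to continue.
    return report("file_not_found", {
        "use_default": lambda default: default,
    })

_handlers.append({
    "file_not_found": lambda: ("use_default", {"foo": 1, "bar": 2, "baz": 3}),
})
cfg = read_config("config.toml")
print(cfg)  # → {'foo': 1, 'bar': 2, 'baz': 3}
```

Because `report` returns a value into the middle of `read_config`, the reading code can carry on as if nothing went wrong - that’s the part exceptions can’t give you.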

What’s Next?

Well, since I started Weave, the world has caught the LLM-AI bug and I’m no exception. I started my career in AI, building self-driving cars from ’06-’08 (non-civilian uses, but we did compete in the DARPA Urban Challenge). As I’ve been winding down with Weave, I’ve been experimenting with building fully autonomous, self-improving AI systems and I’m looking forward to turning my attention that way next! The state of the art in AI has advanced quite a bit since I finished my degree, and it’s been a lot of fun getting back into it with LLMs and diffusion models crossing some pretty incredible capability boundaries.