This article is long, and it's taken me a long time to get to. I procrastinated enough that the only way I could do it was to make it as social network threads first, and this is a compilation and expansion of those ideas.

A "short" summary of Spritely Goblins and CapTP

But still, this post is also too long, so here's a tl;dr (too long; didn't read) summary:

Spritely Goblins has made significant headway with its implementation of CapTP, or the "Capability Transport Protocol"
CapTP is not original to Spritely, and various implementations have been around for over two decades, but Spritely is making some improvements to CapTP that the community mostly agrees are good directions
CapTP enables secure, distributed object capability (ocap) programming in mutually suspicious networks
CapTP reduces the work on writing "protocols" to merely thinking about writing ordinary programs
CapTP has cool features with fancy names like "distributed acyclic garbage collection" and "promise pipelining"
We plan a general version of CapTP which can be used across many different languages (and we're already talking with the Agoric folks about interoperability with their Javascript ecosystem); a ways out, but on the horizon
Nearly everything cool that comes out of Spritely in the future will use CapTP as part of its foundation

Getting motivated for CapTP

Okay, now if you want the verbose version, read on.

An illustration of CapTP handoffs taking place on a whiteboard

For a while I've had the following whiteboard drawing hovering around my desk in my office. It has unintentionally served as a fun conversation starter in some video calls, but its real purpose was to help me think through some difficult problems in implementation.

If you've been following me on the fediverse or on Twitter you probably have a guess as to what this is, because I've been saying it over and over again: this diagram represents CapTP. I've been making major strides lately in the CapTP implementation in Spritely Goblins and posting about it as I go.

I've even made wild claims like "I think CapTP is the most important work I've done yet in my life" (yes, in the long run, I think it will be a bigger deal than ActivityPub's standardization work, which I am very proud of, but the two are not at odds; the plan for Spritely is that the two live side by side). But while I've made it very clear that I am excited, I haven't done a very good job of explaining why others should be excited.

So the real question is: what does CapTP do? Or more importantly, what does CapTP enable us to do? I don't think I've done a good job of explaining this so far, so this blogpost is an attempt to do better. Admittedly, even the best attempt here might not succeed until people get to use it; I used to have the hardest time explaining to anyone what ActivityPub was, even though I used mostly the same language I do now, and suddenly when it started gaining major adoption it was as if everyone I was talking to got it and my life became much easier.

I suspect a similar thing will happen with CapTP: at the moment, a lot of what I am going to say will sound abstract and maybe even barely believable. As it turns out, simply talking at people is a rather horrible way to get people to understand concepts, but showing people is another matter. A small demo does exist, but the big CapTP demo is to come. Nonetheless, this blogpost will attempt to explain in plain (or plain-ish) language what CapTP can enable us to do.

Let me speak at a high level first. Spritely makes the claim that we are aiming for the ability to build such rich things as distributed virtual worlds. That requirement is not there just because distributed virtual worlds are fun things to work on or use (though they are), but because right now the social and computing systems we have tend to be insufficiently capable of performing the kind of rich and secure interactions that this goalpost requires. CapTP becomes the foundation for this and for many of the other spaces Spritely aims to work in.

But really, what is CapTP?

But I'm already multiple paragraphs in with no explanation yet, so here we go. Are you ready?

CapTP is a protocol to allow "distributed object programming over mutually suspicious networks, meant to be combined with an object capability (ocap) style of programming".

Wow, clear as mud, right? There's a lot to unpack there! So let's break it apart piece by piece.

Distributed object programming

First of all, "distributed object programming": Yes, CapTP allows you to program across networks with the same level of convenience as if you were programming against local objects. (Throw out your assumptions of "objects" as OOP; this can be functional, and Spritely Goblins more or less is.) This is done by creating local proxies representing remote objects that the programmer can operate against. This has been done wrong many times in the past (e.g. the NeXT model); doing this right is the result of significant research. But the result is a significant gain in programmer ergonomics when building distributed systems.
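To make the proxy idea concrete, here is a minimal sketch in Python. It is not how Goblins does it: `FakeConnection`, `RemoteProxy`, and `Greeter` are hypothetical names invented for illustration, the "network" is just an in-process dictionary, and the call is synchronous, whereas a real CapTP send is asynchronous and answers with a promise.

```python
class Greeter:
    """An ordinary local object; nothing about it is network-aware."""
    def greet(self, name):
        return f"Hello, {name}!"


class FakeConnection:
    """Hypothetical stand-in for a network link: delivers messages by object id."""
    def __init__(self):
        self.objects = {}
        self.log = []

    def register(self, obj_id, obj):
        self.objects[obj_id] = obj

    def deliver(self, obj_id, method, args):
        # A real transport would serialize this message and send it elsewhere.
        self.log.append((obj_id, method, args))
        return getattr(self.objects[obj_id], method)(*args)


class RemoteProxy:
    """Local stand-in for an object living on the other side of a connection."""
    def __init__(self, conn, obj_id):
        self._conn = conn
        self._obj_id = obj_id

    def __getattr__(self, method):
        # Any method call on the proxy becomes a message to the far object.
        def send(*args):
            return self._conn.deliver(self._obj_id, method, args)
        return send


conn = FakeConnection()
conn.register(0, Greeter())
remote_greeter = RemoteProxy(conn, 0)

# The caller writes the same code it would write against a local Greeter.
reply = remote_greeter.greet("Alice")
print(reply)
```

The point of the sketch is only the shape: the calling code never mentions the connection, so the same program text works whether the greeter is local or far away.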

Mutually suspicious networks

Next, "mutually suspicious networks": there's no assumption that trust exists on server boundaries; CapTP is built to allow collaboration without full trust. Curiously, this approach allows for increased collaboration and building of more trust; collaboration is more consensual.

This is no small matter. To draw parallels to non-computing life, I feel safer knowing I do not need to trust all people equally and with the same things in my own life: it is important to permit building the appropriate level of trust, rather than an absolute level of trust, in all parties. We do this in our daily lives, but our computing systems are generally not privy to all of our thoughts (that too might result in trust violations) and yet must act on our behalf. The ability to scope the amount of trust permitted means living a life of greater collaboration, less paranoia, and less distrust.

The decision to not assume the need for trust on machine/server boundaries may also seem surprising, but is important. If you've ever tried to configure CORS, you'll be aware of how hard and error-prone this is. Even the most advanced security architects find themselves frequently making mistakes in this area.

But making decisions based on node boundaries seems like a strange system anyway if we think about it for long. In general (though it often requires much social un-conditioning), I try not to draw trust boundaries in a way that treats all members of one nation-state the same. Similarly, there are many households whose members I trust to varying degrees and with different things.

So machine-boundary trust seems like a poor indicator. It is poorer still when we examine the needs of fully peer-to-peer systems. Server-boundary-oriented systems with only a small number of trusted servers barely scale even in the post-web-2.0 consolidation of the web to just a few service providers. They cannot stand up when making new nodes is trivial.

So, nodes are mutually suspicious and do not hand out access to each other simply because they happen to be on trusted lists of server identities. So how is authority handed out?

Combining with object capability programming environments

This is where "combined with object capability style of programming" comes in: this combination is where the power really comes out. Safe, cooperative interaction is very easy in the ocap style: it turns out capability flows can be encoded as normal programming: argument passing and scope! This is the fundamental observation of A Security Kernel Based on the Lambda Calculus; if we take our ordinary models of programming seriously, within them is the best security model we have, and the easiest for programmers to reason about. CapTP takes that observation and applies it on the network level.

A nicely implemented CapTP system will abstract this for programmers so they can focus on the programming part. It wasn't handed to you? Then it's not in your scope and you can't access it.
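Here is a small sketch of "authority is just argument passing and scope", written in Python. The names `make_counter` and `auditor` are made up for this example. One loud caveat: Python does not actually enforce this discipline (introspection can reach into closures), so treat this as an illustration of the style rather than a secure implementation.

```python
def make_counter():
    """Return two separate capabilities onto one hidden piece of state."""
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    def read():
        return count

    return increment, read


def auditor(read):
    # The auditor was only handed `read`, so reading is all it can do.
    # `increment` was never passed in, so it is simply not in scope here.
    return f"count is {read()}"


increment, read = make_counter()
increment()
increment()
report = auditor(read)
print(report)
```

Nothing here looks like an access control list or a permission check. Whoever holds `increment` can count; whoever holds only `read` can look but not touch.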

This simplifies program construction dramatically.

Recently I wrote a 250-line client/server p2p chat "protocol" (well, 250 lines for the protocol, a mere 300 lines more for the GUI), but I didn't really have to think about the protocol at all. In fact I designed it locally first, in one process; it then "automatically" worked over the network, because CapTP took care of the network considerations for me.

By contrast, in most programming systems, an enormous amount of time is spent on protocol design and APIs which tend to be bespoke and disconnected mostly from the actual implementation. They also tend to be made of many moving parts which are hard to reason about. We know that building good abstractions can lead to significant gains in programmer productivity; TCP and TLS are clear examples of this. CapTP, when combined with object capability security systems, brings a similar type of abstraction gain; the more tedious parts of protocols are handled in a general way, and we can focus on the specific ways our programs work and need to communicate. In general this will often correspond to what we would have put at the API perimeter anyway, but now we need less confusing wiring to do it.

Intermission: CapTP origins and collaboration

Okay, if you've really read this far already, time for an intermission. Spritely has not invented CapTP (though it is helping in some of the innovations happening which have been planned for this generation). The idea dates back around two and a half decades and was part of the E programming language (which I often jokingly call "the most interesting programming language you've never heard of"). E actually came out of another distributed virtual worlds system of the late 90s, Electric Communities Habitat. Even though EC Habitat did not make it out of the dot-com crash, E did, and lived on as an open source project. (It's no exaggeration to say that the vast majority of the space Spritely is exploring comes out of work that was trailblazed most especially by E; I can't recommend Mark Miller's dissertation enough.) In-between then and now, CapTP has seen several variants (maybe the most famous of which is Cap'n Proto, which in some ways I think of as a mostly-CapTP for people who don't know they're using a CapTP, and whose rpc.capnp was of enormous help to me learning how CapTP works).

Another funny thing happened in-between E's CapTP and now: most of the E and ocap folks joined Javascript's standardization efforts and over the course of the last decade and a half or so have helped beat it into a suitable shape to finally also achieve the distributed object dream. Most of those folks have gone on to start an organization named Agoric (whose namesake goes back to The Agoric Papers which laid out the vision for all this work all the way back in 1988 (holy cow!)) which is just now bringing that dream of distributed ocap networks to Javascript land and beyond (with a bit more focus on economic systems in contrast to Spritely's focus on social networks).

Which may lead you to ask: shouldn't Agoric and Spritely's CapTPs interoperate? I'm happy to say: we're already talking, and that's the plan. In fact, as part of my (documented, but long) process of learning CapTP, I submitted a PR adding some comments to Agoric's implementation since I was reading and trying to make sense of it anyway (happily, it was merged). It is the goal for both parties to implement the same protocol, and I hope to be able to say more about this soon. When this happens, it won't matter whether you're using Lisp'y Spritely code or Javascript'y Agoric code; you should be able to do distributed object programming in each and both should interoperate happily.

Feature: Efficient representation of capabilities

Okay, back to the cool features that CapTP provides. CapTP is very efficient. You may have used ocap systems that have huge certificates or long URIs, but in CapTP a shared capability is merely a bidirectional integer assignment between the machine importing and the machine exporting! (But users need not be aware of this, since again, a well designed CapTP interface encapsulates the underlying semantics in the same way that good TLS and TCP libraries encapsulate the layers of encrypted and ordered network connections.)
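As a rough sketch of what "a bidirectional integer assignment" can look like, here is a pair of toy tables in Python. `ExportTable` and `ImportTable` are hypothetical names for this example, not the data structures of any real CapTP implementation, and the session between the two machines is left out entirely.

```python
class ExportTable:
    """Exporter side: hands out small integers standing for local objects."""
    def __init__(self):
        self._by_position = {}
        self._next = 0

    def export(self, obj):
        position = self._next
        self._next += 1
        self._by_position[position] = obj
        return position  # only this integer needs to cross the wire

    def lookup(self, position):
        return self._by_position[position]


class ImportTable:
    """Importer side: remembers which integers the peer has shared with it."""
    def __init__(self):
        self.known = set()

    def note_import(self, position):
        self.known.add(position)
        return position


inventory = {"sword": 1}
exports = ExportTable()
imports = ImportTable()

wire_value = exports.export(inventory)     # exporter assigns an integer
handle = imports.note_import(wire_value)   # importer records the same integer

# Later messages can name the object by that integer alone.
assert exports.lookup(handle) is inventory
print(wire_value)
```

The integer only means something within this one pairing of machines, which is why it can stay so small.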

Feature: Cooperative garbage collection across the network

CapTP also has distributed acyclic garbage collection. That means that two servers can collaborate to say "oh yeah, thanks for giving me that object, but you don't need to hold onto it any more on my behalf." Wow!

(Unimportant side note: the original Electric Communities proto-CapTP even could handle collecting cycles that span machines; this seems to require more significant support from the underlying language runtime than most provide. This turns out to be rarely needed anyway and is definitely a deeper rabbit hole than the already deep rabbit holes this blogpost has gone down so we will save that for another time.)

Why should you care about distributed acyclic GC though? Let's put it another way: imagine you were building some sort of distributed role playing game. Your players are regularly fighting bats, which are cheap enemies that generally don't stick around long. It doesn't take long for your system to be bogged down by bat corpses! CapTP helps solve this problem by allowing servers to cooperatively know when they no longer need object references held on their behalf. (Before you start asking about non-cooperative scenarios, there are abstraction layers for that too, but we won't worry about those in this particular post.) This is a big win that few other protocols provide, but which is cheap and efficient under CapTP.
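The bat example can be sketched as simple reference counting on the exporting side. This is a toy in Python under my own assumptions: the `Exporter` class and the `handle_gc_message` name are invented for illustration, and a real implementation also has to deal with messages still in flight, which this ignores.

```python
class Exporter:
    """Holds objects on behalf of a peer until the peer releases them."""
    def __init__(self):
        self.held = {}       # position -> object kept alive for the peer
        self.refcounts = {}  # position -> references the peer still holds
        self._next = 0

    def export(self, obj):
        position = self._next
        self._next += 1
        self.held[position] = obj
        self.refcounts[position] = 1
        return position

    def handle_gc_message(self, position, count):
        # The peer reports that it dropped `count` references to this position.
        self.refcounts[position] -= count
        if self.refcounts[position] <= 0:
            # Nobody remote needs it any more, so stop holding it for them.
            del self.held[position]
            del self.refcounts[position]


server = Exporter()
bats = [server.export(f"bat-{i}") for i in range(3)]

# The client defeats two bats and no longer needs references to them.
for position in bats[:2]:
    server.handle_gc_message(position, 1)

remaining = sorted(server.held.values())
print(remaining)
```

Once the exporter drops its entry, the local garbage collector is free to reclaim the bat like any other unreferenced object.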

Feature: Promise pipelining for convenience and network efficiency

CapTP also has "promise pipelining", which reduces round trips. I can send a message to a remote car factory and ask it to drive the car once it makes it, even before I've been told the car is made! (Spritely, Agoric, and even E all have tooling that makes this look like a "natural" code flow as well.) To quote Mark Miller's dissertation:

Machines grow faster and memories grow larger. But the speed of light is constant and New York is not getting any closer to Tokyo. As hardware continues to improve, the latency barrier between distant machines will increasingly dominate the performance of distributed computation.
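The car factory example can be simulated to show where the saved round trip comes from. This Python sketch is my own toy model, not real CapTP messages: `RemoteMachine`, its `receive` method, and the `(answer_id, target, method)` tuples are all assumptions made for illustration. The key idea it tries to capture is that the sender picks the id for each answer up front, so a later message can target an answer that does not exist yet.

```python
class Car:
    def drive(self):
        return "vroom"


class CarFactory:
    def make_car(self):
        return Car()


class RemoteMachine:
    """Simulated far machine; each call to `receive` costs one round trip."""
    def __init__(self, root):
        self.root = root
        self.answers = {}      # answer id -> result of an earlier message
        self.round_trips = 0

    def receive(self, messages):
        self.round_trips += 1
        for answer_id, target, method in messages:
            # A target of None means the root object; an integer names the
            # answer to an earlier message, even one from this same batch.
            obj = self.root if target is None else self.answers[target]
            self.answers[answer_id] = getattr(obj, method)()
        # Report the result of the final message back to the sender.
        return self.answers[messages[-1][0]]


# Without pipelining: wait to hear the car exists, then ask it to drive.
slow = RemoteMachine(CarFactory())
slow.receive([(0, None, "make_car")])
slow_result = slow.receive([(1, 0, "drive")])

# With pipelining: send "drive" at the promised car in the same trip.
fast = RemoteMachine(CarFactory())
fast_result = fast.receive([(0, None, "make_car"), (1, 0, "drive")])

print(slow.round_trips, fast.round_trips)
```

Both versions end with the same answer; the pipelined one just never waits on the ocean crossing in the middle.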

All in all, this reduces the amount of work for rich, networked collaborations with safety properties we can reason about from something which only protocol hyper-experts are deemed worthy to consider, to something that we mere mortals can think about. Focus on writing your code and think about where access is being passed around on that layer.

Current status, future roadmap

Now let's discuss the current state of things. Some of the biggest pieces of the CapTP puzzle have recently landed. One of those, "handoffs", is what was being puzzled on the whiteboard previously shown; since much of CapTP's efficiency and operation comes from local pairwise meaning of integers between two machines, transferring a capability that machine A holds for an object on machine C over to machine B is a tricky process. We now have a certificate-oriented solution for handoffs (that is, the handoffs use certificates for sharing capabilities across three machines; the bidirectional two-machine case can still use pairwise integers). This approach has gone through some community review (though we would like more, and more will come at protocol codification time for certain) and is also in alignment with the plans expressed by the folks over at Agoric, so this is good news: getting the "key features" of CapTP in is (mostly) no longer a blocker for fleshing out other layers of Spritely.
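To see why three machines are harder than two, here is a deliberately simplified Python sketch of the shape of a handoff. Please read it with a big caveat: it is not the certificate-oriented design described above. In place of signed certificates it uses a one-time unguessable token deposited at machine C, and `MachineC`, `deposit_gift`, and `redeem_gift` are hypothetical names I made up for this toy.

```python
import secrets


class MachineC:
    """Hosts the object; keeps a separate integer table for each peer."""
    def __init__(self):
        self.exports = {}   # peer name -> {position: object}
        self.gifts = {}     # one-time token -> object awaiting pickup

    def export_to(self, peer, obj):
        table = self.exports.setdefault(peer, {})
        position = len(table)
        table[position] = obj
        return position

    def deposit_gift(self, giver, position):
        # Only a peer that already holds the reference can give it away.
        obj = self.exports[giver][position]
        token = secrets.token_hex(16)
        self.gifts[token] = obj
        return token

    def redeem_gift(self, receiver, token):
        obj = self.gifts.pop(token)   # each token works exactly once
        return self.export_to(receiver, obj)


c = MachineC()
treasure = {"gold": 100}
a_position = c.export_to("A", treasure)   # A's pairwise integer with C

token = c.deposit_gift("A", a_position)   # A arranges the handoff at C
b_position = c.redeem_gift("B", token)    # B picks it up and gets its own integer

assert c.exports["B"][b_position] is treasure
print(a_position, b_position)
```

Notice that A's integer is never sent to B. It would be meaningless there, because each integer only has meaning inside one pair of machines; B ends up with its own pairwise integer with C instead.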

At this moment, Spritely and Agoric have both made significant advances in terms of CapTP in different areas. The reason for this is clear when looking at the end use applications that Spritely and Agoric are each focused on at present: Spritely is focused on virtual worlds and social networks, while Agoric is focused on economic systems. (In the long run, we both agree that both of those end uses will encompass aspects of the other... and we also can leverage each others' work over CapTP to get there!)

This plays out as follows: Agoric has figured out how to treat abstract machines like blockchains as ordinary participants in the network with CapTP, and this is an enormous advancement in abstraction, but not as critical for Spritely right now. Meanwhile, in terms of enumerating core CapTP features, Spritely is a bit ahead having implemented distributed acyclic garbage collection, handoffs, and shortening/unwrapping of object references that return home.

Curiously, language choice also has a component here: Spritely is partly ahead in its features because its choice of a lispy language encourages fast experimentation with language-like features, and because it sits on top of a language toolkit largely built for language-experimenting enthusiasts. (Nothing outpaces lisp environments in this area, and Racket especially has a large focus here!) The tradeoff, of course, is that lispy languages are more obscure. By contrast, Agoric is implementing its tooling on top of Javascript, which is much more broadly popular, if not quite as fast for language-like experimentation.

Actually, saying that Agoric is "implementing on top of Javascript" really isn't telling the whole story. Much of the Agoric team has spent the better part of the last two decades helping shape the Javascript world, as part of standardization processes, to be ready to pull off the kind of distributed ocap systems that both Spritely and Agoric are interested in. (Many recent wonderful features in Javascript, like its promises, were based around designs taken from the E programming language, an ancestor of the current Agoric tooling... and inspiration also for much of Spritely Goblins' own design!)

Spritely's choice of lispy languages, while more obscure, puts it in a space of languages built for language design enthusiasts, so all of the needed pieces were already there to build on top of. However, we should be glad for Agoric's heroic efforts to bring these tools to a wide audience through Javascript standardization; this is hard work. (Trust me, I'm not a stranger to standards work, but language standardization requires extra care.) This also means that a much broader part of the world will gain access to the kind of ecosystem we're talking about thanks to those hard efforts.

But consider the advantage of planning for convergence on the CapTP layer: this is a big win because it means that users of either system can leverage the features of the other side without having to disagree over which language is the right foundation. This also means that we have a way to collaborate even with us focusing on different pieces of the end-user puzzle.

In other words, Spritely need not be focusing on the economic layer right now: once we have CapTP interoperability, we'll already have a bridge into that world thanks to the hard work of the Agoric folks! And the Agoric folks can also benefit from our work on the social side of things. Eventually, both sides can also write some of this tooling in their own language environment, but it may even turn out that in some places this need not even happen because the other side's tooling works well enough.

So, interoperability on the CapTP layer is planned. Spritely and Agoric are in close conversation, but both have pressing matters to attend to that precede this work; you'll likely hear more about this soon-ish.

The more exciting and immediate thing to do is to start building demos. Longform textual explanations are well and good, but "seeing is believing". Shinier demos are even better; the Terminal Phase time travel demo showed off features that had existed for some time in Goblins, but was the first time I saw people raising their heads and saying "gosh, wow, what's happening over here!" I'd like to do the same for the CapTP layer of Goblins soon.

CapTP is only exciting because it's a powerful foundation for what's to come. I fear this blogpost, long and rambly as it is, still will not capture minds in the appropriate way. Hopefully by building demos people can get a sense and feeling that indeed, something truly interesting is going on here, something they really want to use.

Conclusions

I believe Spritely's future is bright, but part of this is because of the long and hard work on its architectural foundations. Those pieces are coming together, and CapTP is probably the most shining pillar of all of those. At the beginning of this process, CapTP was a strange and mysterious thing, yet with seemingly alluring powers. At present, those alluring powers have shown themselves true and are increasingly available to the Spritely system. In time as CapTP is codified, we aim to chip away at the strange and mysterious component, distributing its power to all.

Thanks to lthms and Bill Tulloh for your review, feedback, and suggested changes to this post!

What is CapTP, and what does it enable?