CRDTs, Bounds, SyncFree, and Invariants

I finally got around to watching the SyncFree consortium’s talk about their work on CRDTs at RICON 2014. Or at least half of it, rather, and I was really struck by their Bounded CRDTshttp://christophermeiklejohn.com/crdt/2014/07/22/readings-in-crdts.html

idea. The idea of a bounded CRDT is this: using a predefined invariant, you allow a certain count of “interactions” per replica. They are likened to leases or reservations in the talk. I like reservations better:

Okay (edit 2023: I don’t have these drawings anymore, sorry)

Here, in an over-simplified example, we have a counter. This counter changes with increments of 1, i.e. counter++ or counter--. We have set an invariant of cannot be < 0, and we use this invariant to make a guarantee. If each replica can only make 1 operation against the current known value (4), then we will never violate the invariant. However, should a second operation be performed on any replica, we are no longer convergent: the result of this would have counter’s value equal to -1:

Not Okay (edit 2023: I don’t have these drawings anymore, sorry)

This is my primitive understanding, or my assumption, after watching 20 minutes of the RICON talk on this today. What I’m most interested in is the bound that comes from the reservation count. This is the third time I’ll bring this paper up on this blog, which I can accept – maybe I’m a bit obsessed. The Peter Balis, et al. paper on Coordination Avoidance. This paper, from a 10,000ft perspective, says only coordinate when you have to. Define invariants, coordinate when one is violated.

It looks to me that a bounded CRDT is using this same concept, but using the invariant to avoid conflicts.

As a background process, the “Reservation Manager” above doles out reservations, allowing each replica to operate on an object N times. When the Nth operation is reached, consensus is needed. If you are clever enough, and your strongly consistent Reservation Manager service can stay far enough ahead, you never have to sync.

Here comes some math

Mathematically, CRDTs are all about a join semilattice, and so I’m about to attempt to surmise what the lattice is in the context of a bounded CRDT.

Reservations is a semilattice \((R,\ \lor)\). The current operation, if operations are \(O \subseteq R\) is \(R \lor O\). That is, the Least Upper Bound of \(R\) is the current Operation Count. This meets semilattice requirements of commutativity, associativity, and idempotency in that with each operation \(⊤O\) will always be equal to some element in \(R\) so long as \(R\) remains unbounded. Therefore \(R \lor O\) holds for any \(O \subseteq R\). Like I said, if you are clever enough you can keep your Reservation Manager far enough ahead.

This CRDT concept is a bit more complicated than the basic commutative or monotonic models, but is still farily easy to understand conceptually. I’m very interested in what comes next from the SyncFree humans, namely the adaptive stuff, which is used to reduce replicas – which decreases the LUB, which makes pre-determining reservations easier, which is :thumbs_up:.