Borrowing from ACID patterns to tame eventual consistency - Written by Matt Howard (continued...)
In Part 1 we explored the notion of data consistency from the perspective of an Oracle database - specifically how Oracle relies heavily on structured logs to give us the consistency guarantees that we have come to rely on. We can often apply this same pattern in our applications to mitigate some of the risks in working with an eventually consistent database. This requires a shift in how we think about our data. Developers think of “data” in terms of the public model that relational databases expose to us - a “column” and a “row”, but we just saw how Oracle operates with much more granularity. “Data” is a specific value at a specific point in time, and this definition of data is what allows Oracle to manage consistency.
Pat Helland (who worked closely with Jim Gray and Andreas Reuter who gave us ACID) says it best in his paper “Immutability Changes Everything”:
The truth is the log. The database is a cache of a subset of the log. That cached subset happens to be the latest value of each record and index value from the log.
Let’s look back at the example financial scenario with this definition of data.
There are 2 things that bother me about my example. The first is that the entire thing is erased from existence if only the last step fails. The 2nd is that the 2 updates just change the balance without caring what the previous value was.
Problem #1 - Data vs Facts
The insert of the actual transfer record went just fine - at that point the transaction could be considered official but we have to wait for 2 denormalized tables to confirm the same changes. Relational databases are all about normalized data, but quite often when we talk about transactions we are really talking about dual writes of denormalized data. There isn’t any reason a single balance transfer needs to be written in 3 places other than to simplify the read model. It is one event that has multiple effects which we denormalize and write in a single transaction.
This is where the community is shifting its understanding of data. Facts are the source of truth, while data is the sum total of information that can be derived from the facts. To put it in context of our transaction - facts are the records in the redo logs, while the representation of those facts is the data tables.
So what if the “log” in our application was the single insert of the money transfer? We’ll denormalize the view of that fact into the denormalized balance of Person A and Person B, but if one of those updates fails maybe it’s not right to totally invalidate the fact of the transfer or attempted transfer. If we can safely replay our log to ensure the balances are eventually updated then maybe ROLLBACK is the wrong model here - we could look to record our facts first and then ROLLFORWARD any denormalized downstream effects by replaying them until they succeed.
You may be thinking “doesn’t this raise the possibility of inconsistent reads until the data is propagated”? Yes, but this is the same problem that relational databases face within an uncommitted transaction - which is handled internally using logs. Reads against the denormalized balance aren’t implicitly trusted, just as Oracle doesn’t trust their internal data files without a check against the undo logs. We can implement a similar pattern in our apps to control the consistency of our data and verify or adjust against the logs as needed.
This isn’t a pattern I would want to implement everywhere, but in the few cases where transaction-like behavior is actually needed it provides a nice verification layer to ensure that all events have been processed. If you need to manage the potentially inconsistent reads of the log itself you can use a number of different tools from simple caching to lamport clocks or vector clocks.
It’s up to us - if we have the source facts available in log tables we can project or suppress the publicly available view of that data however our context requires. On a related note see Martin Fowler’s Contextual Validation for some great thoughts about the validity of data and what we allow in the database. Perhaps the record of the attempted transfer is worth writing into our database regardless of whether it is ultimately valid or successful.
Problem #2 - Destructive Writes
We’ve talked a lot about the power of structured logs, and one of the central points is that they must be immutable or unchangeable. Once something has happened it is in the log for good… the only way to get rid of it is to write another log with a compensating action. This is a powerful concept that I’ve come to rely on so much that anytime I see an UPDATE or a DELETE statement I get uneasy. It feels so risky to me now to throw away critical data, yet we are so used to doing just that because ACID guarantees make it seem like a safe operation. Like most developers I have the scars to prove it isn’t safe at all. How many bugs or user errors have we seen where a record was updated or deleted incorrectly and trying to get it back is hard or impossible?
Just because Person A changes their address doesn’t invalidate the fact that they lived in another place before a certain date. Yet it is so ingrained in our old architectural minds that CRUD is normal and UPDATES and DELETES are necessary and harmless. They are fine for derived data, but I believe all critical source data should be modeled as immutable facts first. UPDATE and DELETE the effects of those facts all you want. In my example above I’m calling the balance a denormalized view of the data so an UPDATE could be ok but the statement still looks wrong to me.
Update balance of person A = A-$100
This statement is saying I don’t care what the balance was before, I’m sure it’s ok to just remove $100 from it. Again this seems like a risky behavior that we are used to only because ACID has allowed us to get away with it. Of course most applications would have a check to be sure the previous value was what we expected but this just adds complexity to work around this write after read data hazard. It would be much less error prone if our balance was always generated from the history of the logs/facts. The above statement is not idempotent - it can be executed once and only once. A balance update based on the sum of verified transactions can be done repeatedly and will auto-heal itself in the event something goes wrong.
There are patterns for snapshotting so you don’t have to read every log from the beginning of time all the time, but this method of deriving denormalized data from source facts is so much safer than the alternative. I believe this strategy is critical if you have a distributed system - concurrent writes to an immutable log that feed into a single balance eliminate several complexities that are introduced by concurrent destructive writes directly to that field.
Now that I’ve worked with Cassandra for a while and I have implemented these patterns I see that most of my initial fears about eventual consistency were rooted in anti-patterns that ACID had only let me get away with.
Any developer of multi-threaded applications knows the complexity and danger of shared, mutable state. This complexity is at the heart of ACID guarantees and we have been building applications on this foundation for years. If we lose the consistency guarantees we have to also move away from using the database as a container for shared, mutable state.
What we really should be asking is “what other algorithms accomplish the same net effect?” I’ve found that the patterns to manage eventual consistency also make applications much more resilient, fault-tolerant and ultimately more consistent or correct. It is a complete fallacy that database consistency equals application consistency. Many of us including myself have believed this fallacy and relied on it on for too long… but I’ll write about that will be a separate post.
Jonas Bonér says “Consistency is an island of safety, but so are immutability and idempotency”. I love this quote and have found it to be true - immutability and idempotency actually can give us back consistency in an inconsistent world. Decades-old principles still apply - structured log storage is not nearly a new concept. When trying to coordinate data or services that are distant or disconnected the tools of immutable logs and idempotent behaviors are invaluable.
It took me a while to wean myself off of reliance on ACID and I was very skeptical of all the voices that were critical of relational databases. The writings/talks of Pat Helland and Jonas Bonér in particular helped me reshape my thinking about what a database should do, and how to build an application around a database that couldn’t hide all the complexities of concurrency from me. With or without a relational database to rely on I would implement most of these concepts at least for any critical path in my application. Relying on the immutable logs of financial events that can be replayed over and over as needed enabled us to build an accurate, consistent and reliable system on top of an eventually consistent database.
I’d highly recommend watching two great videos by these guys who explain it much better than I could:
Subjective Consistency - Pat Helland
Life Beyond the Illusion of the Present - Jonas Bonér
I think this article is also well worth reading:
Using logs to build a solid data infrastructure (or: why dual writes are a bad idea) - Martin Kleppmann