Entity Persistence

When we discuss entity persistence, we generally think of four basic verbs: Create, Read, Update, and Delete (CRUD). With these four actions, we interact with the persisted entity. Each persisted entity is mapped to a record in a table (or tables), and each attribute is represented as a field in a record. As the application makes changes to the entity's attributes, its record fields are updated to reflect the current state. At any given instant, the persisted entity represents the aggregate of all changes since it was created.

The Problem with State

When we persist an entity's current state, we are capturing a single instant in the life of the entity. With CRUD-based persistence, we have little insight into how the entity arrived at its current state. This can be a problem if we gain a new domain insight, or if a coding error corrupts the entity's persisted data. With a new domain insight, we are unable to apply it retroactively because we don't know the history of the entity. If a coding error corrupts an entity, we can't easily go back and fix it because we don't know which events came before and after the corruption. So how can we address this? One solution is Event Sourcing.

Event Sourcing

Unlike the CRUD approach, Event Sourcing does not persist the current state of the entity. Instead, it focuses on capturing and persisting the sequence of all state change events for a domain in an event store. The event store is a special purpose database used in event sourcing systems to store events. I will give you a moment to bask in the glory of that definition.

With event sourcing, when we require the current state of an entity, we derive it from the event log history rather than retrieve its last persisted state. A bank account works in the same way. In this case, there are two event types: deposits and withdrawals. The bank account maintains a history of these events and derives the current balance by applying the event history.
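
The bank account example can be sketched in a few lines of Python. This is a minimal illustration rather than a production design; the Deposited and Withdrawn classes and the balance function are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Deposited:
    amount: Decimal


@dataclass(frozen=True)
class Withdrawn:
    amount: Decimal


def balance(events):
    """Derive the current balance by applying every event in order."""
    total = Decimal("0")
    for event in events:
        if isinstance(event, Deposited):
            total += event.amount
        elif isinstance(event, Withdrawn):
            total -= event.amount
        else:
            raise TypeError(f"unknown event: {event!r}")
    return total


# The account never stores a balance; it only stores what happened.
history = [
    Deposited(Decimal("100")),
    Withdrawn(Decimal("30")),
    Deposited(Decimal("5")),
]
current_balance = balance(history)
```

Note that no balance field is persisted anywhere: the balance exists only as the result of folding over the history.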

The Journey or the Destination

When we store only the current state, we lose information by failing to capture intent. With event sourcing, we eliminate the persistence of state entirely and instead capture intent in the form of stored events. Because the event log contains every operation performed on the entity, it becomes the source of truth for the entity. With event sourcing, we capture the journey rather than just the destination. We are now able to retrieve the state of the entity at any given point in its lifetime.

The event log provides us with an audit log for the entity from its inception to any point in its lifetime. In addition to being able to derive the current state of the entity from the event log, we are able to perform temporal queries on the entity to inspect how it has changed over time. These queries can derive historical state by replaying the events up to a particular point in time. Additionally, by persisting domain events instead of the current state, we no longer need to map the entity to a table structure, avoiding the object-relational impedance mismatch problem entirely.
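
A temporal query is simply a replay that stops early. The sketch below assumes each event is a (timestamp, kind, amount) tuple and that the log is already in chronological order; LOG and balance_as_of are illustrative names, not part of any particular event store.

```python
from datetime import datetime

# Hypothetical log, stored in chronological order.
LOG = [
    (datetime(2024, 1, 5), "deposit", 100),
    (datetime(2024, 2, 1), "withdrawal", 30),
    (datetime(2024, 3, 9), "deposit", 50),
]


def balance_as_of(log, moment):
    """Replay events up to and including `moment` to get the historical balance."""
    total = 0
    for timestamp, kind, amount in log:
        if timestamp > moment:
            break  # the log is chronological, so nothing later can qualify
        total += amount if kind == "deposit" else -amount
    return total


mid_february = balance_as_of(LOG, datetime(2024, 2, 15))
```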

Correcting Past Mistakes

Finding and fixing errors in application code can be challenging. Fixing the data corrupted by the error is often impossible when we store only the current state. Fortunately, with event sourcing, we simply replay the event log once the error has been fixed, and we derive the correct state.
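
To make this concrete, here is a small sketch in which the same event log is replayed through a buggy and a corrected interpretation. The bug itself (withdrawals being added instead of subtracted) is invented for illustration, as are the function names.

```python
# The stored events are facts; they are never rewritten.
EVENTS = [("deposit", 100), ("withdrawal", 30), ("withdrawal", 20)]


def apply_buggy(balance, event):
    kind, amount = event
    # Bug: every event is treated as a deposit.
    return balance + amount


def apply_fixed(balance, event):
    kind, amount = event
    return balance + amount if kind == "deposit" else balance - amount


def replay(events, apply, initial=0):
    """Fold the event log through whichever interpretation is passed in."""
    state = initial
    for event in events:
        state = apply(state, event)
    return state


corrupted = replay(EVENTS, apply_buggy)
corrected = replay(EVENTS, apply_fixed)
```

Because only the interpreting code changed, the fix requires no data migration: replaying the untouched log yields the corrected state.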

Audit Logs vs. Event Sourcing

When we realize our traditional state-based persistence requires an audit trail, it is common to bolt on a parallel audit table that stores entity operations in much the same way as event sourcing. This audit table captures the entity's history. With this dual approach, the audit table is used in the same way as an event sourcing log, so why don't we do that?

In theory, we get the best of both worlds: easy access to the current state as well as a full history of the entity's operations. The first question we must ask is, "what happens if the entity's state and the audit log somehow get out of sync?" Which is the source of truth, the audit table or the state record? Since we can derive the current state from the audit table, but cannot derive the audit table from the current state, the audit table is the logical choice. At this point, if we eliminate the persisted state, we are left with an event sourced system.

From Event Log to Entity

So now that we have eliminated state, how do we reconstitute our entity when we need it again? To do this, we simply apply every stored event that mutates the entity's state, in chronological order. When we run out of events, we have arrived at the current state.
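
One way to sketch reconstitution is an entity that knows how to apply each event type to itself. The Account class, the event dictionaries, and the seq ordering field below are all assumptions made for this example; a real event store would define its own event shape and ordering guarantee.

```python
from dataclasses import dataclass


@dataclass
class Account:
    owner: str = ""
    balance: int = 0
    closed: bool = False

    def apply(self, event):
        """Mutate this entity according to a single stored event."""
        kind, data = event["type"], event["data"]
        if kind == "AccountOpened":
            self.owner = data["owner"]
        elif kind == "Deposited":
            self.balance += data["amount"]
        elif kind == "Withdrawn":
            self.balance -= data["amount"]
        elif kind == "AccountClosed":
            self.closed = True
        else:
            raise ValueError(f"unknown event type: {kind}")

    @classmethod
    def from_events(cls, events):
        """Start from an empty entity and apply events in sequence order."""
        account = cls()
        for event in sorted(events, key=lambda e: e["seq"]):
            account.apply(event)
        return account


# Events may arrive unordered from storage; `seq` restores chronology.
stored = [
    {"seq": 2, "type": "Deposited", "data": {"amount": 100}},
    {"seq": 1, "type": "AccountOpened", "data": {"owner": "Ada"}},
    {"seq": 3, "type": "Withdrawn", "data": {"amount": 40}},
]
account = Account.from_events(stored)
```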

Snapshots

For long-lived entities, the event log can grow extremely large. As the log grows, the time needed to reconstitute the entity will eventually exceed an acceptable level. One common optimization technique is the snapshot. With this technique, we periodically write entity snapshot events into the log. Each snapshot represents the aggregate of all prior log events up to the point of the snapshot creation event. When using snapshots, we search the event log in reverse chronological order to find the latest snapshot event. We instantiate the entity from that snapshot, then replay any events recorded after it to arrive at the current state. This approach maintains the event log history while providing an optimization to minimize load times.
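
The snapshot lookup described above might look like the following sketch. It assumes snapshots live in the same log as ordinary events, encoded as ("snapshot", balance) tuples; that encoding and the load_balance name are illustrative only.

```python
def load_balance(log):
    """Return the current balance, starting from the latest snapshot if one exists."""
    start, balance = 0, 0
    # Walk the log backwards to find the most recent snapshot event.
    for index in range(len(log) - 1, -1, -1):
        kind, value = log[index]
        if kind == "snapshot":
            start, balance = index + 1, value
            break
    # Replay only the events recorded after that snapshot.
    for kind, value in log[start:]:
        if kind == "deposit":
            balance += value
        elif kind == "withdrawal":
            balance -= value
    return balance


LOG = [
    ("deposit", 100),
    ("withdrawal", 30),
    ("snapshot", 70),   # aggregate of the two events above
    ("deposit", 5),
    ("snapshot", 75),   # aggregate of everything so far
    ("withdrawal", 25),
]
current = load_balance(LOG)
```

Here only the final withdrawal is replayed; the five earlier entries are skipped, yet they remain in the log for auditing and temporal queries.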

Event Integrity

An event sourced system is only as good as the contents of its event log. It is critical that every entity event is persisted to the log. Events in the log represent historical facts about the entity. For the event log to be the source of truth, it must represent the actual history of the system. Every event must be immutable. We can change the code that interprets the events, but we can't change the events themselves.
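
In code, immutability can be encouraged at two levels: the event objects themselves and the store's interface. The sketch below uses a frozen dataclass and an in-memory store that exposes only append and read operations. Event and EventStore are hypothetical names, and a real store would enforce these rules at the database level as well.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Event:
    entity_id: str
    kind: str
    amount: int


class EventStore:
    """In-memory, append-only store: no update or delete operations exist."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def events_for(self, entity_id):
        # Return a tuple so callers cannot reorder or remove stored events.
        return tuple(e for e in self._events if e.entity_id == entity_id)


store = EventStore()
store.append(Event("acct-1", "deposit", 100))
first = store.events_for("acct-1")[0]

try:
    first.amount = 0  # attempting to rewrite history
    rejected = False
except FrozenInstanceError:
    rejected = True
```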

Versioning Events

Over time, application events may need to evolve to reflect changes in the domain model. Because of the immutability of events, we can't go back and change past events to reflect the changes. Instead, we introduce a versioning scheme for the events that identifies how the log event should be processed. Each change in the structure of the event is issued a version identifier to differentiate it from its predecessors. When an event is applied to the entity, the version identifier signifies the appropriate code to invoke to apply that log event version.
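
As a sketch of version-based dispatch, suppose a deposit event originally stored a whole-unit amount and was later changed to store cents. Both the schema change and the handler names are invented for this example; the point is only that the (type, version) pair selects the code that interprets each stored event.

```python
def apply_deposit_v1(balance_cents, data):
    # Assumed version 1 shape: amount in whole currency units.
    return balance_cents + data["amount"] * 100


def apply_deposit_v2(balance_cents, data):
    # Assumed version 2 shape: amount already in cents.
    return balance_cents + data["amount_cents"]


# The version identifier picks the handler for each stored event.
HANDLERS = {
    ("Deposited", 1): apply_deposit_v1,
    ("Deposited", 2): apply_deposit_v2,
}


def replay(events):
    balance_cents = 0
    for event in events:
        handler = HANDLERS[(event["type"], event["version"])]
        balance_cents = handler(balance_cents, event["data"])
    return balance_cents


# Old and new events coexist in the same log, untouched.
log = [
    {"type": "Deposited", "version": 1, "data": {"amount": 10}},
    {"type": "Deposited", "version": 2, "data": {"amount_cents": 250}},
]
total_cents = replay(log)
```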

Advantages of Event Sourcing

Event sourcing allows us to capture the history behind an entity's current state. This history enables us to rewind the state to a particular point in time or undo any change. It also enables us to correct errors in state by fixing the faulty code and replaying the events. This simplifies the process of correcting data corrupted by coding errors, which would be difficult or impossible with the persisted entity state alone.

The event log also makes application tracing and debugging simpler, as we can now review the sequence of operations that were applied. When implementing event sourcing, we decouple the entity's in-memory representation from how it is stored, eliminating the entity-persistence mapping problem. An event sourced entity is also much easier to scale, because every operation only writes new events to the event store in an append-only manner.

Disadvantages of Event Sourcing

The move from a persisted-state system to an event sourcing system brings with it a significant learning curve. Even though the concept of event sourcing has existed for a long time, it has only recently been embraced as the microservice architectural model has become popular. This relatively new spotlight on event sourcing means there are fewer event sourcing frameworks, a smaller technical support community, and limited commercial and open-source offerings.

With the event sourcing model, we also see a significant increase in the storage needed to persist every entity's event history. We also incur a performance hit each time we reconstitute an entity. While this penalty can be mitigated to some degree with the use of snapshots, it is difficult to match the speed of a relational database read.

It is important to remember that event sourcing is not suitable for every persisted entity type. Event sourcing is best suited for simple, self-contained entities. By far the most significant disadvantage manifests when we need to query across entity attributes. When we store the entity's current state in a relational database, it is relatively simple to perform queries. Unfortunately, by itself, event sourcing does not perform well in this regard, because we would first have to compute the current state of every entity we are querying across.

Summary

Event sourcing provides a persistence mechanism for capturing an entity's event history. Rather than store the current state of an entity (the aggregate), event sourcing derives the current state by replaying all of the entity's events. This approach allows the application to derive the entity's state at any given point in its lifetime. Entity snapshots provide an optimization to reduce the performance penalty incurred by replaying log events. Snapshots are made possible by the fact that all log events are immutable.

Coming up

Event Sourcing is a powerful persistence mechanism, but its query capabilities are handicapped by the need to reconstitute each entity from the event log. In our next article, we introduce Command Query Responsibility Segregation (CQRS) to address this deficiency.