Versioning events

daxyhr · September 14, 2021, 6:59am

Ok, so after a little break from investigating event sourcing, I’m back with a little question.

From what I understand, it’s preferred to version events per aggregate…

aggregate_id, version, data
1, 1, {…}
1, 2, {…}
2, 1, {…}
1, 3, {…}

Or maybe store them in separate streams:
Stream 1:
1, 1, {…}
1, 2, {…}
1, 3, {…}

Stream 2:
2, 1, {…}

So if I want to make a projection, I keep track of it like this:
id, current_version
1, 3
2, 1

Correct?
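A minimal sketch of the per-aggregate scheme above, assuming each event carries its aggregate id and a per-aggregate version starting at 1. The projection keeps one `current_version` entry per stream, skips events it has already applied, and treats a jump in versions as a missing event. The class and field names are hypothetical and not taken from any particular event store.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    aggregate_id: int
    version: int  # per-aggregate version: 1, 2, 3, ... within each stream
    data: dict


class PerStreamProjection:
    """Hypothetical projection that tracks the last applied version per stream."""

    def __init__(self):
        self.current_version = defaultdict(int)  # aggregate_id -> last applied version (0 = none)
        self.event_count = defaultdict(int)      # toy read model: events seen per aggregate

    def apply(self, event):
        last = self.current_version[event.aggregate_id]
        if event.version <= last:
            return False  # already applied: skip, so redelivery is harmless
        if event.version != last + 1:
            raise ValueError(
                f"stream {event.aggregate_id}: expected version {last + 1}, got {event.version}"
            )
        self.event_count[event.aggregate_id] += 1
        self.current_version[event.aggregate_id] = event.version
        return True


projection = PerStreamProjection()
for e in [Event(1, 1, {}), Event(1, 2, {}), Event(2, 1, {}), Event(1, 3, {})]:
    projection.apply(e)

print(dict(projection.current_version))  # {1: 3, 2: 1}
```

The price of this model is one checkpoint per stream, which grows with the number of aggregates.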

On the other hand, shouldn’t the projections always follow the complete stream? So why not version everything combined?

version, aggregate_id, data
1, 1, {…}
2, 1, {…}
3, 2, {…}
4, 1, {…}

Now the projections only need to remember that they processed up to version 4?

yves.lorphelin · September 14, 2021, 7:05am

From what I understand, it’s preferred to version events per aggregate…
What do you mean by version?

Now the projections only need to remember that they processed up to version 4?
That’s the preferred way: you keep a checkpoint of the stream you’re getting data from
(either a “real” stream, some index stream ($ct-, …), or the $all stream).

Remember that a projection might use data from multiple streams (that’s why you need $all, $ct-, …) to show some kind of report across aggregates.
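A minimal sketch of the single-checkpoint approach, assuming a toy in-memory global log in which every event has an increasing `global_position`. The hypothetical `TotalBalanceReport` reads events from several account streams, interleaved in append order, and keeps only one checkpoint. It compares positions with `>` instead of counting them, so gaps between positions would not matter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedEvent:
    global_position: int  # increasing across the whole store, not per stream
    stream_id: str
    data: dict


# Toy global log: events from several streams, interleaved in append order.
LOG = [
    RecordedEvent(1, "account-1", {"amount": 100}),
    RecordedEvent(2, "account-1", {"amount": -30}),
    RecordedEvent(3, "account-2", {"amount": 50}),
    RecordedEvent(4, "account-1", {"amount": 10}),
]


def read_after(checkpoint):
    """Return an iterator over events past the checkpoint (0 = from the start)."""
    return (e for e in LOG if e.global_position > checkpoint)


class TotalBalanceReport:
    """Cross-aggregate read model that needs only one checkpoint."""

    def __init__(self, checkpoint=0, total=0):
        self.checkpoint = checkpoint  # last global position applied
        self.total = total

    def catch_up(self):
        for event in read_after(self.checkpoint):
            self.total += event.data["amount"]
            self.checkpoint = event.global_position


report = TotalBalanceReport()
report.catch_up()
print(report.checkpoint, report.total)  # 4 130
```

Calling `catch_up()` again after new events are appended processes only the events past the checkpoint, whichever streams they belong to.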

daxyhr · September 14, 2021, 7:17am

In most examples about event sourcing (in general, so not really EventStoreDB-related), I see they keep a record of the version (to keep the events in order and to check that none are missing?). From what I see, it’s always an “auto”-incremented integer?

So let’s say we have accounts: when would you set up a stream per account, and when would you set up a single accounts stream that keeps track of all the events on all the accounts?

yves.lorphelin · September 14, 2021, 9:08am

Individual streams do indeed have revisions; the main reasons are optimistic concurrency and ordering, as you mentioned.
Most of the time you want granular streams: the stream is the transactional boundary at the database level when appending events.
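To illustrate why the stream is the unit of optimistic concurrency, here is a sketch of a hypothetical in-memory store (not EventStoreDB’s API). An append only succeeds if the caller’s expected revision matches the stream’s current revision. For simplicity, the revision here is the number of events in the stream, which matches the 1-based versions used earlier in the thread; EventStoreDB itself numbers stream revisions from 0.

```python
class WrongExpectedRevision(Exception):
    """The stream changed since the caller read it."""


class InMemoryEventStore:
    """Hypothetical store: one append-only list per stream; revision = number of events."""

    def __init__(self):
        self.streams = {}

    def read(self, stream_id):
        events = list(self.streams.get(stream_id, []))
        return events, len(events)

    def append(self, stream_id, expected_revision, new_events):
        stream = self.streams.setdefault(stream_id, [])
        if len(stream) != expected_revision:
            raise WrongExpectedRevision(
                f"{stream_id}: expected revision {expected_revision}, actual {len(stream)}"
            )
        stream.extend(new_events)  # the whole batch lands in one stream, or nothing does
        return len(stream)


store = InMemoryEventStore()
_, rev = store.read("account-1")
store.append("account-1", rev, [{"type": "Deposited", "amount": 100}])

# Two writers load the same revision and both try to withdraw.
_, rev_a = store.read("account-1")
_, rev_b = store.read("account-1")
store.append("account-1", rev_a, [{"type": "Withdrawn", "amount": 30}])
try:
    store.append("account-1", rev_b, [{"type": "Withdrawn", "amount": 90}])
except WrongExpectedRevision as err:
    print("rejected:", err)  # rejected: account-1: expected revision 1, actual 2
```

Both writers read revision 1. The first append wins, and the second is rejected, so that writer has to reload the stream and redo its decision. Writes to other streams are never blocked by this check, which is one reason to prefer granular streams.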

One stream per account or one stream with all accounts is a modelling question and highly dependent on the specific needs of the system.

In the case of EventStoreDB, it’s easy to have one stream per account and then get all events of all “account” streams using system projections: https://developers.eventstore.com/server/v21.6/docs/projections/system-projections.html#by-category
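The following sketch shows, in memory, roughly what the by-category system projection provides. Streams are named `<category>-<id>`, and events from all streams of the same category are collected, in global order, into one `$ce-<category>` view. It assumes the default behaviour of using the part of the stream name before the first dash as the category. The real projection writes link events to `$ce-` streams rather than copies, so treat this only as an illustration.

```python
from collections import defaultdict


def category_of(stream_id, separator="-"):
    """Category = part of the stream name before the first separator (assumed default)."""
    return stream_id.split(separator, 1)[0]


def by_category(global_log):
    """Group (stream_id, event) pairs into per-category views, keeping global order."""
    views = defaultdict(list)
    for stream_id, event in global_log:
        views["$ce-" + category_of(stream_id)].append((stream_id, event))
    return dict(views)


log = [
    ("account-1", "Opened"),
    ("user-7", "Registered"),
    ("account-2", "Opened"),
    ("account-1", "Deposited"),
]
views = by_category(log)
print([s for s, _ in views["$ce-account"]])  # ['account-1', 'account-2', 'account-1']
```

Writing stays per account, which keeps concurrency checks small, while readers that need every account read one combined view.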

oskar.dudycz · September 14, 2021, 10:11am

@daxyhr Selecting which number to use depends on your case. If you’re using EventStoreDB subscriptions to build the read model, then you can subscribe to:

the $all stream,
a specific stream, or
a specific stream emitted by a projection.

$all is a special stream representing the append-only log of all events in the system. Its position is monotonic but not auto-incremented. (To be precise, it’s built from two positions, commit and prepare. However, if you’re not using transactions, then for brevity, let’s say you can use the commit position.)
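Because the $all position is monotonic but not auto-incremented, a consumer can’t compute the “next” position by adding 1; it can only compare positions. Here is a small sketch. It assumes positions are ordered by commit first and then prepare, which is modelled with an ordered Python dataclass. The position values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Position:
    """Sketch of an $all position; order=True compares commit first, then prepare."""
    commit: int
    prepare: int


def after_checkpoint(positions, checkpoint=None):
    """Keep positions strictly after the checkpoint (None = start from the beginning)."""
    return [p for p in positions if checkpoint is None or p > checkpoint]


# Hypothetical positions: increasing, but with gaps (not checkpoint + 1).
log_positions = [Position(0, 0), Position(212, 212), Position(587, 587), Position(1043, 1043)]
print(after_checkpoint(log_positions, Position(212, 212)))
# [Position(commit=587, prepare=587), Position(commit=1043, prepare=1043)]
```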

A stream revision is an auto-incremented number (that’s also the case for most other event stores, but it may vary depending on the specific implementation).

If you’re subscribing to the $all stream, you’ll get events from multiple streams in the order they were appended. You’ll have to store the position of the last processed event as a checkpoint. The position can also be used for deduplication/idempotency, e.g. to check whether a specific event was already processed. That’s a safe assumption if you store the checkpoint atomically together with the projection update, because regular (catch-up) subscriptions guarantee order (that’s not the case for persistent subscriptions).
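Here is a sketch of storing the checkpoint together with the projection update. It uses SQLite from Python’s standard library as a stand-in read-model database. The table names, the `handle` signature, and the integer positions are assumptions for illustration. Both writes share one transaction, so after a crash either both happened or neither did. An event redelivered at or below the checkpoint is skipped. That check relies on ordered delivery, as noted above, so it would not hold as-is for persistent subscriptions. The `ON CONFLICT ... DO UPDATE` upsert needs SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE balances (account_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE checkpoints (projection TEXT PRIMARY KEY, position INTEGER NOT NULL);
    """
)

PROJECTION = "balances"  # hypothetical projection name


def load_checkpoint():
    row = conn.execute(
        "SELECT position FROM checkpoints WHERE projection = ?", (PROJECTION,)
    ).fetchone()
    return row[0] if row else -1  # -1: nothing processed yet


def handle(position, account_id, amount):
    """Apply one event from an ordered subscription at most once."""
    if position <= load_checkpoint():
        return False  # duplicate delivery: already applied
    with conn:  # one transaction: read model and checkpoint commit or roll back together
        conn.execute(
            "INSERT INTO balances (account_id, balance) VALUES (?, ?) "
            "ON CONFLICT(account_id) DO UPDATE SET balance = balance + excluded.balance",
            (account_id, amount),
        )
        conn.execute(
            "INSERT INTO checkpoints (projection, position) VALUES (?, ?) "
            "ON CONFLICT(projection) DO UPDATE SET position = excluded.position",
            (PROJECTION, position),
        )
    return True


handle(10, "account-1", 100)
handle(25, "account-2", 50)
handle(25, "account-2", 50)  # redelivered after a restart: ignored
print(conn.execute("SELECT account_id, balance FROM balances ORDER BY account_id").fetchall())
# [('account-1', 100), ('account-2', 50)]
```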

For specific streams, you can use either the global position or the stream revision. I’d probably suggest always using the global one for simplicity.

See also the drawing:

daxyhr · September 17, 2021, 6:15am

Thanks for the insights. For now I’m trying to learn about event sourcing without a specific implementation (i.e. not only the EventStore way). Eventually I will really look into EventStore because it looks like a great way to handle event sourcing… But for now I’m really trying to understand how this would all work on paper.

One thing I’m not really sure how to handle is the following…

Let’s say we have a DB with 500k users, and it tracks stamp cards for users…
Now we have added a new stamp card to the system, so all users require a new instance of that stamp card.

In a relational database I could do something like:

INSERT INTO instance(…)
SELECT …
FROM user
LEFT JOIN stamp_card ON …
WHERE stamp_card.id is NULL;

Now all of the 500k users that don’t have the stamp card yet will get it.

How would you handle this in event sourcing (preferably not tied to a specific implementation)? Would this require querying all 500k users, then looping through all 500k and storing a new event for each one of them?

yves.lorphelin · September 17, 2021, 8:11am

learn about event sourcing without a specific implementation

Yes and no. I understand that, but as in the relational world, you need some set of functionality provided by the DB to make your life easier.

In a very abstract way, you need to do exactly the same as the SQL query in your sample, but not set-based.
E.g. one of the features I require from a database storing events is subscriptions, because they make it easy to build reactive components. Events need to be delivered to consumers from any point in the database (let’s say from the beginning, the end, or some point in between).

So, to answer your question: if the DB provides that functionality,
I’d subscribe to a stream of events and react to it by creating those stamp cards if they are not present.
That is not a one-time thing: the same code creates the historical stamp cards and the new ones.
It can run as long as stamp cards are needed, creating them for past events and for new events as they arrive.
The day you don’t need to generate those stamp cards anymore, you just remove that one component from the system, or create a new version of the stamp-card generation component and remove the old one.
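Here is a minimal, store-agnostic sketch of the reactive component described above. It assumes hypothetical events `UserRegistered`, `StampCardLaunched`, and `StampCardIssued`, delivered in log order. The same `handle` method serves historical events (replayed from the beginning) and new ones as they arrive. It returns the `StampCardIssued` events that would be appended. Appending and subscribing are left out because they depend on the database.

```python
from dataclasses import dataclass, field


@dataclass
class StampCardIssuer:
    """Hypothetical reactor: consumes events in log order, returns events to append."""
    users: set = field(default_factory=set)
    cards: set = field(default_factory=set)
    issued: set = field(default_factory=set)  # (user_id, card_id) pairs already issued

    def handle(self, event):
        kind = event["type"]
        if kind == "UserRegistered":
            self.users.add(event["user_id"])
            wanted = [(event["user_id"], card) for card in sorted(self.cards)]
        elif kind == "StampCardLaunched":
            self.cards.add(event["card_id"])
            wanted = [(user, event["card_id"]) for user in sorted(self.users)]
        elif kind == "StampCardIssued":
            self.issued.add((event["user_id"], event["card_id"]))
            return []
        else:
            return []  # events this component doesn't care about
        new_events = []
        for user_id, card_id in wanted:
            if (user_id, card_id) not in self.issued:
                self.issued.add((user_id, card_id))  # don't wait for our own event to come back
                new_events.append({"type": "StampCardIssued", "user_id": user_id, "card_id": card_id})
        return new_events


history = [
    {"type": "UserRegistered", "user_id": "u1"},
    {"type": "UserRegistered", "user_id": "u2"},
    {"type": "StampCardLaunched", "card_id": "c1"},
]
issuer = StampCardIssuer()
out = []
for e in history:
    out.extend(issuer.handle(e))
out.extend(issuer.handle({"type": "UserRegistered", "user_id": "u3"}))  # a live event
print([(e["user_id"], e["card_id"]) for e in out])  # [('u1', 'c1'), ('u2', 'c1'), ('u3', 'c1')]
```

The `issued` set lives in memory here. A real component would persist it or resume from a stored checkpoint. Otherwise, replaying from the beginning after a restart would re-issue cards it had already issued.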