Some architectural questions

Hi. I just started looking at event sourcing a few days ago, and decided to make a small project using it to test out the different technologies. After watching a good few talks on the subject and reading some of the information available, I think the overall design of the system I’m planning is starting to take shape in my head, but I thought I’d ask about it here since I’m new to all of this and it might be a completely wrong approach.

The system I’m planning can be considered a blog-like system: posts organized into categories, plus a user database. What I’ve started thinking about is the fact that you can have different read “systems” for your data, which I find quite intriguing, so I was thinking of having one service that deals with login and another that deals with the blog posts.

Here are my thoughts so far:

2 (or more) write services: one for the registration/manipulation of users, and another for categories and posts. These handle the business logic, do optimistic validation against the read store (like checking whether an email is already in use), and then simply write to the event store.

1 login service. This subscribes to the event store and builds up an in-memory user database. Considering there shouldn’t be too many users, in-memory should be fine. This service implements SSO, as well as some simple APIs for the write service to check whether an email is in use, etc.

1 static content generator. I’m thinking of something similar to how GitHub Pages works: I can just subscribe to posts in the event store, generate static content for them, and put it behind a simple nginx server or similar.

So my main question is this: does this seem like a reasonable approach, at least to get started? Obviously this system would not have much traffic, because it’s just something I’m building as a hobby to test out technologies, but I’m trying to learn how one can build something that can be scaled. For instance, I really liked the idea of doing static page generation that could be distributed to as many “simple” nodes as I want, to support huge amounts of traffic with eventual consistency (I don’t really care if it takes a few minutes or more before posts are visible to the world). This would also ensure that even if my “blog site” is being DDoSed, the “admin site”, where one would go to write posts, could still be completely functional, because they are two different systems.

I also have a question with regards to eventual consistency, which is something I’ve never worked with before. One of the talks I saw had a great example of two users trying to register at “the same time” with the same email address. Both events were allowed into the event store; then, when the second UserRegistered event was being handled, the handler saw that it was a duplicate email address, created a new event to describe the error, and sent out a mail to the user explaining that something went wrong. This is all fine, and the functionality should be easy to create, but my question is this: if I later create a second event handler (or restart mine), how am I supposed to prevent it from sending the mail again? Do I keep track of “up to which event” has been handled somewhere? What’s a good way to deal with this situation?

Thanks. Alxandr.

Answers inline…

Hi. I just started looking at event sourcing a few days ago, and decided to make a small project using it to test out the different technologies. After watching a good few talks on the subject and reading some of the information available, I think the overall design of the system I’m planning is starting to take shape in my head, but I thought I’d ask about it here since I’m new to all of this and it might be a completely wrong approach.

The system I’m planning can be considered a blog-like system: posts organized into categories, plus a user database. What I’ve started thinking about is the fact that you can have different read “systems” for your data, which I find quite intriguing, so I was thinking of having one service that deals with login and another that deals with the blog posts.

Do you really need event sourcing for this? Or what is your reasoning for wanting to use event sourcing?

Here are my thoughts so far:

2 (or more) write services: one for the registration/manipulation of users, and another for categories and posts. These handle the business logic, do optimistic validation against the read store (like checking whether an email is already in use), and then simply write to the event store.

I’d try to look at this more from the domain perspective, less from a technical perspective (keywords: bounded context, aggregate, domain-driven design). So you’d end up with something like a registration context and a blog context.
Validation should ideally not be against the read store. Keep the state you need to validate (i.e. safeguard your invariants, business rules) in the aggregates. The aggregate is your domain model and used during writing. This can be as simple as having a document per aggregate with some business logic. (keyword: CQRS)
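In code, keeping the invariant inside the aggregate might look something like the following minimal Python sketch. All class, event, and field names here are invented for illustration; they are not from any particular framework.

```python
# Hypothetical sketch: an aggregate rebuilds its state by replaying its own
# events, and guards its invariant in the write model (not the read store).
class Post:
    def __init__(self, events=()):
        self.title = None
        self.published = False
        for event in events:          # replay the aggregate's own stream
            self._apply(event)

    def _apply(self, event):
        if event["type"] == "PostCreated":
            self.title = event["title"]
        elif event["type"] == "PostPublished":
            self.published = True

    def publish(self):
        # The business rule lives here, next to the state it depends on.
        if self.published:
            raise ValueError("post is already published")
        event = {"type": "PostPublished"}
        self._apply(event)
        return event                  # caller appends this to the event store
```

The point is that `publish` decides against state rebuilt from the aggregate’s own event stream, so no round trip to an eventually consistent read store is needed to enforce the rule.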

1 login service. This subscribes to the event store and builds up an in-memory user database. Considering there shouldn’t be too many users, in-memory should be fine. This service implements SSO, as well as some simple APIs for the write service to check whether an email is in use, etc.

Don’t forget that you’ll have to read all events to build the state. However, login would be a separate context, and you could integrate the events into separate streams (essentially a projection of events into your authentication context)… but that might be too complex for something as simple as a blog. (The more I think about it, the more overlap I see with the SCRUM example in Implementing Domain-Driven Design - a worthwhile read, by the way; just skim over the implementation-specific Java details.)
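“Read all events to build the state” is just a fold over the stream. A minimal Python sketch, with invented event shapes, of rebuilding the in-memory user database the login service would hold:

```python
# Hypothetical sketch: fold over every event read from the store to rebuild
# an in-memory lookup of users. Event types and fields are invented.
def build_login_model(events):
    users = {}
    for event in events:
        if event["type"] == "UserRegistered":
            users[event["email"]] = {"password_hash": event["password_hash"]}
        elif event["type"] == "PasswordChanged":
            users[event["email"]]["password_hash"] = event["password_hash"]
    return users
```

On startup the service would read the relevant streams from the beginning, run them through this fold, and then keep applying new events as they arrive via its subscription.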

1 static content generator. I’m thinking of something similar to how GitHub Pages works: I can just subscribe to posts in the event store, generate static content for them, and put it behind a simple nginx server or similar.

So my main question is this: does this seem like a reasonable approach, at least to get started? Obviously this system would not have much traffic, because it’s just something I’m building as a hobby to test out technologies, but I’m trying to learn how one can build something that can be scaled. For instance, I really liked the idea of doing static page generation that could be distributed to as many “simple” nodes as I want, to support huge amounts of traffic with eventual consistency (I don’t really care if it takes a few minutes or more before posts are visible to the world). This would also ensure that even if my “blog site” is being DDoSed, the “admin site”, where one would go to write posts, could still be completely functional, because they are two different systems.

The biggest concern regarding scaling would be how you design your aggregates - i.e. how much concurrency you’ll have to handle.
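Concurrency on an aggregate usually surfaces as optimistic concurrency on its stream: an append states which version it expects, and fails if another writer got there first. A minimal sketch of the idea (the store API here is invented; real event stores expose an expected-version parameter on append):

```python
# Hypothetical sketch of optimistic concurrency control on a stream:
# the append fails if the stream has grown past the expected version.
def append(stream_name, events, new_event, expected_version):
    if len(events) - 1 != expected_version:
        raise RuntimeError("concurrency conflict on " + stream_name)
    events.append(new_event)
```

The narrower your aggregates, the fewer writers compete for the same stream, which is why aggregate design is where the scaling concern shows up.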

I also have a question with regards to eventual consistency, which is something I’ve never worked with before. One of the talks I saw had a great example of two users trying to register at “the same time” with the same email address. Both events were allowed into the event store; then, when the second UserRegistered event was being handled, the handler saw that it was a duplicate email address, created a new event to describe the error, and sent out a mail to the user explaining that something went wrong. This is all fine, and the functionality should be easy to create, but my question is this: if I later create a second event handler (or restart mine), how am I supposed to prevent it from sending the mail again? Do I keep track of “up to which event” has been handled somewhere? What’s a good way to deal with this situation?

I’d say: sending mail would be a long running process (process manager/saga), which has its own (persisted) state - sending an email would be a command then. Or you could solve that via separate messages…
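A process manager with its own persisted state might look something like this minimal Python sketch. Everything here is invented for illustration; the important part is that the decision produces a SendEmail command rather than sending mail inline, and that the process’s state would be persisted alongside its position:

```python
# Hypothetical sketch of a process manager: it reacts to events, keeps its
# own (normally persisted) state, and emits commands instead of performing
# side effects directly.
class RegistrationProcess:
    def __init__(self, state=None):
        # In a real system this state would be loaded from storage.
        self.state = state or {"emails_seen": set(), "commands": []}

    def handle(self, event):
        if event["type"] == "UserRegistered":
            if event["email"] in self.state["emails_seen"]:
                # Duplicate registration: decide to send mail via a command.
                self.state["commands"].append({
                    "type": "SendEmail",
                    "to": event["email"],
                    "body": "Registration failed: email already in use",
                })
            else:
                self.state["emails_seen"].add(event["email"])
        return self.state["commands"]
```

Because the state (including which commands were already issued) is persisted with the process, a restart resumes from where it left off instead of re-deciding from scratch.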

Caveat: That is my naive interpretation of things.

I obviously “don’t need” event sourcing, but I want to learn event sourcing, and I find that having a project I want to do is the best way to learn new technologies. Also, I’ve been planning this project for a while, and initially I planned to base it on git as the “database” for the posts, because I want the full history. Then I learnt about event sourcing and figured that was probably a good fit. As said, I’m very new to this and have some more reading to do, so I don’t know what a bounded context is, but I’ll look into it. I’ve also not yet fully understood aggregates, but I’m currently reading about them, and I’ll try to look up the SCRUM example in Implementing Domain-Driven Design. There are obviously several things I don’t yet understand here and need to learn about, and whether or not your interpretation is correct is hard for me to say given your replies, but I appreciate the feedback nonetheless :).

“If I later down the line create a second (or restart) my event handler, how am I supposed to prevent it from sending the mail again? Do I keep track of ‘up to which event’ has been handled somewhere? What’s a good way to deal with this situation?”

This is a process, not a projection. Processes don’t get replayed. That said, you could also dedupe the emails being sent (e.g. recognize that you sent one before and therefore not send it a second time), though that’s a bit wasteful.
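The dedupe idea amounts to making the side effect idempotent, keyed by event id. A minimal Python sketch (names invented; the `sent` set would be persisted in a real system):

```python
# Hypothetical sketch of deduplicating a side effect: remember which event
# ids already produced a mail, and skip them on replay or restart.
def send_once(event, sent, send):
    if event["id"] in sent:
        return False          # already handled on a previous run
    send(event)
    sent.add(event["id"])     # persist this in a real system
    return True
```

This is the “wasteful” option: the handler still visits every event again, it just refuses to repeat the side effect.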

These “processes” - is this, for instance, a node working on a “persistent subscription” as found here? http://codeopinion.com/event-store-persistent-subscriptions/
There was very little obvious information available when googling “eventstore process” or “eventstore processes”.

You can run them off a persistent subscription (especially if you need load balancing or high availability via multiple instances). It’s more that they are not replayed.
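The earlier question of “up to which event has been handled” is exactly what a checkpoint is. A minimal Python sketch of a checkpointed subscription loop (all names invented; a persistent subscription tracks this position for you on the server side):

```python
# Hypothetical sketch of checkpointing: persist the position of the last
# event handled, and on restart resume from checkpoint + 1 instead of
# re-handling everything.
def run_subscription(events, handler, checkpoint=-1):
    for number, event in enumerate(events):
        if number <= checkpoint:
            continue          # already processed before the restart
        handler(event)
        checkpoint = number   # a real system would persist this durably
    return checkpoint
```

Whether you store the checkpoint yourself or let the server do it, the effect is the same: a restarted handler does not see the events it already acknowledged.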

Thanks. I think I’m starting to get a better understanding of some of the concepts in use. In particular, I didn’t know what an aggregate was previously, but I think I understand it now. However, I have another question that is likely very EventStore specific, and that I haven’t been able to find described anywhere. If I want to store a list of users in the event store (not as states, but as a list of created events, changed password events, etc.), I would consider a User an aggregate, right? I would also have a stream in the event store per user (something like “User-”). My question is then: when building up the read model, how would I get a list of all users? Do I just go through all events globally, looking for user-created events? Or do I loop through streams somehow, looking for ones that start with “User-”? I noticed that the JavaScript projections have the notion of categories, so I guess I could make a projection in the event store that reacts to users being created and posts a new event to an “all-users” stream or similar, but I would really like (for the time being at least) to stick with writing .NET code if I can.

In my case the read model subscribes to user events (the subscriber is really for the “all” stream), and puts them into a database that I can then query for all users (Neo4j in my case). I also put the events into ElasticSearch so I can easily see what has happened for each user, or how many times a day people change passwords, or whatnot (any time-related queries go to ES, basically).

/Rickard
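The shape of that setup can be sketched in a few lines of Python: one pass over the “all” stream, routing each event into whichever stores answer your queries. The dict and list here merely stand in for the real databases; event shapes are invented:

```python
# Hypothetical sketch: a single subscriber over the "all" stream feeding
# two read stores - a per-user lookup (standing in for Neo4j) and a
# time-ordered log (standing in for ElasticSearch).
def project(all_events):
    by_user = {}      # query "all users" / "what happened to user X" here
    timeline = []     # query time-related questions here
    for event in all_events:
        if event["type"].startswith("User"):
            by_user.setdefault(event["user_id"], []).append(event["type"])
            timeline.append((event["timestamp"], event["type"]))
    return by_user, timeline
```

Listing all users is then just listing the keys of the first store; no scan of the event store is needed at query time, only once while projecting.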

Is the way really to go through ALL events in the entire event store, even though you might just want a tiny portion of them? Is there no way to get events “by category” using the API?

Would like a way to do this as well … without involving projections.

See the by-category projection.

Would like a way to do this as well … without involving projections.

This would require secondary indices to do in any manner which didn’t involve looking at every event.

It could also be done by writing your own linkTos (from outside), without using projections internally.
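Writing your own linkTos from the outside amounts to: read the global stream, and for each event append a small link event to a per-category stream that points back at the original. A minimal Python sketch, with the store modeled as a dict (stream naming and the link shape are simplified for illustration; in EventStore a link is an event whose type is `$>`):

```python
# Hypothetical sketch of doing by-category grouping yourself: append a
# link event to a category stream for every event seen, so readers can
# follow one stream ("category-User") instead of scanning everything.
def link_by_category(all_events, streams):
    for event in all_events:
        stream = event["stream"]
        if "-" in stream:
            category = "category-" + stream.split("-", 1)[0]
            streams.setdefault(category, []).append(
                {"type": "$>", "link": (stream, event["number"])})
    return streams
```

This is exactly what the built-in by-category projection does internally; doing it externally lets you keep projections disabled and stay in .NET code at the cost of running one more subscriber.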