Performance and Scale Data?

We’re looking to retool a few systems: one with a fairly high rate of events, and another that is DDD-based but not yet very event sourced (though it will be). EventStore appears as if it would address many of the functional requirements, but it isn’t clear from the documentation whether it would be a good fit from a scale and performance perspective.

Before we go off and do an involved POC - has anyone published any information around scale and performance?

The higher event scale system might be characterized using some of the following:

Number of Events/Day: ~200 million

Eventing Rate: Current rate is ~2,000 events/sec, but we’re hoping for more upside

Number of Streams: ~10 million, but will grow over time

Number of Categories: ~ 15 (possibly fewer), most activity would be from 2 or 3 categories

Number of different Services Consuming/Generating Events: ~10 services with 2-6 instances each, so let’s say fan-out is ~40

I understand the answer is always going to depend on a lot of things. I’d just prefer to avoid a time-consuming POC if EventStore is not going to be a good fit.

I’d appreciate anyone pointing me at anything that would be helpful.

EventStore can pretty easily do 2000/sec (hardware dependent). This is not really the problem …

At 200M/day, no unpartitioned store will work well over time. You would be discussing > 50 billion events/year. The first goal should be some partitioning strategy, no?
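For what it’s worth, one common partitioning approach is to route each stream to a partition by hashing the stream ID, so all events of a stream stay ordered within one partition while write load spreads out. A rough sketch (the partition count and stream IDs are purely illustrative):

```python
import hashlib

NUM_PARTITIONS = 16  # illustrative; size this to your expected per-partition volume

def partition_for(stream_id: str) -> int:
    """Stable hash of the stream ID -> partition index.

    A cryptographic hash (rather than Python's built-in hash()) keeps the
    mapping stable across processes and restarts.
    """
    digest = hashlib.sha256(stream_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# All events for one stream land in the same partition, so per-stream
# ordering is preserved while the write load spreads across stores.
print(partition_for("order-123"))
```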

Even if events are relatively small, say 500 bytes each, we are talking ~100 GB/day, or on the order of 36 TB of raw payload your first year at this rate (more once you count metadata, indexes, and replication).
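The back-of-envelope arithmetic, for anyone who wants to check it (raw payload only; assumes the 200M/day and 500-byte figures above):

```python
EVENTS_PER_DAY = 200_000_000
BYTES_PER_EVENT = 500  # assumed average payload size

bytes_per_day = EVENTS_PER_DAY * BYTES_PER_EVENT
gb_per_day = bytes_per_day / 10**9
tb_per_year = bytes_per_day * 365 / 10**12

# Raw payload only; indexes, metadata, and replication multiply this.
print(f"{gb_per_day:.0f} GB/day, {tb_per_year:.1f} TB/year")  # 100 GB/day, 36.5 TB/year
```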

Thanks, yeah there’d be some data management. A lot of the events become uninteresting after a short period of time.

I got some push back from a few devs when I demoed some of the basics, but they were conflating RabbitMQ write behavior with actual event instances.

Be very careful when saying “they become uninteresting” … I have rarely (as in essentially never!) found this to be true. I would plan instead on keeping everything forever! The trade-off is usually raising access times based on age, as opposed to getting rid of events. The thing is, when they DO become valuable they often become MASSIVELY valuable. As an example: what if I ran a DB per week, then archived that DB? Hot for 4 weeks, warm for 8, then cold.
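That hot/warm/cold idea can be sketched as a simple age-based routing rule (the week thresholds below are just the ones from the example above):

```python
from datetime import date, timedelta

# Illustrative tier thresholds from the "DB per week" example:
# hot for 4 weeks, warm for the next 8, then cold/archived.
HOT_WEEKS = 4
WARM_WEEKS = 8

def tier_for(event_date: date, today: date) -> str:
    """Route a read to a storage tier based on the event's age in weeks."""
    age_weeks = (today - event_date).days // 7
    if age_weeks < HOT_WEEKS:
        return "hot"
    if age_weeks < HOT_WEEKS + WARM_WEEKS:
        return "warm"
    return "cold"

print(tier_for(date(2024, 1, 1), date(2024, 1, 10)))  # hot
```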

To be fair, how much does a 20 TB NAS cost? :wink: Hint: less than $1,000 :open_mouth:
https://www.amazon.com/Cloud-Ultra-Network-Attached-Storage/dp/B07179ZYH2/

Being able to answer that question today (even if it took me two weeks to calculate) is often worth … millions. It doesn’t take much of a probability of this occurring to justify buying some $1,000 NASes :slight_smile: This is especially true when we consider that a team of developers working on this is likely costing > $5-10,000/day. If I can justify the developers, it becomes highly unusual not to be able to justify the storage.

I have literally seen only one case in my career where this did not hold :open_mouth: The reason was that the behaviour was changing so quickly that anything older than a week to a month was no longer worth looking at :open_mouth: “Things work differently today” (think of a new, quickly growing and changing market, like, say, a certain crypto-currency a few years ago …). Any guesses what the primary focus of this team was? :smiley:

Good point. When I said less interesting, I meant the operational database. Everything will end up in the data lake for analytics and data science.

Keeping all events allows you to retroactively replay new projections driven by new business insight. The context of most day-to-day work is a couple of weeks to a month; that’s why people often think that keeping ALL events is unnecessary.
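As a sketch of what “retroactively replaying a new projection” means, here is an in-memory example (event names and amounts are purely illustrative, nothing EventStore-specific):

```python
# Events captured long ago, kept forever in the log.
events = [
    {"type": "OrderPlaced",   "amount": 120},
    {"type": "OrderPlaced",   "amount": 80},
    {"type": "OrderRefunded", "amount": 80},
]

def revenue_projection(history):
    """A 'new business insight' computed retroactively over old events.

    This function can be written years after the events occurred and
    simply replayed over the full history.
    """
    total = 0
    for e in history:
        if e["type"] == "OrderPlaced":
            total += e["amount"]
        elif e["type"] == "OrderRefunded":
            total -= e["amount"]
    return total

print(revenue_projection(events))  # 120
```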

This discussion was interesting enough for me to sign up with the group because I get the same kind of pushback and conflation when trying to get buy-in for event sourcing. First off, are we discussing Event Store or event sourcing? Because the RabbitMQ conflation sounds like there is also conflation between Event Store and event sourcing.

People often think a message broker alone is event sourcing. ES is still hugely misunderstood today, even though the technology all around us has essentially adopted it: blockchains, Redux-Sagas, etc. I’d argue that DDD is actually less important than realizing that with event sourcing you get CQRS implicitly. You can have CQRS without ES, but you cannot have ES without CQRS. Separating your read and write models, and having your services communicate through a flat interface (events), is convenient for any infrastructure.
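A minimal illustration of that read/write split, as an in-memory sketch (all names are made up; this is the pattern, not any particular product’s API):

```python
# Write side: an append-only event log, the only source of truth.
# Read side: a view derived entirely from the log, rebuildable at any time.

event_log = []

def handle_deposit(account: str, amount: int) -> None:
    # The command handler appends an event; it never touches the read model.
    event_log.append({"type": "Deposited", "account": account, "amount": amount})

def build_balances(log):
    # The read model is a pure function of the log, so it can always be rebuilt.
    balances = {}
    for e in log:
        if e["type"] == "Deposited":
            balances[e["account"]] = balances.get(e["account"], 0) + e["amount"]
    return balances

handle_deposit("alice", 100)
handle_deposit("alice", 50)
print(build_balances(event_log))  # {'alice': 150}
```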

For analytics, imagine you had aggregated data sets that need to be computed and re-computed as commits flow into the database. You can use some combination of upserts, materialized views, or triggers to achieve this. But with event sourcing, you just create a projection, and you can replay that projection over historical data whenever you want. Versioning data is another painful thing in SQL. With Event Store, you can just snapshot; or, if you rolled your own event sourcing, rename your new events after a backup (as opposed to maintaining a versions table in SQL).
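A rough sketch of what snapshotting a projection buys you: persist the projection state plus its log position, then on restart replay only the tail of the log (entirely illustrative, not the Event Store API):

```python
def replay(events, apply, state, start=0):
    """Fold `apply` over events from `start`, returning (state, next_position)."""
    for pos in range(start, len(events)):
        state = apply(state, events[pos])
    return state, len(events)

# Hypothetical projection: just counts events it has seen.
apply_count = lambda state, event: state + 1

events = ["e1", "e2", "e3", "e4"]
state, pos = replay(events, apply_count, 0)          # full replay over 4 events
snapshot = {"position": pos, "state": state}         # persist this, not the view

events += ["e5", "e6"]                               # more events arrive
state, pos = replay(events, apply_count,
                    snapshot["state"], snapshot["position"])  # replays only e5, e6
print(state)  # 6
```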

Of course this might actually not be anything new. I apologize if I went off-target. It just struck me as strange that RabbitMQ was conflated with Event Store.