Poor performance from Event Store

Firstly, the TL;DR of the detail below is that we’re seeing really poor Event Store performance when writing new events “as quickly as possible”. There is a lot more information I can provide on request or publish somewhere else and link to (e.g. logs, segfault/crash info, CPU/disk stats from the host, graphs of performance metrics, etc.).

Use case:

  • We are using eventstore to track all changes in “Products” so that we have a full audit trail/history of every Product in the system.
  • The current state of a Product is obtained at regular intervals (typically daily, but can equally be weekly or hourly).
  • The “commands” that represent these current Product states arrive in a message queue that our “Product Command Processor” handles
  • For each command the command processor will
    • Read all existing events in eventstore for the Product in question to reconstruct the aggregate
    • (If there are no events for this Product, emit a ProductCreated event containing the full Product detail)
    • Compare the aggregate state to the state in the incoming message and emit the new events describing the state change
  • We have the by_category system projection enabled so that we can have projections that subscribe to $ce-product in order to stay up to date with all Product events

Some numbers

  • We get approximately 12 million product updates per day
  • There are approximately 50 million products in existence (this grows steadily as new ones are found)
  • We write approximately 35 million events (25 GB of event data) per day based on the detected Product changes

Our Setup

  • We are running on a single-node Linux EC2 instance in AWS and have tried a few different flavours:
    • c5n.4xlarge with “vanilla” SSD EBS drive
    • c5n.4xlarge with “high iops” SSD EBS drive
    • i3en.2xlarge with dedicated local SSD for the event data location (disk iops is even higher - currently running this one but still seeing the same issues)
  • We run a single instance of Event Store (Linux version) on this single EC2 instance
  • We sometimes run multiple instances of the command processor (in an effort to squeeze more performance from ES)
  • In general we find things run smoothly when we start out and everything is keeping up with the incoming data stream nicely
  • We run into issues when we try to populate Event Store more quickly:
    • by running through the last 24 hours of commands which we have in a kafka topic
    • or, if we have had some sort of error or redeployment of the command processor meaning there is a backlog of commands to process
  • Our persistent subscriptions use the default settings and use the $ce-product stream

Issues

  • When our command processor is trying to catch up with a backlog, i.e. going as fast as possible (writing new events, but also doing reads to reconstruct aggregates as part of that process):
    • Persistent subscriptions grind to a halt
      • On the UI, the Persistent Subscriptions page is empty.
      • On the dashboard there is a large build-up of work in the PersistentSubscriptions queue (other queues are fine). We observed that even if we stop all our external processes (no requests in or out of ES), this queue is processed at a pitiful rate (e.g. <100 items/s)
      • The system projection starts to fall behind (note that even with this turned off temporarily, we generally don’t see an improvement in event writing performance)
      • Consumers of those subscriptions (our own projections) initially get starved (maybe < 10 events per second) and soon after the subscription just dies and those consumers can’t reconnect
    • The UI becomes unresponsive and/or shows regular “an error has occurred” banners
    • We can’t monitor what’s happening via Grafana, as the Prometheus request to scrape the metrics from the Event Store endpoint times out with this message: time="2019-10-22T15:37:04Z" level=error msg="Error while getting data from EventStore" error="HTTP call to http://172.xx.xx.xx:2113/subscriptions resulted in status code 408"
    • Recovery happens when the command processor rate slows down to something more pedestrian; once the system projection and the PersistentSubscriptions queue catch up, the subscriptions re-appear in the UI and the consumers can connect again
  • We’re trying/hoping to process Product updates at at least a few thousand messages per second (to catch up with a backlog), rather than the low hundreds we tend to see (which is only just about enough to keep up with live data).
  • We regularly see “SLOW QUEUE MSG” and “SLOW BUS MSG” in the Event Store logs. When things are really bad we also get “VERY SLOW QUEUE MSG” in the error logs. We don’t really know what this is telling us or how we should react, though obviously it is an indicator that we are putting Event Store under stress.
  • We observe that the eventstore machine doesn’t seem to make full use of the resources available to it.
    • Typically when we see poor performance we might see 25-50% CPU utilization and low disk iops (5k reads/writes per second) compared to what the instances should be capable of.
    • … but we have seen much higher figures when things are going smoothly (CPU around 60% and up to 150,000 disk iops)
  • We’re generally not getting great performance when reading a system category stream such as $ce-product, compared to a raw stream of events. We suppose this is expected to some extent, as the former has to resolve the linked events?

First thing that jumps out at me: how long are the product streams that you’re typically reading? Have you tried offloading work by snapshotting to another medium (Redis or similar) to reduce the read burden?

I would add that long-lived streams are not a great strategy, but looking into snapshots would be a tactical solution to reduce some load. What size are these product streams, and what is being updated daily? Pricing/availability info?

Since you are having problems with “writing”

Can you run the test client and run wrfl 10 30000

What do you get back?

“VERY SLOW QUEUE MSG”

is basically an assurance that this is not an Event Store issue. The only time you see these is when we are doing things that take massively longer than expected. In almost all cases this is caused by … IO.

My first few thoughts here:

Think about using a “hot aggregate” pattern in the command handler, i.e. keep aggregates cached in memory in the command processor and use optimistic concurrency on the write to detect whether the product has changed; on a cache miss or a concurrency exception, load the missing events. This should be much more efficient.
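
A rough sketch of the shape I mean, against the TCP ClientAPI (ProductAggregate/ProductState, Diff, ApplyAll, LoadFromStoreAsync and _connection are placeholders for whatever you already have - this is illustrative, not a drop-in implementation):

// Sketch only. Assumes EventStore.ClientAPI, EventStore.ClientAPI.Exceptions and System.Collections.Concurrent.
// _connection is your existing IEventStoreConnection; the aggregate types/helpers are placeholders.
private readonly ConcurrentDictionary<string, (ProductAggregate Aggregate, long Version)> _hot =
    new ConcurrentDictionary<string, (ProductAggregate, long)>();

public async Task HandleAsync(string productId, ProductState incoming)
{
    var stream = $"product-{productId}";

    // Cache hit: skip the stream read entirely and trust the cached version number.
    if (!_hot.TryGetValue(productId, out var entry))
        entry = await LoadFromStoreAsync(stream);            // cache miss: read the stream as you do today

    EventData[] newEvents = entry.Aggregate.Diff(incoming);  // your existing comparison logic
    if (newEvents.Length == 0) return;

    try
    {
        // expectedVersion gives optimistic concurrency: if another writer got in since we
        // cached this aggregate, the append fails instead of silently diverging.
        var result = await _connection.AppendToStreamAsync(stream, entry.Version, newEvents);
        _hot[productId] = (entry.Aggregate.ApplyAll(newEvents), result.NextExpectedVersion);
    }
    catch (WrongExpectedVersionException)
    {
        // Stale cache entry: evict, reload the missing events and retry the command.
        _hot.TryRemove(productId, out _);
        // ... re-load, re-diff and append again (retry loop omitted)
    }
}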

What is your write batch size? Basically just make sure you’re not writing events 1 at a time.
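
i.e. all the events produced by one command should go to the store in a single append, roughly like this (the event construction here is illustrative):

// All the events from one Product update go in a single AppendToStreamAsync call,
// not one call per event. "changes" is whatever your diff produced for this Product.
var batch = changes
    .Select(c => new EventData(Guid.NewGuid(), c.EventType, true /* isJson */, c.Data, c.Metadata))
    .ToArray();

await connection.AppendToStreamAsync($"product-{productId}", expectedVersion, batch);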

How many persistent subscriptions are you running?

-Chris

Just to add (responding to Chris’s email but in general).

Persistent subscriptions can easily handle 1000s of messages/second if you are just acking them. What level of processing is associated with each message? Usually the bottleneck is the client code, not the subscription itself; for example, you might be doing an insert into a database or posting to a URI. Don’t get me wrong, you can make the subscription the bottleneck, but even if you are only deserializing the messages and counting them, it’s likely not the subscription that is the limitation.
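
For reference, by “just acking” I mean a consumer whose handler does essentially nothing, roughly like this (written against the TCP ClientAPI; the exact ConnectToPersistentSubscription delegate shapes and parameters vary a bit between client versions, and the group name here is just an example):

// Baseline ack-only consumer: no processing at all, just acknowledge each event.
// "connection" is the shared IEventStoreConnection.
var sub = connection.ConnectToPersistentSubscription(
    "$ce-product",                        // stream
    "product-consumers",                  // subscription group (example name)
    async (subscription, resolvedEvent, retryCount) =>
    {
        subscription.Acknowledge(resolvedEvent);   // ack and nothing else
        await Task.CompletedTask;
    },
    (subscription, reason, exception) =>           // subscription dropped callback
        Console.WriteLine($"Subscription dropped: {reason} {exception}"),
    null,                                 // credentials
    10,                                   // client buffer size
    false);                               // autoAck off, since we ack explicitly above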

Most streams get a few events per day, but there is also a significant number that get updated many times a day. We’ve just implemented snapshots using an adjacent stream “productSnapshot-xxxx” so that the command processor only needs to read 100 or so events to reconstruct the aggregate, but this hasn’t made a difference to the specific issues we’re seeing here.
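
For context, the loading side now looks roughly like this (simplified: Snapshot, RestoreFrom and ApplyEvent stand in for our actual code):

// Simplified shape of aggregate loading with the adjacent snapshot stream.
// conn is our shared IEventStoreConnection; the snapshot/aggregate types are stand-ins.
async Task<ProductAggregate> LoadWithSnapshotAsync(IEventStoreConnection conn, string productId)
{
    var aggregate = new ProductAggregate();
    long from = 0;

    // 1. Read the most recent snapshot, if any (one event, backwards from the end of the stream).
    var snap = await conn.ReadStreamEventsBackwardAsync($"productSnapshot-{productId}", StreamPosition.End, 1, false);
    if (snap.Status == SliceReadStatus.Success && snap.Events.Length > 0)
    {
        var snapshot = Snapshot.Deserialize(snap.Events[0].Event.Data);
        aggregate.RestoreFrom(snapshot);
        from = snapshot.LastEventNumber + 1;   // the snapshot records how far into the product stream it covers
    }

    // 2. Replay only the events written after the snapshot (typically ~100 or fewer).
    StreamEventsSlice slice;
    do
    {
        slice = await conn.ReadStreamEventsForwardAsync($"product-{productId}", from, 200, false);
        foreach (var e in slice.Events)
            aggregate.ApplyEvent(e.Event);
        from = slice.NextEventNumber;
    } while (!slice.IsEndOfStream);

    return aggregate;
}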

Caching in memory wouldn’t be effective unless there was enough memory to cache most of the daily products (there are too many products, so with a size-limited cache it would be mostly misses), but we’re certainly considering trying a distributed cache to reduce reads and seeing what happens. Each command is handled in its own thread and writes all the events for the updated stream at once using AppendToStreamAsync. I don’t know how we could write in any bigger batches, as the write operation is specific to a stream.

We run 2 persistent subscriptions when we’re able to, but it’s important to stress that we’re unable to run them at all when the command processor is working hard. Surely we shouldn’t have to scale back the incoming message rate just to be able to connect consumers to subscriptions?

Could you try snapshotting more often outside of ES? We had to do this with Redis and gained a significant perf improvement.

Indeed, we do get 1000s of messages/second on persistent subscriptions when we aren’t stressing things on the command processor side (i.e. many stream reads/writes). We also have a test consumer, with all code in the event handler removed except the ack, as a benchmark. A subscription on the system-projected $ce-product is slower than one on a directly written stream, but it can still get e.g. 3000 messages per second.

But the main problem is that persistent subscriptions just won’t work at all with an unrestricted command processor - like I said, the web UI won’t even show the subscriptions until we switch off the command processor and wait for the system projection and the PersistentSubscription internal queue to catch up again.

I’ll try the test client on Monday and let you know - maybe I can even get it to reproduce this issue with persistent subscriptions.

You said you are creating a thread per command?

That sounds interesting.

Try turning off the standard projections that you are not using.

Writing out the linkTo events and checkpoints into the event log, while not super expensive, also isn’t free.

-Chris

All the system projections are stopped except for the by_category one, which we rely on for the subscriptions used by our projections. We’ve tried stopping that one too to alleviate the problem (since we can’t connect to subscriptions right now anyway): things do get marginally better, but then it’s all the harder for the by_category projection to catch up again when we turn it back on. In fact that takes hours, even if we stop the command processor so that no external activity is going on.

Out of curiosity, are you putting your logs and database on separate disks?

The current pattern is (.NET Core code):

  • We get a batch of commands to process at a time (100 by default, we can configure this)

  • We do:

await Task.WhenAll(commands.Select(async command => await ProcessCommandAsync(command)));

  • ProcessCommandAsync is the code for handling each message (reads the events for the aggregate, writes new events) - a simplified sketch of this is shown after this list

  • Essentially the thread pool will determine how many threads/tasks run in parallel, but that’s all I meant by a separate thread - each is an individual Task and they can run at the same time.

  • They all use the same client connection to event store.
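
Expanded a bit, the shape of the whole loop is roughly this (simplified: no error handling, the Diff helper and aggregate types are placeholders for our real code, and the snapshot handling described earlier is omitted):

// Simplified shape of the processing loop. All tasks share the single IEventStoreConnection (_connection).
async Task ProcessBatchAsync(IReadOnlyList<ProductCommand> commands)
{
    // 100 commands by default, all in flight at once.
    await Task.WhenAll(commands.Select(command => ProcessCommandAsync(command)));
}

async Task ProcessCommandAsync(ProductCommand command)
{
    var stream = $"product-{command.ProductId}";

    // Read all existing events to rebuild the aggregate.
    var aggregate = new ProductAggregate();
    long version = ExpectedVersion.NoStream;
    long from = 0;
    StreamEventsSlice slice;
    do
    {
        slice = await _connection.ReadStreamEventsForwardAsync(stream, from, 500, false);
        foreach (var e in slice.Events)
        {
            aggregate.ApplyEvent(e.Event);
            version = e.Event.EventNumber;
        }
        from = slice.NextEventNumber;
    } while (!slice.IsEndOfStream);

    // Compare against the incoming state and write all resulting events in one append.
    EventData[] newEvents = aggregate.Diff(command.State);
    if (newEvents.Length > 0)
        await _connection.AppendToStreamAsync(stream, version, newEvents);
}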

We noticed we had a crash. These seem fairly random in occurrence and content - I usually get a “.NET” stack trace, but this one isn’t a .NET one:

Oct 25 10:54:54 ip-172-30-17-47 eventstored[24870]: [24870,68,10:54:54.748] SLOW QUEUE MSG [“Worker #6”]: “TcpSend” - 283ms. Q: 0/0.

Oct 25 10:54:54 ip-172-30-17-47 eventstored[24870]: [24870,75,10:54:54.748] SLOW QUEUE MSG [“StorageReaderQueue #10”]: “ReadStreamEventsForward” - 292ms. Q: 27/27.

Oct 25 10:54:54 ip-172-30-17-47 eventstored[24870]: [24870,32,10:54:54.748] SLOW BUS MSG [“PersistentSubscriptionsBus”]: “EventCommitted” - 286ms. Handler: “PersistentSubscriptionService”.

Oct 25 10:54:54 ip-172-30-17-47 eventstored[24870]: [24870,83,10:54:54.755] SLOW QUEUE MSG [“StorageReaderQueue #11”]: “ReadStreamEventsForward” - 294ms. Q: 14/14.

Oct 25 10:54:54 ip-172-30-17-47 eventstored[24870]: [24870,30,10:54:54.765] SLOW QUEUE MSG [“StorageReaderQueue #9”]: “ReadStreamEventsForward” - 300ms. Q: 11/11.

Oct 25 10:54:54 ip-172-30-17-47 eventstored[24870]: [24870,66,10:54:54.794] SLOW QUEUE MSG [“StorageReaderQueue #13”]: “ReadAllEventsForward” - 2072ms. Q: 0/5.

Oct 25 10:54:54 ip-172-30-17-47 eventstored[24870]: [24870,59,10:54:54.794] SLOW QUEUE MSG [“StorageReaderQueue #12”]: “ReadAllEventsForward” - 2073ms. Q: 0/3.

Oct 25 10:54:55 ip-172-30-17-47 eventstored[24870]: [24870,59,10:54:55.077] SLOW QUEUE MSG [“StorageReaderQueue #12”]: “ReadStreamEventsForward” - 282ms. Q: 2/2.

Oct 25 10:54:55 ip-172-30-17-47 eventstored[24870]: [24870,44,10:54:55.077] SLOW BUS MSG [“bus”]: “UnwrapEnvelopeMessage” - 283ms. Handler: “UnwrapEnvelopeHandler”.

Oct 25 10:54:55 ip-172-30-17-47 eventstored[24870]: [24870,44,10:54:55.077] SLOW QUEUE MSG [“Projection Core #0”]: “UnwrapEnvelopeMessage” - 283ms. Q: 0/252.

Oct 25 10:54:55 ip-172-30-17-47 eventstored[24870]: [24870,32,10:54:55.077] SLOW BUS MSG [“PersistentSubscriptionsBus”]: “EventCommitted” - 283ms. Handler: “PersistentSubscriptionService”.

Oct 25 10:54:55 ip-172-30-17-47 eventstored[24870]: [24870,75,10:54:55.090] SLOW QUEUE MSG [“StorageReaderQueue #10”]: “ReadStreamEventsForward” - 299ms. Q: 12/14.

Oct 25 10:54:55 ip-172-30-17-47 eventstored[24870]: [24870,66,10:54:55.090] SLOW QUEUE MSG [“StorageReaderQueue #13”]: “ReadStreamEventsForward” - 297ms. Q: 4/4.

Oct 25 10:54:55 ip-172-30-17-47 eventstored[24870]: [24870,83,10:54:55.098] SLOW QUEUE MSG [“StorageReaderQueue #11”]: “ReadAllEventsForward” - 319ms. Q: 9/9.

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: * Assertion at mini-exceptions.c:2871, condition `tls->signal_stack’ not met

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: Stacktrace:

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /proc/self/maps:

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 4047e000-408c6000 rwxp 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 41947000-41957000 rwxp 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 55ffff124000-55ffff55b000 r-xp 00000000 103:03 25232 /usr/bin/eventstored

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 55ffff75a000-55ffff761000 r--p 00436000 103:03 25232 /usr/bin/eventstored

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 55ffff761000-55ffff766000 rw-p 0043d000 103:03 25232 /usr/bin/eventstored

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 55ffff766000-55ffff77d000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 560000751000-56000277e000 rw-p 00000000 00:00 0 [heap]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2f65eff000-7f2f65f00000 ---p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2f65f00000-7f2f65f01000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2f65f01000-7f2f65f09000 ---p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2f65f09000-7f2fe0400000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2fe0500000-7f2fe0800000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2fe0900000-7f2fe0c00000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2fe0d00000-7f2fe2000000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2fe2100000-7f2fe2800000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2fe2900000-7f2fe4000000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2fe4100000-7f2fe7e00000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2fe7f00000-7f2feca00000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2fecb00000-7f2ff4700000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2ff4800000-7f2ff6b00000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2ff6c00000-7f2ff7300000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2ff7400000-7f2ff8400000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f2ff8500000-7f3001500000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f3001600000-7f3001900000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 7f3001a00000-7f3002900000 rw-p 00000000 00:00 0

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: Memory around native instruction pointer (0x7f3d3496ce97):

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 0x7f3d3496ce87 d2 4c 89 ce bf 02 00 00 00 b8 0e 00 00 00 0f 05 .L…

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 0x7f3d3496ce97 48 8b 8c 24 08 01 00 00 64 48 33 0c 25 28 00 00 H…$…dH3.%(…

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 0x7f3d3496cea7 00 44 89 c0 75 1f 48 81 c4 18 01 00 00 c3 0f 1f .D…u.H…

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: 0x7f3d3496ceb7 00 48 8b 15 a9 bf 3a 00 f7 d8 41 b8 ff ff ff ff .H…:…A…

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: Native stacktrace:

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /usr/bin/eventstored(+0xbfdb7) [0x55ffff1e3db7]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f3d34f49890]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7) [0x7f3d3496ce97]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x141) [0x7f3d3496e801]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /usr/bin/eventstored(+0x2f2bd8) [0x55ffff416bd8]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /usr/bin/eventstored(+0x307c7b) [0x55ffff42bc7b]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /usr/bin/eventstored(monoeg_assertion_message+0xa3) [0x55ffff42c143]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /usr/bin/eventstored(+0xbf8ff) [0x55ffff1e38ff]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /usr/bin/eventstored(+0x3b81c) [0x55ffff15f81c]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /usr/bin/eventstored(+0x3b8c7) [0x55ffff15f8c7]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /usr/bin/eventstored(+0x23a1ff) [0x55ffff35e1ff]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f3d34f3e6db]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f3d34a4f88f]

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: Debug info from gdb:

Oct 25 10:54:56 ip-172-30-17-47 eventstored[24870]: mono_gdb_render_native_backtraces not supported on this platform, unable to find gdb or lldb

Hi Greg, this is the command output with our command processors running:

wrfl 10 30000

[21826,11,14:06:36.147] Processing command: wrfl 10 30000.

…[21826,11,14:06:41.508] Command exited with code 0.

[21826,08,14:06:41.508] Completed. Successes: 30000, failures: 0 (WRONG VERSION: 0, P: 0, C: 0, F: 0, D: 0)

[21826,08,14:06:41.508] 30000 requests completed in 5359ms (5598.06 reqs per sec).

[21826,08,14:06:41.508] [[begin

DataName:WRFL;

clientsCnt:10;requestsCnt:30000;ElapsedMilliseconds:5359;

successes:30000;failures:0;

end]]

[21826,08,14:06:41.508]

##teamcity[buildStatisticValue key=‘WRFL-10-30000-reqPerSec’ value=‘5598’]

[21826,08,14:06:41.508]

##teamcity[buildStatisticValue key=‘WRFL-10-30000-failureSuccessRate’ value=‘0’]

[21826,08,14:06:41.508]

##teamcity[buildStatisticValue key=‘WRFL-c10-r30000-st1000-s256-reqPerSec’ value=‘5598’]

[21826,08,14:06:41.508]

##teamcity[buildStatisticValue key=‘WRFL-c10-r30000-st1000-s256-failureSuccessRate’ value=‘0’]

fastest: 2

quintiles

20% : 2-209

40% : 209-265

60% : 265-339

80% : 339-477

100% : 477-1632

90% : 672

95% : 851

98% : 1042

99% : 1166

99.5% : 1294

99.9% : 1596

99.99% : 1620

99.999% : 1632

99.9999% : 1632

99.99999% : 1632

Highest : 1632