Hi, I’m looking at the performance of an existing Event Store environment. Are there any good benchmarks of Event Store on different hardware configurations and with different configuration file settings, such as changes to the number of threads used for workers / readers?
Thanks, Ross
To add some extra background - Event Store is running on an Ubuntu VM with 20 CPU cores, 24GB memory and SSD storage. The database is around 150GB. We are seeing SLOW QUEUE MSG in the logs and I’m looking at how to alleviate this. There are multiple user projections plus the system ones. Thanks, Ross
Also, this instance is not HA; it’s a single node in a dev environment.
Without the actual SLOW QUEUE msgs we wouldn’t know which queue is being slow. Different queues can be slow for different reasons… Can you provide them? Also, how often are you seeing these messages, and under what load do you see them?
Cheers,
Greg
Thanks for the quick reply, Greg. At the moment it’s running through a couple of system projections as they have been off for a while, but I still see the messages if I stop those and leave just the user projections, which are also catching up. Writes from new events are low at the moment, tens per second. The log is continuous with them:
{
  "PID": "91015",
  "ThreadID": "62",
  "Date": "2019-02-28T18:45:58.774076Z",
  "Level": "Trace",
  "Logger": "QueuedHandlerThreadPool",
  "Message": "SLOW QUEUE MSG [{queue}]: {message} - {elapsed}ms. Q: {prevQueueCount}/{curQueueCount}.",
  "EventProperties": {
    "queue": "StorageReaderQueue #4",
    "message": "ReadAllEventsForward",
    "elapsed": 222,
    "prevQueueCount": 1,
    "curQueueCount": 1
  }
}
{
  "PID": "91015",
  "ThreadID": "53",
  "Date": "2019-02-28T18:45:58.828095Z",
  "Level": "Trace",
  "Logger": "QueuedHandlerThreadPool",
  "Message": "SLOW QUEUE MSG [{queue}]: {message} - {elapsed}ms. Q: {prevQueueCount}/{curQueueCount}.",
  "EventProperties": {
    "queue": "MonitoringQueue",
    "message": "GetFreshStats",
    "elapsed": 892,
    "prevQueueCount": 0,
    "curQueueCount": 0
  }
}
OK, so the monitoring queue message (GetFreshStats) is likely no issue at all. It comes up on some systems, as getting statistics can vary in speed. This is an occasional operation and should not change overall performance (it’s just pulling stats on an interval on a background thread; I am making a note that we might want to increase this value).
For the other message: how many storage reader messages like this are you getting? These can happen especially if other operations are queued for the disk (do you monitor disk queues?). In this case it was about 200ms on a read. I can’t see the count (e.g. the size) being used here, which is also important; I’m making a note to add it. Overall this can be many things, and if the messages are occasional it should not be an issue (for example, the disk queue had 50 items in front because something else was doing IO).
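One rough way to answer the "how many" question is to tally the SLOW QUEUE MSG entries per queue and message type. The sketch below is only an illustration: it assumes the log is a run of back-to-back JSON objects with standard straight quotes and the field names shown in the excerpt above, and the function names are made up for this example.

```python
import json
from collections import defaultdict


def iter_json_objects(text):
    """Yield each JSON object from text that holds back-to-back objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def summarize_slow_queue(text):
    """Return {(queue, message): {"count", "total_ms", "max_ms"}}."""
    stats = defaultdict(lambda: {"count": 0, "total_ms": 0, "max_ms": 0})
    for entry in iter_json_objects(text):
        if not entry.get("Message", "").startswith("SLOW QUEUE MSG"):
            continue
        props = entry.get("EventProperties", {})
        elapsed = props.get("elapsed", 0)
        s = stats[(props.get("queue"), props.get("message"))]
        s["count"] += 1
        s["total_ms"] += elapsed
        s["max_ms"] = max(s["max_ms"], elapsed)
    return dict(stats)


# Trimmed-down copies of the two entries from the thread.
SAMPLE = """
{"Message": "SLOW QUEUE MSG [{queue}]: {message} - {elapsed}ms.",
 "EventProperties": {"queue": "StorageReaderQueue #4",
                     "message": "ReadAllEventsForward", "elapsed": 222}}
{"Message": "SLOW QUEUE MSG [{queue}]: {message} - {elapsed}ms.",
 "EventProperties": {"queue": "MonitoringQueue",
                     "message": "GetFreshStats", "elapsed": 892}}
"""

if __name__ == "__main__":
    for key, s in sorted(summarize_slow_queue(SAMPLE).items()):
        print(key, s)
```

Running it over a real log file (read the file into a string first) gives a count and worst-case elapsed time per queue, which makes it easier to tell occasional slow reads from a constant stream of them.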
So I’ve done some testing with dd with various read / write sizes on the VM and can get 1GB/s write, 5GB/s read and 1GB/s uncached read, which is about what I was expecting for larger files. However, with ES running (system projections off and user projections on, loading up from events not seen yet), iostat is showing near 100% utilisation, a high queue and only 10MB/s reads. I’ve tried varying the number of reader threads from 20 down to 1 to benchmark the differences, and all were similar. I’ve not looked at host storage stats yet; I will try to tomorrow. Regarding the storage reader messages: I see these a lot with 20 reader threads, and not as often with only 1 (but they are still there).
Do you have any good benchmark IO test parameters I can use which roughly simulate ES IO in different scenarios?
Also, with regards to reader / worker threads etc., what numbers should be aimed for? Are there reasons to deviate from the defaults in any scenarios?
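As a rough contrast to the dd figures above, which measure large sequential transfers, a small random-read test can show how the same disk behaves with many small requests. The sketch below is only an illustration under stated assumptions: the thread does not say what read sizes or access pattern ES uses, so the 4096-byte block size and the uniformly random offsets are guesses, and the function name is hypothetical. Reads go through the OS page cache, so the target file needs to be much larger than RAM (or the cache dropped first) for the numbers to reflect the disk.

```python
import os
import random
import tempfile
import time


def random_read_benchmark(path, block_size=4096, reads=1000, seed=0):
    """Time `reads` block-aligned random reads; return (reads_per_s, mb_per_s)."""
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    blocks = size // block_size
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level read-ahead; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        for _ in range(reads):
            f.seek(rng.randrange(blocks) * block_size)
            total_bytes += len(f.read(block_size))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return reads / elapsed, total_bytes / elapsed / 1e6


if __name__ == "__main__":
    # Demo against a small scratch file; point `path` at a large file on the
    # volume under test for a meaningful measurement.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1024 * 1024))
        path = tmp.name
    try:
        rate, mb_s = random_read_benchmark(path)
        print(f"{rate:.0f} reads/s, {mb_s:.1f} MB/s")
    finally:
        os.remove(path)
```

Comparing the MB/s from a run like this with the dd results gives a feel for how far throughput can fall when requests are small, which is one possible explanation for a disk showing near 100% utilisation at only 10MB/s.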