Been running an operation involving EventStore for about a week and have some questions about performance.
Setup
-
Azure ExtraLarge VM (14 GB memory, 8 logical CPUs)
-
6 x 1 TB VHDs pooled into one 6 TB volume (RAID 0) (no caching, read caching can only be used with up to 4 pooled drives)
-
EventStore arguments: --run-projections=ALL --skip-db-verify --db=(path to pooled volume) --log=(path to C drive)
Workload
Event streams are written to in batches on a 1 sec interval. There is one stream which (currently) contains about 60 million events and generally, every other event in this stream corresponds to an additional event in a new stream. So there are also 30 million, 1 event streams (not enforced through metadata). There is a projection over the first stream which measures events/sec over the whole stream and in sliding, 100 event windows, snapshots of which are emitted to a separate stream. There is also another projection which measures stats for the 30 million streams using the $by_category projection, but this and the $by_category have been disabled out of concern.
Utilization
CPU ~13%
Memory ~200 MB available (98% used)
Disk Read ~ 250 KB/sec / Write ~ 550 KB/sec
Ethernet Send ~ 700 Kbps / Receive ~ 2 Mbps
Projection event/sec ~ 200
Data ~ 60 GB
Queues Snapshot
Snapshot taken at: Mon Nov 04 2013 11:26:33 GMT-0500 (Eastern Standard Time)
Name Length Avg, items/s Avg Time, ms/item Busy, % Peak Max Peak Total processed Current/Last message
============================== ======== ============= ================== ======== ======== ========= ================ ===================================
MainQueue 0 486 0.003 0.1 144 3127 204039062 <none> / EventCommited
MonitoringQueue 0 0 14.696 1.3 0 1 18689 GetFreshStats / GetFreshStats
Projection Core 1 2177 0.166 25.1 252 3654 494824070 n/a
Projection Core #0 0 247 0.045 1.1 250 3654 125841827 <none> / EventReaderIdle
Projection Core #1 0 282 0.005 0.1 148 998 218085111 <none> / EventReaderIdle
Projection Core #2 1 1648 0.449 74.0 252 726 150897132 CoreTick / CoreTick
Projections Master 0 47 0.016 0.1 8 313 38324739 <none> / StatisticsReport
Storage Chaser 0 804 0.520 41.8 0 0 245738715 <none> / ChaserCheckpointFlush
StorageReaderQueue 0 30 31.344 25.5 1 2014 20665597 n/a
StorageReaderQueue #1 0 7 19.048 13.6 0 739 5166400 <none> / ReadAllEventsForward
StorageReaderQueue #2 0 7 60.534 43.4 1 510 5166399 ReadStreamEventsForward / ReadAllEventsForward
StorageReaderQueue #3 0 7 0.253 0.2 0 2014 5166399 <none> / ReadAllEventsForward
StorageReaderQueue #4 0 9 45.541 44.9 1 1748 5166399 <none> / ReadAllEventsForward
StorageWriterQueue 0 20 48.542 100.0 5 30237 23094108 WriteCommit / WriteCommit
Subscriptions 0 385 0.000 0.0 177 1062 98795539 <none> / EventCommited
Timer 13 26 0.019 0.1 13 27 19402931 <none> / ExecuteScheduledTasks
Workers 0 50 0.083 0.1 2 7 24943432 n/a
Worker #1 0 8 0.084 0.1 2 7 6434333 <none> / HttpSend
Worker #2 0 9 0.061 0.1 1 5 4760110 <none> / HttpSend
Worker #3 0 12 0.099 0.1 1 6 5127474 <none> / IncomingHttpRequestMessage
Worker #4 0 11 0.078 0.1 2 5 3924167 <none> / AuthenticatedHttpRequestMessage
Worker #5 0 10 0.095 0.1 1 5 4697348 <none> / HttpSend
The StorageWriterQueue has been red and there are a few errors logged (since Thursday) that indicate slow queues:
[PID:03612:013 2013.10.31 20:00:34.827 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WriteCommit - 8549ms. Q: 0/13.
[PID:03612:013 2013.10.31 21:06:52.669 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WriteCommit - 15243ms. Q: 0/27.
[PID:03612:013 2013.11.01 03:23:30.090 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WriteCommit - 13675ms. Q: 0/25.
[PID:03612:013 2013.11.02 06:16:48.942 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WriteCommit - 7203ms. Q: 0/18.
[PID:03612:031 2013.11.02 07:25:00.471 ERROR QueuedHandlerThreadP] —!!! VERY SLOW QUEUE MSG [StorageReaderQueue #4]: CheckStreamAccess - 33254ms. Q: 0/22.
[PID:03612:036 2013.11.02 07:25:00.471 ERROR QueuedHandlerThreadP] —!!! VERY SLOW QUEUE MSG [StorageReaderQueue #3]: CheckStreamAccess - 24844ms. Q: 0/12.
[PID:03612:034 2013.11.02 07:25:00.471 ERROR QueuedHandlerThreadP] —!!! VERY SLOW QUEUE MSG [StorageReaderQueue #1]: CheckStreamAccess - 27906ms. Q: 0/13.
[PID:03612:025 2013.11.02 07:25:00.471 ERROR QueuedHandlerThreadP] —!!! VERY SLOW QUEUE MSG [StorageReaderQueue #2]: CheckStreamAccess - 30955ms. Q: 0/16.
[PID:03612:013 2013.11.03 10:14:19.014 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 9393ms. Q: 0/24.
No other errors have been logged so far.
Are these stats something to be concerned about? Would it be accepted to enable the $by_category projection as well as the second, custom projection? The StorageWriterQueue is consistently red and 99% busy.