Hello,
In April and May 2017, our cluster threw exceptions such as “VERY SLOW QUEUE MSG” and “CommitTimeout”. These had happened occasionally before, but they have become frequent since April.
Our cluster has 3 nodes, all running on Rax-Space virtual machines. The environment on each machine is:
- Windows version: Windows Server 2008 R2
- CLR Version: 4
- Framework: 4.6.1
- EventStore version: 3.5.0 OSS
Below is part of our error log:
Time
Event
5/4/17 8:15:47.498 AM
[PID:03076:006 2017.05.04 08:15:47.498 DEBUG HttpSendService ] Error occurred while replying to HTTP with message EventStore.Projections.Core.Messages.ProjectionManagementMessage+Statistics: The specified network name is no longer available.
5/4/17 8:15:39.599 AM
[PID:03076:011 2017.05.04 08:13:39.599 ERROR QueuedHandlerMRES ] ---!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 50094ms. Q: 0/22.
5/4/17 8:09:39.599 AM
[PID:03076:004 2017.05.04 08:09:12.871 INFO CoreProjectionCheckp] Failed to write events to stream $ce-xxx. Error: CommitTimeout
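To show how often the slow-queue entries occur, lines like the one above could be tallied per day. The following is only a minimal sketch: it assumes the log line layout shown in this excerpt, and the names `parse_slow_messages` and `count_per_day` are hypothetical helpers, not part of EventStore.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# "[PID:... 2017.05.04 08:13:39.599 ERROR ... VERY SLOW QUEUE MSG [Queue]: Message - 50094ms. ..."
SLOW_MSG = re.compile(
    r"(?P<date>\d{4}\.\d{2}\.\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3})"
    r".*VERY SLOW QUEUE MSG \[(?P<queue>[^\]]+)\]: (?P<msg>\S+) - (?P<ms>\d+)ms"
)


def parse_slow_messages(lines):
    """Return (date, queue, message, milliseconds) for each slow-queue line."""
    found = []
    for line in lines:
        match = SLOW_MSG.search(line)
        if match:
            found.append(
                (match["date"], match["queue"], match["msg"], int(match["ms"]))
            )
    return found


def count_per_day(lines):
    """Count slow-queue lines per (date, queue) pair."""
    return Counter((date, queue) for date, queue, _, _ in parse_slow_messages(lines))
```

For example, `count_per_day(open("node.log", encoding="utf-8"))` would give one count per day and queue, where `node.log` is a placeholder for the actual log file path.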
We use HTTPS on each server, and each server has more than 800 chunk files, which take up almost 210 GB of disk space.
According to Resource Monitor, disk and network usage are both low: the disk write rate stays at 1 or 2 MB/s, and network usage is below 10%.
Could you please help us look into this error?
On a related note, to avoid reading stale data while changes are still being replicated from the master to the slaves, we have configured our application to both write to and read from the master.
On the master, EventStore.ClusterNode.exe has many chunk files open at the same time: just one chunk for writing, and the others for reading. This matches our application's access pattern, but I don’t know whether it triggers the “VERY SLOW QUEUE MSG” error above.
Here is a screenshot of the disk monitor on the master server:
Thanks for your help,
Chaohui Zhang