Tuning options to reduce amount of [SLOW QUEUE MSG] and [SLOW BUS MSG]

Charles_Solar · December 7, 2016, 11:23pm

I am running ES in a high-event scenario with 3 projections and 2-4 competing consumers for each.

ES is handling ~3000 items/s constantly and each projection is writing ~30 events/s.

I keep track of my log files and not unexpectedly ES’s log is full of

[PID:04808:020 2016.12.07 23:09:02.636 TRACE QueuedHandlerAutoRes] SLOW QUEUE MSG [Projection Core #0]: CommittedEventDistributed - 75ms. Q: 44/46.

[PID:04808:023 2016.12.07 23:09:02.636 TRACE QueuedHandlerAutoRes] SLOW QUEUE MSG [Projection Core #3]: UnwrapEnvelopeMessage - 75ms. Q: 27/33.

[PID:04808:022 2016.12.07 23:09:02.638 TRACE InMemoryBus ] SLOW BUS MSG [bus]: UnwrapEnvelopeMessage - 136ms. Handler: UnwrapEnvelopeHandler.

[PID:04808:022 2016.12.07 23:09:02.638 TRACE QueuedHandlerAutoRes] SLOW QUEUE MSG [Projection Core #2]: UnwrapEnvelopeMessage - 137ms. Q: 0/118.

[PID:04808:021 2016.12.07 23:09:02.657 TRACE InMemoryBus ] SLOW BUS MSG [bus]: UnwrapEnvelopeMessage - 144ms. Handler: UnwrapEnvelopeHandler.

[PID:04808:021 2016.12.07 23:09:02.676 TRACE QueuedHandlerAutoRes] SLOW QUEUE MSG [Projection Core #1]: UnwrapEnvelopeMessage - 164ms. Q: 0/135.

[PID:04808:020 2016.12.07 23:09:02.717 TRACE InMemoryBus ] SLOW BUS MSG [bus]: UnwrapEnvelopeMessage - 57ms. Handler: UnwrapEnvelopeHandler.

[PID:04808:020 2016.12.07 23:09:02.717 TRACE QueuedHandlerAutoRes] SLOW QUEUE MSG [Projection Core #0]: UnwrapEnvelopeMessage - 57ms. Q: 25/194.

``

after a day ES logs can grow to gigabytes in size of basically just these lines. And yes its always “Projection Core” that produces these.

My machine is configured with 8 cores, 8gb of ram, and an intel 750 pci-e SSD to write to - I don’t think I can get much better hardware wise so I’ve tuned my config a bit

Greg_Young1 · December 8, 2016, 1:49am

We will bump the time on those when it traces (and possibly add an
argument to disable it). You can also turn off trace for just the one
logger Projections.Core
https://github.com/nlog/NLog/wiki/Configuration-file#example-rules

IntTcpHeartbeatTimeout: 300
ExtTcpHeartbeatTimeout: 300
IntTcpHeartbeatInterval: 60 <- this is extremely low is there a reason for it?
ExtTcpHeartbeatInterval: 60 <-

UnsafeDisableFlushToDisk: True <-- probably shouldn't be set.

Charles_Solar · December 8, 2016, 2:35am

Ok thanks - I’ll just disable those logs entirely then. So long as its not a symptom of some greater issue.

The heartbeats have been something I’ve been constantly fusing with, at some point a few months ago I needed to know asap if a connection was dropped but I think its safe to remove those now.

I also find as I look through the code right now that those units are not seconds but milliseconds… so yeah I’ll definitely remove those

Greg_Young1 · December 8, 2016, 2:41am

"I also find as I look through the code right now that those units are
not seconds but milliseconds.. so yeah I'll definitely remove those"

Yeah 60 ms seemed pretty low If you are doing 3000/second it won't
get hit very often as you will normally have a message in that window
but an idle connection would be funny to watch.