SLOW BUS / SLOW QUEUE

Hey fellow users,

Quick question because I cannot find any real answer myself. What do the messages “SLOW QUEUE MSG” and “SLOW BUS MSG” indicate? And how to I get rid of those?[00006,16,23:59:58.256] SLOW BUS MSG [MainBus]: PurgeTimedOutRequests - 92ms. Handler: HttpService.
[00006,140,23:59:57.748] SLOW BUS MSG [Worker #2 Bus]: TcpSend - 598ms. Handler: TcpSendService.
[00006,107,23:59:57.746] SLOW QUEUE MSG [StorageReaderQueue #2]: ReadStreamEventsForward - 597ms. Q: 2/2.any pointer, help, documentation is really appreciated

BR

David

Could you be more specific on how you set your database up?

Minimum should include the version, the OS, is it a cluster or single node? do you run your database in docker?

We run on AKS on dedicated hardware using Evenstore 4.1.3. Each node ~ 140 GB RAM and the ES pods are allowed (limit none, request 4 GB)

The disks are Azure premiums with 1TB each.

Hope that helps, or do you need additional info?

Actually, I was looking for documentation of the log-output of Eventstore. Understanding this is part of basic operational stability. But I could not find anything :frowning:

OK so the SLOW MESSAGE messages are there to alert only. Its basically just saying “this took longer than I expected” and they are for notification/debugging purposes.

If you are seeing them occasionally there likely is no problem. Sometimes things just took longer, and it happens … rarely!

As example:

[00006,16,23:59:58.256] SLOW BUS MSG [MainBus]: PurgeTimedOutRequests - 92ms. Handler: HttpService.

This was quite likely a GC etc getting run which paused this what would usually be an in memory operation. 100 ms to remove some items from an in memory collection is … a very long time. Its important to note here this is not “this took 30% longer than I would expect” this is “this took > 1000% longer than I would expect”.

[00006,140,23:59:57.748] SLOW BUS MSG [Worker #2 Bus]: TcpSend - 598ms. Handler: TcpSendService.

This was quite likely a network blip with socket, a very large piece of data being sent, or a GC.

Again if you are seeing these rarely its probably no issue (say even dozens per day). If you are seeing them constantly (say every few seconds) there is likely an issue. They are again just writing “this took longer than I would have expected” and the expectation is fairly high!. As example spending ± 100ms on removing some things from in memory collection is a very long time! The same is true on the TCP one, BeginSend some data and it taking 600 ms to complete is a very long time.

We have seen many nodes where they occasionally see writing to disk as example take say 5000 ms due to network/other issues. These messages are quite useful in debugging such scenarios and are generally not an issue unless they are either coming in waves (500 messages over 30 seconds) or occurring a lot (eg every second/few seconds) over long periods of time.

Does this make sense?

Cheers,

Greg

Yes. Thanks. This helps a lot.