Timeout while reading

Peter_Hageus · June 17, 2014, 8:32am

Hi. I’m seeing some stranger behaviour with a specific database, it times out when we try to read all events. We get more and more ‘slow bus message’ until there’s one ‘very slow’ and a heartbeat timeout. This only happens with the specific database. There are no errors when starting up, and verify is on.

On my machine I was able to get it working by limiting the amount of memory SQL Server is using (this is during replay to readstore), but that doesn’t help on any of the systemowners machines.

Reading code is:

while (true)

{

var slice = Connection.ReadAllEventsForward(next, 500, true);

foreach (var e in slice.Events)

if (IsDomainEvent(e.Event) && typeNames.Contains(e.Event.EventType))

yield return ToCommittedEvent(e, true);

if (slice.IsEndOfStream)

yield break;

next = slice.NextPosition;

}

(Next is initialized to EventPostion.Start)

Log:

[PID:09728:028 2014.06.14 11:08:33.920 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #2]: ReadAllEventsForward - 533ms. Q: 0/1.

[PID:09728:012 2014.06.14 11:08:34.718 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #1]: ReadAllEventsForward - 767ms. Q: 0/2.

[PID:09728:009 2014.06.14 11:08:35.508 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #3]: ReadAllEventsForward - 765ms. Q: 0/2.

[PID:09728:005 2014.06.14 11:08:36.302 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #3]: ReadAllEventsForward - 270ms. Q: 0/1.

[PID:09728:013 2014.06.14 11:08:37.110 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #4]: ReadAllEventsForward - 691ms. Q: 0/2.

[PID:09728:007 2014.06.14 11:08:37.923 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #1]: ReadAllEventsForward - 734ms. Q: 0/2.

[PID:09728:005 2014.06.14 11:08:38.703 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #2]: ReadAllEventsForward - 754ms. Q: 0/2.

[PID:09728:013 2014.06.14 11:08:39.923 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #1]: ReadAllEventsForward - 976ms. Q: 0/2.

[PID:09728:005 2014.06.14 11:08:42.340 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #4]: ReadAllEventsForward - 2384ms. Q: 0/4.

[PID:09728:030 2014.06.14 11:08:43.999 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #1]: ReadAllEventsForward - 1619ms. Q: 0/3.

[PID:09728:006 2014.06.14 11:08:49.375 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #2]: ReadAllEventsForward - 5337ms. Q: 0/4.

[PID:09728:026 2014.06.14 11:08:51.893 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #3]: ReadAllEventsForward - 2488ms. Q: 0/3.

[PID:09728:006 2014.06.14 11:08:55.256 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #1]: ReadAllEventsForward - 3302ms. Q: 0/4.

[PID:09728:013 2014.06.14 11:09:07.281 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #3]: ReadAllEventsForward - 4919ms. Q: 0/0.

[PID:09728:006 2014.06.14 11:09:07.311 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #2]: ReadAllEventsForward - 12030ms. Q: 0/4.

[PID:09728:006 2014.06.14 11:09:07.311 ERROR QueuedHandlerThreadP] —!!! VERY SLOW QUEUE MSG [StorageReaderQueue #2]: ReadAllEventsForward - 12030ms. Q: 0/4.

[PID:09728:027 2014.06.14 11:09:07.688 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #4]: ReadAllEventsForward - 345ms. Q: 0/0.

[PID:09728:010 2014.06.14 11:09:12.829 INFO TcpConnection ] [11:09:12.829: N127.0.0.1:15959, L127.0.0.1:1113, {32daf939-86f5-4c59-a233-51460326af24}]:

Received bytes: 16550, Sent bytes: 2089791314

Send calls: 414, callbacks: 414

Receive calls: 409, callbacks: 409

Close reason: [ConnectionReset] Socket receive error

I’m just trying to help out an ex-customer here, I’m fully aware they should get commercial support, but it’s out of my hands…

One theory is that it’s related to eventsize, that a bug might have written a huge event, is there any way to query on that?

TIA, Peter

Greg_Young1 · June 17, 2014, 8:36am

What kind of storage are they reading from is it local storage?

If reading the events from an outside client you should be able to see what events you are actually reading at the time of the failure.

Greg_Young1 · June 17, 2014, 8:39am

Also what is the window size for the readalleventsforward (eg is it 50 events at a time 100? 1000?)

Greg_Young1 · June 17, 2014, 9:04am

I see its 500 events in window what happens if its 50 or 100?

Peter_Hageus · June 17, 2014, 11:55am

Just tried 50 and 100, no difference. No correlation with event-size (largest is 38k). Local storage, ssd.

Seems to happen consistently on my machine now as well (must have been a lucky break right after throttling sql memory), with simplest possible client. Last event read looks fine. The behaviour is pretty consistent, with a slow period in the beginning, then reading as normal for a while, until grinding to a halt.

/Peter

Peter_Hageus · June 17, 2014, 12:07pm

Update: Using window of 50 actually got me to the end. Still very slow in the beginning, but it seems like it made it passed the critical point…

/Peter