We are trying to migrate one of our production services from MSSQL to ESDB by running an import that uses AppendToStreamAsync. However, roughly every 100 events it stops working and throws a DeadlineExceeded exception. This is what we’re seeing in the client log:
[16:00:18 INF] Processing PageAddedEventV1(4d2e3b7c-5d99-4ced-af84-44dfa4dc27f7) [03511016]...
[16:00:18 DBG] Starting gRPC call. Method type: 'ServerStreaming', URI: 'https://esdb3.joure1.net:2113/event_store.client.streams.Streams/Read'.
[16:00:18 DBG] Sending message.
[16:00:18 VRB] Serialized 'EventStore.Client.Streams.ReadReq' to 80 byte message.
[16:00:18 VRB] Message sent.
[16:00:18 VRB] Response headers received.
[16:00:18 DBG] Reading message.
[16:00:18 VRB] Deserializing 56 byte message to 'EventStore.Client.Streams.ReadResp'.
[16:00:18 VRB] Received message.
[16:00:18 DBG] Reading message.
[16:00:18 INF] Appending PageAddedEventV1(4d2e3b7c-5d99-4ced-af84-44dfa4dc27f7) to stream pageaggregate-b11dc7e0-bf6d-5bf1-8c03-f218e803ea8a...
[16:00:18 VRB] No message returned.
[16:00:18 DBG] Append to stream - pageaggregate-b11dc7e0-bf6d-5bf1-8c03-f218e803ea8a@Any.
[16:00:18 DBG] Finished gRPC call.
[16:00:18 DBG] Sending message.
[16:00:18 VRB] Serialized 'EventStore.Client.Streams.BatchAppendReq' to 2148 byte message.
[16:00:18 VRB] Message sent.
[16:00:21 VRB] Deserializing 143 byte message to 'EventStore.Client.Streams.BatchAppendResp'.
[16:00:21 VRB] Received message.
[16:00:21 DBG] Reading message.
[16:00:51 ERR] Processing failed
System.AggregateException: One or more errors occurred. (Status(StatusCode="DeadlineExceeded", Detail="Timeout"))
---> Grpc.Core.RpcException: Status(StatusCode="DeadlineExceeded", Detail="Timeout")
at EventStore.Client.Streams.BatchAppendResp.ToWriteResult()
at EventStore.Client.EventStoreClient.StreamAppender.Receive()
at EventStore.Client.EventStoreClient.StreamAppender.AppendInternal(Options options, IEnumerable`1 events, CancellationToken cancellationToken)
at EventStore.Client.EventStoreClient.AppendToStreamAsync(String streamName, StreamState expectedState, IEnumerable`1 eventData, Action`1 configureOperationOptions, Nullable`1 deadline, UserCredentials userCredentials, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Task.Wait()
at Program.<Main>$(String[] args) in ...\Program.cs:line 175
We’ve checked the flow for every imported event, but there are no significant differences. It just stops after reading the BatchAppendResp.
Our code has already been tested in a staging environment, where we were able to import over 900K events without any exceptions.
Program.cs
try
{
    var events = client.ReadStreamAsync(Direction.Forwards, streamName, StreamPosition.Start);
    if (events.AnyAsync(a => a.OriginalEvent.EventId == eventId).Result)
    {
        logger.LogWarning($"Skipping {messageName}({messageId}) [{sequenceId:00000000}] as it already has been appended...");
        continue;
    }
}
catch
{
    //***** Do nothing;
}
//*****
var eventData = new EventData[] { new(eventId, messageName, data, headers, mediaType) };
logger.LogInformation($"Appending {messageName}({messageId}) to stream {streamName}...");
client.AppendToStreamAsync(streamName, StreamState.Any, eventData).Wait();
logger.LogInformation($"{messageName}({messageId}) appended to stream {streamName}");
We are running ESDB 21.10.1.0 with client version 22.0.0.
We’ve just noticed a difference in server version between staging and production: staging was running 21.10.2.0 and production 21.10.1.0. We updated production to 21.10.2.0 and it seemed to work like a charm, but after 11K events or so a deadline exception was thrown again. The problem is that the client connection is lost after that. As the client is a singleton, the service has to be restarted. Is there another approach for the client that allows more resilient behaviour? Is a singleton required, or could it be a per-request connection?
We wrote a wrapper around AppendToStreamAsync. It seems like DeadlineExceeded can be considered a transient exception. This is the wrapper code. Perhaps it can be helpful for someone:
/// <summary>
/// Helper methods for the <see cref="EventStoreClient" />.
/// </summary>
public static class EventStoreClientHelper
{
    /// <summary>
    /// Try to append event data to a stream within a certain amount of retries in case of deadline exceptions returning failure if exceeded.
    /// </summary>
    /// <param name="client"></param>
    /// <param name="streamName"></param>
    /// <param name="eventData"></param>
    /// <param name="retries"></param>
    /// <returns></returns>
    public static OperationResult AppendToStream(this EventStoreClient client, string streamName, EventData[] eventData, int retries)
    {
        var attempt = 1;
        while (attempt <= retries)
        {
            //*****
            try
            {
                return OperationResult.CreateSuccessResult(client.AppendToStreamAsync(streamName, StreamState.Any, eventData).Result.NextExpectedStreamRevision);
            }
            catch (AggregateException aex)
            {
                //***** Retry on DeadlineExceeded;
                if (aex.InnerException is not RpcException { StatusCode: StatusCode.DeadlineExceeded })
                    return OperationResult.CreateFailure(aex.InnerException ?? aex);
            }
            catch (RpcException rpcex)
            {
                //***** Retry on DeadlineExceeded;
                if (rpcex.StatusCode != StatusCode.DeadlineExceeded)
                    return OperationResult.CreateFailure(rpcex);
            }
            catch (Exception ex)
            {
                return OperationResult.CreateFailure(ex);
            }
            //*****
            attempt++;
        }
        //*****
        return OperationResult.CreateFailure("Attempts failed");
    }

    /// <summary>
    /// Try to append event data to a stream within a certain amount of retries in case of deadline exceptions returning failure if exceeded.
    /// </summary>
    /// <param name="client"></param>
    /// <param name="streamName"></param>
    /// <param name="expectedRevision"></param>
    /// <param name="eventData"></param>
    /// <param name="retries"></param>
    /// <returns></returns>
    public static OperationResult AppendToStream(this EventStoreClient client, string streamName, StreamRevision expectedRevision, EventData[] eventData, int retries)
    {
        var attempt = 1;
        while (attempt <= retries)
        {
            //*****
            try
            {
                return OperationResult.CreateSuccessResult(client.AppendToStreamAsync(streamName, expectedRevision, eventData).Result.NextExpectedStreamRevision);
            }
            catch (AggregateException aex)
            {
                //***** Retry on DeadlineExceeded;
                if (aex.InnerException is not RpcException { StatusCode: StatusCode.DeadlineExceeded })
                    return OperationResult.CreateFailure(aex.InnerException ?? aex);
            }
            catch (RpcException rpcex)
            {
                //***** Retry on DeadlineExceeded;
                if (rpcex.StatusCode != StatusCode.DeadlineExceeded)
                    return OperationResult.CreateFailure(rpcex);
            }
            catch (Exception ex)
            {
                return OperationResult.CreateFailure(ex);
            }
            //*****
            attempt++;
        }
        //*****
        return OperationResult.CreateFailure("Attempts failed");
    }
}
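The wrapper above retries immediately and blocks on .Result. If the timeouts come from a server that is briefly overloaded, waiting between attempts may give it room to recover. Below is a minimal sketch of that idea, not a tested fix for this issue: a generic async retry helper with a doubling delay. The names TransientRetry and RunAsync, and the delay values, are made up for illustration and are not part of the EventStore client.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of EventStore.Client): retries an async
// operation when the caller-supplied predicate classifies the failure as transient.
public static class TransientRetry
{
    // Runs `operation` up to `maxAttempts` times. After a transient failure it
    // waits `initialDelay`, then twice that, and so on. Non-transient exceptions,
    // and the exception thrown by the final attempt, propagate to the caller.
    public static async Task<T> RunAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan initialDelay,
        CancellationToken cancellationToken = default)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        var delay = initialDelay;
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Back off before the next attempt, doubling the wait each time.
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
                delay += delay;
            }
        }
    }
}

// Possible usage with the client from this thread (assumes `client`,
// `streamName` and `eventData` as in the code above):
//
// var result = await TransientRetry.RunAsync(
//     () => client.AppendToStreamAsync(streamName, StreamState.Any, eventData),
//     ex => ex is RpcException { StatusCode: StatusCode.DeadlineExceeded },
//     maxAttempts: 5,
//     initialDelay: TimeSpan.FromMilliseconds(200));
```

Because the call is awaited, the RpcException arrives directly rather than wrapped in an AggregateException, so one predicate replaces the two catch blocks. Keep in mind that a DeadlineExceeded does not tell you whether the first attempt was actually written, so a retry should reuse the same event IDs, as the wrapper above already does by passing the same eventData.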
Hi Alexey, single-event batches. Changed the deadline to 1 minute. The strange thing is that it happens with 21.10.1.0 only; 21.10.2.0 seems to work without noticeable issues (with the wrapper as well). Also, the max attempts were exceeded at some point, so the wrapper makes limited sense.
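For anyone wondering where the deadline is set: the stack trace earlier in the thread shows that AppendToStreamAsync takes a nullable deadline parameter, so it can be passed per call. A minimal sketch, assuming client version 22.0.0 as used in this thread; the class and method names AppendWithDeadline and AppendAsync are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;
using EventStore.Client;

// Hypothetical helper: appends with an explicit per-call deadline of one minute.
public static class AppendWithDeadline
{
    public static Task<IWriteResult> AppendAsync(
        EventStoreClient client, string streamName, EventData[] eventData)
    {
        return client.AppendToStreamAsync(
            streamName,
            StreamState.Any,
            eventData,
            deadline: TimeSpan.FromMinutes(1)); // named parameter, as listed in the stack trace
    }
}
```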
We have a similar issue. It happened yesterday on the production server. The scenario is: an application was performing intensive event additions to streams sequentially (around 700 streams), and the addition was completed within 5 seconds. During this time, another application started encountering the errors described above when accessing ES.
I’d suggest getting in touch with support. I don’t think it’s possible to diagnose issues like that in a forum. I have a simulation app that adds events continuously, one by one, in parallel threads, like thousands per sec, to around 1 million streams, running for days, and I never saw issues like that. I might need to try it again with 23.10.
Has this issue been resolved for you guys? We started seeing very very very(!) similar behavior in our production environment when we started importing a moderately large number of events! (We are on EventStoreDB v23)
We have a few million streams in our project, and the importer (a single process, written in Go) appends to about a thousand or so streams simultaneously.
After a while, the leader node runs out of memory and the importer keeps receiving DeadlineExceededExceptions.
Does ESDB have a memory leak?
I wrote up a post about this issue below, but no response has arrived yet: