Cluster Connection Issues(Exception on EndAuthenticateAsServer, System.IO.IOException)

Hello,

These days, in our Prod environment we found lots of error “System.IO.IOException: Authentication failed because the remote party has closed the transport stream.”,

Configration:

Our cluster have 3 nodes, each server use https, all used Rax-Space virtual machine, each machine environment is

  • Windows version: Windows Server 2008 R2
  • CLR Version: 4
  • Framework: 4.6.1
  • EventStore version: 3.5.0 OSS
    and each one have more than 800 trunk files, usage almost 210GB disk space.

Error Detail:

Time
Event Error
Machine
5/7/17
11:11:43.415 AM
[PID:07020:007 2017.05.07 11:11:43.415 INFO TcpConnectionSsl ] [S172.20.0.253:47089, L172.20.14.100:2115]: Exception on EndAuthenticateAsServer.
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
at EventStore.Transport.Tcp.TcpConnectionSsl.OnEndAuthenticateAsServer(IAsyncResult ar) in c:\projects\eventstore\src\EventStore.Transport.Tcp\TcpConnectionSsl.cs:line 154
172.20.14.100
5/7/17
11:11:42.260 AM
[PID:03076:028 2017.05.07 11:11:42.260 INFO TcpConnectionSsl ] [S172.20.0.252:47040, L172.20.14.102:2115]: Exception on EndAuthenticateAsServer.
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
at EventStore.Transport.Tcp.TcpConnectionSsl.OnEndAuthenticateAsServer(IAsyncResult ar) in c:\projects\eventstore\src\EventStore.Transport.Tcp\TcpConnectionSsl.cs:line 154
172.20.14.102
5/7/17
11:11:42.096 AM
[PID:139260:038 2017.05.07 11:11:42.096 INFO TcpConnectionSsl ] [S172.20.0.253:43842, L172.20.14.101:2115]: Exception on EndAuthenticateAsServer.
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
at EventStore.Transport.Tcp.TcpConnectionSsl.OnEndAuthenticateAsServer(IAsyncResult ar) in c:\projects\eventstore\src\EventStore.Transport.Tcp\TcpConnectionSsl.cs:line 154
172.20.14.101

This error happened frequent, we found 72 error message in one minute. and we also received below error but not frequent:

Time
Event
5/4/17 8:15:47.498 AM
[PID:03076:006 2017.05.04 08:15:47.498 DEBUG HttpSendService ] Error occurred while replying to HTTP with message EventStore.Projections.Core.Messages.ProjectionManagementMessage+Statistics: The specified network name is no longer available.
5/4/17 8:15:39.599 AM
[PID:03076:011 2017.05.04 08:13:39.599 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 50094ms. Q: 0/22.
5/4/17 8:09:39.599 AM
[PID:03076:004 2017.05.04 08:09:12.871 INFO CoreProjectionCheckp] Failed to write events to stream $ce-xxx. Error: CommitTimeout

Cluster Setting:

We use windows service do a wrapper to start to stop EventStoreClusterNode.exe, and send the parameter to EventStoreClusterNode application though wrapper, the configuration below, and other used default setting.

<instance name="eventstore"

      add-interface-prefixes="false"

        int-http-prefixes="[http://172.30.1.193:2113/](http://172.30.1.193:2113/)"

        ext-http-prefixes="[http://172.30.1.193:2114/,http://172.30.1.193:2113/](http://172.30.1.193:2114/,http://172.30.1.193:2113/)"

        ext-ip="127.0.0.1"

        int-ip="127.0.0.1"

        int-tcp-port="1112"

        ext-tcp-port="1113"

        int-http-port="2113"

        ext-http-port="2114"

        db="D:\eventstore-db-2"

        log=".\eslogs"

        use-internal-ssl="false"

        ssl-target-host="[xxx.com](http://xxx.com/)"

        ssl-validate-server="false"

        run-projections="all"

         certificate-store-location="LocalMachine"

         certificate-store-name="My"

         certificate-thumbprint="253tgrasd....."

         ext-secure-tcp-port="2115"

         int-secure-tcp-port="2116"

         cluster-size="3"

        discover-via-dns="false"

        gossip-seed="[172.30.1.194:2113](http://172.30.1.194:2113/),[172.30.1.195:2113](http://172.30.1.195:2113/)"
 />

Please Help:

I ask the question here before, see this link: https://groups.google.com/forum/#!topic/event-store/v5DexXkbWrs, but we need time to try this upgrade solution on local first, and the error happened everyday. it’s a urgent issue for us.

So do you know what’s the other solution for our problem? and what’s this problem’s root case?

Thank you very much!

Chaohui Zhang

This seems to be a rather typical network problem. What is the overall frequency of the issue that you see? This should be just a transient issue that self-corrects.

This issue happened in 3 machines 72 times in one minute, each machine happened 72/3=24 times in one minute. see below:


Thank you very much!

That does not explain the frequency of the issue. How many times/day does it happen? Getting the issue many times over a short period of time sounds like an underlying networking error. If this is happening very often then likely there is a secondary issue, if infrequently a network error is a strong suspicion.

I checked last 30 days this error happened, see below:

it happened every day and at least more than 69000 times every day.

Thank for you help.

What is your networking setup? We have previously seen associated issues from your setup that seemed networking related like talking to a SAN

Yes, we use virtual machine in Rackspace, and use SAN network.

Thank you very much!

I would start by looking here. The issues are TCP drops. If in windows which I believe was said there is also configuration around this.

yes we use windows. we also continue to check this.

Thanks for you help.

Also is it possible you have a secure connection trying and retrying to a port that is not a secure connection?

From reading above you mentioned that even though you are getting 200k errors a day things are generally still working.

Also can you post some of the log near the exception where it closes things? You should be able to see which ip addresses things are coming from

Our logs see below:

Time
Event Error
Machine
5/7/17
11:11:43.415 AM
[PID:07020:017 2017.05.07 11:11:43.415 DEBUG PersistentSubscripti] Lost connection from 172.20.0.253:47089
172.20.14.100
5/7/17
11:11:43.415 AM
[PID:07020:007 2017.05.07 11:11:43.415 INFO TcpConnectionManager] Connection ‘external-secure’ [172.20.0.253:47089, {f504016d-153b-46f9-91c8-19b7baaed831}] closed: SocketError.
172.20.14.100
5/7/17
11:11:43.415 AM
[PID:07020:007 2017.05.07 11:11:43.415 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:43.415: N172.20.0.253:47089, L172.20.14.100:2115, {f504016d-153b-46f9-91c8-19b7baaed831}]:Close reason: [SocketError] Authentication failed because the remote party has closed the transport stream.
172.20.14.100
5/7/17
11:11:43.415 AM
[PID:07020:007 2017.05.07 11:11:43.415 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:43.415: N172.20.0.253:47089, L172.20.14.100:2115, {f504016d-153b-46f9-91c8-19b7baaed831}]:Receive calls: 0, callbacks: 0
172.20.14.100
5/7/17
11:11:43.415 AM
[PID:07020:007 2017.05.07 11:11:43.415 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:43.415: N172.20.0.253:47089, L172.20.14.100:2115, {f504016d-153b-46f9-91c8-19b7baaed831}]:Send calls: 0, callbacks: 0
172.20.14.100
5/7/17
11:11:43.415 AM
[PID:07020:007 2017.05.07 11:11:43.415 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:43.415: N172.20.0.253:47089, L172.20.14.100:2115, {f504016d-153b-46f9-91c8-19b7baaed831}]:Received bytes: 0, Sent bytes: 0
172.20.14.100
5/7/17
11:11:43.415 AM
[PID:07020:007 2017.05.07 11:11:43.415 INFO TcpConnectionSsl ] [S172.20.0.253:47089, L172.20.14.100:2115]: Exception on EndAuthenticateAsServer.
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
at EventStore.Transport.Tcp.TcpConnectionSsl.OnEndAuthenticateAsServer(IAsyncResult ar) in c:\projects\eventstore\src\EventStore.Transport.Tcp\TcpConnectionSsl.cs:line 154
172.20.14.100
5/7/17
11:11:43.415 AM
[PID:07020:007 2017.05.07 11:11:43.415 INFO TcpConnectionSsl ] [S172.20.0.253:47089, L172.20.14.100:2115]: Exception on EndAuthenticateAsServer.
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
at EventStore.Transport.Tcp.TcpConnectionSsl.OnEndAuthenticateAsServer(IAsyncResult ar) in c:\projects\eventstore\src\EventStore.Transport.Tcp\TcpConnectionSsl.cs:line 154
172.20.14.100
5/7/17
11:11:43.415 AM
[PID:07020:007 2017.05.07 11:11:43.400 INFO TcpService ] External TCP connection accepted: [Secure, 172.20.0.253:47089, L172.20.14.100:2115, {f504016d-153b-46f9-91c8-19b7baaed831}].
172.20.14.100
5/7/17
11:11:43.415 AM
[PID:03076:011 2017.05.09 20:16:12.799 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 37316ms. Q: 0/22.
172.20.14.100
5/7/17
11:11:43.415 AM
[PID:03076:011 2017.05.09 20:16:12.799 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 37316ms. Q: 0/22.
172.20.14.100
5/7/17
11:11:43.415 AM
[PID:03076:011 2017.05.09 20:16:12.799 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 37316ms. Q: 0/22.
172.20.14.100

5/7/17
11:11:42.260 AM
[PID:03076:026 2017.05.07 11:11:42.260 DEBUG PersistentSubscripti] Lost connection from 172.20.0.252:47040
172.20.14.102
5/7/17
11:11:42.260 AM
[PID:03076:028 2017.05.07 11:11:42.260 INFO TcpConnectionManager] Connection ‘external-secure’ [172.20.0.252:47040, {d6097cf0-caf5-49b7-beb8-c2a76594f0ca}] closed: SocketError.
172.20.14.102
5/7/17
11:11:42.260 AM
[PID:03076:028 2017.05.07 11:11:42.260 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:42.260: N172.20.0.252:47040, L172.20.14.102:2115, {d6097cf0-caf5-49b7-beb8-c2a76594f0ca}]:Close reason: [SocketError] Authentication failed because the remote party has closed the transport stream.
172.20.14.102
5/7/17
11:11:42.260 AM
[PID:03076:028 2017.05.07 11:11:42.260 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:42.260: N172.20.0.252:47040, L172.20.14.102:2115, {d6097cf0-caf5-49b7-beb8-c2a76594f0ca}]:Receive calls: 0, callbacks: 0
172.20.14.102
5/7/17
11:11:42.260 AM
[PID:03076:028 2017.05.07 11:11:42.260 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:42.260: N172.20.0.252:47040, L172.20.14.102:2115, {d6097cf0-caf5-49b7-beb8-c2a76594f0ca}]:Send calls: 0, callbacks: 0
172.20.14.102
5/7/17
11:11:42.260 AM
[PID:03076:028 2017.05.07 11:11:42.260 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:42.260: N172.20.0.252:47040, L172.20.14.102:2115, {d6097cf0-caf5-49b7-beb8-c2a76594f0ca}]:Received bytes: 0, Sent bytes: 0
172.20.14.102
5/7/17
11:11:42.260 AM
[PID:03076:028 2017.05.07 11:11:42.260 INFO TcpConnectionSsl ] [S172.20.0.252:47040, L172.20.14.102:2115]: Exception on EndAuthenticateAsServer.
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
at EventStore.Transport.Tcp.TcpConnectionSsl.OnEndAuthenticateAsServer(IAsyncResult ar) in c:\projects\eventstore\src\EventStore.Transport.Tcp\TcpConnectionSsl.cs:line 154
172.20.14.102
5/7/17
11:11:42.260 AM
[PID:03076:021 2017.05.07 11:11:42.260 INFO TcpService ] External TCP connection accepted: [Secure, 172.20.0.252:47040, L172.20.14.102:2115, {d6097cf0-caf5-49b7-beb8-c2a76594f0ca}].
172.20.14.102
5/7/17
11:11:42.260 AM
[PID:03076:028 2017.05.07 11:11:42.260 INFO TcpConnectionSsl ] [S172.20.0.252:47040, L172.20.14.102:2115]: Exception on EndAuthenticateAsServer.
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
at EventStore.Transport.Tcp.TcpConnectionSsl.OnEndAuthenticateAsServer(IAsyncResult ar) in c:\projects\eventstore\src\EventStore.Transport.Tcp\TcpConnectionSsl.cs:line 154
172.20.14.102
5/7/17
11:11:42.260 AM
[PID:03076:011 2017.05.09 20:16:12.799 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 37316ms. Q: 0/22.
172.20.14.102
5/7/17
11:11:42.260 AM
[PID:03076:011 2017.05.09 20:16:12.799 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 37316ms. Q: 0/22.
172.20.14.102
5/7/17
11:11:42.260 AM
[PID:03076:011 2017.05.09 20:16:12.799 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 37316ms. Q: 0/22.
172.20.14.102

5/7/17
11:11:42.096 AM
[PID:139260:005 2017.05.07 11:11:42.096 DEBUG PersistentSubscripti] Lost connection from 172.20.0.253:43842
172.20.14.101
5/7/17
11:11:42.096 AM
[PID:139260:038 2017.05.07 11:11:42.096 INFO TcpConnectionManager] Connection ‘external-secure’ [172.20.0.253:43842, {3841b443-f61c-4b3a-9f64-843f1d58831b}] closed: SocketError.
172.20.14.101
5/7/17
11:11:42.096 AM
[PID:139260:038 2017.05.07 11:11:42.096 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:42.096: N172.20.0.253:43842, L172.20.14.101:2115, {3841b443-f61c-4b3a-9f64-843f1d58831b}]:Close reason: [SocketError] Authentication failed because the remote party has closed the transport stream.
172.20.14.101
5/7/17
11:11:42.096 AM
[PID:139260:038 2017.05.07 11:11:42.096 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:42.096: N172.20.0.253:43842, L172.20.14.101:2115, {3841b443-f61c-4b3a-9f64-843f1d58831b}]:Receive calls: 0, callbacks: 0
172.20.14.101
5/7/17
11:11:42.096 AM
[PID:139260:038 2017.05.07 11:11:42.096 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:42.096: N172.20.0.253:43842, L172.20.14.101:2115, {3841b443-f61c-4b3a-9f64-843f1d58831b}]:Send calls: 0, callbacks: 0
172.20.14.101
5/7/17
11:11:42.096 AM
[PID:139260:038 2017.05.07 11:11:42.096 INFO TcpConnectionSsl ] ES TcpConnectionSsl closed [11:11:42.096: N172.20.0.253:43842, L172.20.14.101:2115, {3841b443-f61c-4b3a-9f64-843f1d58831b}]:Received bytes: 0, Sent bytes: 0
172.20.14.101
5/7/17
11:11:42.096 AM
[PID:139260:038 2017.05.07 11:11:42.096 INFO TcpConnectionSsl ] [S172.20.0.253:43842, L172.20.14.101:2115]: Exception on EndAuthenticateAsServer.
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
at EventStore.Transport.Tcp.TcpConnectionSsl.OnEndAuthenticateAsServer(IAsyncResult ar) in c:\projects\eventstore\src\EventStore.Transport.Tcp\TcpConnectionSsl.cs:line 154
172.20.14.101
5/7/17
11:11:42.096 AM
[PID:139260:007 2017.05.07 11:11:42.096 INFO TcpService ] External TCP connection accepted: [Secure, 172.20.0.253:43842, L172.20.14.101:2115, {3841b443-f61c-4b3a-9f64-843f1d58831b}].
172.20.14.101
5/7/17
11:11:42.096 AM
[PID:139260:038 2017.05.07 11:11:42.096 INFO TcpConnectionSsl ] [S172.20.0.253:43842, L172.20.14.101:2115]: Exception on EndAuthenticateAsServer.
System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
at System.Net.Security.SslState.InternalEndProcessAuthentication(LazyAsyncResult lazyResult)
at System.Net.Security.SslState.EndProcessAuthentication(IAsyncResult result)
at EventStore.Transport.Tcp.TcpConnectionSsl.OnEndAuthenticateAsServer(IAsyncResult ar) in c:\projects\eventstore\src\EventStore.Transport.Tcp\TcpConnectionSsl.cs:line 154
172.20.14.101
5/7/17
11:11:42.096 AM
[PID:03076:011 2017.05.09 20:16:12.799 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 37316ms. Q: 0/22.
172.20.14.101
5/7/17
11:11:42.096 AM
[PID:03076:011 2017.05.09 20:16:12.799 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 37316ms. Q: 0/22.
172.20.14.101
5/7/17
11:11:42.096 AM
[PID:03076:011 2017.05.09 20:16:12.799 ERROR QueuedHandlerMRES ] —!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 37316ms. Q: 0/22.
172.20.14.101

Please check these logs and ask me when you want more.

Thank you very much.

It looks like node:

172.20.0.253

is the opening/closing connections repeatedly, could there be a bad client with keepretrying enabled running there?

Thank you very much, this IP is our load balance, it will send health check to each server, I think we can skip these error that related to this IP. but for second error like “Very Slow”, do you have any ideas?