Time difference between us and [{peerEndPoint}] is too great! UTC now: {dateTime:yyyy-MM-dd HH:mm:ss.fff}, peer's time stamp: {peerTimestamp:yyyy-MM-dd HH:mm:ss.fff}

  1. I deployed an ESDB cluster using VMs with three nodes.
  2. In the configuration file, I used:
   DiscoverViaDns: false
   GossipAllowedDifferenceMs: 60000
   GossipSeed: [Internal IP1]:2113,[Internal IP2]:2113
  1. NTP is installed on the VMs for time synchronization, and the status shows as normal.
  2. Version: 23.10.1

After deployment, it has been reporting errors continuously.

{"@t":"2024-09-02T06:11:50.8540669+00:00","@mt":"Time difference between us and [{peerEndPoint}] is too great! UTC now: {dateTime:yyyy-MM-dd HH:mm:ss.fff}, peer's time stamp: {peerTimestamp:yyyy-MM-dd HH:mm:ss.fff}.","@r":["2024-09-02 06:11:50.854","2024-09-02 05:56:29.134"],"@l":"Error","@i":1738112734,"peerEndPoint":"Unspecified/10.0.2.6:2113","dateTime":"2024-09-02T06:11:50.8540415Z","peerTimestamp":"2024-09-02T05:56:29.1343539Z","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":44831,"ThreadId":16}
{"@t":"2024-09-02T06:11:58.8664832+00:00","@mt":"Time difference between us and [{peerEndPoint}] is too great! UTC now: {dateTime:yyyy-MM-dd HH:mm:ss.fff}, peer's time stamp: {peerTimestamp:yyyy-MM-dd HH:mm:ss.fff}.","@r":["2024-09-02 06:11:58.866","2024-09-02 05:10:28.444"],"@l":"Error","@i":1738112734,"peerEndPoint":"Unspecified/10.0.2.7:2113","dateTime":"2024-09-02T06:11:58.8664658Z","peerTimestamp":"2024-09-02T05:10:28.4445972Z","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":44831,"ThreadId":16}
{"@t":"2024-09-02T06:12:00.8760705+00:00","@mt":"Time difference between us and [{peerEndPoint}] is too great! UTC now: {dateTime:yyyy-MM-dd HH:mm:ss.fff}, peer's time stamp: {peerTimestamp:yyyy-MM-dd HH:mm:ss.fff}.","@r":["2024-09-02 06:12:00.876","2024-09-02 05:10:28.444"],"@l":"Error","@i":1738112734,"peerEndPoint":"Unspecified/10.0.2.7:2113","dateTime":"2024-09-02T06:12:00.8760533Z","peerTimestamp":"2024-09-02T05:10:28.4445972Z","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":44831,"ThreadId":16}
{"@t":"2024-09-02T06:12:02.8766960+00:00","@mt":"Time difference between us and [{peerEndPoint}] is too great! UTC now: {dateTime:yyyy-MM-dd HH:mm:ss.fff}, peer's time stamp: {peerTimestamp:yyyy-MM-dd HH:mm:ss.fff}.","@r":["2024-09-02 06:12:02.876","2024-09-02 05:10:28.444"],"@l":"Error","@i":1738112734,"peerEndPoint":"Unspecified/10.0.2.7:2113","dateTime":"2024-09-02T06:12:02.8766800Z","peerTimestamp":"2024-09-02T05:10:28.4445972Z","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":44831,"ThreadId":16}
{"@t":"2024-09-02T06:12:04.8799389+00:00","@mt":"Time difference between us and [{peerEndPoint}] is too great! UTC now: {dateTime:yyyy-MM-dd HH:mm:ss.fff}, peer's time stamp: {peerTimestamp:yyyy-MM-dd HH:mm:ss.fff}.","@r":["2024-09-02 06:12:04.879","2024-09-02 05:56:29.134"],"@l":"Error","@i":1738112734,"peerEndPoint":"Unspecified/10.0.2.6:2113","dateTime":"2024-09-02T06:12:04.8799129Z","peerTimestamp":"2024-09-02T05:56:29.1343539Z","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":44831,"ThreadId":16}

What should I do? Please help me.
Thank you.

1 Like

Hi,

There must be something off with the NTP server or at least how the node for which you posted the logs perceives the current time and how its peers perceive it. Note that GossipAllowedDifferenceMs is expressed in milliseconds, so the setting you have is 60 seconds, whereas the time difference you have is 00:15:21.7196876 for the first row, which translated to more than 15 minutes (which seems excessive). You can either bump GossipAllowedDifferenceMs or (preferably) find the root cause.

1 Like

Hi, could you post the configuration of each node?

1 Like
# Paths
Db: /esdb/data
Index: /esdb/index
Log: /var/log/eventstore

# Certificates configuration
CertificateFile: /esdb/etc/eventstore/certs/node/node.crt
CertificatePrivateKeyFile: /esdb/etc/eventstore/certs/node/node.key
TrustedRootCertificatesPath: /esdb/etc/eventstore/certs/ca

# Network configuration
ReplicationIp: 10.0.2.6
NodeIp: 10.0.2.6
ReplicationHostAdvertiseAs: node1.esdb.test
NodeHostAdvertiseAs: node1.esdb.test
HttpPort: 2113
IntTcpPort: 1112
ExtTcpPort: 1113
EnableExternalTcp: true

EnableAtomPubOverHttp: true
Insecure: true

# Cluster gossip
ClusterSize: 3

DiscoverViaDns: false
GossipSeed: 10.0.2.8:2113,10.0.2.7:2113

# Projections configuration
RunProjections: All
StartStandardProjections: true
ProjectionThreads: 8
ProjectionExecutionTimeout: 1000

# In memory
MemDb: false

# Log HTTP requests
LogHttpRequests: true

FaultOutOfOrderProjections: true

The configurations of the other two nodes are similar to this one, except for the values of a few fields below:

ReplicationIp: 
NodeIp: 
ReplicationHostAdvertiseAs: 
NodeHostAdvertiseAs: 
GossipSeed:
1 Like

@yves.reynhout

By the way, regarding the 15 minute difference:

The follower nodes are each equivalent to the manager nodes. The timestamp listed on the follower is correct, while the timestamp listed on the manager is the one reported as having too great of a difference. Note that the manager timestamp never changes, so the difference reported grows linearly.

I noticed in the ESDB source code that there is a function where these timestamps are set. All member types use utc.Now except for the manager which uses initial time stamp. In the image below, notice the time reported for the manager vs. the time reported in the following log.

UTC now: 2024-09-03 01:50:55.348, peer's time stamp: 2024-09-02 03:44:49.137. (Assuming the peer timestamp timezone is different, this is 12:44:49)

It may very well be a configuration issue on our end, but I’m wondering in the server code if somehow the timestamps for these managers are being used instead of the timestamps listed on the followers.

1 Like

Hi,

The concept of managers existed in the v5 era, a dedicated server to manage the cluster.

Now, the specified configuration makes the cluster believe there are Manager nodes. That’s because nodes defined in the gossip seed start out as Manager nodes. As gossip messages are processed in the cluster, the role associated with these nodes gets updated (away from being Manager). But that’s the issue here. There’s a disconnect between what the configuration tells a node and what gossip messages tell a node, resulting in dangling manager nodes.

I’ve set up a cluster with the same settings and was able to replicate the above behavior. The culprit seems to be the NodeHostAdvertiseAs.

There are two ways to get rid of the Manager nodes. Either remove the NodeHostAdvertiseAs setting or amend the GossipSeed to use the names of the nodes they are advertised as (so node3.esdb.test:2113;node2.esdb.test:2113 instead of 10.0.2.8:2113,10.0.2.7:2113).

Hope this helps.

2 Likes

Thank you very much for taking the time to solve this issue and provide the extra detail. We’ve updated our environments and they are all working as expected.

You guys are awesome!

1 Like