New Epoch request came not in master state

Hi all,

Recently on our production environment we had an outage of our EventStore cluster. We have three EventStore instances running version OSS-3.0.5. We found some error logs generated during the outage on instance A.

From the logs, it seems this instance was already elected as Master but failed to handle the WriteEpoch message due to an invalid state. After we rebooted the EventStore service, it went back to normal. During the outage, instance B was also at very high CPU utilization (around 90 percent).

Any help appreciated.

Thanks in advance.

[PID:07992:026 2016.03.22 09:11:27.387 INFO TcpConnectionManager] Connection ‘external-secure’ [60.58.62.8:57783, {29d5ed97-c7da-4fb9-8c36-95ff2b4ec1ff}] closed: SocketError.

[PID:07992:010 2016.03.22 09:11:30.819 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 45605ms. Q: 0/197.

[PID:07992:010 2016.03.22 09:11:30.819 ERROR QueuedHandlerMRES ] ---!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 45605ms. Q: 0/197.

[PID:07992:013 2016.03.22 09:11:30.835 INFO ClusterVNodeControll] ========== [10.10.13.101:2113] IS MASTER!!! SPARTA!!!

[PID:07992:010 2016.03.22 09:11:30.851 FATAL StorageWriterService] Unexpected error in StorageWriterService. Terminating the process... System.Exception: New Epoch request came not in master state!!! State: PreMaster.

   at EventStore.Core.Services.Storage.StorageWriterService.EventStore.Core.Bus.IHandle<EventStore.Core.Messages.SystemMessage.WriteEpoch>.Handle(WriteEpoch message) in c:\EventStore\src\EventStore.Core\Services\Storage\StorageWriterService.cs:line 195
   at EventStore.Core.Bus.MessageHandler`1.TryHandle(Message message) in c:\EventStore\src\EventStore.Core\Bus\MessageHandler.cs:line 33
   at EventStore.Core.Bus.InMemoryBus.Publish(Message message) in c:\EventStore\src\EventStore.Core\Bus\InMemoryBus.cs:line 336
   at EventStore.Core.Services.Storage.StorageWriterService.CommonHandle(Message message) in c:\EventStore\src\EventStore.Core\Services\Storage\StorageWriterService.cs:line 157

[PID:07992:010 2016.03.22 09:11:30.866 ERROR Application ]

Exiting with exit code: 1. Exit reason: Unexpected error in StorageWriterService: New Epoch request came not in master state!!! State: PreMaster.

Are you not running any form of supervision? Nodes do, for varying reasons, kill themselves occasionally.

More interesting is what is this:

[PID:07992:010 2016.03.22 09:11:30.819 ERROR QueuedHandlerMRES ]
---!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares -
45605ms. Q: 0/197.

That looks like 45 seconds to do a disk write.
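
If you want to rule the disk in or out independently of EventStore, a rough probe along the lines below can help. This is only a minimal sketch, not anything EventStore ships, and the target path and write size are assumptions; it just times small synchronous appends on the data volume:

import os
import time

# Minimal probe of synchronous write latency on the data volume.
# TARGET is an assumption: point it at the volume holding the EventStore db.
TARGET = "/var/lib/eventstore/fsync_probe.tmp"
CHUNK = b"x" * 4096  # 4 KiB per append

def probe(iterations=100):
    worst = 0.0
    with open(TARGET, "ab") as f:
        for _ in range(iterations):
            start = time.time()
            f.write(CHUNK)
            f.flush()
            os.fsync(f.fileno())  # force the data to stable storage
            worst = max(worst, time.time() - start)
    os.remove(TARGET)
    print("worst fsync latency over %d writes: %.1f ms" % (iterations, worst * 1000))

if __name__ == "__main__":
    probe()

If the worst case here ever gets anywhere near seconds, the problem is below EventStore (hypervisor, SAN path, contention), not in the node itself.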

We don't have the manager nodes for the EventStore cluster.

Since we have 3 instances running as a cluster, if one of the instances goes down, the other two instances should still work. But during the outage, the gossip status kept changing until two of the instances were no longer alive.

In terms of the "VERY SLOW QUEUE MSG" log, we found many of these entries on all EventStore instances during the outage. In general, what might be the causes of this error? Disk IOPS? CPU utilization?

Best Regards,

Zhengquan Yang

That particular one was writing to storage. Is storage network attached? If all 3 nodes couldn't write to storage then an election couldn't be successful (they write an epoch at the end of an election). Further logs would help.

Nodes in general should be supervised (even just autorestart); there are quite a few edge conditions where the best option is to restart, so the node commits suicide. Other examples include a deposed master and file handles that go invalid (to test this, put your db on a USB stick and pull it out).

The EventStore servers are virtual machines running on top of hypervisors. The data is persisted to a fibre SAN with multiple SSD drives. Disk IOPS may not be the bottleneck.

Here are the logs for EventStore instances near the outage:

Instance A: https://gist.github.com/Zhengquan/aaaa37f367a124059d92

Instance B: https://gist.github.com/Zhengquan/2f2e1d1dd819482da39a

Instance C: https://gist.github.com/Zhengquan/fd04c0782cdde0c70a85

Actually there are no logs generated for EventStore instance B during the outage; it shut down at 2016.03.22 09:08:42.304 UTC, 20 minutes before the outage of the EventStore cluster.

What is the meaning of "offline transaction"? What was the issue with instance B at that time? Do we need to fix it manually?

Logs on instance B before the outage:

[PID:05828:013 2016.03.22 09:08:42.948 INFO Application ] Exiting with exit code: 0.

Exit reason: Shutdown with exiting from process was requested.

[PID:05828:013 2016.03.22 09:08:42.917 INFO ClusterVNodeControll] ========== [10.10.13.102:2113] IS SHUT DOWN!!! SWEET DREAMS!!!

[PID:05828:009 2016.03.22 09:08:42.480 TRACE InMemoryBus ] SLOW BUS MSG [SubscriptionsBus]: BecomeShuttingDown - 62ms. Handler: SubscriptionsService.

[PID:05828:013 2016.03.22 09:08:42.449 INFO ClusterVNodeControll] ========== [10.10.13.102:2113] All Services Shutdown.

[PID:05828:013 2016.03.22 09:08:42.449 INFO ClusterVNodeControll] ========== [10.10.13.102:2113] Service ‘Storage Chaser’ has shut down.

[PID:05828:013 2016.03.22 09:08:42.433 INFO ClusterVNodeControll] ========== [10.10.13.102:2113] Service ‘Master Replication Service’ has shut down.

[PID:05828:013 2016.03.22 09:08:42.418 TRACE GossipServiceBase ] CLUSTER HAS CHANGED (TCP connection lost to [10.10.13.101:2116])

Old:

VND {625acf3f-d4cb-474f-b8f4-e3f1f71ccdac} [Slave, 10.10.13.103:2111, 10.10.13.103:2116, 147.42.14.102:2112, 147.42.14.102:2115, 10.10.13.103:2113, 147.42.14.102:2114] 88988811705/88988812354/88988812116/E259@88988811878:{afc1c45c-50dd-4762-b179-141ff961ea95} | 2016-03-22 09:08:42.296

VND {a4c1bdc0-2643-4512-b4b9-3a3d5c08cc4f} [ShuttingDown, 10.10.13.102:2111, 10.10.13.102:2116, 147.42.14.101:2112, 147.42.14.101:2115, 10.10.13.102:2113, 147.42.14.101:2114] 88988811590/88988812753/88988812753/E257@88988805718:{16994d92-0059-494f-a5a7-80eff4f04316} | 2016-03-22 09:08:42.418

VND {c532c941-4817-4064-92f4-92ddcdde98fe} [Master, 10.10.13.101:2111, 10.10.13.101:2116, 147.42.14.100:2112, 147.42.14.100:2115, 10.10.13.101:2113, 147.42.14.100:2114] 88988811705/88988811878/88988811878/E258@88988809991:{e392ad92-1d92-47aa-9bcf-0debc33afa87} | 2016-03-22 09:08:42.101

New:

VND {625acf3f-d4cb-474f-b8f4-e3f1f71ccdac} [Slave, 10.10.13.103:2111, 10.10.13.103:2116, 147.42.14.102:2112, 147.42.14.102:2115, 10.10.13.103:2113, 147.42.14.102:2114] 88988811705/88988812354/88988812116/E259@88988811878:{afc1c45c-50dd-4762-b179-141ff961ea95} | 2016-03-22 09:08:42.296

VND {a4c1bdc0-2643-4512-b4b9-3a3d5c08cc4f} [ShuttingDown, 10.10.13.102:2111, 10.10.13.102:2116, 147.42.14.101:2112, 147.42.14.101:2115, 10.10.13.102:2113, 147.42.14.101:2114] 88988811590/88988812753/88988812753/E257@88988805718:{16994d92-0059-494f-a5a7-80eff4f04316} | 2016-03-22 09:08:42.418

VND {c532c941-4817-4064-92f4-92ddcdde98fe} [Master, 10.10.13.101:2111, 10.10.13.101:2116, 147.42.14.100:2112, 147.42.14.100:2115, 10.10.13.101:2113, 147.42.14.100:2114] 88988811705/88988811878/88988811878/E258@88988809991:{e392ad92-1d92-47aa-9bcf-0debc33afa87} | 2016-03-22 09:08:42.418

This is a place where it shut itself down expecting to be restarted, which is why you supervise nodes (it's one of the cases I mentioned above).

In the OSS Linux packages, for example, this is done via init.d.
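
If you are not using the packaged init scripts, even a crude external watchdog covers the "node exits on purpose, something restarts it" requirement. A minimal sketch follows; the binary path and arguments are assumptions, so substitute whatever you actually use to start the node:

import subprocess
import time

# Crude supervision loop: restart the node whenever it exits, for any reason.
# The command below is an assumption; use your actual start command and config.
CMD = ["/opt/eventstore/clusternode", "--config", "/etc/eventstore/eventstore.conf"]

while True:
    node = subprocess.Popen(CMD)
    exit_code = node.wait()  # block until the node exits
    print("EventStore exited with code %d; restarting in 5 seconds" % exit_code)
    time.sleep(5)  # brief back-off before restarting

In practice you would use the packaged init.d script (or systemd's Restart=always) rather than rolling your own loop, but the effect is the same.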

I noticed that during replication, if the subscription position of the master is less than the writer checkpoint of the slave, the database will be truncated to the subscription position. (code)

Will newly committed transactions also be truncated and cause loss of data?

Could you help us answer the following questions, if we understand correctly:

  • when will the replication happen?

  • why does the master need to subscribe to changes from a slave?

  • does it make a difference if we upgrade to the latest stable version?

Thanks in advance.

Best Regards,

Zhengquan Yang

"I noticed that during replication if the subscription position of master
is less than the writer checkpoint of slave, the database will be truncated
to the subscription position. (code) "

No. The situation is, we were a master, then we got partitioned. Then
a new master is elected. If our log position is greater than where we
get subscribed then we should truncate (this scenario is known as a
deposed master).
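
Expressed as a sketch (illustrative only, not the actual EventStore source; the names are made up), the rule is roughly:

def truncation_point(writer_checkpoint, subscription_position):
    """Deposed-master rule, illustrative only: if our local log is ahead of
    the position the new master subscribes us at, the un-replicated tail is
    truncated back to that position before replication resumes."""
    if writer_checkpoint > subscription_position:
        return subscription_position  # truncate back to the shared prefix
    return None  # nothing to truncate; just start replicating

# Hypothetical positions: we wrote to 1200 locally, the new master
# subscribes us at 1000, so we truncate back to 1000.
print(truncation_point(1200, 1000))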

"Will the newly committed transaction also be truncated and cause loss of data?
Could you help us answer following questions if we understand correctly"

No transaction will be lost during truncation; a write in the cluster
is required to be on a majority of nodes.
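
The arithmetic behind that guarantee for a 3-node cluster (just a sketch of the majority argument, not EventStore code):

def quorum(cluster_size):
    """Majority quorum: the number of nodes a write must reach before it is acknowledged."""
    return cluster_size // 2 + 1

n = 3
q = quorum(n)  # 2 of 3 nodes
# Any two majorities overlap in at least one node, so an acknowledged write
# is always seen by whichever majority elects the next master; only records
# that never reached a quorum are candidates for truncation.
assert 2 * q - n >= 1
print("cluster size %d -> write quorum %d" % (n, q))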

"- when will the replication happen?"

When the node starts up.

"- why does master need to subscribe change from slave role? "

The "master" is no longer a "master". The nodes are homegenous
"master" is a transient concept.

"- does it make change if we do a upgrading to latest stable version?"

Not sure I understand the question.

What you have seen here, the node wanting truncation, is a perfectly
valid case. This is why nodes should be supervised. When you restarted
the node yourself I am guessing everything just worked and it rejoined
the cluster afterwards.

Instance A's logs seem to also be instance C's logs (in your gists they
are the same); do you have logs for instance A?

Instance B shut itself down; if supervised it would have just started up again
and rejoined. Without supervision it will be down until someone
restarts it.

Old:

VND {625acf3f-d4cb-474f-b8f4-e3f1f71ccdac} <LIVE> [Clone, 10.10.13.103:2111, 10.10.13.103:2116, 147.42.14.102:2112, 147.42.14.102:2115, 10.10.13.103:2113, 147.42.14.102:2114] 88988811186/88988811359/88988811359/E258@88988809991:{e392ad92-1d92-47aa-9bcf-0debc33afa87} | 2016-03-22 09:08:41.282

VND {a4c1bdc0-2643-4512-b4b9-3a3d5c08cc4f} <LIVE> [Clone, 10.10.13.102:2111, 10.10.13.102:2116, 147.42.14.101:2112, 147.42.14.101:2115, 10.10.13.102:2113, 147.42.14.101:2114] 88988811590/88988812753/88988812753/E257@88988805718:{16994d92-0059-494f-a5a7-80eff4f04316} | 2016-03-22 09:08:42.304

VND {c532c941-4817-4064-92f4-92ddcdde98fe} <LIVE> [Master, 10.10.13.101:2111, 10.10.13.101:2116, 147.42.14.100:2112, 147.42.14.100:2115, 10.10.13.101:2113, 147.42.14.100:2114] 88988811705/88988811878/88988811878/E258@88988809991:{e392ad92-1d92-47aa-9bcf-0debc33afa87} | 2016-03-22 09:08:42.101

New:

VND {625acf3f-d4cb-474f-b8f4-e3f1f71ccdac} <LIVE> [Slave, 10.10.13.103:2111, 10.10.13.103:2116, 147.42.14.102:2112, 147.42.14.102:2115, 10.10.13.103:2113, 147.42.14.102:2114] 88988811705/88988812354/88988812116/E259@88988811878:{afc1c45c-50dd-4762-b179-141ff961ea95} | 2016-03-22 09:08:42.296

VND {a4c1bdc0-2643-4512-b4b9-3a3d5c08cc4f} <LIVE> [Clone, 10.10.13.102:2111, 10.10.13.102:2116, 147.42.14.101:2112, 147.42.14.101:2115, 10.10.13.102:2113, 147.42.14.101:2114] 88988811590/88988812753/88988812753/E257@88988805718:{16994d92-0059-494f-a5a7-80eff4f04316} | 2016-03-22 09:08:42.304

VND {c532c941-4817-4064-92f4-92ddcdde98fe} <LIVE> [Master, 10.10.13.101:2111, 10.10.13.101:2116, 147.42.14.100:2112, 147.42.14.100:2115, 10.10.13.101:2113, 147.42.14.100:2114] 88988811705/88988811878/88988811878/E258@88988809991:{e392ad92-1d92-47aa-9bcf-0debc33afa87}

The log for instance A is updated:
https://gist.github.com/Zhengquan/aaaa37f367a124059d92

For instance B, the next logs were only generated about 30 minutes later,
when we restarted EventStore manually.

Best Regards,

Zhengquan Yang

By your answer: "No. The situation is, we were a master, then we got partitioned. Then
a new master is elected. If our log position is greater than where we
get subscribed then we should truncate (this scenario is known as a
deposed master)."

I have a question:

When "we" are not the master, who will be the master? And will the new master have the full set of transactions?

I am afraid of data loss when switching from the old master to the new one.

Nodes are homogeneous, "master" is transient.

https://geteventstore.com/blog/20130301/ensuring-writes-multi-node-replication/
explains this in more detail.