Eventstore in Kubernetes proving to be problematic.

We’ve been running Eventstore in Kubernetes for a while now and have had quite a few problems. Lately we’ve seen very odd behaviour where a mirrored node was using nearly 40GB of memory whilst the rest were using ~8GB, even though its CPU usage was barely anything. I’m curious what others’ experiences are with this technology in Kubernetes, as it seems to be very “temperamental” inside Docker. Also, if others here are running it in Kubernetes, what are you setting your requests and limits to on your Eventstore pods? Given the wide spikes in memory consumption we’ve seen, we’re nervous about capping it for fear we just end up with OOM kills inside the container.

Some background:

We run 3-node clusters, all on SSD disks. Right now they are free to use whatever resources they need, and their usage appears to be VERY sporadic.

Thanks

Whether Eventstore 5 respects cgroup limits is probably the biggest question here. I know .NET Core didn’t until v3.0, and I’m pretty sure Eventstore 5 is running on Mono.

We’ve had issues with it on Azure AKS, but on GKE it’s been running fine with 3 nodes for a long time. We set the limits to 1 CPU with 8 gigs of RAM, using a large SSD.
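For anyone wanting to try the same numbers, here is a minimal sketch of what that could look like in a pod template. The container name is a placeholder, and "8 gigs" is assumed to mean `8Gi`. Setting requests equal to limits is my own assumption, not something stated above.

```yaml
# Fragment of a pod template (e.g. inside a StatefulSet spec.template.spec).
# "eventstore" is a placeholder container name.
containers:
  - name: eventstore
    resources:
      requests:
        cpu: "1"        # assumed equal to the limit
        memory: 8Gi     # assumed equal to the limit
      limits:
        cpu: "1"        # 1 CPU, as described above
        memory: 8Gi     # "8 gigs of RAM", assumed to mean 8Gi
```

With requests equal to limits on every container, the pod gets the Guaranteed QoS class, so it is among the last to be evicted under node memory pressure. It will still be OOM-killed if the process itself goes over the memory limit.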

What cloud are you running on?

What size/type is your SSD?

What version of Eventstore?

How is the cluster set up? Regional?

What node types, affinity, disruption budgets, etc.?

We have put some changes in to mitigate what we saw:

Skip the DB and index checks. They will most definitely cause issues when you have multiple Eventstore pods on a single node (which we do).
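A rough sketch of how those two checks might be switched off through the container environment. The variable names are assumptions based on the v5 `SkipDbVerify` and `SkipIndexVerify` options, so check them against the docs for your exact version.

```yaml
# Fragment of the Eventstore container spec.
# Assumed names: EVENTSTORE_SKIP_DB_VERIFY / EVENTSTORE_SKIP_INDEX_VERIFY
# (environment form of --skip-db-verify / --skip-index-verify in v5).
env:
  - name: EVENTSTORE_SKIP_DB_VERIFY
    value: "true"   # env values must be strings in Kubernetes
  - name: EVENTSTORE_SKIP_INDEX_VERIFY
    value: "true"
```

The trade-off is that corruption in chunk or index files would no longer be caught by these checks, so this is probably only reasonable if the underlying disks are trusted.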

Next up is to move to ES 6 so pod limits work properly.

  1. GKE

  2. 200GB GKE SSD

  3. ES 5.0.5

  4. Cluster is in the same zone (3 nodes)

  5. Affinity is enabled, but we can still have ES pods from other tenants on the same node. We are multi-tenant in Kubernetes via namespace isolation.
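On the last point: by default a pod anti-affinity term only matches pods in the pod's own namespace. That would explain ES pods from other tenants still landing on the same node. A minimal sketch of an anti-affinity rule that also looks at other tenant namespaces follows. The `app: eventstore` label and the namespace names are hypothetical, and it assumes every tenant labels its ES pods the same way.

```yaml
# Fragment of a pod template spec.
# Label and namespace names below are hypothetical placeholders.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: eventstore
        topologyKey: kubernetes.io/hostname   # at most one matching pod per node
        namespaces:                           # without this, only the pod's own namespace is checked
          - tenant-a
          - tenant-b
```

On newer Kubernetes versions, `namespaceSelector: {}` on the term can be used instead of listing namespaces, since an empty selector matches all namespaces. Note that a hard rule like this needs at least as many nodes as there are ES pods in total, otherwise some pods will stay Pending.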