Error collecting stats

Just a heads up on an error I encountered today:

[PID:06956:040 2013.08.23 16:36:22.231 ERROR MonitoringService ] Error while collecting stats

System.InvalidOperationException: The Counter layout for the Category specified is invalid, a counter of the type: AverageCount64, AverageTimer32, CounterMultiTimer, CounterMultiTimerInverse, CounterMultiTimer100Ns, CounterMultiTimer100NsInverse, RawFraction, or SampleFraction has to be immediately followed by any of the base counter types: AverageBase, CounterMultiBase, RawBase or SampleBase.
   at System.Diagnostics.CategorySample.GetCounterDefinitionSample(String counter)
   at System.Diagnostics.PerformanceCounter.NextSample()
   at System.Diagnostics.PerformanceCounter.NextValue()
   at EventStore.Core.Services.Monitoring.SystemStatsHelper.GetSystemStats() in c:\BuildAgent1\work\oss\windows\releasebuilds\src\EventStore\EventStore.Core\Services\Monitoring\SystemStatsHelper.cs:line 73
   at EventStore.Core.Services.Monitoring.MonitoringService.CollectStats() in c:\BuildAgent1\work\oss\windows\releasebuilds\src\EventStore\EventStore.Core\Services\Monitoring\MonitoringService.cs:line 153
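The exception text states a layout rule: a counter of one of the listed "fraction/average" types must be immediately followed by a base counter. A minimal Python sketch of that check, written only to illustrate the rule as the message words it (this is not the .NET implementation, `find_layout_violations` is a hypothetical name, and it does not model which specific base type pairs with which counter type):

```python
# Counter types that, per the exception message, need a base counter
# immediately after them in the category layout.
NEEDS_BASE = {
    "AverageCount64", "AverageTimer32", "CounterMultiTimer",
    "CounterMultiTimerInverse", "CounterMultiTimer100Ns",
    "CounterMultiTimer100NsInverse", "RawFraction", "SampleFraction",
}

# Base counter types accepted in the slot right after such a counter.
BASE_TYPES = {"AverageBase", "CounterMultiBase", "RawBase", "SampleBase"}


def find_layout_violations(counter_types):
    """Return indices of counters that are not immediately followed by a base counter."""
    violations = []
    for i, counter_type in enumerate(counter_types):
        if counter_type in NEEDS_BASE:
            # A counter at the end of the list has no successor at all.
            nxt = counter_types[i + 1] if i + 1 < len(counter_types) else None
            if nxt not in BASE_TYPES:
                violations.append(i)
    return violations


good = ["NumberOfItems32", "AverageTimer32", "AverageBase"]
bad = ["AverageTimer32", "NumberOfItems32", "RawFraction"]
print(find_layout_violations(good))  # []
print(find_layout_violations(bad))   # [0, 2]
```

Seen this way, the error means the counter definitions Windows hands back at sampling time no longer satisfy that pairing, which fits the later suspicion in this thread that the counter "goes bad" rather than being defined wrongly up front.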

Most of the web interface stopped working as a result. A restart seems to have fixed the problem. Not a showstopper, but I thought you would like to know.

I was running an import of around 5M events, which took roughly 5 hours, transferring commit by commit. (I guess importing one stream at a time would have been a lot faster, but I couldn't figure out how to do that without messing up global ordering, which would have made projections listening to multiple streams problematic.)

/Peter

Did the machine go into sleep mode? I have seen it happen there.

This error continuously shows up on my Azure machine (latest binaries).

Rinat,

Do you know if Azure has a different model for performance counters?

James

Does it show on startup?

Yes, it might have gone to sleep after the job completed. Definitely no biggie then!

I can reproduce it when the machine hibernates. Will fix that one first.

Trying to isolate it. It appears that sleep alone is not enough to trigger it. Rinat, I would be interested in your Azure case as well, since I would not expect those nodes to sleep. Does it happen from startup?

I found some other issues in this code, in particular around localization, where some counters may be named differently.

I can post my isolated tests if anyone else wants to play with them.

Cheers

Greg

I get this on an Azure node as well. Not from startup, but after it has been running for a while. Restarting the node fixes the error (for a while).

What we see is that the web interface stops producing graphs. Everything else works OK.

I am not able to reproduce it on purpose, and I do not see a relationship with any events on the Azure node. At least I am not able to connect the dots :)

I can duplicate our node and give you RDP access to a node where this has happened, if you want to investigate.

I’ll see if I can reproduce it locally first. My guess is that the performance counter somehow “goes bad” over time. The simple solution is to just recreate it, which will probably work, but I want to see if there is a direct cause first.
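The "just recreate it" workaround can be sketched as a wrapper that rebuilds the resource after a failed read and tries again. This is a minimal Python sketch of a generic recreate-on-failure pattern, not EventStore's code: `RecreatingReader`, `factory`, and `read` are hypothetical names, and `RuntimeError` stands in for the .NET `InvalidOperationException` from the log.

```python
class RecreatingReader:
    """Wrap a resource that may 'go bad' and recreate it after a failed read.

    `factory` builds a fresh resource and `read` samples a value from it;
    both are hypothetical stand-ins for creating and sampling a counter.
    """

    def __init__(self, factory, read, retries=1, error_type=RuntimeError):
        self._factory = factory
        self._read = read
        self._retries = retries
        self._error_type = error_type
        self._resource = factory()

    def next_value(self):
        attempts = 0
        while True:
            try:
                return self._read(self._resource)
            except self._error_type:
                # Give up after the allowed number of recreations so a
                # permanently broken counter still surfaces as an error.
                if attempts >= self._retries:
                    raise
                attempts += 1
                self._resource = self._factory()


created = []


def make_counter():
    # The first counter instance is "bad"; later ones work.
    counter = {"bad": not created}
    created.append(counter)
    return counter


def read_counter(counter):
    if counter["bad"]:
        raise RuntimeError("counter layout invalid")
    return 42.0


reader = RecreatingReader(make_counter, read_counter)
print(reader.next_value())  # 42.0
print(len(created))         # 2
```

Capping the retries keeps a counter that stays broken from looping forever. In .NET one would presumably also `Dispose()` the old `PerformanceCounter` before replacing it.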

I don’t think there is anything Azure-specific in my current setup, since I’m running ES on a VM (not as a Worker). The OS is Windows Server 2012 Datacenter x64. Exact error message:

[PID:01480:017 2013.08.09 23:53:51.251 ERROR MonitoringService ] Error while collecting stats

System.InvalidOperationException: The Counter layout for the Category specified is invalid, a counter of the type: AverageCount64, AverageTimer32, CounterMultiTimer, CounterMultiTimerInverse, CounterMultiTimer100Ns, CounterMultiTimer100NsInverse, RawFraction, or SampleFraction has to be immediately followed by any of the base counter types: AverageBase, CounterMultiBase, RawBase or SampleBase.
   at System.Diagnostics.CategorySample.GetCounterDefinitionSample(String counter)
   at System.Diagnostics.PerformanceCounter.NextSample()
   at System.Diagnostics.PerformanceCounter.NextValue()
   at EventStore.Core.Services.Monitoring.SystemStatsHelper.GetSystemStats() in c:\BuildAgent1\work\oss\windows\releasebuilds\src\EventStore\EventStore.Core\Services\Monitoring\SystemStatsHelper.cs:line 73
   at EventStore.Core.Services.Monitoring.MonitoringService.CollectStats() in c:\BuildAgent1\work\oss\windows\releasebuilds\src\EventStore\EventStore.Core\Services\Monitoring\MonitoringService.cs:line 153

Best,

Rinat

Does it happen on start up or after some period of time?

Hard to tell; I don’t check the ES error logs that often, since it just works.

Will pay more attention after the next ES restart.

Best,

Rinat

Ok.

I think this issue is resolved in dev btw.

Do you have any plans for binary releases soon?

There will be several before Sept 17. I'll open a chat internally and get back to you on the next one.

This will be released as binaries this week, along with some other changes, including a 40-60% performance increase on writes (plus a 50% reduction in latency) and internal LDAP support in the clustered version.

Greg

Nice. Latency reduction == good (esp. on Azure). Did you merge the prepares with the commit into one step?

Best,

Rinat

Yep

Looking forward to benchmarking this one.