Performance Issue

I am running some tests and writes are incredibly slow. The db is on a striped array. In the logs I see this message repeatedly:

[PID:230004 2013.03.09 21:43:48.030 Trace QueuedHandlerMRES 11]

SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 660ms. Q: 0/0.

Why would I see such awful write speed?

Looks like an fsync is taking 660 ms. Is it a local drive?

15/s is the performance I am seeing.

If it’s an array, the two big things to look at are: is it a local array, and does it have a write-back cache? Is write-back enabled? As part of the transaction we fsync the drive (to make sure the data is durable). It appears this operation is taking a long time (the storage writer is what does this).
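To check whether the disk itself is slow to confirm durable writes, a rough probe like the one below can help. It is only a sketch, not what the Event Store storage writer does internally: the function name, the 4 KB payload, and the number of writes are all made up for illustration. Point it at the directory where the db lives.

```python
import os
import tempfile
import time


def measure_fsync_ms(directory, writes=20, payload_size=4096):
    """Return the latency in ms of each write + flush + fsync round trip.

    Hypothetical helper for illustration; payload size and count are arbitrary.
    """
    payload = b"x" * payload_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            for _ in range(writes):
                start = time.perf_counter()
                f.write(payload)
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # ask the OS to push it to the device
                latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.remove(path)                # clean up the scratch file
    return latencies


if __name__ == "__main__":
    results = measure_fsync_ms(tempfile.gettempdir())
    print(f"min {min(results):.2f} ms, max {max(results):.2f} ms, "
          f"avg {sum(results) / len(results):.2f} ms")
```

If the numbers come back in the hundreds of milliseconds, the disk or controller is the likely suspect; if they are in the low single digits, the problem is probably elsewhere.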

Hmm, I am not sure what any of that means…

Can you describe your pattern of writing data? Approximately how many write requests per second do you make, how many events are in one write request, and what is the size of Data and Metadata in each event?

If you upload very large pieces of data (a megabyte or a few) in one event, then the write throughput may actually be very large. That is why measuring write performance only in write requests per second is not really a good idea.
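To make the point concrete, here is a small sketch that turns a requests-per-second figure into bytes per second. The function name and the example numbers are illustrative assumptions, not measurements from this thread.

```python
def write_throughput_bytes_per_sec(requests_per_sec, events_per_request, bytes_per_event):
    """Bytes written per second for a given request rate and request shape."""
    return requests_per_sec * events_per_request * bytes_per_event


if __name__ == "__main__":
    MIB = 1024 * 1024
    # Same request rate, very different amounts of data on disk (assumed sizes).
    big = write_throughput_bytes_per_sec(15, 1, 2 * MIB)     # one 2 MiB event per request
    small = write_throughput_bytes_per_sec(15, 1, 2 * 1024)  # one 2 KiB event per request
    print(f"large events: {big / MIB:.2f} MiB/s")
    print(f"small events: {small / MIB:.4f} MiB/s")
```

Both cases are "15 requests per second", but the first moves about a thousand times more data than the second.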

Shot in the dark here but do you have anti virus / anti spyware installed on this machine?

I am writing using a very similar pattern to the Event Repository posted by James Nugent. As far as requests per second, it’s hard to say. My command queue receives many thousands per second. As of now Event Store would never catch up. These events are pretty tiny e.g. a few KB.

As others have said this seems to be related to the storage writer and the amount of time it takes to do a round trip to disk (common if your storage is remote).

A few questions:

  1. Are you on Mono or .NET?

  2. If you use the Test Client which comes with the ES binaries and try a “WRFL” (write flood), what sort of performance is reported? This should identify whether it’s a problem in the client side code or at the server end.

  3. Can you post the code you’re using to drive it? If you don’t want to post your code publicly feel free to email it to me (james [at] geteventstore.com) and I’ll take a look and see if I can see anything that might be causing this.

Cheers,

James

  1. .NET

  2. I haven’t used the Test Client. Easy to get up and running?

  3. The code is very close to the repository you posted. I did a little refactoring, but only on the api side and not EventStore interaction.

Something is definitely not right. Hopefully it’s something server side. The drives are definitely local. Monitoring the system, nothing is close to fully utilized.

Run eventstore.testclient.exe … Type wrfl 10 100000 at prompt. What does it print? What about just wr?

According to what you said, one write request consists of 500 events of a few kilobytes each. That means each write request is a few megabytes. So if you have 15 req/s, your write throughput is around a few dozen megabytes per second, which is actually not that bad. So here are a few questions and suggestions to make the picture clearer:

  1. Do you make write requests sequentially or simultaneously? Especially with simultaneous requests, some of your requests can time out while the server is busy handling other requests. That causes the ClientAPI to automatically retry the operation, reducing the number of successful req/s.

  2. Could you try sending events in smaller batches, say 50 (or maybe even 20)?

  3. If you are running a dev build, there are --prepare-timeout and --commit-timeout switches, which you could try setting to something bigger than the default of 2 seconds. Say, set 20000 ms for both commit and prepare.
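For the smaller-batches suggestion, the splitting itself is simple. The sketch below is in Python only to show the idea; `split_into_batches` is a made-up helper, and each batch would then be handed to whatever append call your repository already uses.

```python
def split_into_batches(events, batch_size):
    """Yield consecutive slices of `events`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(events), batch_size):
        yield events[start:start + batch_size]


if __name__ == "__main__":
    pending = [f"event-{i}" for i in range(500)]  # stand-in for real events
    batches = list(split_into_batches(pending, 50))
    # Each batch would become one write request instead of a single 500-event write.
    print(len(batches), "batches of up to", len(batches[0]), "events")
```

Smaller batches mean more round trips, so it is worth trying a couple of sizes and comparing.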

Please, let us know if there is any positive effect.

I’ve attached the output.

eventstore-TestClientOutput.txt (4.51 KB)

This shows the Event Store doing 10000 transactions/second. Maybe you can send us the measurement code that has it at 15/second?

No, I am with you. The code I have is pretty much what James has posted. Is the 500 that Andrii is referring to the WritePageSize in James’ event store repository?

I should probably clarify the 15/s. That wasn’t a direct write performance number for Event Store, but the messages off the queue. There is very little processing overhead besides reading event streams and appending.
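Since the 15/s figure covers the whole message handler, one way to narrow it down is to time the read and the append separately. This is a rough sketch under assumptions: `read_stream` and `append_events` are placeholders for whatever the repository actually calls, and the message shape is invented.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
stage_totals = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the elapsed time of the wrapped block to stage_totals[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[stage] += time.perf_counter() - start


def handle_message(message, read_stream, append_events):
    """Hypothetical handler: read the stream's history, then append new events."""
    with timed("read"):
        history = read_stream(message["stream"])
    with timed("append"):
        append_events(message["stream"], history, message["events"])


if __name__ == "__main__":
    # Stand-in functions so the sketch runs without a real event store.
    fake_read = lambda stream: []
    fake_append = lambda stream, history, events: None
    for i in range(100):
        handle_message({"stream": f"s-{i}", "events": ["e"]}, fake_read, fake_append)
    for stage, seconds in stage_totals.items():
        print(f"{stage}: {seconds * 1000:.2f} ms total")
```

If nearly all the time lands in one stage, that is the call to look at first.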

Can you post?

As Greg says, if you can either post (or email if it’s not public) the code you’re using to drive it, we might be able to spot what’s going on. From the output of the test client it looks like the event store itself is performing fine on your hardware.

Thanks,

James

Thanks gents. I’ll do some more tests. The only other place that makes sense is RabbitMQ. I’ve never had any trouble with it before.

Interestingly things are just as slow taking RabbitMQ completely out of the equation. Quite odd.