Can I safely remove some chunk-XXXXXX.XXXXXX files?

Hello,

I'm still on a journey to get geteventstore working properly, and the official documentation, scattered around the web, is not helping.

At startup, the “Rebuilding” index step takes a very long time, even though the number of events I handle is extremely small (a few hundred events recorded).

The problem is that my data folder is 2.5 GB, which looks out of control for so little data:
```
-rwxrwxr-x 1 eventStore eventStore 8 18 mars 06:14 chaser.chk
-r-xr-xr-x 1 eventStore eventStore 268430383 14 janv. 10:20 chunk-000000.000000
-r--r--r-- 1 eventStore eventStore 268430554 20 janv. 04:14 chunk-000001.000000
-r--r--r-- 1 eventStore eventStore 268420431 30 janv. 05:19 chunk-000002.000000
-r--r--r-- 1 eventStore eventStore 268421026 4 févr. 22:40 chunk-000003.000000
-r--r--r-- 1 eventStore eventStore 268434418 10 févr. 15:30 chunk-000004.000000
-r--r--r-- 1 eventStore eventStore 268419822 16 févr. 06:51 chunk-000005.000000
-r--r--r-- 1 eventStore eventStore 268425677 21 févr. 22:57 chunk-000006.000000
-r--r--r-- 1 eventStore eventStore 268422287 27 févr. 15:47 chunk-000007.000000
-r--r--r-- 1 eventStore eventStore 268435328 5 mars 08:47 chunk-000008.000000
-r--r--r-- 1 eventStore eventStore 268432071 11 mars 01:50 chunk-000009.000000
-rw-r--r-- 1 eventStore eventStore 268435712 18 mars 06:14 chunk-000010.000000
-rwxrwxr-x 1 eventStore eventStore 8 18 mars 06:13 epoch.chk
drwxrwxr-x 2 eventStore eventStore 4096 8 janv. 17:16 index
-rwxrwxr-x 1 eventStore eventStore 8 8 janv. 17:16 truncate.chk
-rwxrwxr-x 1 eventStore eventStore 8 18 mars 06:14 writer.chk
```

When I look inside the folder, I find those huge “chunk-” files that take hours to “Rebuild”.

Does anyone know how I can fix this?

  • Maybe I can keep only one chunk file to speed up “Rebuilding”?
  • If not, maybe I can skip “Rebuilding” and still have a working DB?

Thanks in advance.

Those are your data files, so removing them would effectively reset your database, deleting all your data.

But I am not sure why rebuilding is even necessary; as I understand it, that should only be needed if something went wrong (crash, process got killed, etc.).

The reason you have such a big database, even though (as you said) your committed events are few and small, is statistics data. You can limit it so that your database doesn’t get bloated; this has been explained a few times in this group.
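For reference, on EventStore 3.x the per-node statistics are written to a stream named after the node (something like `$stats-<ip>:<port>`), and you can cap it by setting `$maxCount` in that stream's metadata over HTTP. A hedged sketch, where the stream name, port, credentials, and the count of 1000 are all assumptions to adapt to your setup:

```shell
# Sketch: cap the $stats-127.0.0.1:2113 stream at 1000 events so older
# statistics entries become eligible for scavenging.
# %24 = '$' and %3A = ':' URL-encoded; admin:changeit are the shipped
# default credentials. The eventId is just a fresh UUID.
curl -i -X POST "http://127.0.0.1:2113/streams/%24stats-127.0.0.1%3A2113/metadata" \
  -u admin:changeit \
  -H "Content-Type: application/vnd.eventstore.events+json" \
  -d '[{"eventId":"9c2a2c1e-5f7d-4f6e-9c3b-0f1a2b3c4d5e",
        "eventType":"$metadata",
        "data":{"$maxCount":1000}}]'
```

Check your log output for the exact stats stream name on your node; capped events are only physically removed once a scavenge runs.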

Hope that helped,

Regards

Nicolas

“But I am not sure why rebuilding is even necessary, as I understand, that should only be needed if something went wrong (crash, process got killed, etc.).”

Rebuilding is always needed to some extent. It's just that it usually completes in under a second, as most drives give far more than 190 kB/sec access (by 2-3 orders of magnitude).

You can scavenge and limit the statistics data counts, and you can control the maximum amount that ever needs to be rebuilt.
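Once the stats stream is capped, the old entries are only reclaimed when a scavenge runs. A sketch of triggering one over the admin HTTP API, assuming the default port and the shipped default credentials:

```shell
# Sketch: kick off a scavenge via the admin endpoint so truncated and
# deleted events are physically removed from the chunk files.
# admin:changeit are the defaults; change them in production.
curl -i -X POST http://127.0.0.1:2113/admin/scavenge \
  -u admin:changeit \
  -d ''
```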

Greg

OK, so I will try a much bigger EC2 instance to see if it does the trick.

I tried changing my EC2 instance to ec2.large…and now the database is crashing with this (I then tried with getEventStore 3.0.3, same message):

```
[02678,16,16:40:59.977] ReadIndex Rebuilding: processed 92263 records (41,5%).
Stacktrace:

Native stacktrace:

./clusternode() [0x6314f2]
./clusternode() [0x5d016b]
./clusternode() [0x453aa3]
/lib64/libpthread.so.0(+0xf5b0) [0x7f3fc69335b0]
./clusternode() [0x475afa]
./clusternode() [0x4e7ae5]
[0x40ee833b]

Debug info from gdb:
```

What OS?

Usually what you have is caused by things like running a 64-bit binary on a 32-bit OS, or running an OS the binaries weren't built for (glibc mismatch, etc.).

The issue, though, is the ~180 kB/sec you are getting to storage with a multi-GB database.

In the future, to keep it down, lower the memtable size (the max count to rebuild) to something like 50k; it's 1M by default. And set a max count on the stats stream.
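The memtable cap mentioned above maps to a command-line option on the 3.x server. A sketch, with the option name taken from the 3.x docs (verify against `./clusternode --help` on your build, and adjust the `--db` path):

```shell
# Sketch: start the node with a smaller memtable so at most ~50k index
# entries ever need rebuilding after an unclean shutdown.
# --max-mem-table-size defaults to 1000000 on EventStore 3.x.
./clusternode --db /var/lib/eventstore/db --max-mem-table-size=50000
```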

Greg

OK regarding disk speed.

It's running on an Amazon AMI (Linux ip-10-38-136-120 3.14.27-25.47.amzn1.x86_64) and I am using EventStore-OSS-Linux-v3.0.1.

The extra message I have in the log is:

```
at System.Threading.Thread.StartInternal () [0x00000] in :0
[PID:02791:011 2015.03.18 16:45:52.465 ERROR ProjectionManager ] The 'stream-boat-aggregate-projection' projection faulted due to 'The projection subsystem failed to load a libjs1.so/js1.dll/… or one of its dependencies. The original error message is: js1'
[PID:02942:009 2015.03.18 16:51:21.726 FATAL GLOBAL-LOGGER ] Global Unhandled Exception occurred.
```

You need to set LD_LIBRARY_PATH to include the directory libjs1.so is in.
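A minimal sketch, assuming the server (with libjs1.so next to the binary, as shipped in the OSS tarball) was unpacked to /opt/eventstore; adjust to your actual install path:

```shell
# libjs1.so ships alongside the clusternode binary; make sure the
# dynamic loader can find it before starting the node.
export LD_LIBRARY_PATH="/opt/eventstore${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Then start the node as usual from that directory, e.g.:
#   ./clusternode --run-projections=all
```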

Thank you everyone, the problem is now resolved.

My problems were due to the very poor disk performance (or throttling) on Amazon EC2 instances.

Cheers