I am still having issues here with cached TFChunks unfortunately.
At first I thought my app was growing too large for ES since I was using the embedded store, so I switched to running standalone nodes, but I got similar results.
The store only grows to a max of about 700 MB in memory, oscillating between roughly 300 MB and 700 MB, but after a few hours I start seeing:
[PID:02880:011 2016.04.01 04:30:50.482 ERROR TFChunk ] CACHING FAILED due to OutOfMemory exception in TFChunk #2-2 (chunk-000002.000000).
After 4-5 of these the store just closes.
I figured the store was running out of unmanaged memory, so I added some command-line arguments. Here is the full list:
C:\eventstore\EventStore.ClusterNode.exe --int-ip 172.16.0.200 --ext-ip 172.16.0.200 --chunks-cache-size=1073741824 --log=C:\logs\ --db=C:\data --cached-chunks=2 --int-tcp-port=3113 --ext-tcp-port=3112 --int-http-port=3110 --ext-http-port=3111 --int-http-prefixes=http://*:3110/ --ext-http-prefixes=http://*:3111/ --cluster-size=5 --discover-via-dns=false --gossip-timeout-ms=2000 --gossip-seed=172.16.0.200:3110,172.16.0.201:3110,172.16.0.202:3110,172.16.0.203:3110,172.16.0.204:3110
I set the chunks cache size to 1 GB, up from 512 MB, hoping the chunks would have enough room to be swapped in and out, but no dice. I do have a suspect, however: now that I have a store log with trace on, I can see the following:
[PID:02880:008 2016.04.01 04:55:02.039 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #1]: ReadAllEventsForward - 218ms. Q: 0/3.
[PID:02880:006 2016.04.01 04:55:02.039 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #3]: ReadAllEventsForward - 234ms. Q: 0/4.
[PID:02880:011 2016.04.01 04:55:02.992 TRACE TFChunk ] CACHED TFChunk #3-3 (chunk-000003.000000) in 00:00:00.0003341.
[PID:02880:006 2016.04.01 04:55:03.042 TRACE TFChunk ] UNCACHED TFChunk #1-1 (chunk-000001.000000).
[PID:02880:013 2016.04.01 04:55:03.117 INFO MasterReplicationSer] Subscribed replica [172.16.0.204:3113,S:49a8a465-1fc0-450f-8d01-f50e3c89388f] for data send at 805306368 (0x30000000).
[PID:02880:013 2016.04.01 04:55:03.133 INFO MasterReplicationSer] Subscribed replica [172.16.0.202:3113,S:d4f418b4-5f91-499b-a7cc-09f0fe28a5cf] for data send at 805306368 (0x30000000).
[PID:02880:013 2016.04.01 04:55:03.133 INFO MasterReplicationSer] Subscribed replica [172.16.0.203:3113,S:db3328d4-6c8a-45c0-8b10-db74451568e0] for data send at 805306368 (0x30000000).
[PID:02880:013 2016.04.01 04:55:03.133 INFO MasterReplicationSer] Subscribed replica [172.16.0.200:3113,S:825e34dc-d662-4ac3-82b9-658782eb3e5d] for data send at 805306368 (0x30000000).
Is it caching a new chunk BEFORE uncaching the old one?
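If that is what's happening, some quick arithmetic (assuming the default 256 MB chunk size, which lines up with the 0x10000000 steps in the replication offsets above, plus the flag values from my command line) says even the overlap should still fit in the 1 GB cache I configured:

```python
chunk_size    = 256 * 1024 * 1024   # 268435456 bytes per TFChunk (default size, consistent
                                    # with the 0x10000000 offset steps in the log above)
cached_chunks = 2                   # --cached-chunks=2
cache_budget  = 1073741824          # --chunks-cache-size=1073741824 (1 GiB)

steady  = cached_chunks * chunk_size         # 536870912 (512 MiB) with 2 chunks cached
overlap = (cached_chunks + 1) * chunk_size   # 805306368 (768 MiB) if a new chunk is cached
                                             # before the old one is uncached
print(steady <= cache_budget, overlap <= cache_budget)   # True True
```

So even with the cache-before-uncache overlap the budget shouldn't be blown, which makes me wonder whether the OutOfMemory is really coming from overall unmanaged/address-space pressure rather than the chunk cache budget itself.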
Here is an example of a caching attempt that failed with OutOfMemory:
[PID:02880:021 2016.04.01 04:28:24.544 TRACE InMemoryBus ] SLOW BUS MSG [Worker #2 Bus]: TcpSend - 214ms. Handler: TcpSendService.
[PID:02880:021 2016.04.01 04:28:24.560 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [Worker #2]: TcpSend - 230ms. Q: 2/9.
[PID:02880:011 2016.04.01 04:30:50.482 ERROR TFChunk ] CACHING FAILED due to OutOfMemory exception in TFChunk #2-2 (chunk-000002.000000).
[PID:02880:005 2016.04.01 04:30:50.560 TRACE TFChunk ] UNCACHED TFChunk #0-0 (chunk-000000.000000).
[PID:02880:013 2016.04.01 04:30:50.608 INFO MasterReplicationSer] Subscribed replica [172.16.0.204:3113,S:49a8a465-1fc0-450f-8d01-f50e3c89388f] for data send at 536870912 (0x20000000).
[PID:02880:013 2016.04.01 04:30:50.608 INFO MasterReplicationSer] Subscribed replica [172.16.0.202:3113,S:d4f418b4-5f91-499b-a7cc-09f0fe28a5cf] for data send at 536870912 (0x20000000).
Could this be causing the issues we are seeing? Is there any way to fix it?
I am using that CacheSet tool, btw; I set the min/max to 1024 / 2097152 KB (2 GB) with no change in results.
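For anyone who wants to check their own node logs for this, here is the quick script I'm using to count how many chunks show up as cached at the same time. The regexes are just my reading of the CACHED / UNCACHED / CACHING FAILED lines pasted above, so treat it as a rough sketch and adjust if your log format differs:

```python
#!/usr/bin/env python3
"""Count concurrently cached TFChunks from an Event Store node log."""
import re
import sys

CACHED   = re.compile(r"\]\s+CACHED TFChunk #(\S+) \((chunk-[\d.]+)\)")
UNCACHED = re.compile(r"\]\s+UNCACHED TFChunk #(\S+) \((chunk-[\d.]+)\)")
FAILED   = re.compile(r"CACHING FAILED .* in TFChunk #(\S+) \((chunk-[\d.]+)\)")

def main(path):
    cached = set()   # chunk file names currently reported as cached
    peak = 0
    with open(path, "r", errors="replace") as log:
        for line in log:
            if m := CACHED.search(line):
                cached.add(m.group(2))
                peak = max(peak, len(cached))
                print(f"cached   {m.group(2)} -> {len(cached)} chunk(s) in memory")
            elif m := UNCACHED.search(line):
                cached.discard(m.group(2))
                print(f"uncached {m.group(2)} -> {len(cached)} chunk(s) in memory")
            elif m := FAILED.search(line):
                print(f"CACHING FAILED for {m.group(2)} while {len(cached)} chunk(s) cached")
    print(f"peak concurrently cached chunks: {peak}")

if __name__ == "__main__":
    main(sys.argv[1])
```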