Hi,
I’ve noticed that the scavenge process is taking too long. I ran a test with 22 chunk files (around 6GB) that were backed up from our production environment and restored for testing purposes, and it took 10 hours to complete.
I ran the scavenge with 4 threads and 100 throttlePercent.
In the test environment, I’ve configured an ESDB cluster on AKS with 3 nodes and 1 readonly replica. Each node has been allocated with sufficient resources, even more than what’s allocated in our production environment.
resources:
limits:
cpu: 3000m
memory: 12Gi
requests:
cpu: 1500m
memory: 8Gi
We also use Azure Premium Files. Each node has own storage, and the capacity of the storage on each node is:
5Ti
Maximum IO/s: 8120
Burst IO/s : 15360
Throughput rate 613.0 MiB/s
There are two persistent subscriptions which are configured like the below.
"config": {
"resolveLinktos": true,
"startFrom": 0,
"startPosition": "C:0/P:0",
"messageTimeoutMilliseconds": 30000,
"extraStatistics": false,
"maxRetryCount": 3,
"liveBufferSize": 500,
"bufferSize": 500,
"readBatchSize": 20,
"preferRoundRobin": true,
"checkPointAfterMilliseconds": 2000,
"minCheckPointCount": 1,
"maxCheckPointCount": 1000,
"maxSubscriberCount": 0,
"namedConsumerStrategy": "RoundRobin"
},
And this is the node configuration info.
2024-03-11 02:25:04.457 "ES VERSION:" "23.10.0.0" ("oss-v23.10.0"/"8660912b8", "2023-10-24T16:13:49+01:00")
2024-03-11 02:25:04.479 [ 1, 1,02:25:04.457,INF] "RUNTIME:" ".NET 6.0.23/e0f0de876" (64-bit)
2024-03-11 02:25:04.479 [ 1, 1,02:25:04.455,INF] "OS:" Linux ("Unix 5.15.0.1041")
2024-03-11 02:25:04.479 [ 1, 1,02:25:04.445,INF] "OS ARCHITECTURE:" X64
2024-03-11 02:25:04.480 [ 1, 1,02:25:04.457,INF] "LOGS:" "/var/log/eventstore"
2024-03-11 02:25:04.480 [ 1, 1,02:25:04.457,INF] "GC:" "3 GENERATIONS" "IsServerGC: False" "Latency Mode: Interactive"
2024-03-11 02:25:04.480 SCAVENGE BACKEND CACHE SIZE: 2147483648 (Environment Variables)
2024-03-11 02:25:04.480 MAX TRUNCATION: -1 (Environment Variables)
2024-03-11 02:25:04.480 DATABASE OPTIONS:
2024-03-11 02:25:04.480 QUORUM SIZE: 2 (Environment Variables)
2024-03-11 02:25:04.480 NODE PRIORITY: 0 (Environment Variables)
2024-03-11 02:25:04.480 LEADER ELECTION TIMEOUT MS: 20000 (Environment Variables)
2024-03-11 02:25:04.480 EXT TCP HEARTBEAT INTERVAL: 2000 (<DEFAULT>)
2024-03-11 02:25:04.480 EXT HOST ADVERTISE AS: (<DEFAULT>)
2024-03-11 02:25:04.480 ENABLE UNIX SOCKET: False (<DEFAULT>)
2024-03-11 02:25:04.480 CONNECTION PENDING SEND BYTES THRESHOLD: 10485760 (<DEFAULT>)
2024-03-11 02:25:04.480 DISABLE STATS ON HTTP: False (<DEFAULT>)
2024-03-11 02:25:04.480 DEV MODE OPTIONS:
2024-03-11 02:25:04.480 DEV: False (<DEFAULT>)
2024-03-11 02:25:04.480 REMOVE DEV CERTS: False (<DEFAULT>)
2024-03-11 02:25:04.480 GRPC OPTIONS:
2024-03-11 02:25:04.480 KEEP ALIVE INTERVAL: 10000 (<DEFAULT>)
2024-03-11 02:25:04.480 KEEP ALIVE TIMEOUT: 10000 (<DEFAULT>)
2024-03-11 02:25:04.480 INTERFACE OPTIONS:
2024-03-11 02:25:04.480 ENABLE TRUSTED AUTH: False (<DEFAULT>)
2024-03-11 02:25:04.480 ADVERTISE HTTP PORT TO CLIENT AS: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 DEFAULT OPS PASSWORD: ******** (<DEFAULT>)
2024-03-11 02:25:04.480 ADVERTISE NODE PORT TO CLIENT AS: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 ADVERTISE TCP PORT TO CLIENT AS: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 CONNECTION QUEUE SIZE THRESHOLD: 50000 (<DEFAULT>)
2024-03-11 02:25:04.480 DISABLE EXTERNAL TCP TLS: False (<DEFAULT>)
2024-03-11 02:25:04.480 DISABLE GOSSIP ON HTTP: False (<DEFAULT>)
2024-03-11 02:25:04.480 DISABLE INTERNAL TCP TLS: False (<DEFAULT>)
2024-03-11 02:25:04.480 GOSSIP TIMEOUT MS: 5000 (Environment Variables)
2024-03-11 02:25:04.480 GOSSIP SEED: <Node1 External Ingress>:2115,<Node2 External Ingress>:2115,<Readonly Replica External Ingress>:2115 (Environment Variables)
2024-03-11 02:25:04.480 GOSSIP ON SINGLE NODE: (<DEFAULT>)
2024-03-11 02:25:04.480 CLUSTER SIZE: 3 (Environment Variables)
2024-03-11 02:25:04.480 NODE TCP PORT ADVERTISE AS: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 REPLICATION TCP PORT ADVERTISE AS: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 LOGGING OPTIONS:
2024-03-11 02:25:04.480 DISABLE LOG FILE: False (<DEFAULT>)
2024-03-11 02:25:04.480 LOG: /var/log/eventstore (<DEFAULT>)
2024-03-11 02:25:04.480 LOG CONFIG: logconfig.json (<DEFAULT>)
2024-03-11 02:25:04.480 LOG CONSOLE FORMAT: Plain (<DEFAULT>)
2024-03-11 02:25:04.480 NODE PORT ADVERTISE AS: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 CERTIFICATE OPTIONS:
2024-03-11 02:25:04.480 ALLOW ANONYMOUS ENDPOINT ACCESS: true (Environment Variables)
2024-03-11 02:25:04.480 APPLICATION OPTIONS:
2024-03-11 02:25:04.480 MODIFIED OPTIONS:
2024-03-11 02:25:04.480 SCAVENGE BACKEND PAGE SIZE: 32768 (Environment Variables)
2024-03-11 02:25:04.480 INSECURE: false (Environment Variables)
2024-03-11 02:25:04.480 DISCOVER VIA DNS: false (Environment Variables)
2024-03-11 02:25:04.480 NODE HEARTBEAT TIMEOUT: 1000 (<DEFAULT>)
2024-03-11 02:25:04.480 INT TCP PORT ADVERTISE AS: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 CLUSTER GOSSIP PORT: 2115 (Environment Variables)
2024-03-11 02:25:04.480 CLUSTER OPTIONS:
2024-03-11 02:25:04.480 CERTIFICATE PRIVATE KEY FILE: /opt/eventstore/certs/dev/node3/node3.key (Environment Variables)
2024-03-11 02:25:04.480 CERTIFICATE FILE: /opt/eventstore/certs/dev/node3/node3.crt (Environment Variables)
2024-03-11 02:25:04.480 CERTIFICATE OPTIONS (FROM FILE):
2024-03-11 02:25:04.480 TRUSTED ROOT CERTIFICATES PATH: /opt/eventstore/certs/dev/ca (Environment Variables)
2024-03-11 02:25:04.480 EXT TCP HEARTBEAT TIMEOUT: 1000 (<DEFAULT>)
2024-03-11 02:25:04.480 NODE HEARTBEAT INTERVAL: 2000 (<DEFAULT>)
2024-03-11 02:25:04.480 EXT TCP PORT: 1113 (<DEFAULT>)
2024-03-11 02:25:04.480 DEFAULT ADMIN PASSWORD: ******** (<DEFAULT>)
2024-03-11 02:25:04.480 HTTP PORT: 2113 (<DEFAULT>)
2024-03-11 02:25:04.480 HTTP PORT ADVERTISE AS: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 INT HOST ADVERTISE AS: (<DEFAULT>)
2024-03-11 02:25:04.480 INT TCP HEARTBEAT INTERVAL: 700 (<DEFAULT>)
2024-03-11 02:25:04.480 INT TCP HEARTBEAT TIMEOUT: 700 (<DEFAULT>)
2024-03-11 02:25:04.480 INT TCP PORT: 1112 (<DEFAULT>)
2024-03-11 02:25:04.480 EXT TCP PORT ADVERTISE AS: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 DEFAULT USER OPTIONS:
2024-03-11 02:25:04.480 SKIP DB VERIFY: false (Environment Variables)
2024-03-11 02:25:04.480 WRITE THROUGH: False (<DEFAULT>)
2024-03-11 02:25:04.480 LOG FAILED AUTHENTICATION ATTEMPTS: False (<DEFAULT>)
2024-03-11 02:25:04.480 LOG HTTP REQUESTS: False (<DEFAULT>)
2024-03-11 02:25:04.480 MAX APPEND SIZE: 1048576 (<DEFAULT>)
2024-03-11 02:25:04.480 WRITE TIMEOUT MS: 2000 (<DEFAULT>)
2024-03-11 02:25:04.480 OVERRIDE ANONYMOUS ENDPOINT ACCESS FOR GOSSIP: True (<DEFAULT>)
2024-03-11 02:25:04.480 SKIP INDEX SCAN ON READS: False (<DEFAULT>)
2024-03-11 02:25:04.480 STATS PERIOD SEC: 30 (<DEFAULT>)
2024-03-11 02:25:04.480 TELEMETRY OPTOUT: False (<DEFAULT>)
2024-03-11 02:25:04.480 ADVERTISE HOST TO CLIENT AS: <Node 3 External Ingress> (Environment Variables)
2024-03-11 02:25:04.480 WHAT IF: False (<DEFAULT>)
2024-03-11 02:25:04.480 WORKER THREADS: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 AUTHENTICATION/AUTHORIZATION OPTIONS:
2024-03-11 02:25:04.480 AUTHENTICATION CONFIG: (<DEFAULT>)
2024-03-11 02:25:04.480 AUTHENTICATION TYPE: internal (<DEFAULT>)
2024-03-11 02:25:04.480 AUTHORIZATION CONFIG: (<DEFAULT>)
2024-03-11 02:25:04.480 AUTHORIZATION TYPE: internal (<DEFAULT>)
2024-03-11 02:25:04.480 DISABLE FIRST LEVEL HTTP AUTHORIZATION: False (<DEFAULT>)
2024-03-11 02:25:04.480 CERTIFICATE OPTIONS:
2024-03-11 02:25:04.480 CERTIFICATE RESERVED NODE COMMON NAME: (<DEFAULT>)
2024-03-11 02:25:04.480 CERTIFICATE OPTIONS (FROM FILE):
2024-03-11 02:25:04.480 CERTIFICATE PASSWORD: ******** (<DEFAULT>)
2024-03-11 02:25:04.480 HELP: False (<DEFAULT>)
2024-03-11 02:25:04.480 ENABLE HISTOGRAMS: False (<DEFAULT>)
2024-03-11 02:25:04.480 DISABLE HTTP CACHING: False (<DEFAULT>)
2024-03-11 02:25:04.480 CONFIG: /etc/eventstore/eventstore.conf (<DEFAULT>)
2024-03-11 02:25:04.480 DISABLE ADMIN UI: false (Environment Variables)
2024-03-11 02:25:04.480 ENABLE ATOM PUB OVER HTTP: true (Environment Variables)
2024-03-11 02:25:04.480 ENABLE EXTERNAL TCP: true (Environment Variables)
2024-03-11 02:25:04.480 EXT IP: 0.0.0.0 (Yaml)
2024-03-11 02:25:04.480 INT IP: 0.0.0.0 (Yaml)
2024-03-11 02:25:04.480 NODE HOST ADVERTISE AS: <Node 3 External Ingress> (Environment Variables)
2024-03-11 02:25:04.480 NODE IP: <Node IP Address> (Environment Variables)
2024-03-11 02:25:04.480 NODE PORT: 2115 (Environment Variables)
2024-03-11 02:25:04.480 NODE TCP PORT: 1112 (Environment Variables)
2024-03-11 02:25:04.480 REPLICATION HEARTBEAT INTERVAL: 5000 (Environment Variables)
2024-03-11 02:25:04.480 CERTIFICATE PRIVATE KEY PASSWORD: ******** (<DEFAULT>)
2024-03-11 02:25:04.480 REPLICATION HEARTBEAT TIMEOUT: 3000 (Environment Variables)
2024-03-11 02:25:04.480 REPLICATION IP: <Readonly Replica IP Address> (Environment Variables)
2024-03-11 02:25:04.480 REPLICATION PORT: 1115 (Environment Variables)
2024-03-11 02:25:04.480 LOGGING OPTIONS:
2024-03-11 02:25:04.480 LOG LEVEL: Verbose (Environment Variables)
2024-03-11 02:25:04.480 PROJECTION OPTIONS:
2024-03-11 02:25:04.480 RUN PROJECTIONS: All (Environment Variables)
2024-03-11 02:25:04.480 DEFAULT OPTIONS:
2024-03-11 02:25:04.480 APPLICATION OPTIONS:
2024-03-11 02:25:04.480 ALLOW ANONYMOUS STREAM ACCESS: False (<DEFAULT>)
2024-03-11 02:25:04.480 ALLOW UNKNOWN OPTIONS: False (<DEFAULT>)
2024-03-11 02:25:04.480 REPLICATION HOST ADVERTISE AS: <Node 3 External Ingress> (Environment Variables)
2024-03-11 02:25:04.480 CERTIFICATE OPTIONS (FROM STORE):
2024-03-11 02:25:04.480 VERSION: False (<DEFAULT>)
2024-03-11 02:25:04.480 CERTIFICATE STORE NAME: (<DEFAULT>)
2024-03-11 02:25:04.480 CHUNK INITIAL READER COUNT: 5 (<DEFAULT>)
2024-03-11 02:25:04.480 CHUNK SIZE: 268435456 (<DEFAULT>)
2024-03-11 02:25:04.480 CHUNKS CACHE SIZE: 536871424 (<DEFAULT>)
2024-03-11 02:25:04.480 COMMIT TIMEOUT MS: 2000 (<DEFAULT>)
2024-03-11 02:25:04.480 DB: /var/lib/eventstore (<DEFAULT>)
2024-03-11 02:25:04.480 DB LOG FORMAT: V2 (<DEFAULT>)
2024-03-11 02:25:04.480 DISABLE SCAVENGE MERGING: False (<DEFAULT>)
2024-03-11 02:25:04.480 HASH COLLISION READ LIMIT: 100 (<DEFAULT>)
2024-03-11 02:25:04.480 INDEX: (<DEFAULT>)
2024-03-11 02:25:04.480 INDEX CACHE DEPTH: 16 (<DEFAULT>)
2024-03-11 02:25:04.480 INDEX CACHE SIZE: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 INITIALIZATION THREADS: 1 (<DEFAULT>)
2024-03-11 02:25:04.480 MAX AUTO MERGE INDEX LEVEL: 2147483647 (<DEFAULT>)
2024-03-11 02:25:04.480 MAX MEM TABLE SIZE: 1000000 (<DEFAULT>)
2024-03-11 02:25:04.480 CACHED CHUNKS: -1 (<DEFAULT>)
2024-03-11 02:25:04.480 MEM DB: False (<DEFAULT>)
2024-03-11 02:25:04.480 OPTIMIZE INDEX MERGE: False (<DEFAULT>)
2024-03-11 02:25:04.480 PREPARE TIMEOUT MS: 2000 (<DEFAULT>)
2024-03-11 02:25:04.480 READER THREADS COUNT: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 REDUCE FILE CACHE PRESSURE: False (<DEFAULT>)
2024-03-11 02:25:04.480 SCAVENGE HASH USERS CACHE CAPACITY: 100000 (<DEFAULT>)
2024-03-11 02:25:04.480 SCAVENGE HISTORY MAX AGE: 30 (<DEFAULT>)
2024-03-11 02:25:04.480 SKIP INDEX VERIFY: False (<DEFAULT>)
2024-03-11 02:25:04.480 STATS STORAGE: File (<DEFAULT>)
2024-03-11 02:25:04.480 STREAM EXISTENCE FILTER SIZE: 256000000 (<DEFAULT>)
2024-03-11 02:25:04.480 UNBUFFERED: False (<DEFAULT>)
2024-03-11 02:25:04.480 UNSAFE DISABLE FLUSH TO DISK: False (<DEFAULT>)
2024-03-11 02:25:04.480 UNSAFE IGNORE HARD DELETE: False (<DEFAULT>)
2024-03-11 02:25:04.480 USE INDEX BLOOM FILTERS: True (<DEFAULT>)
2024-03-11 02:25:04.480 WRITE STATS TO DB: False (<DEFAULT>)
2024-03-11 02:25:04.480 MIN FLUSH DELAY MS: 2 (<DEFAULT>)
2024-03-11 02:25:04.480 CERTIFICATE STORE LOCATION: (<DEFAULT>)
2024-03-11 02:25:04.480 INTERFACE OPTIONS:
2024-03-11 02:25:04.480 GOSSIP INTERVAL MS: 2000 (<DEFAULT>)
2024-03-11 02:25:04.480 READ ONLY REPLICA: False (<DEFAULT>)
2024-03-11 02:25:04.480 ALWAYS KEEP SCAVENGED: False (<DEFAULT>)
2024-03-11 02:25:04.480 DATABASE OPTIONS:
2024-03-11 02:25:04.480 UNSAFE ALLOW SURPLUS NODES: False (<DEFAULT>)
2024-03-11 02:25:04.480 GOSSIP ALLOWED DIFFERENCE MS: 60000 (<DEFAULT>)
2024-03-11 02:25:04.480 CLUSTER DNS: fake.dns (<DEFAULT>)
2024-03-11 02:25:04.480 CLUSTER OPTIONS:
2024-03-11 02:25:04.480 DEAD MEMBER REMOVAL PERIOD SEC: 1800 (<DEFAULT>)
2024-03-11 02:25:04.480 TRUSTED ROOT CERTIFICATE SUBJECT NAME: (<DEFAULT>)
2024-03-11 02:25:04.480 STREAM INFO CACHE CAPACITY: 0 (<DEFAULT>)
2024-03-11 02:25:04.480 TRUSTED ROOT CERTIFICATE STORE NAME: (<DEFAULT>)
2024-03-11 02:25:04.480 TRUSTED ROOT CERTIFICATE STORE LOCATION: (<DEFAULT>)
2024-03-11 02:25:04.480 CERTIFICATE THUMBPRINT: (<DEFAULT>)
2024-03-11 02:25:04.480 CERTIFICATE SUBJECT NAME: (<DEFAULT>)
2024-03-11 02:25:04.480 TRUSTED ROOT CERTIFICATE THUMBPRINT: (<DEFAULT>)
2024-03-11 02:25:04.481 PROJECTION OPTIONS:
2024-03-11 02:25:04.481 LOG FILE INTERVAL: Day (<DEFAULT>)
2024-03-11 02:25:04.481 LOG FILE RETENTION COUNT: 31 (<DEFAULT>)
2024-03-11 02:25:04.481 LOG FILE SIZE: 1073741824 (<DEFAULT>)
2024-03-11 02:25:04.481 PROJECTION COMPILATION TIMEOUT: 500 (<DEFAULT>)
2024-03-11 02:25:04.481 [ 1, 1,02:25:04.480,WRN] DEPRECATED
2024-03-11 02:25:04.481 PROJECTION THREADS: 3 (<DEFAULT>)
2024-03-11 02:25:04.481 PROJECTIONS QUERY EXPIRY: 5 (<DEFAULT>)
2024-03-11 02:25:04.481 START STANDARD PROJECTIONS: False (<DEFAULT>)
2024-03-11 02:25:04.481 PROJECTION EXECUTION TIMEOUT: 250 (<DEFAULT>)
2024-03-11 02:25:04.481 FAULT OUT OF ORDER PROJECTIONS: False (<DEFAULT>)
2024-03-11 02:25:04.518 [ 1, 1,02:25:04.518,INF] Cannot find plugins path: "/usr/share/eventstore/plugins"
2024-03-11 02:25:04.661 [ 1, 1,02:25:04.661,DBG] MessageHierarchy initialization took 00:00:00.1303244.
2024-03-11 02:25:04.718 [ 1, 1,02:25:04.717,INF] The log level was adjusted to: Verbose
2024-03-11 02:25:04.720 [ 1, 1,02:25:04.720,INF] Loading the node's certificate(s) from file: "/opt/eventstore/certs/dev/node3/node3.crt"
2024-03-11 02:25:04.953 [ 1, 1,02:25:04.953,INF] "CertificateReservedNodeCommonName" auto-configured to: "eventstoredb-node" based on certificate
2024-03-11 02:25:05.085 [ 1, 1,02:25:05.085,INF] Quorum size set to 2.
2024-03-11 02:25:05.085 [ 1, 1,02:25:05.085,INF] All certificates successfully loaded.
2024-03-11 02:25:05.085 [ 1, 1,02:25:05.085,INF] Certificate chain verification successful.
2024-03-11 02:25:05.104 [ 1, 1,02:25:05.104,DBG] Using File Checkpoints
2024-03-11 02:25:05.196 [ 1, 1,02:25:05.195,INF] ReaderThreadsCount set to 6. Calculated based on processor count of 3 and configured value of 0
2024-03-11 02:25:05.197 [ 1, 1,02:25:05.196,INF] WorkerThreads set to 10. Calculated based on a reader thread count of 6 and a configured value of 0
I’m wondering if there’s anything I misconfigured for the scavenge process, or if it’s normal for it to take that long to run?
Additionally, we write a checkpoint every 2 seconds if there is one or more events. Could this be causing the slow scavenge issue?
Your assistance would be greatly appreciated.