Can't connect to local Docker Compose cluster

I'm writing some integration tests to evaluate the behavior of a 3-node EventStoreDB cluster. The tests launch the cluster with Docker Compose on the local machine.

My dev machine connects to the cluster and runs the tests correctly. However, the same tests fail in our AppVeyor build environment with the following error:

  EventStore.Client.DiscoveryException : Failed to discover candidate in 10 attempts.
  Stack Trace:
    at EventStore.Client.GossipChannelSelector.DiscoverAsync(CancellationToken cancellationToken)
    at EventStore.Client.GossipChannelSelector.SelectChannelAsync(CancellationToken cancellationToken)
    at EventStore.Client.EventStoreClientBase.GetChannelInfoExpensive(ReconnectionRequired reconnectionRequired, Action`1 onReconnectionRequired, IChannelSelector channelSelector, CancellationToken cancellationToken)
    at EventStore.Client.SharingProvider`2.FillBoxAsync(TaskCompletionSource`1 box, TInput input)
    at EventStore.Client.TaskExtensions.WithCancellation[T](Task`1 task, CancellationToken cancellationToken)
    at EventStore.Client.EventStoreClientBase.GetChannelInfo(CancellationToken cancellationToken)
    at EventStore.Client.EventStoreProjectionManagementClient.EnableAsync(String name, Nullable`1 deadline, UserCredentials userCredentials, CancellationToken cancellationToken)

The setup seems extremely simple. I am using the docker-compose.yml published by EventStoreDB, together with a vars.env file:

docker-compose.yml

version: "3.5"

services:
  setup:
    image: eventstore/es-gencert-cli:1.0.2
    entrypoint: bash
    user: "1000:1000"
    command: >
      -c "mkdir -p ./certs && cd /certs
      && es-gencert-cli create-ca
      && es-gencert-cli create-node -out ./node1 -ip-addresses 127.0.0.1,172.30.240.11 -dns-names localhost
      && es-gencert-cli create-node -out ./node2 -ip-addresses 127.0.0.1,172.30.240.12 -dns-names localhost
      && es-gencert-cli create-node -out ./node3 -ip-addresses 127.0.0.1,172.30.240.13 -dns-names localhost
      && find . -type f -print0 | xargs -0 chmod 666"
    container_name: setup
    volumes:
      - ./certs:/certs

  node1.eventstore: &template
    image: eventstore/eventstore:21.10.2-bionic
    container_name: node1.eventstore
    env_file:
      - vars.env
    environment:
      - EVENTSTORE_INT_IP=172.30.240.11
      - EVENTSTORE_ADVERTISE_HTTP_PORT_TO_CLIENT_AS=2111
      - EVENTSTORE_ADVERTISE_TCP_PORT_TO_CLIENT_AS=1111
      - EVENTSTORE_GOSSIP_SEED=172.30.240.12:2113,172.30.240.13:2113
      - EVENTSTORE_TRUSTED_ROOT_CERTIFICATES_PATH=/certs/ca
      - EVENTSTORE_CERTIFICATE_FILE=/certs/node1/node.crt
      - EVENTSTORE_CERTIFICATE_PRIVATE_KEY_FILE=/certs/node1/node.key
    healthcheck:
      test:
        [
            "CMD-SHELL",
            "curl --fail --insecure https://node1.eventstore:2113/health/live || exit 1",
        ]
      interval: 5s
      timeout: 5s
      retries: 24
    ports:
      - 1111:1113
      - 2111:2113
    volumes:
      - ./certs:/certs
    depends_on:
      - setup
    restart: always
    networks:
      clusternetwork:
        ipv4_address: 172.30.240.11

  node2.eventstore:
    <<: *template
    container_name: node2.eventstore
    env_file:
      - vars.env
    environment:
      - EVENTSTORE_INT_IP=172.30.240.12
      - EVENTSTORE_ADVERTISE_HTTP_PORT_TO_CLIENT_AS=2112
      - EVENTSTORE_ADVERTISE_TCP_PORT_TO_CLIENT_AS=1112
      - EVENTSTORE_GOSSIP_SEED=172.30.240.11:2113,172.30.240.13:2113
      - EVENTSTORE_TRUSTED_ROOT_CERTIFICATES_PATH=/certs/ca
      - EVENTSTORE_CERTIFICATE_FILE=/certs/node2/node.crt
      - EVENTSTORE_CERTIFICATE_PRIVATE_KEY_FILE=/certs/node2/node.key
    healthcheck:
      test:
        [
            "CMD-SHELL",
            "curl --fail --insecure https://node2.eventstore:2113/health/live || exit 1",
        ]
      interval: 5s
      timeout: 5s
      retries: 24
    ports:
      - 1112:1113
      - 2112:2113
    networks:
      clusternetwork:
        ipv4_address: 172.30.240.12

  node3.eventstore:
    <<: *template
    container_name: node3.eventstore
    env_file:
      - vars.env
    environment:
      - EVENTSTORE_INT_IP=172.30.240.13
      - EVENTSTORE_ADVERTISE_HTTP_PORT_TO_CLIENT_AS=2113
      - EVENTSTORE_ADVERTISE_TCP_PORT_TO_CLIENT_AS=1113
      - EVENTSTORE_GOSSIP_SEED=172.30.240.11:2113,172.30.240.12:2113
      - EVENTSTORE_TRUSTED_ROOT_CERTIFICATES_PATH=/certs/ca
      - EVENTSTORE_CERTIFICATE_FILE=/certs/node3/node.crt
      - EVENTSTORE_CERTIFICATE_PRIVATE_KEY_FILE=/certs/node3/node.key
    healthcheck:
      test:
        [
            "CMD-SHELL",
            "curl --fail --insecure https://node3.eventstore:2113/health/live || exit 1",
        ]
      interval: 5s
      timeout: 5s
      retries: 24
    ports:
      - 1113:1113
      - 2113:2113
    networks:
      clusternetwork:
        ipv4_address: 172.30.240.13

networks:
  clusternetwork:
    name: eventstoredb.local
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.30.240.0/24

vars.env

EVENTSTORE_CLUSTER_SIZE=3
EVENTSTORE_RUN_PROJECTIONS=All
EVENTSTORE_DISCOVER_VIA_DNS=false
EVENTSTORE_ENABLE_EXTERNAL_TCP=true
EVENTSTORE_ENABLE_ATOM_PUB_OVER_HTTP=true
EVENTSTORE_ADVERTISE_HOST_TO_CLIENT_AS=127.0.0.1

I had some trouble with the Configurator because it didn't accept 127.0.0.1 as the external IP address for all three nodes (frankly, I'm not sure how else to set this up for a local integration-test scenario…).

In any case, I used the following connection string successfully on my local dev machine:

esdb://admin:changeit@127.0.0.1:2111,127.0.0.1:2112,127.0.0.1:2113?tls=true&tlsVerifyCert=false

The exact same connection string fails with the error above in the AppVeyor build/test run.

Code that sets up and connects to the cluster in the test fixture

(squashed from several modules for simplification)

var file = Path.Combine(Directory.GetCurrentDirectory(), "Static/Compose/EventStoreDb/docker-compose.yml");
_container = new Builder()  // NOTE: FluentDocker
    .UseContainer()
    .UseCompose()  // See above compose file.
    .FromFile(file)
    .RemoveOrphans()
    .WaitForPort("node1.eventstore","1113/tcp")
    .WaitForPort("node2.eventstore","1113/tcp")
    .WaitForPort("node3.eventstore","1113/tcp")
    .Build()
    .Start();

// Idiot test to verify endpoint is reachable.
foreach (var port in new int[] {2111, 2112, 2113})
{
    // Internally uses simple TcpClient check on this IP and port.
    if (!(await TestEndpoint("127.0.0.1", port)))
    {
        throw new IOException($"Can't find EventStoreDB server on port {port}.");
    }
    Console.WriteLine($"Verified can connect to 127.0.0.1:{port}");
}

var connStr = "esdb://admin:changeit@127.0.0.1:2111,127.0.0.1:2112,127.0.0.1:2113?tls=true&tlsVerifyCert=false";

var settings = EventStoreClientSettings.Create(connStr);
var client = new EventStoreClient(settings);
var projectionClient = new EventStoreProjectionManagementClient(settings);
var subscriptionClient = new EventStorePersistentSubscriptionsClient(settings);

// DIES HERE on build server (not locally)
await projectionClient.EnableAsync("$by_category", deadline: TimeSpan.MaxValue);
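
For reference, a "simple TcpClient check" like the one the TestEndpoint comment describes might look roughly like the sketch below. The helper's real code isn't shown in the post, so the EndpointCheck class and the timeoutMs parameter are hypothetical, and the sketch assumes .NET 5 or later for the TcpClient.ConnectAsync overload that accepts a CancellationToken. A check like this only proves that something is listening on the port; it says nothing about whether the node has finished starting up or joined the cluster.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class EndpointCheck
{
    // Hypothetical stand-in for the fixture's TestEndpoint helper:
    // returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static async Task<bool> TestEndpoint(string host, int port, int timeoutMs = 2000)
    {
        using var client = new TcpClient();
        using var cts = new CancellationTokenSource(timeoutMs);
        try
        {
            // Overload with a CancellationToken is available on .NET 5+.
            await client.ConnectAsync(host, port, cts.Token);
            return client.Connected;
        }
        catch (Exception ex) when (ex is SocketException || ex is OperationCanceledException)
        {
            // Connection refused, unreachable host, or timed out.
            return false;
        }
    }
}
```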

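UPDATE: It turns out the setup was fine. I needed to add some extra code to ensure that all nodes were fully up, running, and accepting requests before running the tests. I accomplished this with a FluentDocker extension that performs health checks on all containers. I had thought .WaitForPort(...) would be sufficient, but it is not: an open port only shows that the node process is listening, not that it has finished starting up and joined the cluster, so the client's gossip discovery can still fail.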
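
A minimal sketch of a comparable readiness wait without FluentDocker follows. It assumes the host-mapped HTTPS ports from the compose file (2111 to 2113) and the same /health/live endpoint the compose healthchecks call. ClusterReadiness and WaitForHealthyAsync are hypothetical names, not the FluentDocker extension used above, and certificate validation is disabled only because the test certificates are self-signed (the counterpart of tlsVerifyCert=false).

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ClusterReadiness
{
    // Hypothetical helper: polls each node's /health/live endpoint until it
    // returns a success status code. Throws TimeoutException if the whole
    // wait (across all nodes) exceeds `timeout`.
    public static async Task WaitForHealthyAsync(
        IEnumerable<Uri> nodes, TimeSpan timeout, TimeSpan pollInterval)
    {
        var handler = new HttpClientHandler
        {
            // Test-only: accept the self-signed node certificates.
            ServerCertificateCustomValidationCallback =
                HttpClientHandler.DangerousAcceptAnyServerCertificateValidator
        };
        using var http = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(5) };
        using var deadline = new CancellationTokenSource(timeout);

        foreach (var node in nodes)
        {
            var healthUri = new Uri(node, "/health/live");
            while (true)
            {
                if (deadline.IsCancellationRequested)
                    throw new TimeoutException($"{node} did not report healthy within {timeout}.");

                try
                {
                    using var response = await http.GetAsync(healthUri, deadline.Token);
                    if (response.IsSuccessStatusCode)
                        break; // this node is live; move on to the next one
                }
                catch (HttpRequestException)
                {
                    // Connection refused or TLS not ready yet; retry.
                }
                catch (OperationCanceledException)
                {
                    // Per-request timeout or overall deadline; checked at the loop top.
                }

                try
                {
                    await Task.Delay(pollInterval, deadline.Token);
                }
                catch (OperationCanceledException)
                {
                    // Deadline reached while waiting; the loop check will throw.
                }
            }
        }
    }
}
```

In the fixture this would run after .Start() and before the clients are created, for example with new[] { 2111, 2112, 2113 }.Select(p => new Uri($"https://127.0.0.1:{p}")), a timeout of a couple of minutes, and a poll interval of a second or two. A live node may still not guarantee that a leader has been elected, so retrying the first client call on DiscoveryException is a reasonable extra safeguard.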

Thanks for the update @zblocker