readTransaction() not working in GitLab CI


I'm not sure if this is the right place to ask, but I am running my unit tests in GitLab CI against a Neo4j cluster of 3 core servers, using the JavaScript driver and Neo4j 3.5.12 Enterprise. Most transactions work well, both on my local machine and on GitLab, but when I try to use read transactions like this:

db.session().readTransaction(async (tx) => { ... })

The execution fails with the following error message:

Failed to connect to server. Please ensure that your database is listening on the correct host and port and that you have compatible encryption settings both on Neo4j server and driver. [...]

It works when I change my readTransactions to writeTransactions. I suspect this has something to do with the fact that my cluster has no dedicated read replicas, but that should not be a problem, and it works just fine on my machine. In both environments each cluster member runs in its own Docker container, started via Docker Compose; in GitLab CI I use docker-in-docker to achieve the same setup.

I have also noticed that on GitLab CI some read queries take a very long time to execute, even on an empty database; maybe that's related. I don't know how to solve this problem, and replacing all readTransactions is obviously not an option. Any help is greatly appreciated.

UPDATE:
I added a read replica to my test cluster and the tests still fail, which is really weird.


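Whether you run a read or a write transaction determines which machine your query is routed to: write transactions go to the cluster leader, while read transactions go to the members the routing table lists as readers. The full gory details of how drivers route queries are covered in the Neo4j driver documentation.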
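To make the routing idea concrete, here is a simplified sketch (not the driver's actual code) of how a routing client might pick a server: WRITE work goes to a writer, READ work goes to a reader, and a read has nowhere to go when no reader is usable. The table shape and the pickServer name are hypothetical; the addresses mirror the routing table in the driver log posted in this thread.

```javascript
// Simplified, hypothetical model of routed server selection.
// Not the neo4j-driver implementation; it only illustrates why READ and
// WRITE transactions can end up on different machines.

/**
 * @param {{readers: string[], writers: string[]}} table routing table
 * @param {"READ"|"WRITE"} mode access mode of the transaction
 * @param {number} attempt counter used for simple round-robin selection
 * @returns {string} address of the chosen server
 */
function pickServer(table, mode, attempt = 0) {
  const candidates = mode === "WRITE" ? table.writers : table.readers;
  if (candidates.length === 0) {
    // A routing driver would try to refresh the routing table first;
    // this sketch simply fails.
    throw new Error(`No ${mode} server available`);
  }
  // Round-robin across the servers that hold the requested role.
  return candidates[attempt % candidates.length];
}

// Roles taken from the routing table in the driver log in this thread.
const table = {
  writers: ["localhost:7687"],
  readers: ["localhost:8687", "localhost:6687", "localhost:9687"],
};

console.log(pickServer(table, "WRITE"));   // localhost:7687
console.log(pickServer(table, "READ", 0)); // localhost:8687
console.log(pickServer(table, "READ", 1)); // localhost:6687
```

The point of the sketch is that a read never touches the writer unless the writer is also listed as a reader, so a cluster where only the leader is reachable can still accept writes while every read fails.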

As for why simple queries take so long to run: my bet is that in a CI system you're creating a new driver instance each time. Drivers create connection pools, so if you're starting totally cold, issuing a simple query and then tearing down the driver, you're probably spending most of your time establishing the connection pool. The driver documentation notes that drivers are expensive to create, while sessions are cheap. If you have many individual steps in a CI pipeline it's hard to get around this, but you can change the driver settings when you connect, for example to open only a single connection, and that may help speed up the overall queries.
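One way to follow the "drivers are expensive, sessions are cheap" advice in a test suite is to create the driver once per process and hand out sessions from it. Below is a sketch of a hypothetical helper, makeDriverCache; it takes the driver factory as a parameter so it can be exercised without a database. The commented-out wiring assumes neo4j-driver 4.x, where maxConnectionPoolSize is the config option that caps how many connections the driver keeps per server.

```javascript
// Hypothetical helper: create one driver per test process and reuse it,
// because each new driver has to build its own connection pool.
function makeDriverCache(createDriver) {
  let driver = null;
  return {
    // Lazily create the driver on first use, then return the same instance.
    get() {
      if (driver === null) {
        driver = createDriver();
      }
      return driver;
    },
    // Close the shared driver once, e.g. in a global test teardown hook.
    async close() {
      if (driver !== null) {
        const current = driver;
        driver = null;
        await current.close();
      }
    },
  };
}

// Example wiring (assumes neo4j-driver 4.x is installed; not run here):
// const neo4j = require("neo4j-driver");
// const drivers = makeDriverCache(() =>
//   neo4j.driver("neo4j://localhost:7687", neo4j.auth.basic(user, password), {
//     maxConnectionPoolSize: 1, // keep the pool tiny for short CI runs
//   })
// );
// const session = drivers.get().session(); // sessions are cheap
// try { /* run queries */ } finally { await session.close(); }
```

Injecting the factory keeps the caching logic independent of the driver version; only the commented wiring needs to change if the connection URL or config differs.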

As to why read transactions fail, I suspect something is wrong with your cluster's routing table. The "failed to connect to server" error may be caused by incorrect advertised address settings in the cluster. For example, if only one of your 3 machines has an externally routable address, you could see the behaviour you're reporting.

I'd log into your cluster, run CALL dbms.cluster.overview(), inspect the results, and make sure that all advertised addresses are externally valid IP addresses that are routable.
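Part of that check can be automated. The hypothetical findSuspectAddresses helper below takes each member's addresses list (the shape dbms.cluster.overview() returns) and flags Bolt addresses whose host is a loopback name. It is only a heuristic: a localhost address works for a client only if every member's port is reachable on that client's own loopback interface, which is easy to lose when moving from a local Docker Compose setup to docker-in-docker in CI.

```javascript
// Hypothetical heuristic: flag advertised Bolt addresses whose host is a
// loopback name. Such an address only works for a client that can reach
// that member's port on the client's own loopback interface.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

function findSuspectAddresses(members) {
  const suspects = [];
  for (const member of members) {
    for (const address of member.addresses) {
      const url = new URL(address); // e.g. "bolt://localhost:6687"
      if (url.protocol === "bolt:" && LOOPBACK_HOSTS.has(url.hostname)) {
        suspects.push({ id: member.id, role: member.role, address });
      }
    }
  }
  return suspects;
}

// Two members from the dbms.cluster.overview() output posted in this thread.
const members = [
  {
    id: "d8f8c3c9-e20b-4985-b3c8-6b85919de018",
    role: "LEADER",
    addresses: ["bolt://localhost:7687", "http://localhost:7474", "https://localhost:7473"],
  },
  {
    id: "ff9e14fb-acf9-4cda-933e-1be02ba097d6",
    role: "READ_REPLICA",
    addresses: ["bolt://localhost:6687", "http://localhost:6474", "https://localhost:7473"],
  },
];

for (const s of findSuspectAddresses(members)) {
  console.log(`${s.role}: ${s.address} is a loopback address`);
}
```

A flagged address is not necessarily wrong; it just means the client must be able to reach that exact host and port, so each flagged member is worth testing from the CI job itself.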

Thank you for your answer. I ran CALL dbms.cluster.overview() on the cluster leader via the HTTP API and got this result:

[
  {
    "columns": [
      "id",
      "addresses",
      "role",
      "groups",
      "database"
    ],
    "data": [
      {
        "row": [
          "ff9e14fb-acf9-4cda-933e-1be02ba097d6",
          [
            "bolt://localhost:6687",
            "http://localhost:6474",
            "https://localhost:7473"
          ],
          "READ_REPLICA",
          [],
          "default"
        ],
        "meta": [
          null,
          null,
          null,
          null,
          null,
          null
        ]
      },
      {
        "row": [
          "d8f8c3c9-e20b-4985-b3c8-6b85919de018",
          [
            "bolt://localhost:7687",
            "http://localhost:7474",
            "https://localhost:7473"
          ],
          "LEADER",
          [],
          "default"
        ],
        "meta": [
          null,
          null,
          null,
          null,
          null,
          null
        ]
      },
      {
        "row": [
          "3b903e3d-63c1-4679-9993-794a7a5b2c3a",
          [
            "bolt://localhost:8687",
            "http://localhost:8474",
            "https://localhost:7473"
          ],
          "FOLLOWER",
          [],
          "default"
        ],
        "meta": [
          null,
          null,
          null,
          null,
          null,
          null
        ]
      },
      {
        "row": [
          "812afbb2-ee26-4829-8a94-0efa33a4f5a5",
          [
            "bolt://localhost:9687",
            "http://localhost:9474",
            "https://localhost:7473"
          ],
          "FOLLOWER",
          [],
          "default"
        ],
        "meta": [
          null,
          null,
          null,
          null,
          null,
          null
        ]
      }
    ]
  }
]
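To read a response like the one above programmatically rather than by eye, each row can be zipped with the columns list. This sketch assumes the array-of-results shape shown above ({ columns, data: [{ row, meta }] }); parseOverview is a hypothetical name, and the sample is a trimmed copy of two of the four members.

```javascript
// Hypothetical parser for the dbms.cluster.overview() result as returned via
// the HTTP API: an array of { columns, data: [{ row, meta }] } objects.
function parseOverview(results) {
  const members = [];
  for (const { columns, data } of results) {
    for (const { row } of data) {
      // Pair each column name with the value at the same index.
      const member = Object.fromEntries(columns.map((name, i) => [name, row[i]]));
      const bolt = member.addresses.find((a) => a.startsWith("bolt://"));
      members.push({ id: member.id, role: member.role, bolt });
    }
  }
  return members;
}

// Trimmed copy of the response above (two of the four members).
const response = [
  {
    columns: ["id", "addresses", "role", "groups", "database"],
    data: [
      {
        row: [
          "ff9e14fb-acf9-4cda-933e-1be02ba097d6",
          ["bolt://localhost:6687", "http://localhost:6474", "https://localhost:7473"],
          "READ_REPLICA",
          [],
          "default",
        ],
        meta: [null, null, null, null, null, null],
      },
      {
        row: [
          "d8f8c3c9-e20b-4985-b3c8-6b85919de018",
          ["bolt://localhost:7687", "http://localhost:7474", "https://localhost:7473"],
          "LEADER",
          [],
          "default",
        ],
        meta: [null, null, null, null, null, null],
      },
    ],
  },
];

// Print one line per member: role, then its advertised Bolt address.
for (const m of parseOverview(response)) {
  console.log(`${m.role.padEnd(12)} ${m.bolt}`);
}
```

Listing only role and Bolt address makes it easier to compare against the driver log: every Bolt address here is localhost with a different port, so each port must be reachable from wherever the tests run.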

To me that looks good. I also logged the JS driver output, and there definitely seems to be something wrong with the routing; I get the following error messages:

Connection [2][] created towards localhost:6687
 Connection [2][] experienced a fatal error {"code":"SessionExpired","name":"Neo4jError"}
 Connection [2][] closing
 Routing driver 0 will forget localhost:6687 for database '' because of an error SessionExpired 'Failed to connect to server. Please ensure that your database is listening on the correct host and port and that you have compatible encryption settings both on Neo4j server and driver. Note that the default encryption setting has changed in Neo4j 4.0. Caused by: connect ECONNREFUSED 127.0.0.1:6687'
 Connection [2][] closed
 Connection [3][] created towards localhost:9687
 Connection [3][] experienced a fatal error {"code":"SessionExpired","name":"Neo4jError"}
 Connection [3][] closing
 Routing driver 0 will forget localhost:9687 for database '' because of an error SessionExpired 'Failed to connect to server. Please ensure that your database is listening on the correct host and port and that you have compatible encryption settings both on Neo4j server and driver. Note that the default encryption setting has changed in Neo4j 4.0. Caused by: connect ECONNREFUSED 127.0.0.1:9687'
 Connection [3][] closed
 Connection [4][] created towards localhost:8687
 Connection [4][] experienced a fatal error {"code":"SessionExpired","name":"Neo4jError"}
 Connection [4][] closing
 Routing driver 0 will forget localhost:8687 for database '' because of an error SessionExpired 'Failed to connect to server. Please ensure that your database is listening on the correct host and port and that you have compatible encryption settings both on Neo4j server and driver. Note that the default encryption setting has changed in Neo4j 4.0. Caused by: connect ECONNREFUSED 127.0.0.1:8687'
 Connection [4][] closed
 Routing table is stale for database: "" and access mode: "READ": RoutingTable[database=default database, expirationTime=1588944079435, currentTime=1588943795982, routers=[localhost:9687,localhost:8687,localhost:7687], readers=[], writers=[localhost:7687]]
 Connection [5][] created towards localhost:9687
 Connection [5][] experienced a fatal error {"code":"SessionExpired","name":"Neo4jError"}
 Connection [5][] closing
 Connection [6][] created towards localhost:8687
 Connection [5][] closed
 Connection [6][] experienced a fatal error {"code":"SessionExpired","name":"Neo4jError"}
 Connection [6][] closing
 Connection [1][bolt-0] acquired from the pool localhost:7687
 Connection [1][bolt-0] C: RUN CALL dbms.cluster.routing.getRoutingTable($context) {"context":{}} {}
 Connection [1][bolt-0] C: PULL_ALL
 Connection [6][] closed
 Connection [1][bolt-0] S: SUCCESS {"signature":112,"fields":[{"t_first":{"low":2,"high":0},"fields":["ttl","servers"]}]}
 Connection [1][bolt-0] S: RECORD {"signature":113,"fields":[[{"low":300,"high":0},[{"addresses":["localhost:7687"],"role":"WRITE"},{"addresses":["localhost:8687","localhost:6687","localhost:9687"],"role":"READ"},{"addresses":["localhost:9687","localhost:7687","localhost:8687"],"role":"ROUTE"}]]]}
 Connection [1][bolt-0] S: SUCCESS {"signature":112,"fields":[{"bookmark":"neo4j:bookmark:v1:tx67","type":"r","t_last":{"low":0,"high":0}}]}
 Connection [1][bolt-0] C: RESET
 Connection [1][bolt-0] S: SUCCESS {"signature":112,"fields":[{}]}
 Connection [1][bolt-0] released to the pool localhost:7687
 Updated routing table RoutingTable[database=default database, expirationTime=1588944095999, currentTime=1588943795999, routers=[localhost:9687,localhost:7687,localhost:8687], readers=[localhost:8687,localhost:6687,localhost:9687], writers=[localhost:7687]]
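The log above suggests where the problem lies. The routing table returned by the leader lists localhost:7687 only as a writer (and router), while the readers are localhost:8687, localhost:6687 and localhost:9687, and every connection to those three is refused with ECONNREFUSED. Once the driver forgets them, the table has readers=[], so a read transaction has no server to go to while a write transaction still reaches the leader. This would also explain why adding a read replica did not help: its address, localhost:6687, is refused as well. The sketch below replays that reasoning on the RECORD from getRoutingTable in the log; toRoutingTable and the reachable set are hypothetical, and the reachable set is inferred from which connections succeeded in the log.

```javascript
// Hypothetical replay of the routing decision seen in the driver log.
// `servers` is the second field of the getRoutingTable RECORD in the log
// (the first field is the TTL in seconds).
function toRoutingTable(servers) {
  const table = { routers: [], readers: [], writers: [] };
  const key = { ROUTE: "routers", READ: "readers", WRITE: "writers" };
  for (const { addresses, role } of servers) {
    table[key[role]].push(...addresses);
  }
  return table;
}

// Servers list copied from the RECORD in the log above.
const servers = [
  { addresses: ["localhost:7687"], role: "WRITE" },
  { addresses: ["localhost:8687", "localhost:6687", "localhost:9687"], role: "READ" },
  { addresses: ["localhost:9687", "localhost:7687", "localhost:8687"], role: "ROUTE" },
];

// Only connections to localhost:7687 succeeded in the log; the others were
// refused (ECONNREFUSED), so the driver "forgot" them.
const reachable = new Set(["localhost:7687"]);

const table = toRoutingTable(servers);
const usableReaders = table.readers.filter((a) => reachable.has(a));
const usableWriters = table.writers.filter((a) => reachable.has(a));

console.log("usable readers:", usableReaders); // usable readers: []
console.log("usable writers:", usableWriters); // usable writers: [ 'localhost:7687' ]
```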

This repeats a few times as readTransaction retries. In the very same CI process, a write request succeeds:

Connection [1][bolt-0] acquired from the pool localhost:7687
Connection [1][bolt-0] C: BEGIN {}
Connection [1][bolt-0] C: RUN MERGE (u: User { email: $emailAddress })
                        ON CREATE
                        SET u += { 
                         firstName: $firstName,
                         lastName: $lastName,
                         password: $password
                        } {"firstName":"testFirstName","lastName":"testLastName","password":"$2a$08$WwxvdxXON3LFL79aWZ0/gev2wxypir2yH3dTrVrtNunN81H9ntLsi","emailAddress":"test@email.test"} {}
 Connection [1][bolt-0] C: PULL_ALL
 Connection [1][bolt-0] S: SUCCESS {"signature":112,"fields":[{}]}
 Connection [1][bolt-0] S: SUCCESS {"signature":112,"fields":[{"t_first":{"low":72,"high":0},"fields":[]}]}
 Connection [1][bolt-0] S: SUCCESS {"signature":112,"fields":[{"stats":{"labels-added":{"low":1,"high":0},"nodes-created":{"low":1,"high":0},"properties-set":{"low":4,"high":0}},"type":"w","t_last":{"low":0,"high":0}}]}
 Connection [1][bolt-0] C: COMMIT
 Connection [1][bolt-0] S: SUCCESS {"signature":112,"fields":[{"bookmark":"neo4j:bookmark:v1:tx77"}]}
 Connection [1][bolt-0] C: RESET
 Connection [1][bolt-0] S: SUCCESS {"signature":112,"fields":[{}]}
 Connection [1][bolt-0] released to the pool localhost:7687
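The repeated attempts in the read log come from the retry behaviour of transaction functions: readTransaction and writeTransaction re-run the unit of work on transient failures such as SessionExpired, backing off between attempts until a time budget runs out (the JavaScript driver exposes this budget as the maxTransactionRetryTime config option, 30 seconds by default). That may also be why some reads looked slow rather than failing immediately. The sketch below is a simplified, hypothetical loop in that spirit, not the driver's code; retryWithBackoff and its parameters are illustrative, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```javascript
// Simplified, hypothetical retry loop in the spirit of transaction functions.
// Retries only errors whose code is in RETRIABLE, with exponential backoff,
// until the total time budget would be exceeded.
const RETRIABLE = new Set(["SessionExpired", "ServiceUnavailable"]);

async function retryWithBackoff(work, {
  maxRetryTimeMs = 30000, // overall budget, comparable to maxTransactionRetryTime
  initialDelayMs = 1000,
  multiplier = 2,
  now = () => Date.now(),
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  const start = now();
  let delay = initialDelayMs;
  let attempts = 0;
  for (;;) {
    attempts += 1;
    try {
      return await work(attempts);
    } catch (error) {
      const elapsed = now() - start;
      // Give up on non-retriable errors or when the next wait would
      // push the total past the budget.
      if (!RETRIABLE.has(error.code) || elapsed + delay > maxRetryTimeMs) {
        throw error;
      }
      await sleep(delay);
      delay *= multiplier;
    }
  }
}
```

In this sketch, a read that can never reach a reader keeps failing with SessionExpired, so the caller only sees the error after most of the budget has been spent waiting between attempts.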