Neo4j

gerwin_van_de_s · ‎07-20-2020

After diagnosing the various components involved in our queries, we can now reliably reproduce query failures when running from cypher-shell.

1 row available after 1 ms, consumed after another 2 ms
neo4j@staging> WITH apoc.cypher.runFirstColumn("MATCH (reseller:Reseller) RETURN reseller", {offset:$offset, first:$first, cypherParams: $cypherParams}, True) AS x
UNWIND x AS `reseller` RETURN `reseller` {
.guid , .accountID , .deleted ,properties: head([ reseller_properties IN apoc.cypher.runFirstColumn("MATCH (this)-[r:HAS_PROPERTIES]->(rp:ResellerProperties) WHERE r.validUntil IS NULL AND r.validFrom IS NOT NULL RETURN rp ORDER BY r.validFrom DESC LIMIT 1", {this: reseller, cypherParams: $cypherParams}, true) | reseller_properties { .guid, .name }])
}
AS `reseller`;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| reseller                                                                                                                                                                                 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {guid: "b4abe751-d8d8-4560-bd19-d70acb1ae7a0", accountID: "7272480929", deleted: NULL, properties: {name: "Vadaxchange Testing Reseller", guid: "2f587e3e-8f1c-455d-bca9-b3dc7a9b3a98"}} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row available after 1 ms, consumed after another 1 ms
neo4j@staging> WITH apoc.cypher.runFirstColumn("MATCH (reseller:Reseller) RETURN reseller", {offset:$offset, first:$first, cypherParams: $cypherParams}, True) AS x
UNWIND x AS `reseller` RETURN `reseller` {
.guid , .accountID , .deleted ,properties: head([ reseller_properties IN apoc.cypher.runFirstColumn("MATCH (this)-[r:HAS_PROPERTIES]->(rp:ResellerProperties) WHERE r.validUntil IS NULL AND r.validFrom IS NOT NULL RETURN rp ORDER BY r.validFrom DESC LIMIT 1", {this: reseller, cypherParams: $cypherParams}, true) | reseller_properties { .guid, .name }])
}
AS `reseller`;
Unknown variable `reseller_properties`.
neo4j@staging> :params
:param cypherParams => {nodes: ['b4abe751-d8d8-4560-bd19-d70acb1ae7a0']}
:param first        => -1
:param offset       => 0

That was the 10th run of the exact same query; the number of runs before it fails appears to be random. All subsequent queries fail.
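To count how many runs it takes before the failure shows up, the manual repetition in cypher-shell can be scripted. The following is a minimal sketch, not what I actually ran: it assumes the official neo4j Python driver is installed, and the URI, credentials and the database name "staging" are placeholders to be adjusted. The helper names (runs_until_failure, make_neo4j_runner) are made up for illustration.

```python
from typing import Callable, Optional

# The query and parameters exactly as shown in the cypher-shell session above.
QUERY = '''
WITH apoc.cypher.runFirstColumn("MATCH (reseller:Reseller) RETURN reseller", {offset:$offset, first:$first, cypherParams: $cypherParams}, True) AS x
UNWIND x AS `reseller` RETURN `reseller` {
.guid , .accountID , .deleted ,properties: head([ reseller_properties IN apoc.cypher.runFirstColumn("MATCH (this)-[r:HAS_PROPERTIES]->(rp:ResellerProperties) WHERE r.validUntil IS NULL AND r.validFrom IS NOT NULL RETURN rp ORDER BY r.validFrom DESC LIMIT 1", {this: reseller, cypherParams: $cypherParams}, true) | reseller_properties { .guid, .name }])
}
AS `reseller`
'''
PARAMS = {
    "cypherParams": {"nodes": ["b4abe751-d8d8-4560-bd19-d70acb1ae7a0"]},
    "first": -1,
    "offset": 0,
}


def runs_until_failure(run_once: Callable[[], object], max_runs: int = 100) -> Optional[int]:
    """Call run_once repeatedly; return the 1-based run number of the
    first call that raises, or None if all max_runs calls succeed."""
    for attempt in range(1, max_runs + 1):
        try:
            run_once()
        except Exception as exc:  # e.g. "Unknown variable `reseller_properties`."
            print(f"run {attempt} failed: {exc}")
            return attempt
    return None


def make_neo4j_runner(uri: str, user: str, password: str, database: str = "staging"):
    """Build a zero-argument callable that executes QUERY once.
    The database name is an assumption; use whatever your instance has."""
    from neo4j import GraphDatabase  # imported lazily so the helper above works without the driver

    driver = GraphDatabase.driver(uri, auth=(user, password))

    def run_once():
        with driver.session(database=database) as session:
            # consume() forces the whole result to be fetched, so server-side errors surface here
            return session.run(QUERY, PARAMS).consume()

    return run_once


if __name__ == "__main__":
    # Placeholder connection details, not taken from the environment described above.
    runner = make_neo4j_runner("bolt://localhost:7687", "neo4j", "password")
    print("first failing run:", runs_until_failure(runner, max_runs=200))
```

Running it a few times against a fresh instance should show whether the failing run number really varies from one attempt to the next.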

The above query was generated by neo4j-graphql-js. When queried via our GraphQL endpoint, certain forms of the query fail every time; adding or removing a field can then make the query work happily for a while. The error was always generated on the Neo4j server side. When running in causal cluster mode and querying the core node set, the errors were almost guaranteed on the FOLLOWER nodes and never seen on the LEADER node.

The above was run against a standalone (enterprise) instance created via the helm chart. No resource limits were applied and no JVM memory settings were configured; it uses just what is in the chart definition.
The cluster is on AWS EKS 1.14.7, originally running on t3.large instances (I've since moved them to EBS-backed m5.xlarge nodes with 30G of local EBS storage, using a gp2 PVC).

Related github issue with further details: https://github.com/neo4j-graphql/neo4j-graphql-js/issues/445

In case people are wondering about the size of the DB and suspecting memory issues:

neo4j@staging> MATCH (n) RETURN count(n);
+----------+
| count(n) |
+----------+
| 95       |
+----------+
1 row available after 25 ms, consumed after another 1 ms
neo4j@staging> MATCH (n:Reseller) RETURN count(n);
+----------+
| count(n) |
+----------+
| 1        |
+----------+
1 row available after 16 ms, consumed after another 1 ms
neo4j@staging> MATCH (n:ResellerProperties) RETURN count(n);
+----------+
| count(n) |
+----------+
| 1        |
+----------+
1 row available after 15 ms, consumed after another 1 ms
neo4j@staging> MATCH (:Reseller)-[r]-(:ResellerProperties) RETURN count(r);
+----------+
| count(r) |
+----------+
| 1        |
+----------+
1 row available after 24 ms, consumed after another 2 ms

gerwin_van_de_s · ‎07-20-2020

I just recreated the database using the neo4j 4.0.4-1 chart in standalone mode with 8G of RAM, running Neo4j 4.0.5 with APOC 4.0.0.17.
I tried the query with the data in both the default DB and the DB I created for this environment; the problem continues to occur on the first run of the query.

I've also just replicated it in a local Docker container running Neo4j 4.1.0, standalone, as well as on Neo4j 4.0.5 and 4.0.4 with various versions of APOC.

I've also stepped back to 3.5.14 with the latest APOC 3.5.0.12, and I'm unable to trigger the problem.
Stepping forward to 4.0.3 with any of the 4.0.0.4/12/17 versions of APOC, I'm also unable to trigger the problem.
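
Putting the version tests together narrows down where the behaviour changed. The sketch below only encodes the observations listed in this thread (Neo4j version, and whether the failure was reproduced) and derives the last good / first bad pair from them; the function names are made up for illustration, and the APOC versions are left out for brevity.

```python
# Observations transcribed from this thread: (Neo4j version, failure reproduced?)
OBSERVATIONS = [
    ("3.5.14", False),
    ("4.0.3", False),
    ("4.0.4", True),
    ("4.0.5", True),
    ("4.1.0", True),
]


def version_key(version: str) -> tuple:
    """Turn '4.0.3' into (4, 0, 3) so versions sort numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def regression_window(observations):
    """Return (last version without the failure, first version with it),
    walking the observations in ascending version order."""
    ordered = sorted(observations, key=lambda item: version_key(item[0]))
    last_good = None
    for version, reproduced in ordered:
        if reproduced:
            return last_good, version
        last_good = version
    return last_good, None


print(regression_window(OBSERVATIONS))  # ('4.0.3', '4.0.4')
```

So on the evidence so far, the change appears to sit between Neo4j 4.0.3 and 4.0.4, independent of the APOC versions tried.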

Queries fail periodically (Neo4J post 4.0.3 broken)