Neo4j

Ajwah · 12-13-2019

I would like to have some guidance on creating a relationship between two existing nodes without introducing new nodes. The canonical way of doing this should be something like:

MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0
MERGE (rule)-[r:APPLIES_TO]->(entity);

However, despite the indexes on entity.strength, rule.id, rule.column, rule.operation and rule.value, this query does not complete.

When I do a count instead, it returns immediately.
When I use CREATE instead of MERGE, it finishes in about 6 seconds, but I believe it also creates new nodes in the process.

I say "I believe" because I am confused by the message the browser gives me when I run the CREATE variant:

MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0
CREATE (rule)-[r:APPLIES_TO]->(entity);

which states:

Created 586131 nodes, set 1172262 properties, created 586131 relationships, completed after 5834 ms

When I count the Entity nodes or the Rule nodes, the totals are unchanged, so no new nodes were introduced. In that case, why does it state "Created 586131 nodes" if it only introduced relationships?

I also tried the following:

MATCH (entity: Entity{ collection_id: 10 })
WHERE entity.strength > 0
MERGE (Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })-[r:APPLIES_TO]->(entity);

since such a Rule is unique anyway, but to no avail. It runs forever.

I also tried something like:

MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0 AND NOT (rule)-[:APPLIES_TO]->(entity)
CREATE (rule)-[r:APPLIES_TO]->(entity);

This also keeps running forever.

Ajwah · 12-13-2019

An update from my side. After deleting the relationships and repeating the above query:

MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0
CREATE (rule)-[r:APPLIES_TO]->(entity);

The browser states:

Created 586131 relationships, completed after 4234 ms.

I am not sure why I previously got:

Created 586131 nodes, set 1172262 properties, created 586131 relationships, completed after 5834 ms

At this point my question pertains to why the MERGE query takes forever:

MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0
MERGE (rule)-[r:APPLIES_TO]->(entity);

whereas the CREATE query only takes about 5 seconds?

Ajwah · 12-15-2019

Apparently you need to use neo4j-admin when importing larger files. Importing through the browser is only suitable for small quantities. 1 million rows is simply not feasible because, via the browser, it will only use one CPU core to do the import.

mojo2go · 12-15-2019

Hi Ajwah, Welcome to the community!

I want to help but I don't have good visibility into your dataset, and it looks like you have maybe two issues.

I suggest that you set aside your large dataset and play with 5-10 nodes, so that you can see the entire dataset in your browser viewer. If anything happens that you don't expect, you won't need to look at the statistics of how many objects are being created; you'll be able to individually inspect any newly created nodes or relationships. What your query does to a small group of nodes, it will also do to a large dataset.

Try placing EXPLAIN or PROFILE on the line before your code. EXPLAIN does not execute your query; it shows the steps the planner intends to take and, most importantly, shows you when you are about to accidentally create a Cartesian product, which can be inefficient and may not be what you actually wanted. PROFILE actually runs your query and then tells you exactly which steps were taken, so you can check the database 'hits' for each step. It's invaluable for optimising queries.
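
As a rough sketch, this is what that looks like with the MERGE query from the original post. Nothing here is new except the first line. Since EXPLAIN only plans the query, it should come back immediately even though the query itself does not finish; the thing to look for in the plan is a CartesianProduct step.

```cypher
// EXPLAIN only plans the query; it does not write anything.
// Swap EXPLAIN for PROFILE to actually run it and see db hits per step
// (on the small 5-10 node test set, not the full dataset).
EXPLAIN
MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0
MERGE (rule)-[r:APPLIES_TO]->(entity);
```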

Regarding large datasets that eat up a lot of RAM, I often have good luck using the APOC procedure apoc.periodic.iterate to break the query up into smaller chunks. It can often make a large query run more quickly.
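
Here is a minimal sketch of how that might look for the query in this thread, assuming the APOC plugin is installed. The batch size of 10000 is an arbitrary starting point, not a recommendation, and I have left parallel off on the assumption that every relationship touches the same Rule node, so parallel batches would likely just contend for a lock on it.

```cypher
// First statement: find the pairs to connect (read only).
// Second statement: runs once per pair, committed in batches of batchSize.
CALL apoc.periodic.iterate(
  'MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
   MATCH (entity: Entity{ collection_id: 10 })
   WHERE entity.strength > 0
   RETURN rule, entity',
  'MERGE (rule)-[:APPLIES_TO]->(entity)',
  { batchSize: 10000, parallel: false }  // assumed values, tune for your data
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

If errorMessages comes back non-empty, some batches failed and the relationships from those batches were not written.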

Create relationship between two existing nodes without introducing new nodes