12-13-2019 11:14 AM
I would like to have some guidance on creating a relationship between two existing
nodes without introducing new nodes. The canonical way of doing this should be something like:
MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0
MERGE (rule)-[r:APPLIES_TO]->(entity);
However, despite the indexes on entity.strength and rule.id, as well as rule.column, rule.operation and rule.value, this request does not complete.
If I use CREATE instead of MERGE, it finishes in 6 seconds, but then I believe it creates new entities as well in the process. When I say "I believe", I simply mean that I am confused by the message the browser gives me when employing the CREATE variant:
MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0
CREATE (rule)-[r:APPLIES_TO]->(entity);
which states:
Created 586131 nodes, set 1172262 properties, created 586131 relationships, completed after 5834 ms
When I count the Entity nodes or the Rule nodes, no new nodes have been introduced. In that case, why does it state "Created 586131 nodes" if it only introduced relationships?
I also tried the following:
MATCH (entity: Entity{ collection_id: 10 })
WHERE entity.strength > 0
MERGE (Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })-[r:APPLIES_TO]->(entity);
since such a Rule is unique anyway, but to no avail: it runs forever.
I also tried something like:
MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0 AND NOT (rule)-[:APPLIES_TO]->(entity)
CREATE (rule)-[r:APPLIES_TO]->(entity);
This also keeps running forever.
12-13-2019 12:07 PM
An update from my side. After deleting the relationships and repeating the above query:
MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0
CREATE (rule)-[r:APPLIES_TO]->(entity);
The browser states:
Created 586131 relationships, completed after 4234 ms.
I am not sure why I previously got:
Created 586131 nodes, set 1172262 properties, created 586131 relationships, completed after 5834 ms
At this point my question pertains to why the MERGE query takes forever:
MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0
MERGE (rule)-[r:APPLIES_TO]->(entity);
whereas the CREATE query only takes about 5 seconds?
12-15-2019 04:41 AM
Apparently you need to use neo4j-admin when importing larger files. Importing through the browser is only suitable for small quantities. Importing 1 million rows that way is simply not feasible, because via the browser the import will only use one CPU core.
12-15-2019 01:53 PM
Hi Ajwah, Welcome to the community!
I want to help but I don't have good visibility into your dataset, and it looks like you have maybe two issues.
I suggest that you set aside your large dataset and play with 5-10 nodes, so that you can see the entire dataset in your browser viewer. If anything happens that you don't expect, you won't need to look at the statistics of how many objects are being created; you'll be able to individually inspect any newly created nodes or relationships. What your query does to a small group of nodes it will also do to a large dataset.
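As a rough sketch of such a sandbox, the statements below build a tiny dataset that mirrors the shape of the queries in this thread. The collection_id value 999 and the strength values are made up purely for illustration, and the Rule carries only the properties needed to find it again. Run each statement separately.

```cypher
// Hypothetical toy data: collection_id 999 is an invented value,
// chosen so the toy nodes are easy to tell apart from real data.
CREATE (:Rule { collection_id: 999, id: "strength_greater_than_0" });

// Five entities with strengths -2, -1, 0, 1, 2.
UNWIND range(1, 5) AS i
CREATE (:Entity { collection_id: 999, strength: i - 3 });

// Connect the rule to the entities it applies to, then return
// everything that was connected so it can be inspected visually.
MATCH (entity:Entity { collection_id: 999 })
MATCH (rule:Rule { collection_id: 999, id: "strength_greater_than_0" })
WHERE entity.strength > 0
MERGE (rule)-[r:APPLIES_TO]->(entity)
RETURN rule, r, entity;
```

With this toy data only the two entities with strength 1 and 2 satisfy the WHERE clause, so any extra node or relationship in the result is easy to spot.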
Try placing EXPLAIN or PROFILE on the line before your code. EXPLAIN ensures that your code will not execute; instead it shows the steps the planner expects to take and, most importantly, shows you when you are about to accidentally create a Cartesian product, which can be inefficient and may not be what you actually wanted. If you use PROFILE, it will actually run your query and then tell you exactly which steps were taken, so you can check the database 'hits' for each step. It's invaluable for optimising queries.
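For instance, the MERGE query from the original post could be inspected like this. EXPLAIN is used here rather than PROFILE because PROFILE executes the query, which would not help with a query that never completes.

```cypher
// EXPLAIN returns the planned operators without executing the query.
// Swap EXPLAIN for PROFILE to run it and see rows and db hits per operator.
EXPLAIN
MATCH (entity: Entity{ collection_id: 10 })
MATCH (rule: Rule{ collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
WHERE entity.strength > 0
MERGE (rule)-[r:APPLIES_TO]->(entity);
```

In the resulting plan, it is worth checking whether the two MATCH clauses start from index seeks or from label scans, and whether a CartesianProduct operator appears.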
Regarding large datasets that eat up a lot of RAM, I often have good luck using the APOC procedure apoc.periodic.iterate to break up the query into smaller chunks. It can often make a large query run more quickly.
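A minimal sketch of that approach, applied to the query from the original post, might look like the following. It assumes the APOC plugin is installed; the batchSize of 10000 is an arbitrary illustrative value, not a tuned recommendation.

```cypher
// First statement: selects the entities to process.
// Second statement: runs once per entity, committed in batches.
CALL apoc.periodic.iterate(
  'MATCH (entity:Entity { collection_id: 10 })
   WHERE entity.strength > 0
   RETURN entity',
  'MATCH (rule:Rule { collection_id: 10, id: "strength_greater_than_0", column: "strength", operation: "greater_than", value: "0" })
   MERGE (rule)-[:APPLIES_TO]->(entity)',
  // parallel is left off because every relationship touches the same
  // Rule node, so parallel batches would contend for its lock.
  { batchSize: 10000, parallel: false }
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

Each batch is committed in its own transaction, so the whole set of relationships is never held in a single transaction's memory at once.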