07-16-2020 06:24 AM
Hello everyone!
I am using the Neo4j Python driver and connecting to a local database. My graph contains 700,000 nodes. I can create the nodes very quickly using:
with session.begin_transaction() as tx:
    cypher_query = 'UNWIND $batch as row ' \
                   'CREATE (n:Node) ' \
                   'SET n += row'
    tx.run(cypher_query, batch=batch)
The graph has 4M relationships, and I am trying to create them as follows:
with session.begin_transaction() as tx:
    cypher_query = 'UNWIND $batch as row ' \
                   'MATCH (head:Node) WHERE head.id = row.head_id ' \
                   'MATCH (tail:Node) WHERE tail.id = row.tail_id ' \
                   'CREATE (head)-[rel:RELATIONSHIP]->(tail) ' \
                   'SET rel += row.properties'
    tx.run(cypher_query, batch=batch)
The batch size is 10K.
Creating the relationships is extremely slow. I calculated that it would take around 30 days. Do you know a workaround? Is it normal for it to be this slow?
07-16-2020 06:41 AM
07-16-2020 07:02 AM
Hi @Cobra, thanks for your answer. Do you mean "my id", or the internal Neo4j <id>? I assume you mean "my id", as I think the latter should always be unique. I did not use a unique constraint, but the batch of nodes is an unordered set in which the nodes are unique.
Out of curiosity, how would you use unique constraints?
Thanks,
Filippo
07-16-2020 07:03 AM
Yeah, I mean your id, not the Neo4j one.
Execute this query on your database:
CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE
It will speed everything up.
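For readers who want to see the suggestion end to end, here is a minimal sketch of how the constraint and the batched relationship load might be combined from Python. It is not the original poster's code: the helper names `chunked` and `load_relationships` are hypothetical, and `session` is assumed to be an open Neo4j driver session (only its `run` method and the result's `consume` method are used). The constraint is written exactly as in this thread; newer Neo4j releases use the `FOR ... REQUIRE` form noted in the comment.

```python
from itertools import islice

# Constraint syntax as given in this thread (Neo4j 4.x era). Newer Neo4j
# releases spell it:
#   CREATE CONSTRAINT node_id FOR (n:Node) REQUIRE n.id IS UNIQUE
CONSTRAINT_QUERY = "CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE"

# Same logic as the query in the question, with the lookups written inline.
REL_QUERY = (
    "UNWIND $batch AS row "
    "MATCH (head:Node {id: row.head_id}) "
    "MATCH (tail:Node {id: row.tail_id}) "
    "CREATE (head)-[rel:RELATIONSHIP]->(tail) "
    "SET rel += row.properties"
)


def chunked(rows, size):
    """Yield successive lists of at most `size` rows (hypothetical helper)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_relationships(session, rows, batch_size=10_000):
    """Create the constraint, then send the relationships batch by batch.

    Assumes the constraint does not exist yet; creating it a second time
    would raise an error. Returns the number of batches sent.
    """
    session.run(CONSTRAINT_QUERY).consume()
    sent = 0
    for chunk in chunked(rows, batch_size):
        session.run(REL_QUERY, batch=chunk).consume()
        sent += 1
    return sent
```

Each row is expected to be a dict with `head_id`, `tail_id` and `properties` keys, matching the query in the question. The batch size of 10,000 is the one mentioned there, not a tuned value.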
07-16-2020 07:06 AM
Ok, thanks, @Cobra, I will try.
I should run that command before I create the nodes, right?
07-16-2020 07:07 AM
Normally yes, but you can also do it afterwards; you just need to wait a bit while the index is populated.
07-16-2020 08:48 AM
Indeed, there were nodes with the same id, and the command CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE threw an error. Now, without duplicated ids, the relationships are created super quickly. Thanks a lot for the suggestion.
Question: is the constraint only meant to detect nodes with the same property value, which slows down the MATCH? If, hypothetically, all nodes had been unique w.r.t. the id, would the constraint have made a difference in terms of performance?
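Duplicate ids like the ones mentioned above could also be caught before loading, so that the constraint can be created without an error. A small sketch, assuming the node batch is a list of dicts with an `id` key as in the question; the function name `duplicate_ids` and the sample data are made up for illustration. The Cypher string shows one way to run the same check on nodes that are already in the database.

```python
from collections import Counter

# One way to list duplicated ids among nodes already stored in the database.
FIND_DUPLICATES_QUERY = (
    "MATCH (n:Node) "
    "WITH n.id AS id, count(*) AS occurrences "
    "WHERE occurrences > 1 "
    "RETURN id, occurrences"
)


def duplicate_ids(rows, key="id"):
    """Return {id: count} for every id that appears more than once."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Made-up sample batch: id 1 appears three times, id 2 twice.
nodes = [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 3}, {"id": 2}, {"id": 1}]
print(duplicate_ids(nodes))  # {1: 3, 2: 2}
```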
07-16-2020 08:51 AM
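The unique constraint prevents duplicate ids, but it is also backed by an index on that property, which speeds up both loading and reading. So yes, it would have made a difference even if all ids had been unique. You should always have a unique constraint on the property you MATCH on if you want your queries to run quickly.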
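A rough back-of-the-envelope estimate shows why the lookup on `id` matters so much at this scale. It uses the figures from the question (700,000 nodes, 4M relationships) and assumes that, without an index, every MATCH on `id` has to check every `:Node`. The function names are hypothetical, and the dict is only a stand-in for a database index, not how Neo4j implements one.

```python
NODES = 700_000               # figure from the question
RELATIONSHIPS = 4_000_000     # figure from the question
LOOKUPS_PER_RELATIONSHIP = 2  # one MATCH for head, one for tail


def comparisons_without_index(nodes, relationships):
    """Every lookup checks every node: relationships * 2 * nodes."""
    return relationships * LOOKUPS_PER_RELATIONSHIP * nodes


def lookups_with_index(relationships):
    """Every lookup is a single index access: relationships * 2."""
    return relationships * LOOKUPS_PER_RELATIONSHIP


def find_by_scan(nodes, wanted_id):
    """Check every node, as a scan must when ids are not known to be unique."""
    return [n for n in nodes if n["id"] == wanted_id]


def build_index(nodes):
    """Map id -> node; a stand-in for the index behind the constraint."""
    return {n["id"]: n for n in nodes}


print(comparisons_without_index(NODES, RELATIONSHIPS))  # 5600000000000
print(lookups_with_index(RELATIONSHIPS))                # 8000000
```

Roughly 5.6 trillion property comparisons versus 8 million index lookups is consistent with a load time dropping from weeks to something quick, although the real numbers depend on hardware and on how the database executes the query.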
05-29-2021 08:02 AM
This is awesome. Setting the index made a literally hundredfold improvement in query times for a very long UNWIND clause inserting data.