Neo4j

flpgrz · 07-16-2020

Hello everyone!

I am using the Neo4j Python API to connect to a local database. My graph contains 700,000 nodes, and I can create the nodes very quickly with:

with session.begin_transaction() as tx:
    cypher_query = ('UNWIND $batch AS row '
                    'CREATE (n:Node) '
                    'SET n += row')
    tx.run(cypher_query, batch=batch)
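The post does not show how `batch` is built. Below is a minimal sketch of splitting the node rows into 10K-row batches and sending each batch in its own transaction. It assumes the rows are plain dicts and that `session` is a neo4j driver session. The `chunked` and `run_in_batches` helpers are hypothetical names, not part of the driver.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

# The node-creation query from the post, as a single string.
CREATE_NODES = "UNWIND $batch AS row CREATE (n:Node) SET n += row"


def chunked(rows: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive lists of at most `size` rows; the last one may be shorter."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_in_batches(session, query: str, rows: Iterable[Dict[str, Any]],
                   batch_size: int = 10_000) -> int:
    """Send `rows` to `query` as $batch, one explicit transaction per batch.

    `session` is assumed to be a neo4j driver session. Only begin_transaction()
    and tx.run() are used, as in the post. Leaving the `with` block without an
    exception commits the transaction. Returns the number of batches sent.
    """
    sent = 0
    for batch in chunked(rows, batch_size):
        with session.begin_transaction() as tx:
            tx.run(query, batch=batch)
        sent += 1
    return sent
```

Because each batch is committed separately, a failure partway through only rolls back the current batch and leaves earlier batches in place.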

The graph has 4M relationships, which I am trying to create as follows:

with session.begin_transaction() as tx:
    cypher_query = ('UNWIND $batch AS row '
                    'MATCH (head:Node) WHERE head.id = row.head_id '
                    'MATCH (tail:Node) WHERE tail.id = row.tail_id '
                    'CREATE (head)-[rel:RELATIONSHIP]->(tail) '
                    'SET rel += row.properties')
    tx.run(cypher_query, batch=batch)
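The query reads `row.head_id`, `row.tail_id` and `row.properties` from every element of `$batch`, so each row has to be a map of that shape. Below is a minimal sketch of building those rows. It assumes the edges start out as hypothetical `(head_id, tail_id, properties)` tuples. `to_rel_rows` is an illustrative name, and missing properties are normalised to an empty map.

```python
from typing import Any, Dict, Iterable, List, Mapping, Optional, Tuple

# The relationship query from the post, as a single string.
CREATE_RELS = (
    "UNWIND $batch AS row "
    "MATCH (head:Node) WHERE head.id = row.head_id "
    "MATCH (tail:Node) WHERE tail.id = row.tail_id "
    "CREATE (head)-[rel:RELATIONSHIP]->(tail) "
    "SET rel += row.properties"
)

# Hypothetical input format: (head_id, tail_id, properties or None).
Edge = Tuple[Any, Any, Optional[Mapping[str, Any]]]


def to_rel_rows(edges: Iterable[Edge]) -> List[Dict[str, Any]]:
    """Build the maps the query reads as row.head_id, row.tail_id and row.properties."""
    return [
        {"head_id": head, "tail_id": tail, "properties": dict(props or {})}
        for head, tail, props in edges
    ]
```

Each row costs two lookups by `id` (one per MATCH), so 4M rows mean 8M lookups. The speed of a single lookup therefore decides the total run time.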

The batch size is 10K.
Creating the relationships is extremely slow: by my estimate it would take around 30 days. Is there a workaround? Is it normal for it to be this slow?
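The following is a quick back-of-envelope check on the 30-day estimate. It uses only the numbers in the post: 4M relationships and 10K rows per batch. The per-batch figure assumes the time is spread evenly over all batches.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_row_cost(total_rows: int, batch_size: int, days: float):
    """Return (number of batches, seconds per batch, seconds per row)."""
    batches = -(-total_rows // batch_size)  # ceiling division
    total_seconds = days * SECONDS_PER_DAY
    return batches, total_seconds / batches, total_seconds / total_rows


batches, s_batch, s_row = per_row_cost(4_000_000, 10_000, 30)
print(batches, round(s_batch), round(s_row, 3))  # prints: 400 6480 0.648
```

About 0.65 s per row for two MATCH clauses is far slower than an indexed lookup normally is. That suggests each MATCH is comparing `id` against every `:Node` node.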

Cobra · 07-16-2020

Hello @flpgrz

Did you use a UNIQUE CONSTRAINT on id?

Regards,
Cobra

flpgrz · 07-16-2020

Hi @Cobra, thanks for your answer. Do you mean "my id" or the internal Neo4j <id>? I assume you mean "my id", since the latter should always be unique. I did not use a UNIQUE CONSTRAINT, but the batch of nodes is an unordered set in which the nodes are unique.

Out of curiosity, how would you use UNIQUE CONSTRAINTS?

Thanks,
Filippo

Cobra · 07-16-2020

Yeah, I mean your id, not the Neo4j one.

Execute this query on your database:

CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE

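It will speed up everything: the constraint is backed by an index on n.id, so each MATCH on id becomes an index lookup instead of a scan over all :Node nodes.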
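The same statement can be sent from the Python driver. Below is a minimal sketch that assumes a neo4j driver `session`. `create_node_id_constraint` is a hypothetical helper. The `ON ... ASSERT` form used in this thread is the Neo4j 4.x syntax; Neo4j 5 expects `FOR ... REQUIRE` instead, so the sketch includes both.

```python
# Syntax used in this thread (Neo4j 4.x).
LEGACY_CONSTRAINT = "CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE"
# Equivalent syntax expected by Neo4j 5.
CURRENT_CONSTRAINT = "CREATE CONSTRAINT node_id FOR (n:Node) REQUIRE n.id IS UNIQUE"


def create_node_id_constraint(session, legacy_syntax: bool = True) -> str:
    """Create the unique constraint on :Node(id) and return the statement sent.

    Uses session.run (an auto-commit transaction) rather than a data-loading
    transaction, because Neo4j does not allow schema changes and data writes
    in the same transaction.
    """
    query = LEGACY_CONSTRAINT if legacy_syntax else CURRENT_CONSTRAINT
    session.run(query).consume()  # make sure the statement has run before returning
    return query
```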

flpgrz · 07-16-2020

Ok, thanks, @Cobra, I will try.
I should run that command before I create the nodes, right?

Cobra · 07-16-2020

Normally yes, but you can also run it afterwards; you just need to wait a bit while the existing nodes are indexed.
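When the constraint is added after loading, Neo4j first has to index and check the ids that are already stored; that is the wait mentioned here. Below is a minimal sketch that makes sure this index is online before the relationship import starts. It assumes the 4.x syntax from this thread and the built-in `db.awaitIndexes` procedure, which takes a timeout in seconds. `add_constraint_after_load` is a hypothetical name. If the index is already online, the extra call should return right away, so it is meant only as a safety net.

```python
CREATE_CONSTRAINT = "CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE"


def add_constraint_after_load(session, timeout_seconds: int = 300) -> None:
    """Add the constraint once the nodes exist, then wait for its backing index.

    db.awaitIndexes waits until all indexes (including the one behind the
    constraint) are online, and gives up with an error after the timeout.
    """
    session.run(CREATE_CONSTRAINT).consume()
    session.run("CALL db.awaitIndexes($seconds)", {"seconds": timeout_seconds}).consume()
```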

flpgrz · 07-16-2020

Indeed, there were nodes with the same id, and the command CREATE CONSTRAINT node_id ON (n:Node) ASSERT n.id IS UNIQUE threw an error. Now, without duplicated ids, the relationships are created super quickly. Thanks a lot for the suggestion!

Question: is the constraint only meant to detect nodes with duplicate ids, which slow down the MATCH? If, hypothetically, all nodes had been unique w.r.t. the id, would the constraint have made a difference in terms of performance?
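The error from CREATE CONSTRAINT is what exposed the duplicates here. Below is a sketch for finding them up front. The first option checks the input rows before loading, in pure Python, and assumes each row is a dict with an `id` key; `duplicate_ids` is a hypothetical helper. The second option checks the database, using a Cypher aggregation that could be run before creating the constraint.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Cypher listing ids that already occur on more than one :Node.
FIND_DUPLICATES = (
    "MATCH (n:Node) "
    "WITH n.id AS id, count(*) AS copies "
    "WHERE copies > 1 "
    "RETURN id, copies"
)


def duplicate_ids(rows: Iterable[Dict[str, Any]], key: str = "id") -> Dict[Any, int]:
    """Return {id: count} for ids that appear more than once in the input rows."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}
```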

Cobra · 07-16-2020

The unique constraint is there to prevent duplicate ids, but it also speeds up both loading and reading. You should always put a unique constraint on the properties you match on if you want your queries to be fast.
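Here is a rough Python analogy for this answer. It is not how Neo4j implements its indexes, which are not Python dicts. Without an index, each `MATCH ... WHERE head.id = ...` behaves like `scan_for_id`: it touches every `:Node` node, whether or not the ids happen to be unique. A unique constraint behaves like `build_id_index`: it pays the indexing cost once, refuses duplicates (like the error reported in this thread), and then answers each lookup directly. Both function names are illustrative only.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def scan_for_id(nodes: List[Node], node_id: Any) -> List[Node]:
    """Like MATCH without an index: every node's id is compared, even if ids are unique."""
    return [node for node in nodes if node["id"] == node_id]


def build_id_index(nodes: List[Node]) -> Dict[Any, Node]:
    """Like a unique constraint: build the lookup structure once, rejecting duplicates."""
    index: Dict[Any, Node] = {}
    for node in nodes:
        if node["id"] in index:
            raise ValueError(f"duplicate id: {node['id']!r}")
        index[node["id"]] = node
    return index
```

With 700,000 nodes, the scan version does about 700,000 comparisons per MATCH, and there are two MATCH clauses per relationship. That fits the large speed-ups reported in this thread.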

hazardousmonk · 05-29-2021

This is awesome: setting the index literally gave a hundredfold improvement in query times for a very long UNWIND clause inserting data.

Very slow relationship creation (UNWIND)