04-10-2020 01:02 PM
Hello.
I am using Python 3.6 and Neo4j 4.0.
I am currently writing large amounts of data to the Neo4j database and it is taking much longer than I hoped. I could really use some ideas on how to improve my current way of writing data to the database.
At the moment I receive big JSON files with anywhere between 300 and 3000 values that need to be inserted into Neo4j. Some are nodes and some are relationships.
Currently I'm using the Python Neo4j library to write to the database by running the following code:
with driver.session() as session:
    for nodeData in data1:
        session.write_transaction(create_data_function_node, nodeData, fileData)
Which essentially means I'm running thousands of individual MERGE queries against the database.
Currently my longest "write" took about 270 seconds, which is too much, especially since I can't multi-thread writes to Neo4j: as far as I know, two threads will just end up blocking each other (I did some mild testing on it and it did not seem to help the situation...).
What can I implement in order to write data more efficiently?
04-10-2020 01:41 PM
You should submit them in batches rather than one at a time, and experiment with batch size; generally, batches of 10-20k are a good starting point depending on your data. Here is a simplified example you'll have to adapt for your use. Instead of doing this:
CREATE (:Person { name: "Bob" });
CREATE (:Person { name: "Angela" });
CREATE (:Person { name: "Sarah" });
do this instead:
WITH [{ name: "Bob" }, { name: "Sarah" }, { name: "Angela" }] as batch
UNWIND batch as individual
CREATE (p:Person)
SET p += individual;
Then make your batches 10-20k in size, and you're going to see a massive speedup. The reason why is simple: every time you run a transaction, there is overhead to ensure the transaction is properly committed and so forth. If you do records 1-by-1, you're paying that overhead over and over again.
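In Python this might look like the sketch below. This is not the exact code from the thread: `chunked` and `write_in_batches` are hypothetical helper names, and the Cypher just mirrors the UNWIND example above.

```python
# A minimal sketch of batched writes: split the records into batches
# and submit one query per batch instead of one per record.

def chunked(items, size):
    """Yield consecutive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

BATCH_QUERY = """
UNWIND $batch AS individual
CREATE (p:Person)
SET p += individual
"""

def write_in_batches(session, records, batch_size=10_000):
    """One transaction's worth of work per batch, not per record."""
    for batch in chunked(records, batch_size):
        session.run(BATCH_QUERY, batch=batch)

# The chunking itself, without needing a database:
people = [{"name": "Bob"}, {"name": "Sarah"}, {"name": "Angela"}]
print([len(b) for b in chunked(people, 2)])  # → [2, 1]
```

The batch size is the only tuning knob here; start around 10-20k as suggested above and measure.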
04-10-2020 02:00 PM
Thank you for your reply.
Just to make it clear I am a beginner with neo4j. I don't quite understand how to create an iterator that can do this for a data set.
If I understood you clearly the actual query should look something like this:
strCypher = """
    WITH $batch as batch
    UNWIND batch as individual
    CREATE (p:Person)
    SET p += individual
"""
tx.run(strCypher, batch=batch)
I am not quite sure how to create the $batch. How should it be formatted Python-wise? A list of dictionaries?
04-10-2020 02:43 PM
A list of dictionaries would exactly do the trick, and submitting them as a parameter as in your example is the right way to approach it.
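Concretely, if the incoming data is JSON (as described above), parsing it already yields the list of dictionaries that can be passed straight through as the parameter. A sketch with made-up field names:

```python
import json

# Incoming JSON payload (made-up example values)
raw = '[{"name": "Bob"}, {"name": "Sarah"}, {"name": "Angela"}]'

batch = json.loads(raw)  # a plain list of dicts

print(type(batch).__name__, len(batch))  # → list 3
# tx.run(strCypher, batch=batch)  # submitted as the $batch parameter
```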
04-10-2020 03:03 PM
Great! Thank you 🙂
04-11-2020 05:05 AM
Currently the method you proposed uses CREATE but I must use MERGE in the node creation because I have a lot of duplicates in the incoming data.
When I tried changing the current Cypher to MERGE it just creates one node, because it does not take the properties into consideration.
How could I do a batch query with MERGE?
Also, would it be okay to ask how to implement the same for relationship creation with MERGE as well?
I have 4 WHERE conditions in my relationship creation, so the same logic does not seem to work:
MATCH (a),(b)
WHERE a.data1 = $data1 AND b.data1 = $data1
AND a.data2 = $data2 AND b.data2 = $data2
MERGE (a)-[r:Relation { rel1: $rel1, rel2: $rel2}]->(b)
How do I batch the WHERE clause ?
04-11-2020 04:36 PM
You don't need to batch the WHERE clause. You should separate the matches for performance, though, like what I have below. As for the batching, it doesn't matter whether you use CREATE or MERGE; I used CREATE just as an example because I didn't know what your code was doing. Either way it will work.
Rewrite to this:
WITH $batch as batch
UNWIND batch as record
MATCH (a)
WHERE a.data1 = record.data1 AND a.data2 = record.data2
WITH a
MATCH (b)
WHERE b.data1 = record.data1 AND b.data2 = record.data2
MERGE (a)-[r:Relation { /* Whatever here */ }]->(b)
Your overall performance will also improve a lot if you put labels on these nodes rather than matching them unlabeled like this.
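For example, assuming a hypothetical :Record label on these nodes (adapt the label and property names to your model), the labeled match would look like this, and a composite index on the lookup properties lets the MATCH use it instead of scanning every node:

```cypher
// Hypothetical :Record label and index name; adapt to your model.
// Neo4j 4.0 index syntax:
CREATE INDEX record_lookup FOR (n:Record) ON (n.data1, n.data2);

// ...then match with the label so the index can be used:
MATCH (a:Record)
WHERE a.data1 = record.data1 AND a.data2 = record.data2
```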
04-11-2020 06:30 PM
Got it.
Thank you very much I seem to be getting the hang of this now !!
Just a note !
When I ran the query using the current syntax, I got an error saying that record could not be found once it tried to
MATCH (b)
I then removed the
WITH a
line from the code and it worked!
I think
WITH a
narrows the scope to just a, so record (and batch) is no longer available to the second MATCH.
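For reference, another fix (a sketch assuming the same query shape as above) is to carry record through the WITH explicitly, since WITH only keeps the variables it lists:

```cypher
WITH $batch AS batch
UNWIND batch AS record
MATCH (a)
WHERE a.data1 = record.data1 AND a.data2 = record.data2
WITH a, record        // keep record in scope for the second MATCH
MATCH (b)
WHERE b.data1 = record.data1 AND b.data2 = record.data2
MERGE (a)-[r:Relation]->(b)
```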