04-12-2021 06:50 PM
I'm trying to load in a CSV, merge two nodes, and return both, and I'd like to execute this through Python. This is the query I'm using in the Browser, which works perfectly:
LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row
MERGE (n:Person {id:row.id})
ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username
WITH n, row
MERGE (m:Person {username: row.source})
MERGE (m)-[r:FOLLOWS]->(n)
return count(n), count(m), count(r)
This is the query I am using in python, which only returns partial results:
result = neo.run(
"LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
"MERGE (n:Person {id:row.id}) "
"ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
"WITH n, row "
"MERGE (m:Person {username: row.source}) "
"MERGE (m)-[r:FOLLOWS]->(n) "
"return r"
)
Currently, the Browser creates the right number of nodes, while the Python code creates only about half. I found that if I delete CSV columns, Neo4j makes more nodes than before, but still not the full amount that the Browser does.
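For completeness, here's a minimal self-contained version of the Python side, assuming the official neo4j driver (the URI and credentials are placeholders):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = (
    "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
    "MERGE (n:Person {id: row.id}) "
    "ON CREATE SET n.name = row.name, n.username = row.username "
    "WITH n, row "
    "MERGE (m:Person {username: row.source}) "
    "MERGE (m)-[r:FOLLOWS]->(n) "
    "RETURN count(n) AS nodes, count(m) AS sources, count(r) AS rels"
)

with driver.session() as session:
    # run() returns a lazy result; consuming it (e.g. with single()) ensures the query finishes
    record = session.run(query).single()
    print(record["nodes"], record["sources"], record["rels"])

driver.close()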
04-13-2021 01:07 AM
Hello @Empyr3an and welcome to the Neo4j community
Can you try with USING PERIODIC COMMIT 500?
USING PERIODIC COMMIT 500 LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row
MERGE (n:Person {id:row.id})
ON CREATE SET n.id = row.id, n.name = row.name, n.username = row.username
WITH n, row
MERGE (m:Person {username: row.source})
MERGE (m)-[r:FOLLOWS]->(n)
return count(n), count(m), count(r)
Regards,
Cobra
04-13-2021 05:33 AM
Hi Cobra,
Thanks for the welcome. I did try USING PERIODIC COMMIT in my Python code; however, that didn't make a difference. In the Browser, the same command worked without it.
04-13-2021 05:35 AM
That's weird. Are you using matching versions of the Python driver and the Neo4j database?
04-13-2021 05:41 AM
I just checked my Browser version and it says it's 4.1.3. My Python driver version was 4.2 and I just downgraded to 4.1, and I'm still only getting partial results.
Also, although periodic commit didn't give me any issues yesterday, now when I try to run your command in the Browser it says:
Executing queries that use periodic commit in an open transaction is not possible.
Edit: Never mind, I fixed the periodic commit error by including :auto in my command, but the partial data problem remains.
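For anyone else who hits this: as I understand it, :auto makes the Browser run the query in an auto-commit transaction, which USING PERIODIC COMMIT requires. From the Python driver, I believe the equivalent is an auto-commit query via session.run() rather than an explicit transaction, roughly like this (same placeholder driver as above):

periodic_query = (
    "USING PERIODIC COMMIT 500 "
    "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
    "MERGE (n:Person {id: row.id}) "
    "ON CREATE SET n.name = row.name, n.username = row.username "
    "WITH n, row "
    "MERGE (m:Person {username: row.source}) "
    "MERGE (m)-[r:FOLLOWS]->(n) "
    "RETURN count(r) AS rels"
)

with driver.session() as session:
    # auto-commit query; periodic commit is rejected inside an explicit (open) transaction
    print(session.run(periodic_query).single()["rels"])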
04-13-2021 05:57 AM
Did you check directly in the Neo4j Browser to compare the number of nodes and relationships?
04-13-2021 06:01 AM
Well, I counted the number of nodes and relationships created using the count function. The Browser consistently returns 314, while Python (through a Jupyter notebook, not sure if that makes a difference) returns 195.
Weirdly, if I delete the name column from the CSV, Python returns 301, which is closer but still not the entire dataset.
04-13-2021 06:11 AM
Are you using the same CSV?
Did you clean the database before to load data?
Are you using the exact same query?
04-13-2021 06:17 AM
Yup, same exact CSV, same query, and I clean the database every time I run a query.
EDIT: If I run this
"LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
"MERGE (n:Person {id:row.id}) "
"ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
"WITH n, row "
"return n"
I get the expected result, but as soon as I add the second MERGE:
"LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
"MERGE (n:Person {id:row.id}) "
"ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
"WITH n, row "
"MERGE (m:Person {username: '"+username+"'}) "
"return n, m"
I receive partial results. In fact, even if I return only n with the second command, I still only get partial results.
04-13-2021 06:25 AM
To be honest, I'm confused.
This is the first time I've seen this problem.
Can you upgrade to the latest version of Neo4j?
04-13-2021 06:29 AM
I'm a bit unsure what to actually upgrade. It seems the Browser version itself is 4.2.5, but the server is 4.1.3.
If you saw my last edit, do you think there's any chance that there is something wrong with the code?
And if possible, I can send you the data/code so you can try to reproduce the error. This is really frustrating, and I might just be making a mistake with my data
04-13-2021 07:09 AM
Neo4j Browser is different from Neo4j Server.
The latest version of Neo4j Server is 4.2.5.
It must come from the code, I guess.
Yes, you can share your code and data here.
Regards,
Cobra
04-13-2021 12:47 PM
Weird, I was playing around with my code all day and it seems to be working now? The correct number of nodes and edges are being loaded, albeit a bit slowly.
result = neo.run(
"USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
"MERGE (n:Person {id:row.id}) "
"ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
"MERGE (m:Person {username: '"+username+"'}) "
"MERGE (m)-[r:FOLLOWS]->(n) "
"return count(r)"
)
This is the overall command I'm running. Most of the CSVs I'm loading correspond to around 1,000 new nodes and edges; however, each one takes 1-5 seconds to finish. Some have a couple thousand nodes/edges and can take up to a minute to execute.
Do you have any advice to speed up the query?
04-13-2021 01:21 PM
You should have a look at UNIQUE CONSTRAINTS. Create a unique constraint on the id property of the Person node, for example, then load your nodes and relationships into the database.
04-13-2021 01:44 PM
Ah yeah, that was going to be my next step:
CREATE CONSTRAINT twitter_id IF NOT EXISTS ON (n:Person) ASSERT n.id IS UNIQUE
Definitely faster, thanks for the help!
04-13-2021 01:48 PM
Happy to help! You can also pass username as a parameter; it will also be faster, and it's a best practice.
04-13-2021 01:51 PM
Okay, so add that as another constraint?
Also, I'm testing the constraint by adding the same data twice, and it seems checking the constraint is actually slower than when I first added the data? I'm not sure why Neo4j slows down the second time. For the sole purpose of adding data, though, it works.
04-13-2021 01:55 PM
It's always faster the first time, but it will always be faster than without a constraint.
For your use case, you only need one constraint.
Parameters are different: Parameters - Neo4j Cypher Manual
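Something like this, for example (just a sketch; the username value is only an illustration):

username = "some_account"  # example value

query = (
    "USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
    "MERGE (n:Person {id: row.id}) "
    "ON CREATE SET n.name = row.name, n.username = row.username "
    "MERGE (m:Person {username: $username}) "
    "MERGE (m)-[r:FOLLOWS]->(n) "
    "RETURN count(r) AS rels"
)

with driver.session() as session:
    # $username is sent as a bound parameter instead of being concatenated into the query string
    result = session.run(query, username=username)
    print(result.single()["rels"])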
04-13-2021 02:49 PM
Got it, thanks.
One last question, hopefully. Currently, I have a for loop going through all the CSVs and importing each one individually. It works fine for 33 of the 314 total CSVs, but after reaching the 33rd CSV, Neo4j just gets stuck. From there, I can only import CSVs one at a time (which does work).
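Roughly, the loop looks like this (simplified; the file names and the way I derive username are placeholders):

# simplified sketch of my import loop; file names are made up
csv_files = ["follows_0.csv", "follows_1.csv", "follows_2.csv"]  # ... 314 files in total

with driver.session() as session:
    # create the constraint once up front
    session.run(
        "CREATE CONSTRAINT twitter_id IF NOT EXISTS "
        "ON (n:Person) ASSERT n.id IS UNIQUE"
    ).consume()

    for csv_file in csv_files:
        username = csv_file.replace(".csv", "")  # placeholder for how I get the account name
        result = session.run(
            "USING PERIODIC COMMIT 1000 "
            f"LOAD CSV WITH HEADERS FROM 'file:/{csv_file}' AS row "
            "MERGE (n:Person {id: row.id}) "
            "ON CREATE SET n.name = row.name, n.username = row.username "
            "MERGE (m:Person {username: $username}) "
            "MERGE (m)-[r:FOLLOWS]->(n) "
            "RETURN count(r) AS rels",
            username=username,
        )
        print(csv_file, result.single()["rels"])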
I'm not sure where this problem is even coming from. What would you suggest?
04-13-2021 02:54 PM
I would check whether something changed in the CSVs; maybe a line is broken or the column names are different.