04-12-2021 06:50 PM
I'm trying to load in a CSV, merge two nodes, and return both, and I'd like to execute this through Python. This is the query I'm using in the Browser, which works perfectly:
LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row
MERGE (n:Person {id:row.id})
ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username
WITH n, row
MERGE (m:Person {username: row.source})
MERGE (m)-[r:FOLLOWS]->(n)
return count(n), count(m), count(r)
This is the query I am using in python, which only returns partial results:
result = neo.run(
"LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
"MERGE (n:Person {id:row.id}) "
"ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
"WITH n, row "
"MERGE (m:Person {username: row.source}) "
"MERGE (m)-[r:FOLLOWS]->(n) "
"return r"
)
Currently, the Browser creates the right number of nodes, while the Python code creates only about half. I found that if I delete CSV columns, Neo4j makes more nodes than before, but still not the full amount that the Browser does.
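For completeness, here's a minimal self-contained version of the Python side, assuming the official neo4j driver (the URI and credentials are placeholders):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = (
    "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
    "MERGE (n:Person {id: row.id}) "
    "ON CREATE SET n.name = row.name, n.username = row.username "
    "WITH n, row "
    "MERGE (m:Person {username: row.source}) "
    "MERGE (m)-[r:FOLLOWS]->(n) "
    "RETURN count(n) AS nodes, count(m) AS sources, count(r) AS rels"
)

with driver.session() as session:
    # run() returns a lazy result; consuming it (e.g. with single()) ensures the query finishes
    record = session.run(query).single()
    print(record["nodes"], record["sources"], record["rels"])

driver.close()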
04-13-2021 01:07 AM
Hello @Empyr3an and welcome to the Neo4j community
Can you try with USING PERIODIC COMMIT 500?
USING PERIODIC COMMIT 500 LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row
MERGE (n:Person {id:row.id})
ON CREATE SET n.id = row.id, n.name = row.name, n.username = row.username
WITH n, row
MERGE (m:Person {username: row.source})
MERGE (m)-[r:FOLLOWS]->(n)
return count(n), count(m), count(r)
Regards,
Cobra
04-13-2021 05:33 AM
Hi Cobra,
Thanks for the welcome. I did try USING PERIODIC COMMIT in my Python code; however, that didn't make a difference. In the Browser, the same command worked without it.
04-13-2021 05:35 AM
That's weird. Are you using matching versions of the Python driver and the Neo4j database?
04-13-2021 05:41 AM
I just checked my Browser version and it says it's 4.1.3. My Python driver version was 4.2 and I just downgraded to 4.1, and I'm still only getting partial results.
Also, although periodic commit didn't give me any issues yesterday, now when I try to run your command in the Browser it says:
Executing queries that use periodic commit in an open transaction is not possible.
Edit: Never mind, I fixed the periodic commit error by including :auto in my command, but the partial data problem remains.
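For anyone else who hits this: as I understand it, :auto makes the Browser run the query in an auto-commit transaction, which USING PERIODIC COMMIT requires. From the Python driver, I believe the equivalent is an auto-commit query via session.run() rather than an explicit transaction, roughly like this (same placeholder driver as above):

periodic_query = (
    "USING PERIODIC COMMIT 500 "
    "LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
    "MERGE (n:Person {id: row.id}) "
    "ON CREATE SET n.name = row.name, n.username = row.username "
    "WITH n, row "
    "MERGE (m:Person {username: row.source}) "
    "MERGE (m)-[r:FOLLOWS]->(n) "
    "RETURN count(r) AS rels"
)

with driver.session() as session:
    # auto-commit query; periodic commit is rejected inside an explicit (open) transaction
    print(session.run(periodic_query).single()["rels"])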
04-13-2021 05:57 AM
Did you check directly in the Neo4j Browser to compare the number of nodes and relationships?
04-13-2021 06:01 AM
Well, I counted the number of nodes and relationships created using the count function. The Browser consistently returns 314, while Python (through a Jupyter notebook, not sure if that makes a difference) returns 195.
Weirdly, if I delete the name column from the CSV, Python returns 301, which is closer but still not the entire dataset.
04-13-2021 06:11 AM
Are you using the same CSV?
Did you clean the database before to load data?
Are you using the exact same query?
04-13-2021 06:17 AM
Yup, same exact CSV, same query, and I clean the database every time I run a query.
EDIT: If I run this
"LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
"MERGE (n:Person {id:row.id}) "
"ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
"WITH n, row "
"return n"
I get the expected result, but as soon as I add the second MERGE:
"LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
"MERGE (n:Person {id:row.id}) "
"ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
"WITH n, row "
"MERGE (m:Person {username: '"+username+"'}) "
"return n, m"
I receive partial results. In fact, even if I return only n with the second command, I still only get partial results.
04-13-2021 06:25 AM
To be honest, I'm confused.
This is the first time I've seen this problem.
Can you upgrade to the latest version of Neo4j?
04-13-2021 06:29 AM
I'm a bit unsure what to actually upgrade. It seems the Browser version itself is 4.2.5, but the server is 4.1.3.
If you saw my last edit, do you think there's any chance that there is something wrong with the code?
And if possible, I can send you the data/code so you can try to reproduce the error. This is really frustrating, and I might just be making a mistake with my data
04-13-2021 07:09 AM
Neo4j Browser is different from Neo4j Server.
The latest version of Neo4j Server is 4.2.5.
It must come from the code, I guess.
Yes, you can share your code and data here.
Regards,
Cobra
04-13-2021 12:47 PM
Weird, I was playing around with my code all day and it seems to be working now? The correct number of nodes and edges are being loaded, albeit a bit slowly.
result = neo.run(
"USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
"MERGE (n:Person {id:row.id}) "
"ON CREATE SET n.id = row.id, n.name=row.name, n.username=row.username "
"MERGE (m:Person {username: '"+username+"'}) "
"MERGE (m)-[r:FOLLOWS]->(n) "
"return count(r)"
)
This is the overall command I'm running. Most of the CSVs I'm loading correspond to around 1,000 new nodes and edges; however, each one takes 1-5 seconds to finish. Some have a couple thousand nodes/edges and can take up to a minute to execute.
Do you have any advice to speed up the query?
04-13-2021 01:21 PM
You should have a look at UNIQUE CONSTRAINTS. Create a unique constraint on the id property of the Person node, for example, then load your nodes and relationships into the database.
04-13-2021 01:44 PM
Ah yeah, that was going to be my next step:
CREATE CONSTRAINT twitter_id IF NOT EXISTS ON (n:Person) ASSERT n.id IS UNIQUE
Definitely faster, thanks for the help!
04-13-2021 01:48 PM
Happy to help! You can also pass username as a parameter; it will also be faster, and it's a best practice.
04-13-2021 01:51 PM
Okay, so add that as another constraint?
Also, I'm testing the constraint by adding the same data twice, and it seems checking the constraint is actually slower than when I first added the data? I'm not sure why Neo4j slows down the second time. For the sole purpose of adding data, though, it works.
04-13-2021 01:55 PM
It's always faster the first time, but it will always be faster than without a constraint.
For your use case, you only need one constraint.
Parameters are different: Parameters - Neo4j Cypher Manual
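Something like this, for example (just a sketch; the username value is only an illustration):

username = "some_account"  # example value

query = (
    "USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM 'file:/follows.csv' AS row "
    "MERGE (n:Person {id: row.id}) "
    "ON CREATE SET n.name = row.name, n.username = row.username "
    "MERGE (m:Person {username: $username}) "
    "MERGE (m)-[r:FOLLOWS]->(n) "
    "RETURN count(r) AS rels"
)

with driver.session() as session:
    # $username is sent as a bound parameter instead of being concatenated into the query string
    result = session.run(query, username=username)
    print(result.single()["rels"])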
04-13-2021 02:49 PM
Got it, thanks.
One last question, hopefully. Currently, I have a for loop going through all the CSVs and importing each one individually. It works fine for 33 of the 314 total CSVs, but after reaching the 33rd CSV, Neo4j just gets stuck. From there, I can only import CSVs one at a time (which does work).
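Roughly, the loop looks like this (simplified; the file names and the way I derive username are placeholders):

# simplified sketch of my import loop; file names are made up
csv_files = ["follows_0.csv", "follows_1.csv", "follows_2.csv"]  # ... 314 files in total

with driver.session() as session:
    # create the constraint once up front
    session.run(
        "CREATE CONSTRAINT twitter_id IF NOT EXISTS "
        "ON (n:Person) ASSERT n.id IS UNIQUE"
    ).consume()

    for csv_file in csv_files:
        username = csv_file.replace(".csv", "")  # placeholder for how I get the account name
        result = session.run(
            "USING PERIODIC COMMIT 1000 "
            f"LOAD CSV WITH HEADERS FROM 'file:/{csv_file}' AS row "
            "MERGE (n:Person {id: row.id}) "
            "ON CREATE SET n.name = row.name, n.username = row.username "
            "MERGE (m:Person {username: $username}) "
            "MERGE (m)-[r:FOLLOWS]->(n) "
            "RETURN count(r) AS rels",
            username=username,
        )
        print(csv_file, result.single()["rels"])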
I'm not sure where this problem is even coming from. What would you suggest?
04-13-2021 02:54 PM
I would check whether something changed in the CSVs; maybe a line is broken or the column names are different.