12-25-2022 10:33 PM - edited 12-25-2022 10:36 PM
I have a huge CSV that I'm trying to import. I've already created the relevant nodes of both types from it, and now I'm creating the relationships. I'm using this query so that it's re-entrant on failures:
USING PERIODIC COMMIT LOAD CSV FROM 'file:///posts.csv' AS line
UNWIND split(line[1], ' ') AS tag
MATCH (i:ImageNode {image_id: line[0]}), (t:TagNode {value: tag})
MERGE (i)-[r:TAGGED]->(t)
ON CREATE SET r.created_at = timestamp() / 1000.0
When I run it in the shell like this:
$ cat posts.cql | cypher-shell -u neo4j -p [the password]
the JVM memory usage keeps climbing. Why? Am I not doing something to clear unused data out of memory when it commits?
Cypher-Shell 4.1.12 and Neo4j Driver 4.1.4
12-26-2022 01:52 AM
One thing you can do is move the match on ImageNode to before the unwind. As it stands, you are repeating this same match for each tag element on a row. You only need to match the tag and create the relationship after the unwind.
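As a rough sketch of that restructuring (keeping your original PERIODIC COMMIT form, with the same column positions and property names):
USING PERIODIC COMMIT LOAD CSV FROM 'file:///posts.csv' AS line
// match the image once per row, before expanding the tags
MATCH (i:ImageNode {image_id: line[0]})
UNWIND split(line[1], ' ') AS tag
MATCH (t:TagNode {value: tag})
MERGE (i)-[r:TAGGED]->(t)
ON CREATE SET r.created_at = timestamp() / 1000.0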
Have you monitored the JVM with tools like JConsole or VisualVM to see what is happening?
https://www.rapid7.com/blog/post/2012/12/31/guide-to-monitoring-jvm-memory-usage-draft/
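For example (an illustrative command line only; substitute the actual Java process ID, which you can find with jps), jstat can print heap occupancy and GC counts at a fixed interval:
$ jps -l
$ jstat -gcutil <pid> 2000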
01-04-2023 04:43 AM
Hi @giff-h
Can you try something like:
:auto LOAD CSV FROM 'file:///posts.csv' AS line
CALL {
  WITH line
  UNWIND split(line[1], ' ') AS tag
  MATCH (i:ImageNode {image_id: line[0]}), (t:TagNode {value: tag})
  MERGE (i)-[r:TAGGED]->(t)
  ON CREATE SET r.created_at = timestamp() / 1000.0
} IN TRANSACTIONS OF 10 ROWS
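Purely as a sketch combining this with the earlier suggestion (matching the ImageNode once per row before the UNWIND; the 10-row batch size is just carried over from above):
:auto LOAD CSV FROM 'file:///posts.csv' AS line
CALL {
  WITH line
  // look up the image once per CSV row
  MATCH (i:ImageNode {image_id: line[0]})
  UNWIND split(line[1], ' ') AS tag
  MATCH (t:TagNode {value: tag})
  MERGE (i)-[r:TAGGED]->(t)
  ON CREATE SET r.created_at = timestamp() / 1000.0
} IN TRANSACTIONS OF 10 ROWS
Note that :auto is a client-side prefix understood by cypher-shell, and CALL { … } IN TRANSACTIONS needs a newer Neo4j than the 4.1 series mentioned above, so this assumes an upgrade is possible.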