Neo4j

cerdo_fruta · ‎02-09-2021

I am new to Neo4j. My data is in CSV files, and I am trying to load them into the database and create relationships.

departments.csv (9 rows)

dept_name
dept_no

dept_emp.csv (331603 rows)

dept_no
emp_no
from_date
to_date

I have created nodes with the labels departments and dept_emp, with all columns as properties. Now I am trying to create relationships between them.

CALL apoc.periodic.iterate("
load csv with headers from 'file:///dept_emp.csv' as row return row",
"match(de:dept_emp)
match(d:departments)
where de.dept_no=row.dept_no and d.dept_no= row.dept_no
merge (de)-[:BELONGS_TO]->(d)",{batchSize:10000, parallel:false})

When I run this, it takes ages to complete (many days). When I change the batch size to 10, it creates all 331603 relationships early on, but it keeps running until every batch has been processed, which takes far too long. As soon as it has encountered all 9 distinct dept_no values in the first rows of dept_emp.csv, it has created every relationship, yet it still has to work through all the remaining batches. In each batch it has to scan all 331603 relationships that were created in the first two batches or so. Please help me optimize this.

I am using apoc.periodic.iterate so that I can handle much larger data in the future. The problem comes from how the data is related and how I am trying to establish the relationship: each department will have many dept_emp nodes connected to it.
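One possible reading of the slowdown, offered as a sketch rather than a confirmed diagnosis: the inner statement matches on dept_no only, so each CSV row matches every dept_emp node in that department (about 331603 / 9 ≈ 36,845 nodes on average) instead of the single node that row describes. Assuming the dept_emp nodes also carry emp_no, stored as the same string that LOAD CSV produces, and that emp_no plus dept_no identifies one node, matching on both properties should make each row touch one node. The index name dept_emp_emp_no is hypothetical.

```cypher
// Assumption: emp_no was stored on dept_emp nodes as a string, exactly as LOAD CSV reads it.
// Hypothetical index name; it lets the per-row lookup by emp_no avoid a label scan.
CREATE INDEX dept_emp_emp_no FOR (de:dept_emp) ON (de.emp_no);

// Each CSV row should now match one dept_emp node (emp_no + dept_no) and one department,
// rather than every dept_emp node that shares the row's dept_no.
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///dept_emp.csv' AS row RETURN row",
  "MATCH (de:dept_emp {emp_no: row.emp_no, dept_no: row.dept_no})
   MATCH (d:departments {dept_no: row.dept_no})
   MERGE (de)-[:BELONGS_TO]->(d)",
  {batchSize: 10000, parallel: false});
```

If emp_no was converted to an integer when the nodes were created, the lookup would need toInteger(row.emp_no) instead, since LOAD CSV returns every field as a string.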

I am currently using Neo4j 4.2.1.
The max heap size is 1G due to my laptop's limitations.

Edit:
I do have indexes on dept_emp(dept_no) and departments(dept_no)

match(de:dept_emp)
match(d:departments)
where de.dept_no= d.dept_no
merge (de)-[:BELONGS_TO]->(d)

This alone works for now; it takes around 16 seconds to run with my config. But it is not what I am looking for, because it is not feasible for huge data.
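If the concern with the direct query is that it runs as one large transaction, one option is to keep the same node-to-node match but let apoc.periodic.iterate batch over the dept_emp nodes instead of re-reading the CSV. This is only a sketch, assuming every dept_emp node has a dept_no that exists on exactly one departments node; the batch size is an arbitrary starting point, not a tuned value.

```cypher
// Outer statement streams the dept_emp nodes; the inner statement runs once per node
// and is committed in batches, so no single transaction holds all the new relationships.
CALL apoc.periodic.iterate(
  "MATCH (de:dept_emp) RETURN de",
  "MATCH (d:departments {dept_no: de.dept_no})
   MERGE (de)-[:BELONGS_TO]->(d)",
  {batchSize: 10000, parallel: false})
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

parallel is left as false here because many relationships attach to the same few departments nodes, and concurrent batches would contend for locks on them.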

dana_canzano · ‎02-09-2021

Do you have indexes on :dept_emp and/or :departments?

cerdo_fruta · ‎02-09-2021

I do have indexes on dept_emp(dept_no) and departments(dept_no)

Skipping relationship creation if it already exists, not about MERGE