02-09-2021 09:06 AM
I am new to Neo4j. My data is in CSV files, and I am trying to load them into the database and create relationships.
departments.csv (9 rows)
dept_emp.csv (331603 rows)
I have created nodes with the labels departments and dept_emp, with all columns as properties. Now I am trying to create relationships between them.
CALL apoc.periodic.iterate("
load csv with headers from 'file:///dept_emp.csv' as row return row",
"match(de:dept_emp)
match(d:departments)
where de.dept_no=row.dept_no and d.dept_no= row.dept_no
merge (de)-[:BELONGS_TO]->(d)",{batchSize:10000, parallel:false})
When I run this, it takes ages to complete (many days). When I change the batch size to 10, it creates all 331603 relationships early on, but it keeps running until every batch has finished, which takes too long. Once the first rows of dept_emp.csv have covered all 9 distinct dept_no values, all the relationships have already been created, yet the remaining batches still have to run. Each batch has to scan all 331603 relationships that were created in the first two batches or so. Please help me optimize this.
I am using apoc.periodic.iterate so that this can cope with much larger data in the future. The problem comes from how the data is related and how I am trying to establish the relationship: each department will have many dept_emp nodes connected to it.
I am currently using Neo4j version 4.2.1.
The max heap size is 1G due to my laptop's limitations.
Edit:
I do have indexes on dept_emp(dept_no) and departments(dept_no)
match(de:dept_emp)
match(d:departments)
where de.dept_no= d.dept_no
merge (de)-[:BELONGS_TO]->(d)
This alone works for now; it takes around 16 seconds to run with my config. But it is not what I am looking for, because it is not feasible for huge data.
02-09-2021 11:14 AM
Do you have indexes on :dept_emp and/or :departments?
02-09-2021 11:16 AM
I do have indexes on dept_emp(dept_no) and departments(dept_no)
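A minimal sketch of one possible rewrite, not a tested fix. In the apoc.periodic.iterate call from the question, row only supplies a dept_no, so for every CSV row the inner query matches every dept_emp node with that dept_no and runs MERGE for each of them. If the 331603 rows were spread evenly over the 9 departments (an assumption), that is roughly 37,000 MERGE checks per row, which would explain the runtime even with indexes in place. Since the dept_emp nodes already carry dept_no as a property, the CSV may not need to be read again; the outer statement can stream the nodes themselves so that each node is handled once:

```cypher
// Sketch only: assumes the dept_emp and departments nodes already exist
// and that both carry a dept_no property, as described in the question.
CALL apoc.periodic.iterate(
  // Outer statement: stream each dept_emp node exactly once.
  "MATCH (de:dept_emp) RETURN de",
  // Inner statement: find the department with the same dept_no
  // (this lookup can use the index on departments(dept_no)) and link it.
  "MATCH (d:departments {dept_no: de.dept_no})
   MERGE (de)-[:BELONGS_TO]->(d)",
  {batchSize: 10000, parallel: false}
);
```

With this shape the total work should grow with the number of dept_emp nodes rather than with rows multiplied by nodes per department. parallel is left at false because many relationships attach to the same few departments nodes, so concurrent batches would likely contend for locks on them.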