Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
01-30-2019 01:37 AM
We have written a sightly complex LOAD CSV
script to load a few million lines. Unfortunately, I can't post the script as is, but each line results in two nodes, where one has an extra label (set with apoc.create.addLabel
), an edge between the two nodes, and optionally two more edges to existing nodes. We also have implemented custom triggers to set created_by
, created_at
, updated_by
, and updated_at
for all created/updated nodes and edges.
When I run the script with LIMIT
up to 10000, it succeeds, but with USING PERIODIC COMMIT
even as low as 100, it fails with OutOfMemoryError.
We are trying to read the CSV file as provided (no preprocessing). We could of course read the files in another way, but none the less I would like to understand the behaviour of USING PERIODIC COMMIT
.
For solution, I am considering:
apoc.periodic.commit
and pass in values to SKIP
and LIMIT
(if this behaves different from USING PERIODIC COMMIT
).Any input is greatly valued!
Best regards,
Øyvind Matheson Wergeland
01-30-2019 05:34 AM
It very likely turns into an EAGER operation where cypher pulls in the whole file through each step, basically disabling periodic commit.
Usually apoc.periodic.iterate solves it for you putting the load csv into the driving statement and the remainder into the action statement.
If possible share the EXPLAiN output of your statement.
01-30-2019 07:17 AM
Thank you for the reply, and the link to more information about EAGER. (I had just found another post referring to problems with LOAD CSV and EAGER, without explaining what it actually means.) Profiling the script indeed reports an execution plan with several EAGER operations. There are about 100 steps, so I don't know how useful it would be to post it.
We are doing a large initial import, while daily (or so) updates are quite small, but using the same format, so we're using the same script as a basis for both.
I will try running the initial import with apoc.periodic.commit
.
Best regards,
Øyvind Matheson Wergeland.
01-31-2019 03:33 AM
Use apoc.periodic.iterate
that works better for imports.
01-31-2019 04:20 AM
I was just going to post that we successfully imported using apoc.periodic.iterate
, but it was somewhat slow (1 hour for 6,5 million lines). We managed to optimise the query somewhat, but were not able to get rid of all EAGER.
The EAGERs I have not been able to get rid of, comes from that we're trying to detect name changes, and preserve historic names. A distilled version of the script follows:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:/companies.csv' AS line
MERGE (c:company { id:line.id })
SET
c.name = line.name,
c.type = line.type
// [removed code]
// Detect name change
WITH c,line
OPTIONAL MATCH (c)-[r:named { is_active:true }]->(name:name)
WHERE name.name <> line.name
SET
r.end_date = date(),
r.is_active = false
WITH c,line
MERGE (name:companyName { name:line.name })
MERGE (c)-[r:named]->(name)
ON CREATE SET
r.start_date = date(),
r.is_active = true
Could this be a case where we should read the CSV file in two passes - first create the names, and then just use MATCH in the second pass?
01-31-2019 01:43 PM
yes either multi-pass or use periodic iterate -> what was the exact call you used?
are you sure you have indexes/constraints for company(id)
and :companyName(name)
? (odd choice of labels?)
01-31-2019 11:30 PM
Yes, there are unique constraint on both properties. And the labels and properties have been renamed from the actual script, so I see why you find them a bit odd.
I ran it this way:
CALL apoc.periodic.iterate(
"LOAD CSV WITH HEADERS FROM 'file:/companies.csv' AS line RETURN line",
<import script>,
{ batchSize: 10000, parallel: false }
);
I assume the best would be to refactor the script to get rid of the two EAGERs, but it is running within acceptable time, as we will only have to do the initial import once (per environment).
02-01-2019 05:55 AM
Hmm looks ok to me.
It should do iterateList:true
automatically.
perhaps you can try this:
EXPLAIN
WITH {} AS line
MERGE (c:company { id:line.id })
SET
c.name = line.name,
c.type = line.type
// [removed code]
// Detect name change
WITH c,line
OPTIONAL MATCH (c)-[r:named { is_active:true }]->(name:name)
WHERE name.name <> line.name
SET
r.end_date = date(),
r.is_active = false
WITH c,line
MERGE (name:companyName { name:line.name })
MERGE (c)-[r:named]->(name)
ON CREATE SET
r.start_date = date(),
r.is_active = true
(with your setup and share the query plan).
02-01-2019 06:29 AM
Of course:
+----------------------------------------------------------------------+
| Plan | Statement | Version | Planner | Runtime | Time |
+----------------------------------------------------------------------+
| "EXPLAIN" | "WRITE_ONLY" | "CYPHER 3.5" | "COST" | "SLOTTED" | 390 |
+----------------------------------------------------------------------+
+------------------------------+----------------+---------------------------+---------------------------------------------+
| Operator | Estimated Rows | Identifiers | Other |
+------------------------------+----------------+---------------------------+---------------------------------------------+
| +ProduceResults | 1 | r, line, name, name, r, c | |
| | +----------------+---------------------------+---------------------------------------------+
| +EmptyResult | 1 | r, line, name, name, r, c | |
| | +----------------+---------------------------+---------------------------------------------+
| +Apply | 1 | r, line, name, name, r, c | |
| |\ +----------------+---------------------------+---------------------------------------------+
| | +AntiConditionalApply | 1 | name, r, c | |
| | |\ +----------------+---------------------------+---------------------------------------------+
| | | +SetProperty | 1 | name, r, c | |
| | | | +----------------+---------------------------+---------------------------------------------+
| | | +SetProperty | 1 | name, r, c | |
| | | | +----------------+---------------------------+---------------------------------------------+
| | | +MergeCreateRelationship | 1 | name, r, c | |
| | | | +----------------+---------------------------+---------------------------------------------+
| | | +Argument | 1 | name, c | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +AntiConditionalApply | 1 | name, r, c | |
| | |\ +----------------+---------------------------+---------------------------------------------+
| | | +Optional | 1 | name, r, c | |
| | | | +----------------+---------------------------+---------------------------------------------+
| | | +ActiveRead | 0 | name, r, c | |
| | | | +----------------+---------------------------+---------------------------------------------+
| | | +Expand(Into) | 0 | name, r, c | (c)-[r:named]->(name) |
| | | | +----------------+---------------------------+---------------------------------------------+
| | | +LockNodes | 1 | name, c | c, name |
| | | | +----------------+---------------------------+---------------------------------------------+
| | | +Argument | 1 | name, c | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +Optional | 1 | name, r, c | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +ActiveRead | 0 | name, r, c | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +Expand(Into) | 0 | name, r, c | (c)-[r:named]->(name) |
| | | +----------------+---------------------------+---------------------------------------------+
| | +Argument | 1 | name, c | |
| | +----------------+---------------------------+---------------------------------------------+
| +Apply | 1 | line, name, name, r, c | |
| |\ +----------------+---------------------------+---------------------------------------------+
| | +AntiConditionalApply | 1 | name, line | |
| | |\ +----------------+---------------------------+---------------------------------------------+
| | | +MergeCreateNode | 1 | name, line | |
| | | | +----------------+---------------------------+---------------------------------------------+
| | | +Argument | 1 | line | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +Optional | 1 | name, line | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +ActiveRead | 0 | name, line | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +Filter | 0 | name, line | `name`.name = line.name |
| | | +----------------+---------------------------+---------------------------------------------+
| | +NodeByLabelScan | 1 | name, line | :companyName |
| | +----------------+---------------------------+---------------------------------------------+
| +Eager | 1 | name, r, c, line | |
| | +----------------+---------------------------+---------------------------------------------+
| +Apply | 1 | name, r, c, line | |
| |\ +----------------+---------------------------+---------------------------------------------+
| | +SetProperty | 1 | name, r, c, line | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +SetProperty | 1 | name, r, c, line | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +Eager | 1 | name, r, c, line | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +OptionalExpand(All) | 1 | name, r, c, line | (c)-[r:named]->(name); `r`.is_active = true |
| | | +----------------+---------------------------+---------------------------------------------+
| | +Argument | 1 | c, line | |
| | +----------------+---------------------------+---------------------------------------------+
| +Eager | 1 | c, line | |
| | +----------------+---------------------------+---------------------------------------------+
| +SetProperty | 1 | c, line | |
| | +----------------+---------------------------+---------------------------------------------+
| +SetProperty | 1 | c, line | |
| | +----------------+---------------------------+---------------------------------------------+
| +Apply | 1 | c, line | |
| |\ +----------------+---------------------------+---------------------------------------------+
| | +AntiConditionalApply | 1 | c, line | |
| | |\ +----------------+---------------------------+---------------------------------------------+
| | | +MergeCreateNode | 1 | c, line | |
| | | | +----------------+---------------------------+---------------------------------------------+
| | | +Argument | 1 | line | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +Optional | 1 | c, line | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +ActiveRead | 0 | c, line | |
| | | +----------------+---------------------------+---------------------------------------------+
| | +Filter | 0 | c, line | c.id = line.id |
| | | +----------------+---------------------------+---------------------------------------------+
| | +NodeByLabelScan | 4910263 | c, line | :company |
| | +----------------+---------------------------+---------------------------------------------+
| +Projection | 1 | line | {line : {}} |
+------------------------------+----------------+---------------------------+---------------------------------------------+
03-17-2019 09:10 AM
Looks as if you forgot to create a constraint for :company(id)
Same for companyName(name)
Please do that first and then report back.
It does a full label scan twice in your statement, which totally adds up.
All the sessions of the conference are now available online