Neo4j

pa0lo · ‎06-27-2019

The cypher below is hanging. The CSV has 180K rows

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///connectedEntity_00000.csv' AS row
MERGE(con:Connection {uen:row.UEN, name:row.SHORT_NM})
MERGE(e:Entity {id:row.ID, uen:row.UEN_1, legalName:row.LEGAL_NM, is_primary:row.PRIMARY_IND})
MERGE(c:Country {cty_code:row.CUST_CTRY_RISK})
MERGE(p:Product {prod_name:row.PRODUCT, auth_amt:row.AUTHAMT, ccy:row.CCY})
MERGE(i:Industry {ind_name:row.SIC_CA})
MERGE(e)-[:BELONGS_TO {conn_type:row.CONN_TYPE}]->(con)
MERGE (e)-[:HAS_PRODUCT]->(p)
MERGE (e)-[:IS_IN_INDUSTRY]->(i)
MERGE (e)-[:RISK_CTY]->(c)

andrew_bowman · ‎06-27-2019

Which version of Neo4j are you on?

Can you provide the query plan from an EXPLAIN of the query (please expand all elements of the plan using the double-down arrow in the lower-right corner of the result pane first)?

Also please give us the output of :schema. My initial guess is that you don't have the right indexes in place to support quick lookup, so you are likely doing up to 5 different label scans for each of those 180k rows, which would explain the slowdown.

pa0lo · ‎06-27-2019

Thanks Andrew. Here is a screenshot of the explain result

I haven't created any indexes. Should I create those before running the load or in the same statement?

andrew_bowman · ‎06-27-2019

You will need to create those before your load (and figure out if it makes sense to create indexes alone or unique constraints (which include an index), depending on what makes sense in your data model).

A MERGE is like a MATCH followed by a CREATE (provided the match failed), so these include lookups by default, and without indexes you were performing 5 label scans across the 5 labels of the MERGEd nodes, and that was happening per row of your 180k rows, so 900k total label scans, each row getting a bit more costly as nodes were added to the associated labels.

If we add the appropriate indexes or unique constraints you'll be executing 5 index lookups (hopefully unique) per row, much more efficient.

Also when you MERGE you should only merge on the properties associated with the indexes you want to use. So if we assume that `id is unique to :Entity nodes, then you might have a unique constraint on :Entity(id), so we only want to MERGE on that and use ON CREATE SET for the remaining properties:

...
MERGE(e:Entity {id:row.ID})
ON CREATE SET uen = row.UEN_1, legalName = row.LEGAL_NM, is_primary = row.PRIMARY_IND
...

pa0lo · ‎06-27-2019

Got it! I'll try it that way. Thanks for the quick response

Neo4j

Import CSV hanging