cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Load large CSV file with a list of value in node property

Hello, everyone.

I'm working to load a large CSV file (around 100MB) with 1M lines. I manage to load, but it takes around 2 and a half hour to complete the query.

I tried to create index to certain columns, but then I encounter an issue related to one of my node properties, which is an array of strings.

"Property value is too large to index" is the error that I received, and I think is tied to the property "parent_item", which in the file, it may contain more than 500 string values.

Query:

CREATE INDEX ON :BOM_COMP(item_id);
CREATE INDEX ON :BOM_COMP(item);
CREATE INDEX ON :BOM_COMP(parent_id);
CREATE INDEX ON :BOM_COMP(parent_item);

:auto USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM 'file:///items.csv' AS row
WITH DISTINCT row.Child_Item_ID as id,
row.Child_Item as item,
COLLECT(DISTINCT row.Parent_Item) as parent_item,
COLLECT(DISTINCT row.Parent_Item_ID) as parent_item_id
MERGE (i:BOM_COMP {item_id:id})
ON CREATE SET
i.item=item
i.parent_item=parent_item,
i.parent_item_id=parent_item_id;

Is there a way to optimize this load?

1 ACCEPTED SOLUTION

Bennu
Graph Fellow

Hi @david.casillas ,

You don't really need an index on parent_item yet. (We can discuss about the model of this property later). We need the one on item_id.

Try removing the index for a sec and lemme know.

View solution in original post

2 REPLIES 2

Bennu
Graph Fellow

Hi @david.casillas ,

You don't really need an index on parent_item yet. (We can discuss about the model of this property later). We need the one on item_id.

Try removing the index for a sec and lemme know.

Thanks, @Bennu.

Using only the index for parent_id solved the performance issue.