Neo4j

dariusaudryc1 · ‎05-24-2019

Hi,

I want to ask whether cypher may miss merging relationship during csv load.

So I want to load csv data into neo4j using the following query:

Using Periodic Commit 10000
    Load CSV with headers from 'file:///{filename}' as row
    FIELDTERMINATOR '|'

MATCH (c:user{user_id: row.user_id} )
MATCH (d:service_provider{sp_id: row.sp_id} )
MATCH (b:trx{order_id: row.order_id} )
MERGE (c)-[:creates]->(b)
MERGE (d)-[:completes]->(b)

//user device information
WITH d, c, row
OPTIONAL MATCH (c_deviceid:device_id {device_id: row.user_device_id})
FOREACH (i IN CASE WHEN c_deviceid IS NOT NULL THEN [c_deviceid] ELSE [] END |
	MERGE (c)-[:associates]-(c_deviceid)
)
WITH d, c, row
OPTIONAL MATCH (c_d1:d1{{d1: row.user_d1}})
FOREACH (i IN CASE WHEN c_d1 IS NOT NULL THEN [c_d1] ELSE [] END |
	MERGE (c)-[:associates]-(c_d1)
)
WITH d, c, row
OPTIONAL MATCH (c_devtoken:devtoken{{devtoken: row.user_devtoken}})
FOREACH (i IN CASE WHEN c_devtoken IS NOT NULL THEN [c_devtoken] ELSE [] END |
	MERGE (c)-[:associates]-(c_devtoken)
)

//sp device information
WITH d, row
OPTIONAL MATCH (d_deviceid:device_id {device_id: row.sp_device_id})
FOREACH (i IN CASE WHEN d_deviceid IS NOT NULL THEN [d_deviceid] ELSE [] END |
	MERGE (d)-[:associates]-(d_deviceid)
)
WITH d, row
OPTIONAL MATCH (d_d1:d1{{d1: row.sp_d1}})
FOREACH (i IN CASE WHEN d_d1 IS NOT NULL THEN [d_d1] ELSE [] END |
	MERGE (d)-[:associates]-(d_d1)
)
WITH d, row
OPTIONAL MATCH (d_devtoken:devtoken{{devtoken: row.sp_devtoken}})
FOREACH (i IN CASE WHEN d_devtoken IS NOT NULL THEN [d_devtoken] ELSE [] END |
	MERGE (d)-[:associates]-(d_devtoken)
)
;

This CQL only merges relationship since the node creation is done on separate CQL script. So the issue that I have is that after running this script, relationships with devtoken and d1 are not created while device_id is created. There is no error in the creation process.

Maybe what I want to understand more is how the load csv works. My current assumption is that the load csv will take the data per row and apply the graph data schema on the SQL. My guess is that the relationship is not created due to the optional match.

Can someone give some light on this problem?

jsmccrumb · ‎05-24-2019

Can you confirm that it is finding some of each of those nodes that are not being related?

LOAD CSV WITH HEADERS FROM 'file:///{filename}' as row
FIELD TERMINATOR '|'
MATCH (c:user{user_id: row.user_id} )
MATCH (d:service_provider{sp_id: row.sp_id} )
MATCH (b:trx{order_id: row.order_id} )
MATCH (c_devtoken:devtoken{{devtoken: row.user_devtoken}})
RETURN * LIMIT 3

This uses match to require the :devtoken node can be found, and returns first 3 that it can find from the file.
Then do the same thing replacing the c_devtoken line with
MATCH (d_d1:d1{{d1: row.sp_d1}})

and then
MATCH (d_devtoken:devtoken{{devtoken: row.sp_devtoken}})

My first thought is there is a disconnect between the CSV file and the properties on the node (extra space or type difference) so it cannot find the nodes you want to build the relationships to, whereas for :device_id it does find the nodes which is why that segment works.

dariusaudryc1 · ‎05-24-2019

Hey Jacob, thanks for the reply.

Yes I have confirmed it that it can be query. The csv file that I used to defined the node is also the same csv file to define the relationship. So I have this huge csv file for both.

What I did is that I just separate them. I tried casting all of them as a string. but the result remain the same.

One thing though, the devtoken and d1 have null value in it. I assume that the issue can arise if the load csv function in cypher reads the whole column as one rather than going row per row.

jsmccrumb · ‎05-24-2019

Could you post the query you used to create the nodes?

The null values are because it cannot match a node for that row, it could be a simple thing as a typo between the query to create the nodes and the query to create the relationships. Since the one relationship is being created but not the others, I would lean towards a minor typo between the two queries (this happens very often to me when I am writing queries...).

ameyasoft · ‎05-24-2019

Try adding WITH row before the first MATCH statement

Load CSV with headers from 'file:///{filename}' as row
FIELDTERMINATOR '|'

WITH row

MATCH (c:user{user_id: row.user_id} )
..........

Neo4j

Relationship Load Failure