Avoiding duplicate Links

I have a set of connected activities (nodes) in CSV format: one file contains the activities and another the connections. I have no problem creating the nodes, but I cannot get the links created without duplicates.

Activities/Nodes

ResourceName	Min	Mode	Max
Case_Start	0	0.3	9.8
Create_Delivery	0	0	0
Create_Quotation	0	0	0
Create_Sales_Order_Item	0	0	0

Cypher to create the nodes:

LOAD CSV WITH HEADERS FROM "file:///ACTIVITIES_O2C.csv" AS row
CREATE (a:Activity {Name:row.ResourceName, Min: toFloat(row.Min), Mode: toFloat(row.Mode), Max: toFloat(row.Max), Cost: toFloat(row.CostRate)})

Connections/Links

ConnectorName	StartingActivity	EndingActivity	LinkProbability	Min	Mode	Max
Case_Start::Create_Sales_Order_Item	Case_Start	Create_Sales_Order_Item	70.38	0.00	0.00	0.00
Case_Start::Create_Delivery	Case_Start	Create_Delivery	24.77	0.00	0.00	0.00
Case_Start::Create_Quotation	Case_Start	Create_Quotation	4.84	0.00	0.00	0.00

Cypher to create the links:

LOAD CSV WITH HEADERS FROM "file:///CONNECTIONS_O2C.csv" AS row
MATCH (lft { Name: row.StartingActivity })
MATCH (rgt { Name: row.EndingActivity })
MERGE (lft)-[:FEEDS { Likelihood: toFloat(row.LinkProbability), Min: toFloat(row.Min), Mode: toFloat(row.Mode), Max: toFloat(row.Max) }]->(rgt)

The issue is that I get links created between all the nodes and even some circular links.

I know I'm doing something incorrectly. Just need help with the correct Cypher.

3 REPLIES

Try this for creating the links:

LOAD CSV WITH HEADERS FROM "file:///CONNECTIONS_O2C.csv" AS row
MATCH (lft { Name: row.StartingActivity })
MATCH (rgt { Name: row.EndingActivity })
MERGE (lft)-[r:FEEDS]->(rgt)
SET r.Likelihood = toFloat(row.LinkProbability),
    r.Min = toFloat(row.Min),
    r.Mode = toFloat(row.Mode),
    r.Max = toFloat(row.Max)

MERGE matches on the entire pattern, including any properties written inside it. In your original statement, a FEEDS relationship only counts as already existing if all four property values match exactly; otherwise a new relationship is created. I suspect your connection file has multiple entries that map the same start node to the same end node. The Cypher I wrote merges on the relationship type alone: if a FEEDS relationship already exists between the two nodes, it updates that relationship's properties instead of creating a second one.
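To make the difference concrete, here is a minimal sketch you could run in a scratch database. The :Demo label and the node names A, B and C are made up for this illustration and are not part of your data; the statements are separated by semicolons, so they assume a client that accepts multiple statements (such as cypher-shell).

```cypher
// Hypothetical throwaway nodes; the :Demo label is not part of the thread's data.
CREATE (:Demo {Name: 'A'}), (:Demo {Name: 'B'}), (:Demo {Name: 'C'});

// Properties inside the MERGE pattern: the second statement finds no FEEDS
// relationship with Likelihood 24.77, so it creates another one.
MATCH (a:Demo {Name: 'A'}), (b:Demo {Name: 'B'})
MERGE (a)-[:FEEDS {Likelihood: 70.38}]->(b);

MATCH (a:Demo {Name: 'A'}), (b:Demo {Name: 'B'})
MERGE (a)-[:FEEDS {Likelihood: 24.77}]->(b);

// Properties set after MERGE: the second statement matches the existing
// relationship and overwrites its Likelihood.
MATCH (a:Demo {Name: 'A'}), (c:Demo {Name: 'C'})
MERGE (a)-[r:FEEDS]->(c)
SET r.Likelihood = 70.38;

MATCH (a:Demo {Name: 'A'}), (c:Demo {Name: 'C'})
MERGE (a)-[r:FEEDS]->(c)
SET r.Likelihood = 24.77;

// Count the relationships per node pair.
MATCH (a:Demo)-[r:FEEDS]->(b:Demo)
RETURN a.Name AS source, b.Name AS target, count(r) AS links;
```

The A to B pair should end up with two FEEDS relationships and the A to C pair with one. Note that with MERGE followed by SET, the last row loaded for a given pair wins, so this only makes sense if each start/end pair is meant to have a single link.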

As for the circular paths, I would validate the source data again. Since you are getting both multiple relationships between nodes and circular paths, I would double-check how the CSVs are being generated.
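One way to do that check without leaving Cypher is sketched below. It assumes the file name and column headers from your post, and that the file loads with the default field terminator, as it does in your own LOAD CSV statement. The first query lists start/end pairs that occur more than once in the file or that point back to the same activity; the second runs the same test against the relationships already loaded into the graph.

```cypher
// Check the source file: repeated start/end pairs and self-referencing rows.
LOAD CSV WITH HEADERS FROM "file:///CONNECTIONS_O2C.csv" AS row
WITH row.StartingActivity AS source, row.EndingActivity AS target, count(*) AS rowCount
WHERE rowCount > 1 OR source = target
RETURN source, target, rowCount
ORDER BY rowCount DESC;

// Check the loaded graph: node pairs with more than one FEEDS relationship,
// or an activity that feeds itself.
MATCH (a:Activity)-[r:FEEDS]->(b:Activity)
WITH a, b, count(r) AS links
WHERE links > 1 OR a = b
RETURN a.Name AS source, b.Name AS target, links
ORDER BY links DESC;
```

If both queries return no rows, the file has no duplicate pairs or self-links and the graph has no duplicate relationships.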

Thanks Mike, but that doesn't solve the problem. And, no, I do not have duplicates in the data file. If after loading the data I run the Cypher statement

MATCH p=(:Activity {Name: 'Case_Start'})-[r:FEEDS]->() RETURN p

I get the following result

Actually, turns out I was wrong all this time. The graph created is perfectly correct. Thanks for the challenge Mike.