cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Multiple Nodes in 1 column as target

lipchean
Node Link

Hi,

3X_6_f_6fafb4a4d3a1ae6bd2d3ba15bbbf70a135356323.png
The CSV shown above is the result of a manual manipulation from the original csv.

I then proceed to enter the following cypher.
LOAD CSV WITH HEADERS FROM 'file:///Data_3.csv' AS row
MERGE (m:Module {ID: row.ID, RefBy1:COALESCE(row.RefBy1,"NA") , RefBy2: COALESCE(row.RefBy2,"NA")})
return m

Followed by
MATCH (child: Module)
MATCH (parent: Module) where parent.ID = child.RefBy1
MERGE (child)-[:RefBy]->(parent)
RETURN child, parent

Followed by
MATCH (child: Module)
MATCH (parent: Module) where parent.ID = child.RefBy2
MERGE (child)-[:RefBy]->(parent)
RETURN child, parent

The original CSV file is
3X_e_d_ed3688b3569c8fdf5c9a18c22024f0d6ede157d0.png

Do you have any suggestion how i can create a single script to create the same model i showed above. I have tried using UNWIND, as shown below but nothing gets created
LOAD CSV WITH HEADERS FROM 'file:///Data_1.csv' AS row
WITH row.ID AS impID, SPLIT(row.RefBy, ",") AS multiRef
UNWIND multiRef as impRef
MATCH (LI:LineItem {LIName: impID})
MATCH (RB:ReferenceBy {RBName:TRIM(impRef)})
MERGE (LI)-[rel:RefBy]->(RB)
return LI, RB

Any suggestions? Thanks in advance.

Cheers,
LC

1 ACCEPTED SOLUTION

Try this. It creates the nodes first, then processes the file again to create the relationships.

load CSV with headers FROM 'file:///Data_1.csv' AS row
merge(n:Module{ID:row.ID})
UNION ALL
load CSV with headers FROM 'file:///Data_1.csv' AS row
with row
where row.RefBy is not null
match(n:Module{ID:row.ID})
with n, split(row.RefBy, ",") as references
unwind references as reference
match(m:Module{ID:reference})
merge(n)-[:RefBy]->(m)

Test Data:
3X_e_8_e82c8bccd42f647340730d38f845fdcacbd2be6b.png

Result:

View solution in original post

8 REPLIES 8

lipchean
Node Link

The model should look like this:

Try this. It creates the nodes first, then processes the file again to create the relationships.

load CSV with headers FROM 'file:///Data_1.csv' AS row
merge(n:Module{ID:row.ID})
UNION ALL
load CSV with headers FROM 'file:///Data_1.csv' AS row
with row
where row.RefBy is not null
match(n:Module{ID:row.ID})
with n, split(row.RefBy, ",") as references
unwind references as reference
match(m:Module{ID:reference})
merge(n)-[:RefBy]->(m)

Test Data:
3X_e_8_e82c8bccd42f647340730d38f845fdcacbd2be6b.png

Result:

Hi @glilienfield

Your solution works. I do have a follow up question, i.e. why is UNION ALL required? Is there a way where both row and reference are created from a single LOAD?

Thanks

hi @glilienfield

To make things even more complicated, how would you utilize the 'Value' column, such that the script will return the path with the largest aggregated value (should be A > C > D, because A = 2, C = 10, D = 3 for a total of 15)

3X_4_9_491d195a908bb34da83fe6eb4bdda39625bc5e0e.png

Thanks,
LC

This seems to work too.

load CSV with headers FROM 'file:///Data_1.csv' AS row
merge(n:Module{ID:row.ID})
with n, row
call {
with n, row
with n, split(row.RefBy, ",") as references
unwind references as reference
merge(m:Module{ID:reference})
merge(n)-[:RefBy]->(m)
}

Thanks @glilienfield
I'll try this too

First update the query to consume the value attribute. Store it as an attribute of the node.

load CSV with headers FROM 'file:///Data_1.csv' AS row
merge(n:Module{ID:row.ID, value:toInteger(row.value)})
with n, row
call {
with n, row
with n, split(row.RefBy, ",") as references
unwind references as reference
merge(m:Module{ID:reference})
merge(n)-[:RefBy]->(m)
}

You can then sum up the values along each path using the 'reduce' operation on the collection of nodes along each path, calculating the path's rank metric.

match p=(n:Module)-[:RefBy*]->(m:Module)
with nodes(p) as nodes
return [i in nodes | i.ID] as pathNodes, reduce(s=0, i in nodes | s + i.value) as rank
order by rank desc
limit 1

3X_6_9_693a4076d93130408c1ab563741d106100401c48.png

Note, the query does find intermediate paths, i.e. A->C. You can add a constraint to remove these if they are not needed and you only want full paths. I did not do so, since they would never have the largest rank since your values are all positive. Let me know if you may have negative values and want to eliminate these as a potential result.

You can run the queries separately or concatenate them to achieve the result in one operation.

Thanks for your very fast response @glilienfield , i'll try it out.