Neo4j

nawar_alwesh · ‎08-15-2022

Hi

I am a newbie to neo4j, I would like to load a csv file and create node and relationship from that data

The csv looks like this

Task	sub_task1	sub_task2	sub_task3	Expertise
A	A1	A2	A3	Expert
B	B1	B2		Novice
C	C1			Novice

Task and subtask1, and Expertise cells can't be empty

I would like to create

A [has_child] A1 [has_child] A2 [has_child] A3

B [has_child] B1 [has_child] B2

C [has_child] C1

I was able to achieve that but I couldn't do the following

For each row, create a relationship between Expertise and the last nonempty cell with a property "Expert",or "Novice" depending on the data

So there should be a relationship created

between Expertise node and A3 with type "Expert"

between Expertise node and B2 with type "Novice"

between Expertise node and C1 with type "Novice"

I would really appreciate any help in this

Thanks

glilienfield · ‎08-16-2022

I was able to come up with something. I used some APOC procedures to make it easier to set the labels dynamically. Now that the query depends on APOC, you may want to leverage the 'apoc.do.case' procedure to replace the call subqueries that create the relationship type based on the value of expertise. It would be cleaner.

load csv with headers from "file:///Book2.csv" as row
with row.Task as task, [row.sub_task1, row.sub_task2, row.sub_task3] as subtasks, row.Expertise as expertise
with task, expertise, [x in subtasks where x is not null] as subtasks
merge(t:Task{name: task})
with t, expertise, subtasks
where size(subtasks) > 0
merge(s:sub_task1{name: subtasks[0]})
merge(t)-[:HAS_CHILD]->(s)
with expertise, subtasks, size(subtasks)-1 as maxIndex
call apoc.cypher.doIt("merge(n:sub_task" + (maxIndex + 1) + "{name: $name}) return n", {name: subtasks[maxIndex]}) yield value
call{
    with expertise, value
    with expertise, value.n as subtask
    where expertise = 'Expert'
    merge (subtask)-[:EXPERT]->()
}
call{
    with expertise, value
    with expertise, value.n as subtask
    where expertise = 'Novice'     
    merge(subtask)-[:NOVICE]->()
}
with subtasks, range(0, size(subtasks)-2) as indexes
where size(subtasks) > 1
unwind indexes as index
call apoc.cypher.doIt("merge(node:sub_task" + (index + 1) + "{name: $name}) return node", {name: subtasks[index]}) yield value as n
call apoc.cypher.doIt("merge(node:sub_task" + (index + 2) + "{name: $name}) return node", {name: subtasks[index+1]}) yield value as m
with n.node as n_node, m.node as m_node
merge(n_node)-[:HAS_CHILD]->(m_node)

The reason that the 'with' clause is repeated is because the cypher parser complains that the 'with' clause must be 'simple' when it is followed by a 'where' clause. The workaround is to have a second 'with' clause, so the first one is simple and the second one is paired with the 'where' clause.

View solution in original post

glilienfield · ‎08-18-2022

Unfortunately you can not create node labels nor relationship types dynamically. They must be explicitly specified. That is why I used apoc methods to do that. There other relevant apoc methods for creating nodes and relations where you can pass the labels and type. Look at the apoc.create family of procedures.

I don’t know of tutorials that discuss the approaches I used. I have been programming algorithms for some time as an engineer. I will say a good place to start us to understand all cypher list related stuff, such as list expressions, list comprehension, and list predicates. There is similar stuff for maps too.

https://neo4j.com/docs/cypher-manual/current/syntax/lists/

https://neo4j.com/docs/cypher-manual/current/syntax/operators/#query-operators-list

https://neo4j.com/docs/cypher-manual/current/functions/predicate/

https://neo4j.com/docs/cypher-manual/current/syntax/operators/#query-operators-map

https://neo4j.com/docs/cypher-manual/current/syntax/maps/

This reference card is very helpful

https://neo4j.com/docs/cypher-refcard/current/

View solution in original post

glilienfield · ‎08-16-2022

Since you stated you achieved the first part, I assume you successfully sealed with the empty cells and know the last node in the row. Given that node, you can create the relationship using an apoc procedure, as it lets you specify the relationship type as a string.

https://neo4j.com/labs/apoc/4.1/overview/apoc.create/apoc.create.relationship/

You can also achieve the same result using only cypher, but it is more difficult because cypher does not allow you to specify a relationship’s type as a parameter when creating a relationship. You can get around this by using a series of subqueries as follows.

call{

with n, m, expertise

with n, m, expertise

where expertise = ‘Expert’

merge (n)-[:EXPERT]->(m)

}

call {

with n, m, expertise

where expertise = ‘Novice’

merge (n)-[:NOVICE]->(m)

}

nawar_alwesh · ‎08-16-2022

Thank you so much for your reply. I haven't really started learning apoc, i am just first trying to get familiar with cypher before i move to apoc.

I think the problem i have is the way i am finding the last non empty cell. The following is the code i used, I am getting the correct parent child relationship between task-subtask1-subtask2-sub_task3. The problem is I am getting a relationship created with expertise and a the last non empty cell and all its parents which i don't want. I would really appreciate it if you can give me any advice on my code . Thanks

The output that I want is

A<-[:CHILD_OF]-A1<-[:CHILD_OF]-A2<-[:CHILD_OF]-A3-[:EXPERT]->expertise

B<-[:CHILD_OF]-B1<-[:CHILD_OF]-B2-[:NOVICE]->expertise

C<-[:CHILD_OF]-C1-[:NOVICE]->expertise

CREATE (e:Expertise {name:'Expertise'})

LOAD CSV WITH HEADERS FROM ‘test.csv' AS row

WITH row

MATCH(e:Expertise)

WHERE row.Task IS NOT NULL

MERGE (t:Task {name: row.Task})

// if Subtask1 field is not empty, then create that node and create parent-child relationship to the Task

FOREACH (x1 in SPLIT(row.sub_task1 , ',') |

MERGE (s1:S1 {name:x1}) MERGE (t)-[:HAS_CHILD]->(s1)

// if Subtask2 field is not empty, then create that node and create parent-child relationship to the S1

FOREACH (x2 in SPLIT(row.sub_task2 , ',') |

MERGE (s2:S2 {name:x2}) MERGE (s1)-[:HAS_CHILD]->(s2)

// if Subtask3 field is not empty, then create that node and create parent-child relationship to the S2

FOREACH (x3 in SPLIT(row.sub_task3 , ',') |

MERGE (s3:S3 {name:x3}) MERGE (s2)-[:HAS_CHILD]->(s3)

// for each expertise, create the relationship to the S3 node (since S3 wasn’t empty)

FOREACH (ignoreMe in CASE WHEN row.Expertise = 'Expert' THEN [1] ELSE [] END |

MERGE (s3)-[n:EXPERT]->(e)

)

FOREACH (ignoreMe in CASE WHEN row.Expertise = 'Novice' THEN [1] ELSE [] END |

MERGE (s3)-[n:NOVICE]->(e)

)

) //S3

// for each expertise, create the relationship to the S2 node (since S2 wasn’t empty and S3 was empty)

FOREACH (ignoreMe in CASE WHEN row.Expertise = 'Expert' THEN [1] ELSE [] END |

MERGE (s2)-[n:EXPERT]->(e)

)

FOREACH (ignoreMe in CASE WHEN row.Expertise = 'Novice' THEN [1] ELSE [] END |

MERGE (s2)-[n:NOVICE]->(e)

)

)//S2

// for each expertise, create the relationship to the S1 node (since S1 wasn’t empty and S2 was empty)

FOREACH (ignoreMe in CASE WHEN row.Expertise = 'Expert' THEN [1] ELSE [] END |

MERGE (s1)-[n:EXPERT]->(e)

)

FOREACH (ignoreMe in CASE WHEN row.Expertise = 'Novice' THEN [1] ELSE [] END |

MERGE (s1)-[n:NOVICE]->(e)

)

)//S1

RETURN c

glilienfield · ‎08-16-2022

Do you really want each subtask to have their own specific label, i.e. S1, S2, and S3, or can they all be a Task or Subtask instead? Is there are need for them to be separate labels? Typically they would be the same. The questions is whether you need to identify them differently than a Task, such as a Subtask.

glilienfield · ‎08-16-2022

I worked on it assuming the subtasks had a label of 'Task.' It can be changed to 'Subtask' in my query. The query has be written entirely differently if you want each subtask to have a different label.

The query may be a little complicated, as I assumed you can have zero to three subtasks. I also did not try to simplify it any; I just tried to get something that works.

Basically, it first creates the Task node, the last subtask node (if one exists), and creates the relationship from the last task node and the expertise node. That is lines 1-23.

Lines above 23 create the remaining subtasks and link them together.

You can add more subtask columns to your spreadsheet. You will just need to add them in the array on line 2.

I did it using pure cypher since you have not started using APOC.

load csv with headers from "file:///Book2.csv" as row
with row.Task as task, [row.sub_task1, row.sub_task2, row.sub_task3] as subtasks, row.Expertise as expertise
with task, expertise, [x in subtasks where x is not null] as subtasks
merge(t:Task{name: task})
with t, expertise, subtasks
where size(subtasks) > 0
merge(s:Task{name: subtasks[0]})
merge(t)-[:HAS_CHILD]->(s)
with expertise, subtasks, subtasks[size(subtasks)-1] as lastSubTask
call{
    with lastSubTask, expertise
    with lastSubTask, expertise
    where expertise = 'Expert'
    merge(m:Task{name: lastSubTask})
    MERGE (m)-[:EXPERT]->()
}
call{
    with lastSubTask, expertise
    with lastSubTask, expertise
    where expertise = 'Novice'
    merge(m:Task{name: lastSubTask})       
    merge(m)-[:NOVICE]->()
}
with subtasks, range(0, size(subtasks)-2) as indexes
where size(subtasks) > 1
unwind indexes as index
merge(n:Task{name: subtasks[index]})
merge(m:Task{name: subtasks[index+1]})
merge(n)-[:HAS_CHILD]->(m)

nawar_alwesh · ‎08-16-2022

Wow thank u sooo much for your help. Your code has taught me quite abit and no relationship at all to my poor attempt. I now feel i only scratched the surface with all the tutorials i have done.

I ran the code, it works perfectly. I definitely got the main idea of what you did but I am going to spend more time really understanding it. I really like the idea of subtasks[size(subtasks)-1] , didn't know u can that

I do need to have each sub_task as a label on its own, ie A1 is a node with label "sub_task1", not sure how i can modify your code to incorporate it cause if i understand the code correctly i have no idea which subtask the "lastSubTask" is from. I would really appreciate if u can help with this part , but if you can't then the amount of help already u have is greatly appreciated. I would appreciate also if u can explain why "with lastSubTask, expertise" is repeated twice, i have seen that before, never figured out why (i previously assumed it was a typo error)

Thanks again for all your help and time 🙂

glilienfield · ‎08-16-2022

I was able to come up with something. I used some APOC procedures to make it easier to set the labels dynamically. Now that the query depends on APOC, you may want to leverage the 'apoc.do.case' procedure to replace the call subqueries that create the relationship type based on the value of expertise. It would be cleaner.

load csv with headers from "file:///Book2.csv" as row
with row.Task as task, [row.sub_task1, row.sub_task2, row.sub_task3] as subtasks, row.Expertise as expertise
with task, expertise, [x in subtasks where x is not null] as subtasks
merge(t:Task{name: task})
with t, expertise, subtasks
where size(subtasks) > 0
merge(s:sub_task1{name: subtasks[0]})
merge(t)-[:HAS_CHILD]->(s)
with expertise, subtasks, size(subtasks)-1 as maxIndex
call apoc.cypher.doIt("merge(n:sub_task" + (maxIndex + 1) + "{name: $name}) return n", {name: subtasks[maxIndex]}) yield value
call{
    with expertise, value
    with expertise, value.n as subtask
    where expertise = 'Expert'
    merge (subtask)-[:EXPERT]->()
}
call{
    with expertise, value
    with expertise, value.n as subtask
    where expertise = 'Novice'     
    merge(subtask)-[:NOVICE]->()
}
with subtasks, range(0, size(subtasks)-2) as indexes
where size(subtasks) > 1
unwind indexes as index
call apoc.cypher.doIt("merge(node:sub_task" + (index + 1) + "{name: $name}) return node", {name: subtasks[index]}) yield value as n
call apoc.cypher.doIt("merge(node:sub_task" + (index + 2) + "{name: $name}) return node", {name: subtasks[index+1]}) yield value as m
with n.node as n_node, m.node as m_node
merge(n_node)-[:HAS_CHILD]->(m_node)

The reason that the 'with' clause is repeated is because the cypher parser complains that the 'with' clause must be 'simple' when it is followed by a 'where' clause. The workaround is to have a second 'with' clause, so the first one is simple and the second one is paired with the 'where' clause.

nawar_alwesh · ‎08-17-2022

Thank u so much, it worked perfectly. The best part i have learned from your code is how u manipulated subtasks and used it as an array, i didn't know u can do that. i need to look at more complex examples and advanced tutorials and need to learn how to use apoc. I am also curious if it is possible to pick up the headers and create an array similiar to subtasks and apply the same tricks u did instead of using apoc because that will remove any restrictions on the naming of those headers and they don't need to have a pattern. I will definitely look into this Thank u for all your effort and time, greatly appreciated

glilienfield · ‎08-18-2022

Yes, you can get the headers if you don't use 'with headers' and manipulate the row data a bit. He is a snippet of code you can use in the beginning of your query. It captures the subtask columns so you can use it for your labels, it then begins the import after skipping the first line (which has the header info) and extracts the stuff you need for the rest of the query I gave earlier.

load csv from "file:///Book2.csv" as header
with header[1..4] as subTaskHeaders
limit 1
load csv from "file:///Book2.csv" as row
with subTaskHeaders, row[0] as task, row[4] as expertise, [x in row[1..4] where x is not null] as nonNullSubtasks
skip 1
return task, nonNullSubtasks, expertise, subTaskHeaders

Sample output:

Merging the above with the earlier query, you get:

load csv from "file:///Book2.csv" as header
with header[1..4] as subTaskHeaders
limit 1
load csv from "file:///Book2.csv" as row
with subTaskHeaders, row[0] as task, row[4] as expertise, [x in row[1..4] where x is not null] as subtasks
skip 1
merge(t:Task{name: task})
with t, expertise, subtasks, subTaskHeaders
where size(subtasks) > 0
call apoc.cypher.doIt("merge(n:" + subTaskHeaders[0] + "{name: $name}) return n", {name: subtasks[0]}) yield value
with t, expertise, subtasks, subTaskHeaders, value.n as s
merge(t)-[:HAS_CHILD]->(s)
with expertise, subtasks, size(subtasks)-1 as maxIndex, subTaskHeaders
call apoc.cypher.doIt("merge(n:" + subTaskHeaders[maxIndex] + "{name: $name}) return n", {name: subtasks[maxIndex]}) yield value
call{
    with expertise, value
    with expertise, value.n as subtask
    where expertise = 'Expert'
    merge (subtask)-[:EXPERT]->()
}
call{
    with expertise, value
    with expertise, value.n as subtask
    where expertise = 'Novice'     
    merge(subtask)-[:NOVICE]->()
}
with subtasks, range(0, size(subtasks)-2) as indexes, subTaskHeaders
where size(subtasks) > 1
unwind indexes as index
call apoc.cypher.doIt("merge(node:" + subTaskHeaders[index] + "{name: $name}) return node", {name: subtasks[index]}) yield value as n
call apoc.cypher.doIt("merge(node:" + subTaskHeaders[index+1] + "{name: $name}) return node", {name: subtasks[index+1]}) yield value as m
with n.node as n_node, m.node as m_node
merge(n_node)-[:HAS_CHILD]->(m_node)

nawar_alwesh · ‎08-18-2022

Oh wow, thank u again 🙂. I really didn't want you to waste more of your time but thank u so much. I yesterday did figure out how to extract the headers but after a few hours gave up cause couldn't find a way to dynamically create labels from headers. eg if headers[0]= task, then something like merge(t:headers[0] {name: task}) but I didn't try using apoc as u did with the subtasks, i will try your code and also try using apoc to create the Task node with label from the header. All the cypher tutorials i have done don't show the complex things that u have demonstrated are there any tutorials that u can recommend that is of the level of code that u wrote?

Seriously thanks again for all you help

glilienfield · ‎08-18-2022

Unfortunately you can not create node labels nor relationship types dynamically. They must be explicitly specified. That is why I used apoc methods to do that. There other relevant apoc methods for creating nodes and relations where you can pass the labels and type. Look at the apoc.create family of procedures.

I don’t know of tutorials that discuss the approaches I used. I have been programming algorithms for some time as an engineer. I will say a good place to start us to understand all cypher list related stuff, such as list expressions, list comprehension, and list predicates. There is similar stuff for maps too.

https://neo4j.com/docs/cypher-manual/current/syntax/lists/

https://neo4j.com/docs/cypher-manual/current/syntax/operators/#query-operators-list

https://neo4j.com/docs/cypher-manual/current/functions/predicate/

https://neo4j.com/docs/cypher-manual/current/syntax/operators/#query-operators-map

https://neo4j.com/docs/cypher-manual/current/syntax/maps/

This reference card is very helpful

https://neo4j.com/docs/cypher-refcard/current/

nawar_alwesh · ‎08-18-2022

Thanks again for all the help, will definitely check out the links u sent and will start trying to get more familiar with apoc. To be honest i didn't expect much help here, first time for me post here but it has been great

Thanks again and best wishes

Neo4j

Help: How to create relationship with last nonempty cell in a row