Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-15-2022 06:26 PM - edited 08-15-2022 06:29 PM
Hi
I am a newbie to neo4j, I would like to load a csv file and create node and relationship from that data
The csv looks like this
Task | sub_task1 | sub_task2 | sub_task3 | Expertise |
A | A1 | A2 | A3 | Expert |
B | B1 | B2 | Novice | |
C | C1 | Novice |
Task and subtask1, and Expertise cells can't be empty
I would like to create
A [has_child] A1 [has_child] A2 [has_child] A3
B [has_child] B1 [has_child] B2
C [has_child] C1
I was able to achieve that but I couldn't do the following
For each row, create a relationship between Expertise and the last nonempty cell with a property "Expert",or "Novice" depending on the data
So there should be a relationship created
between Expertise node and A3 with type "Expert"
between Expertise node and B2 with type "Novice"
between Expertise node and C1 with type "Novice"
I would really appreciate any help in this
Thanks
Solved! Go to Solution.
08-16-2022 09:17 PM
I was able to come up with something. I used some APOC procedures to make it easier to set the labels dynamically. Now that the query depends on APOC, you may want to leverage the 'apoc.do.case' procedure to replace the call subqueries that create the relationship type based on the value of expertise. It would be cleaner.
load csv with headers from "file:///Book2.csv" as row
with row.Task as task, [row.sub_task1, row.sub_task2, row.sub_task3] as subtasks, row.Expertise as expertise
with task, expertise, [x in subtasks where x is not null] as subtasks
merge(t:Task{name: task})
with t, expertise, subtasks
where size(subtasks) > 0
merge(s:sub_task1{name: subtasks[0]})
merge(t)-[:HAS_CHILD]->(s)
with expertise, subtasks, size(subtasks)-1 as maxIndex
call apoc.cypher.doIt("merge(n:sub_task" + (maxIndex + 1) + "{name: $name}) return n", {name: subtasks[maxIndex]}) yield value
call{
with expertise, value
with expertise, value.n as subtask
where expertise = 'Expert'
merge (subtask)-[:EXPERT]->()
}
call{
with expertise, value
with expertise, value.n as subtask
where expertise = 'Novice'
merge(subtask)-[:NOVICE]->()
}
with subtasks, range(0, size(subtasks)-2) as indexes
where size(subtasks) > 1
unwind indexes as index
call apoc.cypher.doIt("merge(node:sub_task" + (index + 1) + "{name: $name}) return node", {name: subtasks[index]}) yield value as n
call apoc.cypher.doIt("merge(node:sub_task" + (index + 2) + "{name: $name}) return node", {name: subtasks[index+1]}) yield value as m
with n.node as n_node, m.node as m_node
merge(n_node)-[:HAS_CHILD]->(m_node)
The reason that the 'with' clause is repeated is because the cypher parser complains that the 'with' clause must be 'simple' when it is followed by a 'where' clause. The workaround is to have a second 'with' clause, so the first one is simple and the second one is paired with the 'where' clause.
08-18-2022 07:56 PM - edited 08-18-2022 07:56 PM
Unfortunately you can not create node labels nor relationship types dynamically. They must be explicitly specified. That is why I used apoc methods to do that. There other relevant apoc methods for creating nodes and relations where you can pass the labels and type. Look at the apoc.create family of procedures.
I don’t know of tutorials that discuss the approaches I used. I have been programming algorithms for some time as an engineer. I will say a good place to start us to understand all cypher list related stuff, such as list expressions, list comprehension, and list predicates. There is similar stuff for maps too.
https://neo4j.com/docs/cypher-manual/current/syntax/lists/
https://neo4j.com/docs/cypher-manual/current/syntax/operators/#query-operators-list
https://neo4j.com/docs/cypher-manual/current/functions/predicate/
https://neo4j.com/docs/cypher-manual/current/syntax/operators/#query-operators-map
https://neo4j.com/docs/cypher-manual/current/syntax/maps/
This reference card is very helpful
08-16-2022 05:10 AM
Since you stated you achieved the first part, I assume you successfully sealed with the empty cells and know the last node in the row. Given that node, you can create the relationship using an apoc procedure, as it lets you specify the relationship type as a string.
https://neo4j.com/labs/apoc/4.1/overview/apoc.create/apoc.create.relationship/
You can also achieve the same result using only cypher, but it is more difficult because cypher does not allow you to specify a relationship’s type as a parameter when creating a relationship. You can get around this by using a series of subqueries as follows.
call{
with n, m, expertise
with n, m, expertise
where expertise = ‘Expert’
merge (n)-[:EXPERT]->(m)
}
call {
with n, m, expertise
with n, m, expertise
where expertise = ‘Novice’
merge (n)-[:NOVICE]->(m)
}
08-16-2022 02:38 PM - edited 08-16-2022 02:44 PM
Thank you so much for your reply. I haven't really started learning apoc, i am just first trying to get familiar with cypher before i move to apoc.
I think the problem i have is the way i am finding the last non empty cell. The following is the code i used, I am getting the correct parent child relationship between task-subtask1-subtask2-sub_task3. The problem is I am getting a relationship created with expertise and a the last non empty cell and all its parents which i don't want. I would really appreciate it if you can give me any advice on my code . Thanks
The output that I want is
A<-[:CHILD_OF]-A1<-[:CHILD_OF]-A2<-[:CHILD_OF]-A3-[:EXPERT]->expertise
B<-[:CHILD_OF]-B1<-[:CHILD_OF]-B2-[:NOVICE]->expertise
C<-[:CHILD_OF]-C1-[:NOVICE]->expertise
08-16-2022 03:32 PM
Do you really want each subtask to have their own specific label, i.e. S1, S2, and S3, or can they all be a Task or Subtask instead? Is there are need for them to be separate labels? Typically they would be the same. The questions is whether you need to identify them differently than a Task, such as a Subtask.
08-16-2022 04:37 PM - edited 08-16-2022 04:38 PM
I worked on it assuming the subtasks had a label of 'Task.' It can be changed to 'Subtask' in my query. The query has be written entirely differently if you want each subtask to have a different label.
The query may be a little complicated, as I assumed you can have zero to three subtasks. I also did not try to simplify it any; I just tried to get something that works.
Basically, it first creates the Task node, the last subtask node (if one exists), and creates the relationship from the last task node and the expertise node. That is lines 1-23.
Lines above 23 create the remaining subtasks and link them together.
You can add more subtask columns to your spreadsheet. You will just need to add them in the array on line 2.
I did it using pure cypher since you have not started using APOC.
load csv with headers from "file:///Book2.csv" as row
with row.Task as task, [row.sub_task1, row.sub_task2, row.sub_task3] as subtasks, row.Expertise as expertise
with task, expertise, [x in subtasks where x is not null] as subtasks
merge(t:Task{name: task})
with t, expertise, subtasks
where size(subtasks) > 0
merge(s:Task{name: subtasks[0]})
merge(t)-[:HAS_CHILD]->(s)
with expertise, subtasks, subtasks[size(subtasks)-1] as lastSubTask
call{
with lastSubTask, expertise
with lastSubTask, expertise
where expertise = 'Expert'
merge(m:Task{name: lastSubTask})
MERGE (m)-[:EXPERT]->()
}
call{
with lastSubTask, expertise
with lastSubTask, expertise
where expertise = 'Novice'
merge(m:Task{name: lastSubTask})
merge(m)-[:NOVICE]->()
}
with subtasks, range(0, size(subtasks)-2) as indexes
where size(subtasks) > 1
unwind indexes as index
merge(n:Task{name: subtasks[index]})
merge(m:Task{name: subtasks[index+1]})
merge(n)-[:HAS_CHILD]->(m)
08-16-2022 05:30 PM
Wow thank u sooo much for your help. Your code has taught me quite abit and no relationship at all to my poor attempt. I now feel i only scratched the surface with all the tutorials i have done.
I ran the code, it works perfectly. I definitely got the main idea of what you did but I am going to spend more time really understanding it. I really like the idea of subtasks[size(subtasks)-1] , didn't know u can that
I do need to have each sub_task as a label on its own, ie A1 is a node with label "sub_task1", not sure how i can modify your code to incorporate it cause if i understand the code correctly i have no idea which subtask the "lastSubTask" is from. I would really appreciate if u can help with this part , but if you can't then the amount of help already u have is greatly appreciated. I would appreciate also if u can explain why "with lastSubTask, expertise" is repeated twice, i have seen that before, never figured out why (i previously assumed it was a typo error)
Thanks again for all your help and time 🙂
08-16-2022 09:17 PM
I was able to come up with something. I used some APOC procedures to make it easier to set the labels dynamically. Now that the query depends on APOC, you may want to leverage the 'apoc.do.case' procedure to replace the call subqueries that create the relationship type based on the value of expertise. It would be cleaner.
load csv with headers from "file:///Book2.csv" as row
with row.Task as task, [row.sub_task1, row.sub_task2, row.sub_task3] as subtasks, row.Expertise as expertise
with task, expertise, [x in subtasks where x is not null] as subtasks
merge(t:Task{name: task})
with t, expertise, subtasks
where size(subtasks) > 0
merge(s:sub_task1{name: subtasks[0]})
merge(t)-[:HAS_CHILD]->(s)
with expertise, subtasks, size(subtasks)-1 as maxIndex
call apoc.cypher.doIt("merge(n:sub_task" + (maxIndex + 1) + "{name: $name}) return n", {name: subtasks[maxIndex]}) yield value
call{
with expertise, value
with expertise, value.n as subtask
where expertise = 'Expert'
merge (subtask)-[:EXPERT]->()
}
call{
with expertise, value
with expertise, value.n as subtask
where expertise = 'Novice'
merge(subtask)-[:NOVICE]->()
}
with subtasks, range(0, size(subtasks)-2) as indexes
where size(subtasks) > 1
unwind indexes as index
call apoc.cypher.doIt("merge(node:sub_task" + (index + 1) + "{name: $name}) return node", {name: subtasks[index]}) yield value as n
call apoc.cypher.doIt("merge(node:sub_task" + (index + 2) + "{name: $name}) return node", {name: subtasks[index+1]}) yield value as m
with n.node as n_node, m.node as m_node
merge(n_node)-[:HAS_CHILD]->(m_node)
The reason that the 'with' clause is repeated is because the cypher parser complains that the 'with' clause must be 'simple' when it is followed by a 'where' clause. The workaround is to have a second 'with' clause, so the first one is simple and the second one is paired with the 'where' clause.
08-17-2022 02:16 PM
Thank u so much, it worked perfectly. The best part i have learned from your code is how u manipulated subtasks and used it as an array, i didn't know u can do that. i need to look at more complex examples and advanced tutorials and need to learn how to use apoc. I am also curious if it is possible to pick up the headers and create an array similiar to subtasks and apply the same tricks u did instead of using apoc because that will remove any restrictions on the naming of those headers and they don't need to have a pattern. I will definitely look into this Thank u for all your effort and time, greatly appreciated
08-18-2022 11:27 AM - edited 08-18-2022 01:00 PM
Yes, you can get the headers if you don't use 'with headers' and manipulate the row data a bit. He is a snippet of code you can use in the beginning of your query. It captures the subtask columns so you can use it for your labels, it then begins the import after skipping the first line (which has the header info) and extracts the stuff you need for the rest of the query I gave earlier.
load csv from "file:///Book2.csv" as header
with header[1..4] as subTaskHeaders
limit 1
load csv from "file:///Book2.csv" as row
with subTaskHeaders, row[0] as task, row[4] as expertise, [x in row[1..4] where x is not null] as nonNullSubtasks
skip 1
return task, nonNullSubtasks, expertise, subTaskHeaders
Sample output:
Merging the above with the earlier query, you get:
load csv from "file:///Book2.csv" as header
with header[1..4] as subTaskHeaders
limit 1
load csv from "file:///Book2.csv" as row
with subTaskHeaders, row[0] as task, row[4] as expertise, [x in row[1..4] where x is not null] as subtasks
skip 1
merge(t:Task{name: task})
with t, expertise, subtasks, subTaskHeaders
where size(subtasks) > 0
call apoc.cypher.doIt("merge(n:" + subTaskHeaders[0] + "{name: $name}) return n", {name: subtasks[0]}) yield value
with t, expertise, subtasks, subTaskHeaders, value.n as s
merge(t)-[:HAS_CHILD]->(s)
with expertise, subtasks, size(subtasks)-1 as maxIndex, subTaskHeaders
call apoc.cypher.doIt("merge(n:" + subTaskHeaders[maxIndex] + "{name: $name}) return n", {name: subtasks[maxIndex]}) yield value
call{
with expertise, value
with expertise, value.n as subtask
where expertise = 'Expert'
merge (subtask)-[:EXPERT]->()
}
call{
with expertise, value
with expertise, value.n as subtask
where expertise = 'Novice'
merge(subtask)-[:NOVICE]->()
}
with subtasks, range(0, size(subtasks)-2) as indexes, subTaskHeaders
where size(subtasks) > 1
unwind indexes as index
call apoc.cypher.doIt("merge(node:" + subTaskHeaders[index] + "{name: $name}) return node", {name: subtasks[index]}) yield value as n
call apoc.cypher.doIt("merge(node:" + subTaskHeaders[index+1] + "{name: $name}) return node", {name: subtasks[index+1]}) yield value as m
with n.node as n_node, m.node as m_node
merge(n_node)-[:HAS_CHILD]->(m_node)
08-18-2022 04:27 PM
Oh wow, thank u again 🙂. I really didn't want you to waste more of your time but thank u so much. I yesterday did figure out how to extract the headers but after a few hours gave up cause couldn't find a way to dynamically create labels from headers. eg if headers[0]= task, then something like merge(t:headers[0] {name: task}) but I didn't try using apoc as u did with the subtasks, i will try your code and also try using apoc to create the Task node with label from the header. All the cypher tutorials i have done don't show the complex things that u have demonstrated are there any tutorials that u can recommend that is of the level of code that u wrote?
Seriously thanks again for all you help
08-18-2022 07:56 PM - edited 08-18-2022 07:56 PM
Unfortunately you can not create node labels nor relationship types dynamically. They must be explicitly specified. That is why I used apoc methods to do that. There other relevant apoc methods for creating nodes and relations where you can pass the labels and type. Look at the apoc.create family of procedures.
I don’t know of tutorials that discuss the approaches I used. I have been programming algorithms for some time as an engineer. I will say a good place to start us to understand all cypher list related stuff, such as list expressions, list comprehension, and list predicates. There is similar stuff for maps too.
https://neo4j.com/docs/cypher-manual/current/syntax/lists/
https://neo4j.com/docs/cypher-manual/current/syntax/operators/#query-operators-list
https://neo4j.com/docs/cypher-manual/current/functions/predicate/
https://neo4j.com/docs/cypher-manual/current/syntax/operators/#query-operators-map
https://neo4j.com/docs/cypher-manual/current/syntax/maps/
This reference card is very helpful
08-18-2022 08:06 PM
Thanks again for all the help, will definitely check out the links u sent and will start trying to get more familiar with apoc. To be honest i didn't expect much help here, first time for me post here but it has been great
Thanks again and best wishes
All the sessions of the conference are now available online