Creating a relationship to form tree structure

Hello all,

I've created a (p:childnode)-[:HAS_CHILD]->(c:childnode) relationship to form a tree structure from my database. Both child and parent nodes share the label 'childnode', since a child can also be a parent. A sample of the data looks like this:

child_id | parent_id
   39    |    1000
   40    |     39
   41    |     40
   42    |     39

I used the following query to create the relationship

MATCH
  (c:childnode {childId: childId}),
  (p:childnode {parentId: parentId})

MERGE (p)-[:HAS_CHILD]->(c)

When I run this query, every parent is connected to its children (parent->child), but the result is a set of disconnected clusters, so I'm not able to traverse to the second level. For example, when I try to return a path from 39 to 41, the result is "(no changes, no records)".

The expected result is to obtain a tree structure so that I will be able to apply the path-finding algorithms to it.

Thanks in advance

andy_hegedus
Graph Fellow

Hi,

In your MATCH clause, make sure you have a variable-length relationship such as [:HAS_CHILD*0..2]. This matches 0, 1, or 2 hops of the relationship type HAS_CHILD; set the bounds to the range you are seeking.
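
As a rough sketch of what such a query could look like (the property name childid and the starting value 39 are assumptions taken from the question, not a confirmed schema):

```cypher
// Return every path of 0 to 2 HAS_CHILD hops starting at node 39.
// Assumes nodes carry an integer property named childid.
MATCH path = (p:childnode {childid: 39})-[:HAS_CHILD*0..2]->(c:childnode)
RETURN path
```

The lower bound of 0 means the start node itself is also returned as a zero-length path; use *1..2 to exclude it.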
Andy

I have tried this solution before. The problem is that every child node with a childId is linked to its respective parent node with a parentId. When I try to return a path using a variable-length relationship, I don't get an answer, because the intermediate node acts only as a child. The expectation is that it should act as both a child and a parent. In the screenshot below, all the child nodes are intermediate nodes.

The central node is 39, acting as a parent, and all other nodes are child nodes.
As I have created the relationship as (parent)-[:HAS_CHILD]->(child), it doesn't show the children of the child nodes.

Hmmm?

I just tried a query on a database with a similar structure and got the expected result, so I am wondering if something is different in the construction of your initial data set. For the query you have, can you click on one of the child nodes and expand it? I am interested in what Neo4j thinks the connections are.
Andy

I tried expanding the child nodes and none of them expanded. I am not able to understand what this means because, when I try the same query for parent id '40', I get a graph similar to above with its child nodes.

My guess is that the relationships were not created. When I created my database, which is a large tree structure (250K+ nodes), one thing I perhaps did a bit differently was the creation of the relationships. I notice that you have explicit property keys that state parent_id and child_id. Instead of that scheme, I created nodes with just unique IDs and then created the relationships; there is no explicit property key attached to the node with that information, I just let the relationship do that. For example, if a node has multiple children, then attaching a single property key with a child ID does not make sense, since there are many. May I ask what commands you used in the creation of the database?
Andy

Sure, I used the following query to load and create the child and parent nodes

// Create an index on the child ID (childid)
CREATE INDEX FOR (c:childnode) ON (c.childid)

// Stage 1: load the child nodes from the CSV
:auto USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM 'file:///node.csv' AS row
WITH toInteger(row.child_id) AS childid, row.child_name AS childname
MERGE (c:childnode {childid: childid, c_name: childname})

// Stage 2: load the parent nodes from the same CSV
:auto USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM 'file:///node.csv' AS row
WITH toInteger(row.parent_id) AS parentid, row.parent_name AS parentname
MERGE (p:childnode {parentid: parentid, p_name: parentname})

// Stage 3: create the relationships
:auto USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM 'file:///node.csv' AS row
WITH toInteger(row.parent_id) AS parentid, toInteger(row.child_id) AS childid,
     row.child_name AS childname, row.parent_name AS parentname
MATCH
  (c:childnode {childid: childid, c_name: childname}),
  (p:childnode {parentid: parentid, p_name: parentname})
MERGE (p)-[:HAS_CHILD]->(c)

I'm working on a large dataset, so I'm implementing the load in three stages. Also, I cannot understand the sentence "but there is no explicit property key attached to the node with that information, I just let the relationship do that". Could you explain this a bit?

Best,
Ankita R

Hi,
Thinking about your data, I would approach it a bit differently (and hopefully correctly, but in the land of the blind, the one-eyed is king). So here goes.
When you create a node (childnode), it has property keys for either the parent ID and parent name or the child ID and child name. So you are describing the nodes by their relationships rather than describing the nodes themselves. As a check, count how many childnode nodes there are in your database. It should be the number of unique individuals. I fear you may be creating two nodes for every individual, one as a parent and a second as a child, when they should be one entity.
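
One way this check could be run, as a sketch that assumes the property names childid and parentid from the load script earlier in the thread:

```cypher
// count(expr) only counts rows where expr is not null, so the last two
// columns show how many nodes were created in each loading stage.
MATCH (n:childnode)
RETURN count(n)          AS totalNodes,
       count(n.childid)  AS nodesWithChildId,
       count(n.parentid) AS nodesWithParentId
```

If totalNodes is roughly the sum of the other two columns, and no node has both properties, each individual has most likely been stored twice.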

One approach is to give each entity two properties, an ID and a name, and have those keys describe only that entity (node). Then, when you create the relationship (:HAS_CHILD in this case), all you need to create are those one-to-one relationships.

By way of example my data CSV file looked like this

From there, I loaded each cpc as a node with the field as an ID. Then I loaded the parent as a cpc node with its field as an ID, to capture any that were not in the first list. So I only have cpc nodes at this point. Then I created the relationship by matching on the two fields. This is what I mean by letting the database handle the child relationship ID.
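
Applied to the childnode data in this thread, the same three steps might look roughly like the sketch below. The property names id and name are illustrative choices, and the CSV columns (child_id, child_name, parent_id, parent_name) are assumed from the load script earlier in the thread; this is not the exact script used for the cpc data.

```cypher
// Step 1: one node per child ID.
LOAD CSV WITH HEADERS FROM 'file:///node.csv' AS row
MERGE (c:childnode {id: toInteger(row.child_id)})
SET c.name = row.child_name;

// Step 2: add any parent that never appears in the child column.
// MERGE on the same id property reuses nodes created in step 1.
LOAD CSV WITH HEADERS FROM 'file:///node.csv' AS row
WITH row WHERE row.parent_id IS NOT NULL
MERGE (p:childnode {id: toInteger(row.parent_id)})
ON CREATE SET p.name = row.parent_name;

// Step 3: each CSV row describes exactly one relationship.
LOAD CSV WITH HEADERS FROM 'file:///node.csv' AS row
MATCH (p:childnode {id: toInteger(row.parent_id)})
MATCH (c:childnode {id: toInteger(row.child_id)})
MERGE (p)-[:HAS_CHILD]->(c);
```

Because both passes merge on the single id property, node 39 exists once and can be both the child of 1000 and the parent of 40 and 42. For a large file, each statement would still need to be batched, for example with the periodic commit used earlier in the thread.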
Is this of help?
Andy

Ahhh, so my understanding is that I need to load the child nodes with label :childnode and property childid. Then I should create parentnode with the same label :childnode and property childid but this time I should load the parentid column.
By doing this there will only be a single property childid for both parent and child nodes.
Am I correct?

It really isn't a childid or a parentid, it is just an id. Each row is then the expression of a relationship: the id in the parent field defines one end of the relationship and the id in the child field defines the other end.
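
To make this concrete, here is a small sketch using the four sample rows from the question, written inline instead of read from a CSV. The property name id is an illustrative choice, and the two statements are meant to be run in order (for example in cypher-shell or the Browser's multi-statement editor).

```cypher
// Each inner list is one row from the question: [child_id, parent_id].
UNWIND [[39, 1000], [40, 39], [41, 40], [42, 39]] AS row
MERGE (c:childnode {id: row[0]})
MERGE (p:childnode {id: row[1]})
MERGE (p)-[:HAS_CHILD]->(c);

// Node 40 is a single node, so the path 39 -> 40 -> 41 now exists.
MATCH path = (:childnode {id: 39})-[:HAS_CHILD*]->(:childnode {id: 41})
RETURN [n IN nodes(path) | n.id] AS ids;
```

On an empty database, the second statement should return a single row with ids = [39, 40, 41], which is the two-level traversal the original query could not find.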
Andy

I think I'm able to understand. I will try this solution and get back to you with the modified query.

Thanks a lot for your help.

Best,
Ankita R

andy_hegedus
Graph Fellow

Good luck.

Also, make sure you create a uniqueness constraint on the ID. For example, in my case the field is called subgroup, so this makes each ID unique:

CREATE CONSTRAINT ON (a:cpc) ASSERT a.subgroup IS UNIQUE
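
For the childnode data in this thread, the equivalent constraint might look like the sketch below. The property name id and the constraint name childnode_id_unique are illustrative assumptions. The ON ... ASSERT form is the older syntax used above; Neo4j 5 accepts only the FOR ... REQUIRE form, so run whichever one matches your version, not both.

```cypher
// Older syntax (Neo4j 4.x), matching the statement above.
CREATE CONSTRAINT ON (n:childnode) ASSERT n.id IS UNIQUE;

// Newer syntax (Neo4j 4.4 and later; the only form in Neo4j 5).
CREATE CONSTRAINT childnode_id_unique IF NOT EXISTS
FOR (n:childnode) REQUIRE n.id IS UNIQUE;
```

A uniqueness constraint is backed by an index, so it also speeds up the MERGE and MATCH lookups on that property during loading.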

Andy

Hello Andy,

So I created the constraint, the node, and the relationship based on your solution and it worked. I'm now able to get tree structures and traverse them.

Thank you so much...!!!!

Best,
Ankita R

andy_hegedus
Graph Fellow

Great!

Glad I could be of help.
Andy