Alternatives: Creating somewhat of a tree diagram per column

Reuben
Graph Buddy

Please, I am creating somewhat of a tree diagram from the columns of a CSV file. The code I have written so far works, but I am looking for a shorter way to do the same thing. Any suggestions?

graph.png

Screenshot 2022-09-27 at 11.40.23.png

  

load csv with headers from 'file:///company.csv' as row
with row where row.Id is not null
with row where row.Location is not null
 
// create a mainsource node
merge(d:Dataframe{name:"Data Ware House"})
// create sub-categories in the data warehouse
merge(s1:Company{companyId:"Company Id"})
merge(s2:Cname{companyname:"Company Name"})
merge(s3:Clocation{companylocation:"Company Location"})
merge(s4:Cemail{companyemail:"Company email"})
merge(s5:Cbusinesstype{businesstype:"Company Business Type"})
// connects the subsections to the main node
merge(s1)-[r1:is_in]->(d)
merge(s2)-[r2:is_in]->(d)
merge(s3)-[r3:is_in]->(d)
merge(s4)-[r4:is_in]->(d)
merge(s5)-[r5:is_in]->(d)
//show the compositions of s1-s5 in a node form
merge(ci:company{companyId:row.Id})
merge(ci)-[r1ci:available_in]->(s1)
merge(cn:cname{companyname:row.Name})
merge(cn)-[r2cn:available_in]->(s2)
merge(cL:cLocation{companyLocation:row.Location})
merge(cL)-[r3cL:available_in]->(s3)
merge(cE:cEmail{companyEmail:row.Email})
merge(cE)-[r4cE:available_in]->(s4)
merge(ct:ctype{companyEmail:row.Email})
merge(ct)-[r5ct:available_in]->(s5)
7 REPLIES

The one comment I have is that creating the data warehouse and its sub-categories is repeated for each row in the CSV file. It does not change from row to row, since it does not depend on the row data. As such, it can be moved above the 'load csv' so that it is executed only once.

The rest of the query from line 13 on is correct for an import such as yours.

merge(d:Dataframe{name:"Data Ware House"})
merge(s1:Company{companyId:"Company Id"})
merge(s2:Cname{companyname:"Company Name"})
merge(s3:Clocation{companylocation:"Company Location"})
merge(s4:Cemail{companyemail:"Company email"})
merge(s5:Cbusinesstype{businesstype:"Company Business Type"})
merge(s1)-[r1:is_in]->(d)
merge(s2)-[r2:is_in]->(d)
merge(s3)-[r3:is_in]->(d)
merge(s4)-[r4:is_in]->(d)
merge(s5)-[r5:is_in]->(d)
with d,s1,s2,s3,s4,s5
load csv with headers from 'file:///company.csv' as row
with row,d,s1,s2,s3,s4,s5
where row.Id is not null and row.Location is not null
merge(ci:company{companyId:row.Id})
merge(ci)-[r1ci:available_in]->(s1)
merge(cn:cname{companyname:row.Name})
merge(cn)-[r2cn:available_in]->(s2)
merge(cL:cLocation{companyLocation:row.Location})
merge(cL)-[r3cL:available_in]->(s3)
merge(cE:cEmail{companyEmail:row.Email})
merge(cE)-[r4cE:available_in]->(s4)
merge(ct:ctype{companyEmail:row.Email}) // note: this reuses row.Email; the business-type column was probably intended here
merge(ct)-[r5ct:available_in]->(s5)
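
If the goal is mainly a shorter query, one possible alternative is to loop over the column names instead of writing one pair of merge statements per column. The sketch below is only an illustration and changes the data model: the generic labels Column and Value (and their name, column and value properties) are assumptions that replace the per-column labels used above, because plain Cypher does not let a label be chosen from a variable.

```cypher
// Create the main source node once, before reading the file
MERGE (d:Dataframe {name: "Data Ware House"})
WITH d
LOAD CSV WITH HEADERS FROM 'file:///company.csv' AS row
WITH d, row
WHERE row.Id IS NOT NULL AND row.Location IS NOT NULL
// Turn each header of the current row into its own record
UNWIND keys(row) AS column
WITH d, column, row[column] AS value
// Skip empty cells, since MERGE cannot use a null property value
WHERE value IS NOT NULL
// One node per column, attached to the main node (hypothetical label Column)
MERGE (c:Column {name: column})
MERGE (c)-[:is_in]->(d)
// One node per distinct value within a column (hypothetical label Value)
MERGE (v:Value {column: column, value: value})
MERGE (v)-[:available_in]->(c)
```

The query length then stays the same no matter how many columns the file has, at the cost of losing the distinct label per column.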


 

I got it, thank you.

Reuben
Graph Buddy

Please permit me to ask another question.

(1) Assuming I have more than 100 columns with many null values in each column, do I always need to write "where row.(item) is not null" for each column, or is there a method similar to apply or applymap for dataframes?

(2) What causes the name to be missing from a node even though one was assigned? See, for example, the graph in the example above.

Once again, thank you @glilienfield

1. Do you want to skip the entire row if any of the row properties are null? If so, you can use a predicate like this:

where all(i in keys(row) where row[i] is not null)

2. Another option is to use the coalesce() function to replace a null property value with a default value. https://neo4j.com/docs/cypher-manual/current/functions/scalar/#functions-coalesce

3. You can use the APOC 'do' family of procedures (for example, apoc.do.when) to conditionally execute Cypher statements. In your case, you could check whether the property is null and set it only if it is not.
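
As a rough sketch of the first two options in the context of your import (run the two statements separately): the file name and the company and cEmail labels come from your query, while the "unknown" placeholder is an arbitrary value chosen only for illustration.

```cypher
// Option 1: skip any row that has a null in at least one column
LOAD CSV WITH HEADERS FROM 'file:///company.csv' AS row
WITH row WHERE all(i IN keys(row) WHERE row[i] IS NOT NULL)
MERGE (ci:company {companyId: row.Id});

// Option 2: keep the row and substitute a placeholder for a missing value
LOAD CSV WITH HEADERS FROM 'file:///company.csv' AS row
WITH row WHERE row.Id IS NOT NULL
MERGE (ci:company {companyId: row.Id})
// MERGE rejects null property values, so coalesce() supplies a default first
MERGE (cE:cEmail {companyEmail: coalesce(row.Email, "unknown")});
```

Option 1 can discard a lot of data when nulls are common, so with 100 columns it may be preferable to filter only on the key columns and fall back to a default for the rest.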

 
Sorry, I don't understand your second question. Can you expand, or give me an example? 

Thanks for the clarification! Regarding the second question, I have attached this image, which comes from the query in my earlier question. Although I assigned names to the nodes, not all of the names were displayed, so I would like to ask what I might have missed or done wrong.

graph.png

That looks like it is your s1 (companyId) node. You can select which property is displayed on a node, and the selection applies per node label. Click on the node, and its details should appear on the right. Click on the node's label shown there, and a small pop-up window will appear. At the bottom of that window, you can select which property gets displayed.

Thank you for the help!