Neo4j

Reuben · 09-26-2022

I am creating something like a tree diagram of just the columns of a CSV file. The code I have written so far works, but I am looking for a shorter way to do the same thing. Any suggestions?

load csv with headers from 'file:///company.csv'as row

with row where row.Id is not null

with row where row.Location is not null

// create the main source node

merge(d:Dataframe{name:"Data Ware House"})

// create sub-categories in the data warehouse

merge(s1:Company{companyId:"Company Id"})

merge(s2:Cname{companyname:"Company Name"})

merge(s3:Clocation{companylocation:"Company Location"})

merge(s4:Cemail{companyemail:"Company email"})

merge(s5:Cbusinesstype{businesstype:"Company Business Type"})

// connect the sub-categories to the main node

merge(s1)-[r1:is_in]->(d)

merge(s2)-[r2:is_in]->(d)

merge(s3)-[r3:is_in]->(d)

merge(s4)-[r4:is_in]->(d)

merge(s5)-[r5:is_in]->(d)

// show the values of s1-s5 as nodes

merge(ci:company{companyId:row.Id})

merge(ci)-[r1ci:available_in]->(s1)

merge(cn:cname{companyname:row.Name})

merge(cn)-[r2cn:available_in]->(s2)

merge(cL:cLocation{companyLocation:row.Location})

merge(cL)-[r3cL:available_in]->(s3)

merge(cE:cEmail{companyEmail:row.Email})

merge(cE)-[r4cE:available_in]->(s4)

merge(ct:ctype{companyEmail:row.Email})

merge(ct)-[r5ct:available_in]->(s5)

glilienfield · 09-27-2022

My one comment is that the data warehouse node and its sub-categories are merged again for every row of the CSV file. They do not depend on the row data, so they never change between rows. You can move them above the 'load csv' so that they are executed only once.

The rest of the query from line 13 on is correct for an import such as yours.

merge(d:Dataframe{name:"Data Ware House"})
merge(s1:Company{companyId:"Company Id"})
merge(s2:Cname{companyname:"Company Name"})
merge(s3:Clocation{companylocation:"Company Location"})
merge(s4:Cemail{companyemail:"Company email"})
merge(s5:Cbusinesstype{businesstype:"Company Business Type"})
merge(s1)-[r1:is_in]->(d)
merge(s2)-[r2:is_in]->(d)
merge(s3)-[r3:is_in]->(d)
merge(s4)-[r4:is_in]->(d)
merge(s5)-[r5:is_in]->(d)
with d,s1,s2,s3,s4,s5
load csv with headers from 'file:///company.csv'as row
with row,d,s1,s2,s3,s4,s5 
where row.Id is not null and row.Location is not null
merge(ci:company{companyId:row.Id})
merge(ci)-[r1ci:available_in]->(s1)
merge(cn:cname{companyname:row.Name})
merge(cn)-[r2cn:available_in]->(s2)
merge(cL:cLocation{companyLocation:row.Location})
merge(cL)-[r3cL:available_in]->(s3)
merge(cE:cEmail{companyEmail:row.Email})
merge(cE)-[r4cE:available_in]->(s4)
merge(ct:ctype{companyEmail:row.Email})
merge(ct)-[r5ct:available_in]->(s5)
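
For a shorter variant, here is a minimal sketch that loops over the column names with keys(row) instead of writing one pair of merge statements per column. The labels Column and Value and their properties are assumptions for illustration, not part of the original model, so the resulting graph has one generic label per level rather than Company, Cname, and so on.

```cypher
// create the main node once, before reading the file
merge (d:Dataframe {name: "Data Ware House"})
with d
load csv with headers from 'file:///company.csv' as row
with d, row
where row.Id is not null and row.Location is not null
// one pass per column header; cells without a value are skipped
unwind keys(row) as column
with d, column, row[column] as value
where value is not null
// hypothetical generic labels: Column (sub-category) and Value (cell)
merge (c:Column {name: column})
merge (c)-[:is_in]->(d)
merge (v:Value {column: column, value: value})
merge (v)-[:available_in]->(c)
```

Because every column goes through the same statements, this also avoids slips such as the ctype node above being merged on row.Email, the same value used for cEmail. The trade-off is that you would filter on the name and column properties rather than on a label.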

Reuben · 09-27-2022

I got it, thank you.

Reuben · 09-27-2022

Please permit me to ask another question.

(1) Assuming I have more than 100 columns with many null values in each, do I always need to write "where row.(item) is not null" for every column, or is there a method similar to apply or applymap for dataframes?

(2) Why are names missing from some nodes even though I assigned them? I am referring to the example above.

Once again, thank you @glilienfield

glilienfield · 09-27-2022

1. Do you want to skip the entire row if any of the row properties are null? If so, you can use a predicate like this:

where all(i in keys(row) where row[i] is not null)

2. Another option is to use the coalesce() function to replace a null property value with a default value. https://neo4j.com/docs/cypher-manual/current/functions/scalar/#functions-coalesce

3. You can use the apoc 'do' family of procedures to conditionally execute cypher statements. In your case, you could check whether the property is null and set it only if it is not.

https://neo4j.com/labs/apoc/4.1/overview/apoc.do/
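
As a rough sketch, the three options could look like this against the same company.csv. Each statement is meant to be run separately. The default value "unknown" and the choice of Email as the optional column are assumptions for illustration, and the third statement needs the APOC plugin installed.

```cypher
// 1. skip any row that has a null in any column
load csv with headers from 'file:///company.csv' as row
with row
where all(i in keys(row) where row[i] is not null)
merge (ci:company {companyId: row.Id});

// 2. replace a null with a default value (assumed default: "unknown")
load csv with headers from 'file:///company.csv' as row
with row where row.Id is not null
merge (cn:cname {companyname: coalesce(row.Name, "unknown")});

// 3. set the property only when the value is present
load csv with headers from 'file:///company.csv' as row
with row where row.Id is not null
merge (ci:company {companyId: row.Id})
with ci, row
call apoc.do.when(
  row.Email is not null,
  'with $ci as ci set ci.companyEmail = $email return ci',
  'return $ci as ci',
  {ci: ci, email: row.Email}
) yield value
return count(*);
```

Option 1 drops a whole row for a single missing cell, so with many sparse columns options 2 or 3 are likely to keep more of the data.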

Sorry, I don't understand your second question. Can you expand, or give me an example?

Reuben · 09-28-2022

Thanks for the clarification! Regarding the second question, I have attached this image, which comes from the query in my earlier question. Although I assigned names to the nodes, not all of the names are displayed. What might I have missed or done wrong?

glilienfield · 09-28-2022

That looks like your s1 (companyId) node. You can select which property is displayed; the selection is made per node label. Click on the node, and its details should appear on the right. Click on the node's label there, and a small pop-up window opens. At the bottom, you can select which property gets displayed.

Reuben · 09-28-2022

Thank you for the help!

Alternatives: Creating somewhat of a tree diagram per column