cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

How do we import a csv file, when the dataset in it is having null values in few columns? I would not want to ignore those rows as I need them to be in my graph. Neo4j desktop version=1.4.5,Neo4j Browser version: 4.3.1,Neo4j Server version: 4.3.0

How do we import a csv file, when the dataset in it is having null values in few columns? I would not want to ignore those rows as I need them to be in my graph.
Neo4j desktop version=1.4.5,Neo4j Browser version: 4.3.1,Neo4j Server version: 4.3.0.

17 REPLIES 17

ddarmon
Node Clone

Take a look at this blog post.

ameyasoft
Graph Maven

Use COALESCE(row.col1, 'NA').

@ameyasoft thanks for the reply, I am bit new to coding in cypher and Neo4j too. Could you explain this function through its use cases and examples so that I'll get better clarity.
Thanks again !

The function coalesce receives an unspecified number of arguments as inputs and returns the first input that is not null. Only if all entries are null then it returns null. Some examples:

  • coalesce (null) = null
  • coalesce (null, 2) = 2
  • coalesce (null, null, 'a') = 'a'

Okay @elena.kohlwey ,will try to implement based on these. If I get any error I'll get back.Thanks!

Guys, I couldn't still resolve the issue. The problem is that the datasheet is quite sparse and also some of the data fields are having one-to-many relations with the other fields. Can't understand working on such type of data.
Aim is to find the recommendation for a particular node in Data_field_1.


I am the same person with the account name shailesh.n.

Try this:
LOAD CSV WITH HEADERS FROM "file:/xxx.csv" AS row

MERGE (a:Data {entityName: COALESCE(row.colB, 'NA'), FHIR: COALESCE(row.colC, 'NA'), attributeValue: COALESCE(row.colD, 'NA'), category: COALESCE(row.colE, 'NA'), class: COALESCE(row.col3F, 'NA')})

What about relationships between any 2 fields, should I use : MERGE (p)-[:RELATION {name:row.r1}]->(p1) or MERGE (p)-[:RELATION {name: COALESCE (row.r1,'NA')}]->(p1) ;where r1 is the relation.

But I am facing a problem ,We are getting NA values in properties on all the nodes and relationships.


And only 5 nodes and 10 relationships are created ,whereas in this case I need to get all the nodes and relationships.
please check the following link to understand the datasheet more clearly: KG_final.csv - Google Drive

Please share your LOAD CSV query that generated the above graph.

This is the query I wrote to generate the above graph, @ameyasoft .
LOAD CSV WITH HEADERS FROM "file:///KG_final.csv" AS row
MERGE (a:Data_field1 {entityName: COALESCE(row.Name1, 'NA'), FHIR: COALESCE(row.FHIR_Expression1, 'NA'), attributeValue: COALESCE(row.Value1, 'NA'), category: COALESCE(row.Category1, 'NA'), class: COALESCE(row.Class1, 'NA')})
MERGE (b:Data_field2 {entityName: COALESCE(row.Name2, 'NA'), FHIR: COALESCE(row.FHIR_Expression2, 'NA'), attributeValue: COALESCE(row.Value2, 'NA'), category: COALESCE(row.Category2, 'NA'), class: COALESCE(row.Class2, 'NA')})
MERGE (c:Issue {entityName: COALESCE(row.Issue_name, 'NA'), severity: COALESCE(row.Issue_Severity, 'NA'), severity_value: COALESCE(row.Issue_severity_value, 'NA')})
MERGE (d:Risks {entityName: COALESCE(row.Risk_name, 'NA'), severity: COALESCE(row.Risk_Severity, 'NA'), severity_value: COALESCE(row.Risk_severity_value, 'NA')})
MERGE (e:Recommendation {entityName: COALESCE(row.Rec_name, 'NA'), type: COALESCE(row.Recommendation_type, 'NA'), severity: COALESCE(row.Rec_severity, 'NA'), severity_value: COALESCE(row.Rec_severity_value, 'NA'), category: COALESCE(row.Category_Rec, 'NA'), class: COALESCE(row.Class_Rec, 'NA')})
MERGE (a)-[:RELATION1 {name:COALESCE(row.Relation1, 'NA')}]->(b)
MERGE (b)-[:RELATION2 {name:COALESCE(row.Relation2, 'NA')}]->(c)
MERGE (c)-[:RELATION3 {name:COALESCE(row.Relation3, 'NA')}]->(d)
MERGE (d)-[:RELATION4 {name:COALESCE(row.Relation4, 'NA')}]->(e)
Also please take a look into the datasheet which I share above.

Data Column Headers:
Node:Data_Field1 Attributes:Name1,FHIR_Expression1,Value1,Category1,Class1
Node:Data_Field2 Attributes:Name2,FHIR_Expression2,Value2,Category2,Class2
Node:Issue Attributes: Issue_name, Issue_Severity, Issue_severity_value
Node:Risks Attributes:Risk_name,Risk_Severity,Risk_severity_value
Node:Recommendation Attributes: Category_Rec,Rec_severity,Rec_severity_value,Class_Rec,Rec_name,Recommendation_type

Relations:
Relation1(Data_Field1->Data_Field2), Relation2(Data_Field2->Issue) ,Relation3(Issue->Risks), Relation4(Risks->Recommendation)

I used your Cypher and imported data from your .csv file. It created 45 labels, created 45 nodes, set 251 properties. There are lots of rows with no values except for recommendation/risk. In these situations, the nodes are getting created with 'NA' values. Please check your data. Here is the screen shot of the graph that is generated from your code.

Sure, I'll check whether there are any mistakes in the data itself.
Could you also share the cypher query used to generate this graph? so that I can know what's causing the exact error.

Here is the query that I used:

LOAD CSV WITH HEADERS FROM "file:///KG_final.csv" AS row
MERGE (a:Data_field1 {entityName: COALESCE(row.Name1, 'NA'), FHIR: COALESCE(row.FHIR_Expression1, 'NA'), attributeValue: COALESCE(row.Value1, 'NA'), category: COALESCE(row.Category1, 'NA'), class: COALESCE(row.Class1, 'NA')})
MERGE (b:Data_field2 {entityName: COALESCE(row.Name2, 'NA'), FHIR: COALESCE(row.FHIR_Expression2, 'NA'), attributeValue: COALESCE(row.Value2, 'NA'), category: COALESCE(row.Category2, 'NA'), class: COALESCE(row.Class2, 'NA')})
MERGE (c:Issue {entityName: COALESCE(row.Issue_name, 'NA'), severity: COALESCE(row.Issue_Severity, 'NA'), severity_value: COALESCE(row.Issue_severity_value, 'NA')})
MERGE (d:Risks {entityName: COALESCE(row.Risk_name, 'NA'), severity: COALESCE(row.Risk_Severity, 'NA'), severity_value: COALESCE(row.Risk_severity_value, 'NA')})
MERGE (e:Recommendation {entityName: COALESCE(row.Rec_name, 'NA'), type: COALESCE(row.Recommendation_type, 'NA'), severity: COALESCE(row.Rec_severity, 'NA'), severity_value: COALESCE(row.Rec_severity_value, 'NA'), category: COALESCE(row.Category_Rec, 'NA'), class: COALESCE(row.Class_Rec, 'NA')})
MERGE (a)-[:RELATION1 {name:COALESCE(row.Relation1, 'NA')}]->(b)
MERGE (b)-[:RELATION2 {name:COALESCE(row.Relation2, 'NA')}]->(c)
MERGE (c)-[:RELATION3 {name:COALESCE(row.Relation3, 'NA')}]->(d)
MERGE (d)-[:RELATION4 {name:COALESCE(row.Relation4, 'NA')}]->(e)
Added 45 labels, created 45 nodes, set 251 properties, created 49 relationships

Okay @ameyasoft , let me see if I can tweak this and get the desired results, else it should be a problem with the input data mostly.