cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Reading Relationships As Line From CSV

Please, I am trying to import a csv file into Neo4j desktop, which has a column that already defines the relationship between the two nodes in the file. Below is a simple example of what I am talking about:

Name       Action         Movie
Tom        DIRECTED       Mega Day
Harry      PRODUCED       Falling Sun

Now, I want to read in the csv file as line with cypher. Is it possible to read the relationship column as line as well, such that the nodes are automatically connected with each other via the relationship between them per line?

1 ACCEPTED SOLUTION

The "load csv" method loads each line and makes each column available, either with the column names if headers are included, or by index if not. Nothing is automatically created/connected by reading each line. You need to specify what you want to create and connect using cypher.

In your case, you are conditionally creating one of two relationships based on the Action column value. Cypher does not really have conditional workflow to support this directly, nor can you set a node label or relationship type with a variable.

One approach to implementing conditional logic is to use the family of 'do' methods in the apoc library.

Another way is to use 'call' subqueries that execute based on conditions. The following cypher uses this approach as a potential solution to your requirement:

load csv with headers from "file:///Book1.csv" as line
merge(a:Actor{name: line.Name})
merge(m:Movie{title: line.Movie})
with a, m, line
call {
    with a, m, line
    with a, m, line
    where line.Action = "DIRECTED"
    merge(a)-[:DIRECTED]->(m)
}
call {
    with a, m, line
    with a, m, line
    where line.Action = "PRODUCED"
    merge(a)-[:PRODUCED]->(m)
}

In the above query, each row of the csv file is read and assigned to the variable 'line.' Then the actor and movie nodes are merged, which will match to them if they already exist or create them if they do not exist. Each is bound to a variable, 'a' for actor and 'm' for movie.

A 'call' subquery is used to conditional execute the creation of the relationship between the actor and movie nodes based on the line.Action value. Each call subquery will execute per 'line' and only the one with 'true' condition will execute. The use of two 'with' statements is a work around, as the parser complains that the 'with' in a 'call' clause has to be 'simple', i.e. can't have a 'where' clause or operations. Including the second 'with' clause gets around this, by allowing the first 'with' clause to be 'simple.'

Result:

Does this help?

View solution in original post

5 REPLIES 5

The "load csv" method loads each line and makes each column available, either with the column names if headers are included, or by index if not. Nothing is automatically created/connected by reading each line. You need to specify what you want to create and connect using cypher.

In your case, you are conditionally creating one of two relationships based on the Action column value. Cypher does not really have conditional workflow to support this directly, nor can you set a node label or relationship type with a variable.

One approach to implementing conditional logic is to use the family of 'do' methods in the apoc library.

Another way is to use 'call' subqueries that execute based on conditions. The following cypher uses this approach as a potential solution to your requirement:

load csv with headers from "file:///Book1.csv" as line
merge(a:Actor{name: line.Name})
merge(m:Movie{title: line.Movie})
with a, m, line
call {
    with a, m, line
    with a, m, line
    where line.Action = "DIRECTED"
    merge(a)-[:DIRECTED]->(m)
}
call {
    with a, m, line
    with a, m, line
    where line.Action = "PRODUCED"
    merge(a)-[:PRODUCED]->(m)
}

In the above query, each row of the csv file is read and assigned to the variable 'line.' Then the actor and movie nodes are merged, which will match to them if they already exist or create them if they do not exist. Each is bound to a variable, 'a' for actor and 'm' for movie.

A 'call' subquery is used to conditional execute the creation of the relationship between the actor and movie nodes based on the line.Action value. Each call subquery will execute per 'line' and only the one with 'true' condition will execute. The use of two 'with' statements is a work around, as the parser complains that the 'with' in a 'call' clause has to be 'simple', i.e. can't have a 'where' clause or operations. Including the second 'with' clause gets around this, by allowing the first 'with' clause to be 'simple.'

Result:

Does this help?

Thanks so much Gary for this comprehensive answer to my question. This is exactly what I am looking for. With this, I can more easily create graphs from csv file with pre-defined relationships. Thanks a lot man!

Hi Gary. Thanks again for your suggestion. While I still think it is the solution, I tried to implement it with my actual dataset, but got an error. Here is my actual cypher statement:

load csv with headers from "file:///pixar.csv" as line
merge (p:Person{name:line.name})
merge (m:Movie{title:line.film})
with p, m, line
call {
    with p, m, line
    with p, m, line
    where line.role_type = "DIRECTED"
    merge (p)-[:DIRECTED]->(m)    
}
call {
    with p, m, line
    with p, m, line
    where line.role_type = "PRODUCED"
    merge (p)-[:PRODUCED]->(m)    
}
call {
    with p, m, line
    with p, m, line
    where line.role_type = "WAS_SCREEN_WRITER_FOR"
    merge (p)-[:WAS_SCREEN_WRITER_FOR]->(m)    
}
call {
    with p, m, line
    with p, m, line
    where line.role_type = "WAS_STORY_WRITER_FOR"
    merge (p)-[:WAS_STORY_WRITER_FOR]->(m)    
}
call {
    with p, m, line
    with p, m, line
    where line.role_type = "PERFORMED_SOUNDTRACK_FOR"
    merge (p)-[:PERFORMED_SOUNDTRACK_FOR]->(m)    
}

Running that gave me the error message below:

Neo.ClientError.Statement.SyntaxError

Query cannot conclude with CALL (must be RETURN or an update clause) (line 29, column 1 (offset: 703))
"call {"

Please, what do you think?

 
 

This solution works if you know the relationships so  you can write the query for them. But in my case the relationships are unknown in advance, so how can I modify the query to handle this? So I guess something like

load csv with headers from "file:///Book1.csv" as line
merge(a:Actor{name: line.Name})
merge(m:Movie{title: line.Movie})
with a, m, line
call {
with a, m, line
with a, m, line
merge(a)-[:line.action]->(m)
}

but this doesn't work as is. Any help appreciated!

As a corollary, my csv file also has some properties for the relationship so ideally I need something like

... merge(a)-[:line.action {property:line.property}]->(m)

where property is the property name from the csv header row, and line.property is the specified value on the data line. Is such a thing possible?

Thank you!

I face the same problem. Is there any solution for this?