Complicated Cypher Query

kashu94
Node Clone

I'm very new to neo4j and hence this question, I have my data in the format:

movie_id  cast

1     

[{'cast_id': 14, 'character': 'Woody (voice)', 'credit_id': '52fe4284c3a36847f8024f95', 'gender': 2, 'id': 31, 'name': 'Tom Hanks', 'order': 0, 'profile_path': '/pQFoyx7rp09CJTAb932F2g8Nlho.jpg'}, {'cast_id': 15, 'character': 'Buzz Lightyear (voice)', 'credit_id': '52fe4284c3a36847f8024f99', 'gender': 2, 'id': 12898, 'name': 'Tim Allen', 'order': 1, 'profile_path': '/uX2xVf6pMmPepxnvFWyBtjexzgY.jpg'}

where movie_id is the ID of the movie and cast contains a multiple dictionaries consisting of different casts and their information in the form of key value pairs. I want to form relationship between NODE Movie and NODE Cast (for unique 'cast_id', rest all like 'cast_name' would be node attributes. 

For example, in the above example,

Node Movie (movie_id=1) --- ACTED_BY --- Node Cast (cast_id=14, cast_name='Tom Hanks' )

Node Movie (movie_id=1) --- ACTED_BY --- Node Cast (cast_id=15, cast_name='Tim Allen' )

since there are two dictionaries ....  Please help. I'm very new to this.

1 ACCEPTED SOLUTION

My data is in the CSV format. Since I only need 'cast_id' and 'cast_name' from the dictionaries, I've converted them to separate lists as column values like for example:

movie_id  cast_id  cast_name

   1           [5,7,9]   ["Holland","Sasha","David"]

   2           [3,4]      ["Claire","Sarah"]

So, now I would be needing relationships as follows:

Node Movie (movie_id=1) --- ACTED_BY --- Node Actor (actor_id=5, actor_name="Holland")

Node Movie (movie_id=1) --- ACTED_BY --- Node Actor (actor_id=7, actor_name="Sasha")

Node Movie (movie_id=1) --- ACTED_BY --- Node Actor (actor_id=9, actor_name="David")

Node Movie (movie_id=2) --- ACTED_BY --- Node Actor (actor_id=3, actor_name="Claire")

and so on ....

I've written this cypher query and it works. 

 

LOAD CSV WITH HEADERS FROM 'http://localhost:11001/project-8e34d75b-2470-4b61-9877-96e86541b67e/sample.csv' AS line
MERGE (m:Movie{movie_id:line.movie_id})
WITH m,line,split(substring(line.cast_id, 1, size(line.cast_id)-2), ',') AS actor_ids,split(substring(line.cast_name, 1, size(line.cast_name)-2), ',') AS actor_names
UNWIND range(0,size(actor_ids)-1) AS i
MERGE (act:Actor {actor_id:actor_ids[i]}) SET act.actor_name=actor_names[i]
MERGE (m)-[:ACTED_BY]->(act)
 
So, the length of cast_id and cast_name lists are equal in different rows.

View solution in original post

2 REPLIES 2

What have you tried so far?  Do you have a JSON file?  You might take a look at apoc.load.json to get the data into Neo4j.  

You'll need cypher to use UNWIND on the array and create a node for each dictionary in the array. 

My data is in the CSV format. Since I only need 'cast_id' and 'cast_name' from the dictionaries, I've converted them to separate lists as column values like for example:

movie_id  cast_id  cast_name

   1           [5,7,9]   ["Holland","Sasha","David"]

   2           [3,4]      ["Claire","Sarah"]

So, now I would be needing relationships as follows:

Node Movie (movie_id=1) --- ACTED_BY --- Node Actor (actor_id=5, actor_name="Holland")

Node Movie (movie_id=1) --- ACTED_BY --- Node Actor (actor_id=7, actor_name="Sasha")

Node Movie (movie_id=1) --- ACTED_BY --- Node Actor (actor_id=9, actor_name="David")

Node Movie (movie_id=2) --- ACTED_BY --- Node Actor (actor_id=3, actor_name="Claire")

and so on ....

I've written this cypher query and it works. 

 

LOAD CSV WITH HEADERS FROM 'http://localhost:11001/project-8e34d75b-2470-4b61-9877-96e86541b67e/sample.csv' AS line
MERGE (m:Movie{movie_id:line.movie_id})
WITH m,line,split(substring(line.cast_id, 1, size(line.cast_id)-2), ',') AS actor_ids,split(substring(line.cast_name, 1, size(line.cast_name)-2), ',') AS actor_names
UNWIND range(0,size(actor_ids)-1) AS i
MERGE (act:Actor {actor_id:actor_ids[i]}) SET act.actor_name=actor_names[i]
MERGE (m)-[:ACTED_BY]->(act)
 
So, the length of cast_id and cast_name lists are equal in different rows.