Neo4j

kashu94 · ‎11-17-2022

I'm using the Neo4j Python driver for my project. My data is in CSV format, as shown below. Some fields, like cast_ids and cast_names, are lists, so I would like to have a connection from each movie_id to its corresponding cast members. Additionally, I was experimenting with an edge property like 'count', which would get incremented every time that particular NODE-RELATIONSHIP-NODE is found again. I'm not able to do so with the below code; please tell me what I'm doing wrong.

movie_id, title, genres, budget, revenue, director_name, director_id, cast_ids, cast_names

1, "Salsa", ['Drama','Music','Romance'], 889767, 887666, "Boaz Davidson", 345, [4,6,8,1], ['Robi','Carlos','Hannah','Laura']

glilienfield · ‎11-17-2022

A few comments on the code

1) The range should start at 0, since Cypher lists are indexed from 0.

2) You are probably getting higher counts for the directors than you expected, because the director code is repeated for each cast member. You can fix this second issue by moving that code to before the UNWIND, or by inserting 'WITH DISTINCT m, row' before the director code so the rows fanned out by the UNWIND are collapsed back to one per movie first (a plain 'WITH m, row' would keep one row per cast member).

3) I don’t think the counts make sense, as a person acts in or directs the same movie once
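The original query is not shown in the thread, so the following is only a sketch of how points 1 and 2 might look. It assumes the labels `Movie` and `Person`, the relationship types `DIRECTED` and `ACTED_IN`, the `person_id` property, and the `$rows` parameter, all of which are hypothetical names. It also assumes `cast_ids` and `cast_names` reach Cypher as real lists, and that director and cast ids share one id space.

```python
# Sketch only: labels, relationship types, and property names are assumptions,
# since the query from the original post is not available.
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (m:Movie {movie_id: row.movie_id})
  SET m.title = row.title
// Director is handled once per movie, before the cast list is unwound.
MERGE (d:Person {person_id: row.director_id})
  SET d.name = row.director_name
MERGE (d)-[r:DIRECTED]->(m)
  ON CREATE SET r.count = 1
  ON MATCH SET r.count = r.count + 1
WITH m, row
// Cypher list indexes start at 0.
UNWIND range(0, size(row.cast_ids) - 1) AS i
MERGE (a:Person {person_id: row.cast_ids[i]})
  SET a.name = row.cast_names[i]
MERGE (a)-[c:ACTED_IN]->(m)
  ON CREATE SET c.count = 1
  ON MATCH SET c.count = c.count + 1
"""
```

With the official Python driver, something like `session.run(LOAD_QUERY, rows=rows)` would pass the list of row dictionaries as `$rows`. Because each relationship is matched with MERGE, running the same rows a second time should hit the ON MATCH branch and raise each count by one.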

What results were you getting that you didn’t think were correct?

kashu94 · ‎11-17-2022

This is actually a small POC I'm working on right now. The use case is similar to a bigger project that my team is currently working on. In that project, we will have many such edge attributes, which we will update using different formulas. So my plan is this: first I load 75% of the data with a new edge attribute named 'count', which increments whenever that particular NODE-RELATIONSHIP->NODE is found again. After that, I load 100% of the data. Since the first 75% is loaded again, the 'count' edge property should get incremented, and since the remaining 25% is loaded for the first time, we should also see some new nodes and relationships being created. This would cover both kinds of update: updating edge attributes, and extending the graph with new nodes and relationships. Our actual project requires both updates to be done on the graph almost daily, so I was trying to experiment with a 'count' edge attribute. But I don't know why 'count' isn't incremented when I load the same data again.
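To make the expected outcome of that plan concrete, here is a small pure-Python model of it. It is not Neo4j code: a `Counter` keyed by (actor, movie) stands in for the 'count' edge property, and the four rows are made-up illustration data.

```python
from collections import Counter

def load(edge_counts, rows):
    """Model of 'create the edge with count 1, or add 1 if it already exists'."""
    for movie_id, cast_ids in rows:
        for actor_id in cast_ids:
            edge_counts[(actor_id, movie_id)] += 1
    return edge_counts

# Hypothetical data: (movie_id, cast_ids). The first three rows are the "75%".
rows = [
    (1, [4, 6]),
    (2, [4]),
    (3, [8]),
    (4, [1, 6]),
]

edge_counts = Counter()
load(edge_counts, rows[:3])   # first load: 75% of the data
load(edge_counts, rows)       # second load: 100% of the data

for (actor_id, movie_id), count in sorted(edge_counts.items()):
    print(f"actor {actor_id} -> movie {movie_id}: count={count}")
```

Edges from the first three rows end with a count of 2, and the edges that only appear in the fourth row end with a count of 1. That is the behaviour the real load is expected to reproduce.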

And the range starts from 1 because I'd like to discard the brackets that are present in the string format:

"['Hello','World']" ---> '[' , 'Hello', 'World', ']'
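One way to avoid the bracket handling altogether is to turn the string into a real list in Python before it is sent to Neo4j, so the Cypher side can index from 0. This is a minimal sketch using the standard library's `ast.literal_eval`; the helper name `parse_list_field` is made up for illustration.

```python
import ast

def parse_list_field(text):
    """Turn a CSV cell such as "['Hello','World']" into a real Python list."""
    value = ast.literal_eval(text.strip())
    if not isinstance(value, list):
        raise ValueError(f"expected a list literal, got {type(value).__name__}")
    return value

cast_names = parse_list_field("['Hello','World']")
cast_ids = parse_list_field("[4,6,8,1]")
print(cast_names, cast_ids)
```

Note that `ast.literal_eval` raises an error if a cell is not a valid Python literal, for example when a single-quoted name contains an unescaped apostrophe, so such cells would need cleaning first.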

glilienfield · ‎11-17-2022

Are you stating that if you run the query over and over with the same data that the counts don’t increment? Do you see the director relationship’s counter incrementing?

Can you provide your spreadsheet to test?

I thought the substring was removing the brackets.

kashu94 · ‎11-17-2022

Yes, the counts for the directors are incrementing, but not those for the actors. Also, I'm not able to attach a .csv file here.

movie_id	title	genres	budget	revenue	director_name	director_id	cast_ids	cast_names	cast_characters
0		86838	Seven Psychopaths	['Comedy', 'Crime']	15000000.0	19422261.0	Martin McDonagh	54472	[3, 6, 4]	['Colin Farrell', 'Sam Rockwell', 'Woody Harret']	['Marty Faranan', 'Billy Bickle', 'Charlie Coslo']
1		44154	A Touch of Zen	['Action', 'Adventure']	100000.0	80670.0	King Hu	83698	[2, 14, 4, 5 ]	['Hsu Feng', 'Sammo Hung', 'Tien Peng', 'Roy C']	['Yang Hui-ching', 'Commander Hsu's bodyguard', 'man 1', 'man 2' ]

Updating edge property doesn't work