11-17-2022 02:11 AM
I'm using the Neo4j Python driver for my project. My data is in CSV format, as shown below. Some fields, like cast_ids and cast_names, are lists, so I would like to connect each movie_id to its corresponding cast members. Additionally, I was experimenting with an edge property like 'count', which would be incremented every time that particular NODE-RELATIONSHIP-NODE pattern is found again. I'm not able to do this with my current code; please tell me what I'm doing wrong.
movie_id, title, genres, budget, revenue, director_name, director_id, cast_ids, cast_names
1, "Salsa", ['Drama','Music','Romance'], 889767, 887666, "Boaz Davidson", 345, [4,6,8,1], ['Robi','Carlos','Hannah','Laura']
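A minimal sketch of the whole load for one movie. Since the original code isn't shown, it assumes the list columns are already parsed into Python lists and uses hypothetical labels (`Movie`, `Person`), relationship types (`DIRECTED`, `ACTED_IN`) and key properties (`movie_id`, `person_id`). The counter lives in `ON CREATE SET` / `ON MATCH SET` rather than in the `MERGE` pattern: `MERGE` matches on every property written inside the pattern, so putting `count` there stops it from finding the existing relationship once the value changes. The director part runs before `UNWIND`, and the cast index starts at 0.

```python
# Hypothetical schema: labels Movie/Person, relationship types DIRECTED/ACTED_IN,
# keyed on movie_id/person_id. Adjust to the real model.
LOAD_MOVIE = """
MERGE (m:Movie {movie_id: $movie_id})
  ON CREATE SET m.title = $title
MERGE (d:Person {person_id: $director_id})
  ON CREATE SET d.name = $director_name
MERGE (d)-[dr:DIRECTED]->(m)
  ON CREATE SET dr.count = 1
  ON MATCH SET dr.count = dr.count + 1
WITH m
UNWIND range(0, size($cast_ids) - 1) AS i
MERGE (a:Person {person_id: $cast_ids[i]})
  ON CREATE SET a.name = $cast_names[i]
MERGE (a)-[ar:ACTED_IN]->(m)
  ON CREATE SET ar.count = 1
  ON MATCH SET ar.count = ar.count + 1
"""


def movie_params(movie):
    """Pick the query parameters from a movie dict whose list fields are Python lists."""
    return {
        "movie_id": movie["movie_id"],
        "title": movie["title"],
        "director_id": movie["director_id"],
        "director_name": movie["director_name"],
        "cast_ids": movie["cast_ids"],
        "cast_names": movie["cast_names"],
    }


def load_movie(session, movie):
    """Run LOAD_MOVIE for one movie in an auto-commit transaction."""
    return session.run(LOAD_MOVIE, parameters=movie_params(movie))


# The sample row from the question, already parsed.
salsa = {
    "movie_id": 1,
    "title": "Salsa",
    "director_id": 345,
    "director_name": "Boaz Davidson",
    "cast_ids": [4, 6, 8, 1],
    "cast_names": ["Robi", "Carlos", "Hannah", "Laura"],
}
```

With the official driver this could be run as `driver = GraphDatabase.driver(uri, auth=(user, password))` followed by `with driver.session() as session: load_movie(session, salsa)`, where `uri`, `user` and `password` are placeholders. Running it twice against the same data should leave every `count` at 2.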
11-17-2022 02:50 AM
A few comments on the code:
1) The range should start at 0, since Cypher lists are indexed from 0.
2) You are probably getting higher counts for the directors than you expected, because the director code is repeated for each cast member. You can fix this by moving that code before the UNWIND, or by inserting 'WITH DISTINCT m, row' before the director code so that the UNWIND is finished first (a plain 'WITH m, row' still keeps one row per cast member).
3) I don't think the counts make sense, as a person acts in or directs a given movie only once.
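Points 1 and 2 can be illustrated without a database. This is a plain-Python model of Cypher's row semantics (not the real engine): every clause after `UNWIND` runs once per unwound element. It counts how many times each relationship `MERGE` would fire for the sample row from the question, with a hypothetical `cast_start` index and the director `MERGE` placed either after or before the `UNWIND`.

```python
from collections import Counter


def simulate_merges(cast_ids, director_id, cast_start=0, director_after_unwind=False):
    """Count how many times each relationship MERGE would execute for one CSV row.

    Plain-Python model of Cypher's row semantics, not the real engine:
    every clause after UNWIND runs once per unwound element.
    """
    hits = Counter()
    if not director_after_unwind:
        hits[("DIRECTED", director_id)] += 1  # runs once per CSV row
    # Cypher's range(cast_start, size(list) - 1) includes its end value,
    # which corresponds to Python's range(cast_start, len(list)).
    for i in range(cast_start, len(cast_ids)):
        hits[("ACTED_IN", cast_ids[i])] += 1
        if director_after_unwind:
            hits[("DIRECTED", director_id)] += 1  # repeated for every cast member
    return hits


# Sample row from the question: director 345, cast [4, 6, 8, 1].
start1_director_inside = simulate_merges([4, 6, 8, 1], 345, cast_start=1, director_after_unwind=True)
start0_director_before = simulate_merges([4, 6, 8, 1], 345)

print(start1_director_inside[("DIRECTED", 345)], start1_director_inside[("ACTED_IN", 4)])  # 3 0
print(start0_director_before[("DIRECTED", 345)], start0_director_before[("ACTED_IN", 4)])  # 1 1
```

In the first case the director counter rises by three per load and the first cast member is never linked; in the second, each relationship is merged exactly once per load.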
What results were you getting that you didn’t think were correct?
11-17-2022 03:22 AM - edited 11-17-2022 03:25 AM
This is actually a small POC. The use case is similar to a bigger project my team is currently working on, where we will have many such edge attributes that we update using different formulas.
My plan is to load 75% of the data first, with a new edge attribute named 'count' that is incremented each time a particular NODE-RELATIONSHIP->NODE pattern is found again. Then I would load 100% of the data: for the 75% that is loaded again, the count property should be incremented, and for the remaining 25%, new nodes and relationships should be created. This would cover both kinds of update (updating edge attributes, and growing the graph with new nodes and relationships), which our actual project needs to perform almost daily. That is why I'm experimenting with a 'count' edge attribute. But I don't know why 'count' isn't incremented when I load the same data again.
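The reload plan can be modelled in memory. A minimal sketch, assuming each relationship is identified by its (start, type, end) triple, which is how a `MERGE` with no properties in the pattern identifies it. The four triples are hypothetical, built from ids in the sample rows; the first three play the role of the 75% load.

```python
def merge_counts(counts, rels):
    """Mimic MERGE ... ON CREATE SET r.count = 1 ON MATCH SET r.count = r.count + 1.

    Each relationship is keyed by its (start, type, end) triple, the way a
    MERGE on that pattern (with no properties inside it) identifies it.
    """
    for key in rels:
        counts[key] = counts.get(key, 0) + 1  # 1 when created, +1 when matched
    return counts


# Hypothetical relationship triples: (person_id, type, movie_id).
all_rels = [
    (4, "ACTED_IN", 1),
    (6, "ACTED_IN", 1),
    (345, "DIRECTED", 1),
    (3, "ACTED_IN", 86838),
]
first_75 = all_rels[:3]

counts = merge_counts({}, first_75)      # initial 75% load
counts = merge_counts(counts, all_rels)  # full reload
print(counts)  # the first three triples end at 2, the new one at 1
```

The counter therefore records how many loads have seen a relationship, which is ingestion history rather than a property of the movie itself; that is the concern raised in point 3.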
The range starts from 1 because I'd like to discard the brackets that are stored in the string format:
"['Hello','World']" ---> '[' , 'Hello', 'World', ']'
11-17-2022 04:05 AM
Are you saying that if you run the query over and over with the same data, the counts don't increment? Do you see the director relationship's counter incrementing?
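One way to check this is to load the same file twice and then compare the counters per relationship type. A minimal read-only sketch, assuming a `Movie` label and a `count` property on the relationships pointing into it (placeholders, since the original query isn't shown); `count_summary` is a hypothetical helper that takes an open driver session.

```python
# Hypothetical label Movie and relationship property count.
COUNT_CHECK = """
MATCH (:Movie)<-[r]-()
RETURN type(r) AS rel_type,
       count(r) AS relationships,
       min(r.count) AS min_count,
       max(r.count) AS max_count
"""


def count_summary(session):
    """Return {rel_type: (relationships, min_count, max_count)} from COUNT_CHECK."""
    return {
        rec["rel_type"]: (rec["relationships"], rec["min_count"], rec["max_count"])
        for rec in session.run(COUNT_CHECK)
    }
```

After two loads of the same file, each relationship type would be expected to show `min_count` and `max_count` of 2. If `ACTED_IN` still shows a `max_count` of 1, or its relationship total has doubled, the actor `MERGE` is not matching the existing relationship; one known cause is putting `count` inside the `MERGE` pattern itself, since `MERGE` matches on every property written there.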
Can you provide your spreadsheet so I can test?
I thought the substring was removing the brackets.
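On the bracket question: Cypher's `split()` cuts only at the delimiter, so on its own it leaves the brackets glued to the first and last items rather than producing separate `'['` and `']'` elements. Once `substring(s, 1, size(s) - 2)` has stripped the brackets, the first real item is at index 0. Python's string methods behave the same way, so this stdlib-only sketch shows both steps on the example string from the thread; `ast.literal_eval` is a Python-side alternative, not a Cypher function.

```python
import ast

raw = "['Hello','World']"

# Splitting on the comma alone leaves the brackets attached to the outer items;
# Cypher's split(raw, ',') gives the same two pieces.
naive = raw.split(",")

# Dropping the first and last character first, as
# substring(raw, 1, size(raw) - 2) does in Cypher, leaves only the items,
# so the first real item is at index 0.
inner = raw[1:-1]
items = [part.strip().strip("'") for part in inner.split(",")]

# Python-side alternative if rows are parsed before being sent to Neo4j.
parsed = ast.literal_eval(raw)

print(naive)   # ["['Hello'", "'World']"]
print(items)   # ['Hello', 'World']
print(parsed)  # ['Hello', 'World']
```

For `cast_ids`, the pieces produced by splitting are still strings, so on the Cypher side they would also need `trim()` and `toInteger()` before being used as integer ids.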
11-17-2022 04:34 AM - edited 11-17-2022 04:35 AM
Yes, the director relationships' counts are incrementing, but the actor relationships' counts are not. I'm not able to attach a .csv file here, so here is a sample:
| | movie_id | title | genres | budget | revenue | director_name | director_id | cast_ids | cast_names | cast_characters |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 86838 | Seven Psychopaths | ['Comedy', 'Crime'] | 15000000.0 | 19422261.0 | Martin McDonagh | 54472 | [3, 6, 4] | ['Colin Farrell', 'Sam Rockwell', 'Woody Harret'] | ['Marty Faranan', 'Billy Bickle', 'Charlie Coslo'] |
| 1 | 44154 | A Touch of Zen | ['Action', 'Adventure'] | 100000.0 | 80670.0 | King Hu | 83698 | [2, 14, 4, 5 ] | ['Hsu Feng', 'Sammo Hung', 'Tien Peng', 'Roy C'] | ['Yang Hui-ching', 'Commander Hsu's bodyguard', 'man 1', 'man 2' ] |
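Because the list columns hold Python-style list literals, another option is to parse them in Python and send each movie's cast as a list of maps, so the Cypher can `UNWIND $cast AS c` and use `c.person_id`, `c.name` and `c.character` with no index arithmetic. A minimal sketch, assuming the file is read with `csv.DictReader` using the column names from the table above; `cast_maps` and the `cast` parameter name are hypothetical.

```python
import ast
import csv
import io


def cast_maps(row):
    """Zip the parallel list columns of one CSV row into a list of cast maps."""
    ids = ast.literal_eval(row["cast_ids"])
    names = ast.literal_eval(row["cast_names"])
    characters = ast.literal_eval(row["cast_characters"])
    # zip() stops at the shortest list, so mismatched lengths drop entries silently.
    return [
        {"person_id": pid, "name": name, "character": character}
        for pid, name, character in zip(ids, names, characters)
    ]


# First sample row from the table, trimmed to the columns used here.
SAMPLE_CSV = '''movie_id,title,cast_ids,cast_names,cast_characters
86838,Seven Psychopaths,"[3, 6, 4]","['Colin Farrell', 'Sam Rockwell', 'Woody Harret']","['Marty Faranan', 'Billy Bickle', 'Charlie Coslo']"
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
params = {"movie_id": int(rows[0]["movie_id"]), "cast": cast_maps(rows[0])}
print(params["cast"][0])  # {'person_id': 3, 'name': 'Colin Farrell', 'character': 'Marty Faranan'}
```

One caveat: an apostrophe inside a single-quoted item, as in the second row's `'Commander Hsu's bodyguard'`, is not a valid Python literal, so `ast.literal_eval` would raise a `SyntaxError` on that row if the file really contains that text.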