cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

[CSV import] PUBG dataset - Create edge between players (who killed who)

I'm expanding my horizon and currently fiddling around in Neo4j. Followed a few courses on Udemy and I thought I would master it enough to load a custom dataset 🙂

I want to load a PlayerUnknown's Battlegrounds dataset, the files are really big, I made a small version of it: Source: STACK

To show how the dataset looks like:

LOAD CSV WITH HEADERS FROM 'file:///kill_match_stats_smalll_batch.csv' AS row
WITH row.killed_by AS MurderWeapon, row.killer_name AS Murderer, toInteger(row.killer_placement) AS RankMurderer, 
     row.killer_position_x AS MurderPositionX, row.killer_position_y AS MurderPositionY, 
     row.map AS Map, row.match_id AS MatchID, toInteger(row.time) AS TimeOfDeathSec, 
     row.victim_name AS Victim, toInteger(row.victim_placement) AS RankVictim,
     row.victim_position_x AS VictimPositionX, row.victim_position_y AS VictimPositionY
RETURN MurderWeapon, Murderer, RankMurderer, MurderPositionX, MurderPositionY, Map, MatchID, TimeOfDeathSec, Victim, RankVictim, VictimPositionX, VictimPositionY
LIMIT 5;

Sometimes a players kills himself or falls to his death. The killer has in this scenario null.
To work with this I thought to replace it with system- and the reason why it killed.

So I create first the nodes, Player

// Add constraint
CREATE CONSTRAINT ON (p:Player) ASSERT p.name IS UNIQUE

// Create nodes:
LOAD CSV WITH HEADERS FROM 'file:///kill_match_stats_smalll_batch.csv' AS row
WITH row
MERGE (:Player{name:
					CASE WHEN row.killer_name IS NOT NULL 
						THEN row.killer_name
						ELSE 'System-' + row.killed_by END
			})
MERGE (:Player{name:
				CASE WHEN row.victim_name IS NOT NULL 
					THEN row.victim_name
					ELSE 'System-' + row.killed_by END
				})

Now I have the players I want to created the edges, who killed who.

// Create edges:
LOAD CSV WITH HEADERS FROM 'file:///kill_match_stats_smalll_batch.csv' AS row
WITH row
MERGE (Player)-[killed:Killed{
										`Killed With`:row.killed_by, 	
										`KillerX`:
											CASE WHEN row.killer_position_x IS NOT NULL 
											THEN row.killer_position_x
											ELSE '0' END, 
										`KillerY`:
											CASE WHEN row.killer_position_y IS NOT NULL 
											THEN row.killer_position_y
											ELSE '0' END, 
										`Map`:row.map, 
										`MatchID`:row.match_id, 
										`Time of Death`:row.time, 
										`VictimX`:
											CASE WHEN row.victim_position_x IS NOT NULL 
											THEN row.victim_position_x
											ELSE '0' END,
										`VictimY`:
											CASE WHEN row.victim_position_y IS NOT NULL 
											THEN row.victim_position_y
											ELSE '0' END
}]->(Player)
;

I tried this code but then the relationship points to itself. Does my relationship provide not enough information? Do I need to add the killer and victim name as well in the relationship. But if I do so I'm storing data multiple times.
I have the feeling I'm overlooking something. Do I first need to create the relationships and after the import match them?

1 REPLY 1

@neo4j13

The problem into your query is that the start node and the end node are the same thing.
That is, you execute:

MERGE (Player)-[killed:Killed{ ...... }]->(Player)

but without any additional information, Neo4j consider Player the same thing,
namely a node without labels and properties.

You could join your 2 query, in this way (note that now (Player)-[killed:Killed{ ...... }]->(Player) became (player1)-[killed:Killed{ ...... }]->(player2)😞

LOAD CSV WITH HEADERS FROM 'file:///kill_match_stats_smalll_batch.csv' AS row 
WITH row
MERGE (player1:Player{name:
					CASE WHEN row.killer_name IS NOT NULL 
						THEN row.killer_name
						ELSE 'System-' + row.killed_by END
			})
MERGE (player2:Player{name:
				CASE WHEN row.victim_name IS NOT NULL 
					THEN row.victim_name
					ELSE 'System-' + row.killed_by END
				})
// HERE WE RETRIEVE START AND END NODES 
// RESPECTIVELY WITH VARIABLE NAMES player1 AND player2
WITH player1, player2, row 
MERGE (player1)-[killed:Killed{
										`Killed With`:row.killed_by, 	
										`KillerX`:
											CASE WHEN row.killer_position_x IS NOT NULL 
											THEN row.killer_position_x
											ELSE '0' END, 
										`KillerY`:
											CASE WHEN row.killer_position_y IS NOT NULL 
											THEN row.killer_position_y
											ELSE '0' END, 
										`Map`:row.map, 
										`MatchID`:row.match_id, 
										`Time of Death`:row.time, 
										`VictimX`:
											CASE WHEN row.victim_position_x IS NOT NULL 
											THEN row.victim_position_x
											ELSE '0' END,
										`VictimY`:
											CASE WHEN row.victim_position_y IS NOT NULL 
											THEN row.victim_position_y
											ELSE '0' END
}]->(player2)  

Or even, if you want to split nodes and relationships,
you could change only the 2nd query (same as above but with MATCH instead of MERGE, so that we search the nodes before, and then we create the relationships)

LOAD CSV WITH HEADERS FROM 'file:///kill_match_stats_smalll_batch.csv' AS row 
WITH row
MATCH (player1:Player{name:
					CASE WHEN row.killer_name IS NOT NULL 
						THEN row.killer_name
						ELSE 'System-' + row.killed_by END
			})
MATCH (player2:Player{name:
				CASE WHEN row.victim_name IS NOT NULL 
					THEN row.victim_name
					ELSE 'System-' + row.killed_by END
				})
WITH player1, player2, row 
MERGE (player1)-[killed:Killed{
										`Killed With`:row.killed_by, 	
										`KillerX`:
											CASE WHEN row.killer_position_x IS NOT NULL 
											THEN row.killer_position_x
											ELSE '0' END, 
										`KillerY`:
											CASE WHEN row.killer_position_y IS NOT NULL 
											THEN row.killer_position_y
											ELSE '0' END, 
										`Map`:row.map, 
										`MatchID`:row.match_id, 
										`Time of Death`:row.time, 
										`VictimX`:
											CASE WHEN row.victim_position_x IS NOT NULL 
											THEN row.victim_position_x
											ELSE '0' END,
										`VictimY`:
											CASE WHEN row.victim_position_y IS NOT NULL 
											THEN row.victim_position_y
											ELSE '0' END
}]->(player2)