06-09-2019 05:00 PM
The Women’s World Cup 2019 started on Friday, and over the weekend we dusted off our World Cup scraping scripts and created a Women’s World Cup Graph.
A hosted version of the World Cup Graph is available on a Neo4j Cloud instance at 5d37db5a.databases.neo4j.io. You can log in with the username worldcup and password worldcup. Once you’ve logged in, run :play womens-worldcup-queries
We’ve made some tweaks to the graph model that we used for Men’s World Cup 2018, so let’s have a look at our new and improved model.
Let’s start with the Tournament node. We have one Tournament node for each World Cup tournament, so there are 8 of these nodes, one for each of the tournaments from 1991 to 2019. Teams participate in these tournaments, so we create that relationship between Team and Tournament nodes.
Squads are NAMED by Teams FOR each of these tournaments, and a Person can either be in the squad or the coach for that squad.
After exploring the data, I realised that people can only ever play for one team, so we have a REPRESENTS relationship between the Person and Team nodes.
The relationship between Person and Match nodes has been simplified from the previous model. We’ve removed the concept of Appearance, and now have direct relationships from Person to Match. The PLAYED_IN relationship is used both for players who start a match and for those who come on as a substitute.
And finally, Teams play in Matches. We capture the result of the match on the PLAYED_IN relationships.
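To make the model concrete, here is a minimal sketch of a query that walks from the 2019 Tournament node to the participating teams and the players in their squads. It uses only the labels, relationship types, and properties that appear in the queries later in this post. The squad node is left unlabelled because its label isn't shown here, and coaches are not counted because the name of the coach relationship isn't shown either.

```cypher
// Sketch only: count the players named in each team's 2019 squad.
// PARTICIPATED_IN, IN_SQUAD, FOR and REPRESENTS are taken from the
// queries in this post; the squad node is deliberately left unlabelled.
MATCH (team:Team)-[:PARTICIPATED_IN]->(tourn:Tournament {year: 2019})
MATCH (p:Person)-[:IN_SQUAD]->(squad)-[:FOR]->(tourn),
      (p)-[:REPRESENTS]->(team)
RETURN team.name AS team, count(p) AS squadSize
ORDER BY team
```

Because a Person only ever REPRESENTS one team, the second MATCH ties each squad member back to the right team without needing to traverse the NAMED relationship.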
We have a Neo4j Browser guide that you can use to import the data into your own local Neo4j instance if you want to play along.
:play womens-worldcup
We’ve also got a hosted version of the World Cup Graph on a Neo4j Cloud instance at 5d37db5a.databases.neo4j.io. You can log in with the username worldcup and password worldcup.
If you use that one, you can skip the data import and go straight to the queries by running the following guide:
:play womens-worldcup-queries
Let’s have a look at some of the queries that we can run against this dataset.
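Which teams have participated in every tournament? This returns the participation paths so that the Neo4j Browser can draw them as a graph:
MATCH (tournament:Tournament), (team:Team)
WITH team, collect(tournament) AS tournaments
WHERE all(t in tournaments WHERE (team)-[:PARTICIPATED_IN]->(t))
RETURN [path = (team)-[:PARTICIPATED_IN]->() | path] AS paths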
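If a table is easier to read than the graph view, the same question (which teams have taken part in every tournament) can be sketched by comparing counts instead of returning paths. This variant is our own illustration, and it assumes that a team has at most one PARTICIPATED_IN relationship per tournament, which is not something the model description guarantees.

```cypher
// Sketch: tabular variant of the "every tournament" question.
// Assumption: at most one PARTICIPATED_IN relationship per team and tournament.
MATCH (t:Tournament)
WITH count(t) AS totalTournaments
MATCH (team:Team)-[:PARTICIPATED_IN]->(:Tournament)
WITH team, totalTournaments, count(*) AS appearances
WHERE appearances = totalTournaments
RETURN team.name AS team, appearances
ORDER BY team
```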
Which teams met in each final, what was the score, and who won? Drawn finals show the penalty shoot-out score in brackets:
MATCH (t1:Team)-[p1:PLAYED_IN]->(m:Match)<-[p2:PLAYED_IN]-(t2:Team),
      (m)-[:IN_TOURNAMENT]->(tourn)
WHERE id(t1) < id(t2) AND m.stage = "Final"
RETURN tourn.name AS name, tourn.year AS year,
       t1.name AS team1, t2.name AS team2,
       CASE WHEN p1.score = p2.score
            THEN p1.score + "-" + p2.score + " (" +
                 p1.penaltyScore + "-" + p2.penaltyScore + ")"
            ELSE p1.score + "-" + p2.score
       END AS result,
       (CASE WHEN p1.score > p2.score THEN t1
             WHEN p2.score > p1.score THEN t2
             ELSE
               CASE WHEN p1.penaltyScore > p2.penaltyScore THEN t1
                    ELSE t2 END END).name AS winner
ORDER BY tourn.year
Who are the top 10 goal scorers across all tournaments, and in which years did they score?
MATCH (p:Person)-[:SCORED_GOAL]->(match)-[:IN_TOURNAMENT]->(tourn),
      (p)-[:REPRESENTS]->(team)
RETURN p.name, team.name AS team, count(*) AS goals,
       apoc.coll.sort(collect(DISTINCT tourn.year)) AS years
ORDER BY goals DESC
LIMIT 10
Who are the top 10 goal scorers among the players named in a 2019 squad?
MATCH (p:Person)-[:SCORED_GOAL]->(match)-[:IN_TOURNAMENT]->(tourn),
      (p)-[:REPRESENTS]->(team)
WITH p, team, count(*) AS goals,
     apoc.coll.sort(collect(DISTINCT tourn.year)) AS years
WHERE (p)-[:IN_SQUAD]->()-[:FOR]->(:Tournament {year: 2019})
RETURN p.name, team.name AS team, goals
ORDER BY goals DESC
LIMIT 10
We hope you enjoy the dataset. If you have any questions or suggestions on what we should do next, let us know in the comments or email us at devrel@neo4j.com.
We encourage you to take the data and build your own APIs, applications, or analysis notebooks on top of it. We’d love to hear all about your ideas.
Now Available: Women’s World Cup 2019 Graph was originally published in neo4j on Medium, where people are continuing the conversation by highlighting and responding to this story.