cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Calculating sum of a property from a relationship

I have a set of football matches based upon the following model, with n being the football match, a and s the teams:
(s)<-[:away_team{scored:0, result: "lost"}]-(n)-[:home_team{scored:1, result: "won"}]->(a)

I am looking to return the top scorer for all the matches in the database and have come up with the following query:
MATCH (n:data)-[r:home_team]->(a:team)
With SUM(r.scored) as goals
MATCH (n:data)-[r:away_team]->(a:team)
return SUM(r.scored) + goals AS goals, a.name
Order by goals desc

However this returns inaccurate numbers of goals as it sums for example- 9, 8 etc. down to 0 instead of just 9. Is my model correct, or does the query need changing? Many thanks.

1 ACCEPTED SOLUTION

It's possible but tricky and uses some fairly advanced cypher ...
Run the queries one at a time ...

// query 1 - create dummy data
create
(t1:Team {name:"CircleFC"}),
(t2:Team {name:"OlympicoFC"}),
(t3:Team {name:"GoalsFC"}),
(t4:Team {name:"StadiumFC"}),
(t1)<-[:Home {pts:0}]-(m1:LeagueMatch {fixture:101})-[:Away {pts:3}]->(t2),
(t1)<-[:Away {pts:3}]-(m2:LeagueMatch {fixture:202})-[:Home {pts:0}]->(t3),
(t1)<-[:Home {pts:3}]-(m3:LeagueMatch {fixture:303})-[:Away {pts:0}]->(t4),
(t4)<-[:Home {pts:3}]-(m4:LeagueMatch {fixture:404})-[:Away {pts:0}]->(t2),
(t4)<-[:Away {pts:3}]-(m5:LeagueMatch {fixture:505})-[:Home {pts:0}]->(t3)

// query 2 - set all to be zero!
match (team:Team)-[r]-(m:LeagueMatch)
set r.consecWins = 0

// query 3 - each win is possibly the start of a new run of wins
match (team:Team)-[r]-(m:LeagueMatch)
where r.pts = 3
set r.consecWins = 1

// query 4 -
match (team:Team)-[r]-(m:LeagueMatch)
with team, r order by m.fixture
with team, collect(r) as matchLinksPerTeam
// loop over each game for each team ...
unwind range(1,size(matchLinksPerTeam)) as i
// taking each pair of consecutive games
with matchLinksPerTeam[i-1] as prevMatchLink, matchLinksPerTeam[i] as currMatchLink
set currMatchLink.consecWins =
case
// winning streak is one game longer ...
when (prevMatchLink.pts = 3 and currMatchLink.pts = 3)
then prevMatchLink.consecWins + 1
else
// leave as is ...
currMatchLink.consecWins
end

// query 5 - show results
match (team:Team)-[r]-(m:LeagueMatch)
return team.name, max(r.consecWins)

View solution in original post

6 REPLIES 6

I am looking to return the top scorer for all the matches in the database and have come up with the following

To get the team who scored the most total goals across all their games ...

match (n:data)-[r]-(a:team)
return a.teamName, sum(r.scored) as totalGoalsPerTeam
order by totalGoalsPerTeam desc limit 1

note I guessed a.teamName to be a property of the node, and also this query will only ever return 1 row so if two teams both scored an equal highest number of goals then 1 of the teams would be missing in the output!

Thanks for your reply, it worked successfully and also allowed me to return the team that conceded the fewest goals. I also am looking to return the team that had the most consecutive losses. The n:data nodes contain a property time, with epoch time. Any ideas how to go about querying for consecutive losses based on the epoch time property please?

Given how your data is stored, it's not so easy ...

If consecutive games were stored as chains in the graph it would be easier but you don't have that.

For each team, you could get the list of linked games and order by then epoch time, then use an unwind statement to iterate over the games and try to keep count of consecutive looses and store the counter on the link to the game something like that ...

Thanks again for your reply. I attempted to use the unwind function after listing the games and it wouldn't return as consecutive. I updated the model by adding the property 'result' with either "win or loss" to each relationship, so I could use that property and then potentially order by the time?

It's possible but tricky and uses some fairly advanced cypher ...
Run the queries one at a time ...

// query 1 - create dummy data
create
(t1:Team {name:"CircleFC"}),
(t2:Team {name:"OlympicoFC"}),
(t3:Team {name:"GoalsFC"}),
(t4:Team {name:"StadiumFC"}),
(t1)<-[:Home {pts:0}]-(m1:LeagueMatch {fixture:101})-[:Away {pts:3}]->(t2),
(t1)<-[:Away {pts:3}]-(m2:LeagueMatch {fixture:202})-[:Home {pts:0}]->(t3),
(t1)<-[:Home {pts:3}]-(m3:LeagueMatch {fixture:303})-[:Away {pts:0}]->(t4),
(t4)<-[:Home {pts:3}]-(m4:LeagueMatch {fixture:404})-[:Away {pts:0}]->(t2),
(t4)<-[:Away {pts:3}]-(m5:LeagueMatch {fixture:505})-[:Home {pts:0}]->(t3)

// query 2 - set all to be zero!
match (team:Team)-[r]-(m:LeagueMatch)
set r.consecWins = 0

// query 3 - each win is possibly the start of a new run of wins
match (team:Team)-[r]-(m:LeagueMatch)
where r.pts = 3
set r.consecWins = 1

// query 4 -
match (team:Team)-[r]-(m:LeagueMatch)
with team, r order by m.fixture
with team, collect(r) as matchLinksPerTeam
// loop over each game for each team ...
unwind range(1,size(matchLinksPerTeam)) as i
// taking each pair of consecutive games
with matchLinksPerTeam[i-1] as prevMatchLink, matchLinksPerTeam[i] as currMatchLink
set currMatchLink.consecWins =
case
// winning streak is one game longer ...
when (prevMatchLink.pts = 3 and currMatchLink.pts = 3)
then prevMatchLink.consecWins + 1
else
// leave as is ...
currMatchLink.consecWins
end

// query 5 - show results
match (team:Team)-[r]-(m:LeagueMatch)
return team.name, max(r.consecWins)

Both the query and the model need changes. I'm not sure what the :data label is, is that a player, or something else?

In your first MATCH, you appear to be summing all goals scored in all matches against all teams, which almost certainly isn't what you want. You should start with a particular n node, and then sum only the home_team relationships. In the WITH clause, you would then carry over goals, and whatever n you started with, so that in the SECOND match, you're matching from the same n node.