cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Optimising nested COLLECT aggregations in Cypher Query

My graph is for representing Backlogs, Stories, Sprints, Tasks Users, and Resources. A Backlog can have Stories, which can have tasks, which can be associated with Sprints and Users. I would like to return a nested structure of Backlog -> Stories ->Task, and also show the Sprint & Users associated with Tasks.

For a small Backlog with 11 Stories and 40 tasks, I'm getting 1,972 db hits taking 1208ms.
Seems quite sluggish, and i think the issue is with the nested COLLECT aggregations. I'm hoping someone might have some suggestions to improve performance.

My query is below along with the profiled execution plan

MATCH (backlog:Backlog{id:'2957822f34504862ae51e8f69980a15f'})
OPTIONAL MATCH (backlog)-[:BACKLOG_HAS_STORY]->(story)
OPTIONAL MATCH (story)-[:STORY_HAS_TASK]->(task
OPTIONAL MATCH (task)-[:TASK_HAS_RESOURCE]->(resource)
OPTIONAL MATCH (sprint)-[:SPRINT_HAS_TASK]->(task)
OPTIONAL MATCH (owner)-[:ACCOUNT_OWNS_STORY]->(story)
WITH backlog,story,COLLECT(resource{.*}) as resources, task, sprint,owner
WITH backlog,story,COLLECT(task{.*,resources:resources,sprint:sprint{.*}}) as tasks,owner
WITH backlog, COLLECT(story{.*,owner:owner{.id,.firstName,.lastName},tasks: tasks}) as stories
RETURN backlog{.*,stories:stories}

1 ACCEPTED SOLUTION

In this case it will be more convenient to use pattern comprehensions, which is like doing an optional match and collecting the results, and it deals with cardinality issues that you're likely seeing as a result of your back-to-back optional matches.

Maybe something like this:

MATCH (backlog:Backlog{id:'2957822f34504862ae51e8f69980a15f'})
OPTIONAL MATCH (backlog)-[:BACKLOG_HAS_STORY]->(story)
WITH backlog, story, 
 [(story)-[:STORY_HAS_TASK]->(task) | task {.*, resources:[(task)-[:TASK_HAS_RESOURCE]->(resource) | resource {.*}], 
  sprint:[(sprint)-[:SPRINT_HAS_TASK]->(task) | sprint {.*}]}] as tasks
WITH backlog, story {.*, owner:[(owner)-[:ACCOUNT_OWNS_STORY]->(story) | owner {.id, .firstName, .lastName}], tasks} as story
WITH backlog, collect(story) as stories
RETURN backlog {.*, stories} as backlog

Technically you could skip the OPTIONAL MATCH from backlog to stories and use that for the top-level nesting of the pattern comprehension, that's up to you.

View solution in original post

2 REPLIES 2

In this case it will be more convenient to use pattern comprehensions, which is like doing an optional match and collecting the results, and it deals with cardinality issues that you're likely seeing as a result of your back-to-back optional matches.

Maybe something like this:

MATCH (backlog:Backlog{id:'2957822f34504862ae51e8f69980a15f'})
OPTIONAL MATCH (backlog)-[:BACKLOG_HAS_STORY]->(story)
WITH backlog, story, 
 [(story)-[:STORY_HAS_TASK]->(task) | task {.*, resources:[(task)-[:TASK_HAS_RESOURCE]->(resource) | resource {.*}], 
  sprint:[(sprint)-[:SPRINT_HAS_TASK]->(task) | sprint {.*}]}] as tasks
WITH backlog, story {.*, owner:[(owner)-[:ACCOUNT_OWNS_STORY]->(story) | owner {.id, .firstName, .lastName}], tasks} as story
WITH backlog, collect(story) as stories
RETURN backlog {.*, stories} as backlog

Technically you could skip the OPTIONAL MATCH from backlog to stories and use that for the top-level nesting of the pattern comprehension, that's up to you.

Wow ! I had no idea about pattern comprehensions. It reduced my query down to 8ms and is far more elegant. Thanks so much @andrew.bowman.