Best practice to fetch data using APOC

I have a requirement to get the direct employee count and the total employee count of an org for a manager. This can be achieved with a Cypher MATCH and a WITH clause. The data is updated every hour and there is a huge number of records, so a direct Cypher query may degrade performance. Is there an alternative way to get this result, using either APOC or UNWIND? Kindly share your ideas.

1 ACCEPTED SOLUTION

match (m:Employee)
// use the node degree for the direct reports
with m, size((:Employee)-[:REPORTS_TO]->(m)) as direct_reportees
// cheaper to get the subgraph size with an APOC procedure
// no full expand, just shortest paths
match shortestPath( (all:Employee)-[:REPORTS_TO*]->(m) )
// don't aggregate on a property
with m, direct_reportees, count(*) as all_reportees
return m.name as manager, direct_reportees, all_reportees

For your counting: apoc.neighbors.tohop.count

This can work too, but is less efficient:

match (all)-[:REPORTS_TO*]->(m)
with m, count(distinct all) as all_reportees, direct_reportees
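
As a rough sketch of how the APOC counting procedure mentioned above might be used: the APOC library publishes it as apoc.neighbors.tohop.count, which counts the distinct nodes reachable within a maximum number of hops. The hop limit of 20 is an assumption about how deep the hierarchy can get, and the pattern comprehension for the direct reports is swapped in here because it also works on newer Neo4j versions; neither comes from the original answer.

```cypher
// Sketch only: assumes the APOC library is installed.
MATCH (m:Employee)
// '<REPORTS_TO' follows incoming REPORTS_TO relationships, i.e. from reportee to manager.
// 20 is a hypothetical upper bound on the depth of the reporting hierarchy.
CALL apoc.neighbors.tohop.count(m, '<REPORTS_TO', 20) YIELD value AS all_reportees
RETURN m.name AS manager,
       // direct reports: count of incoming REPORTS_TO relationships from Employee nodes
       size([(d:Employee)-[:REPORTS_TO]->(m) | d]) AS direct_reportees,
       all_reportees
```

Employees without any reports come back with both counts equal to 0, so a filter on all_reportees > 0 may be wanted if only managers should be listed.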

7 REPLIES

The query shouldn't take that long. Perhaps you can post it here and ask for help optimizing it.

If it really takes this long, you can pre-compute intermediate results (e.g. counts) either on insert or continuously in the background and then use the aggregated values.
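
A minimal sketch of the background pre-computation idea, assuming the APOC library is available. The property names direct_reportees and all_reportees and the batch size of 100 are illustrative choices, not something given in this thread; how the job is scheduled after each hourly refresh is left open.

```cypher
// Sketch only: store the counts on each manager node in batches.
CALL apoc.periodic.iterate(
  // Outer statement: every employee who has at least one report.
  "MATCH (m:Employee) WHERE (m)<-[:REPORTS_TO]-() RETURN m",
  // Inner statement: runs per batch and writes the counts as properties.
  "OPTIONAL MATCH (e:Employee)-[:REPORTS_TO*]->(m)
   WITH m, count(DISTINCT e) AS all_count
   SET m.all_reportees = all_count,
       m.direct_reportees = size([(d:Employee)-[:REPORTS_TO]->(m) | d])",
  {batchSize: 100, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
```

Reading the result then becomes a plain property lookup, for example MATCH (m:Employee) WHERE m.all_reportees > 0 RETURN m.name, m.direct_reportees, m.all_reportees. The stored values are only as fresh as the last run, so the job would need to be re-run after each data refresh.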

Hi Michael,
Thank you for the reply.

Below is my Cypher query, which returns the direct reports and all reports for every manager.
match (e:Employee)-[:REPORTS_TO]->(m:Employee)
with m, count(e) as direct_reportees
match (all:Employee)-[:REPORTS_TO*]->(m)
with m.name as manager, direct_reportees, count(all) as all_reportees
return manager, direct_reportees, all_reportees

The manager and employee data is refreshed frequently, so instead of using this direct query, can we use APOC iterate, UNWIND, or something similar? If yes, can you please help with a code snippet? Kindly suggest.
Thank you.

Please also suggest any other way to write the same query to get better performance.
Thanks

Michael, thank you for the solution.
For the shortest path, I am getting the error below.

The shortest path algorithm does not work when the start and end nodes are the same. This can happen if you
perform a shortestPath search after a cartesian product that might have the same start and end nodes for some
of the rows passed to shortestPath. If you would rather not experience this exception, and can accept the
possibility of missing results for those rows, disable this in the Neo4j configuration by setting
cypher.forbid_shortestpath_common_nodes to false. If you cannot accept missing results, and really want the
shortestPath between two common nodes, then re-write the query using a standard Cypher variable length pattern
expression followed by ordering by path length and limiting to one result.

So is making the configuration change below the only solution, or is there an alternative?
cypher.forbid_shortestpath_common_nodes=false

Please advise. Thank you in advance.

You can just use MATCH (all:Employee) WHERE all <> m before the shortest path

But the alternative shown at the bottom of my message might work equally well.
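
Putting the suggestion above together with the earlier query, the result might look like the sketch below. The pattern comprehension for the direct reports is a substitution that should also run on newer Neo4j versions; otherwise the shape follows the queries already posted in this thread.

```cypher
// Sketch only: exclude the manager itself before calling shortestPath,
// so start and end node can never be the same.
MATCH (m:Employee)
WITH m, size([(d:Employee)-[:REPORTS_TO]->(m) | d]) AS direct_reportees
MATCH (all:Employee) WHERE all <> m
MATCH shortestPath( (all)-[:REPORTS_TO*]->(m) )
WITH m, direct_reportees, count(*) AS all_reportees
RETURN m.name AS manager, direct_reportees, all_reportees
```

Note that the unconstrained MATCH (all:Employee) pairs every manager with every other employee before the path check, which may be costly on a large graph. Employees with no reports produce no rows and drop out of the result.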

Thank you Michael for the detailed info.
