Understanding Optimization with Embedded Java API - Ex: Java for loop vs. UNWIND

tj_a

Hey all,

I'm having some trouble understanding the functional difference between performing a piece of logic in Java vs. in Cypher when creating custom procedures/functions.

Take for example:

with apoc.map.fromValues(["clientID", 14702]) as innerQueryParamsMap 
unwind $nodeLabelList as nodeLabel 
with innerQueryParamsMap, nodeLabel 
call apoc.cypher.run("match (n:" + nodeLabel + ")-[*]->(c:client {clientID: $clientID})
                      with c.clientID as clientID, n.rowHashValue as rowHashValue 
                      order by clientID, rowHashValue 
                      return clientID, apoc.util.md5(collect(rowHashValue)) as combinedNodeHashValue", innerQueryParamsMap) yield value as row 
with nodeLabel, row.combinedNodeHashValue as combinedNodeHashValue 
return nodeLabel, combinedNodeHashValue 
order by nodeLabel, combinedNodeHashValue

The above cypher is a small part of a custom proc I'm creating.
Using UNWIND over a list of node labels, the logic essentially returns, for each label, a hash computed over a single property (rowHashValue) of every node with that label that is connected to the client.

Since the list iteration is in cypher, this required me to use apoc.cypher.run to dynamically filter by label on node (n). This is then executed with one call through the provided GraphDatabaseService.

This iteration logic could essentially be done in Java itself using a for loop, substituting the node label in code. This would allow me to take out the apoc.cypher.run procedure call.
However, this would require either multiple .execute() calls through the API,
or building multiple Cypher statements separated by ";" within a single String that gets executed once (not sure if this is possible).
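
For illustration, the for-loop variant with one .execute() call per label could look roughly like the sketch below. The class and method names are hypothetical, and the `execute` argument only stands in for whatever call actually runs a statement (for example the GraphDatabaseService handed to the procedure). It also assumes that wrapping the label in backticks and doubling any embedded backtick is sufficient escaping for your Neo4j version; checking labels against a known whitelist would be stricter.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical helper for illustration; not part of Neo4j or APOC.
class LabelLoopSketch {

    // Quote a label with backticks so it can be spliced into the Cypher text.
    // Embedded backticks are doubled (assumed to be sufficient escaping).
    static String quoteLabel(String label) {
        return "`" + label.replace("`", "``") + "`";
    }

    // Build the statement for one label. Only the label is concatenated;
    // the client ID stays a query parameter ($clientID).
    static String hashQueryFor(String label) {
        return "MATCH (n:" + quoteLabel(label) + ")-[*]->(c:client {clientID: $clientID}) "
             + "WITH c.clientID AS clientID, n.rowHashValue AS rowHashValue "
             + "ORDER BY clientID, rowHashValue "
             + "RETURN clientID, apoc.util.md5(collect(rowHashValue)) AS combinedNodeHashValue";
    }

    // Run one statement per label and collect the results keyed by label.
    // 'execute' is a stand-in for the real call that runs a query with parameters.
    static <R> Map<String, R> runPerLabel(List<String> labels,
                                          Map<String, Object> params,
                                          BiFunction<String, Map<String, Object>, R> execute) {
        Map<String, R> results = new LinkedHashMap<>();
        for (String label : labels) {
            results.put(label, execute.apply(hashQueryFor(label), params));
        }
        return results;
    }
}
```

Each label produces a different query string, so this still sends one statement per label; it only removes the apoc.cypher.run indirection.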

Functionally there's no difference between the 2 ways from what I see. But I assume the first option is better for performance and optimization, and with the second I would run into performance problems with a large enough List iteration?

I've tried to find best practices on if/when it's acceptable to pull logic into the Java code itself (ease of use for myself, beginner at Cypher). I wasn't able to find any.

Any guidance would be much appreciated!

1 ACCEPTED SOLUTION

You cannot have dynamic labels in Cypher, so UNWIND won't help you there.

So you need your for loop.

UNWIND would help you if you wanted to iterate over a list of parameters (which themselves could be maps/arrays/scalar values).

There, using UNWIND can be beneficial because you pass in the parameters and only ONE Cypher statement is executed, not N. So you only go through the parser, rewriter, checker and planner one time instead of N times.
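
As a rough sketch of that parameter case: the query text stays constant and the whole list travels as a single parameter. The class name, the `$rows` parameter, and the query itself are made up for illustration and are not taken from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: one constant statement, many parameter rows.
class UnwindParamsSketch {

    // The text never changes, so it is parsed and planned once
    // no matter how many rows are passed in $rows.
    static final String QUERY =
          "UNWIND $rows AS row "
        + "MATCH (c:client {clientID: row.clientID}) "
        + "RETURN row.clientID AS clientID, count(c) AS matches";

    // Wrap each client ID in a map and put the whole list under "rows".
    static Map<String, Object> paramsFor(List<Long> clientIds) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Long id : clientIds) {
            Map<String, Object> row = new HashMap<>();
            row.put("clientID", id);
            rows.add(row);
        }
        Map<String, Object> params = new HashMap<>();
        params.put("rows", rows);
        return params;
    }
}
```

QUERY and the map from paramsFor(...) would then go into a single execute call that accepts a query string plus a parameter map, instead of N separate calls.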

Hey Michael - thanks for the response!

Taking my code example above: I'm using UNWIND + apoc.cypher.run to get dynamic label matching instead of an equivalent for loop on the JVM.

My main confusion is due to the use of a procedure within UNWIND. I wasn't sure if there was any benefit to the above, as the call to apoc.cypher.run would still hit the DB (n) times and have to go through the parser/rewriter/checker/planner once for each of the (n) procedure calls. Or am I misunderstanding this?

Yes, unfortunately the procedure has to do that to run the custom Cypher statements.