Neo4j

benjamin_rood · ‎06-08-2022

I have a query optimisation problem... What I want is effectively the same as the problem in computer science of managing multiple concurrent workers when you only care about the first successful result that you get back (fastest), and then you want to signal immediately "stop" to the other workers.

I have a query which wants to check if there exists at least one potential path/walk from a set of nodes Κ to a unique node (α) (which I know the node id of). There will be between zero and ∣Κ∣ possible walks between any (κ) ∈ Κ and the target node (α)

Writing this query in Cypher is no problem, here's one simple way:

OPTIONAL MATCH p=shortestPath((κ)-[*]->(α)) RETURN p IS NOT NULL

The issue I am having is this: The query will do ∣Κ∣ "searches" matches, and then return ∣Κ∣ rows with TRUE or FALSE for all ∣Κ∣ "searches". This is wasteful for my needs. I want the query to stop and return TRUE on the first successful path match, or FALSE if all ∣Κ∣ "searches" fail.

Is there an APOC proceedure that could help me do this?

Edit: The closest kind of thing I can find is apoc.cypher.runTimeboxed, I just need something like this but instead of timing out, exiting on the first match.

glilienfield · ‎06-08-2022

I did not find anything that will allow you to execute them in parallel and interrupt ongoing executions. You could try executing them sequentially and stopping after one is found. The performance gain will depend on how early the shortest path is found, so it will vary for each set K and the order in which they are processed. Anyway, it would be an interesting implementation to try.

The following is just an idea. You could use apoc.periodic.iterate to iterate through your elements of 'K'. The first parameter would be a query to get all the elements of 'K' to iterate over. The second query is executed for each element, which would be the call to the shortest path. You need a mechanism to stop the execution and return the results and the shortest path is found. You could do this with a node you create and pass as a parameter. The node would have a property like 'continue', which would be set to true when you create the node. Your iterate query (second query) would first check this property to determine if it is true and execute the shortestPath and skip when false. The continue property is also set to false when a shortest path is found, so all remaining elements of 'K' will be skipped.

When it is true, you also add the shortest path results to a property of the node, so that they can be extracted once the periodic.iterate procedure returns, at which point you would also delete the node. You can add the array of path relationships to the node, the array of path nodes, and the length of path as properties.

You can use the apoc.do.when produced to conditionally execute the shortest path and update the node. I never used them, but there is something in apoc called virtual nodes. Maybe you can use one of these instead of creating a real node.

Just some brainstorming.

benjamin_rood · ‎06-08-2022

Yes, after sleeping on it I was thinking along these same lines; iterating and stopping on the first "hit" will definitely have a noticeable saving on overall cost of the query.

Thanks for the tips, I will let you know how it goes and report the final approach.

bennu_neo · ‎06-09-2022

Hi @benjamin_rood ,

Can you try something like this?

MATCH(alpha {id : $myAlphaId})
with alpha
MATCH(k)
WHERE k *my condition for k's*
with a, collect(k) as bl
call apoc.path.expandConfig(alpha, {
   limit : 1,
   optional : true,
   terminatorNodes : bl,
   uniqueness : 'NODE_GLOBAL',
   bfs : true
}) yield path
return path

Lemme know if it works.

Bennu

Oh, y’all wanted a twist, ey?

benjamin_rood · ‎06-12-2022

Hi Bennu, that sounds like a great choice for path finding, but I don't know how it would work as an optimisation, i.e. to terminate path on the first successful match found...? Ah! It's the LIMIT argument!
Okay, I will give this a go!

benjamin_rood · ‎06-13-2022

Sadly "limit: 1" isn't working. If there are matching > 1 paths found, it returns them all. Even if I change the return value to "RETURN path IS NOT NULL", I still get multiple rows. The goal is to stop as soon as the possible match is found. DFS search seems to be slightly faster in this case, too.

bennu_neo · ‎06-13-2022

Hi @benjamin_rood

That's weird! I just checked with random data and kinda worked.

Can you share the entire query you are trying and maybe sharing the results as well 😄 ?

Oh, y’all wanted a twist, ey?

Neo4j

Optimisation: Stop query on first successful match