cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Query to return all possible paths return infinite no of paths

im using the below graphs to identify all possible paths between start to end nodes.
there are 6 possible paths, that' i've identified, between start and end. maybe i've missed. around 15 or 20 max is what i guess.

`
im using the below query to identify the shortest path, whish returns two shortests paths which is path 3 and 4 marked in the graph.

match (end:Device {ip : "x.x.58.129"}),
path = allShortestPaths((Device {ip : "x.x.58.2"}) -[*..30]- (end)
return path

but what i want is to return all the possible paths(can include duplicate nodes), which is like 6 paths marked in the graph image above. for that purpuse im using the query

match (end:Device {ip : "x.x.58.129"}),
path = (Device {ip : "x.x.58.2"}) -[*..30]- (end)
return path

so what i'm expecting is 20 paths, max, but suprisingly im getting around 12,000 paths. the query is taking forever to return ( maybe more than 15 mins ), also the returned JSON sizes upto 235 MB of size. im puzzeled what might go wrong here.

1 ACCEPTED SOLUTION

Hi @murali.venugopal ,

The performance will degrade on a big database because the planner will behave as you said. Filtering at the end will increase the overall time execution and ram usage. So, as I commented before, if you are aiming for the best performance possible you may like considering the APOC approach.

MATCH (start:Device), (end:Device)
where start.ip ends with '58.2' and end.ip ends with '58.129'
WITH start, collect(end) as tN
CALL apoc.path.expandConfig(start, {
    terminatorNodes : tN,
    uniqueness : 'RELATIONSHIP_PATH',
    relationshipFilter : 'Device|Interface '
} )
yield path
return path

Bennu

Oh, y’all wanted a twist, ey?

View solution in original post

12 REPLIES 12

Hi @murali.venugopal ,

Notice that you are never telling to the planner that it has to stop first time he visit the end node. Take that into consideration.

Do you have any cypher statement that may allowed us to recreate your example?

Bennu

Oh, y’all wanted a twist, ey?

@bennu.neo ,
thanks for you reply.
i even tried the below query to achive less number of paths. it didn't work. the same scenario.

match (start:Device {ip : "x.x.58.2"}), (end:Device {ip : "x.x.58.129"}),
path = (start) -[*..30]- (end)
return path

so, i'm attaching the cypher statements to recreate my db, as you've asked. please verify.
apoc_export.txt (16.5 KB)

Hi @murali.venugopal !

As I suggested before, with no condition on your traversing, you are executing a default Neo4J Trail traverse so relationship cann't be traversed twice, but you can repeat nodes, including intermediate and end node.

If you are planning to repeat intermediate nodes but no the end node, you can try:

MATCH (start:Device), (end:Device)
where start.ip ends with '58.2' and end.ip ends with '58.129'
WITH start, end
MATCH path = (start) -[*..]- (end)
with nodes(path) as ns
WHERE 1=size([m in ns where m = end])
return ns

So you get 22 paths. If any node cann't be repeated, it's a bit trickier but achievable with list comprehension.

MATCH (start:Device), (end:Device)
where start.ip ends with '58.2' and end.ip ends with '58.129'
WITH start, end
MATCH path = (start) -[*..]- (end)
with nodes(path) as ns
WHERE ALL (n IN ns  WHERE 1=size([m in ns where m = n]))
return ns

By the way, if you are aiming for the best performance possible, I suggest you to check the apoc.path procedures

So you can transform your queries into:

MATCH (start:Device), (end:Device)
where start.ip ends with '58.2' and end.ip ends with '58.129'
WITH start, collect(end) as tN
CALL apoc.path.expandConfig(start, {
    terminatorNodes : tN,
    uniqueness : 'RELATIONSHIP_PATH'
} )
yield path
return path

And

MATCH (start:Device), (end:Device)
where start.ip ends with '58.2' and end.ip ends with '58.129'
WITH start, collect(end) as tN
CALL apoc.path.expandConfig(start, {
   terminatorNodes : tN,
   uniqueness : 'NODE_PATH'
} )
yield path
return path

Bennu

Oh, y’all wanted a twist, ey?

@bennu.neo thanks for your effort to help me. it worked. thanks a lot.

@bennu.neo ,
the same query im trying to filter Label with the below query, but it returns no path. Now that i've more Labels relating the source and destination with different Labels like say Vlan, i want to filter out those paths. So that my paths will contain only Device or Interface nodes in it.

MATCH (start:Device), (end:Device)
WHERE start.ip = '107.56.58.2' and end.ip = '107.56.58.129'
WITH start, end
MATCH path = (start) -[:Device|Interface*..30]- (end)
with nodes(path) as paths
WHERE 1=size([m in paths where m = end])
RETURN paths

im restricting the paths to contain only Device or Interface type nodes. im not sure if the syntax is correct. please suggest.

Hi @murali.venugopal !

Label conditions can't be applied inside a relationship tag [].

Create and all condition for your nodes like and all(n in ns where n:Device or n:Interface) if you wanna stick on pure Cypher. With APOC it can easily achieved with the property labelFilters.

Regards,

Harold

Oh, y’all wanted a twist, ey?

@bennu.neo,
Im using the query to filter only Device and Interface nodes.

MATCH (start:Device), (end:Device) 
	WHERE start.ip = '100.66.58.2' and end.ip = '100.66.58.129' 
	WITH start, end 
		MATCH path = (start) -[*..20]- (end) with nodes(path) as paths 
		WHERE 1=size([m in paths where m = end]) 
		AND ALL (n in paths where n:Device or n:Interface)
	RETURN paths

this works fine with a minimal count of nodes. but with the actual dataset, the query take for ever to execute. so, i had to manually remove the Labels other than Device and Interface to execute the query. which is not possible in production environments, where i'll have VLans and Pseudowires and many more Labels.

i think the query planner takes all the available paths without the Device/Interface filter and then removes the paths that doesn't match the condition.

please try to provide your inputs.

thanks

Hi @murali.venugopal ,

The performance will degrade on a big database because the planner will behave as you said. Filtering at the end will increase the overall time execution and ram usage. So, as I commented before, if you are aiming for the best performance possible you may like considering the APOC approach.

MATCH (start:Device), (end:Device)
where start.ip ends with '58.2' and end.ip ends with '58.129'
WITH start, collect(end) as tN
CALL apoc.path.expandConfig(start, {
    terminatorNodes : tN,
    uniqueness : 'RELATIONSHIP_PATH',
    relationshipFilter : 'Device|Interface '
} )
yield path
return path

Bennu

Oh, y’all wanted a twist, ey?

@bennu.neo ,
Thanks for the help. But i see in the documentation traversal frame as depricated, which is belive APOC is using to explore the path.

Infact, i first started using APOC, and then after suggestion from @glilienfield , i wanted to skip APOC and stick with pure cypher. What's your opinion on that?

Hi @murali.venugopal !

Traversal API is signed as deprecated from version 5.x that is the next major. A newer version of this API will be available so eventually, libraries like APOC will get the chance to update their procedures.

In general, stick on pure cypher as much as you can but don't chain yourself into it if there are some other ways to do the jobs. There's no so many people doing website on pure Javascript tho.

Best regards,

Bennu

Oh, y’all wanted a twist, ey?

@bennu.neo ,
Thanks for your input. Im planning to stick with pure cypher only.
But now, i'm facing some problems with the paths identified earlier with the query

MATCH (start:Device), (end:Device) 
	WHERE start.ip = '77.101.58.2' and end.ip = '77.101.58.129' 
	WITH start, end 
		MATCH path = (start) -[*..20]- (end) with nodes(path) as paths 
		WHERE 1=size([m in paths where m = end]) 
		AND ALL (n in paths where n:Device or n:Interface)
	RETURN paths

im getting several paths. i can have duplicate nodes across paths, but cannot have duplicate within a single path. that'll ruin my very purpose of path identification.

eg. the above query returns 8 paths. the 3rd path for example has same node Device with ip : "77.101.58.16" multiple times.

Line 9: "ip": "77.101.58.2",
Line 39: "ip": "77.101.58.3",
Line 69: "ip": "77.101.58.16",
Line 99: "ip": "77.101.58.4",
Line 129: "ip": "77.101.58.15",
Line 159: "ip": "77.101.58.16",
Line 189: "ip": "77.101.58.129",

as you can see, 77.101.58.16 ip is coming 2 times in the path. which i want to avoid. please help me to remove those duplicate nodes within a path.
im attaching the json for the 3rd path.
path3.txt (3.8 KB)

You could try extending your approach of eliminating paths that have only one instance of the 'end' node to all nodes along the path. I think the following will do so. Sorry, I don't have data to test.

MATCH path = (start:Device)-[*..20]-(end:Device)
WHERE start.ip = '77.101.58.2' and end.ip = '77.101.58.129'
with nodes(path) as paths
WHERE ALL (n in paths where n:Device or n:Interface)
AND ALL(n in paths where size([x in paths where x=n])=1)
RETURN paths