Neo4j

lsnd · ‎06-18-2022

I started to learn Neo4j a few days ago.

I'm using it to find best path and make some analyzes.

The logic is a Person (id, name) can go to a Restaurant (id, name) via some Street (id, name). The connection between them have a cost. PS: All streets have a connection between them. For example:

(Person {id: 1})-[CONNECTION {cost:10}]->(Street {id: 1})
(Person {id: 1})-[CONNECTION {cost:11}]->(Street {id: 2})
(Street {id: 1})-[CONNECTION {cost:4}]->(Street {id: 2})
(Street {id: 2})-[CONNECTION {cost:11}]->(Restaurant {id: 1})
(Street {id: 2})-[CONNECTION {cost:7}]->(Restaurant {id: 2})

I am using Dijkstra to find all best path to all Restaurant for a specific Person. But the problem is that I can't set the maximum depth, and I would like to limit a maximum of 3 streets. How could I do that?

CALL gds.graph.project(
  'Person-Street-Restaurant',    
  ['Person', 'Street' 'Restaurant'],   
  'CONNECTION',
  {
    relationshipProperties: 'cost'
  }
)

MATCH (source:Person{id:1})
CALL gds.allShortestPaths.dijkstra.stream('Person-Street-Restaurant', {
    sourceNode: source,
    relationshipWeightProperty: 'cost'
})
YIELD sourceNode, targetNode, totalCost, nodeIds
WHERE 'Restaurant' IN LABELS(gds.util.asNode(targetNode))
RETURN
    gds.util.asNode(sourceNode).name AS sourceNodeName,
    gds.util.asNode(targetNode).name AS targetNodeName,
    totalCost,
    [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS nodeNames,
    SIZE(nodeIds) AS hops
ORDER BY totalCost

Image example

The best path is: P->S#3->S#4->S#5->R, total cost 7. But if I limit to 2 streets, it should be P->S#1->S#2->R, total cost 10. PS: In this example is easy because we have fewer connections, but in real case we have a lot and all streets have connection between them.

I also tried to find all paths by MATCH p1(Person)-[:CONNECTION*3..3]->(Restaurant), but no success.

lsnd · ‎06-18-2022

I created this solution, but it's very slow when matching relationships 3..3.

MATCH (c: Person{id: 1}), (r: Restaurant)
MATCH p3 = (c)-[r3:CONNECTION*3..3]->(r)
WITH r, p3,
REDUCE(path = "", n IN NODES(p3) | path + n.name + "->") AS p3Path,
REDUCE(total = 0, r IN RELATIONSHIPS(p3) | total + r.cost) AS p3Total
WITH r, MIN(p3Total) AS p3TotalMin, COLLECT({path: p3Path, total: p3Total}) AS paths
UNWIND paths AS p
WITH r, p, p3TotalMin
WHERE p3TotalMin = p.total
RETURN r.name, p3TotalMin, p.path

bennu_neo · ‎06-19-2022

Hi @lsnd ,

Interesting use case! If you ask me, 2 things should be done:

1. In your case, I may use an specific type of relationship for the link between Persons and Streets (P_CONNECTION) as well for the link between Restaurants and Streets (R_CONNECTION). If you use one type of connection, you may find your self on awful cycles when traversing long paths i.e Going from the person to the street and from that street to a different person. Yes, you can add some conditions related to the cardinality of the Persons and Restaurant nodes on a path, but It will be kinda slow.

2. Based on a scenario where you change this relationships. Can you try this pipeline on your db? (you will need apoc on your db tho)

MATCH (c: Person{id: 2})
CALL apoc.path.expandConfig(c, {
    minLevel : 3,
    maxLevel : 3,
    relationshipFilter : 'P_CONNECTION>|CONNECTION|R_CONNECTION>',
    labelFilter : "/Restaurant",
    uniqueness : "NODE_PATH"
}) yield path
UNWIND relationships(path) as r
WITH r, startNode(r) as source, endNode(r) as target
WITH gds.alpha.graph.project('test', source, target, {
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target)
    },{ 
        properties: r { cost : toInteger(r.cost) }         
}) AS g
RETURN
  g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels


MATCH (source:Person{id:2})
CALL gds.allShortestPaths.dijkstra.stream('test', {
    sourceNode: source,
    relationshipWeightProperty: 'cost'
})
YIELD sourceNode, targetNode, totalCost, nodeIds
WHERE 'Restaurant' IN LABELS(gds.util.asNode(targetNode))
RETURN
    gds.util.asNode(sourceNode) AS sourceNodeName,
    gds.util.asNode(targetNode) AS targetNodeName,
    totalCost,
    [nodeId IN nodeIds | gds.util.asNode(nodeId)] AS nodeNames,
    SIZE(nodeIds) AS hops
ORDER BY totalCost

You can configure the number of hops valid on your run with the min and max level of your apoc expansion before the projection.

Bennu

Oh, y’all wanted a twist, ey?

lsnd · ‎06-19-2022

Hi @bennu_neo,

Thank you so much for replying and helping.

I'm following your instructions, but what relationship would you use for Streets and Streets?

People and Street: P_CONNECTION
Street and Restaurant: R_CONNECTION
Street and Street: ?

Because someone can go to a street, next to another street, then to the restaurant.

lsnd · ‎06-19-2022

@bennu_neo , alright! I left connections between Street as CONNECTION, but when I try to execute your first code, to project the graph, it stuck on loading.

Maybe because I have a lot of data? Maybe because I'm testing on Neo4j Desktop?

MATCH (p: Person) RETURN COUNT(p): 1
MATCH (s: Street) RETURN COUNT(s): 693
MATCH (r: Restaurant) RETURN COUNT(r): 236
MATCH p=()-[:P_CONNECTION]->() RETURN COUNT(p): 668
MATCH rc=()-[:R_CONNECTION]->() RETURN COUNT(rc): 154105
MATCH c=()-[:CONNECTION]->() RETURN COUNT(c): 458452

bennu_neo · ‎06-19-2022

Hi @lsnd!

Yes. You can let the Connection relationship between streets. What's the count of r up to line 9?

Hoe much ram u have configured for the transaction heap?

Oh, y’all wanted a twist, ey?

lsnd · ‎06-19-2022

Hi @bennu_neo!

Unfortunately, I can't count rows up to line 9 as well. It stuck on loading.

I am using the default configuration from Neo4j.

dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=1G
dbms.memory.pagecache.size=512m
dbms.tx_state.memory_allocation=ON_HEAP

Maybe this is the problem! 😅

bennu_neo · ‎06-19-2022

Hello @lsnd,

I like trying to fit complex problems on small ram, but yeah, if you can give more power to your db instance it can help a lot, let's try with initial and max heap size set on 2gb.

Which direction have P_CONNECTION and R_CONNECTION?

I assume they go from Person to Street and from Street to Restaurant respectively.

Oh, y’all wanted a twist, ey?

lsnd · ‎06-19-2022

Hi @bennu_neo!

I set initial and max heap size on 2gb, but the problem persists, and when I execute the query the database seems to stop to work.

Yes, you're right! I imported them from CSV files.

LOAD CSV WITH HEADERS FROM 'file:///streets.csv' AS line
MERGE (s:Street {name: line.name, id: toInteger(line.id)})

LOAD CSV WITH HEADERS FROM 'file:///restaurants.csv' AS line
MERGE (r:Restaurant {name: line.name, id: toInteger(line.id)})

MERGE (Person{id: 1})

LOAD CSV WITH HEADERS FROM 'file:///street_costs.csv' AS line
MATCH (s1: Street {id: toInteger(line.street_src_id)})
MATCH (s2: Street {id: toInteger(line.street_dst_id)})
MERGE (s1)-[:CONNECTION {cost: toInteger(line.cost)}]->(s2)

LOAD CSV WITH HEADERS FROM 'file:///street_restaurant_costs.csv' AS line
MATCH (s: Street {id: toInteger(line.street_id)})
MATCH (r: Restaurant {id: toInteger(line.restaurant_id)})
MERGE (s)-[:R_CONNECTION {cost: toInteger(line.cost)}]->(r)

LOAD CSV WITH HEADERS FROM 'file:///person_street_costs.csv' AS line
MATCH (c: Person {id: 1})
MATCH (s: Street {id: toInteger(line.street_id)})
MERGE (c)-[:P_CONNECTION {cost: toInteger(line.cost)}]->(s)

Neo4j

How to find shortest path limiting the distance from source?