Preparing OSM data for routing

I've been trying to follow the GraphConnect 2018 video on loading OSM data into a routable graph (https://neo4j.com/graphconnect-2018/session/neo4j-spatial-mapping). All goes well until I try the Cypher shown at 21:26. If I run the Cypher exactly as shown (including 'LIMIT 100' on the match), won't that only set up a [:ROUTE] relationship for 100 intersections? Regardless, if I try to batch process the job via apoc.periodic.iterate, it seems to crash the Neo4j server. There is nothing obvious in the logs, just the Cypher that was executed, followed by

… in separate thread

and then

INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutdown started

Any ideas on how to execute this across all matching nodes? I've tried invoking the procedure using the different parameters given as an example on the repo readme:

CALL spatial.osm.routeIntersection(x,false,false,false)

but get the same result. I've even tried running

CALL spatial.osm.routeIntersection(x,true,true,true)

which according to the docs creates the relationship minus the distance property, but that too causes a server crash if run for more than 100 nodes.

Any help appreciated, thanks!


In the presentation I showed versions of the queries that had LIMIT in them and did not use apoc.periodic.iterate only because they were nicer to show visually, but in building the graph I certainly used the periodic.iterate versions all the time, as you suspected.

The symptoms you describe sound like you are running out of memory. I needed to tweak the memory settings to make the most of my RAM, and the apoc.periodic.iterate settings were also important for getting the best performance and memory usage. I don't have records of the exact tweaking I did, but I do have a copy of the notes I took for the queries I ran:

Here are the queries relevant to building the routing graph:

//
// Identify (:OSMNode) instances that are intersections (connected INDIRECTLY to more than one (:OSMWayNode)) and are on ways or relations that are also streets.
//

MATCH (n:OSMNode)
  WHERE size((n)<-[:NODE]-(:OSMWayNode)-[:NEXT]-(:OSMWayNode)) > 2
  AND NOT (n:Intersection)
WITH n LIMIT 100
MATCH (n)<-[:NODE]-(wn:OSMWayNode), (wn)<-[:NEXT*0..100]-(wx),
      (wx)<-[:FIRST_NODE]-(w:OSMWay)-[:TAGS]->(wt:OSMTags)
  WHERE exists(wt.highway) AND NOT n:Intersection
SET n:Intersection
RETURN COUNT(*);

// Periodic iterate

CALL apoc.periodic.iterate(
'MATCH (n:OSMNode) WHERE NOT (n:Intersection)
 AND size((n)<-[:NODE]-(:OSMWayNode)-[:NEXT]-(:OSMWayNode)) > 2 RETURN n',
'MATCH (n)<-[:NODE]-(wn:OSMWayNode), (wn)<-[:NEXT*0..100]-(wx),
       (wx)<-[:FIRST_NODE]-(w:OSMWay)-[:TAGS]->(wt:OSMTags)
   WHERE exists(wt.highway) AND NOT n:Intersection
 SET n:Intersection',
{batchSize:10000, parallel:true});

MATCH (i:OSMNode) RETURN 'OSM Nodes' AS type, count(i)
UNION
MATCH (i:OSMPathNode) RETURN 'Nodes on paths' AS type, count(i)
UNION
MATCH (i:PointOfInterest) RETURN 'Points of interest' AS type, count(i)
UNION
MATCH (i:Intersection) RETURN 'Intersections' AS type, count(i);


// Produced 50k intersections in 185s for NY
// US-NE took 45 minutes to produce 789505
// San Francisco took 16s to process the Intersection nodes

// San Francisco
//╒════════════════════╤══════════╕
//│"type"              │"count(i)"│
//╞════════════════════╪══════════╡
//│"OSM Nodes"         │2880804   │
//├────────────────────┼──────────┤
//│"Nodes on paths"    │235730    │
//├────────────────────┼──────────┤
//│"Points of interest"│3124      │
//├────────────────────┼──────────┤
//│"Intersections"     │53744     │
//└────────────────────┴──────────┘

//
// Find and connect intersections into routes
//

MATCH (x:Intersection) WITH x LIMIT 100
  CALL spatial.osm.routeIntersection(x,true,false,false)
  YIELD fromNode, toNode, fromRel, toRel, distance, length, count
WITH fromNode, toNode, fromRel, toRel, distance, length, count
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
  ON CREATE SET r.distance = distance, r.length = length, r.count = count
RETURN COUNT(*);

// With Periodic Iterate:

CALL apoc.periodic.iterate(
'MATCH (x:Intersection) RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
   YIELD fromNode, toNode, fromRel, toRel, distance, length, count
 WITH fromNode, toNode, fromRel, toRel, distance, length, count
 MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
   ON CREATE SET r.distance = distance, r.length = length, r.count = count
 RETURN count(*)',
{batchSize:100, parallel:false});

// San Francisco took 103s to perform 54k committed operations

// If there are errors, repeat with a smaller batch size to better cope with stack overflow errors

CALL apoc.periodic.iterate(
'MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]->() RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
   YIELD fromNode, toNode, fromRel, toRel, distance, length, count
 WITH fromNode, toNode, fromRel, toRel, distance, length, count
 MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
   ON CREATE SET r.distance = distance, r.length = length, r.count = count
 RETURN count(*)',
{batchSize:10, parallel:false});

// Now find Routable nodes from the PointOfInterest search and link them to the route map

MATCH (x:Routable:OSMNode)
  WHERE NOT (x)-[:ROUTE]->(:Intersection) WITH x LIMIT 100
CALL spatial.osm.routeIntersection(x,true,false,false)
  YIELD fromNode, toNode, fromRel, toRel, distance, length, count
WITH fromNode, toNode, fromRel, toRel, distance, length, count
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
  ON CREATE SET r.distance = distance, r.length = length, r.count = count
RETURN COUNT(*);

// With periodic iterate

CALL apoc.periodic.iterate(
'MATCH (x:Routable:OSMNode)
   WHERE NOT (x)-[:ROUTE]->(:Intersection) RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
   YIELD fromNode, toNode, fromRel, toRel, distance, length, count
 WITH fromNode, toNode, fromRel, toRel, distance, length, count
 MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
   ON CREATE SET r.distance = distance, r.length = length, r.count = count
 RETURN count(*)',
{batchSize:10, parallel:false});

// SF took 16s to do 1538 committed operations

// The algorithm makes self relationships, so delete with

MATCH (a:Intersection)-[r:ROUTE]->(a) DELETE r RETURN COUNT(*);

// SF had 402 self relationships

// Now to get an idea of the distribution of route distances

MATCH (a:Intersection)-[r:ROUTE]->() RETURN 'All routes' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 25 RETURN '>25m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 50 RETURN '>50m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 100 RETURN '>100m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 250 RETURN '>250m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 500 RETURN '>500m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 5000 RETURN '>5000m' AS type, COUNT(*) AS count;

// SF
//╒════════════╤═══════╕
//│"type"      │"count"│
//╞════════════╪═══════╡
//│"All routes"│86315  │
//├────────────┼───────┤
//│">25m"      │55662  │
//├────────────┼───────┤
//│">50m"      │40227  │
//├────────────┼───────┤
//│">100m"     │18992  │
//├────────────┼───────┤
//│">250m"     │3976   │
//├────────────┼───────┤
//│">500m"     │1174   │
//├────────────┼───────┤
//│">5000m"    │59     │
//└────────────┴───────┘

// To improve inner-city routing we can optionally remove some of the longer ones which might be falsely detected

MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 500 DELETE r RETURN COUNT(*);

Many thanks Craig, will take another look at it this week and try some of your suggestions.

Are there any relevant memory settings I can look at besides server page cache and heap size? With generous settings for both that have worked with other large imports, I'm still getting the same server crash, even when I try your small batch iterator using batchSize:10 (and even batchSize:1).

Hi!
Did you figure this out? Having the exact same problem when running the query:

CALL apoc.periodic.iterate(
'MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]->() RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
   YIELD fromNode, toNode, fromRel, toRel, distance, length, count
 WITH fromNode, toNode, fromRel, toRel, distance, length, count
 MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
   ON CREATE SET r.distance = distance, r.length = length, r.count = count
 RETURN count(*)',
{batchSize:10, parallel:false});

I've tried both with and without batching, on small and large datasets, with the same result every time: the server just shuts down without any obvious hints in the logs.

I wasn't able to resolve it, and unfortunately haven't looked at the project since.

Craig might be able to offer you some suggestions?

Hey all - I was running into the same problem with the updated 0.2.3 importer: the database was crashing regularly. I dug into it for a while and realized that none of the :NEXT relationships had distance properties, which meant that the routeIntersection procedure couldn't work. Not sure why that threw an error, but I added the distance property to all of the :NEXT relationships using the query Craig posted online here (Slide 28).

That can be done before or after adding the intersection labels to the graph. I ran a couple of batches of 1000 without apoc.periodic.iterate and they worked fine, so it's now crunching through the whole model.
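For anyone checking their own import, a minimal sketch of that step might look like the following. It is not necessarily the exact query from the slides. It assumes that each (:OSMNode) carries a point property named location (an assumption about the importer's output, so check your own data first) and that the distance() function of Neo4j 3.4 to 4.x is available (in Neo4j 5 it is named point.distance()).

```cypher
// How many :NEXT relationships still lack a distance property?
MATCH (:OSMWayNode)-[r:NEXT]->(:OSMWayNode)
WHERE r.distance IS NULL
RETURN count(r) AS missingDistance;

// Fill in the missing distances in batches.
// Assumption: every OSMNode has a point property called `location`.
CALL apoc.periodic.iterate(
'MATCH (awn:OSMWayNode)-[r:NEXT]->(bwn:OSMWayNode)
   WHERE r.distance IS NULL RETURN awn, r, bwn',
'MATCH (awn)-[:NODE]->(a:OSMNode), (bwn)-[:NODE]->(b:OSMNode)
 SET r.distance = distance(a.location, b.location)',
{batchSize:10000, parallel:false});
```

Once the first query returns 0, the routeIntersection batches should have the distances they need.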

Thanks everybody (especially Craig) for working on this!

As @waterdoggy has pointed out, it is necessary to follow the same procedures I originally used, as the current code is not production code designed to handle all contingencies, but was built specifically for that demo. It would be great to refine and improve these utilities and procedures for much more general purpose usage, but that will take time. I have recently started porting the various spatial libraries to Neo4j 4.0 and the changes are quite large, so that will take time, but hopefully will lead to a general cleanup as well.

I successfully ran the Create Intersection and Distance queries. If I run the routeIntersection query with LIMIT 100, it works.

MATCH (x:Intersection) WITH x LIMIT 100
  CALL spatial.osm.routeIntersection(x,true,false,false)
  YIELD fromNode, toNode, fromRel, toRel, distance
WITH fromNode, toNode, fromRel, toRel, distance
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
  ON CREATE SET r.distance = distance
RETURN COUNT(*);

But it only creates a few routes, about 330, so I tried running it with apoc.periodic.iterate (with batch sizes of 10, 100, 1000 and 10000). That query just doesn't terminate; I waited multiple hours.
If I instead run the following query manually a few times (I have 180,000 intersections), the runs produce around 300k ROUTE relationships.

MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]->() WITH x LIMIT 100
  CALL spatial.osm.routeIntersection(x,true,false,false)
  YIELD fromNode, toNode, fromRel, toRel, distance
WITH fromNode, toNode, fromRel, toRel, distance
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
  ON CREATE SET r.distance = distance
RETURN count(*);

But the number of intersections without ROUTE relationships never reaches zero.
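One way to narrow this down is to count the remaining intersections in two ways, as in the sketch below. The possible explanation it tests is an assumption, not something confirmed in this thread: if some intersections only ever end up as the toNode of a ROUTE, or the procedure yields no rows for them, then a filter on outgoing relationships alone will keep selecting them.

```cypher
// Intersections with no outgoing ROUTE (what the manual query filters on)
MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]->()
RETURN 'No outgoing ROUTE' AS type, count(x) AS count
UNION
// Intersections with no ROUTE in either direction
MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]-()
RETURN 'No ROUTE in either direction' AS type, count(x) AS count;
```

If the second count is much smaller than the first, most of the leftover intersections are already connected, just not as the start of a relationship.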

After reading this https://aura.support.neo4j.com/hc/en-us/articles/1500011138861-Using-apoc-periodic-iterate-and-under... I came up with the following query:

CALL apoc.periodic.iterate(
'MATCH (x:Intersection) RETURN id(x) as id',
'MATCH (x) WHERE id(x) = id 
  CALL spatial.osm.routeIntersection(x,true,false,false)
   YIELD fromNode, toNode, fromRel, toRel, distance, length, count
 WITH fromNode, toNode, fromRel, toRel, distance, length, count
 MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
   ON CREATE SET r.distance = distance, r.length = length, r.count = count
 RETURN count(*)',
{batchSize:100, parallel:false});

This does the job! But I don't know whether the route graph should only include one direction. In my case a graph with 300k ROUTE relationships was created, but each connection exists in one direction only.
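To see how much of the graph is connected in a single direction, a quick check along these lines may help. It is only a sketch: whether ROUTE should be treated as undirected for routing is an assumption that depends on the data (one-way streets, for example), and nothing in this thread settles it.

```cypher
// Pairs of intersections that are connected in one direction only
MATCH (a:Intersection)-[:ROUTE]->(b:Intersection)
WHERE NOT (b)-[:ROUTE]->(a)
RETURN count(*) AS oneDirectionOnly;

// Cypher can ignore relationship direction when traversing,
// so a single stored direction can still be followed both ways
MATCH (a:Intersection)-[r:ROUTE]-(b:Intersection)
RETURN a, r, b LIMIT 10;
```

Leaving the arrow off the pattern, as in the second query, lets a match follow each ROUTE in both directions without storing a second relationship.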