Memory issues: Add new edge between nodes if there is a path of length k between them

Hi,

I am trying to add new edges between nodes which have paths of length 2. This is what I did:

MATCH path = (a:person)-[*2]-(b:person)
WITH a, b, count(path) AS weight
MERGE (a)-[e:co_authors]->(b)
SET e.weight = weight

The number of person nodes I have in my database is 100001 and I found that the number of such paths of length 2 between Person nodes is 37817286.

I get an out-of-memory error:
Neo.TransientError.General.OutOfMemoryError: There is not enough memory to perform the current task. Please try increasing 'dbms.memory.heap.max_size' in the neo4j configuration (normally in 'conf/neo4j.conf' or, if you you are using Neo4j Desktop, found through the user interface) or if you are running an embedded installation increase the heap by using '-Xmx' command line flag, and then restart the database.

How do I fix the memory heap size?

Thanks,
Lavanya


You'll likely want to batch your writes via APOC Procedures.

You may also want to add a predicate to prevent mirrored results (which would result in two relationships being created per pairing).

For example, maybe something like this:

CALL apoc.periodic.iterate(
  "MATCH (a:person) RETURN a",
  "MATCH path = (a)-[*2]-(b:person)
   WHERE id(a) < id(b)
   WITH a, b, count(path) AS weight
   MERGE (a)-[e:co_authors]->(b)
   SET e.weight = weight",
  {batchSize: 5000}
) YIELD batches, total, errorMessages
RETURN batches, total, errorMessages

This processes 5000 persons per batch, committing each batch in its own transaction. You may need to adjust batchSize depending on the average number of co-author relationships you expect per person.
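One rough way to pick a batchSize is to check how many distinct 2-hop neighbours a person typically has. The query below is only a sketch: it looks at a sample of 1000 persons (an arbitrary number assumed here, not something from the original post) so that the check itself stays cheap.

```cypher
// Sample a subset of persons; 1000 is an arbitrary, assumed sample size.
MATCH (a:person)
WITH a LIMIT 1000
// Count the distinct persons reachable in exactly two hops from each sampled person.
OPTIONAL MATCH (a)-[*2]-(b:person)
WITH a, count(DISTINCT b) AS twoHopNeighbours
RETURN avg(twoHopNeighbours) AS avgPerPerson,
       max(twoHopNeighbours) AS maxPerPerson
```

If the average is in the hundreds or more, a smaller batchSize is probably safer, since each batch then writes that many relationships per person.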

Thanks for the response. I checked the code on my big graph and realised that (since there are no multiple edges)

CALL apoc.periodic.iterate("MATCH (a:person) RETURN a",
 "MATCH path = (a)-[*2]-(b:person)
 WHERE id(a) < id(b)
 WITH a, b, length(path) as pathlength
 MERGE (a)-[e:co_authors]->(b)
 SET e.weight=pathlength", {batchSize:5000}) YIELD batches, total, errorMessages
 RETURN batches, total, errorMessages

This adds one edge between persons for each path of length 2 between them. That is what I wanted, although I posed the question differently, as: add one edge between persons if there is a path of length 2 between them.

I think there's something wrong with that query. You have: WITH a, b, length(path) as pathlength, but because you're using *2 for your var-length pattern, the length will always be 2.

Note in the previous version of the query you were using count(path), which is the number of paths found between the two nodes. This is also an aggregation function, meaning the non-aggregation variables become distinct, which would fix your cardinality problem (when using count(path), you will only ever get 1 row between an a and b node).
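To see the grouping effect for yourself, a read-only check along these lines may help. It reuses the pattern from this thread; the sampling step, the RETURN, and the LIMIT values are assumptions added only to keep the check small.

```cypher
// Restrict to a small sample of start nodes (100 is an assumed, arbitrary size).
MATCH (a:person)
WITH a LIMIT 100
MATCH path = (a)-[*2]-(b:person)
WHERE id(a) < id(b)
// count(path) is an aggregate, so a and b act as grouping keys:
// each (a, b) pair appears in exactly one row.
WITH a, b, count(path) AS weight
RETURN id(a) AS aId, id(b) AS bId, weight
LIMIT 10
```

Replacing count(path) with length(path) removes the aggregation, so the same (a, b) pair can then appear once per path.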

If length(path) is really how you want to calculate the weight, then you will need a different way to ensure a and b are distinct:

WITH DISTINCT a, b, length(path) as pathlength

I am trying to replace each path of length 2 with an edge, so I will have multiple rows with the same a and b. That's why I am leaving out "DISTINCT".

Ah, I misread your last update then, my mistake.

I think there is still an issue with

MATCH path = (a:person)-[*2]-(b:person)
 WHERE id(a) < id(b)
 WITH a, b, path, length(path) as pathlength
 MERGE (a)-[e:co_authors]->(b)
 SET e.weight=pathlength

since it is still not creating a unique edge for each path of length 2.

Ah, you need to use CREATE instead of MERGE for this, otherwise it will find and use the existing relationship and not create a new one.

Looks like you found that a second before me, so you're all set!
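For anyone landing here later, the MERGE versus CREATE difference can be reproduced on toy data. This is a minimal sketch: the :Demo label, the name property, and the MERGED and CREATED relationship types are made up for illustration and are not part of the original data. Run each statement separately, ideally in a scratch database.

```cypher
// Feed three rows through both clauses for the same pair of nodes.
CREATE (x:Demo {name: 'x'}), (y:Demo {name: 'y'})
WITH x, y
UNWIND range(1, 3) AS i
MERGE (x)-[:MERGED]->(y)    // reuses the relationship once it exists
CREATE (x)-[:CREATED]->(y); // always adds a new relationship

// Count the relationships per type between the two toy nodes.
// Expected: MERGED 1, CREATED 3.
MATCH (:Demo {name: 'x'})-[r]->(:Demo {name: 'y'})
RETURN type(r) AS relType, count(r) AS relCount;

// Clean up the toy data.
MATCH (n:Demo) DETACH DELETE n;
```

MERGE matches the relationship made for the first row on every later row, which is why one edge per path requires CREATE.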

Got it working at last with:

MATCH path = (a:person)-[*2]-(b:person)
 WHERE id(a) < id(b)
 WITH a, b, path, length(path) as pathlength
 CREATE (a)-[e:co_authors]->(b)
 SET e.weight=pathlength

There's a world of difference between "merge" and "create".