04-21-2021 11:25 AM
I am struggling to refactor Jaccard Similarity algorithms that ran successfully in Neo4j 3.4 to the new Node Similarity algorithm in Neo4j 3.5.26 with GDS 1.1.1. There was never a memory issue before we used the GDS plugin; now it is blocking our progress and motivating us to look elsewhere for scale. Here are the particulars:
|graphName|nodeCount|relationshipCount|
|---|---|---|
|myJadeThemeGraph|2670295|187|
CALL gds.nodeSimilarity.stream('myJadeThemeGraph')
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING, Theme1, Theme2 LIMIT 10
My result:
Failed to invoke procedure `gds.nodeSimilarity.stream`: Caused by: java.lang.IllegalStateException: Procedure was blocked since minimum estimated memory (130 GiB) exceeds current free memory (24 GiB).
Again, executing Jaccard before GDS worked fine. Now GDS requires huge amounts of memory to do the same calculations.
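For reference, the calculation in question is Jaccard similarity: two nodes are scored by the size of the intersection of their neighbour sets divided by the size of the union. The naive Python sketch below shows that computation on a toy guest-to-theme mapping. It is an illustration under our own assumptions, not GDS's implementation; `node_similarity`, `top_k` and the sample ids are hypothetical names, with `top_k` only loosely mirroring the `topK` option used later in this thread.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def node_similarity(neighbours: dict, top_k: int = 10) -> list:
    """Naive all-pairs Jaccard similarity (illustrative, not GDS's code).

    `neighbours` maps each node id to the set of node ids it points to,
    e.g. a guest -> the themes it plays. Returns (node1, node2, score) rows,
    keeping at most `top_k` best-scoring partners per node1.
    """
    candidates = {node: [] for node in neighbours}
    for u, v in combinations(neighbours, 2):
        score = jaccard(neighbours[u], neighbours[v])
        if score > 0:  # skip pairs that share no neighbours
            candidates[u].append((score, v))
            candidates[v].append((score, u))
    rows = []
    for u, scored in candidates.items():
        # Highest score first; ties broken by node id for a stable order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        rows.extend((u, v, score) for score, v in scored[:top_k])
    return rows


# Hypothetical toy input: guest -> set of themes played.
plays = {"g1": {"t1", "t2"}, "g2": {"t2", "t3"}, "g3": {"t1", "t2"}}
rows = node_similarity(plays, top_k=1)
# Mimic ORDER BY similarity DESC LIMIT 10 on the streamed rows.
best = sorted(rows, key=lambda row: row[2], reverse=True)[:10]
print(best)
```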
04-21-2021 01:12 PM
I have reduced the size of the projection even further by executing:
CALL gds.graph.create.cypher(
'myJadeThemeGraph',
'MATCH (n) WHERE (n:Guest AND n.member_tier = "Jade") OR n:Theme RETURN id(n) AS id',
'MATCH (n:Guest)-[pt:PLAYS_THEME]->(m:Theme) WHERE n.member_tier = "Jade" AND pt.weight > 10
RETURN id(n) AS source, id(m) AS target, type(pt) AS type, pt.weight AS weight'
)
The projection is reduced to:
Node Count: 2594
Relationship Count: 187
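One way to read these counts: the node query keeps every Jade guest and every theme (in Cypher, AND binds tighter than OR, so the WHERE clause reads as (Guest AND Jade) OR Theme), while only relationships with weight > 10 survive the relationship query. Many projected nodes can therefore have no relationships at all. The Python sketch below mimics the two filters on hypothetical toy data; `keep_node`, `keep_relationship` and the sample ids are illustrative names, not part of the GDS API.

```python
def keep_node(labels: set, member_tier) -> bool:
    # Same grouping Cypher applies to the original WHERE clause:
    # (n:Guest AND n.member_tier = "Jade") OR n:Theme
    return ("Guest" in labels and member_tier == "Jade") or "Theme" in labels


def keep_relationship(source_tier, weight) -> bool:
    # Mirrors: n.member_tier = "Jade" AND pt.weight > 10
    return source_tier == "Jade" and weight > 10


# Hypothetical toy data: (id, labels, member_tier).
nodes = [
    ("g1", {"Guest"}, "Jade"),
    ("g2", {"Guest"}, "Jade"),
    ("g3", {"Guest"}, "Gold"),
    ("t1", {"Theme"}, None),
    ("t2", {"Theme"}, None),
]
# Hypothetical PLAYS_THEME relationships: (guest, theme, weight).
plays_theme = [("g1", "t1", 15), ("g1", "t2", 3), ("g2", "t1", 2), ("g3", "t2", 50)]

tier = {node_id: node_tier for node_id, _, node_tier in nodes}
projected_nodes = [nid for nid, labels, node_tier in nodes if keep_node(labels, node_tier)]
projected_rels = [
    (src, dst) for src, dst, weight in plays_theme if keep_relationship(tier[src], weight)
]
connected = {src for src, _ in projected_rels} | {dst for _, dst in projected_rels}
isolated = set(projected_nodes) - connected  # projected, but with no relationships
print(len(projected_nodes), len(projected_rels), sorted(isolated))
```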
Running the same gds.nodeSimilarity.stream() call as before, the memory requirement is still exceedingly high, even with topK:
CALL gds.nodeSimilarity.stream('myJadeThemeGraph',{ topK: 1 })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING LIMIT 10
Failed to invoke procedure `gds.nodeSimilarity.stream`: Caused by: java.lang.IllegalStateException: Procedure was blocked since minimum estimated memory (54 GiB) exceeds current free memory (24 GiB).
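The topK setting caps how many rows each node contributes to the stream, while the Cypher ORDER BY ... LIMIT 10 trims the combined stream afterwards. The Python sketch below separates those two steps over precomputed scores; `top_k_per_node` and the sample scores are hypothetical. In this sketch topK is applied after the pairwise scores already exist, so it shrinks the output rather than the scoring work; whether GDS's memory estimate treats topK the same way is not something this thread establishes.

```python
import heapq


def top_k_per_node(scores: dict, k: int) -> list:
    """Keep the k best (node1, node2, score) rows for each node1.

    `scores` maps (node1, node2) -> similarity and is assumed to be
    already computed for every candidate pair.
    """
    by_node = {}
    for (u, v), s in scores.items():
        by_node.setdefault(u, []).append((s, v))
    rows = []
    for u in sorted(by_node):
        for s, v in heapq.nlargest(k, by_node[u]):
            rows.append((u, v, s))
    return rows


# Hypothetical precomputed similarities.
scores = {
    ("a", "b"): 0.9, ("a", "c"): 0.5,
    ("b", "a"): 0.9, ("b", "c"): 0.2,
    ("c", "a"): 0.5, ("c", "b"): 0.2,
}
per_node = top_k_per_node(scores, k=1)  # like topK: 1, one row per node
# Like ORDER BY similarity DESC LIMIT 2, applied to the whole stream.
overall = sorted(per_node, key=lambda row: row[2], reverse=True)[:2]
print(per_node)
print(overall)
```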
04-21-2021 05:10 PM
Such a small graph shouldn't be triggering that error message. Can you update to GDS 1.1.6 (the latest 3.5-compatible branch)?
You'll also want to make sure you don't have other in-memory graphs hanging around: use `CALL gds.graph.list()` to check whether they are taking up memory, and drop any you find.
04-21-2021 10:21 PM
Thanks, Alicia. We will upgrade to GDS 1.1.6 and increase our heap allocation as well. I will update you on the status when that's complete.
04-22-2021 10:16 AM
Well, here is the result after upgrading to GDS 1.1.6, using the same graph projection:
Node Count: 2594
Relationship Count: 187
Calling gds.nodeSimilarity.stream:
CALL gds.nodeSimilarity.stream('myJadeThemeGraph',{ topK: 1 })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING LIMIT 10
This resulted in the following error (note that I greatly reduced the potential output by using topK):
Failed to invoke procedure `gds.nodeSimilarity.stream`: Caused by: java.lang.IllegalStateException: Procedure was blocked since minimum estimated memory (54 GiB) exceeds current free memory (31 GiB).
04-22-2021 11:58 AM
Hm, that seems like a bug. I've created an issue with the engineering team, and we'll keep you posted.
In the meantime, you can override the memory guard by specifying sudo:TRUE in your algorithm config:
CALL gds.nodeSimilarity.stream('myJadeThemeGraph',{ topK: 1, sudo:TRUE })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING LIMIT 10
That should disable the guardrail: if you actually do have enough memory, it will run fine; if not, you may run the database out of memory (OOM).
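As described in this thread, the guard is a pre-flight comparison between the minimum estimated memory and the currently free memory, and sudo skips it. The Python sketch below models that behaviour; it is a hypothetical illustration written from the error message quoted above, not GDS source code, and `check_memory_guard` and `MemoryGuardError` are invented names.

```python
GIB = 1024 ** 3


class MemoryGuardError(RuntimeError):
    """Hypothetical stand-in for the IllegalStateException seen in the thread."""


def check_memory_guard(min_estimate_bytes: int, free_bytes: int, sudo: bool = False) -> None:
    """Block a run whose minimum estimate exceeds free memory, unless sudo is set."""
    if sudo:
        return  # guard skipped: the run may still exhaust the heap
    if min_estimate_bytes > free_bytes:
        raise MemoryGuardError(
            "Procedure was blocked since minimum estimated memory "
            f"({min_estimate_bytes // GIB} GiB) exceeds current free memory "
            f"({free_bytes // GIB} GiB)."
        )


try:
    check_memory_guard(54 * GIB, 31 * GIB)
except MemoryGuardError as err:
    print(err)
check_memory_guard(54 * GIB, 31 * GIB, sudo=True)  # passes the guard
```

The design point is that such a guard is only as good as its estimate: with an overestimate like the one reported here, sudo is the only way through, at the cost of losing protection against a genuine out-of-memory run.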
04-22-2021 01:03 PM
Thanks, Alicia. If there is any information that you and your team need, please reach out. I will try the sudo parameter and get back to you.
04-22-2021 01:10 PM
With the memory guard off, gds.nodeSimilarity.stream() completed in 30 ms.
Does that confirm a bug?
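A back-of-the-envelope count is consistent with the 30 ms result. Each relationship has exactly one source node, so at most 187 nodes have outgoing relationships; if, as in a naive sketch, only those nodes are compared, there are at most C(187, 2) candidate pairs. Even comparing every pair among all 2,594 projected nodes gives about 3.4 million pairs. This counts comparisons only and makes no claim about how GDS sizes its estimate.

```python
from math import comb

RELATIONSHIPS = 187  # from the projection in this thread
NODES = 2594         # from the projection in this thread

# Upper bound on nodes with a non-empty outgoing neighbour set.
max_sources = RELATIONSHIPS
pairs_between_sources = comb(max_sources, 2)
pairs_between_all_nodes = comb(NODES, 2)

print(pairs_between_sources)    # 17391
print(pairs_between_all_nodes)  # 3363121
```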
04-22-2021 01:11 PM
... Pretty sure it's a bug, then. We've created a card, and I'll reach out if we have any more questions!