Neo4j

accounts2 · 07-12-2020

Hi,

I'm hoping someone can point me to a resource for a strategy or documentation on overwriting relationships created by gds nodeSimilarity.

My use case is a product database with new products constantly being added. I'd like to compute the similarity between products, and then rerun the similarity algorithm either when new products are added or daily.

Is the best strategy to delete all relationships and rewrite them as new products are added? Or is there a better method I can use to overwrite the similarity between products?

Thanks in advance.

MuddyBootsCode · 07-18-2020

Hello, welcome to the community.

I'm sure you've already been through the docs, but in case you haven't, here's a link to the Node Similarity page: https://neo4j.com/docs/graph-data-science/current/algorithms/node-similarity/

My initial thought is that it would be easiest to just run the similarity algorithm daily and then remove the duplicate relationships. Both operations are fairly computationally expensive, but that's the brute-force way to go about it. You can remove duplicate relationships, like the similarity ones, with:

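// Group relationships by start node, end node, and type
MATCH (s)-[r]->(e)
// tail() drops the first relationship of each group, so coll holds only the extras
WITH s, e, type(r) AS typ, tail(collect(r)) AS coll
// Delete the extras, leaving one relationship per (start, end, type)
FOREACH (x IN coll | DELETE x)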

You can of course specify relationships that you're targeting as well. I know it's not an elegant solution but unless Neo4j adds an option to limit relationships at some point it seems to be the simplest way.
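For the "delete everything and rewrite" option raised in the question, here is a rough sketch of what a daily refresh could look like. The `Product` and `Category` labels, the `IN_CATEGORY` and `SIMILAR` relationship types, the `score` property, and the graph name `products` are all made up for illustration, so swap in your own. The procedure names also depend on your GDS version: `gds.graph.project` is the GDS 2.x name, while GDS 1.x calls it `gds.graph.create`.

```cypher
// 1. Remove the similarity relationships written by the previous run
//    (SIMILAR is a hypothetical type; use whatever writeRelationshipType you chose).
MATCH (:Product)-[r:SIMILAR]->(:Product)
DELETE r;

// 2. Project the in-memory graph (hypothetical labels and relationship type).
//    On GDS 1.x this procedure is named gds.graph.create.
CALL gds.graph.project('products', ['Product', 'Category'], 'IN_CATEGORY');

// 3. Recompute node similarity and write fresh relationships back to the database.
CALL gds.nodeSimilarity.write('products', {
  writeRelationshipType: 'SIMILAR',
  writeProperty: 'score'
})
YIELD nodesCompared, relationshipsWritten
RETURN nodesCompared, relationshipsWritten;

// 4. Drop the projection so the next run starts from current data.
CALL gds.graph.drop('products');
```

Deleting before writing means you never end up with duplicates in the first place, so the cleanup query above isn't needed. On a large graph the delete in step 1 may need to be run in batches.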

Overwriting Relationships using nodeSimilarity in gds strategy