How to delete duplicate relationships after applying Node Similarity Algorithm

After running the node similarity algorithm, I created SIMILAR_TO relationships between pairs of nodes. Since the algorithm always produces relationships in both directions between the nodes of a pair, how do I write Cypher to keep only one relationship and delete the other? The image below is an example of the two-way relationships generated by the node similarity algorithm.

9 REPLIES

You could do something like

MATCH (n1)-[r1:SIMILAR_TO]->(n2)-[r2:SIMILAR_TO]->(n1)
DELETE r2

Just one note: node similarity doesn't necessarily create similarity relationships that are symmetric. If you have topN set, you may end up with a relationship in one direction but not the other. This Cypher would only pick up the bidirectional relationships.

Thank you Alicia ! This is going to be great help to my project 🙂

I have the same issue, and the problem with your query is that it returns each pair twice and deletes both relationships. I don't have a solution yet. I have tried quite a few approaches, but I cannot find the right way forward.
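
A minimal sketch of one way around the double match described above. The pattern matches every mutual pair twice, once as (n1, n2) and once as (n2, n1), so DELETE r2 hits a different relationship in each row. Comparing internal ids keeps only one of the two rows. This assumes the intent is to keep an arbitrary one of the two relationships; id() is deprecated in Neo4j 5, where comparing elementId(n1) < elementId(n2) should serve the same purpose.

```cypher
// Keep only one orientation of each mutual pair, so each pair
// produces a single row and only one of its relationships is deleted.
MATCH (n1)-[r1:SIMILAR_TO]->(n2)-[r2:SIMILAR_TO]->(n1)
WHERE id(n1) < id(n2)
DELETE r2
```

One-directional relationships (for example those left by topN or topK) do not match the pattern and are left untouched.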

These are the queries I have tested, and none of them gives the wanted result. Some of them remove all of the relationships, some remove most of them.

MATCH (n1)-[r1:SIMILAR_TO]->(n2)-[r2:SIMILAR_TO]->(n1)
DELETE r2

//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]-(n)
with s,n,type(r) as t, collect(r) as coll 
foreach(x in tail(coll) | delete x)

//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]->(n), (a)<-[r2:SIMILAR_TO]-(n)
WHERE r.score = r2.score
with s,n,type(r) as t, tail(collect(r)) as coll
foreach(x in coll | delete x)

CALL apoc.periodic.iterate(
  "MATCH (a)-[r:SIMILAR_TO]->(b)-[r2:SIMILAR_TO]->(a) RETURN r",
  "DELETE r",
  {batchMode: "SINGLE", parallel:false})

MATCH (a)-[r:SIMILAR_TO]->(b)-[r2:SIMILAR_TO]->(a)
WITH a, b, r.score AS score, COLLECT(r)[1..] AS unwanted
FOREACH(x IN unwanted | delete x)

I start with a graph that looks like this:

I then use one of the proposed algorithms

//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]->(n), (a)<-[r2:SIMILAR_TO]-(n)
WHERE r.score = r2.score
with s,n,type(r) as t, tail(collect(r)) as coll
foreach(x in coll | delete x)

The result is not as expected. For some nodes all relationships have been removed, for some nodes everything is intact, and some nodes are as I want them to be, with only one relationship.

Now I see a typo in my previous post. It says
MATCH (s)-[r:SIMILAR_TO]->(n), (a)<-[r2:SIMILAR_TO]-(n)
but should be
MATCH (s)-[r:SIMILAR_TO]->(n), (s)<-[r2:SIMILAR_TO]-(n)

I will test it and come back with my findings.

Now nothing happens with the query that previously had the typo.
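
One way to see why the corrected pattern deletes nothing is to inspect the group sizes before deleting. The sketch below assumes there is at most one SIMILAR_TO relationship per direction between two nodes. In that case every (s, n) group holds exactly one r, so tail(collect(r)) is always an empty list and the FOREACH has nothing to delete.

```cypher
// Count how many relationships r end up in each (s, n) group.
// If relsPerGroup is 1 for every group, tail(collect(r)) is empty.
MATCH (s)-[r:SIMILAR_TO]->(n), (s)<-[r2:SIMILAR_TO]-(n)
WHERE r.score = r2.score
WITH s, n, collect(r) AS coll
RETURN size(coll) AS relsPerGroup, count(*) AS numberOfGroups
```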

The query below removes 277 of the 392 relationships. All of them are symmetric, so only 196 should be removed. Why does it sometimes remove both relationships of a pair and sometimes not?

//remove all dual link of similarity
MATCH (s)-[r:SIMILAR_TO]-(n)
with s,n,type(r) as t, collect(r) as coll 
foreach(x in tail(coll) | delete x)
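
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the undirected pattern returns every relationship twice, once with (s, n) and once with (n, s), so each pair forms two groups that both contain the same two relationships. tail(coll) deletes one relationship per group, and because collect() gives no ordering guarantee, the two groups can pick different relationships, which removes both. Restricting the match to one orientation, here by comparing internal ids, leaves a single group per pair. id() is deprecated in Neo4j 5, where elementId() could be compared instead.

```cypher
// Keep only the orientation where s has the smaller id,
// so each unordered pair is grouped exactly once.
MATCH (s)-[r:SIMILAR_TO]-(n)
WHERE id(s) < id(n)
WITH s, n, collect(r) AS coll
// tail() skips the first relationship, so one survives per pair.
FOREACH (x IN tail(coll) | DELETE x)
```

Assuming a fully symmetric graph with one relationship per direction and no self-loops, this should delete exactly half of the relationships. Pairs connected in only one direction have a single-element list and lose nothing.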

Result of the query. I needed to add 8610083 to the query to show that it had no relationships.

This seems to be working:

MATCH (s)-[r:SIMILAR_TO]-(n)
with s,n,type(r) as t, collect(r) as coll 
foreach(x in tail(coll) | delete x)

This removed most relationships, not just the double links! Not a solution.

lingvisa

Is there a way to configure the similarity algorithms so that they do not create double links and nothing needs to be deduplicated afterwards? On a large graph, these double links slow things down and increase memory usage.

For clarification, did you use the topN or topK parameter?

Because for topK the result is not symmetric anymore.
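
On the question of avoiding double links in the first place, one option is to stream the results and write only one direction per pair yourself instead of letting the algorithm write the relationships. This is a sketch under assumptions, not a configuration switch of the algorithm: 'myGraph' is a hypothetical name for an already projected graph, and the score property name follows the queries earlier in the thread.

```cypher
// Stream similarity pairs and keep only the rows where node1 < node2,
// so each symmetric pair is written once.
CALL gds.nodeSimilarity.stream('myGraph')
YIELD node1, node2, similarity
WHERE node1 < node2
WITH gds.util.asNode(node1) AS a, gds.util.asNode(node2) AS b, similarity
MERGE (a)-[r:SIMILAR_TO]->(b)
SET r.score = similarity
```

The filter relies on the results being symmetric. If topK or topN makes a pair appear in only one direction, a row with node1 > node2 would be dropped and that pair would get no relationship at all.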