Text Similarity: Compare text property of one node to all other nodes and create relationship

I have a graph database that will be populated with nodes containing text messages. Every time a node is saved, I need to calculate its similarity to the other nodes. The similarity metric can be any of the text similarity functions available in APOC [https://neo4j.com/docs/labs/apoc/current/misc/text-functions/#text-functions-text-similarity]. When the similarity is greater than (say) 0.5, the query should create a SIMILAR_TO relationship between the two nodes compared.

My graph looks kind of like this:

As of now, this is a learning project/PoC.
I am looking for a Cypher query or a stored procedure.
Can someone give me pointers on how to structure the query and anything else I must know before doing this?

I am aware that the number of comparisons will grow quickly as the graph grows, since every new node is compared with all existing ones. But for now, I am not worrying about that.

I am using Neo4j 4.0.3 and the Python driver to create nodes.

Thanks.

2 REPLIES

You can do the comparison right after you create your node and create the relationships in the same query.

CREATE (m:Message {...})
MATCH (o:Message) 
WITH o,m, apoc.text.distance(m.Text, o.Text) as similarity
WHERE similarity > 0.5
CREATE (m)-[:SIMILAR {similarity:similarity}]->(o)

Thank you, Michael.

This is the error I was getting.

Neo.ClientError.Statement.SyntaxError

WITH is required between CREATE and MATCH (line 2, column 1 (offset: 48)) "MATCH (o:Message)"

I added a WITH

CREATE (m:Message {...})
WITH m // Edit
MATCH (o:Message) 
WITH o,m, apoc.text.distance(m.Text, o.Text) as similarity
WHERE similarity > 0.5
CREATE (m)-[:SIMILAR {similarity:similarity}]->(o)

This creates a relationship of the created node with itself too.
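One possible way to avoid the self-relationship is to exclude the freshly created node from the match with `WHERE o <> m`. The sketch below wraps such a query for the Python driver. It rests on a few assumptions: the property is called `Text` and the label `Message` as in the queries above, `LINK_QUERY` and `create_message_with_links` are made-up names, and `apoc.text.levenshteinSimilarity` is swapped in for `apoc.text.distance`, because the latter returns an edit distance (a count of edits) rather than a score between 0 and 1, so a 0.5 threshold fits a similarity function better.

```python
# Sketch only: label `Message` and property `Text` follow the thread;
# LINK_QUERY and create_message_with_links are hypothetical names.
LINK_QUERY = """
CREATE (m:Message {Text: $text})
WITH m
MATCH (o:Message)
WHERE o <> m  // skip the node that was just created
WITH m, o, apoc.text.levenshteinSimilarity(m.Text, o.Text) AS similarity
WHERE similarity > $threshold
CREATE (m)-[:SIMILAR {similarity: similarity}]->(o)
RETURN count(o) AS linked
"""


def create_message_with_links(tx, text, threshold=0.5):
    """Create one Message node and link it to sufficiently similar ones.

    `tx` is expected to be a transaction object from the Neo4j Python
    driver; anything with a compatible `run` method works.
    Returns the number of SIMILAR relationships created.
    """
    result = tx.run(LINK_QUERY, text=text, threshold=threshold)
    return result.single()["linked"]
```

With the 4.x Python driver this would typically be called inside a session as `session.write_transaction(create_message_with_links, "some text")`.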

So then I matched the node first and then created the node like this:

MATCH (o:Message) 
WITH o
CREATE (m:Message {...})
WITH o,m, apoc.text.distance(m.Text, o.Text) as similarity
WHERE similarity > 0.5
CREATE (m)-[:SIMILAR {similarity:similarity}]->(o)

But this creates 2 additional nodes, and I don't understand why that happens.
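A likely explanation: Cypher runs each clause once per incoming row, so a `CREATE` placed after `MATCH (o:Message)` executes once for every matched node. With two existing `Message` nodes that gives two new nodes. If matching first is preferred, one option is to collapse the matches into a single row with `collect` before creating, then `UNWIND` the list again. The following is a sketch under the same assumptions as the thread (label `Message`, property `Text`); `MATCH_FIRST_QUERY` and `add_message_match_first` are hypothetical names, and `apoc.text.levenshteinSimilarity` is used here because it returns a value between 0 and 1.

```python
# Sketch only: MATCH_FIRST_QUERY and add_message_match_first are
# hypothetical names; label and property names follow the thread.
MATCH_FIRST_QUERY = """
MATCH (o:Message)
WITH collect(o) AS others  // aggregate to exactly one row, even if empty
CREATE (m:Message {Text: $text})  // now runs once, not once per match
WITH m, others
UNWIND others AS o
WITH m, o, apoc.text.levenshteinSimilarity(m.Text, o.Text) AS similarity
WHERE similarity > $threshold
CREATE (m)-[:SIMILAR {similarity: similarity}]->(o)
"""


def add_message_match_first(tx, text, threshold=0.5):
    """Collect existing messages, create one new node, then link it.

    `tx` is expected to be a transaction object from the Neo4j Python
    driver; anything with a compatible `run` method works.
    """
    result = tx.run(MATCH_FIRST_QUERY, text=text, threshold=threshold)
    result.consume()  # discard any records and finish the query
```

Because the new node does not exist yet when the `MATCH` runs, it is never part of `others`, so this variant also avoids the relationship from the node to itself.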