Node similarity algorithm Clustering

Can anyone help me?
I want to group similar nodes into the same cluster, but I don't know how to do this, because the node similarity algorithm only compares two nodes and returns their similarity value.
Please, I need help.

12 REPLIES

Hi,

I think the challenge begins with establishing the definition of a similarity score.

There are an almost infinite number of ways to produce a similarity score between two nodes. Some examples off the top of my head:

  1. an evaluation of the node labels, the number of properties, and exact-match comparisons
  2. a comparison of the two nodes' relationships (do both have relationships to other nodes?)
  3. something more specific, e.g. if a node property contains a chemical structure descriptor and a chemical structure comparison is needed (e.g. Tanimoto score)

Or a combination of several comparisons, producing a score appropriate for the situation.
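To make the relationship-based comparison a bit more concrete, here is a minimal sketch of a Jaccard score computed on the sets of nodes that two nodes are connected to. It is only an illustration in plain Python, not the API of any Neo4j library; the function name `jaccard` and the toy `neighbors` data are made up for this example.

```python
def jaccard(neighbors_a, neighbors_b):
    """Jaccard similarity of two neighbor sets: |A & B| / |A | B|."""
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    if not union:
        # Two nodes without any neighbors: define the score as 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy data: each client maps to the set of nodes it is related to.
neighbors = {
    "alice": {"p1", "p2", "p3"},
    "bob": {"p2", "p3", "p4"},
    "carol": {"p9"},
}

score_ab = jaccard(neighbors["alice"], neighbors["bob"])    # 2 shared / 4 total
score_ac = jaccard(neighbors["alice"], neighbors["carol"])  # nothing shared
```

A score like this only looks at shared neighbors, so it ignores node properties entirely; a property-based or domain-specific comparison would need its own scoring function.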

Can you tell us more about the specific use case and what you had in mind?

Also, I wonder what you will do with the similarity scores. Do you envision simply returning a list of "similar" nodes given one input node? Or would it be useful to create new relationships, for example (a)-[r:IS_SIMILAR_TO]->(b), and maybe assign an r.score property to the relationship? Just some example thoughts, not comprehensive for sure.

Thanks a lot. I have a list of clients who have relationships with each other, and I want to find the clients who have the same behavior.

To follow up on the "group assignment" question, there are two obvious ways I can think of at the moment.

  1. create new relationships, and perhaps assign each relationship a score value and a group number
  2. assign a group (e.g. integer) property to each node

There are pros and cons to each approach.
#1 might add a lot of new relationships to the graph. If you are new to Cypher (and use promiscuous queries), it might be more challenging to query. Neo4j can handle it, though.
#2 (as I defined it) would only allow a node to be a member of one group.

Note: I've used the weakly connected components algorithm to identify graph islands and assign the members of each island a shared "group" number that is unique to that island. For my use case, I'm interested in knowing whether I have any graph islands.
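For anyone curious what "assigning a group number per island" amounts to, here is a rough sketch using a small union-find in plain Python. This is not how the Neo4j library implements it, just an illustration of the idea; the function name and the toy `nodes` and `edges` are invented, and it assumes every edge endpoint appears in `nodes`.

```python
def weakly_connected_components(nodes, edges):
    """Return {node: group_number}, ignoring edge direction."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Merge the two islands joined by each edge.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number the islands in order of first appearance.
    group_ids = {}
    result = {}
    for n in nodes:
        root = find(n)
        if root not in group_ids:
            group_ids[root] = len(group_ids)
        result[n] = group_ids[root]
    return result


# Hypothetical toy graph with two islands: {a, b, c} and {d, e}.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("c", "b"), ("d", "e")]
groups = weakly_connected_components(nodes, edges)
```

Here `groups` maps "a", "b" and "c" to one number and "d" and "e" to another. Again, this finds islands, not similarity.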

I'm not sure graph island is a standard term, so I'll provide an example. There are seven graph islands in this viz below.

Note: this example is a public dataset...

I want to put similar nodes in a cluster.

Which algorithm did you use to separate each group, please?

I added this link above... Note, this identifies islands, not similarity.
weakly connected components algorithm

Thanks, but that's not my case.

Have you defined what you mean by node similarity, and can you calculate it?

If yes, have you considered using either of my two suggestions as approaches for grouping nodes?

  1. create new relationships, and perhaps assign each relationship a score value and a group number
  2. assign a group (e.g. integer) property to each node

Suppose that I have 100 nodes and I want to group similar nodes into clusters. I looked at the node similarity algorithm, but it compares two nodes and returns their similarity value. The first issue is that I don't know the number of clusters, which means I should use an unsupervised algorithm.
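One possible way to get from pairwise scores to clusters without fixing the number of clusters in advance is to link every pair whose score passes a threshold and then treat each connected group as a cluster. This mirrors the idea of creating IS_SIMILAR_TO relationships and then looking for islands among them. The sketch below is plain Python under assumptions: the Jaccard score on neighbor sets, the `behavior` data, the function names and the 0.5 threshold are all hypothetical choices, not something taken from the Neo4j library.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_similarity(neighbors, threshold):
    """Link nodes whose score >= threshold, then return the connected groups."""
    adjacency = {n: set() for n in neighbors}
    for a, b in combinations(neighbors, 2):
        if jaccard(neighbors[a], neighbors[b]) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    clusters, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk over the "similar" links starting at this node.
        stack, members = [start], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            members.add(n)
            stack.extend(adjacency[n] - seen)
        clusters.append(members)
    return clusters


# Hypothetical data: each client maps to the set of things it interacted with.
behavior = {
    "c1": {"x", "y", "z"},
    "c2": {"x", "y"},
    "c3": {"y", "z"},
    "c4": {"q", "r"},
    "c5": {"q", "r", "s"},
    "c6": {"t"},
}
clusters = cluster_by_similarity(behavior, threshold=0.5)
```

With this toy data the result is three clusters: c1, c2 and c3 together, c4 and c5 together, and c6 on its own. The number of clusters falls out of the threshold rather than being chosen up front, so the threshold is the thing to tune. Comparing every pair is fine for 100 nodes but grows quadratically with the node count.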

I'm not sure this is what you want (as you imply using ML), but it may still be of interest for comparison. There are several well-known "similarity" algorithms; I would add that these do not take into account the properties of each node.

Similarity Algorithms (note: updated link to GDS)

I want to use ML to extract the customers that have the same behavior.

I don't know whether node similarity algorithms can help me resolve this issue or not.