How to have Similarity algo use the node properties?

So I have been searching and watching YouTube videos on how to enhance ML with features derived from graphs, and I want to try it out. I have a node that is a person, and that person has some properties (age, education, rent_own, etc.). I currently have each person connected to one node called 'h1n1_vax_yes' and/or another node called 'flu_vax_yes'. These connections indicate that the person has taken one or both of those vaccines. What I am attempting to do is use a Similarity algo to find how similar person nodes are to each other, across all the person nodes that took the h1n1 vaccine, based on their properties, of which I have 35 (similar to what is shown here at 11:00: https://www.youtube.com/watch?v=LWw94LVhfLk&list=WL&index=2&t=651s). Looking at the examples, they show a property of the edge being used as the weight, not the node properties (https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/cosine/). Is there a way to do this, or do I have to create a node for each of the 35 properties and connect it to the person? What would be a recommended approach for adding more features to my dataset so that I can improve my predictions?

Perhaps my Google ninja search skills are not up to par in finding this answer....

-Using the latest GDS libs and Neo4j
-Below is basic schema

2 REPLIES

If you look at the examples for Cosine Similarity in the docs, you'll see an example of using node properties for similarity calculations:

MATCH (c:Cuisine)
WITH {item:id(c), weights: c.embedding} AS userData
WITH collect(userData) AS data
CALL gds.alpha.similarity.cosine.stream({
  data: data,
  skipValue: null
})
YIELD item1, item2, count1, count2, similarity
RETURN gds.util.asNode(item1).name AS from, gds.util.asNode(item2).name AS to, similarity
ORDER BY similarity DESC

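Each map pairs a node id (item) with that node's property vector (weights, here c.embedding). The collect gathers those maps into the data list, and the algorithm uses the vectors to calculate similarities between cuisine nodes.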
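To make the role of the weights list concrete, here is a minimal Python sketch of the standard cosine similarity formula applied to per-node property vectors. It is only an illustration of the math, not the GDS implementation: the names `cosine_similarity` and `people`, the sample vectors, and the choice to return 0.0 for an all-zero vector are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical property vectors, one per node, with every property
# already numeric and listed in the same order for every node.
people = {
    "person_a": [1.0, 0.0, 1.0],
    "person_b": [1.0, 0.0, 1.0],
    "person_c": [0.0, 1.0, 0.0],
}

for left, right in [("person_a", "person_b"), ("person_a", "person_c")]:
    score = cosine_similarity(people[left], people[right])
    print(left, right, round(score, 4))
```

The point to carry over to the Cypher query is that each weights list has to hold numeric values in the same property order for every node; otherwise the positions being compared would not refer to the same feature.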

Awesome! Thanks @alicia.frame, I must have missed that. Is there a way to have it look at all columns without having to specify each one? In my case, I have 35 columns right now, but once I one-hot encode them, I am going to have a lot more.

Similar to how, when you create a node from a CSV, you can tell it to set all fields as properties? Like below:

CREATE (n:Node)
SET n += row