01-26-2022 07:15 AM
I am trying to run the weighted Jaccard algorithm on my graph (following the Neo4j documentation as reference)
The code:
CALL gds.nodeSimilarity.stream('test', { relationshipWeightProperty: 'strength', similarityCutoff: 0.1 })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Person1, gds.util.asNode(node2).name AS Person2, similarity
ORDER BY Person1
The code above runs perfectly. However, I want to filter node1 and node2 so that only results for the nodes I need are shown. I tried adding WHERE node1.name = 'Chair1' right after my YIELD statement, but it throws an error. How do I add a WHERE clause to get results only for the nodes I want instead of all of them? (Even in the documentation, Node Similarity - Neo4j Graph Data Science, I see that duplicate pairs, i.e. Alice-Dave and Dave-Alice, are returned.)
01-26-2022 07:32 AM
Hello @parthiv3215
CALL gds.nodeSimilarity.stream('test', { relationshipWeightProperty: 'strength', similarityCutoff: 0.1 })
YIELD node1, node2, similarity
WITH gds.util.asNode(node1) AS n1, gds.util.asNode(node2) AS n2, similarity
WHERE n1.name = "Chair1"
RETURN n1.name, n2.name, similarity
ORDER BY n1.name
Regards,
Cobra
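A note on why the original attempt failed, with a hedged variant: the procedure yields node1 and node2 as internal node IDs (integers), not nodes, so node1.name cannot be evaluated until the ID is turned back into a node with gds.util.asNode. A WHERE placed directly after YIELD is fine as long as it only compares the IDs. The sketch below uses that to keep each unordered pair once, which addresses the Alice-Dave / Dave-Alice duplicates. It assumes the same 'test' graph and 'strength' property as above; the OR filter on both sides is an illustrative choice, not part of the original answer.

```cypher
CALL gds.nodeSimilarity.stream('test', { relationshipWeightProperty: 'strength', similarityCutoff: 0.1 })
YIELD node1, node2, similarity
// node1 and node2 are node IDs here, so only ID comparisons are possible at this point.
// Keeping node1 < node2 drops the mirrored copy of each pair.
WHERE node1 < node2
// Convert the IDs to nodes before touching any property.
WITH gds.util.asNode(node1) AS n1, gds.util.asNode(node2) AS n2, similarity
// After de-duplication 'Chair1' can sit on either side of the pair.
WHERE n1.name = 'Chair1' OR n2.name = 'Chair1'
RETURN n1.name AS Person1, n2.name AS Person2, similarity
ORDER BY similarity DESC
```

One caveat: results are limited per node by the topK setting, so a pair is not guaranteed to appear in both directions. If a pair only shows up with the larger ID as node1, the node1 < node2 filter would drop it entirely, so check this against your own data before relying on it.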
01-26-2022 07:49 AM
Thank you so much. This works perfectly!
05-15-2022 10:30 PM
Hello, I have a more conceptual question about exactly this kind of solution, and I can't find the answer in the official documentation. When running the exact query that Cobra kindly provided, what is happening under the hood?
Is gds :
A) calculating ALL the similarities between every node1 and node2 and then filtering the results only for Chair1?
OR
B) Is gds ONLY calculating the results between Chair1 and every other node?
I'd need behaviour B to happen for me, but after some testing with the airport databases it seems that the execution time is shorter without the WHERE clause than with, so my nose tells me that it may be behaviour A. Is there a way to force behaviour B?
05-17-2022 08:59 AM
For the above code snippet, GDS is calculating all the similarities and post-filtering the results (the WHERE is applied to the result stream from the node similarity algorithm).
More sophisticated filtering for Node Similarity & KNN will be coming in the 2.1 release, so stay tuned, @carlo.martinotti89
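For readers arriving later, here is a rough sketch of what behaviour B might look like with the filtered variant of Node Similarity. It rests on several assumptions not confirmed in this thread: that the installed GDS version is 2.1 or newer, that the procedure is exposed as gds.alpha.nodeSimilarity.filtered.stream (later releases may drop the alpha prefix), that sourceNodeFilter accepts a node ID, and that exactly one node is named 'Chair1'. Please verify the procedure name and parameters against the documentation for your version.

```cypher
// Assumption: no label is given in the thread, so this match scans all nodes;
// add your own label to make it efficient.
MATCH (c {name: 'Chair1'})
WITH id(c) AS chairId
// Assumption: filtered Node Similarity, alpha tier in GDS 2.1.
// sourceNodeFilter restricts which nodes are used as node1 in the comparison.
CALL gds.alpha.nodeSimilarity.filtered.stream('test', {
  relationshipWeightProperty: 'strength',
  similarityCutoff: 0.1,
  sourceNodeFilter: chairId
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Person1, gds.util.asNode(node2).name AS Person2, similarity
ORDER BY similarity DESC
```

Unlike the WHERE-based query, the filter here is passed to the algorithm itself, so in principle only pairs starting from Chair1 need to be computed rather than computed and then discarded.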
05-17-2022 07:34 PM
That's what I suspected! Welp, unfortunate, but for now the database is small enough that I can afford to do the complete calculation every time. Thank you for the reply and I'll definitely stay tuned for more releases 🙂