03-08-2021 02:51 PM
So I was a big fan of Neo4j v 3.4.12 and the state of graph algorithms there; in particular Jaccard Similarity. Ever since upgrading, the gds library has had Jaccard in Alpha status.
Question for Alicia and her team: Are you moving everyone over to Node Similarity? And Jaccard will eventually be deprecated?
Thanks.
03-08-2021 09:52 PM
Check this link, especially the recent update by abk.
03-08-2021 11:13 PM
I think we've found that the node similarity procedure solved the problem that users had better than the Jaccard one did.
With Jaccard you had to build up lists of arrays before computing, whereas with node similarity it computes it based on the graph structure. And the majority of users can solve their problems with node similarity.
Do you have some old code that uses Jaccard in Graph Algos and you're trying to translate it to GDS? Perhaps I can help you translate it if you share the query.
03-09-2021 03:04 PM
Mark: Here is the query. It determines the Jaccard similarity of "Song" nodes:
MATCH (Guest{member:'Purple'})-[:PLAYS_SONG]->(t:Song)
WITH {item:id(t), categories: collect(id(Guest))} as userData
WITH collect(userData) as data
CALL algo.similarity.jaccard(data, {similarityCutoff:0.2, write:true, writeRelationshipType:'SIMILAR_PURPLE', writeProperty:'score_purple'})
YIELD nodes, similarityPairs, stdDev, p25, p50, p75, p90, p95
RETURN nodes, similarityPairs, stdDev, p25, p50, p75, p90, p95
03-09-2021 11:53 AM
We do consider Jaccard part of the algorithms library, but as you correctly guessed, and as @markhneedham explained, for performance at scale Node Similarity or KNN are better choices than Jaccard or Cosine similarity. Node Similarity uses the Jaccard similarity metric, but it leverages neighboring nodes instead of properties or lists.
We don't have any plans to deprecate or remove Jaccard, but we're not currently working on promoting it to the beta tier. Do you have a use case that can't be addressed with Node Similarity?
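For reference, the metric itself is the size of the intersection of two sets divided by the size of their union. A minimal sketch, assuming the alpha-tier function `gds.alpha.similarity.jaccard` is available in the installed GDS version (the two lists are made-up values for illustration):

```cypher
// Jaccard similarity of two lists: |A ∩ B| / |A ∪ B|
// Shared elements: 1 and 2 (2 items); distinct elements overall: 1, 2, 3, 4, 5 (5 items)
RETURN gds.alpha.similarity.jaccard([1, 2, 3], [1, 2, 4, 5]) AS similarity
// similarity = 2 / 5 = 0.4
```

Node Similarity applies the same ratio, with each node's set of neighbors taking the place of the hand-built lists.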
03-09-2021 03:48 PM
Also, since I will now be using GDS 1.1.1 and graph projections: I don't seem to be able to create a graph projection whose size is reduced by filtering on node properties. I see the template below, but I seem unable to
03-09-2021 03:49 PM
...continuing. Using the template below, I seem unable to filter the projection based on node properties:
CALL gds.graph.create(
'my-graph', {
City: {
properties: {
stateId: {
property: 'stateId' SAY WHERE 'STATEID' = "CA"
},
population: {
property: 'population'
}
}
}
},
'*'
)
YIELD graphName, nodeCount, relationshipCount;
03-10-2021 03:06 AM
So for your first query, you can compute the similarity between all guests like this:
CALL gds.nodeSimilarity.stream({
nodeProjection: ['Guest', 'Song'],
relationshipProjection: "PLAYS_SONG"
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1) AS node1,
gds.util.asNode(node2) AS node2,
similarity
And then if you wanted to create a graph first it'd look like this instead:
Create graph:
CALL gds.graph.create(
'myGraph',
['Guest', 'Song'],
"PLAYS_SONG"
);
Run algorithm:
CALL gds.nodeSimilarity.stream("myGraph")
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1) AS node1,
gds.util.asNode(node2) AS node2,
similarity
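If the results should be written back to the graph, as the original `algo.similarity.jaccard` call did, a sketch along these lines may work. It assumes the `myGraph` projection created above and reuses the cutoff, relationship type, and property name from the original query:

```cypher
// Write mode: stores one relationship per similar pair instead of streaming rows.
CALL gds.nodeSimilarity.write('myGraph', {
  similarityCutoff: 0.2,                    // same threshold as the original query
  writeRelationshipType: 'SIMILAR_PURPLE',  // relationship type to create
  writeProperty: 'score_purple'             // property holding the similarity score
})
YIELD nodesCompared, relationshipsWritten
RETURN nodesCompared, relationshipsWritten
```

Node Similarity compares nodes by their outgoing relationships, so with this projection the written relationships connect Guests. Comparing Songs instead would presumably require projecting `PLAYS_SONG` in reverse orientation.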
03-10-2021 05:31 AM
Thank you very much Mark. One other issue:
Our graph model contains more than 2.6 million Guests. To reduce the size of the graph projection and its memory footprint, we wish to filter the Guests on a particular node property (call it 'tier'). I have been reviewing the documentation, and the only possible method seems to be a parameter that identifies the value of 'tier' we wish to filter on. Is this correct, or is there another way?
Thank you again.
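One option that may fit here is a Cypher projection, which builds the in-memory graph from the results of two queries, so any `WHERE` clause can restrict what is loaded. A rough sketch, assuming `gds.graph.create.cypher` and its `parameters` option are available in the installed GDS version; the graph name `guests-by-tier` and the tier value `'gold'` are hypothetical:

```cypher
CALL gds.graph.create.cypher(
  'guests-by-tier',
  // Node query: all Songs, plus only the Guests in the requested tier
  'MATCH (n) WHERE n:Song OR (n:Guest AND n.tier = $tier) RETURN id(n) AS id',
  // Relationship query: PLAYS_SONG relationships that start at those Guests
  'MATCH (g:Guest)-[:PLAYS_SONG]->(s:Song) WHERE g.tier = $tier RETURN id(g) AS source, id(s) AS target',
  // $tier is passed into both queries
  {parameters: {tier: 'gold'}}
)
YIELD graphName, nodeCount, relationshipCount;
```

The named graph can then be passed to `gds.nodeSimilarity.stream` in the same way as `myGraph` in the earlier reply. Cypher projections tend to load more slowly than native projections, so this is a trade-off and not a drop-in replacement.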