Neo4j

FourMoBro · 10-19-2020

Background:
Neo4j Community edition 4.0.0
APOC 4.0.0.16
GDS 1.3.4

I am looking for information on how to run a similarity analysis between two 'lists' of nodes all at once, rather than one at a time.

My schema looks similar to this:

(Node1 {type:'A'})-[:rel1]->(Node2)-[:rel2]->(Node3)-[:rel3]->(Node4)-[:rel4]->(Node5 {name:'xxx'})
(Node1 {type:'B'})-[:rel1]->(Node2)-[:rel2]->(Node3)-[:rel3]->(Node4)-[:rel4]->(Node5 {name:'xxx'})

I can run a one-off similarity analysis with gds.alpha.similarity.jaccard to see how similar the Node4 contents are. The problem is that I have about 100 different Node1s of type 'A' to compare with about 100 Node1s of type 'B'. I would like to do this as "one" procedure, with the results output to a table for visualization, or possibly saved back to the database.
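A minimal Python sketch of the "all at once" idea, outside Neo4j. It assumes each Node1 has already been reduced to the set of Node4 identifiers reachable from it. The dicts `boms_a` and `boms_b` and the helper names are hypothetical. It then scores every type 'A' / type 'B' pair with the Jaccard index |X ∩ Y| / |X ∪ Y|, the measure gds.alpha.similarity.jaccard is named for. With about 100 nodes on each side this is roughly 10,000 comparisons, returned as a single table of rows.

```python
from itertools import product


def jaccard(x: set, y: set) -> float:
    """Jaccard index |x & y| / |x | y|; returns 0.0 when both sets are empty (an assumption)."""
    union = x | y
    if not union:
        return 0.0
    return len(x & y) / len(union)


def cross_similarity(boms_a: dict, boms_b: dict) -> list:
    """Score every (type 'A', type 'B') pair in one pass.

    boms_a / boms_b are hypothetical inputs: Node1 id -> set of Node4 ids.
    Returns (a_id, b_id, similarity) rows, most similar first.
    """
    rows = [
        (a_id, b_id, jaccard(a_parts, b_parts))
        for (a_id, a_parts), (b_id, b_parts) in product(boms_a.items(), boms_b.items())
    ]
    # Sort by similarity descending, then by ids so ties read in a stable order.
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows


if __name__ == "__main__":
    # Hypothetical example data: two bills of materials per type.
    boms_a = {"A1": {"p1", "p2", "p3"}, "A2": {"p1", "p4"}}
    boms_b = {"B1": {"p1", "p2", "p3"}, "B2": {"p2", "p5"}}
    for a_id, b_id, sim in cross_similarity(boms_a, boms_b):
        print(f"{a_id} -> {b_id}: {sim:.2f}")
```

In Cypher the same shape would be to collect the Node4 ids per Node1 on each side and call the Jaccard function once per (A, B) pair. Writing the scores back to the graph would be a separate step.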

Think of this problem as comparing two different bills of materials (Node1) used to manufacture the same assembly (Node5) at different revisions.

Can someone please advise? Thanks.

UPDATE: Found it here, near Table 5.279: https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/jaccard/ I was missing the WHERE p1 <> p2 clause, which was causing the query to run forever.
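The all-pairs pattern in that documentation matches one pool of nodes against itself and uses WHERE p1 <> p2 to drop self-comparisons. As a result, every ordered pair (p1, p2) with p1 ≠ p2 is scored exactly once. The Python sketch below mirrors that filter with itertools.permutations. The input dict `boms` (Node1 id -> set of Node4 ids) is a hypothetical stand-in for the collected Node4 ids, not something GDS returns.

```python
from itertools import permutations


def jaccard(x: set, y: set) -> float:
    """Jaccard index |x & y| / |x | y|; returns 0.0 when both sets are empty (an assumption)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def all_pairs_similarity(boms: dict) -> list:
    """Score every ordered pair (p1, p2) with p1 != p2, like WHERE p1 <> p2.

    permutations(..., 2) never pairs an item with itself, so no
    self-similarity rows (always 1.0 for a non-empty set) are produced.
    """
    return [
        (p1, p2, jaccard(boms[p1], boms[p2]))
        for p1, p2 in permutations(boms, 2)
    ]


if __name__ == "__main__":
    # Hypothetical example data.
    boms = {"A1": {"p1", "p2"}, "B1": {"p1", "p3"}, "B2": {"p2"}}
    rows = all_pairs_similarity(boms)
    # n nodes give n * (n - 1) ordered pairs: 3 * 2 = 6 here.
    print(len(rows))
    for p1, p2, sim in rows:
        print(f"{p1} -> {p2}: {sim:.2f}")
```

Because the pairs are ordered, (A1, B1) and (B1, A1) both appear with the same score, since Jaccard is symmetric. Using itertools.combinations instead would halve the work if only unordered pairs are needed.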

Find all similarity permutations between two sets of nodes