

Graph Data Science Library: Jaccard similarity

The Graph Data Science Library is a boon to a non-developer data architect like me.

Is it possible, using the Graph Data Science Library, to compare how similar nodes with one label are, based on their relationships to nodes with a different label? It appears to recognise only relationships between nodes with the same label. If there is documentation on the format of the code that would let me adapt the automatically generated code, I have not been able to find it.

If this is not currently possible because it is an alpha-tier algorithm, may I ask the Neo4j staff to add it to the development backlog, please?

For context, I want to compare Business Processes according to the Entity Types they reference (create, read or update).

Last year I posted an appeal for help with this algorithm, including links to my data; it drew some support but, sadly, no success.

All feedback gratefully received.
Yours aye,
Douglas


This sounds like you're looking for nodeSimilarity: you can specify your source node type (business process?) and your target node type (entity type?). Check out the example in our documentation. Does that match your use case?
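Node similarity compares nodes by the neighbours they share, which is why it suits a bipartite graph of processes and entity types. The minimal Python sketch below models that idea, assuming each Logical_Business_Process is a source node whose neighbours are the Entity_Type nodes it touches. It is not the GDS implementation or API: `node_similarity`, its defaults and the sample data are hypothetical, and the `top_k` and `similarity_cutoff` arguments are only modelled on the `topK` and `similarityCutoff` settings that nodeSimilarity documents.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def node_similarity(neighbours: dict[str, set[str]],
                    top_k: int = 10,
                    similarity_cutoff: float = 0.0) -> list[tuple[str, str, float]]:
    """Hypothetical sketch: compare every pair of source nodes by their targets.

    Drops pairs with no shared targets or a score below similarity_cutoff,
    then keeps at most top_k best partners for each source node.
    """
    scores: dict[str, list[tuple[str, float]]] = {n: [] for n in neighbours}
    for a, b in combinations(neighbours, 2):
        s = jaccard(neighbours[a], neighbours[b])
        if s > 0.0 and s >= similarity_cutoff:
            scores[a].append((b, s))
            scores[b].append((a, s))
    result = []
    for node, partners in scores.items():
        partners.sort(key=lambda p: p[1], reverse=True)  # best partners first
        for other, s in partners[:top_k]:
            result.append((node, other, s))
    return result


if __name__ == "__main__":
    # Hypothetical processes mapped to the Entity Types they touch.
    neighbours = {
        "Take order": {"Customer", "Order", "Product"},
        "Ship order": {"Order", "Shipment"},
        "Bill customer": {"Customer", "Order", "Invoice"},
    }
    print(node_similarity(neighbours, top_k=1, similarity_cutoff=0.3))
```

With `top_k=1` and `similarity_cutoff=0.3`, only the Take order / Bill customer pair survives (2 shared Entity Types out of 4 distinct ones, so 0.5), and it is reported once from each side.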

Alicia,
Thank you for coming back to me. First, a correction to my previous post: I was using the NEuler app, so my request was that the alpha-tier functionality be adapted to allow bipartite queries.

Second, to answer your question:
Yes, your example is a bipartite graph, but I struggled with the syntax just as much as before. On the positive side, I have used the gds.alpha.similarity.jaccard function (see below), which returns the results as a table.
So my original question has now turned into three new ones:

  1. What is the syntax for including a similarityCutoff parameter?
  2. How do I weight relationships by type? In my case, a process that Creates a node (an entity, in relational terms) carries more weight than one that Updates it, which in turn carries more weight than one that Reads it. So I would use weights of 8, 4 and 1 respectively.
  3. How do I get the results to display as a graph?
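One way to express the weighting in question 2 is a weighted Jaccard (Ruzicka) similarity: give each Entity Type the heaviest relationship weight a process has to it, then divide the sum of the element-wise minimums by the sum of the element-wise maximums. The Python sketch below assumes that formula and the weights 8, 4 and 1 from the question; it illustrates the arithmetic only and is not a GDS function. `WEIGHTS`, `weight_vector`, `weighted_jaccard`, `similar_pairs` and the sample data are hypothetical, and the relationship-type keys would need mapping onto the model's actual read, write and read_write types.

```python
from itertools import combinations

# Hypothetical weights from the question: Create > Update > Read.
WEIGHTS = {"create": 8, "update": 4, "read": 1}


def weight_vector(links: list[tuple[str, str]]) -> dict[str, int]:
    """Map each Entity Type to the heaviest relationship weight the process has to it."""
    vec: dict[str, int] = {}
    for entity, rel_type in links:
        vec[entity] = max(vec.get(entity, 0), WEIGHTS[rel_type])
    return vec


def weighted_jaccard(a: dict[str, int], b: dict[str, int]) -> float:
    """Sum of element-wise minimums over sum of element-wise maximums."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def similar_pairs(processes: dict[str, list[tuple[str, str]]],
                  similarity_cutoff: float) -> list[tuple[str, str, float]]:
    """Return (from, to, score) edges at or above the cutoff, highest score first."""
    vectors = {p: weight_vector(links) for p, links in processes.items()}
    edges = []
    for p1, p2 in combinations(vectors, 2):
        score = weighted_jaccard(vectors[p1], vectors[p2])
        if score >= similarity_cutoff:  # the similarity cutoff is just a filter
            edges.append((p1, p2, score))
    return sorted(edges, key=lambda e: e[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical processes: (Entity Type, relationship type) pairs.
    processes = {
        "Take order": [("Order", "create"), ("Customer", "read")],
        "Amend order": [("Order", "update"), ("Customer", "read")],
        "Report sales": [("Order", "read")],
    }
    for edge in similar_pairs(processes, similarity_cutoff=0.2):
        print(edge)
```

In this hypothetical data, plain Jaccard would score Take order and Amend order as identical (1.0) because they touch the same Entity Types; the weights lower that to 5/9, while Report sales (1/9 against Take order) falls below the 0.2 cutoff. Each surviving (from, to, score) tuple is the kind of edge that could be written back to Neo4j as a relationship, for example with a Cypher MERGE, so that the result can be viewed as a graph.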

By the way, what is the difference between algo.nodeSimilarity and similarity.jaccard?
Kind regards,
Douglas

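// For each process, collect the distinct Entity Types it reads or writes
MATCH (p1:Logical_Business_Process)-[:read|write|read_write]-(e1:Entity_Type)
WITH p1, collect(DISTINCT id(e1)) AS p1entity_type
// Pair it with every other process and collect that process's Entity Types
MATCH (p2:Logical_Business_Process)-[:read|write|read_write]-(e2:Entity_Type)
WHERE p1 <> p2
WITH p1, p1entity_type, p2, collect(DISTINCT id(e2)) AS p2entity_type
// Jaccard similarity of the two lists of Entity Type ids
RETURN p1.Name AS from,
       p2.Name AS to,
       gds.alpha.similarity.jaccard(p1entity_type, p2entity_type) AS similarity
ORDER BY similarity DESC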