02-04-2020 01:21 AM
So I'm using the "PIMA INDIANS DIABETES DATASET".
And I made these types of nodes:
I want to find the similarity between all the persons whose age is between 21 and 25 and who have been diagnosed with diabetes.
I want my answer to look something like this:
BMI similarity: 0.82
BP similarity: 0.67
I have looked through all the graph algorithms but didn't find anything relevant.
Can we achieve this using Neo4j?
PS: All the examples I have seen compute similarity over a single relationship type.
02-04-2020 11:05 AM
Welcome to the community. Can you describe more of what you mean by similarity or give a link to an example you have considered?
02-04-2020 09:06 PM
OK, so I have a dataset which contains the following columns:
1. ID
2. AGE
3. BMI
4. BP
5. INSULIN
6. OUTCOME
I took ID as a node and added age as its property.
Then I made separate nodes for all the other columns, like BMI, BP, INSULIN, etc.
I created relationships so that each "ID" node is connected to its "BMI", "BP", "INSULIN", etc. values.
Now my query is this:
"Find the mean BMI of all the persons whose age is <= 25."
Is creating a dedicated node set for each column the most efficient way?
02-05-2020 07:38 PM
Our node similarity algorithm calculates the similarity of nodes based on their neighboring nodes. Think of a (:Person)-[:LIKES]->(:Instrument) graph: we measure how similar Person nodes are based on the number of Instruments they both like vs. the number they don't share.
If you wanted to use that algorithm, you would need to turn the things you want to measure similarity on (e.g. outcomes) into nodes. If you have a schema where Person is a node label with age and id attributes, and Outcome is a node label with a description attribute, you could use nodeSimilarity in this way:
CALL algo.nodeSimilarity.stream(
'MATCH (p:Person) WHERE p.age < 25 RETURN id(p) as id',
'MATCH (p:Person)-[:HAS_OUTCOME]->(o:Outcome) RETURN id(p) as source, id(o) as target',
{graph:'cypher'})
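To make the neighbor-based idea concrete, here is a minimal Python sketch of the Jaccard score that Node Similarity is built on: shared neighbors divided by all distinct neighbors. The people, instruments, and the `likes` dictionary are made-up illustration data, not part of the dataset discussed in this thread, and this is plain Python rather than the library's implementation.

```python
from itertools import combinations

# Hypothetical toy data: each person mapped to the set of instruments they like.
likes = {
    "A": {"guitar", "piano"},
    "B": {"keyboard", "guitar"},
    "C": {"guitar", "piano"},
}

def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Compare every pair of people exactly once.
scores = {
    (p, q): jaccard(likes[p], likes[q])
    for p, q in combinations(sorted(likes), 2)
}

# Print the pairs from most to least similar.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Two people with identical neighbor sets score 1.0, and two people with nothing in common score 0.0.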
In your reply to @nsmith_piano, you're asking about a mean value. Check out our documentation on aggregating functions here: https://neo4j.com/docs/cypher-manual/current/functions/aggregating/ .
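As a rough illustration of what such an aggregation computes (Cypher's avg() is the aggregating function for a mean), here is a small Python sketch that filters by age and averages BMI. The rows and the `mean_bmi` helper are hypothetical; only the column names come from the dataset description in this thread.

```python
from statistics import mean

# Hypothetical rows; the column names follow the dataset description in this thread.
rows = [
    {"id": 1, "age": 22, "bmi": 30.0},
    {"id": 2, "age": 25, "bmi": 26.0},
    {"id": 3, "age": 41, "bmi": 35.0},
]

def mean_bmi(rows, max_age):
    """Average BMI over the rows whose age is at most max_age; None if no row matches."""
    values = [r["bmi"] for r in rows if r["age"] <= max_age]
    return mean(values) if values else None

result = mean_bmi(rows, 25)
print(result)
```

Note that this kind of question only needs a filter and an aggregate over properties, so it does not depend on a similarity algorithm at all.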
02-06-2020 01:36 AM
Thanks for the info. As you mentioned, node similarity calculates similarity for only one type of relationship, "LIKES" in your example. For instance, "A likes guitar and piano" and "B likes keyboard and guitar", so they are 50% similar. What I want is: "A likes guitar and lives in London" and "B likes guitar and lives in Mumbai", so A and B are 50% similar because they like the same instrument but live in different places. I know we can do this by measuring similarity on the "LIKES" relationship once and then on "LIVES" once. But what if I want to compare using two relationships at the same time? By the way, sorry if I framed the question wrong. I was just confused.
02-06-2020 07:21 AM
You can combine multiple node and relationship types for the purpose of running an algorithm, either by pre-loading a named graph (see section 2.3.4 loading multiple relationship types and node labels) or by using a Cypher projection that references the nodes and relationships you want to consider.
For the musical instrument example, if we add in a Place node and a LIVES_IN relationship, you could use a Cypher projection like this:
CALL algo.nodeSimilarity.stream(
'MATCH(n) WHERE n:Person or n:Instrument or n:Place RETURN id(n) as id',
'MATCH (s:Person)-[]->(t) RETURN id(s) as source, id(t) as target',
{graph:'cypher', direction:'outgoing'})
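The effect of projecting several relationship types at once can be sketched in plain Python: every target a person points to, whatever the relationship type, ends up in one neighbor set, and the Jaccard score is taken over that combined set. The triples and the `neighbours` helper below are hypothetical illustration data, not output from the procedure.

```python
# Hypothetical toy graph: (person, relationship type, target) triples.
edges = [
    ("A", "LIKES", "guitar"),
    ("A", "LIVES_IN", "London"),
    ("B", "LIKES", "guitar"),
    ("B", "LIVES_IN", "Mumbai"),
]

def neighbours(edges, rel_types=None):
    """Collect each person's targets, optionally restricted to some relationship types."""
    result = {}
    for source, rel, target in edges:
        if rel_types is None or rel in rel_types:
            result.setdefault(source, set()).add(target)
    return result

def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

combined = neighbours(edges)                # all relationship types at once
likes_only = neighbours(edges, {"LIKES"})   # a single relationship type

combined_score = jaccard(combined["A"], combined["B"])
likes_score = jaccard(likes_only["A"], likes_only["B"])
print(combined_score, likes_score)
```

With Jaccard, sharing one of two neighbors each scores 1/3 rather than one half, because the union contains three distinct targets (guitar, London, Mumbai).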
02-06-2020 10:39 PM
Solved my issue. Thanks a lot!
02-10-2020 08:16 AM
Hey Alicia, great solution!
How can we return the node label instead of node id?
04-06-2021 11:52 AM
Hi Alicia! I have done this with a similar structure, but the algorithm takes too long. I'm using four node labels (Client plus data about them: income range, age, business line, etc.) and three relationship types in a named graph (using gds). What could be happening?
04-06-2021 12:36 PM
Can you share your code? And how many nodes / relationships are in your graph?
04-06-2021 12:43 PM
Of course! Thank you!
Here I create the named graph and execute the algorithm:
// Client job graph
CALL gds.graph.create("client-job-graph", ["Client", "BusinessLine", "EconomicActivity","MonthlyIncome"],
["HAS_BUSINESS_LINE", "HAS_ACTIVITY", "HAS_MONTHLY_INCOME"]) YIELD nodeCount, relationshipCount;
CALL gds.nodeSimilarity.write("client-job-graph", {
writeRelationshipType: "SIMILAR_J",
writeProperty: "score_j",
degreeCutoff: 3,
topK: 5
})
In the named graph there are 528,739 nodes and 1,586,139 relationships. Almost all of the nodes are Client nodes, since the other ones are sort of categories.
02-10-2020 09:10 AM
You can use the asNode function: in the YIELD statement, return the nodeId, and then you can use algo.asNode to access labels and attributes. For example:
CALL algo.nodeSimilarity.stream('Person | Instrument', 'LIKES', {
direction: 'OUTGOING'
})
YIELD node1, node2, similarity
RETURN algo.asNode(node1).name AS Person1, algo.asNode(node2).name AS Person2, similarity
ORDER BY similarity DESCENDING, Person1, Person2
05-29-2020 04:35 AM
Hi Alicia,
I think nodeSimilarity is now deprecated. I tried to run this Cypher projection with Jaccard similarity, but I get the error "Procedure call does not provide the required number of arguments: got 3 expected 2."
04-14-2020 09:42 AM
I have a similar question.
How can we apply node similarity based on an edge property value?
I have a graph in which stock names are nodes.
Dates are nodes too.
A price relationship links each stock node to the date nodes.
So how do I apply node similarity to different stocks?
06-10-2020 09:15 AM
@mangesh.karangutkar Node Similarity has not been deprecated: https://neo4j.com/docs/graph-data-science/current/algorithms/node-similarity/
The error message you received from jaccard indicates that you've provided more inputs than it expects. The jaccard function expects a pair of inputs (the two nodes being compared); perhaps that's the issue. I would look to the docs for more information on the syntax: https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/jaccard/
02-03-2021 06:46 AM
Could you please help me? How can I run a node similarity algorithm on nodes that have no relationships?
Thank you.
02-03-2021 07:38 AM
Nodes can then just be considered as classes, the way we treat them in OOP. You can write your own algorithms for either finding or comparing similarities between two classes/nodes.
But then that's your design, and you need to tailor the algorithm to your needs. If you need more help, you need to be more specific about what exactly you want.
Thanks
Sameer
04-06-2021 12:35 PM
If you don't have any relationships, you'd need to use node properties to calculate similarity on. Check out KNN or Cosine Similarity. Those can create relationships between nodes that have similar properties but no existing relationships.
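For a feel of what property-based similarity does, here is a minimal Python sketch that scores nodes by the cosine of their numeric property vectors and keeps the k best matches per node, which is the basic idea behind a k-nearest-neighbours approach. The node names, property values, and the `top_k` helper are all hypothetical, and the values are left unscaled for brevity; in practice properties on very different scales usually need normalising first.

```python
import math

# Hypothetical nodes with numeric properties only (no relationships between them).
nodes = {
    "p1": [22.0, 30.5, 70.0],   # e.g. age, BMI, blood pressure
    "p2": [24.0, 31.0, 72.0],
    "p3": [60.0, 20.0, 95.0],
}

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_k(name, k):
    """The k most similar other nodes, as (score, other_name) pairs, best first."""
    scored = [(cosine(nodes[name], vec), other)
              for other, vec in nodes.items() if other != name]
    return sorted(scored, reverse=True)[:k]

print(top_k("p1", 1))
```

Each (score, other_name) pair returned here corresponds to one similarity relationship that could be written back between the two nodes.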
04-09-2021 12:08 PM
Does it execute too slowly? Or not at all? Usually the first thing I recommend is adding a degree cutoff and setting topK, but you've done that already.
You can take a peek at the debug logs to check on progress - as NodeSimilarity runs, it will print the percentage of each stage that's complete.
One thing you can try is to first run WCC on your client-job-graph and then run node similarity on individual components. This breaks the problem up and makes it much faster.
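The reason splitting by component helps is that two nodes in different weakly connected components cannot share a neighbor, so their score is zero and the pair never needs comparing. The Python sketch below illustrates this with a small union-find; the client and category names are hypothetical, and this is not how the library implements WCC.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bipartite edges: (client, category) pairs.
edges = [
    ("c1", "retail"), ("c2", "retail"),
    ("c3", "mining"), ("c4", "mining"), ("c4", "high_income"),
]

def components(edges):
    """Group all endpoints into weakly connected components using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return list(groups.values())

def candidate_pairs(edges):
    """Pairs of source nodes worth comparing: only those inside the same component."""
    sources = {a for a, _ in edges}
    pairs = []
    for group in components(edges):
        pairs.extend(combinations(sorted(group & sources), 2))
    return pairs

pairs = candidate_pairs(edges)
print(pairs)
```

Here four clients would give six pairs if compared naively, but only two pairs survive the split. How much this saves depends on how fragmented the graph is: if a few shared category nodes tie almost every client into one large component, the gain will be small.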
04-12-2021 01:31 PM
It executes slowly. It does finish, but only after an hour or an hour and a half. I will try your suggestions and comment on the results. Thanks a lot, Alicia!
10-28-2021 08:00 AM
Hello, I have a strange use case that is probably related to this topic. If not, let me know if a new topic is needed.
I have some Item (orange), String (blue) and Condition (grey) objects that define my items.
## DEMO OBJ
MERGE(it1:Item {name: "AAA"})
MERGE(it2:Item {name: "BBB"})
MERGE(st1:String {value: "stringAAA"})
MERGE(st2:String {value: "stringBBB"})
MERGE(it1)-[:uses]->(st1)
MERGE(it2)-[:uses]->(st1)
MERGE(it1)-[:uses]->(st2)
MERGE(it2)-[:uses]->(st2)
MERGE(p1:Part {id: 1, type: "And", value: "-"})
MERGE(sp11:Part {id: 2, type: "All", value: "-"})
MERGE(sp12:Part {id: 3, type: "Set", value: "-"})
MERGE(p1)-[:then]->(sp11)
MERGE(p1)-[:then]->(sp12)
MERGE(p2:Part {id: 4, type: "And", value: "-"})
MERGE(sp21:Part {id: 5, type: "All", value: "-"})
MERGE(sp22:Part {id: 6, type: "Set", value: "-"})
MERGE(p2)-[:then]->(sp21)
MERGE(p2)-[:then]->(sp22)
MERGE(it1)-[:from]->(p1)
MERGE(it2)-[:from]->(p2)
## CREATE GRAPH
CALL gds.graph.create(
'myGraph',
['Item', 'String', 'Part'],
{
uses: {
type: 'uses'
},
from: {
type: 'from'
},
then: {
type: 'then'
}
}
);
## SHOW SIMILARITY
CALL gds.nodeSimilarity.stream('myGraph')
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Item1, gds.util.asNode(node2).name AS Item2, similarity
ORDER BY similarity DESCENDING, Item1, Item2
## WRITE SIMILARITY BACK
CALL gds.nodeSimilarity.write('myGraph', {
writeRelationshipType: 'SIMILAR',
writeProperty: 'score'
})
YIELD nodesCompared, relationshipsWritten
How would you suggest approaching this use case?
Thanks