09-09-2020 01:39 PM
Hi - Neo4j beginner here.
Can the Graph Data Science 'community detection' routines operate on graphs with multiple relationships between each pair of nodes?
For example, if we have a graph in which each pair of nodes n1 and n2 may be connected via one or more different relationships, like:
(n1)-[:LIVES_NEAR {weight: 0.80}]->(n2)
(n1)-[:HAS_INTERESTS_LIKE {weight: 0.55}]->(n2)
(n1)-[:HAS_JOB_LIKE {weight: 0.65}]->(n2)
... and we want the clustering algorithm to take all of these relationships into account when assigning these nodes to communities. (Would this be an example of a "multigraph"?)
Can we pass all of these relationships into a GDS community detection algorithm, either through wildcards or the pipe operator? I'm guessing yes, but wanted to confirm. And will the algorithm be more likely to assign a pair of nodes to the same community if they have more relationships (and possibly higher relationship weights) between them?
Or do we need to first combine this multigraph into a single graph via some aggregation scheme, and then pass the resulting single graph into the GDS algorithm?
09-16-2020 11:36 AM
Do (n1) and (n2) have the same node label? If so, I am pretty sure you can project all the relationship types between nodes of that label into memory and run the clustering algorithm. If (n1) and (n2) have different labels, you may want to first find a label that's common to both. Clustering algorithms typically need to assess the similarity between nodes, so it makes sense for the nodes to share a label.
09-16-2020 12:03 PM
The GDS library does support multigraphs, so you should see different results when you use many relationships between a pair of nodes instead of just a single one.
You could use either the wildcard or an array of relationships to project more relationship types:
CALL gds.graph.create('type_min','*',
['PRESENTED_FOR','LIKES','LOVES'])
or
CALL gds.graph.create('type_min','*','*')
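As a rough mental model of what the two projections above select (plain Python, not GDS code; the `project` helper and the tuple layout are hypothetical), a list of types keeps only those relationships, while `'*'` keeps every type. As I understand the default settings, parallel relationships between the same pair are kept as separate entries, which is what makes the projection a multigraph. The sample data reuses the relationships from the original question.

```python
from typing import Iterable, List, Sequence, Tuple, Union

# Hypothetical stand-in for stored relationships: (source, type, target, weight)
Rel = Tuple[str, str, str, float]

RELS: List[Rel] = [
    ("n1", "LIVES_NEAR", "n2", 0.80),
    ("n1", "HAS_INTERESTS_LIKE", "n2", 0.55),
    ("n1", "HAS_JOB_LIKE", "n2", 0.65),
]

def project(rels: Iterable[Rel], types: Union[str, Sequence[str]]) -> List[Rel]:
    """Keep every relationship for '*', otherwise only the listed types.
    Parallel relationships stay as separate entries (a multigraph)."""
    if types == "*":
        return list(rels)
    wanted = set(types)
    return [r for r in rels if r[1] in wanted]

all_types = project(RELS, "*")
two_types = project(RELS, ["LIVES_NEAR", "HAS_JOB_LIKE"])
print(len(all_types), len(two_types))  # 3 2
```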
What you could also try to do is to reduce the multigraph to a single graph and aggregate the relationship weights:
CALL gds.graph.create('min_aggregation','*','*',
{relationshipProperties: {weight: {property: 'weight',
aggregation: 'SUM'}}})
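To picture what the `SUM` aggregation does, here is a plain-Python sketch (not GDS code; `sum_aggregate` is a hypothetical helper and the extra n2/n3 relationship is made-up sample data). It collapses all parallel relationships between the same pair into a single one whose weight is the sum of the originals, dropping the type, much as the `'*'` projection merges all types together.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rel = Tuple[str, str, str, float]  # (source, type, target, weight)

def sum_aggregate(rels: List[Rel]) -> Dict[Tuple[str, str], float]:
    """Collapse parallel relationships between the same (source, target)
    pair into one, summing their weights (analogous to aggregation: 'SUM')."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for source, _rel_type, target, weight in rels:
        totals[(source, target)] += weight
    return dict(totals)

rels = [
    ("n1", "LIVES_NEAR", "n2", 0.80),
    ("n1", "HAS_INTERESTS_LIKE", "n2", 0.55),
    ("n1", "HAS_JOB_LIKE", "n2", 0.65),
    ("n2", "LIVES_NEAR", "n3", 0.30),  # hypothetical extra relationship
]
collapsed = sum_aggregate(rels)
print(collapsed)
```

The sketch keys on the ordered (source, target) pair, mirroring a directed projection, so a reversed relationship such as (n2)->(n1) would count as a separate pair here.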
Try out different versions and see what works best for you. In any case, the number of relationships between a pair of nodes should influence the results of community detection. I haven't experimented enough to tell you exactly what will happen, but you can get a quick idea by running the stats mode of a community detection algorithm, for example:
CALL gds.louvain.stats('min_aggregation', {relationshipWeightProperty: 'weight'})
YIELD communityCount, modularity
That should give you a rough feeling for how different configurations affect the results. If you could share your findings with us, that would also be awesome!
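Louvain works by optimizing modularity, so one way to reason about whether extra relationships pull a pair into the same community is to compute textbook weighted modularity by hand. The plain-Python sketch below is not GDS code; the `modularity` function and the toy path graph a-b-c-d are made up for illustration, and it assumes an undirected graph with no self-loops. Parallel relationships simply add up in the adjacency weight, so stacking three relationships between a and b raises the score of grouping them together. A `SUM`-aggregated graph scores the same as the multigraph under this definition. Whether GDS Louvain returns identical communities for both projections is an assumption worth checking with the stats mode.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, float]  # undirected (u, v, weight); parallel edges allowed

def modularity(edges: Iterable[Edge], community: Dict[str, int]) -> float:
    """Textbook weighted modularity Q for an undirected graph without self-loops.
    Parallel edges are summed into the adjacency weight A_uv."""
    internal: Dict[int, float] = defaultdict(float)    # sum of A_ij over i, j in c
    degree_sum: Dict[int, float] = defaultdict(float)  # sum of k_i over i in c
    two_m = 0.0                                        # 2m = total weighted degree
    for u, v, w in edges:
        two_m += 2 * w
        degree_sum[community[u]] += w
        degree_sum[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += 2 * w  # counts both A_uv and A_vu
    return sum(internal[c] - degree_sum[c] ** 2 / two_m for c in degree_sum) / two_m

single = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)]
stacked = [("a", "b", 1.0)] * 3 + [("b", "c", 1.0), ("c", "d", 1.0)]  # multigraph
summed = [("a", "b", 3.0), ("b", "c", 1.0), ("c", "d", 1.0)]          # SUM-aggregated
split = {"a": 0, "b": 0, "c": 1, "d": 1}

for name, edges in [("single", single), ("stacked", stacked), ("summed", summed)]:
    print(name, modularity(edges, split))
```

Here the split {a, b} / {c, d} scores about 0.167 with a single a-b relationship and about 0.22 with three, and the stacked and summed versions score the same.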