Neo4j

depire · 08-30-2020

Dear all,
I am trying to use seedProperty with the WCC algorithm.
My goal is to compute WCC only on a subgraph, in order to reduce the run time.

Assume that A->B and C->D.
After running WCC on the full graph, I obtain:
ID CLUSTER
A 0
B 0
C 2
D 2

When I rerun WCC only on the subgraph containing nodes C and D, without seedProperty, I obtain:
ID CLUSTER
A 0
B 0
C 0
D 0

Of course, this is wrong, because A, B, C and D are not all in the same component.
To correct this, I added the "seedProperty" option, but then I get an error:

Failed to invoke procedure gds.wcc.write: Caused by: java.lang.NullPointerException.

Here is an example.
CREATE (a:LEU{LEID:"A"})
CREATE (b:LEU{LEID:"B"})
CREATE (c:LEU{LEID:"C"})
CREATE (d:LEU{LEID:"D"})

MATCH (parent:LEU {LEID:"A"})
MATCH (child:LEU {LEID:"B"})
MERGE (parent)-[l:REL]->(child)
RETURN parent.LEID, child.LEID

MATCH (parent:LEU {LEID:"C"})
MATCH (child:LEU {LEID:"D"})
MERGE (parent)-[l:REL]->(child)
RETURN parent.LEID, child.LEID

// create graph
CALL gds.graph.create.cypher('graph', 'MATCH (n:LEU) RETURN id(n) AS id',
'MATCH (p:LEU)-[r:REL]->(c:LEU) RETURN id(p) AS source, id(c) AS target')
YIELD graphName, nodeCount, relationshipCount, createMillis

//make wcc
CALL gds.wcc.write('graph', { writeProperty: 'cluster' })
YIELD nodePropertiesWritten, componentCount

//delete graph
CALL gds.graph.drop('graph') YIELD graphName

// display info
MATCH (p:LEU) RETURN p.LEID, p.cluster
// A 0
// B 0
// C 2
// D 2

// Create subgraph
CALL gds.graph.create.cypher('graph', 'MATCH (n:LEU) WHERE n.cluster = 2 RETURN id(n) AS id',
'MATCH (p:LEU)-[r:REL]->(c:LEU) WHERE p.cluster = 2 and c.cluster = 2 RETURN id(p) AS source, id(c) AS target')
YIELD graphName, nodeCount, relationshipCount, createMillis

//make wcc
CALL gds.wcc.write('graph', { writeProperty: 'cluster' })
YIELD nodePropertiesWritten, componentCount

//delete graph
CALL gds.graph.drop('graph') YIELD graphName

//display info
MATCH (p:LEU) RETURN p.LEID, p.cluster

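// the same WCC call on the subgraph, but with seedProperty: this is the call that fails with the NullPointerException
CALL gds.wcc.write('graph', { seedProperty: 'cluster', writeProperty: 'cluster' })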
YIELD nodePropertiesWritten, componentCount

If you have any ideas, please help me.

Best regards,
Alexandre

alicia_frame · 08-30-2020

The first time you run WCC without the seed property, you're overwriting the original community ID with the new community ID. Because communities are indexed starting at 0, you overwrite the original community ID with 0. It's not that the algorithm thinks they're in the same component, but that it's not aware that you already have a community 0.

You're on the right track with using seedProperty but you need to load your cluster IDs as a seed property:

CALL gds.graph.create.cypher('graph', 
'MATCH (n:LEU) WHERE n.cluster = 2 RETURN id(n) AS id, n.cluster as seedProperty',
'MATCH (p:LEU)-[r:REL]->(c:LEU) WHERE p.cluster = 2 and c.cluster = 2 RETURN id(p) AS source, id(c) AS target');

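You're getting an NPE because there's no seed property loaded into the second graph. If you use the above command, and pass the name of the loaded column (here seedProperty) as the seedProperty option of gds.wcc.write, you'll get the results you expect.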
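To make the explanation above concrete, here is a minimal Python sketch (not GDS code) of how the IDs in this thread can come out as 0 and 2 on the full graph, 0 on the subgraph, and 2 again once a seed is supplied. It assumes that an unseeded node starts with its position in the projected graph as its ID, that a merged component keeps the smaller ID, and that unseeded nodes in a seeded run get IDs above the largest seed. Those rules are assumptions chosen to match the numbers reported here, not a description of the GDS implementation; the function name `wcc` and its parameters are hypothetical.

```python
def wcc(nodes, edges, seeds=None):
    """Label weakly connected components with a small union-find.

    Assumptions (for illustration only): an unseeded node starts with
    offset + its position in `nodes` as its ID, where offset is 0 without
    seeds and max(seed) + 1 with seeds; a merged component keeps the
    smaller of the two IDs.
    """
    seeds = seeds or {}
    offset = max(seeds.values(), default=-1) + 1
    comp = {n: seeds.get(n, offset + i) for i, n in enumerate(nodes)}
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            if comp[rb] < comp[ra]:
                ra, rb = rb, ra  # keep the root that carries the smaller ID
            parent[rb] = ra
    return {n: comp[find(n)] for n in nodes}


# Full graph: A->B and C->D
print(wcc(["A", "B", "C", "D"], [("A", "B"), ("C", "D")]))
# {'A': 0, 'B': 0, 'C': 2, 'D': 2}

# Subgraph with C and D only, no seed: numbering restarts at 0
print(wcc(["C", "D"], [("C", "D")]))
# {'C': 0, 'D': 0}

# Same subgraph, seeded with the stored cluster IDs: the ID 2 is kept
print(wcc(["C", "D"], [("C", "D")], seeds={"C": 2, "D": 2}))
# {'C': 2, 'D': 2}
```

The unseeded subgraph run is not "wrong" about connectivity; it simply has no way to know that ID 0 is already taken outside the subgraph.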

depire · 08-31-2020

Thanks for the answer.

I still have a problem, though.
When I run the algorithm the first time, I get {A, B} in cluster 0 and {C, D} in cluster 2. Perfect.
When I rerun the algorithm only on the subgraph with C and D, I get cluster ID 2 again. So that is OK.

But if I delete the REL between C and D, the cluster ID is still 2 for both C and D, although C and D are no longer in the same component.

How can I solve this problem?

My goal is to recompute cluster IDs locally when only some relationships are modified. Is this feasible with Neo4j?

Best regards,

alicia_frame · 09-01-2020

Seeding is intended to tell the algorithm: "these are the existing communities, use these IDs as a starting point to calculate communities for new data" -- it won't change the community IDs that you've passed in as seed properties, those are viewed as truth.

If you're removing relationships, it's probably best to just run WCC again -- the algorithm is super fast to run.

If you want to delete relationships and re-run WCC on subgraphs, I would recommend you write the second round of communities into a second property (e.g. cluster_2), and if you want to merge them, you can post-process with Cypher.
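As a rough illustration of the "second property, then merge" idea, here is a plain-Python sketch of one possible merge rule; the same logic would still have to be translated to Cypher. It assumes the subgraph contains whole first-round clusters (as with `WHERE n.cluster = 2` in this thread). The names `merge_clusters`, `cluster` and `cluster_2` are only placeholders, and the `cluster_2` values in the usage lines are made up to represent an unseeded re-run after the REL between C and D was deleted.

```python
def merge_clusters(cluster, cluster_2):
    """Combine two rounds of component IDs into one final ID per node.

    cluster:   node -> ID from the first, full-graph run
    cluster_2: node -> ID from the second run on a subgraph (may omit nodes)

    Nodes missing from cluster_2 keep their first-round ID. Nodes present
    in cluster_2 are regrouped by the pair (cluster, cluster_2): the first
    group seen for an old cluster keeps the old ID, later groups get fresh
    IDs above every first-round ID.
    """
    next_id = max(cluster.values()) + 1
    final = {}
    pair_to_id = {}
    reused = set()  # old IDs already handed to one of their sub-groups
    for node in sorted(cluster):
        if node not in cluster_2:
            final[node] = cluster[node]
            continue
        pair = (cluster[node], cluster_2[node])
        if pair not in pair_to_id:
            if cluster[node] not in reused:
                pair_to_id[pair] = cluster[node]
                reused.add(cluster[node])
            else:
                pair_to_id[pair] = next_id
                next_id += 1
        final[node] = pair_to_id[pair]
    return final


cluster = {"A": 0, "B": 0, "C": 2, "D": 2}   # first run, full graph
cluster_2 = {"C": 0, "D": 1}                 # hypothetical second run, C-D edge deleted
print(merge_clusters(cluster, cluster_2))
# {'A': 0, 'B': 0, 'C': 2, 'D': 3}
```

Which sub-group keeps the old ID is arbitrary here (the first one in sorted node order); any other deterministic rule would work equally well.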

nick_diquattro · 09-14-2021

Hi @alicia.frame1, this is so close to solving a problem I'm having. Do you have an example of the post-processing/merging code?

Basically, I would like to persist the communityID from one run of WCC to the next with handling of splits (e.g., a single community being identified as two in the 2nd wcc run).

I'm thinking of labelling a particular node using degree centrality as the "flag bearer" of the old id, but open to any suggestions / not sure how to make this happen.

Thanks so much!
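One way the "flag bearer" idea could work, sketched in plain Python rather than Cypher and not taken from any answer in this thread: pick the highest-degree node of each old community as its flag bearer, let the new component that contains that node inherit the old ID, and give every other new component a fresh ID. The function name `persist_ids`, its parameters, and the tie-breaking rules are all assumptions made for the sketch.

```python
from collections import defaultdict


def persist_ids(old, new, degree):
    """Carry old community IDs over to a new WCC result, handling splits.

    old, new: node -> community ID from the previous and the current run
    degree:   node -> degree, used to pick one flag bearer per old community
    """
    # Flag bearer = highest-degree member of each old community
    # (ties broken by the smallest node name).
    members = defaultdict(list)
    for node, cid in old.items():
        members[cid].append(node)
    bearer_of = {
        cid: min(nodes, key=lambda n: (-degree.get(n, 0), n))
        for cid, nodes in members.items()
    }

    # The new component that contains a flag bearer inherits that old ID.
    # If two old communities merged, the smaller old ID wins.
    inherited = {}
    for cid in sorted(bearer_of):
        new_cid = new.get(bearer_of[cid])  # None if the bearer was deleted
        if new_cid is not None:
            inherited.setdefault(new_cid, cid)

    # Every other new component gets a fresh ID above all old IDs.
    next_id = max(old.values(), default=-1) + 1
    result = {}
    for node in sorted(new):
        c = new[node]
        if c not in inherited:
            inherited[c] = next_id
            next_id += 1
        result[node] = inherited[c]
    return result


old = {"A": 0, "B": 0, "C": 2, "D": 2}
new = {"A": 7, "B": 7, "C": 8, "D": 9}      # hypothetical IDs from a fresh run after a split
degree = {"A": 1, "B": 1, "C": 1, "D": 1}   # degrees taken before the split
print(persist_ids(old, new, degree))
# {'A': 0, 'B': 0, 'C': 2, 'D': 3}
```

If the flag bearer itself is deleted between runs, its old ID is simply retired in this sketch; choosing a backup bearer would be a natural extension.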

Gds.wcc.write + seedProperty