Count The Number of Isolated Clusters in Subgraph

Hi everyone.

I have this subgraph and I would like to count the number of isolated clusters.

I tried the weakly connected components algorithm from the GDS library, but it returns a high number of components. I don't know whether I am making a mistake in the Cypher projection. Basically, I want to return a subgraph of transactions between entities.

CALL gds.graph.create.cypher(
    'my-cypher-graph',
    'MATCH path = (t1:Payment)-[:HOP*3..10]->(t2:Payment)
     WHERE (t1)<-[:MAKES]-(:User)<-[:AT]-(t2)
       AND datetime(t1.createdAt) < datetime(t2.createdAt)
       AND (((toFloat(t1.amount) - toFloat(t2.amount)) / toFloat(t1.amount)) < 0.25)
       AND datetime(t1.createdAt) > datetime("2020-01-01")
       AND datetime(t2.createdAt) > datetime("2020-01-01")
     UNWIND nodes(path) AS t
     MATCH (e)-[:MAKES]->(t)
     WHERE e:Client OR e:Commerce
     WITH t LIMIT 500
     RETURN DISTINCT id(t) AS id',
    'MATCH (t:Payment)-[:HOP]->(t2:Payment)
     WHERE (t)<-[:MAKES]-(:User)<-[:AT]-(t2)
       AND datetime(t.createdAt) < datetime(t2.createdAt)
       AND (((toFloat(t.amount) - toFloat(t2.amount)) / toFloat(t.amount)) < 0.25)
       AND datetime(t.createdAt) > datetime("2020-01-01")
       AND datetime(t2.createdAt) > datetime("2020-01-01")
     RETURN DISTINCT id(t) AS source, id(t2) AS target',
    {validateRelationships: false}
)

I thought the GDS library could be the right approach, but I am open to a different solution.
Thanks!

1 ACCEPTED SOLUTION

Hi,

I think something along these lines will give you the group count for a subgraph (that is, for the output of a query). I normally mark isolated groups across the entire graph (more on that later), but here is a quick sketch that should give you a simple count of the number of groups in a query/subgraph. I tested it on one of my own graphs. The componentCount value is, I think, what you are looking for.

call gds.wcc.stats(
{
    nodeQuery: 'match (n:EnzymeClass) return id(n) as id',
    relationshipQuery:'MATCH (a:EnzymeClass)-->(b:EnzymeClass) RETURN id(a) as source, id(b) as target'
}
)
YIELD   componentCount,
  createMillis,
  computeMillis,
  postProcessingMillis,
  componentDistribution,
  configuration
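If the named projection from the question already exists in the graph catalog, the same statistics can presumably be read from it directly instead of going through an anonymous projection. A minimal sketch, assuming GDS 1.x (the version family in which gds.graph.create.cypher is available) and that 'my-cypher-graph' was created successfully:

```cypher
// Assumption: the named graph 'my-cypher-graph' from the question
// has already been created and is still in the GDS graph catalog.
CALL gds.wcc.stats('my-cypher-graph')
YIELD componentCount, componentDistribution
RETURN componentCount, componentDistribution;
```

Keep in mind that every node left without a relationship in the projection counts as its own component. If the node query is cut off with LIMIT while relationships to missing nodes are silently skipped (validateRelationships: false), that alone may inflate componentCount.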

Background: I like to keep a close eye on this across the complete graph, so I mark every node with a group id, like this:

call gds.wcc.write(
{
    nodeQuery: 'match (n) return id(n) as id',
    relationshipQuery: 'MATCH (a)-->(b) RETURN id(a) as source, id(b) as target',
    writeProperty: 'group',
    consecutiveIds: true
}
)
YIELD nodePropertiesWritten
return nodePropertiesWritten;

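Then, to determine the number of isolated clusters, count the distinct group ids. (The ids may start at 0, in which case max(n.group) on its own would undercount by one.)

match (n)
return count(distinct n.group) as cluster_count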

The follow-up question I'm usually curious about is what the clusters look like: are most of the nodes in one group? So I may also run a few follow-up queries like this:

match (n)
return n.group, count(n) as group_size
order by group_size desc
limit 50
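The same size breakdown can also be obtained without writing a property back to the database, by running WCC in stream mode. A sketch under stated assumptions: GDS 1.x, and the named graph 'my-cypher-graph' from the question already present in the catalog.

```cypher
// Assumption: 'my-cypher-graph' is the projection created in the question.
// Stream mode yields one row per node with the component it belongs to.
CALL gds.wcc.stream('my-cypher-graph')
YIELD nodeId, componentId
RETURN componentId, count(nodeId) AS group_size
ORDER BY group_size DESC
LIMIT 50;
```

This leaves the stored graph untouched, which can be preferable when the subgraph is only an intermediate query result rather than something worth labelling permanently.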
