

labelPropagation algorithm keeps running (infinite loop?): how do I see the details of the subgraph?

Hello All,

I am trying to build a clustering algorithm for the products in one of the datasets at my work. To begin with, I used the Northwind graph to test the code template.

The picture below gives an idea of the schema, size, and complexity of the example database:


[Image: schema of the Northwind example graph]

The code used is as follows:

// Stream label propagation over a Cypher projection of Product nodes and co-purchase links
CALL algo.labelPropagation.stream(
"MATCH (p:Product) RETURN id(p) AS id",
"MATCH (p1:Product)<-[:ORDERS|PURCHASED*]-()-[:PURCHASED|ORDERS*]->(p2:Product)
WHERE id(p1) < id(p2)
RETURN id(p1) AS source, id(p2) AS target, count(*) as weight",
{graph: "cypher", iterations: 4})
YIELD nodeId, label
// Store each community as a SuperCategory node and link its products to it
MATCH (p:Product) WHERE id(p) = nodeId
MERGE (sp:SuperCategory {name: "SuperCategory-" + label})
MERGE (p)-[:IN_SUPER_CATEGORY]->(sp)
RETURN nodeId, p.productName, label

As the query shows, I am trying to find clusters of similar products purchased by a customer in two or more different orders.

In the result, the products were grouped into two super categories, with labels 74 and 76.
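To make the clustering idea concrete, here is a minimal Python sketch of weighted label propagation on a co-purchase graph. It is not the Neo4j Graph Algorithms implementation, only an illustration under stated assumptions: the customers and products are hypothetical toy data, a pair's weight is simplified to "number of customers who bought both" (the Cypher query instead counts matched paths), nodes are updated one by one in sorted order, and ties go to the smallest label. The `iterations` argument caps the outer loop, in the same spirit as the `iterations` setting in the query.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (not Northwind): customer -> orders -> products.
PURCHASES = {
    "alice": [["tea", "coffee"], ["coffee", "sugar"]],
    "bob": [["tea", "sugar"]],
    "carol": [["beer", "chips"], ["chips", "salsa"]],
}


def co_purchase_weights(purchases):
    """Weight each product pair by the number of customers who bought both.

    Simplification: the Cypher relationship query counts matched paths instead.
    """
    weights = defaultdict(int)
    for orders in purchases.values():
        products = sorted({p for order in orders for p in order})
        # Sorted pairs emit each pair once, like WHERE id(p1) < id(p2).
        for a, b in combinations(products, 2):
            weights[(a, b)] += 1
    return dict(weights)


def label_propagation(weights, iterations):
    """Weighted label propagation with a hard cap on the number of passes."""
    neighbours = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbours[a][b] = w  # relationships treated as undirected
        neighbours[b][a] = w
    labels = {node: node for node in neighbours}  # each node starts alone
    for _ in range(iterations):  # bounded: this loop cannot run forever
        changed = False
        for node in sorted(neighbours):
            scores = defaultdict(int)
            for other, w in neighbours[node].items():
                scores[labels[other]] += w
            # Highest total weight wins; ties go to the smallest label.
            best = min(scores, key=lambda lbl: (-scores[lbl], lbl))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:  # converged before reaching the cap
            break
    return labels


pair_weights = co_purchase_weights(PURCHASES)
print(label_propagation(pair_weights, iterations=4))
```

On this toy data, `iterations=4` settles into two communities ({beer, chips, salsa} labelled "chips" and {coffee, sugar, tea} labelled "tea") after two passes, and the third pass changes nothing, so the loop exits early. With `iterations=1` it stops with three labels, which shows that the pass count bounds the work whether or not the labels have converged.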

I wanted to do the same on a much larger database, which has these numbers:

It is quite a complex database, and I used the following query:

MATCH (a:Account)-[:ENROLLED]->(:YearMonth{ym:201002})
// Stream label propagation over a Cypher projection of Regimen nodes
CALL algo.labelPropagation.stream(
"MATCH (r:Regimen)<-[:ISIN|CONTAINS|ORDERED*]-(a) RETURN id(r) AS id",
"MATCH (r1:Regimen)<-[:ISIN|CONTAINS|ORDERED*]-(a)-[:ORDERED|CONTAINS|ISIN*]->(r2:Regimen)
WHERE id(r1) < id(r2)
RETURN id(r1) AS source, id(r2) AS target, count(*) as weight",
{graph: "cypher", iterations: 2})
YIELD nodeId, label
// Sort the streamed results by community label
WITH nodeId, label ORDER BY label
RETURN nodeId, label

The first MATCH statement projects a very small portion of the big graph for the algorithm, though I am not sure how to find the size of that projection with Cypher. Even so, I have run the query for 3 hours and it is still running.
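One way to approach the size question is to run the two projection query strings on their own and summarize the rows they return. The Python sketch below assumes those rows have already been fetched into lists of dicts (for example through a Neo4j driver); the optional `labels` key is a hypothetical extra column, such as `labels(r) AS labels`, added to a copy of the node query. Because that node query has no `DISTINCT`, the same Regimen id can come back once per matched path, so the sketch reports raw rows and distinct nodes separately.

```python
from collections import Counter


def summarize_projection(node_rows, rel_rows):
    """Summarize a Cypher-projection style subgraph.

    node_rows: dicts with an "id" key and an optional "labels" list.
    rel_rows: dicts with "source", "target" and "weight" keys.
    """
    node_rows = list(node_rows)
    rel_rows = list(rel_rows)
    # Deduplicate nodes: one entry per id, keeping its labels.
    labels_by_id = {row["id"]: row.get("labels", []) for row in node_rows}
    label_counts = Counter(
        label for labels in labels_by_id.values() for label in labels
    )
    touched = set()
    outside = 0
    for row in rel_rows:
        touched.update((row["source"], row["target"]))
        # Relationships whose endpoints the node query did not return.
        if row["source"] not in labels_by_id or row["target"] not in labels_by_id:
            outside += 1
    return {
        "node_rows": len(node_rows),
        "distinct_nodes": len(labels_by_id),
        "relationships": len(rel_rows),
        "total_weight": sum(row["weight"] for row in rel_rows),
        "node_labels": dict(label_counts),
        "relationships_outside_node_set": outside,
        "isolated_nodes": len(set(labels_by_id) - touched),
    }


# Hypothetical rows shaped like the node and relationship query output.
nodes = [
    {"id": 1, "labels": ["Regimen"]},
    {"id": 1, "labels": ["Regimen"]},  # same node reached by a second path
    {"id": 2, "labels": ["Regimen"]},
    {"id": 3, "labels": ["Regimen"]},
]
rels = [
    {"source": 1, "target": 2, "weight": 5},
    {"source": 2, "target": 7, "weight": 1},
]
print(summarize_projection(nodes, rels))
```

A large gap between `node_rows` and `distinct_nodes` is a hint that the variable-length pattern is matching many paths per node.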

I have these questions:

  1. Is there a way to find information about the subgraph used by the algorithm, such as the number and labels of its nodes and relationships?
  2. What could be causing the query to run for so long? How do I check that it is not stuck in an infinite loop?
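On question 2: if the library's `iterations` setting caps the label propagation loop (as its name suggests), the time may instead be going into the projection queries. The Python sketch below is a rough illustration under stated assumptions, not a model of this database: a hypothetical layered graph where every node links to every node in the next layer, and a path counter that, like a Cypher pattern match, never reuses a relationship within one path. Since `count(*)` counts matched paths rather than distinct endpoints, an unbounded `*` can enumerate far more paths than there are nodes; an upper bound such as `*..2` (modelled here by `max_length`) limits that work.

```python
from collections import defaultdict


def build_layered_graph(width, depth):
    """Hypothetical graph: 'root' fans out through `depth` layers of `width`
    nodes, and every node links to every node in the next layer."""
    edges = defaultdict(list)
    previous = ["root"]
    for level in range(1, depth + 1):
        current = [f"L{level}_{i}" for i in range(width)]
        for src in previous:
            edges[src].extend(current)
        previous = current
    return edges


def count_trails(edges, start, max_length=None):
    """Count paths of length >= 1 from `start` that never reuse a relationship,
    optionally capped at `max_length` hops (like a bounded `*..n` pattern).
    Returns (number_of_paths, set_of_distinct_end_nodes)."""
    paths = 0
    ends = set()
    stack = [(start, frozenset(), 0)]
    while stack:
        node, used, length = stack.pop()
        if max_length is not None and length >= max_length:
            continue
        for index, nxt in enumerate(edges.get(node, [])):
            rel = (node, index)  # identifies one relationship
            if rel in used:
                continue
            paths += 1
            ends.add(nxt)
            stack.append((nxt, used | {rel}, length + 1))
    return paths, ends


for width in (3, 10):
    graph = build_layered_graph(width, depth=4)
    unbounded, reached = count_trails(graph, "root")
    capped, _ = count_trails(graph, "root", max_length=2)
    print(f"width={width}: {unbounded} paths to {len(reached)} nodes; {capped} with *..2")
```

With width 3 the counter finds 120 paths to only 12 distinct nodes; with width 10 it finds 11110 paths to 40 nodes, and a two-hop cap cuts that to 110. A pattern like `(r1)<-[...*]-(a)-[...*]->(r2)` combines two such variable-length segments, so its row count can grow faster still.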

database version: 3.4.9
graph algorithm version: 3.5.0.1
graph algorithm version: 3.4.8.0 (I had to use a bigger desktop to run the second database, which has an older version)

Thank you,
Thrilok
