12-06-2018 02:49 PM
Hello All,
I am trying to build a clustering algorithm for products sold, using one of the data sets at my work. To begin with, I have used the Northwind graph to test out the code template.
The picture below gives an idea of the schema, size, and complexity of the example database:
The code used is as follows:
CALL algo.labelPropagation.stream(
  "MATCH (p:Product) RETURN id(p) AS id",
  "MATCH (p1:Product)<-[:ORDERS|PURCHASED*]-()-[:PURCHASED|ORDERS*]->(p2:Product)
   WHERE id(p1) < id(p2)
   RETURN id(p1) AS source, id(p2) AS target, count(*) AS weight",
  {graph: "cypher", iterations: 4})
YIELD nodeId, label
MATCH (p:Product) WHERE id(p) = nodeId
MERGE (sp:SuperCategory {name: "SuperCategory-" + label})
MERGE (p)-[:IN_SUPER_CATEGORY]->(sp)
RETURN nodeId, p.productName, label
As the query shows, I am trying to find clusters of similar products purchased by a customer across two or more different orders. The result is:
The products have been grouped into two super categories with labels 74 and 76.
I want to do the same on a database that has these numbers:
It is quite a complex database, and I used the following code:
MATCH (a:Account)-[:ENROLLED]->(:YearMonth {ym: 201002})
CALL algo.labelPropagation.stream(
  "MATCH (r:Regimen)<-[:ISIN|CONTAINS|ORDERED*]-(a) RETURN id(r) AS id",
  "MATCH (r1:Regimen)<-[:ISIN|CONTAINS|ORDERED*]-(a)-[:ORDERED|CONTAINS|ISIN*]->(r2:Regimen)
   WHERE id(r1) < id(r2)
   RETURN id(r1) AS source, id(r2) AS target, count(*) AS weight",
  {graph: "cypher", iterations: 2})
YIELD nodeId, label
WITH nodeId, label ORDER BY label
RETURN nodeId, label
The first MATCH statement projects a very small portion of the big graph for the algorithm, although I am not sure how to extract the size of that projection using CQL. I have run the code for 3 hours and it is still running.
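One way to estimate the size of that projection is to count the nodes it would touch before handing anything to the algorithm. The following is only a minimal sketch and has not been run against this database. The `*1..3` hop bound is an assumption added here so that the count can finish (the pattern in the query above is unbounded), and the aliases `enrolledAccounts` and `regimens` are illustrative names.

```cypher
// Accounts enrolled in the month of interest.
MATCH (a:Account)-[:ENROLLED]->(:YearMonth {ym: 201002})
// Regimen nodes reachable from those accounts.
// ASSUMPTION: at most 3 hops; the original pattern has no upper bound.
OPTIONAL MATCH (a)-[:ISIN|CONTAINS|ORDERED*1..3]->(r:Regimen)
RETURN count(DISTINCT a) AS enrolledAccounts,
       count(DISTINCT r) AS regimens
```

If the counts come back quickly, raising the hop bound step by step should show at what depth the pattern starts to become expensive.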
I have these questions:
database version: 3.4.9
graph algorithm version: 3.5.0.1
graph algorithm version: 3.4.8.0 (I had to use a bigger desktop to run the second database, which has an older version)
Thank you,
Thrilok