Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-12-2022 03:27 AM
Hi,
I'm trying to project a graph in order to run the Label Propagation (LPA) community detection algorithm, and I'm getting this error:
Failed to invoke procedure `gds.graph.project.cypher`: Caused by: java.lang.IllegalArgumentException: Failed to load a relationship because its source-node with id 7 is not part of the node query or projection. To ignore the relationship, set the configuration parameter `validateRelationships` to false.
I'm trying to update code written for the older Graph Algorithms (GA) library to GDS, but I'm running into difficulties. Here is the original code:
CALL algo.labelPropagation.stream(
'MATCH (p:Publication) RETURN id(p) as id',
'MATCH (p1:Publication)-[r1:HAS_WORD]->(w)<-[r2:HAS_WORD]-(p2:Publication)
WHERE r1.occurrence > 5 AND r2.occurrence > 5
RETURN id(p1) as source, id(p2) as target, count(w) as weight',
{graph:'cypher',write:false, weightProperty : "weight"}) yield nodeId, label
with label, collect(algo.asNode(nodeId)) as nodes where size(nodes) > 2
MERGE (c:PublicationLPACommunity {id : label})
FOREACH (n in nodes |
MERGE (n)-[:IN_LPA_COMMUNITY]->(c)
)
return label, nodes
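The relationship query in the GA call above links two publications whenever both use the same word more than five times, and weights the link by the number of such shared words. A plain-Python model of that computation may help when porting it; the function name `ga_style_weights` and the toy data are hypothetical, and this is a sketch of the logic, not Neo4j code.

```python
from itertools import permutations


def ga_style_weights(word_occurrences, min_occurrence=5):
    """Model of the GA relationship query: for each ordered pair of distinct
    publications, count the words that both use more than `min_occurrence` times."""
    # Keep only the words each publication uses often enough (r.occurrence > 5).
    frequent = {
        pub: {w for w, occ in words.items() if occ > min_occurrence}
        for pub, words in word_occurrences.items()
    }
    weights = {}
    # The Cypher pattern returns both (p1, p2) and (p2, p1), so use ordered pairs.
    for p1, p2 in permutations(frequent, 2):
        shared = len(frequent[p1] & frequent[p2])
        if shared:  # pairs with no shared frequent word produce no row
            weights[(p1, p2)] = shared  # corresponds to count(w) AS weight
    return weights


pubs = {
    "A": {"graph": 9, "neo4j": 7, "python": 2},
    "B": {"graph": 6, "neo4j": 8},
    "C": {"python": 10, "graph": 3},
}
print(ga_style_weights(pubs))  # {('A', 'B'): 2, ('B', 'A'): 2}
```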
And the projection code I'm trying looks like this:
CALL gds.graph.project.cypher(
'publicationsAndDocuments',
'MATCH (n) WHERE n:Publication OR n:Document RETURN id(n) AS id, labels(n) AS labels',
'MATCH (n1)-[r1:HAS_WORD]->(w)<-[r2:HAS_WORD]-(n2) RETURN id(n1) AS source, id(n2) AS target', {validateRelationships:TRUE})
YIELD
graphName AS graph, nodeQuery, nodeCount AS nodes, relationshipCount AS rels
It works if I set validateRelationships to FALSE, but I need these relationships to perform the community detection. I'm not sure whether this is correct, or even close to correct, at this stage. As you can see, I haven't yet managed to fit in the weighting or the word-occurrence filter.
I would be grateful for any help on this matter.
08-12-2022 05:59 AM
Hello @stephflint 😊
The error message means that you are trying to add a relationship, but at least one of its nodes has not been projected. You can either set the validateRelationships parameter to false, if you are fine with such relationships being ignored, or modify the node/relationship projections so that both ends of every relationship are returned by the node query.
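A simplified model of the behaviour described above, in plain Python rather than GDS internals: with validation on, any relationship whose source or target id was not returned by the node query aborts the load; with it off, that relationship is silently dropped. In the original projection this most likely happens because the relationship query matches `(n1)` and `(n2)` without labels, so it can return ids of nodes that are neither Publication nor Document. The function `load_relationships` is hypothetical.

```python
def load_relationships(node_ids, relationships, validate=True):
    """Simplified model of relationship validation in a Cypher projection.
    Each relationship is a (source, target) pair of node ids."""
    nodes = set(node_ids)
    kept = []
    for source, target in relationships:
        missing = [n for n in (source, target) if n not in nodes]
        if missing:
            if validate:
                # validateRelationships: true -> the whole projection fails
                raise ValueError(
                    f"relationship ({source})->({target}) uses node {missing[0]}, "
                    "which is not part of the node query"
                )
            continue  # validateRelationships: false -> skip this relationship
        kept.append((source, target))
    return kept


# Node 7 is not in the node query, so (7)->(3) is either rejected or dropped.
print(load_relationships([1, 2, 3], [(1, 2), (7, 3), (2, 3)], validate=False))
```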
Can you share your data model? You can get it with:
CALL db.schema.visualization()
What sub-graph are you trying to project?
Regards,
Cobra
08-12-2022 06:16 AM
Hi @Cobra, sure, here is the data model:
The sub-graph: I'm not entirely sure(!?). What I'm trying to do is create a graph projection of Publication and Document nodes that share the same keywords, then perform community detection with LPA (Label Propagation), taking the weights (number of occurrences, >5) into account. In the end I would have an IN_LPA_COMMUNITY relationship between each Publication or Document and its community, i.e.:
(n)-[:IN_LPA_COMMUNITY]->(c)
My issue is that I don't understand how to recreate the GA-style projection in GDS; I haven't yet been able to figure it out from the documentation, videos, or tutorials.
08-12-2022 06:50 AM - edited 08-12-2022 07:32 AM
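To make this easier, first create a new relationship type, COOCCURRED, linking each Document to each Publication it shares at least one word with and storing the number of shared words as total: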
CALL apoc.periodic.iterate("
MATCH (d:Document)-[:HAS_WORD]->(w)<-[:HAS_WORD]-(p:Publication)
RETURN d, p, count(w) AS total
", "
MERGE (d)-[r:COOCCURRED]->(p)
SET r.total = total
", {batchSize: 10000, parallel: true}
)
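The query above counts, for every (Document, Publication) pair, how many words they share, and stores that count on a COOCCURRED relationship. Because the second statement uses MERGE, rerunning it updates total on the existing relationship instead of creating a duplicate; the dictionary assignment below models that. This is a plain-Python sketch with a hypothetical function name; `min_occurrence` is a hypothetical extra that mirrors the original GA query's `occurrence > 5` filter, which the Cypher above does not apply.

```python
def cooccurrence_totals(documents, publications, min_occurrence=None):
    """Model of the first apoc.periodic.iterate query: for every
    (Document, Publication) pair sharing at least one word, count the shared words."""

    def words(occurrences):
        if min_occurrence is None:  # no filter, as in the Cypher above
            return set(occurrences)
        return {w for w, occ in occurrences.items() if occ > min_occurrence}

    totals = {}
    for d, d_words in documents.items():
        for p, p_words in publications.items():
            shared = len(words(d_words) & words(p_words))
            if shared:
                # MERGE (d)-[r:COOCCURRED]->(p) SET r.total = total
                totals[(d, p)] = shared
    return totals


docs = {"D1": {"graph": 2, "neo4j": 9}}
pubs = {"P1": {"graph": 7, "neo4j": 6}, "P2": {"python": 4}}
print(cooccurrence_totals(docs, pubs))
print(cooccurrence_totals(docs, pubs, min_occurrence=5))
```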
Then, create the graph projection:
CALL gds.graph.project(
'publicationsAndDocuments',
['Publication', 'Document'],
{COOCCURRED: {orientation: 'UNDIRECTED', properties: 'total'}}
)
YIELD graphName AS graph, nodeProjection, nodeCount AS nodes, relationshipCount AS rels
Then, run Label Propagation, using the total property as the relationship weight:
CALL gds.labelPropagation.write(
'publicationsAndDocuments',
{writeProperty: 'community', relationshipWeightProperty: 'total'}
)
YIELD communityCount, ranIterations, didConverge
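To make the LPA step less of a black box, here is a minimal sketch of weighted label propagation in plain Python. It is not the GDS implementation: the node order, the tie-breaking rule (smallest label wins) and the iteration cap are choices made for this sketch, and GDS's own rules may differ. When LPA is given a relationshipWeightProperty, those weights play the role of `w` here.

```python
from collections import defaultdict


def label_propagation(edges, max_iterations=10):
    """Weighted label propagation on an undirected graph.

    edges: iterable of (u, v, weight). Each node starts in its own community and
    repeatedly adopts the label with the largest total edge weight among its
    neighbours. Returns (labels, converged)."""
    neighbours = defaultdict(list)
    for u, v, w in edges:
        # UNDIRECTED orientation: each relationship is usable in both directions.
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))

    labels = {node: node for node in neighbours}
    for _ in range(max_iterations):
        changed = False
        for node in sorted(neighbours):  # fixed order keeps the sketch deterministic
            scores = defaultdict(float)
            for other, w in neighbours[node]:
                scores[labels[other]] += w
            best_score = max(scores.values())
            # Tie-break: smallest label among the best-scoring ones (sketch choice).
            best = min(label for label, s in scores.items() if s == best_score)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:  # converged: no node switched label in a full pass
            return labels, True
    return labels, False


# Two triangles joined by one weak edge end up as two communities.
edges = [(1, 2, 1), (2, 3, 1), (1, 3, 1), (4, 5, 1), (5, 6, 1), (4, 6, 1), (3, 4, 0.1)]
print(label_propagation(edges))  # ({1: 2, 2: 2, 3: 2, 4: 5, 5: 5, 6: 5}, True)
```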
Then, create a uniqueness constraint on the id property of Community nodes:
CREATE CONSTRAINT constraint_Community_id IF NOT EXISTS FOR (n:Community) REQUIRE n.id IS UNIQUE;
Finally, create the Community nodes and link each Document and Publication to its community:
CALL apoc.periodic.iterate("
MATCH (n)
WHERE n:Document OR n:Publication
RETURN n
", "
MERGE (c:Community {id: n.community})
MERGE (n)-[:IN_LPA_COMMUNITY]->(c)
", {batchSize: 10000, parallel: true}
)
This should work and perform better. If you run into trouble with parallel: true, set parallel to false.
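The last query groups every Document and Publication by the community value that LPA wrote, creating one Community node per distinct value. The sketch below models that grouping in plain Python; `group_communities` and `min_size` are hypothetical, with `min_size=3` reproducing the old GA query's `size(nodes) > 2` filter, which the Cypher above does not apply (it creates a Community for every value).

```python
from collections import defaultdict


def group_communities(community_by_node, min_size=1):
    """Model of the final step: one Community per distinct `community` value,
    each node linked to it, optionally keeping only communities of min_size+."""
    members = defaultdict(list)
    for node, community in community_by_node.items():
        # MERGE (c:Community {id: n.community}) MERGE (n)-[:IN_LPA_COMMUNITY]->(c)
        members[community].append(node)
    return {c: nodes for c, nodes in members.items() if len(nodes) >= min_size}


written = {"P1": 2, "D1": 2, "P2": 2, "D2": 5}
print(group_communities(written))              # every community
print(group_communities(written, min_size=3))  # GA-style: size(nodes) > 2
```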
08-12-2022 07:16 AM
@Cobra cool. I can see how this would help with identifying the connections. One thought: will it affect LPA (or other algorithms) if there are already new connections? I'm essentially first trying to replicate in the new GDS version what worked in the old GA version. So does this add complexity to understanding how to make the LPA stream call to visualise communities, or does it make it simpler? I'm new to this; it's a learning curve at the moment...
08-12-2022 07:20 AM
I updated my previous answer with all the queries you need. GDS algorithms only use what is in the graph projection, so other relationships in the database don't affect them unless you project them. My solution should be simpler and better :)
08-12-2022 07:28 AM
@Cobra amazing, thank you. The last part, however, fails: it doesn't commit any operations or complete any batches.
08-12-2022 07:32 AM
I forgot a }; I've updated the answer 🙂
CALL apoc.periodic.iterate("
MATCH (n)
WHERE n:Document OR n:Publication
RETURN n
", "
MERGE (c:Community {id: n.community})
MERGE (n)-[:IN_LPA_COMMUNITY]->(c)
", {batchSize: 10000, parallel: true}
)