Neo4j

lucca_zenobio · 06-29-2021

I want to clone a graph and set the node attributes of the new copy based on an array of dictionaries. I used the following query:

MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
WITH collect(output) as nodes, rootStudent
UNWIND nodes as node
UNWIND [{skill: "Skill 1", value: 0.9999999863}, {skill: "Skill 2", value: 0.3}] AS score
WITH DISTINCT score, node, rootStudent
WHERE node.name = score.skill
SET node.id = apoc.create.uuid(), node.score = score.value
WITH rootStudent
MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
CREATE (student)-[learned:LEARNED {version: 1, created_at: timestamp()}]->(s)
DELETE relation
RETURN DISTINCT learned;

It works perfectly, but when the array is large

[{skill: "name 1", value: 0.9999999863}, {skill: "name 2", value: 0.3}, ...]

I get a MemoryPoolOutOfMemoryError. I am using Neo4j Aura, so I can't change the neo4j.conf file. Is there any way I can optimize this query? Can I split it into multiple parts?
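A likely contributor to the memory pressure is the pair of consecutive UNWINDs: every cloned node is paired with every element of the score array, so the query produces (cloned nodes × scores) rows just to find the few matching pairs, and WITH DISTINCT has to keep track of all of those rows. A minimal single-pass sketch follows. It assumes the array is sent as a query parameter named `$scores` and the student id as `$studentId` (both hypothetical names), and uses `apoc.map.fromPairs` to build a skill-name to value lookup once, so each cloned node is handled in one row.

```cypher
// Sketch only. Hypothetical query parameters:
//   $studentId : the student's id
//   $scores    : list of {skill, value} maps, as in the question
// Build the skill-name -> value lookup once, up front.
WITH apoc.map.fromPairs([s IN $scores | [s.skill, s.value]]) AS scoreMap
MATCH (rootSkill:Skill {name: 'null'}),
      (rootStudent:Student {id: $studentId})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter: 'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel IN relationships WHERE type(rel) IN ['PARENT', 'DEPENDS_ON']],
  {standinNodes: [[rootSkill, rootStudent]], skipProperties: ['id']})
YIELD output
// One row per cloned node; look its score up instead of UNWINDing $scores
WITH rootStudent, output AS node, scoreMap[output.name] AS value
WHERE value IS NOT NULL
SET node.id = apoc.create.uuid(), node.score = value
// Collapse to a single row so the LEARNED step runs only once
WITH DISTINCT rootStudent
MATCH (rootStudent)-[relation:PARENT]->(s:Skill)
CREATE (rootStudent)-[:LEARNED {version: 1, created_at: timestamp()}]->(s)
DELETE relation
```

This only removes the cartesian product; the clone itself still happens in one transaction, so a very large subgraph may still need batching with apoc.periodic.iterate.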

david_allen · 07-07-2021

Look into apoc.periodic.iterate and break the query up into two pieces: the first enumerates what you have to do, and the second takes that action in batches. This query probably won't work exactly as written, but it shows the right general process:

CALL apoc.periodic.iterate(
"MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
RETURN output AS node, rootStudent",

"UNWIND
[{skill: 'Skill 1', value: 0.9999999863}, {skill: 'Skill 2', value: 0.3}]
as score
    WITH DISTINCT score, node, rootStudent
    WHERE node.name = score.skill
    SET node.id = apoc.create.uuid(), node.score = score.value
    WITH DISTINCT rootStudent
    MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
    CREATE (student)-[learned:LEARNED {version: 1, created_at: timestamp()}]->(s)
    DELETE relation
    RETURN DISTINCT learned
", 
    { batchSize: 1000, parallel: false });

Note that I got rid of one of your UNWINDs: the first query feeds a stream of results to the second, mutating query.
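The two-statement pattern can be seen in isolation in the hypothetical minimal example below (not the thread's full query). The first statement only enumerates work, one row per score entry; the large array is passed once through the `params` config option (as a hypothetical `$scores` parameter) instead of being inlined in a statement string. The second statement runs once per batch of up to `batchSize` rows. In recent APOC versions, the columns returned by the first statement (here `score`) are available as variables with the same name in the second; older versions expose them as parameters (`$score`), so check the documentation for the installed version. The example updates every `Skill` with a matching name, so real use would need to narrow it to the cloned nodes.

```cypher
// 1st statement: enumerate the work, one row per score entry.
// 2nd statement: executed per batch; `score` is the column returned above.
CALL apoc.periodic.iterate(
  "UNWIND $scores AS score RETURN score",
  "MATCH (node:Skill {name: score.skill})
   SET node.score = score.value",
  {batchSize: 1000, parallel: false, params: {scores: $scores}})
// Report progress and any per-batch failures
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
```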

lucca_zenobio · 07-11-2021

Hi @david.allen ! Thanks for the answer.

I tried the following:

CALL apoc.periodic.iterate(
"MATCH (rootSkill:Skill{name: 'null'}),
(rootStudent:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})
CALL apoc.path.subgraphAll(rootSkill, {relationshipFilter:'PARENT>|DEPENDS_ON>'})
YIELD nodes, relationships
CALL apoc.refactor.cloneSubgraph(
  nodes,
  [rel in relationships WHERE type(rel) = 'PARENT' OR type(rel) = 'DEPENDS_ON'],
  { standinNodes:[[rootSkill, rootStudent]],
  skipProperties:['id'] })
YIELD input, output, error
RETURN output as item, rootStudent",

"UNWIND
[{skill: 'Skill 1', value: 0.9999999863}, {skill: 'Skill 2', value: 0.3}]

as score
    WITH DISTINCT score, item, rootStudent
    WHERE item.name = score.skill
    SET item.id = apoc.create.uuid(), item.score = score.value
    WITH rootStudent
    MATCH (student:Student{id: '9f828c12-5134-409a-846e-bbc5a6463bff'})-[relation:PARENT]->(s:Skill)
    CREATE (student)-[learned:LEARNED {version: 5, created_at: timestamp()}]->(s)
    DELETE relation
    RETURN DISTINCT learned;
", 
    { batchSize: 1000, parallel: false });

It ran without errors, but no node was created.
I tried to return each node from the cloneSubgraph output and pass it to the second iterate statement to process and update accordingly. Any idea why it doesn't work? Thanks!
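One thing worth checking: apoc.periodic.iterate does not turn a failure in the second statement into a failed query. Failed batches are reported in the procedure's output columns instead, so "ran with no errors" may only mean that the outer call succeeded. The sketch below uses deliberately failing placeholder statements (integer division by zero) to show where such errors surface. The same YIELD/RETURN can be appended to the query above to see what its batches actually did.

```cypher
// Placeholder statements: the inner one fails on purpose (1 / 0).
// The call itself should still complete; the failure is expected to
// show up in failedBatches / errorMessages rather than as an exception.
CALL apoc.periodic.iterate(
  "UNWIND range(1, 3) AS i RETURN i",
  "RETURN i / 0 AS boom",
  {batchSize: 1000, parallel: false})
YIELD batches, total, committedOperations, failedOperations, failedBatches, errorMessages
RETURN batches, total, committedOperations, failedOperations, failedBatches, errorMessages
```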


How to clone a graph and set attribute of new subgraph efficiently?