
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Decoupling two queries

y-pankaj

I want to run two independent queries in a single statement. First I want to create nodes from the nodesData object, and then create relationships from edgesData. Running the two queries one after the other gives the expected result, but combining them produces multiple edges between the same nodes.

```javascript
session.run(
  // create nodes
  `WITH $nodesData AS value
   UNWIND value AS data
   CALL apoc.create.node([data.class], data)
   YIELD node

   // breakpoint: running the query above and then the query below
   // separately gives the expected result.
   WITH $edgesData AS value
   UNWIND value AS data
   MATCH (n {newtId: data.source}), (m {newtId: data.target})
   WITH n, m, data
   CALL apoc.create.relationship(n, data.class, data, m)
   YIELD rel
   RETURN rel`,
  { nodesData: nodesData, edgesData: edgesData }
)
```

I suspect this is due to how data is carried over between clauses. For example, maybe I can't use `WITH $edgesData as value` right after `YIELD node`, and should somehow drop the records before that `WITH`. But I am not sure.

What's the issue here?
Also, if possible, please share a resource explaining how data is organised and carried between clauses in Neo4j.
Thanks.
1 reply

The issue is one of cardinality: Cypher operations produce rows, and each operation executes once per row. Keep this in mind, because the data you generate (and the operations you execute!) in the second part of the query depend on the rows coming out of the first part.

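Because the first part of your query yields one row per created node, everything after it executes once per row, redundantly. That multiplies not only the work being done but also the results produced at the end: with 3 entries in $nodesData and 2 in $edgesData, the `UNWIND` in the second part produces 3 × 2 = 6 rows, so each relationship is created 3 times.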
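The row multiplication can be made concrete with a toy model. The JavaScript below is a hypothetical, simplified simulation of how rows flow from clause to clause in the original combined query; it is not how Neo4j is implemented, and the sample data (the `Gene` label, the `LINKS` type) is made up. Each step mirrors one Cypher clause and replaces the current list of rows with a new one.

```javascript
// Toy, in-memory model of Cypher's row pipeline (NOT Neo4j internals).
// Each step takes the current rows and returns the next rows.
function runCombinedQuery(nodesData, edgesData) {
  const graph = { nodes: [], rels: [] };

  // Every query starts with a single empty row.
  let rows = [{}];

  // UNWIND $nodesData AS data -> one row per element of nodesData
  rows = rows.flatMap(() => nodesData.map((data) => ({ data })));

  // CALL apoc.create.node(...) YIELD node -> runs once per row
  rows = rows.map((row) => {
    const node = { ...row.data };
    graph.nodes.push(node);
    return { ...row, node };
  });

  // WITH $edgesData AS value -> drops node/data but KEEPS every row
  rows = rows.map(() => ({ value: edgesData }));

  // UNWIND value AS data -> each incoming row fans out again
  rows = rows.flatMap((row) => row.value.map((data) => ({ data })));

  // MATCH (n {newtId: data.source}), (m {newtId: data.target})
  rows = rows.flatMap((row) => {
    const ns = graph.nodes.filter((x) => x.newtId === row.data.source);
    const ms = graph.nodes.filter((x) => x.newtId === row.data.target);
    return ns.flatMap((n) => ms.map((m) => ({ ...row, n, m })));
  });

  // CALL apoc.create.relationship(n, data.class, data, m) -> once per row
  for (const row of rows) {
    graph.rels.push({ type: row.data.class, from: row.n.newtId, to: row.m.newtId });
  }
  return graph;
}

const nodesData = [
  { class: "Gene", newtId: "a" },
  { class: "Gene", newtId: "b" },
  { class: "Gene", newtId: "c" },
];
const edgesData = [
  { class: "LINKS", source: "a", target: "b" },
  { class: "LINKS", source: "b", target: "c" },
];

const graph = runCombinedQuery(nodesData, edgesData);
console.log(graph.nodes.length); // 3
console.log(graph.rels.length); // 6: each of the 2 edges is created 3 times
```

The node count is correct because node creation happens before the fan-out; only the clauses after `WITH $edgesData AS value` are repeated once per node row.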

So how do we make the two parts independent? Aggregate. Collect the created nodes into a list, or count them into nodeCount; an aggregation with no grouping key resets the cardinality to a single row, so the second part of the query executes only once and neither operations nor results are multiplied by the input rows. Then do the same with the relationships (collect them, or count them into relCount) and return that if needed. For clearer separation, and for protection when either $nodesData or $edgesData is empty (an `UNWIND` over an empty list produces zero rows, which would silently skip everything after it), wrap each part in a subquery:

```cypher
CALL {
  UNWIND $nodesData AS data
  CALL apoc.create.node([data.class], data) YIELD node
  WITH count(node) AS nodeCount // guarantees one row even if $nodesData is empty
  RETURN nodeCount // subqueries must return something
}
WITH nodeCount // only a single row at this point, thanks to the aggregation
CALL {
  UNWIND $edgesData AS data
  // Add a label (and an index on newtId) to these patterns; without one,
  // this MATCH scans every node in the graph and will be very slow.
  MATCH (n {newtId: data.source}), (m {newtId: data.target})
  WITH n, m, data
  CALL apoc.create.relationship(n, data.class, data, m) YIELD rel
  WITH count(rel) AS relCount
  RETURN relCount
}
RETURN nodeCount, relCount
```
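To see why the aggregation fixes the problem, here is a hypothetical JavaScript simulation of the row flow in the subquery version. It is a sketch for illustration, not Neo4j internals, and the sample data is made up. The key rule it models is that `count()` with no grouping key always returns exactly one row, even when its input is empty, so the second subquery runs exactly once.

```javascript
// Toy row model (NOT Neo4j internals) of the two-subquery version.
// count() with no grouping key collapses any input, even zero rows, into ONE row.
function countRows(rows, alias) {
  return [{ [alias]: rows.length }];
}

function runSubqueryVersion(nodesData, edgesData, graph = { nodes: [], rels: [] }) {
  // First CALL { ... }: UNWIND + apoc.create.node, then count(node)
  const nodeRows = nodesData.map((data) => {
    const node = { ...data };
    graph.nodes.push(node);
    return { node };
  });
  let rows = countRows(nodeRows, "nodeCount"); // always exactly one row

  // Second CALL { ... } runs once per incoming row: here, exactly once.
  rows = rows.flatMap((outer) => {
    const relRows = edgesData.flatMap((data) => {
      // MATCH (n {newtId: data.source}), (m {newtId: data.target})
      const ns = graph.nodes.filter((x) => x.newtId === data.source);
      const ms = graph.nodes.filter((x) => x.newtId === data.target);
      // apoc.create.relationship(n, data.class, data, m): once per matched pair
      return ns.flatMap((n) =>
        ms.map((m) => {
          const rel = { type: data.class, from: n.newtId, to: m.newtId };
          graph.rels.push(rel);
          return { rel };
        })
      );
    });
    return countRows(relRows, "relCount").map((r) => ({ ...outer, ...r }));
  });

  // RETURN nodeCount, relCount
  return { graph, result: rows };
}

const nodesInput = [
  { class: "Gene", newtId: "a" },
  { class: "Gene", newtId: "b" },
  { class: "Gene", newtId: "c" },
];
const edgesInput = [
  { class: "LINKS", source: "a", target: "b" },
  { class: "LINKS", source: "b", target: "c" },
];

// Fresh graph: 3 nodes, and each edge is created exactly once.
const first = runSubqueryVersion(nodesInput, edgesInput);
console.log(first.graph.rels.length); // 2
console.log(first.result); // [ { nodeCount: 3, relCount: 2 } ]

// Empty $nodesData against the existing graph: the edge subquery still runs.
const second = runSubqueryVersion([], edgesInput, first.graph);
console.log(second.result); // [ { nodeCount: 0, relCount: 2 } ]
```

Without the `count()` in the first subquery, an empty `$nodesData` would leave zero rows, and the second subquery would never execute.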