04-22-2019 12:12 PM
Hello,
I am using Neo4j 3.5.4, macOS/Unix version, in the cypher-shell.
I would like to get some help in understanding why these queries create list results that are formatted differently. I ask because I’ve written many queries that return list results in Format 1 with queries written like Code 2. Given the similarity between Code 1 & 2, I don’t understand why one returns one list format and the other another list format.
// ========================= Code 1 ============
// INPUT
MATCH (a1:SOURCE)<-[r1:A_IS|B_IS {OUTPUT:"UP"}]-(b:TARGET{ID:"303"})
optional match (b:TARGET)-[r2]->(a2: SOURCE)
// BODY
with a1, a2, r1
// OUTPUT
return a1.NODE_NAME as NODE1, collect(distinct a2.NODE_NAME) as NODE2List ORDER BY NODE1;
// ———————— Output Format 1 —————————
+-------------------------------+
| NODE1 | NODE2List             |
+-------------------------------+
| "AAA" | ["BBB", "CCC", "DDD"] |
+-------------------------------+
// ========================= Code 2 ============
// INPUT
MATCH (a1:SOURCE)<-[r1:A_IS|B_IS {OUTPUT:"UP"}]-(b:TARGET{ID:"79"})
optional match (b:TARGET)-[r2]->(a2: SOURCE)
// BODY
with a1, a2, r1, collect(distinct a2.NODE_NAME) as NODE2List
// OUTPUT
return a1.NODE_NAME as NODE1, NODE2List ORDER BY NODE1;
// ———————— Output Format 2 —————————
+-------------------+
| NODE1 | NODE2List |
+-------------------+
| "AAA" | ["BBB"]   |
| "AAA" | ["CCC"]   |
| "AAA" | ["DDD"]   |
+-------------------+
04-22-2019 02:30 PM
Hello,
The difference arises because the two collect() calls are performed with respect to different grouping keys.
When you aggregate, the combination of non-aggregation variables becomes the grouping key, the thing that you are collecting with respect to.
In code 1, your aggregation in the return is:
return a1.NODE_NAME as NODE1, collect(distinct a2.NODE_NAME) as NODE2List
You're collecting with respect to NODE1, the projection of a1.NODE_NAME. As there is only a single distinct value of a1.NODE_NAME, the entire list is collected for that single value. (If NODE_NAME is meant to be unique on :SOURCE nodes, it would be more efficient to collect with respect to a1 and delay the property access until later.)
In code 2, your aggregation is:
with a1, a2, r1, collect(distinct a2.NODE_NAME) as NODE2List
Your grouping key is the distinct combination of a1, a2, and r1. Because a2 itself is part of the grouping key, collecting the NODE_NAME property of a2 guarantees you will have a separate row per a2. It will help to see what is returned at this point, so you can understand how this aggregation looks alongside its grouping key. Look at the result for this:
MATCH (a1:SOURCE)<-[r1:A_IS|B_IS {OUTPUT:"UP"}]-(b:TARGET{ID:"79"})
optional match (b:TARGET)-[r2]->(a2: SOURCE)
RETURN a1, a2, r1, collect(distinct a2.NODE_NAME) as NODE2List
On each row you will see a distinct combination of a1, a2, and r1. Since there will only ever be a single a2 node per row, the NODE2List collection will always hold just one element, the NODE_NAME property value of that row's a2 node (or be empty if the optional match found no a2). In order to collect over multiple values, you have to ensure your grouping key is correct: the nodes you want to aggregate over must either not be in scope when you aggregate, or be aggregated themselves in some additional aggregation.
The easiest way to go about this is to think about what you're really trying to get at the end: the list of node names for all of the a2 nodes for each a1 node. Whatever is on the far side of the "for each" should be your grouping key, in this case a1.
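As a rough sketch of what that could look like for Code 2 (assuming you don't need r1 or a2 themselves after the aggregation), keep only a1 next to the collect() in the WITH, and read the property afterwards:

```cypher
// Sketch only: same pattern as Code 2, but a1 is the sole grouping key.
MATCH (a1:SOURCE)<-[r1:A_IS|B_IS {OUTPUT:"UP"}]-(b:TARGET {ID:"79"})
OPTIONAL MATCH (b)-[r2]->(a2:SOURCE)
// a2 and r1 are left out of the WITH, so they no longer split the rows;
// collect() ignores nulls, so an a1 with no a2 matches gets an empty list.
WITH a1, collect(DISTINCT a2.NODE_NAME) AS NODE2List
// Property access is delayed until after the aggregation.
RETURN a1.NODE_NAME AS NODE1, NODE2List
ORDER BY NODE1;
```

This should give one row per a1 node, in the shape of Output Format 1. Note that it groups by the node rather than by its name, so if two different a1 nodes happen to share a NODE_NAME they will appear as separate rows, unlike Code 1.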
04-23-2019 07:19 AM
Andrew buddy, I get it! Thanks so much for your response.
04-23-2019 07:25 AM
What I am shooting for actually is the most compact way of seeing the a2 information. I was hoping that Output Format 1 would be the most compact display of information. In some cases it is. However, if the list is big enough, then it doesn't seem to matter much. Super long horizontal list vs. super long vertical list.