Neo4j

BairDev · ‎07-21-2021

I still don't quite understand how the number of rows is determined by a WITH/RETURN statement in Cypher.

What I guess:

Unique data gives one row. For example: if you have 10 users in your DB and you ask for them with MATCH (u:User) RETURN u; you will get 10 rows. If you add a count like RETURN u, COUNT(u);, the result will still be 10 rows, and the count will be 1 for each row. But what does unique mean exactly?
For example: WITH [1, 2, 4, 2, 3] AS nums UNWIND nums AS num RETURN num; gives 5 rows, because you have 5 elements (they could be nodes, relationships, maps etc.).
If you use an aggregation function in the RETURN statement, the elements that go into the aggregation are grouped by a grouping key. The grouping key is the non-aggregated part of the RETURN statement, like u in RETURN u, COUNT(u).
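As a small illustration of the grouping-key behaviour described above, here is a sketch that reuses the list from the UNWIND example and needs no data in the database. The alias occurrences is made up for this example.

```cypher
// Five input rows, one per list element
UNWIND [1, 2, 4, 2, 3] AS num
// num is the grouping key: one output row per distinct num,
// so the two 2s collapse into a single row with occurrences = 2
RETURN num, count(*) AS occurrences;
```

This should give 4 rows rather than 5. Dropping num from the RETURN (RETURN count(*) only) leaves no grouping key at all, so everything is aggregated into a single row.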
Now I know how to be careful with aggregation in most cases. But sometimes the results are not grouped too aggressively; instead, they are multiplied. I don't understand the following case:

Please note that the ids are actually numbers, not the asterisk placeholders shown below.

Query:

MATCH (u:User)-->(personalTypeMeasurement:PersonalTypeMeasurement)<--(typeValue:PersonalTypeValue) WHERE ID(u) = *****
    AND ID(personalTypeMeasurement) = *****
WITH u, personalTypeMeasurement, typeValue
MATCH (u)-->(belongsTo:Household) WHERE ID(belongsTo) = *****
WITH u, personalTypeMeasurement, belongsTo, typeValue
MERGE (personalTypeMeasurement)-[:BELONGS_TO {createdAt: datetime.transaction('Europe/Berlin')}]->(belongsTo)
WITH personalTypeMeasurement, ID(personalTypeMeasurement) AS id, belongsTo, typeValue, u // adding u or not does not change anything
RETURN personalTypeMeasurement{.*, id, belongsTo, typeValue}

Result:

╒══════════════════════════════════════════════════════════════════════╕
│"personalTypeMeasurement"                                             │
╞══════════════════════════════════════════════════════════════════════╡
│{"typeValue":{"name":"heatingMid","value":4,"startAt":"2021-06-30T08:3│
│7:46[Europe/Berlin]"},"id":***,"belongsTo":{"name":"Zweitwohnsitz","hh│
│Member":1,"hhArea":56,"hhApartmentsHeatSupply":"","hhHouseType":"singl│
│eHouse"},"sector":"heating","createdAt":"2021-07-21T08:37:58.286405000│
│[Europe/Berlin]","ghgDomain":"housing","supportedType":"heating"}     │
├──────────────────────────────────────────────────────────────────────┤
│{"typeValue":{"name":"heatingMid","value":4,"startAt":"2021-06-30T08:3│
│7:46[Europe/Berlin]"},"id":***,"belongsTo":{"name":"Zweitwohnsitz","hh│
│Member":1,"hhArea":56,"hhApartmentsHeatSupply":"","hhHouseType":"singl│
│eHouse"},"sector":"heating","createdAt":"2021-07-21T08:37:58.286405000│
│[Europe/Berlin]","ghgDomain":"housing","supportedType":"heating"}     │
└──────────────────────────────────────────────────────────────────────┘

I have exactly one (personalTypeMeasurement:PersonalTypeMeasurement) in play, and it is even matched by its id. So why do I get two elements in the result, two rows? Would a DISTINCT help in this case? How can I improve multi-step queries like this one in order to get the most precise result?

Bennu · ‎07-26-2021

Hi @BairDev !

About your query, why don't you try returning everything (RETURN *) after every WITH? At some step you will see 2 rows. Maybe there are 2 typeValues? Or maybe two nodes with more than one relationship between them?

We will find out in the next episode (just kidding)!

Hoping to be useful,

H

andrew_bowman · ‎07-27-2021

Harold is correct, there is some MATCH operation that is returning two rows of results, which may be multiplying out your previous rows.

Remember that Cypher operations produce rows, and that they execute once per row.
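A minimal, data-free sketch of that per-row behaviour: the UNWIND below stands in for a MATCH that happened to find two paths, and the names foundPath and measurement are invented for the illustration.

```cypher
// Two input rows, as if an earlier MATCH had found two paths
UNWIND ['path 1', 'path 2'] AS foundPath
// Every later clause runs once per incoming row
WITH foundPath, 'same measurement' AS measurement
// foundPath is not returned, so both output rows look identical
RETURN measurement;
```

This returns two rows with the same value, which is the same shape as the duplicated result in the question.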

Do a PROFILE of your query, ensure all elements are expanded, and add the profile plan here. What you're looking for is where the rows between operations go from 1 to 2. That will show which expand operation found two results instead of 1. With two expand results, results from prior rows would duplicate, since the data is in common, and it would cause all subsequent operations to execute twice (once per input row). Either case would ultimately have the same effect in giving you two rows of the same results.
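For reference, profiling only requires prefixing the query with PROFILE. The sketch below is a read-only variant of the query from the question (the MERGE is left out on purpose so that profiling does not write anything), and $userId, $measurementId and $householdId are hypothetical parameters standing in for the masked ids.

```cypher
PROFILE
MATCH (u:User)-->(personalTypeMeasurement:PersonalTypeMeasurement)<--(typeValue:PersonalTypeValue)
WHERE id(u) = $userId AND id(personalTypeMeasurement) = $measurementId
MATCH (u)-->(belongsTo:Household)
WHERE id(belongsTo) = $householdId
RETURN personalTypeMeasurement, belongsTo, typeValue;
```

In the resulting plan, the Rows column of each operator is the number to watch: the first operator where it changes from 1 to 2 is the one that found two matches.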

Given your query, there are only a few places where two rows could come from. One is your first MATCH (maybe you literally have a duplicated :PersonalTypeValue node attached); another is multiple relationships between the same nodes at some point in the query (especially where you don't provide a relationship type). There's also the possibility that your MERGE matched two different :BELONGS_TO relationships between those nodes, though if I were to guess, multiple relationships between nodes would be the culprit.
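One way to check these candidates directly is to count the relationships between each pair of nodes. This is only a sketch: the parameters $userId, $measurementId and $householdId are hypothetical stand-ins for the masked ids, and the two statements are meant to be run separately.

```cypher
// How many relationships, and of which types, connect the user to the measurement?
MATCH (u:User)-[r]->(m:PersonalTypeMeasurement)
WHERE id(u) = $userId AND id(m) = $measurementId
RETURN type(r) AS relType, count(r) AS relCount;

// How many :BELONGS_TO relationships already exist between measurement and household?
MATCH (m:PersonalTypeMeasurement)-[b:BELONGS_TO]->(h:Household)
WHERE id(m) = $measurementId AND id(h) = $householdId
RETURN count(b) AS belongsToCount;
```

If the counts in the first result add up to more than 1, or belongsToCount is greater than 1, that pair of nodes is a likely source of the extra row.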

Remember, Neo4j is interested in finding all possible paths that adhere to the patterns in your MATCHes. If there are two relationships between u and personalTypeMeasurement, for example, that would explain what we're seeing here, and a DISTINCT in the subsequent WITH clause would fix the issue.
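A sketch of what that DISTINCT could look like on the first part of the query, assuming the duplication comes from parallel relationships that are not bound to a variable. $userId and $measurementId are hypothetical parameters standing in for the masked ids.

```cypher
MATCH (u:User)-->(personalTypeMeasurement:PersonalTypeMeasurement)<--(typeValue:PersonalTypeValue)
WHERE id(u) = $userId AND id(personalTypeMeasurement) = $measurementId
// DISTINCT collapses rows that carry identical values for all listed variables
WITH DISTINCT u, personalTypeMeasurement, typeValue
RETURN personalTypeMeasurement, typeValue;
```

Note that DISTINCT only merges rows that are identical in every listed variable, so it would not help if the two rows came from two different :PersonalTypeValue nodes.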

BairDev · ‎10-28-2021

This is a beginner question, but how do I print out something in a query? Do I just run the part of the query up to and including the WITH in question, so shortening the actual query?

Bennu · ‎10-28-2021

Hi @BairDev !

Yes. Whenever it happens to me (not knowing what's happening), I just review the result step by step.

So

MATCH (u:User)-->(personalTypeMeasurement:PersonalTypeMeasurement)<--(typeValue:PersonalTypeValue) WHERE ID(u) = *****
    AND ID(personalTypeMeasurement) = *****
WITH u, personalTypeMeasurement, typeValue
RETURN *

then,

MATCH (u:User)-->(personalTypeMeasurement:PersonalTypeMeasurement)<--(typeValue:PersonalTypeValue) WHERE ID(u) = *****
    AND ID(personalTypeMeasurement) = *****
WITH u, personalTypeMeasurement, typeValue
MATCH (u)-->(belongsTo:Household) WHERE ID(belongsTo) = *****
WITH u, personalTypeMeasurement, belongsTo, typeValue
RETURN *
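If the full rows are too verbose to compare by eye, a variation on the same step-by-step idea is to return only counts at each step. This is a sketch; $userId and $measurementId are hypothetical parameters standing in for the masked ids, and the aliases are invented.

```cypher
MATCH (u:User)-->(personalTypeMeasurement:PersonalTypeMeasurement)<--(typeValue:PersonalTypeValue)
WHERE id(u) = $userId AND id(personalTypeMeasurement) = $measurementId
// count(*) counts rows; count(DISTINCT typeValue) counts different nodes
RETURN count(*) AS rowsAfterFirstMatch,
       count(DISTINCT typeValue) AS distinctTypeValues;
```

If rowsAfterFirstMatch is larger than distinctTypeValues, the extra rows come from multiple relationships between the same nodes rather than from additional :PersonalTypeValue nodes.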

Eventually you will find the step where the duplication appears.

Bennu

Understanding RETURN, number of rows