cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Count all relationships when there are two MATCH patterns

Hello, everyone!

I want to know how many relationships does a query return when there are two MATCH patterns, and I couldn't figure out how to do it.

For example:

MATCH (a)-[r1]->(b), (c)-[r2]->(d)
WHERE a.name <> d.name AND b.region = c.region AND a.name <> b.name AND c.name <> d.name (and other conditions...)
RETURN DISTINCT a.name AS origin, b.name AS halfway1, 
    c.name AS halfway2, d.name AS destination,
    count(distinct r1) AS qty1, count(distinct r2) AS qty2
ORDER BY (qty1*qty2) DESC

It returns something like that for me (I'm working with the results on Pandas dataframes):

qty1 origin halfway1 qty2 halfway2 destination
0 32 CITY1 CITY2 12 CITY3 CITY4
1 18 CITY3 CITY2 15 CITY3 CITY4
2 7 CITY5 CITY1 31 CITY1 CITY2

At first, I was just concatenating the first three columns with the last three and dropping rows with the lowest number when both cities are equal to get the sum of all relationships. For example, there are 12 trips from CITY3 to CITY4 in the first row and 15 trips from CITY3 to CITY4 in the second row. So, I consider 15 trips.

However, I possibly lose some relationships doing it, since some of those discarded 12 trips may not be included in the 15 trips I kept.

I hope I made myself clear after all this prelude. In the end, all I want to know is how to return the count of all relationships that query returns to me (both in r1 and r2).

5 REPLIES 5

It's odd to me that your counts are differing here. Are you sure that the CITY3 and CITY4 from row 0 are the same cities as those on row 1? Is name unique on city nodes? If not it's possible you're looking at differing cities with the same name. You might want to include the id() of the city nodes to make sure, and if your intent is that some nodes should not be the same as other nodes, do the comparison of the nodes themselves and not their properties (i.e. WHERE a <> d)

Also you really should be using labels in your queries. If you only have :City nodes in your graph and no other types then it should be okay, otherwise you'll want to make sure you're not doing all nodes scans.

Hello, Andrew!

I'm sure they are the same cities. I forgot to include there details, but yeah, there are only :City nodes in my graph. I used comparisons like the one you mentioned (a <> d instead of a.name <> d.name) just in case, and had the same result.

My point here is that it makes sense that those counts differ, because of the other conditions. I should have included more information about this.
I'm using a time constraint. I have the datetime on each relationship and there's a condition that r2 must happen between 6-24 hours after r1.

So, for example, in the first row, there were 32 trips from CITY1 to CITY2 that had a possibility of another sequel trip, from CITY3 (in the same region of CITY2) to CITY4. And the sum of these possibilities from CITY3 to CITY4 that happened exactly after CITY1->CITY2 was 12. It may not include all freights from CITY3 to CITY4, only those that happened after a trip from CITY1 to CITY2.

Then, in the second row, there were 18 trips from CITY3 to CITY2 that had possibilities of sequel 6-24 hours after, from CITY3 (the same city from the beginning, because it happens to be in the same region of CITY2) to CITY4. The 15 CITY3->CITY4 trips here are not necessarily the same as the 12 trips of the first row. They can actually be completely different trips, that happened in different datetimes.

Probably this explanation is also somewhat confusing. But I tried my best

Maybe make two matches and union them, which would make the set distinct before counting etc. ?

Hello, Thomas!

Unfortunately, I can't, because I have constraints on both relationships at the same time. Or probably I didn't understand your suggestion. Sorry if it's the latter.

In the end, it's all about the possibility (or not) of returning a number for all unique relationships that query gives.
If I simply used RETURN count(distinct r1) and count(distinct r2), that would give the answer I need, but then unfortunately there would be trips that appear on both r1 and r2 that would be counted twice.