Neo4j

matt_goodwin491 · ‎01-20-2022

I have a graph model similar to this:

(:PERSON)-[:LIVED_IN]->(:CITY)

I would like to know the total number of people who have lived in more than one city. Therfore an example would be:
Dave has lived in Sydney
Dave has lived in London
John has lived in Paris
Wendy has lived in New York

The result of this query would return 1 as just Dave has lived in more than 1 city.

My dataset of people is greater than 1million in total therefore performance is of interest.

Im aware that I could perform the following Cypher Query, however this will load all the Person nodes from the database and with such a large dataset, this takes along time to execute.

MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY) 
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l)

Is there a more efficient way to find out the information I'm interested in?

dana_canzano · ‎01-20-2022

@matt.goodwin491

But

MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY) 
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l)

does not actually run since l is not included in the WITH clause of line 2. For example I get

neo4j@neo4j> MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY)
             WITH p, count(l) as rels
             WHERE rels > 1
             RETURN count(l);
Variable `l` not defined (line 4, column 14 (offset: 93))
"RETURN count(l);"

and if it was included I would expect it would actually return 1 value/row and it would be 2 For Dave has 2 relationships named :LIVED_IN.

As to optimization, provided a :Person node which has a :LIVED_IN relationship points to and only to a :City node, i.e. a :LIVED_IN rel is not used to also connect to a :Country node then you can

match (n:Person)
where  size( (n)-[:LIVED_IN]->()) > 1

and this size( (n)-[:LIVED_IN]->()) will take advantage of the metadata included with every node. For example for each node there is metadata which records the number of relationships by type and direction to any other node. As such for the :Person node for {name:'David'} the metadata reports there are 2 LIVED_IN relationships of outgoing direction. And as such this optimization would cause the query not to iterate over all the relationships Dave might have

View solution in original post

dana_canzano · ‎01-20-2022

@matt.goodwin491

But

MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY) 
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l)

does not actually run since l is not included in the WITH clause of line 2. For example I get

neo4j@neo4j> MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY)
             WITH p, count(l) as rels
             WHERE rels > 1
             RETURN count(l);
Variable `l` not defined (line 4, column 14 (offset: 93))
"RETURN count(l);"

and if it was included I would expect it would actually return 1 value/row and it would be 2 For Dave has 2 relationships named :LIVED_IN.

As to optimization, provided a :Person node which has a :LIVED_IN relationship points to and only to a :City node, i.e. a :LIVED_IN rel is not used to also connect to a :Country node then you can

match (n:Person)
where  size( (n)-[:LIVED_IN]->()) > 1

and this size( (n)-[:LIVED_IN]->()) will take advantage of the metadata included with every node. For example for each node there is metadata which records the number of relationships by type and direction to any other node. As such for the :Person node for {name:'David'} the metadata reports there are 2 LIVED_IN relationships of outgoing direction. And as such this optimization would cause the query not to iterate over all the relationships Dave might have

koji · ‎01-20-2022

Hi @matt.goodwin491

How about this Cypher

MATCH (p:PERSON)-[:LIVED_IN]->(:CITY), (p)-[:LIVED_IN]->(:CITY)
RETURN count(distinct p)

This Cypher is same as this.

MATCH (:CITY)<-[:LIVED_IN]-(p:PERSON)-[:LIVED_IN]->(:CITY)
RETURN count(distinct p)

Neo4j

Sum of multiple specific relationships per node for large data set