Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
01-20-2022 01:11 PM
I have a graph model similar to this:
(:PERSON)-[:LIVED_IN]->(:CITY)
I would like to know the total number of people who have lived in more than one city. Therfore an example would be:
Dave has lived in Sydney
Dave has lived in London
John has lived in Paris
Wendy has lived in New York
The result of this query would return 1 as just Dave has lived in more than 1 city.
My dataset of people is greater than 1million in total therefore performance is of interest.
Im aware that I could perform the following Cypher Query, however this will load all the Person nodes from the database and with such a large dataset, this takes along time to execute.
MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY)
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l)
Is there a more efficient way to find out the information I'm interested in?
Solved! Go to Solution.
01-20-2022 02:41 PM
But
MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY)
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l)
does not actually run since l
is not included in the WITH
clause of line 2. For example I get
neo4j@neo4j> MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY)
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l);
Variable `l` not defined (line 4, column 14 (offset: 93))
"RETURN count(l);"
and if it was included I would expect it would actually return 1 value/row and it would be 2 For Dave has 2 relationships named :LIVED_IN.
As to optimization, provided a :Person
node which has a :LIVED_IN
relationship points to and only to a :City
node, i.e. a :LIVED_IN
rel is not used to also connect to a :Country
node then you can
match (n:Person)
where size( (n)-[:LIVED_IN]->()) > 1
and this size( (n)-[:LIVED_IN]->())
will take advantage of the metadata included with every node. For example for each node there is metadata which records the number of relationships by type and direction to any other node. As such for the :Person node for {name:'David'} the metadata reports there are 2 LIVED_IN relationships of outgoing direction. And as such this optimization would cause the query not to iterate over all the relationships Dave
might have
01-20-2022 02:41 PM
But
MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY)
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l)
does not actually run since l
is not included in the WITH
clause of line 2. For example I get
neo4j@neo4j> MATCH (p:PERSON)-[l:LIVED_IN]->(:CITY)
WITH p, count(l) as rels
WHERE rels > 1
RETURN count(l);
Variable `l` not defined (line 4, column 14 (offset: 93))
"RETURN count(l);"
and if it was included I would expect it would actually return 1 value/row and it would be 2 For Dave has 2 relationships named :LIVED_IN.
As to optimization, provided a :Person
node which has a :LIVED_IN
relationship points to and only to a :City
node, i.e. a :LIVED_IN
rel is not used to also connect to a :Country
node then you can
match (n:Person)
where size( (n)-[:LIVED_IN]->()) > 1
and this size( (n)-[:LIVED_IN]->())
will take advantage of the metadata included with every node. For example for each node there is metadata which records the number of relationships by type and direction to any other node. As such for the :Person node for {name:'David'} the metadata reports there are 2 LIVED_IN relationships of outgoing direction. And as such this optimization would cause the query not to iterate over all the relationships Dave
might have
01-20-2022 03:02 PM
How about this Cypher
MATCH (p:PERSON)-[:LIVED_IN]->(:CITY), (p)-[:LIVED_IN]->(:CITY)
RETURN count(distinct p)
This Cypher is same as this.
MATCH (:CITY)<-[:LIVED_IN]-(p:PERSON)-[:LIVED_IN]->(:CITY)
RETURN count(distinct p)
All the sessions of the conference are now available online