Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
12-24-2021 07:14 AM
I'm trying to count the amount of rows that Neo4j will return but the count (or the query) is very slow.
Version 1 (70 sec):
MATCH (person:Person)-[:HAS_ORDER]->(order:Order)
WHERE order.timestamp >= 1632434400 AND size((order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(:Product)) <= 20
WITH order
MATCH (order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(product:Product)
RETURN COUNT(product);
Version 2 (68 sec.):
MATCH (person:Person)-[:HAS_ORDER]->(order:Order)
WITH size((order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(:Product)) AS amount
WHERE order.timestamp >= 1632434400 AND amount <= 20
RETURN SUM(amount)
Using Neo4j 4.4 community with about 800000 orders and about 17000000 order lines.
Is there a more efficient way to count the rows?
These are the indexes:
CREATE INDEX idx_order_torder_id FOR (n:Order) ON (n.order_id);
CREATE INDEX idx_order_timestamp FOR (n:Order) ON (n.timestamp);
CREATE INDEX idx_person_person_id FOR (n:Person) ON (n.person_id);
CREATE INDEX idx_product_product_id FOR (n:Product) ON (n.product_id);
The amount of rows are equal to 4269011.
The EXPLAIN plan:
Solved! Go to Solution.
01-10-2022 07:14 AM
Thank you for your answer martin3. I tried your query but got an error because you can't use size on a node (product variable in this case). Instead of using size, I used count. And because every order line has one product, i can skip the counting of the relation order lines to products:
MATCH (order:Order)
WHERE order.timestamp >= 1632434400
WITH order
MATCH (order)<-[:HAS_ORDER]-(orderLine:OrderLine)
WITH COUNT(orderLine) as productCount
WHERE productCount <= 20
RETURN SUM(productCount);
This query took 0m17.342s
But i managed to snoop some seconds with the following query:
MATCH (order:Order)
WHERE order.timestamp >= 1632434400
WITH order, size((order)<-[:HAS_ORDER]-(:OrderLine)) AS amount
WHERE amount <= 20
RETURN SUM(amount);
This query took 0m15.675s
12-24-2021 07:43 AM
Maybe this can increase speed:
MATCH (order:Order)
WHERE order.timestamp >= 1632434400
WITH order
MATCH (order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(product:Product)
WITH size(product) AS amount
WHERE amount <= 20
RETURN SUM(amount)
I guess it depends on your data. Maybe you can also refactor your data to increase speed, like this:
MATCH (order)-[:HAS_PRODUCT]->(product:Product)
It seems like your data is imported from a relational database with a many-to-many joining table, that maybe is not necessary in neo4j.
01-10-2022 07:14 AM
Thank you for your answer martin3. I tried your query but got an error because you can't use size on a node (product variable in this case). Instead of using size, I used count. And because every order line has one product, i can skip the counting of the relation order lines to products:
MATCH (order:Order)
WHERE order.timestamp >= 1632434400
WITH order
MATCH (order)<-[:HAS_ORDER]-(orderLine:OrderLine)
WITH COUNT(orderLine) as productCount
WHERE productCount <= 20
RETURN SUM(productCount);
This query took 0m17.342s
But i managed to snoop some seconds with the following query:
MATCH (order:Order)
WHERE order.timestamp >= 1632434400
WITH order, size((order)<-[:HAS_ORDER]-(:OrderLine)) AS amount
WHERE amount <= 20
RETURN SUM(amount);
This query took 0m15.675s
All the sessions of the conference are now available online