Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
04-13-2020 11:19 PM
Hi Everyone,
I am facing a very common issue with the cypher. when I am trying to run the below query, I am not getting any output and it running forever.
match (t3:tx)-[o2:exit]->(a2:add) where t3.id in [list which contains 40k records] return distinct t3.id as id,t3.time as time, sum(o2.value);
here total number of nodes in tx are around 200000000 and total number of nodes in add around 300000000
I have created the index on tx's id and add's id
could anyone please tell me how to query this much of data
04-14-2020 12:52 AM
Hi Kunal,
Please try EXISTS predicate instead of IN operator.
Also it will be good if you share Explain of the query.
04-14-2020 01:57 AM
How to Use Exist predicate on List of ids ??
04-14-2020 02:15 AM
Ah.. Sorry my bad
I was resolving a issue in SQL and made the above statement.. Please ignore
04-14-2020 03:29 AM
Please find Explain of my query:
And my query for this plan is:
EXPLAIN
match (t1:tr)-[o]-(a1:add) where a1.add= 'xyz' with distinct t1.id as ids-----(these IDS are around 50k in count)
Match (t3:tr)-[o2:exit]->(a2:add) where t3.id in (ids)
with distinct t3.id as id, t3.time as time,o2 limit 950000
return id,time, sum(o2.value)
this query is working with LIMIT but once i remove this limit it start running for forever.
what wrong i am doing in query ?
04-14-2020 10:42 AM
What version of Neo4j are you running? I am only guessing at your schema based on what's in the query.
A couple of thoughts:
I see there's index seek in the explain, which is good, assuming it's on add.add and tr.id. If not, add.
Assuming I am right in interpreting the intention of the query to be: find all (:tr)
labeled nodes that have a []
(undefined) relationship to (a1:add)
nodes with property a1.add
value = 'xyz' AND those same (:tr)
nodes have an [o2:exit]
relationship to other (t3:tr)
nodes, return t3.id as id, t3.time as time, sum(o2.value)
. If so, your query pattern should be able to be all inclusive in one MATCH
something like this:
MATCH (t3:tr)<-[o2:exit]-(:tr)-->(a1:add)
WHERE a1.add='xyz'
RETURN DISTINCT t3.id as id, t3.time as time, sum(o2.value)
// Compare `EXPLAIN` plans.
(:tr)-->(a1:add)
e.g. (:tr)-[:RELATIONSHIP_HERE]->(a1:add)
. This came from the original query (t1:tr)-[o]-(a1:add)
. To summarize:->
and <-
when you don't want a bi-directional search.[EXISTS]
suggestion. Or see the Cypher documentation. There's some other approaches we might try if this doesn't work out or is wrong.Look at your java heap and database pagecache settings to make sure the query has adequate resources to run. Best to size pagecache to hold the database structure, and adjust heap so there are no long gc pauses. You can do this the old way by hand, or try the Halin graph app from Neo4j Labs if your running in Neo4j Desktop, or the online version if you can get out to the internet. Neo4j is a Graph Database. Meaning the same general resource starvation rules apply as they do with any other kind of database.
Try running the query in cypher-shell
from the command line. Large data sets can be overwhelming for the browser to process and render. cypher-shell
removes the overhead incurred by the web based Neo4j browser trying to format the returned data. You can find cypher-shell
in the same bin directory as neo4j. Open a terminal session and cd to the bin directory if you're using Neo4j Desktop to run cypher-shell
. If you've validated that the query has enough RAM and CPU resources when running, then it's issue around the query itself if cypher-shell
does not return.
I know they're fun, but not everyone needs to see bouncing bubbles.
04-14-2020 11:06 AM
Hi @dan.flavin1 thank you for your detailed answer . please find below query which is working in some case and stuck in most of the cases .
match (t1:tr)-[o]-(a1:add) where a1.add= 'xyz' with distinct t1
Match (t1)-[o2:exit]->(a2:add)
with t1.id as id, t1.time as time,o2.value as value
return id,time, sum(value)
here what m i doing, First of all i am passing a address getting all the related nodes(we have 2 type of rels with add, 1 is in , 2nd is exit).
then for these nodes getting all the add with exit relationship
04-14-2020 12:05 PM
Yup... try new pattern. Cypher queries do not have to be just (node)-[rel]->(node). They can, and should be as expressive as possible in most cases.
04-14-2020 12:28 PM
This is the only pattern avaialble in data model to get desired result
04-23-2020 05:32 PM
The query here doesn't look too bad. The DISTINCT operation may take awhile with a large number of matches from the first line. How many :tr nodes are connected to each :add node?
You should project the properties from t1 after you calculate the sum aggregation, not before and not at the same time. That should ensure that the projection only happens once t1 is distinct (as a side-effect of being the grouping key of the aggregation:
match (t1:tr)--(a1:add)
where a1.add= 'xyz'
with distinct t1
Match (t1)-[o2:exit]->(a2:add)
with t1, sum(o2.value) as sum
return t1.id as id, t1.time as time, sum
Depending on the number of :tr nodes per :add node, you might be able to drop the DISTINCT.
05-23-2020 02:41 AM
Hey @andrew.bowman
apologies for late reply
number of node connected to :add node are varies for different :add nodes .
in current scenario 50k tr nodes are connected but in some case they are around 200k as well.
i tried below approach as well but same result
match (t1)-[:IN|EXIT]-(a1:add) where a1.add= 'xyz' WITH collect(DISTINCT t1) AS tx
CALL apoc.coll.partition(tx,10000) YIELD value AS txBatches
UNWIND txBatches AS batch
WITH collect(batch) as trans
CALL apoc.cypher.mapParallel2(
"Match (_)-[o2:EXIT]->(a) return _.id as id, sum(o2.value) as Balance", {parallel:True}, trans, 24,720) YIELD value
RETURN value.id, value.Balance
please let me know is there any further change we can do:
05-23-2020 03:34 AM
05-23-2020 09:31 AM
Yes ,
I am having indexes on all nodes
All the sessions of the conference are now available online