Neo4j

12kunal34 · ‎04-13-2020

Hi Everyone,

I am facing a very common issue with the cypher. when I am trying to run the below query, I am not getting any output and it running forever.

match (t3:tx)-[o2:exit]->(a2:add) where t3.id in [list which contains 40k records] return distinct t3.id as id,t3.time as time, sum(o2.value);

here total number of nodes in tx are around 200000000 and total number of nodes in add around 300000000
I have created the index on tx's id and add's id
could anyone please tell me how to query this much of data

intouch_vivek · ‎04-14-2020

Hi Kunal,

Please try EXISTS predicate instead of IN operator.
Also it will be good if you share Explain of the query.

12kunal34 · ‎04-14-2020

How to Use Exist predicate on List of ids ??

intouch_vivek · ‎04-14-2020

Ah.. Sorry my bad
I was resolving a issue in SQL and made the above statement.. Please ignore

12kunal34 · ‎04-14-2020

Please find Explain of my query:

And my query for this plan is:

EXPLAIN 
match (t1:tr)-[o]-(a1:add) where a1.add= 'xyz' with distinct t1.id as ids-----(these IDS are around 50k in count)
Match (t3:tr)-[o2:exit]->(a2:add)  where t3.id in (ids)
with distinct t3.id as id, t3.time as time,o2  limit 950000
return  id,time, sum(o2.value)

this query is working with LIMIT but once i remove this limit it start running for forever.

what wrong i am doing in query ?

dan_flavin1 · ‎04-14-2020

What version of Neo4j are you running? I am only guessing at your schema based on what's in the query.

Query Pattern

A couple of thoughts:

I see there's index seek in the explain, which is good, assuming it's on add.add and tr.id. If not, add.
Assuming I am right in interpreting the intention of the query to be: find all (:tr)labeled nodes that have a [] (undefined) relationship to (a1:add) nodes with property a1.add value = 'xyz' AND those same (:tr) nodes have an [o2:exit] relationship to other (t3:tr) nodes, return t3.id as id, t3.time as time, sum(o2.value). If so, your query pattern should be able to be all inclusive in one MATCH something like this:

    MATCH (t3:tr)<-[o2:exit]-(:tr)-->(a1:add) 
    WHERE a1.add='xyz'
    RETURN DISTINCT t3.id as id, t3.time as time, sum(o2.value)
    // Compare `EXPLAIN` plans.

In 2. above, It would be good to add an explicit relationship type to (:tr)-->(a1:add) e.g. (:tr)-[:RELATIONSHIP_HERE]->(a1:add). This came from the original query (t1:tr)-[o]-(a1:add). To summarize:

It is advised to use relationship direction -> and <- when you don't want a bi-directional search.
Define explicit relationship types.
Variables are not assigned to nodes and relationships if they're not used in a where or return. It's just good practice.

A similar pattern to the above query can be run with the [EXISTS] suggestion. Or see the Cypher documentation. There's some other approaches we might try if this doesn't work out or is wrong.

Execution Environment

Look at your java heap and database pagecache settings to make sure the query has adequate resources to run. Best to size pagecache to hold the database structure, and adjust heap so there are no long gc pauses. You can do this the old way by hand, or try the Halin graph app from Neo4j Labs if your running in Neo4j Desktop, or the online version if you can get out to the internet. Neo4j is a Graph Database. Meaning the same general resource starvation rules apply as they do with any other kind of database.
Try running the query in cypher-shell from the command line. Large data sets can be overwhelming for the browser to process and render. cypher-shell removes the overhead incurred by the web based Neo4j browser trying to format the returned data. You can find cypher-shell in the same bin directory as neo4j. Open a terminal session and cd to the bin directory if you're using Neo4j Desktop to run cypher-shell. If you've validated that the query has enough RAM and CPU resources when running, then it's issue around the query itself if cypher-shell does not return.

I know they're fun, but not everyone needs to see bouncing bubbles.

12kunal34 · ‎04-14-2020

Hi @dan.flavin1 thank you for your detailed answer . please find below query which is working in some case and stuck in most of the cases .

match (t1:tr)-[o]-(a1:add) where a1.add= 'xyz' with distinct t1
Match (t1)-[o2:exit]->(a2:add)
with  t1.id as id, t1.time as time,o2.value as value
return id,time, sum(value)

here what m i doing, First of all i am passing a address getting all the related nodes(we have 2 type of rels with add, 1 is in , 2nd is exit).
then for these nodes getting all the add with exit relationship

dan_flavin · ‎04-14-2020

Yup... try new pattern. Cypher queries do not have to be just (node)-[rel]->(node). They can, and should be as expressive as possible in most cases.

12kunal34 · ‎04-14-2020

This is the only pattern avaialble in data model to get desired result

andrew_bowman · ‎04-23-2020

The query here doesn't look too bad. The DISTINCT operation may take awhile with a large number of matches from the first line. How many :tr nodes are connected to each :add node?

You should project the properties from t1 after you calculate the sum aggregation, not before and not at the same time. That should ensure that the projection only happens once t1 is distinct (as a side-effect of being the grouping key of the aggregation:

match (t1:tr)--(a1:add) 
where a1.add= 'xyz' 
with distinct t1
Match (t1)-[o2:exit]->(a2:add)
with  t1, sum(o2.value) as sum
return t1.id as id, t1.time as time, sum

Depending on the number of :tr nodes per :add node, you might be able to drop the DISTINCT.

12kunal34 · ‎05-23-2020

Hey @andrew.bowman

apologies for late reply
number of node connected to :add node are varies for different :add nodes .
in current scenario 50k tr nodes are connected but in some case they are around 200k as well.

i tried below approach as well but same result

match (t1)-[:IN|EXIT]-(a1:add) where a1.add= 'xyz' WITH collect(DISTINCT t1) AS tx
CALL  apoc.coll.partition(tx,10000) YIELD value AS txBatches
UNWIND txBatches AS batch
WITH collect(batch) as trans
CALL apoc.cypher.mapParallel2(
   "Match (_)-[o2:EXIT]->(a) return _.id as id, sum(o2.value) as Balance",  {parallel:True}, trans, 24,720) YIELD value
RETURN value.id, value.Balance

please let me know is there any further change we can do:

Cobra · ‎05-23-2020

Hello @12kunal34,

Just to know, did you use UNIQUE CONSTRAINTS on all your nodes?

Regards,
Cobra

12kunal34 · ‎05-23-2020

Yes ,
I am having indexes on all nodes

Neo4j

Query running so long and not returning any output

Query Pattern

Execution Environment