Running the whole query at once does not work, but splitting it into pieces does

Peter_Lian
Node Clone

I'm trying to run the following query in Neo4j Desktop, but it never finishes because it takes too much time. (The data has been loaded into Neo4j successfully, with 277 nodes.)

###################################

MATCH (e:Process)

WHERE e.user = "No User"

MATCH (n:User)

WHERE n.user = "No  User"

MATCH (p:Process)

MATCH (l:Event)

WHERE p.parentID = "No ID (Process)" AND p.pid = l.pid AND l.eventid = 1 

MATCH(k:Event)

MATCH (file:File)

WHERE NOT file.eventid = 11

MATCH (createordeleteregistry:Createordeleteregistry)

WHERE NOT createordeleteregistry.eventid = 12

MATCH (deletefile:Deletefile)

WHERE NOT deletefile.eventid = 23

MATCH (registry:Registry)

WHERE NOT registry.eventid = 13

DETACH DELETE e,n,p,k,file,createordeleteregistry,deletefile,registry

 

###################################

However, after I split the above query into ten pieces and run them separately, i.e.,

 

######## (First piece) #########

MATCH (e:Process)

WHERE e.user = "No User"

DETACH DELETE e

###########################

 

######## (Second piece) #########

MATCH (n:User)

WHERE n.user = "No  User"

DETACH DELETE n

###########################

######### (Tenth piece) #######

MATCH (registry:Registry)

WHERE NOT registry.eventid = 13

DETACH DELETE registry

###########################

 

it becomes very fast (each piece finishes within 1 second).

 

What causes this difference, and how can I run the whole query at once instead of splitting it into pieces?

 

In fact, a similar problem comes up elsewhere: if I run the whole query at once, a Java heap space error is shown. That problem also goes away if I split the query into pieces. This confuses me a lot; I want to run the whole query instead of splitting it into pieces.

Thanks.

1 ACCEPTED SOLUTION

When a query has two matches, such as ‘match(a:LabelA) match(b:LabelB) return a, b’, the result is the Cartesian product of the two results. That is, if the first match produces N rows and the second match produces M rows, together they produce N×M rows. This is because the second match is executed for each row of the first match.
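The row multiplication described above can be checked directly. This is a minimal sketch using the placeholder labels LabelA and LabelB from the example (they are illustrative, not labels from the poster's graph):

```cypher
// LabelA and LabelB are placeholder labels, not taken from the poster's data.
// The two MATCH clauses share no variable, so every LabelA row is paired
// with every LabelB row before the count is taken.
MATCH (a:LabelA)
MATCH (b:LabelB)
RETURN count(*) AS combinedRows
// With N LabelA nodes and M LabelB nodes, combinedRows is N * M
// (for example, 3 and 4 nodes give 12 rows).
```

Prefixing the query with EXPLAIN should show the plan without running it; for disconnected patterns like this one, the plan is expected to contain a CartesianProduct operator.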

Typically you don’t write Cypher with independent matches like this; instead, the second match extends the data from the first.

In your case you have nine match clauses, almost all of them independent of one another. The final result will be the Cartesian product of all the matches. Many of the matches have specific criteria, so they most likely result in one row, but many have ‘not’ predicates, so these probably result in many rows. This is the root cause of your problem, which is resulting in memory issues and the query never finishing.

As you observed, you don’t have this problem when the queries are executed separately, since no single query is trying to generate the Cartesian product of the individual results. If you must run these unrelated matches together to delete the data, then isolate them by calling each in its own subquery. For example:

call{

match(a:LabelA)

where <insert condition>

delete a

}

call {

match(b:LabelB)

where <insert condition>

delete b

}

Since the call subqueries don’t have return statements, they add no rows to the outer query, so a Cartesian product is avoided.
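Applied to the query in the question, the subquery approach might look like the sketch below. It assumes a Neo4j version that accepts call subqueries without a return clause, and it uses DETACH DELETE as in the original query so that nodes with relationships can still be removed. Only four of the pieces are shown; the remaining ones would follow the same pattern.

```cypher
// Each subquery is isolated: its matched rows are not combined
// with the rows matched by the other subqueries.
CALL {
  MATCH (e:Process)
  WHERE e.user = "No User"
  DETACH DELETE e
}
CALL {
  // Property value copied verbatim from the question.
  MATCH (n:User)
  WHERE n.user = "No  User"
  DETACH DELETE n
}
CALL {
  // p and l stay in the same subquery because the WHERE clause joins them on pid.
  MATCH (p:Process)
  MATCH (l:Event)
  WHERE p.parentID = "No ID (Process)" AND p.pid = l.pid AND l.eventid = 1
  DETACH DELETE p
}
CALL {
  MATCH (file:File)
  WHERE NOT file.eventid = 11
  DETACH DELETE file
}
```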

You may also be able to fix it by placing each delete right after its match clause, instead of all at the end. I think you will need a ‘with’ clause between each ‘delete’ and the next ‘match’, since you are chaining queries. As a workaround to avoid the Cartesian product, it may work to use an aggregate such as ‘with count(*) as number’, so that only one row of data is passed on; a plain ‘with 1 as number’ would still pass on one row per matched node.
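A sketch of this chained alternative, using the first two pieces from the question. It uses an aggregate (count(*)) in the ‘with’ clause rather than a literal, because an aggregate without grouping keys collapses the incoming rows to a single row, whereas a literal keeps one row per input row. The alias deletedRows is only an illustrative name.

```cypher
MATCH (e:Process)
WHERE e.user = "No User"
DETACH DELETE e
// Collapse to exactly one row (even if nothing matched) before the next MATCH.
WITH count(*) AS deletedRows
// Property value copied verbatim from the question.
MATCH (n:User)
WHERE n.user = "No  User"
DETACH DELETE n
```

Further pieces could be appended the same way, each preceded by its own aggregating ‘with’ clause.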

I believe the call subquery approach is more understandable. 

2 REPLIES

Much appreciated, it works!

This really was a big help.