Inconsistent run time for query

My query's run time varies between 1 min, 5 min, and 20 min across runs. Is there a reason for this inconsistency? Is it because of the rand() in the first MATCH?

```
MATCH (a:Tops) WITH a ORDER BY rand() LIMIT 1
MATCH   (a) -[s1:CC_SCORE]- (b:Bottoms),
(a) -[s2:CC_SCORE]- (c:Shoes),
(a) -[s3:CC_SCORE]- (d:Bags),
(a) -[s4:CC_SCORE]- (e:Jewelry),
(b:Bottoms) -[s5:CC_SCORE]- (c:Shoes),
(b:Bottoms) -[s6:CC_SCORE]- (d:Bags),
(b:Bottoms) -[s7:CC_SCORE]- (e:Jewelry),
(c:Shoes) -[s8:CC_SCORE]- (d:Bags),
(c:Shoes) -[s9:CC_SCORE]- (e:Jewelry),
(d:Bags) -[s10:CC_SCORE]- (e:Jewelry)
RETURN toFloat(s1.score) + toFloat(s2.score) + toFloat(s3.score) + toFloat(s4.score)
+ toFloat(s5.score) + toFloat(s6.score) + toFloat(s7.score) + toFloat(s8.score)
+ toFloat(s9.score) + toFloat(s10.score) AS totalScore, a, b, c, d, e
ORDER BY totalScore DESC LIMIT 1;
```

- neo4j version 3.3.4
- using py2neo in jupyter notebook
- `PROFILE` image attached
9 REPLIES

Can you share the PROFILE output of your query? It seems that the attachment didn't make it.
I also don't see a rand() in your query, but yes, it could certainly affect the volume of data processed, depending on how you use the results.

You should add relationship directions
and possibly a label for a.

What is CC score?

This is a global query, so it might also depend on your configured memory and graph size.
What is your heap/page-cache config?
And are you using Community or Enterprise (which comes with Neo4j Desktop)?

Sorry, I'm not sure why the PROFILE plan image didn't come through the first time. I also edited the query text; for some reason the first line with the rand() was hidden. CC_SCORE is a pairwise score (0-1) between nodes. Only relationships with a score higher than 0.99 were added to the graph. Where can I find the heap/page-cache config? I am using the Community edition, but I believe my company does have Enterprise.

Page cache and heap are configured in neo4j.conf, which, depending on your system, is either in $NEO4J_HOME/conf or at /etc/neo4j/neo4j.conf.
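To check which memory settings are actually active, a small script can scan the conf text and skip commented lines. This is a minimal sketch, assuming the Neo4j 3.x setting names `dbms.memory.heap.initial_size`, `dbms.memory.heap.max_size`, and `dbms.memory.pagecache.size`; the sample text and the `4g` value are made up for illustration, and `active_memory_settings` is a hypothetical helper name.

```python
# Setting names as used in Neo4j 3.x neo4j.conf (assumed for this sketch).
MEMORY_KEYS = (
    "dbms.memory.heap.initial_size",
    "dbms.memory.heap.max_size",
    "dbms.memory.pagecache.size",
)


def active_memory_settings(conf_text):
    """Return the memory settings that are set (not commented out) in conf_text."""
    found = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not key=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() in MEMORY_KEYS:
            found[key.strip()] = value.strip()
    return found


# Hypothetical conf excerpt; in practice, read the real neo4j.conf file instead.
sample = """
#dbms.memory.pagecache.size=10g
dbms.memory.heap.max_size=4g
dbms.jvm.additional=-XX:+AlwaysPreTouch
"""
settings = active_memory_settings(sample)
print(settings)
```

Any key missing from the result is commented out or absent, so Neo4j falls back to its default for it.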

As you can see in your profile, it touches quite a lot of data.

There might be several ways to optimize the query. One could be to limit cardinality earlier by aggregating duplicate values.

Another could be to move from a single MATCH statement to one per pair, so relationship uniqueness doesn't have to be computed.
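As a rough illustration of the one-MATCH-per-pair idea, the snippet below generates the ten clauses from the labels in the original query. It is only a sketch: `per_pair_matches` is a hypothetical helper, the output still needs the first `MATCH (a:Tops) ...` line and the `RETURN` from the original query around it, and whether the planner does better with this form has not been tested here.

```python
from itertools import combinations

# Variable/label pairs copied from the original query.
NODES = [
    ("a", "Tops"),
    ("b", "Bottoms"),
    ("c", "Shoes"),
    ("d", "Bags"),
    ("e", "Jewelry"),
]


def per_pair_matches(nodes=NODES, rel_type="CC_SCORE"):
    """Build one MATCH clause per node pair instead of one comma-separated pattern."""
    clauses = []
    # combinations() yields (a,b), (a,c), ..., (d,e), the same order as s1..s10.
    for i, ((v1, l1), (v2, l2)) in enumerate(combinations(nodes, 2), start=1):
        clauses.append(f"MATCH ({v1}:{l1})-[s{i}:{rel_type}]-({v2}:{l2})")
    return clauses


query_body = "\n".join(per_pair_matches())
print(query_body)
```

Cypher enforces relationship uniqueness only within a single MATCH pattern, so splitting the pattern removes that check across pairs. Here each pair joins different labels, so the same relationship should not show up twice unless a node carries more than one of these labels.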

Picking the first value is probably easier by returning IDs for Tops and then picking one from that list on the client.
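The client-side pick could look roughly like this. It is a sketch under assumptions: the two query strings and the `$topId` parameter name are illustrative, `example_ids` stands in for whatever the first query returns, and the actual py2neo call to run the queries is left out.

```python
import random

# Hypothetical query strings; the $topId parameter name is an assumption.
LIST_TOPS = "MATCH (a:Tops) RETURN id(a) AS topId"
START_FROM_TOP = "MATCH (a:Tops) WHERE id(a) = $topId RETURN a"


def pick_top_id(top_ids, rng=random):
    """Pick one Tops id uniformly at random on the client."""
    top_ids = list(top_ids)
    if not top_ids:
        raise ValueError("no Tops ids returned")
    return rng.choice(top_ids)


# Stand-in for the ids that LIST_TOPS would return from the database.
example_ids = [11, 42, 77, 105]
chosen = pick_top_id(example_ids, random.Random(0))

# Parameters to send along with START_FROM_TOP.
params = {"topId": chosen}
print(params)
```

The list of IDs only has to be fetched once and can be reused across runs, so the database no longer generates a random value for every Tops node on each execution.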

From the conf file:
#dbms.memory.pagecache.size=10g (currently commented out)
dbms.jvm.additional=-XX:+AlwaysPreTouch
Are these the settings you were looking for?

Can you expand on what you mean by limit cardinality? I'm not seeing where the duplicate values are.
I am also thinking about trying a single MATCH per pair; I'm not sure whether this could cause dead ends later on.
Why would you return a property instead of the node itself?

Yes, that pagecache setting, and there are heap size settings too (dbms.memory.heap.initial_size and dbms.memory.heap.max_size).

What I meant is that if you do a 3-hop expand, all neighbours are unique at the first hop, but at hops 2 and 3 you revisit certain nodes multiple times (they are reachable via different paths), and those then have to be expanded again. So aggregating them back to a minimal set between hops is beneficial.
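A toy simulation makes the effect visible. The graph below is made-up data, not the real one, and the expansion is simplified: it ignores Cypher's relationship-uniqueness rule and just follows every neighbour at each hop. It compares the number of rows carried forward with and without deduplicating between hops.

```python
# Toy undirected graph (hypothetical data): every node links to every other.
GRAPH = {
    "a": ["b", "c", "d"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["a", "b", "c"],
}


def expand(start, hops, dedupe):
    """Return the frontier size after each hop, with or without deduplication."""
    frontier = [start]
    sizes = []
    for _ in range(hops):
        # Expand every row in the frontier to all of its neighbours.
        frontier = [nbr for node in frontier for nbr in GRAPH[node]]
        if dedupe:
            # Collapse repeated nodes so each is expanded only once next hop.
            frontier = sorted(set(frontier))
        sizes.append(len(frontier))
    return sizes


raw_sizes = expand("a", 3, dedupe=False)
deduped_sizes = expand("a", 3, dedupe=True)
print(raw_sizes)      # [3, 9, 27]
print(deduped_sizes)  # [3, 4, 4]
```

In Cypher, the usual way to get the deduplicated behaviour is a `WITH DISTINCT` between the expansion steps.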

I'm still pondering how to best rewrite your query.

Would it by chance be possible to share your graph database with me?

Is there an easy way to share the graph db with you directly?

You can PM me a dropbox/drive/s3 link of the zipped graph.db folder.

Thank you.

I'm not seeing an option to PM you. Is this feature not available for all users? Perhaps through another platform?