03-01-2019 12:44 PM
My query time varies between runs, from 1 minute to 5 minutes to 20 minutes. Is there a reason for this inconsistency? Is it because of the rand() in the first MATCH?
```
MATCH (a:Tops) WITH a ORDER BY rand() LIMIT 1
MATCH (a) -[s1:CC_SCORE]- (b:Bottoms),
      (a) -[s2:CC_SCORE]- (c:Shoes),
      (a) -[s3:CC_SCORE]- (d:Bags),
      (a) -[s4:CC_SCORE]- (e:Jewelry),
      (b:Bottoms) -[s5:CC_SCORE]- (c:Shoes),
      (b:Bottoms) -[s6:CC_SCORE]- (d:Bags),
      (b:Bottoms) -[s7:CC_SCORE]- (e:Jewelry),
      (c:Shoes) -[s8:CC_SCORE]- (d:Bags),
      (c:Shoes) -[s9:CC_SCORE]- (e:Jewelry),
      (d:Bags) -[s10:CC_SCORE]- (e:Jewelry)
RETURN toFloat(s1.score) + toFloat(s2.score) + toFloat(s3.score) + toFloat(s4.score)
     + toFloat(s5.score) + toFloat(s6.score) + toFloat(s7.score) + toFloat(s8.score)
     + toFloat(s9.score) + toFloat(s10.score) AS totalScore, a, b, c, d, e
ORDER BY totalScore DESC LIMIT 1;
```
- neo4j version 3.3.4
- using py2neo in jupyter notebook
- `PROFILE` image attached
03-01-2019 05:45 PM
Can you share the PROFILE output of your query? It seems that the attachment didn't make it.
I also don't see a rand() in your query, but yes, it could totally affect the volume of data processed, depending on how you use the results.
You should add relationship directions, and possibly a label for `a`.
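For example, a sketch of the first couple of patterns with a label and a direction (assuming the CC_SCORE relationships were created pointing from the Tops node outward, which may not match how you loaded them):

```
// sketch only: the outgoing direction is an assumption about how the data was loaded
MATCH (a:Tops)-[s1:CC_SCORE]->(b:Bottoms),
      (a)-[s2:CC_SCORE]->(c:Shoes)
RETURN a, b, c LIMIT 10;
```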
What is CC_SCORE?
This is a global query, so it might also depend on your configured memory and graph size.
What is your heap/page-cache config?
And are you using Community or Enterprise edition (the latter comes with Neo4j Desktop)?
03-04-2019 03:14 PM
Sorry, not sure why the PROFILE plan image didn't come through the first time... Also, I edited the query text; for some reason the first line with the rand() was hidden. CC_SCORE is a pairwise score (0-1) between nodes; only relationships with a score higher than 0.99 were added to the graph. Where can I find the heap/page-cache config? I am using the Community edition, but I believe my company does have Enterprise.
03-04-2019 05:48 PM
Page-cache and heap are configured in neo4j.conf, which, depending on your system, is either in $NEO4J_HOME/conf or /etc/neo4j/neo4j.conf.
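For reference, the relevant settings in neo4j.conf look roughly like this (the values below are only illustrative, not recommendations; size them to your RAM and store):

```
# illustrative values only
dbms.memory.heap.initial_size=4g
dbms.memory.heap.max_size=4g
dbms.memory.pagecache.size=8g
```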
As you can see in your PROFILE, the query touches quite a lot of data.
There might be several ways to optimize the query. One could be to limit cardinality earlier by summing duplicate values.
Another could be to move from a single MATCH statement to one MATCH per pair, so relationship uniqueness doesn't have to be computed across the whole pattern.
Picking that first random Tops node is probably easier by returning ids for Tops and then picking one out of that list on the client.
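A rough, untested sketch of the per-pair idea, assuming a `$topId` parameter that you pick on the client from a list of Tops ids (the parameter name is just an example):

```
// sketch: $topId is chosen on the client from a previously fetched list of Tops ids
MATCH (a:Tops) WHERE id(a) = $topId
MATCH (a)-[s1:CC_SCORE]-(b:Bottoms)
MATCH (a)-[s2:CC_SCORE]-(c:Shoes)
MATCH (a)-[s3:CC_SCORE]-(d:Bags)
MATCH (a)-[s4:CC_SCORE]-(e:Jewelry)
MATCH (b)-[s5:CC_SCORE]-(c)
MATCH (b)-[s6:CC_SCORE]-(d)
MATCH (b)-[s7:CC_SCORE]-(e)
MATCH (c)-[s8:CC_SCORE]-(d)
MATCH (c)-[s9:CC_SCORE]-(e)
MATCH (d)-[s10:CC_SCORE]-(e)
RETURN toFloat(s1.score) + toFloat(s2.score) + toFloat(s3.score) + toFloat(s4.score)
     + toFloat(s5.score) + toFloat(s6.score) + toFloat(s7.score) + toFloat(s8.score)
     + toFloat(s9.score) + toFloat(s10.score) AS totalScore, a, b, c, d, e
ORDER BY totalScore DESC LIMIT 1;
```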
03-05-2019 10:28 AM
From the conf file:
```
#dbms.memory.pagecache.size=10g
dbms.jvm.additional=-XX:+AlwaysPreTouch
```
(the pagecache line is currently commented out)
Are these the settings you were looking for?
Can you expand on what you mean by limit cardinality? I'm not seeing where the duplicate values are.
I am also thinking about trying a single MATCH per pair; not sure if this could cause dead ends later on.
Why would you return a property instead of the node itself?
03-05-2019 05:10 PM
Yes, that pagecache setting.
And there are heap settings too (dbms.memory.heap.initial_size and dbms.memory.heap.max_size).
What I meant is: if you do a 3-hop expand, then at the first hop all neighbours are unique, but at hops 2 and 3 you revisit certain nodes multiple times (reachable via different paths), and those then have to be expanded again. So aggregating them back down to a minimal set is beneficial.
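A tiny, generic illustration of that idea (not your exact query), collapsing duplicate rows with DISTINCT before expanding the next hop:

```
// generic sketch: deduplicate intermediate rows before the next expand
MATCH (a)-[:CC_SCORE]-(b)-[:CC_SCORE]-(c)
WITH DISTINCT c
MATCH (c)-[:CC_SCORE]-(d)
RETURN count(d);
```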
I'm still pondering how to best rewrite your query.
03-05-2019 05:23 PM
Would it by chance be possible to share your graph db with me?
03-06-2019 11:26 AM
Is there an easy way to share the graph db with you directly?
03-06-2019 01:44 PM
You can PM me a dropbox/drive/s3 link of the zipped graph.db folder.
Thank you.
03-06-2019 02:02 PM
I'm not seeing an option to PM you. Is this feature not available for all users? Perhaps through another platform?