01-27-2019 05:06 PM
Hello everyone,
I am facing a weird issue with Neo4j. I have a relatively large graph with about 2 million nodes, and I would like to run personalized PageRank on some lists of nodes. I use the following syntax to fetch the nodes I need:
MATCH (a:type {id:value})
MATCH (b:type {id:value2})
MATCH (c:type {id:value3})
.....
and it seems not to be working out well in terms of performance.
More specifically, fetching 500 nodes, even without feeding them to PageRank, takes about a minute, and fetching 1000 takes about 10 minutes, which is not the linear increase I expected.
Using PROFILE reveals that cartesian products are formed, first for a, b then for a, b, c, etc. Given that id is a unique index and that I provide the right type of node, is this performance drop for multiple matches expected? If not, what could be the culprit?
Thanks in advance,
Alex
01-27-2019 11:15 PM
In this kind of case, a cartesian product is expected and correct, and since these are unique indexes your result should only be a single row.
It would help to confirm the existence of an index on :type(id), and to see the PROFILE query plan with all elements expanded.
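On the single-row point above, a toy sketch in Python (not Neo4j output; the row values are made-up stand-ins) shows why the cartesian product does not multiply the result when each MATCH yields exactly one row. It says nothing about planning or execution cost, only about the row count:

```python
from itertools import product

# Hypothetical stand-ins for the rows each MATCH returns.
# With a unique index, each lookup yields exactly one row.
rows_a = ["node_a"]
rows_b = ["node_b"]
rows_c = ["node_c"]

# Cartesian product of single-row inputs: 1 * 1 * 1 combinations.
unique_case = list(product(rows_a, rows_b, rows_c))

# For contrast, if each lookup returned two rows the product would grow: 2 * 2 * 2.
non_unique_case = list(product(["a1", "a2"], ["b1", "b2"], ["c1", "c2"]))

print(len(unique_case), len(non_unique_case))
```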
01-28-2019 01:02 AM
Thanks for the feedback.
One thing I should note, even though it may be clear from the first post, is that when I need to find N nodes, I perform N matches, so the final query has about N lines. From what I found online, this is considered bad practice. Maybe I should use some form of batching instead? Or would that not be relevant?
01-28-2019 08:06 AM
I think we would need to see the full query, with some description of what it's supposed to do, before we can make that call.
01-28-2019 11:02 PM
You should use a single MATCH with an IN predicate on a list parameter instead:
MATCH (n:Type) WHERE n.id IN $params
...
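A minimal sketch of how the ids might be prepared on the client side for a query of this shape, assuming they are collected in a Python list. The helper names `batched` and `build_lookups`, the trailing `RETURN n`, and the batch size of 500 are assumptions for illustration and do not come from the thread. Sending each pair to the database would go through whatever parameterized query call your driver provides, which is not shown here.

```python
from typing import Any, Dict, Iterator, List, Tuple

# One parameterized query replaces N separate MATCH clauses.
# "RETURN n" is an assumed ending; the thread leaves the rest of the query elided.
LOOKUP_QUERY = "MATCH (n:Type) WHERE n.id IN $params RETURN n"


def batched(ids: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `ids` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_lookups(ids: List[Any], size: int = 500) -> List[Tuple[str, Dict[str, List[Any]]]]:
    """Return one (query, parameters) pair per batch of ids."""
    return [(LOOKUP_QUERY, {"params": chunk}) for chunk in batched(ids, size)]


if __name__ == "__main__":
    # Hypothetical ids 1..1200, split into batches of at most 500.
    for query, parameters in build_lookups(list(range(1, 1201))):
        print(query, len(parameters["params"]))
```

Because the query text stays identical across batches and only the parameter list changes, the same query string can be reused for every call.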