Neo4j

alex_mandalios · 01-27-2019

Hello everyone,

I am facing a weird issue with Neo4j. I have a relatively large graph of about 2 million nodes, and I would like to run personalized PageRank on some lists of nodes. I use the following syntax to fetch the nodes I need:
MATCH (a:type {id:value})
MATCH (b:type {id:value2})
MATCH (c:type {id:value3})
.....
and it does not seem to perform well.

More specifically, fetching 500 nodes takes about a minute even without feeding them to PageRank, while fetching 1000 takes about 10 minutes. Doubling the number of nodes made the query roughly ten times slower, not the linear increase I expected.
PROFILE shows that Cartesian products are formed, first for a and b, then for (a, b) and c, and so on. Given that there is a unique index on id and that I specify the correct node label, is this performance drop with multiple MATCH clauses expected? If not, what could be the culprit?

Thanks in advance,
Alex
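The two timings in the question are enough for a rough scaling estimate. This is a minimal sketch, assuming about 1 minute for 500 nodes and about 10 minutes for 1000: if the time grows like N^k, then k = log(t2/t1) / log(n2/n1). With only two data points this is a coarse fit, so read it as an indication of how far from linear the growth is.

```python
import math


def scaling_exponent(n1: float, t1: float, n2: float, t2: float) -> float:
    """Estimate k in t ~ n**k from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Timings from the question: ~1 minute for 500 nodes, ~10 minutes for 1000.
k = scaling_exponent(500, 1.0, 1000, 10.0)
print(f"k is about {k:.2f}")  # prints "k is about 3.32"; linear growth would give 1.00
```

Doubling N and getting ten times the runtime corresponds to k of about 3.3, i.e. closer to cubic than to linear growth.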

andrew_bowman · 01-27-2019

In this kind of query a Cartesian product is expected and correct, and since every lookup goes through a unique index, the result should be only a single row.

It would help to confirm the existence of an index on :type(id), and to see the PROFILE query plan with all elements expanded.
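One way to check both points from a script is to run the query under PROFILE with the official Python driver and list the operators in the plan. This is a sketch under assumptions: `session` is a driver session, `ResultSummary.profile` holds the plan as nested dicts with `operatorType` and `children` keys, and newer servers may append a runtime suffix such as `@neo4j` to operator names. The helper names `plan_operators` and `profile_operators` are hypothetical. A `NodeUniqueIndexSeek` for each variable would suggest the index on :type(id) is being used, while `NodeByLabelScan` would suggest it is not.

```python
from __future__ import annotations


def plan_operators(plan: dict) -> list[str]:
    """Return the operator names of a profiled plan tree, root first (pre-order)."""
    ops: list[str] = []
    stack = [plan]
    while stack:
        node = stack.pop()
        # Drop a runtime suffix such as "@neo4j" if the server adds one.
        ops.append(node["operatorType"].split("@")[0])
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.get("children", [])))
    return ops


def profile_operators(session, query: str, params: dict) -> list[str]:
    """Run `query` under PROFILE and return its operator names.

    Assumes `session` is a neo4j driver session; the result has to be
    consumed before the profile is available on the summary.
    """
    summary = session.run("PROFILE " + query, params).consume()
    return plan_operators(summary.profile)
```

For example, `profile_operators(session, "MATCH (a:type {id: $a}) RETURN a", {"a": 1})` would be expected to contain `NodeUniqueIndexSeek` when the index is in place.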

alex_mandalios · 01-28-2019

Thanks for the feedback.

One thing I should note, even though it may be clear from the first post, is that when I need to find N nodes, I write N MATCH clauses, so the final query is roughly N lines long. From what I found online, this is considered bad practice. Should I use some form of batching instead, or would that not be relevant?
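If the query is generated client-side, batching can mean sending the ids as a list parameter, split into chunks only if the list is very large, instead of emitting one MATCH line per id. A minimal sketch, assuming the `type` label and `id` property from the question; the helper names and the default `batch_size` of 500 are hypothetical. The query text never changes, so the server can reuse one cached plan for every batch.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator

# One fixed query text for every batch; only the $ids parameter changes.
LOOKUP_QUERY = "MATCH (n:type) WHERE n.id IN $ids RETURN n"


def batches(ids: Iterable, batch_size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `batch_size` ids."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(ids)
    while chunk := list(islice(it, batch_size)):
        yield chunk


def lookup_requests(ids: Iterable, batch_size: int = 500) -> list[tuple[str, dict]]:
    """Pair the fixed query with one parameter map per batch of ids."""
    return [(LOOKUP_QUERY, {"ids": chunk}) for chunk in batches(ids, batch_size)]
```

With a driver session, each pair can then be sent as `session.run(query, params)`.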

andrew_bowman · 01-28-2019

I think we would need to see the full query, along with a description of what it's supposed to do, before we can make that call.

michael_hunger · 01-28-2019

You should:

- have an index or constraint
- use parameters
- use IN

MATCH (n:Type) WHERE n.id IN $params
...
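The difference between the two query shapes can be seen by generating both from Python. This is a sketch under assumptions: it uses the label `type` from the question rather than `Type` (labels are case-sensitive, so use whichever the data actually has), the helper names are hypothetical, and `fetch_nodes` expects a session from the official neo4j Python driver. The inlined N-MATCH text changes with every input, so no cached plan can be reused, while the IN version sends the same text every time with the ids as a parameter.

```python
from __future__ import annotations

import json
from typing import Iterable

# Unique constraint (backed by an index); use the form your Neo4j version accepts.
CONSTRAINT_OLD = "CREATE CONSTRAINT ON (n:type) ASSERT n.id IS UNIQUE"    # 3.5-era syntax
CONSTRAINT_NEW = "CREATE CONSTRAINT FOR (n:type) REQUIRE n.id IS UNIQUE"  # 4.4+/5.x syntax


def n_match_query(ids: Iterable) -> str:
    """The pattern from the question, for comparison: one MATCH per id, values inlined."""
    ids = list(ids)
    lines = [f"MATCH (n{i}:type {{id: {json.dumps(v)}}})" for i, v in enumerate(ids)]
    lines.append("RETURN " + ", ".join(f"n{i}" for i in range(len(ids))))
    return "\n".join(lines)


def in_query(ids: Iterable) -> tuple[str, dict]:
    """One MATCH with IN; the ids travel as the $ids parameter."""
    return "MATCH (n:type) WHERE n.id IN $ids RETURN n", {"ids": list(ids)}


def fetch_nodes(session, ids: Iterable) -> list:
    """Run the IN query on a neo4j driver session and collect the matched nodes."""
    query, params = in_query(ids)
    return [record["n"] for record in session.run(query, params)]
```

A call such as `fetch_nodes(session, [value, value2, value3])` then stands in for the whole block of MATCH lines with a single query whose text stays the same however many ids are passed.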

Multiple matches performance drop