Neo4j

Celes-10 · ‎07-27-2021

Hello everyone,

I have a use-case where I have thousands of relationships for a specific relationship type with a numerical attribute. I would like to get the top 10 nodes by sorting on this attribute. There is a B-tree index on the relationship attribute (I am using the latest version of neo4j). My cypher query uses sorting on this specific attribute but it seems that the sorting does not use an index as it goes through all the relationships with many db hits making the query very slow, is that expected? What should I do to tell the engine to use an index for sorting? Thanks a lot

My query is similar to this :

MATCH (a1: Node)-[rel:REL_TYPE]-(a2:Node) WHERE a1.name IN ['name1', 'name2'] AND NOT a2.name IN ['name3', 'name4'] WITH a1, a2 ORDER BY rel.score DESC WITH a1.name as a1_name, collect(a2.name) as a RETURN a1_name, a[0..10]

The same thing happens with the simpler query:

MATCH (a1: Node)-[rel:REL_TYPE]-(a2:Node) WHERE a1.name='name1' RETURN a2.name ORDER BY rel.score DESC LIMIT 10

There is already an index here on the node name attribute which it uses correctly. Also, doing an exact match on the relationship attribute works quite well and uses well the index. It seems that the problem happens only with sorting.

Thanks a lot for your help.

The screenshot is for this query:

PROFILE MATCH (a1:Node)-[rel:ASPECT_ADJACENT_CONNECTION]-(a2) WHERE a1.name="Test data" RETURN a2.name ORDER BY rel.pmi DESC LIMIT 10
It performs 10k db hits while i am only asking for 10 results and it doesn't seem to use the index.

Bennu · ‎07-29-2021

Hi @Celes-10 !

This feature was requested long time ago on Node indexes, it's now closed but with no direct use of the index (sort sometimes is out of the planner optimization). Is there any low boundary for the pmi score? Can you add a WHERE condition like

MATCH (a1: Node)-[rel:REL_TYPE]-(a2:Node) 
WHERE a1.name='name1' 
AND rel.score > 0
RETURN a2.name ORDER BY rel.score DESC LIMIT 10

Technically, a 10k filter should be easy to handle. How long does it the query takes? Can you send the full PROFILE of the query? I assume you also have an index on the Node(name).

See ya!

H

Celes-10 · ‎08-03-2021

Hello @Bennu , Thanks for your answer. Yes i have a unique constraint on the Node(name) and it creates an index on it automatically. Actually my use-case is a bit more than that specific query. I would like to extract the topX of the topX... until an arbitrary depth. For now i do it by step in multiple queries (equal to the depth) where an example of the first query is the one showed above to get the top10. Then I do another query with the results of the first one (the top10 nodes) to get the top10 of the top10 ... This takes time. I am not sure if this is the most optimal way to do it, perhaps I should use one of the algorithms in the apoc suite?

To answer your question, given a depth of 3 with some filtering on the pmi score, it takes 2 to 4 seconds

Cheers

Neo4j

Relationships Index for sorting