Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
07-01-2020 04:25 AM
Hello, I have a question about the ORDER BY performance of neo4j.
Node - 10,001,203 EA
Releationship - 11,251,324 EA
| 1 | "article_index" | ["Article"] | ["articleId", "officeId"] | "native-btree-1.0" |
| 2 | "office_index" | ["Article"] | ["officeId"] | "native-btree-1.0" |
| 3 | "order_index" | ["Article"] | ["updateDateTime"] | "native-btree-1.0" |
MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) RETURN a LIMIT 100
MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'}) RETURN a ORDER BY a.updateDateTime DESC LIMIT 100
The number of 'Article' nodes is about 10 million.
and The number of '(a:Article {officeId: '001'})' is 227,245.
There are many performance differences depending on whether ORDER BY is used or not.
Is there any other way to address ORDER BY performance issues?
Thanks
07-01-2020 09:57 AM
Can you try this?
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a
07-01-2020 10:22 AM
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
RETURN a LIMIT 100
The above query itself is slow. it takes about 400ms
Is there a problem with the 'updateDateTime' index?
Or need something like a descending index?
The 'Article' Node properties are as follows.
"properties": {
"contents.title": "...",
"contents.content": "...",
"officeId": "001",
"articleId": "0009656943",
"contents.section": "10464f000",
"insertDateTime": "20180820035121",
"updateDateTime": "20181029013007",
"refinedContent": "..."
}
07-01-2020 10:32 AM
Do you have indexes on officeId
and updateDateTime
?
Also, can you please profile the query and share the query execution plan?
PROFILE
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a
07-01-2020 10:37 AM
I have indexes on officeId and updateDateTime.
PROFILE
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a
07-01-2020 11:30 AM
And are the results returned correct?
I am trying to think of a way to make this faster than 282ms. Number of (a:Article {officeId: '001'})
is 227,245. How many nodes are (s:Section {sectionId:'104'})
?
07-01-2020 05:18 PM
We can try to leverage index-backed ordering, but this requires a means to hint the type of the property, such as in a WHERE clause against a value of the same type. Can you give this a try?
PROFILE
MATCH (a:Article {officeId:'001'})
where a.updateDateTime < dateTime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
RETURN a
07-01-2020 07:53 PM
PROFILE
MATCH (a:Article {officeId:'001'})
where a.updateDateTime < dateTime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
RETURN a
No result from the above query. I think where a.updateDateTime < dateTime()
clause doesn't seem to work.
updateDateTime format
"updateDateTime": "20181029013007"
Profile
Is this right condition between updateDateTime and dateTime()?
07-01-2020 10:15 PM
Ah, I thought you were using a dateTime type. If you're using the epoch time, then maybe something like WHERE a.updateDateTime < timestamp()
for that segment.
07-01-2020 10:33 PM
I just changed the property 'insertDateTime' to dateTime type with the query below
CALL apoc.periodic.iterate(
"MATCH (a:Article) RETURN a",
"SET a.insertDateTime = datetime({epochmillis: apoc.date.parse(a.insertDateTime, 'ms', 'yyyyMMddhhmmss')})",
{batchSize:1000, parallel:true})
I made a Index on 'insertDateTime' property and I ran the command you recommend
PROFILE
MATCH (a:Article {officeId:'001'})
where a.insertDateTime < datetime() AND (a)<-[:CONSISTS_OF]-(:Section {sectionId:'104'})
WITH a
ORDER BY a.insertDateTime DESC
LIMIT 100
RETURN a
Below query is also slow ..
PROFILE MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
WHERE a.insertDateTime < dateTime()
RETURN a ORDER BY a.insertDateTime DESC LIMIT 100
And this query is faster than both above
PROFILE MATCH (a:Article {officeId:'001'})<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a ORDER BY a.insertDateTime DESC LIMIT 100
Is leverage index-backed ordering working?
07-01-2020 07:15 PM
MATCH (a:Article {officeId:'001'})
WITH a
ORDER BY a.updateDateTime DESC
LIMIT 100
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
RETURN a
The result of returned is not correct.There are only seven results.
MATCH (a)<-[r:CONSISTS_OF]-(s:Section {sectionId:'104'})
I think the above query is executed as a result from 'ORDER BY a.updateDateTime DESC LIMIT 100'.
And Number of (s:Section {sectionId:'104'})
is only one node
All the sessions of the conference are now available online