Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-19-2021 04:42 AM
Hi all,
Currently we are evaluating using Full-text search (FTS) index for some of our use cases. In one of the use cases we want to show data matching hashtag and results are sorted by a property (creation time) and we were not able to sort by property and limit the results with FTS index query method (db.index.fulltext.queryNodes).
Here is simplified data model:
MERGE (g:Group {id: string})
MERGE (a:Article {
content: string, group_id: string,
created_at: unixtimestamp,
created_at_str: "20210819T1200"})
MERGE (g)-[:CONTAINS]->(a)
Example case: group1 has 90.000 articles, group2 has 10.000 articles
// index
call db.index.fulltext.createNodeIndex("fts__article",
["Article"], ["content", "group_id", "created_at_str"]);
// query most recent articles with very common hashtag
// (lets assume this hashtag is available in all articles,
// neo4j is sorting 100k articles outside of full text search index
with 'group_id: (group1 OR group2) AND content: "#verycommonhashtag"' as query
call db.index.fulltext.queryNodes("fts__article", query)
yield node as a, score
return a
order by a.created_at desc
limit 100
Imagine now we have 1M matching articles and query is getting slower and slower because of sorting 1M data.
We have considered boosting results by creation time and limit results to order by doing:
with 'group_id: (group1 OR group2) AND content: "#verycommonhashtag" AND (created_at_str: 202108*^10 OR created_at_str: 202107*^8 OR created_at_str:202*^1)' as query
call db.index.fulltext.queryNodes("fts__article", query, {limit: 1000}
yield node as a, score
return a
order by a.created_at desc
limit 100
which boosts articles from last 2 month then limits results by 1000, then we order outside of FTS index. This works fine if distribution of data on each group is almost similar every day or every month across all groups. But we have some groups which are very active in some days and in-active in some other weeks, which makes it difficult to create boosting condition, should boosting criteria be by month (202108^10), by week (2021-week-34^10), by day (20210819^10) or even by hour?
Imagine 1 group had 2.000 articles on a single day and we are boosting by day or week, when we limit by 1.000 we are missing some articles which might be more recent than articles returned as a result of limiting
Based on these observations, we are thinking maybe ordering inside Lucene might be helpful for us and faster. Still 1M articles should be sorted somewhere if they match criteria, either by Lucene or by Neo4j, considering that data is already in Lucene it might be faster than (1) fetching matched results from Lucene (2) then order in Neo4j.
Do you guys have plans to add sort order in FTS index?
Thanks a lot for awesome product!
08-19-2021 01:05 PM
one thing you can try to utilize both indexes the regular one on :Article(created_at)
and the FTS
with 'group_id: (group1 OR group2) AND content: "#verycommonhashtag" AND (created_at_str: 202108*^10 OR created_at_str: 202107*^8 OR created_at_str:202*^1)' as query
call db.index.fulltext.queryNodes("fts__article", query, {limit: 1000}
yield node as a, score
match (b:Article) where b.created_at > 0 // or < datetime()
and a=b // alternatively where id(b) = id(a)
return b
order by b.created_at desc
limit 100
08-20-2021 06:07 AM
Hi Michael,
Thanks for taking time to look into my question/feature request.
you should not merge on multiple properties just the id(s) you have in your constraint
Not sure I understand this statement, what do you mean here, can you please expand little bit more what it means "merging" here? I am only returning results from FTS index ordered by time
regarding query you have shared, I have 2 questions:
where b.created_at > 0 and id(b) == id(a)
? (because we already have 1000 articles from FTS, why should we bother another index again to sort data?)Thanks again!
08-18-2022 05:54 AM
Hi @Ika I have the same issue as you. Did you find a solution for your use case?
All the sessions of the conference are now available online