10-08-2021 05:11 AM
Here is my use case for a full-text index:
I want to find nodes that have similar values in a given property.
I am getting the following error when querying nodes with the full-text index:
Query:
MATCH (e: Email)
WITH e
CALL db.index.fulltext.queryNodes('convEmailFtIndex', e.convEmail)
YIELD node, score
RETURN e.email, node.convEmail, node.email, score ORDER BY e.email LIMIT 1000
Error:
ERROR
Neo.ClientError.Procedure.ProcedureCallFailed
Failed to invoke procedure db.index.fulltext.queryNodes: Caused by: org.apache.lucene.queryparser.classic.ParseException: Encountered "<EOF>" at line 1, column 0.
Was expecting one of:
<NOT> ...
"+"...
"-"...
<BAREOPER> ...
"(" ...
"*"...
<QUOTED>...
<TERM>...
<PREFIXTERM>...
<WILDTERM>...
<REGEXPTERM>...
"["...
"{"...
<NUMBER>...
<TERM>...
Each Email node has a property called convEmail, which contains only the letters of the email address; digits and special characters are removed.
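The preprocessing described above can be sketched as follows. This is a minimal sketch, not the poster's actual code: the function name `to_conv_email` is hypothetical, and it assumes "alphabets" means the ASCII letters A-Z and a-z, with case preserved. It also shows one consequence worth checking: an address with no ASCII letters at all maps to an empty string.

```python
import string

# Assumption: "alphabets" means ASCII letters only; case is kept as-is.
ASCII_LETTERS = set(string.ascii_letters)


def to_conv_email(email: str) -> str:
    """Hypothetical helper: keep only the ASCII letters of an email address."""
    return "".join(ch for ch in email if ch in ASCII_LETTERS)


if __name__ == "__main__":
    print(to_conv_email("john.doe99@example.com"))  # johndoeexamplecom
    print(repr(to_conv_email("1234@5678")))         # '' (no letters survive)
```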
This is how I created the full-text index:
CALL db.index.fulltext.createNodeIndex('convEmailFtIndex', ['Email'], ['convEmail'], {analyzer: 'standard-no-stop-words'});
It would be very helpful if someone could help me resolve this. Thank you.
10-08-2021 12:36 PM
Hello @awesomeanonymously88
Your query looks good.
Could you provide some data so I can recreate your Email nodes?
Which version of Neo4j are you using?
Regards,
Cobra
10-08-2021 01:17 PM
Hi @Cobra, thanks for the response.
I am using Neo4j version 4.2.5.
Sorry, I cannot share the email data, since it is customer data from our business.
One thing I can confirm is that I have preprocessed the emails to keep only letters, for better querying.
However, if I query the full-text index with one specific email value, instead of passing each node's value generically to get the top 1000 records, I do get results.
But in our use case, we need to find the top matching (similar) emails for each email so that we can merge those nodes together.
(PS: I have already tried APOC text functions such as Jaro-Winkler to measure the similarity between two emails, but unfortunately that takes too much time. Our database has more than 10 million email addresses, and we need to find the top matching email for each one.)
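For context on the cost mentioned above: comparing every pair among n emails means scoring n(n - 1)/2 pairs, which for n = 10,000,000 is 49,999,995,000,000 (about 5 x 10^13). Below is a minimal pure-Python sketch of the textbook Jaro-Winkler similarity (scaling factor p = 0.1, common prefix capped at 4) together with that pair count. It illustrates the standard formula only; it is not APOC's implementation, and its scores are not guaranteed to match APOC's exactly.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters "match" only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order; half of them are transpositions.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler similarity: boosts the Jaro score for a shared prefix."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return score + prefix * p * (1.0 - score)


def pair_count(n: int) -> int:
    """Number of unordered pairs an all-pairs comparison of n items must score."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
    print(pair_count(10_000_000))                      # 49999995000000
```

Even at a few microseconds per comparison, that many pairs is far beyond what an all-pairs scan can finish in reasonable time, which is consistent with the slowdown described above.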
It would be great if you could help me solve this problem. Thank you.
10-08-2021 01:53 PM
This is my dataset: email.txt (26.1 KB). You must change the file extension to .csv.
I used Neo4j version 4.3.5.
First, I created the nodes:
LOAD CSV WITH HEADERS FROM 'file:///email.csv' AS row
WITH row
MERGE (c:Email {id: row.id})
SET c.email = row.email
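LOAD CSV WITH HEADERS turns each data row into a map keyed by the header line, so `row.id` and `row.email` above refer to the `id` and `email` columns. Python's `csv.DictReader` behaves the same way, which makes it a convenient way to sanity-check the file before loading it. The sample rows and the helper names below are hypothetical, and flagging blank emails is an added check, not part of the original steps.

```python
import csv
import io

# Hypothetical sample with the header layout the LOAD CSV query expects (id,email).
SAMPLE = "id,email\n1,alice@example.com\n2,bob.smith@example.org\n3,\n"


def read_rows(text: str) -> list:
    """Parse CSV text into one dict per data row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def blank_email_ids(rows: list) -> list:
    """Return the ids of rows whose email column is empty or whitespace-only."""
    return [row["id"] for row in rows if not (row.get("email") or "").strip()]


if __name__ == "__main__":
    rows = read_rows(SAMPLE)
    print(rows[0]["email"])        # alice@example.com
    print(blank_email_ids(rows))   # ['3']
```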
Then, I created the index:
CREATE FULLTEXT INDEX email FOR (n:Email) ON EACH [n.email]
Finally, I tested your query:
MATCH (e:Email)
CALL db.index.fulltext.queryNodes('email', e.email)
YIELD node, score
RETURN e.email, node.email, score
ORDER BY e.email
Everything worked on my side; this is the result: export.txt (195.0 KB)
I also tried it with this option:
CREATE FULLTEXT INDEX email FOR (n:Email) ON EACH [n.email] OPTIONS {indexConfig: {`fulltext.analyzer`: 'url_or_email'}}
The query also worked.
Regards,
Cobra
10-08-2021 03:49 PM
Thank you.
By the way, do you have any idea what the error I posted means, or what needs to be done to fix it?
10-08-2021 10:56 PM
Did you try another analyzer in the index options?
Can you try updating your database to the latest version?
I think it could also come from your data.
Can you try adding LIMIT 2 after the WITH e and tell me whether the query works then?
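One plausible reading of the error (an assumption, not confirmed in this thread): Lucene's classic query parser reports `Encountered "<EOF>" at line 1, column 0` when it is handed an empty query string, so some Email nodes may have an empty convEmail (null or whitespace-only values may fail in other ways). The sketch below shows a client-side guard: skip blank values and backslash-escape the characters that Lucene's classic syntax treats as special before using a value as a query. The function names are hypothetical, and the escaped character set is my understanding of Lucene's `QueryParser.escape`. In Cypher, the corresponding (untested) filter would be `WITH e WHERE e.convEmail IS NOT NULL AND trim(e.convEmail) <> ''` placed before the `CALL`.

```python
from typing import Optional

# Characters escaped by Lucene's classic QueryParser.escape (to my understanding).
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(text: str) -> str:
    """Hypothetical helper: backslash-escape Lucene classic query syntax characters."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def safe_query(value: Optional[str]) -> Optional[str]:
    """Return an escaped query string, or None when the value should be skipped.

    None and blank strings are skipped because an empty query cannot be parsed.
    """
    if value is None or not value.strip():
        return None
    return escape_lucene(value)


if __name__ == "__main__":
    for v in ["johndoeexamplecom", "", "   ", None, "a+b"]:
        print(repr(v), "->", repr(safe_query(v)))
```

Since convEmail is supposed to contain letters only, escaping should rarely change anything; the blank-value check is the part most relevant to this particular error.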