Neo4j

awesomeanonymou · ‎10-08-2021

Let me describe my use case with a full-text index:

I want to find nodes with similar values inside a property.

I get this error when querying nodes using the full-text index:

Query:

MATCH (e: Email)
WITH e
CALL db.index.fulltext.queryNodes('convEmailFtIndex', e.convEmail)
YIELD node, score
RETURN e.email, node.convEmail, node.email, score ORDER BY e.email LIMIT 1000

Error:

ERROR 
Neo.ClientError.Procedure.ProcedureCallFailed
Failed to invoke procedure db.index.fulltext.queryNodes: Caused by: org.apache.lucene.queryparser.classic.ParseException: Encountered "<EOF>" at line 1, column 0.
Was expecting one of:
<NOT> ...
"+"...
"-"...
<BAREOPER> ...
"(" ...
"*"...
<QUOTED>...
<TERM>...
<PREFIXTERM>...
<WILDTERM>...
<REGEXPTERM>...
"["...
"{"...
<NUMBER>...
<TERM>...

My Email nodes have a property called convEmail, which contains only the alphabetic characters of the email address, with digits and special characters removed.

This is how I created the full-text index:
CALL db.index.fulltext.createNodeIndex('convEmailFtIndex', ['Email'], ['convEmail'], {analyzer: 'standard-no-stop-words'});

It would be very helpful if someone can help me resolve this. Thank you.

Cobra · ‎10-08-2021

Hello @awesomeanonymously88

Your query looks good.
Can you provide some data to recreate your Email nodes please?
Which version of Neo4j are you using?

Regards,
Cobra

awesomeanonymou · ‎10-08-2021

Hi @Cobra, thanks for the response.
I am using Neo4j version 4.2.5.
And sorry, I cannot provide email information as those are customer data in our business.
One thing I can confirm is that I have preprocessed the emails to contain only alphabetic characters for better querying.

However, if I pass one specific email id to the full-text index query, instead of feeding in e.convEmail from every node to get the top 1000 records, I do get results.

But in our use case, we need to find the top matching (most similar) emails for each email so that we can merge those nodes together.

(PS. I have already tried APOC text functions such as Jaro-Winkler to compute the similarity between two emails, but unfortunately that takes too much time. Our database has more than 10 million email ids, and we need to find the top matching email for each one.)

It would be great if you could help me solve this problem. Thank you.

Cobra · ‎10-08-2021

This is my dataset: email.txt (26.1 KB). You must change the file extension to .csv.

I used Neo4j version 4.3.5.

First, I created the nodes:

LOAD CSV WITH HEADERS FROM 'file:///email.csv' AS row
WITH row
MERGE (c:Email {id: row.id})
SET c.email = row.email

Then, I created the index:

CREATE FULLTEXT INDEX email FOR (n:Email) ON EACH [n.email]

Finally, I tested your query:

MATCH (e:Email)
CALL db.index.fulltext.queryNodes('email', e.email)
YIELD node, score
RETURN e.email, node.email, score
ORDER BY e.email

Everything worked on my side; this is the result: export.txt (195.0 KB)

I also tried with the option:

CREATE FULLTEXT INDEX email FOR (n:Email) ON EACH [n.email] OPTIONS {indexConfig: {`fulltext.analyzer`: 'url_or_email'}}

The query also worked.

Regards,
Cobra

awesomeanonymou · ‎10-08-2021

Thank you.
Btw, do you have any idea about the error I posted? What does it mean, and what needs to be done?

Cobra · ‎10-08-2021

Did you try another analyser in the option?
Can you try to update your database to the latest version?
I think it could also come from your data.
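
One way to check the data, as a rough sketch: Lucene's classic query parser is known to report Encountered "<EOF>" at line 1, column 0 when it is handed an empty query string, so a blank convEmail (for example, an address that had only digits before the local part was stripped) could plausibly trigger this error. Whether such values exist in the database is an assumption; the query below only counts them.

```cypher
// Count Email nodes whose convEmail is missing or blank.
// A non-zero result would support the "empty search string" explanation.
MATCH (e:Email)
WHERE e.convEmail IS NULL OR trim(e.convEmail) = ''
RETURN count(e) AS blankConvEmails
```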

Can you try adding a LIMIT 2 after the WITH e and tell me whether the query still works?
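
The LIMIT 2 test could look like the sketch below. The WHERE clause that skips blank convEmail values is an addition that is not in the original query; it rests on the assumption that an empty search string is what the parser rejects, so try the query with and without that line.

```cypher
MATCH (e:Email)
// Assumption: skip nodes that would send an empty string to Lucene.
WHERE e.convEmail IS NOT NULL AND trim(e.convEmail) <> ''
// Restrict the test to two source nodes, as suggested above.
WITH e LIMIT 2
CALL db.index.fulltext.queryNodes('convEmailFtIndex', e.convEmail)
YIELD node, score
RETURN e.email, node.convEmail, node.email, score
ORDER BY e.email, score DESC
```

If this runs cleanly with the filter but fails without it, the blank values are the likely cause.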

Error while cypher-querying nodes using full-text index