08-13-2020 09:47 PM
Hi
I would like to solve some challenges I have with customer de-duplication.
I thought of a graph database as a candidate for this challenge.
I found this reference, but unfortunately the full article is missing.
Does anyone have any information or ideas about this issue?
Thanks
Tal
08-14-2020 02:12 PM
I think you are looking for information on a research topic within Natural Language Processing (NLP).
Check out the Neo4j chapter on NLP.
These procedures support entity extraction, key phrase extraction, sentiment analysis, and document classification.
Other examples
08-16-2020 02:25 AM
Hi Joel,
Sorry, no. I'm not looking for NLP. The article I'm looking for is about the de-duplication challenge. The link I added points to an article that deals with it, but I can't find the rest of the article.
Thanks Tal
08-19-2020 07:46 AM
Hi Tal, I'll rephrase: de-duplicating names is an NLP challenge, and there are NLP solutions built to solve it. Best of luck! Regards, Joel
08-23-2020 03:59 AM
Hi Joel
In this article, the issue is building a data structure that supports matching AFTER the process,
meaning deciding whether a combination of matches points to the same person. For example, if date of birth, gender, and address are all found to be the same, it is the same person.
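To make the rule above concrete, here is a minimal Python sketch (not taken from the missing article). It groups records that agree on date of birth, gender, and a lightly normalized address. The field names, the sample records, and the normalization step are all assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions, not a real schema.
customers = [
    {"id": 1, "name": "Dana Levi", "birth_date": "1985-03-12",
     "gender": "F", "address": "12 Herzl St, Haifa"},
    {"id": 2, "name": "D. Levi", "birth_date": "1985-03-12",
     "gender": "F", "address": "12 herzl st,  haifa "},
    {"id": 3, "name": "Dan Levy", "birth_date": "1990-07-01",
     "gender": "M", "address": "3 Oak Ave"},
]


def match_key(record):
    """Build the comparison key: birth date, gender, and a normalized address."""
    # Lowercase and collapse whitespace so trivial formatting differences are ignored.
    address = " ".join(record["address"].lower().split())
    return (record["birth_date"], record["gender"], address)


def find_duplicate_groups(records):
    """Return lists of record ids that share the same match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


duplicate_groups = find_duplicate_groups(customers)
print(duplicate_groups)  # [[1, 2]]
```

In a graph model, each group could become a relationship between the customer nodes it contains. That mapping is a suggestion, not something the thread confirms.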
I'm looking for the article that discusses this, since when I open the full article I get a 404 ...
Tx
Tal
08-23-2020 10:41 PM
Mindmajix Neo4j online training describes what a graph database is, how to install Neo4j, how to query graphs in Neo4j with its query language, Cypher, and how to add and manipulate data. All these topics are well covered in the training curriculum to help learners get better insight.
08-23-2020 11:46 PM
If your situation is similar to the scenario mentioned in your referenced article, then this is the same as in money-laundering schemes.
In these scenarios, you can build a similarity relationship between two customer names. For this, use Jaro-Winkler similarity.
To explain this in simple terms:
Consider two simple words: 'coronavirus' and 'cornivorus'. They are almost the same length (11 and 10 letters) and share nearly all of their letters, but some of those letters appear in a different order in 'cornivorus'.
Now we know the invisible connection between these two words!
Here comes the similarity:
// Scale the Jaro-Winkler score to an integer percentage
WITH "coronavirus" AS norm1, "cornivorus" AS norm2
RETURN toInteger(apoc.text.jaroWinklerDistance(norm1, norm2) * 100) AS similarity
Result: 96
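For readers who want to see what a Jaro-Winkler score measures, here is a minimal Python sketch of the textbook formulation. It is not APOC's implementation, and implementations differ in details such as how transpositions are rounded, so it may not reproduce the value above exactly. The function names and the default scaling factor of 0.1 are choices made for this sketch.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters count as a match only if they are this close in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up in a different order are transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The prefix boost is why names that start the same way score higher than names that differ early, which tends to suit customer-name matching.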
You can set your own similarity threshold for when to build a similarity relationship, and this should help you address de-duplication scenarios.
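As a rough illustration of the threshold idea, the Python sketch below links every pair of names whose score meets a threshold and then groups linked names into clusters of likely duplicates. To stay short it uses `difflib.SequenceMatcher` from the standard library as a stand-in score instead of Jaro-Winkler. The names and the 0.85 threshold are hypothetical and would need tuning on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical customer names; the threshold is an assumption to tune on real data.
names = ["Jonathan Smith", "Jonathon Smith", "Jonathan Smyth", "Maria Garcia"]
THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Stand-in score in [0, 1]; a Jaro-Winkler function could be swapped in."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similar_pairs(items, threshold):
    """Return index pairs whose score meets the threshold (the 'relationships')."""
    return [
        (i, j)
        for i, j in combinations(range(len(items)), 2)
        if similarity(items[i], items[j]) >= threshold
    ]


def clusters(n, pairs):
    """Group indices connected by at least one similarity edge (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


pairs = similar_pairs(names, THRESHOLD)
for group in clusters(len(names), pairs):
    print([names[i] for i in group])
```

In Neo4j, the pairs would correspond to similarity relationships between customer nodes, and each cluster to a connected group of candidate duplicates. A lower threshold merges more aggressively, so it is worth reviewing borderline pairs by hand.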