Neo4j

prateek_sethi · 08-20-2021

Hi, I am running the following query:

MATCH (fn1:full_name)
MATCH (fn2:full_name)
WHERE fn1.full_name <> fn2.full_name
  AND apoc.text.fuzzyMatch(fn1.full_name, fn2.full_name)
MERGE (fn1)-[:FUZZY_MATCH]-(fn2)

which is currently taking more than an hour to run. The graph consists of approximately 54K full_name nodes. The idea is to create a connection between similar names.
Is there a way for me to optimize this process?

(Screenshot of query map for reference)

andreperez · 08-20-2021

Hi, have you set up indexes already?

prateek_sethi · 08-20-2021

Even index creation is taking a while, but I'm currently creating an index on the full_name nodes. I'll report back on performance once it finishes.

andreperez · 08-20-2021

Also, you could look into creating a uniqueness constraint. I believe this can speed things up even more (but at the cost of requiring the field to be unique).
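
As a rough sketch of what those two options could look like, assuming Neo4j 4.x syntax (the names full_name_index and full_name_unique are made up for illustration):

```cypher
// Option 1: a plain index on the name property.
CREATE INDEX full_name_index FOR (n:full_name) ON (n.full_name);

// Option 2: a uniqueness constraint. It is backed by an index of its own,
// so use it instead of Option 1 rather than alongside it.
// Creation fails if two nodes already share the same full_name.
CREATE CONSTRAINT full_name_unique ON (n:full_name) ASSERT n.full_name IS UNIQUE;
// Newer releases write this as: ... FOR (n:full_name) REQUIRE n.full_name IS UNIQUE
```

Either one helps exact lookups on full_name. A predicate like apoc.text.fuzzyMatch is evaluated for each pair of nodes and is not answered from an index, so this alone may not change the runtime of the query much.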

prateek_sethi · 08-20-2021

Is there anything I'm missing out on in terms of query tuning? I was hoping that might be an area to explore as well.
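
For reference, one direction the tuning could take. With about 54K nodes, the two MATCH clauses produce roughly 54,000 × 54,000 ≈ 2.9 billion ordered pairs, and fuzzyMatch runs on nearly all of them. The sketch below is untested against this dataset and rests on two assumptions: that comparing each unordered pair once is enough (the relationship is merged without a direction anyway), and that APOC's apoc.periodic.iterate is available to commit the work in batches. The batch size of 100 is an arbitrary starting point.

```cypher
// Outer statement streams each node once; the inner statement runs per batch.
CALL apoc.periodic.iterate(
  "MATCH (fn1:full_name) RETURN fn1",
  "MATCH (fn2:full_name)
   WHERE id(fn1) < id(fn2)                    // visit each unordered pair once
     AND fn1.full_name <> fn2.full_name       // same filter as the original query
     AND apoc.text.fuzzyMatch(fn1.full_name, fn2.full_name)
   MERGE (fn1)-[:FUZZY_MATCH]-(fn2)",
  {batchSize: 100, parallel: false}           // sequential, to avoid lock contention on MERGE
);
```

The id(fn1) < id(fn2) filter roughly halves the number of fuzzyMatch calls, and batching keeps each transaction small instead of holding every new relationship in one commit. The work is still quadratic in the number of names, so a cheap pre-filter (for example, only comparing names that share a first letter) would be the next thing to consider if that loss of recall is acceptable.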

Neo4j: Optimize MATCH and MERGE based on fuzzy match