Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
04-08-2022 04:17 AM
Hi everyone,
I have a query like this:
WITH event AS data
MERGE (i:Individual {id: data.id})
SET
i.corporateTitleEnumId = data.corporateTitleEnumId,
i.costCenter = data.costCenter,
i.department = data.department,
i.externalId = data.externalId,
i.email = data.email,
i.emailVerified = data.emailVerified,
i.fax = data.fax,
i.firstName = data.firstName,
i.genderEnumId = data.genderEnumId,
i.languageEnumId = data.languageEnumId,
i.lastName = data.lastName,
i.middleName = data.middleName,
i.mobile = data.mobile,
i.officeNumber = data.officeNumber,
i.phone1 = data.phone1,
i.phone2 = data.phone2,
i.pCreated = data.pCreated,
i.pModified = data.pModified,
i.titleEnumId = data.titleEnumId,
i.userId = data.userId,
i.userPrincipalName = data.userPrincipalName,
i.uuid = data.uuid
WITH i, data
MERGE (a:Address{id: COALESCE(data.addressId,'')})
MERGE (i)-[:HAS_ADDRESS]->(a)
MERGE (e:ExternalID:AGO{value: COALESCE(data.agoPersonId,''), type: "Person ID"})
MERGE (i)-[:HAS_EXTERNAL_ID]->(e)
MERGE (c:Customer{id: COALESCE(data.customerId,'')})
MERGE (i)<-[:IS_RELATED_WITH]-(c)
MERGE (e1:ExternalID:IAS{value: COALESCE(data.iasvSubjectId,''), type: "Subject ID"})
MERGE (i)-[:HAS_EXTERNAL_ID]->(e1)
MERGE (e2:ExternalID:IAS{value: COALESCE(data.iasPersonId,''), type: "Person ID"})
MERGE (i)-[:HAS_EXTERNAL_ID]->(e2)
MERGE (e3:ExternalID:ILM{value: COALESCE(data.ilmEmployeeId,''), type: "Employee ID"})
MERGE (i)-[:HAS_EXTERNAL_ID]->(e3)
MERGE (e4:ExternalID:IPP{value: COALESCE(data.ippContactId,''), type: "Contact ID"})
MERGE (i)-[:HAS_EXTERNAL_ID]->(e4)
MERGE (e5:ExternalID:SAP{value: COALESCE(data.sapContactId,''), type: "Contact ID"})
MERGE (i)-[:HAS_EXTERNAL_ID]->(e5)
MERGE (e6:ExternalID:SFR{value: COALESCE(data.sfrIndividualId,''), type: "Individual ID"})
MERGE (i)-[:HAS_EXTERNAL_ID]->(e6)
MERGE (e7:ExternalID:SYM{value: COALESCE(data.symvContactId,''), type: "Contact ID"})
MERGE (i)-[:HAS_EXTERNAL_ID]->(e7)
MERGE (e8:ExternalID:SYM{value: COALESCE(data.symContactId,''), type: "Contact ID"})
MERGE (i)-[:HAS_EXTERNAL_ID]->(e8)
MERGE (e9:ExternalID:TIS{value: COALESCE(data.tisContactId,''), type: "Contact ID"})
MERGE (i)-[:HAS_EXTERNAL_ID]->(e9)
I get the data from Kafka but it is the case that there might be duplicates and null values. Since this query is very heavy (regarding performance) the load time is extremely high. Does anyone have any idea on how to optimise this query while considering duplicate and null values removal? I tried to use apoc.periodic.iterate but it does not seem to have a huge impact. Thank you in advance!
Solved! Go to Solution.
04-08-2022 06:20 AM
That is a crazy difference. Do you have indexes on many or all of the keys you are merging on?
04-08-2022 05:14 AM
You could try something like the following. If the attributes are all the same between node(i) and data, then you can set i = data. The call blocks will not attempt the merges if the attribute is null. Since i = data, you probably only need to pass 'I' in the 'with' clauses, but all your code uses data, so I passed it so you could just cut/paste what you have into the call blocks and make minor adjustments. You need the double 'with' clauses in the call blocks, as it complains about a 'where' clause following the first 'with' clause, so it's a work around.
Maybe it will help.
WITH event AS data
MERGE (i:Individual {id: data.id})
SET i = data
WITH i, data
CALL {
with i, data
with i, data
where data.addressid IS NOT NULL
MERGE (a:Address{id: data.addressId})
MERGE (i)-[:HAS_ADDRESS]->(a)
}
// add all the other attributes to be set using the same call pattern
CALL {
with i, data
with i, data
where data.tisContactId IS NOT NULL
MERGE (e9:ExternalID:TIS{value: data.tisContactId, type: "Contact ID"})
MERGE (i)-[:HAS_EXTERNAL_ID]->(e9)
}
04-08-2022 05:50 AM
Thank you for your answer. I will give it a try. Just to create and idea that while I am using the MERGE everywhere alongside with apoc.periodic.itearate, it takes around 10-20 mins to create/merge 30,000 nodes. While when I just use CREATE only it takes 5-10 mins to create 10 million nodes and 8 million relationships. I guess MERGE is supposed to be slower but not this much slow or am I wrong?
04-08-2022 06:20 AM
That is a crazy difference. Do you have indexes on many or all of the keys you are merging on?
04-08-2022 07:37 AM
I had indexes on some of them but not all of the properties used for merging. Now, the data import is faster (reasonable time compared with CREATE).
All the sessions of the conference are now available online