Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
05-18-2022 07:03 AM
Hi everyone,
I have the following situation. I am sinking data from Kafka to Neo4j using the Kafka connector Neo4j connector. I have 44 topics from which I should read from and some of the topics are intersecting with each other, which means that during the data creation, I can have locks. The reason why the locks exist is that I am using multiple Merge operations since I need to deal with Null and Duplicate values. An example query is shown below:
WITH event AS data
MERGE (cs:CustomerSite { id: data.id })
ON CREATE SET cs = data
ON MATCH SET cs = data
WITH data, cs
CALL apoc.do.when( data.customerId IS NOT NULL,
\" MERGE (c:Customer{id: data.customerId}) MERGE (c)-[:HAS_SITE]->(cs) \",
\" RETURN True \",
{data: data, cs: cs}
) YIELD value
WITH data, cs
CALL apoc.do.when( data.mainAddressId IS NOT NULL,
\" MERGE (a:Address{id: data.mainAddressId}) MERGE (cs)-[:IS_LOCATED]->(a) \",
\" RETURN True \",
{data: data, cs: cs}
) YIELD value
WITH data, cs
CALL apoc.do.when( data.partnerId IS NOT NULL,
\" MERGE (cp:Customer:Partner{id: data.partnerId}) MERGE (cp)-[:HAS_SITE]->(cs) \",
\" RETURN True \",
{data: data, cs: cs}
) YIELD value
WITH data, cs
CALL apoc.do.when( data.symCsaId IS NOT NULL,
\" MERGE (e:ExternalID:SYM{value: data.symCsaId, type: 'CSA ID'}) MERGE (cs)-[:HAS_EXTERNAL_ID]->(e) \",
\" RETURN True \",
{data: data, cs: cs}
) YIELD value
WITH data, cs
CALL apoc.do.when(data.deleted = 1,
\" DETACH DELETE cs \",
\" RETURN True \",
{data: data, cs: cs}
) YIELD value
RETURN 'done'
Due to locking on nodes while the relationships are being created, I get a very poor performance from Neo4j side and even sometimes the Kafka consumer disconnects due to the huge processing time needed on Neo4j. The amount of data that I am trying to load is around 80mil nodes and 80mil relationships. I could see that Kafka sends batches of data that consist of different topics and therefore it increases significantly the possibility of locks. When I have only one topic per connector and the connectors are not all at the same time up and running, the performance is quite decent.
Do you have any idea on a possible improvement on the performance on the Neo4j side? Or on the Kafka side, to trigger Kafka to send only batches of one topic at a time and not a mix of topics? My configurations consist of a cluster with 3 core replicas.
Heap size = 20G
Pagecache size = 60G
Transaction max size = 2G
Thanks in advance!
Solved! Go to Solution.
06-10-2022 04:35 PM
06-10-2022 04:35 PM
All the sessions of the conference are now available online