06-30-2021 06:35 AM
I'm using the Pyingest script to read 10 CSV files with 10k rows and 7 columns each. The best result I've had so far was with a chunk size of 10,000.
My PC has 16 GB of RAM and a 4-core/8-thread CPU.
The first two files always take about 1-2 minutes to write, then the time grows a bit, and by the last file the write takes around 5 minutes.
My DB Heap configs are the following:
# Java Heap Size: by default the Java heap size is dynamically calculated based
# on available system resources. Uncomment these lines to set specific initial
# and maximum heap size.
dbms.memory.heap.initial_size=5G
dbms.memory.heap.max_size=5G
# The amount of memory to use for mapping the store files.
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 50% of RAM minus the Java heap size.
dbms.memory.pagecache.size=7G
I got these values from neo4j-admin memrec (not mine, actually: I'm using the AppImage version of Neo4j, so I can't run memrec; I found them on this forum, posted for a similar machine).
Since Pyingest uses pandas to optimize the CSV reads, I suspect the timings are getting slower because of garbage-collector pressure.
I'm trying to get the best result I can on my PC, so that when I move this to a server (much more powerful than my machine, obviously) it will perform beautifully.
Is there any way to optimize further?
EDIT: I forgot to put the queries I'm using.
WITH $dict.rows as rows UNWIND rows as row
MERGE (a: Cookie_id {domain: row.domain})
MERGE (b: OS {version: row.version})
MERGE (c: Device_type {classification: row.classification})
MERGE (d: Device_model {model: row.model})
MERGE (e: IP {addr: row.addr})
MERGE (f: Access_time {hour_group: row.hour_group})
MERGE (g: Access_day {is_weekend: row.is_weekend})
MERGE (a)-[:USING_OS]->(b)
MERGE (a)-[:BY_TYPE]->(c)
MERGE (a)-[:ACCESSED_BY]->(d)
MERGE (a)-[:HAS_IP]->(e)
MERGE (a)-[:ACCESSED_AT_TIME]->(f)
MERGE (a)-[:ACCESSED_AT_DAY]->(g)
RETURN a
06-30-2021 06:56 AM
Hi there! Have you created indexes on the nodes you're MERGE-ing?
As you may know, MERGE = MATCH + CREATE, so creating indexes on the MERGE'd label/property pairs speeds up the preliminary MATCH.
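For example, one index per label/property pair used in the MERGE clauses would look roughly like this (Neo4j 4.x syntax; the index names here are just illustrative):

```cypher
CREATE INDEX cookie_domain IF NOT EXISTS FOR (n:Cookie_id)    ON (n.domain);
CREATE INDEX os_version    IF NOT EXISTS FOR (n:OS)           ON (n.version);
CREATE INDEX device_type   IF NOT EXISTS FOR (n:Device_type)  ON (n.classification);
CREATE INDEX device_model  IF NOT EXISTS FOR (n:Device_model) ON (n.model);
CREATE INDEX ip_addr       IF NOT EXISTS FOR (n:IP)           ON (n.addr);
CREATE INDEX time_group    IF NOT EXISTS FOR (n:Access_time)  ON (n.hour_group);
CREATE INDEX day_weekend   IF NOT EXISTS FOR (n:Access_day)   ON (n.is_weekend);
```

Without these, every MERGE does a label scan per row, which is why the load slows down as the store grows.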
06-30-2021 06:56 AM
Hi there! Have you created indices on the nodes you're MERGE
-ing?
As you may know, MERGE=MATCH+CREATE, so creating indices on the MERGE patterns boosts the speed of the preliminary MATCH.
06-30-2021 06:58 AM
Actually, I hadn't.
I'll try this and see the boost. Thanks for your answer!
06-30-2021 07:09 AM
I've got a question about this: can I set a constraint on any property so that it acts as an index?
Or does it need to be the id?
06-30-2021 07:21 AM
Oh man! The total time to process the 10 files is now 15 seconds.
I set a constraint on the main property of each node and made it the node key.
Thank you so much!
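For anyone landing here, a node key constraint of that kind can be sketched as follows (Neo4j 4.x syntax; node key constraints require Enterprise edition, and the constraint name is illustrative). A node key also creates a backing index, so the MERGE lookups get the same speed-up:

```cypher
// Enforces existence + uniqueness of domain on Cookie_id,
// and backs it with an index used by MERGE.
CREATE CONSTRAINT cookie_domain_key IF NOT EXISTS
ON (n:Cookie_id) ASSERT (n.domain) IS NODE KEY;
```

On Community edition, a plain uniqueness constraint (`ASSERT n.domain IS UNIQUE`) gives the same indexing benefit for a single property.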