06-30-2021 06:35 AM
I'm using the Pyingest script to read 10 CSV files with 10k rows and 7 columns each. The best result I've had so far was with a chunk size of 10,000.
My PC has 16 GB of RAM and a 4-core/8-thread CPU.
The first two files always take about 1-2 minutes to write, then the time grows a bit, and by the last file the write takes around 5 minutes.
My DB Heap configs are the following:
# Java Heap Size: by default the Java heap size is dynamically calculated based
# on available system resources. Uncomment these lines to set specific initial
# and maximum heap size.
dbms.memory.heap.initial_size=5G
dbms.memory.heap.max_size=5G
# The amount of memory to use for mapping the store files.
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 50% of RAM minus the Java heap size.
dbms.memory.pagecache.size=7G
I got these values from neo4j-admin memrec (not mine, actually: I'm using the AppImage version of Neo4j, so I can't run memrec; I found them on this forum, posted for a similar machine).
Since Pyingest uses pandas to optimize the CSV reads, I suspect the timings are getting slower because of garbage-collector pressure.
I'm trying to get the best result I can on my PC, so that when I move this to a server (much more powerful than my machine, obviously) it will perform beautifully.
Is there any way to optimize further?
EDIT: I forgot to put the queries I'm using.
WITH $dict.rows as rows UNWIND rows as row
MERGE (a: Cookie_id {domain: row.domain})
MERGE (b: OS {version: row.version})
MERGE (c: Device_type {classification: row.classification})
MERGE (d: Device_model {model: row.model})
MERGE (e: IP {addr: row.addr})
MERGE (f: Access_time {hour_group: row.hour_group})
MERGE (g: Access_day {is_weekend: row.is_weekend})
MERGE (a)-[:USING_OS]->(b)
MERGE (a)-[:BY_TYPE]->(c)
MERGE (a)-[:ACCESSED_BY]->(d)
MERGE (a)-[:HAS_IP]->(e)
MERGE (a)-[:ACCESSED_AT_TIME]->(f)
MERGE (a)-[:ACCESSED_AT_DAY]->(g)
RETURN a
06-30-2021 06:56 AM
Hi there! Have you created indexes on the nodes you're MERGE-ing?
As you may know, MERGE = MATCH + CREATE, so creating indexes on the MERGE'd label/property pairs speeds up the preliminary MATCH.
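For example, one index per label/property pair used in the MERGE clauses would look roughly like this (Neo4j 4.x syntax; the index names here are just illustrative):

```cypher
CREATE INDEX cookie_domain IF NOT EXISTS FOR (n:Cookie_id)    ON (n.domain);
CREATE INDEX os_version    IF NOT EXISTS FOR (n:OS)           ON (n.version);
CREATE INDEX device_type   IF NOT EXISTS FOR (n:Device_type)  ON (n.classification);
CREATE INDEX device_model  IF NOT EXISTS FOR (n:Device_model) ON (n.model);
CREATE INDEX ip_addr       IF NOT EXISTS FOR (n:IP)           ON (n.addr);
CREATE INDEX time_group    IF NOT EXISTS FOR (n:Access_time)  ON (n.hour_group);
CREATE INDEX day_weekend   IF NOT EXISTS FOR (n:Access_day)   ON (n.is_weekend);
```

Without these, every MERGE does a label scan per row, which is why the load slows down as the store grows.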
06-30-2021 06:56 AM
Hi there! Have you created indices on the nodes you're MERGE
-ing?
As you may know, MERGE=MATCH+CREATE, so creating indices on the MERGE patterns boosts the speed of the preliminary MATCH.
06-30-2021 06:58 AM
Actually, I hadn't.
I'll try this and see the boost. Thanks for your answer!
06-30-2021 07:09 AM
I've got a question about this: can I set a constraint on any property so that it acts as an index?
Or does it need to be the id?
06-30-2021 07:21 AM
Oh man! The total time to process the 10 files is now 15 seconds.
I set a constraint on the main property of each node and made it the node key.
Thank you so much!
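For anyone landing here, a node key constraint of that kind can be sketched as follows (Neo4j 4.x syntax; node key constraints require Enterprise edition, and the constraint name is illustrative). A node key also creates a backing index, so the MERGE lookups get the same speed-up:

```cypher
// Enforces existence + uniqueness of domain on Cookie_id,
// and backs it with an index used by MERGE.
CREATE CONSTRAINT cookie_domain_key IF NOT EXISTS
ON (n:Cookie_id) ASSERT (n.domain) IS NODE KEY;
```

On Community edition, a plain uniqueness constraint (`ASSERT n.domain IS UNIQUE`) gives the same indexing benefit for a single property.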