11-09-2020 04:50 AM
-Danny
11-09-2020 04:53 AM
@dana.canzano I'd also like to know whether loading data through Kettle would be faster than LOAD CSV.
Any benchmarks on LOAD CSV performance would be very helpful.
11-09-2020 05:15 AM
I can't imagine a case where Kettle would be materially faster than LOAD CSV.
But to date there is also no data to support its slowness. It could be slow as a result of poor configuration, poor Cypher, lack of indexes, etc.
11-09-2020 06:46 AM
Thanks Dana for the quick reply!
So are you saying that committing on 2 of the 3 nodes in a 3-node cluster, or 3 of the 5 nodes in a 5-node cluster, won't add any overhead?
I ask because LOAD CSV needs to commit data on more than one node, depending on the cluster configuration.
11-09-2020 07:23 AM
Yes, there is some overhead, but is it the source of your poor performance? For example, let's say your LOAD CSV is
LOAD CSV ....... ........ MERGE (n:Person {id:row[0]}) .......
and you have no index on :Person(id).
Then each row loaded will do a NodeByLabelScan, and if you already have 100k :Person nodes in the graph, each row will need to scan over all 100k nodes before insertion. Surely this will be a bigger performance drag than any concern about committing on 1 Neo4j instance versus 2 of 3 cluster members.
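To put rough numbers on the scan cost described above, here is a minimal Python sketch. It is a toy model, not Neo4j code: the function names are made up for illustration, a plain list stands in for a label scan, and a set stands in for an index on :Person(id). It only counts how many node comparisons or lookups a MERGE-style "find or create" would need per batch of rows.

```python
def merge_without_index(existing_ids, incoming_ids):
    """Toy model of MERGE with no index: every row scans the node list."""
    nodes = list(existing_ids)
    comparisons = 0
    for row_id in incoming_ids:
        found = False
        for node_id in nodes:          # stands in for a full label scan
            comparisons += 1
            if node_id == row_id:
                found = True
                break
        if not found:
            nodes.append(row_id)       # "create" the missing node
    return comparisons, len(nodes)


def merge_with_index(existing_ids, incoming_ids):
    """Toy model of MERGE with an index: one lookup per row."""
    index = set(existing_ids)          # stands in for an index on the id property
    lookups = 0
    for row_id in incoming_ids:
        lookups += 1
        if row_id not in index:
            index.add(row_id)          # "create" the missing node
    return lookups, len(index)


# Assumed workload: 100k existing nodes, 100 new rows with unseen ids.
existing = range(100_000)
incoming = range(100_000, 100_100)

scan_cost, scan_total = merge_without_index(existing, incoming)
index_cost, index_total = merge_with_index(existing, incoming)
print("comparisons without index:", scan_cost)
print("lookups with index:", index_cost)
```

In this model the unindexed cost grows with rows × existing nodes, while the indexed cost grows only with the number of rows, which is why the missing index is worth ruling out before looking at cluster commit overhead.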
11-09-2020 08:51 AM
Thanks Dana!
Suppose the db size and the underlying hardware specification are the same for both standalone and a 3-node cluster, and I load the same data into each.
I just need a benchmark of the LOAD CSV performance difference between standalone and cluster.
I need to decide whether to run LOAD CSV before or after moving from a single node to a cluster, and I need numbers to convince my solution architect and back up my thought process.
-Danny
11-10-2020 12:06 AM
Hi Danny
In which cloud are you trying to deploy your cluster?
Sameer
11-11-2020 10:55 AM
Thanks Sameer!
It is an on-prem 3-node cluster.
Have you observed any performance degradation between standalone and clustered Neo4j with respect to LOAD CSV?
11-09-2020 06:09 AM
Can you share the header file, the query, and the timings?