12-16-2020 01:11 PM
Hello! I use Neo4j 4.0.4 on CentOS 7.
I have a great amount of data in .csv files (so LOAD CSV is not suitable). The data is structured like CDRs - Call Data Records (Person1 called Person2 at some date/time, Person2 was called by Person3, and so on).
I need to import it into my graph DB, so I am trying to use neo4j-admin import. But the example given on the official website has the nodes in one .csv and the relationships in another:
bin/neo4j-admin import --nodes=import/movies3-header.csv,import/movies3.csv --nodes=import/actors3-header.csv,import/actors3.csv --relationships=import/roles3-header.csv,import/roles3.csv
Here the relationships come from the roles3-header.csv and roles3.csv files.
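For reference, the header files in that example look roughly like this (node headers carry an :ID and a :LABEL column, relationship headers carry :START_ID, :END_ID and :TYPE; the property names are my reading of the docs example):

movies3-header.csv:  movieId:ID,title,year:int,:LABEL
roles3-header.csv:   :START_ID,role,:END_ID,:TYPE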
Is it possible to import the relationships from the same .csv file as the nodes? The structure of my data is such that the nodes and the relationships are in the same file.
Thanks!
12-16-2020 08:43 PM
Generally a CDR file is one big .csv file with lots of columns, depending on the features. In any case you will have call date, call time, calling phone, called phone, and so on. From each row you have to create the nodes and build the relationships all at the same time. Check this: https://neo4j.com/blog/neo4j-call-detail-records-analytics/
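For example, each row could be handled like this (the column names caller, callee, call_date and call_time are assumptions about the CDR layout; MERGE ensures each phone number becomes a single Person node):

LOAD CSV WITH HEADERS FROM 'file:///cdr.csv' AS row
MERGE (a:Person {phone: row.caller})
MERGE (b:Person {phone: row.callee})
CREATE (a)-[:CALLED {date: row.call_date, time: row.call_time}]->(b);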
12-16-2020 10:12 PM
Thank you for your answer.
I have read this article. It uses LOAD CSV, which is not suitable for me because I have a lot of data.
12-24-2020 04:30 PM
How much data is too much data? If it's too much for LOAD CSV, it may be too much for Neo4j. You have to get the data into the DB somehow...
Since you have a lot of data, you will need to do a periodic commit:
USING PERIODIC COMMIT 500
Using PERIODIC COMMIT will prevent running out of memory when importing large amounts of data. However, it also breaks transactional isolation, so it should only be used where needed.
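Put together with the per-row pattern above, a full import would look something like this (the file and column names are placeholders; creating an index on the MERGE key first speeds up large imports considerably):

// index the property used by MERGE (Neo4j 4.x syntax)
CREATE INDEX FOR (p:Person) ON (p.phone);

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///cdr.csv' AS row
MERGE (a:Person {phone: row.caller})
MERGE (b:Person {phone: row.callee})
CREATE (a)-[:CALLED {date: row.call_date}]->(b);

Note that in Neo4j 4.x, USING PERIODIC COMMIT must be the first clause of the query and cannot run inside an explicit transaction.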
12-25-2020 01:01 AM
Hello, clem!
I have about a hundred 20 GB .csv files. For now, I am loading one of them into the DB with the default value of USING PERIODIC COMMIT (1000); it takes 210 seconds for 20 million nodes with 8 properties each (without relations yet). Do you think this is a normal result for LOAD CSV with USING PERIODIC COMMIT?
12-28-2020 05:12 PM
Hi @djenia88, since you are already using CentOS 7, there is a UNIX/Linux utility, split, that can break a large file such as a .csv into chunks of a specified number of lines.
Usage: /usr/gnu/bin/split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is 'x'. With no INPUT, or when INPUT
is -, read standard input.
Mandatory arguments to long options are mandatory for short options too.
-a, --suffix-length=N generate suffixes of length N (default 2)
--additional-suffix=SUFFIX append an additional SUFFIX to file names.
-b, --bytes=SIZE put SIZE bytes per output file
-C, --line-bytes=SIZE put at most SIZE bytes of lines per output file
-d, --numeric-suffixes[=FROM] use numeric suffixes instead of alphabetic.
FROM changes the start value (default 0).
-e, --elide-empty-files do not generate empty output files with '-n'
--filter=COMMAND write to shell COMMAND; file name is $FILE
-l, --lines=NUMBER put NUMBER lines per output file
-n, --number=CHUNKS generate CHUNKS output files. See below
-u, --unbuffered immediately copy input to output with '-n r/...'
--verbose print a diagnostic just before each
output file is opened
--help display this help and exit
--version output version information and exit
split -a 4 -d -l 1000000 input_filename.csv OUTPUT_
The above command splits the huge .csv file into pieces of the number of lines given to -l (here 1,000,000; adjust as needed), with output file names like OUTPUT_0000, OUTPUT_0001, and so on.
Then you can create a shell script or a Python program to enumerate through the files and feed each one to either LOAD CSV or neo4j-admin.
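A minimal shell sketch of that loop (assumptions: cypher-shell is on the PATH, the import directory is the Linux default /var/lib/neo4j/import, the split pieces have no header row so columns are addressed by index, and the password and column positions are placeholders):

#!/bin/bash
for f in OUTPUT_*; do
  # copy the current piece into Neo4j's import directory under a fixed name
  cp "$f" /var/lib/neo4j/import/chunk.csv
  # load this piece; row[0]/row[1]/row[2] stand in for caller, callee, call date
  cypher-shell -u neo4j -p 'your_password' \
    "LOAD CSV FROM 'file:///chunk.csv' AS row
     MERGE (a:Person {phone: row[0]})
     MERGE (b:Person {phone: row[1]})
     CREATE (a)-[:CALLED {date: row[2]}]->(b);"
done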
Let me or the community know if you need further guidance.
12-31-2020 10:00 AM
Thanks, @dominicvivek06. Useful info!