
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Neo4j-admin import relations from the same .csv file

djenia88
Node Clone

Hello! I use Neo4j 4.0.4 on CentOS 7.

I have a great amount of data in .csv files (so LOAD CSV is not suitable). The data is structured like CDR - Call Data Records (Person1 called Person2 at a date/time, Person2 was called by Person3, and so on).

I need to import it into my graph db, and I am trying to use neo4j-admin import. But the example given on the official web site puts the nodes in one .csv and the relationships in another:

bin/neo4j-admin import --nodes=import/movies3-header.csv,import/movies3.csv --nodes=import/actors3-header.csv,import/actors3.csv --relationships=import/roles3-header.csv,import/roles3.csv
The relationships are the roles3-header.csv and roles3.csv files here.

Is it possible to import relationships from the same .csv file as the nodes? The structure of my data is such that the nodes and relationships are in the same file.

Thanks!
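For what it's worth, the 4.0 import tool can be pointed at the same data file several times, as long as each pass gets its own header file. A sketch under assumptions stated in the comments (the column names are hypothetical):

```shell
# Sketch only -- the column names (calling_phone, called_phone, ...) are
# hypothetical; match them to your real CDR layout. Assumes import/cdr.csv
# has no embedded header row, since the headers live in separate files.
mkdir -p import

# Node pass headers: one per phone-number column, everything else ignored.
cat > import/callers-header.csv <<'EOF'
calling_phone:ID(Person),called_phone:IGNORE,call_date:IGNORE,call_time:IGNORE
EOF
cat > import/callees-header.csv <<'EOF'
calling_phone:IGNORE,called_phone:ID(Person),call_date:IGNORE,call_time:IGNORE
EOF

# Relationship pass header: the same columns again, with different roles.
cat > import/calls-header.csv <<'EOF'
calling_phone:START_ID(Person),called_phone:END_ID(Person),call_date,call_time
EOF

# The same data file (import/cdr.csv) is listed in all three passes.
# --skip-duplicate-nodes is needed because the same number appears on
# many rows. Run this from the Neo4j home directory.
if [ -x bin/neo4j-admin ]; then
    bin/neo4j-admin import \
        --nodes=Person=import/callers-header.csv,import/cdr.csv \
        --nodes=Person=import/callees-header.csv,import/cdr.csv \
        --relationships=CALLED=import/calls-header.csv,import/cdr.csv \
        --skip-duplicate-nodes
fi
```

This is a sketch, not a tested recipe; check the import tool header documentation for your exact version before running it on 20 GB files.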


6 REPLIES

ameyasoft
Graph Maven

Generally a CDR file is one big .csv file with lots of columns, depending on the features. In any case you will have call date, call time, calling phone, called phone .... From each row you have to create nodes and build relationships, all at the same time. Check this: https://neo4j.com/blog/neo4j-call-detail-records-analytics/
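In Cypher, that per-row pattern looks roughly like this (a sketch; the column names and the Person/CALLED labels are hypothetical and should match your CDR header):

```cypher
// A uniqueness constraint (which also creates an index) keeps the MERGEs fast.
CREATE CONSTRAINT ON (p:Person) ASSERT p.phone IS UNIQUE;

LOAD CSV WITH HEADERS FROM 'file:///cdr.csv' AS row
MERGE (caller:Person {phone: row.calling_phone})
MERGE (callee:Person {phone: row.called_phone})
CREATE (caller)-[:CALLED {date: row.call_date, time: row.call_time}]->(callee);
```

Without the constraint, every MERGE has to scan all Person nodes, which gets very slow on millions of rows.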

djenia88
Node Clone

Thank you for your answer.
I have read this article. The LOAD CSV function is used there, and it's not suitable for me because I have a lot of data.

How much data is too much data? If it's too much for LOAD CSV, it may be too much for Neo4j. You have to get the data into the DB somehow...

Since you have a lot of data, you will need to do a periodic commit:

USING PERIODIC COMMIT 500

Using PERIODIC COMMIT will prevent running out of memory when importing large amounts of data. However, it will also break transactional isolation and thus it should only be used where needed.
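Concretely, the hint goes on the first line, immediately before LOAD CSV (a sketch with placeholder file and column names):

```cypher
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///cdr.csv' AS row
MERGE (p:Person {phone: row.calling_phone});
```

Note that a query using this hint has to run as its own auto-commit statement; it cannot be wrapped in an explicit transaction.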


Hello, clem!

I have about a hundred 20 GB .csv files. Right now I load one of them into the db with the default value of USING PERIODIC COMMIT (1000); it takes 210 seconds for 20 million nodes with 8 properties each (without relations yet). Do you think that's a normal result for LOAD CSV with USING PERIODIC COMMIT?

Hi @djenia88, since you are already using CentOS 7, there is a UNIX/Linux utility that can split large files into chunks, i.e. cut a csv into files of a specified number of lines.

Usage: /usr/gnu/bin/split [OPTION]... [INPUT [PREFIX]]
Output fixed-size pieces of INPUT to PREFIXaa, PREFIXab, ...; default
size is 1000 lines, and default PREFIX is 'x'.  With no INPUT, or when INPUT
is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional SUFFIX to file names.
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
  -d, --numeric-suffixes[=FROM]  use numeric suffixes instead of alphabetic.
                                   FROM changes the start value (default 0).
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines per output file
  -n, --number=CHUNKS     generate CHUNKS output files.  See below
  -u, --unbuffered        immediately copy input to output with '-n r/...'
      --verbose           print a diagnostic just before each
                            output file is opened
      --help     display this help and exit
      --version  output version information and exit

split -a 4 -d -l 1000000 input_filename.csv OUTPUT_
The above command will split the huge csv file into pieces of the specified number of lines (here 1,000,000 as an example), with output filenames like OUTPUT_0000, OUTPUT_0001 and so on.
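A self-contained way to see the behavior on a toy file (the 10-line input and 4-line chunk size are just for demonstration):

```shell
# Build a 10-line toy CSV, then cut it into 4-line pieces with numeric,
# 4-character suffixes: OUTPUT_0000 (4 lines), OUTPUT_0001 (4), OUTPUT_0002 (2).
cd "$(mktemp -d)"
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "row$i,1234567,7654321" >> input_filename.csv
done
split -a 4 -d -l 4 input_filename.csv OUTPUT_
ls OUTPUT_*
wc -l OUTPUT_0000
```

One caveat if the source file has a header row: only the first piece keeps it, so LOAD CSV WITH HEADERS works unchanged only on that piece. Either strip the header before splitting or prepend it to each piece afterwards.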

Then you can either create a shell script or a Python program to enumerate through the files, and use either LOAD CSV or neo4j-admin on each one.
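The enumeration itself can be a short shell loop. A sketch: cypher-shell, the credentials, and the import.cypher script (assumed to contain a LOAD CSV query reading the file name from a $file parameter) are all placeholders, and the cypher-shell call is guarded so the loop is harmless to dry-run:

```shell
# Fake two split pieces so the loop has something to walk over;
# in real use these come from the split command above.
cd "$(mktemp -d)"
touch OUTPUT_0000 OUTPUT_0001

for f in OUTPUT_*; do
    echo "importing $f"
    # Placeholder invocation: import.cypher is assumed to read the
    # file name from the $file parameter.
    if command -v cypher-shell >/dev/null; then
        cypher-shell -u neo4j -p secret \
            --param "file => 'file:///$PWD/$f'" -f import.cypher
    fi
done
```

Keeping the query in one script and passing only the file name as a parameter means each chunk runs the exact same import logic.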

Let me or the community know if you need further guidance.

djenia88
Node Clone

Thanks, @dominicvivek06. Useful info.