
Bulk loader using group name

I'm loading data with the bulk importer and specifying a group name on the :ID field to avoid collisions when an ID from one node type is the same as an ID from another node type. I loaded two files for the same node type, and they have different headers and different numbers of columns (each file has its own header row; there is no separate header file). The import seems to be complaining because it assumes the headers are the same for both files, since they belong to the same node group. Is this the expected behavior, and how do I get around it? I want to be able to load files with different headers for the same node type.


Please provide the header lines and a few data lines of the files you're trying to import. Provide the exact neo4j-admin import call as well.

As you can see, these address part files have varying headers. Scroll to the bottom to see the error -- it seems to keep referencing the header of the first file even though part-address-6 has its own header. Not sure... thanks.

part-address-1

ADD_ID:ID(ADDRESS-ID) STREET STREET_2 CITY STATE ZIP PROVINCE POSTAL_CODE COUNTRY :LABEL UUID
60d4fa7d0e898b594f3a8e9863a00e386374da5aa499df8ca4e7240b26bf70c9 251 HWBZUFQEQ WAY null RXPSRKNHA GT 10879 null null US ADDRESS 0816af79b147ccce524fea360bdcace70abbfe046a3b8de7d591766d09a66d0f
116780b587174407a632103649492e92d5821b1de28f7b81a5ce1cd00371c081 569 OQAMUBYPW DRIVE null BPLQQRMMG null null NDJGMI T8U 1I9 null ADDRESS ee89d9b33764fdb71de80145213f70d2774a2054171424d8a9bdd1a21c8a4dcc
22d0c052db51fb080699e0ad93699f7911a4b863be81e9b23c2701bdaef19bf8 889 RUDXJWAYS WAY null PEPUZGQEV ZR 14490 null null US ADDRESS 48844f134ffb1526ce2ee60f207035adbdd63a8124eae0dfa0506dcf6538c98e

part-address-2

ADD_ID:ID(ADDRESS-ID) STREET CITY STATE ZIP PROVINCE POSTAL_CODE COUNTRY :LABEL UUID
3bdb0ff92c470cba17096f925131b27960d136706848098834440abb180bb8d8 015 JJVGXPLCY BOULEVARD null YN 98925 null null US ADDRESS 213983c2455bb1f397cac28d00f6470cb01453deaec2385032c86469c30f34b8
4aa235f4970a31b3029636182d52d76c8534658a9a0fca6b0c16b28d7bc8ee5d 443 XPRPOLWXY AVENUE QHWSOEWRH null null MUNKMZ F9I 2U2 null ADDRESS 39b2835f66c55271f0a8b4cc4e1cde72e373e065ebd32185cabfb5af2d55a248
c037161b7fa217b5e34dc4fcdf29574e4937193d07c8bb2dbf5663be907ff985 772 ZRCECZXME BOULEVARD QMGZVMWIA JQ 21742 null null US ADDRESS ee921c9bfedb00d19f22fa54719ce69addf4d8079b3b38100fedd7564f22d7eb

part-address-6

ADD_ID:ID(ADDRESS-ID) STREET_NAME STREET_NBR BLDG_RM CITY STATE ZIP PROVINCE POSTAL_CODE COUNTRY :LABEL UUID
66fd10ebad515f2c8e815bfc2d7f672d6a89aff95c42200a468894f74c2a2a91 872 IAFTAMLOB SQUARE 14577 null ESVYNXYEK QX 15380 LXVSYE M2O 4N0 US ADDRESS 160bee756cd2ca1ff86c74b01e3bbcd7f11fca1ffb076ec300202cb5ba8a4f18
060bb2f8ccaacb24beaff893f25a2bca66c9eae9965b0a9f40a1ea8c9157abf9 863 ZFYVAWNAR STREET 72327 null ONMKQGAGS ON 69944 INQLGO D4O 4W3 US ADDRESS ac8f99fbd49082583c032a2abe4cf53787abb8f9d5b7dbe607845eeb17b24464
d6ca0edc3b3144ca2b70d41c2a3590da6e8d7931e76a818ffd66d9345b3cd3ac 438 FVJQIFECW BOULEVARD 13000 null FADNHWOTZ BL 64916 GCDJUA N2E 6X3 US ADDRESS 139eb4866d53d020b7c933849ce19865f281a9381805bbab36da29ab9eb5adc2

Import file (bash script)
bin/neo4j-admin import \
--nodes "$local_data_path/nodes/part-address.*.csv" --ignore-duplicate-nodes --ignore-missing-nodes --delimiter "TAB" --database="graph.db"

Here's the error:
for header: [ADD_ID:ID(ADDRESS-ID), STREET:string, STREET_2:string, CITY:string, STATE:string, ZIP:string, PROVINCE:string, POSTAL_CODE:string, COUNTRY:string, :LABEL, UUID:string]
raw field value: UUID
original error: Extra column not present in header on line 1 in /data/nodes/part-address-6.csv with value UUID
at org.neo4j.unsafe.impl.batchimport.input.BadCollector$ExtraColumnsProblemReporter.exception(BadCollector.java:272)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collect(BadCollector.java:140)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collectExtraColumns(BadCollector.java:106)
at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputParser.next(CsvInputParser.java:198)
at org.neo4j.unsafe.impl.batchimport.input.csv.LazyCsvInputChunk.next(LazyCsvInputChunk.java:96)
at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputChunkProxy.next(CsvInputChunkProxy.java:75)
at org.neo4j.unsafe.impl.batchimport.ExhaustingEntityImporterRunnable.run(ExhaustingEntityImporterRunnable.java:57)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

You could cut off the header rows and use a separate header file (or several header files, to massage the columns).
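
For example, something along these lines (a sketch, not tested; the stripped-file and header-file names are hypothetical, and the syntax assumes the 3.x neo4j-admin import used above):

# Strip the embedded header row from each data file.
tail -n +2 part-address-1.csv > part-address-1.data.csv
tail -n +2 part-address-6.csv > part-address-6.data.csv

# Give each header shape its own header file and pass "header,data" pairs as
# separate --nodes arguments. Both header files can keep :ID(ADDRESS-ID), so
# all the addresses still share one ID group.
bin/neo4j-admin import \
  --nodes "headers-address-1.csv,part-address-1.data.csv" \
  --nodes "headers-address-6.csv,part-address-6.data.csv" \
  --ignore-duplicate-nodes --ignore-missing-nodes \
  --delimiter "TAB" --database="graph.db"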

You can also use :IGNORE to skip certain columns and use --ignore-extra-columns

--ignore-extra-columns <true/false>
        Whether or not to ignore extra columns in the data not specified by the header.
        Skipped columns will be logged, containing at most number of entities specified
        by bad-tolerance, unless otherwise specified by the skip-bad-entries-logging option.
        Default value: false
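
For example (a sketch only, reusing the command from the question): a column can be dropped in the header itself by giving it the :IGNORE type, e.g. a tab-separated header row like

ADD_ID:ID(ADDRESS-ID) STREET STREET_2:IGNORE CITY STATE ZIP PROVINCE POSTAL_CODE COUNTRY :LABEL UUID

and extra trailing columns in the data can be tolerated by adding the flag:

bin/neo4j-admin import \
--nodes "$local_data_path/nodes/part-address.*.csv" --ignore-duplicate-nodes --ignore-missing-nodes --ignore-extra-columns=true --delimiter "TAB" --database="graph.db"

Keep in mind that --ignore-extra-columns only drops columns beyond those the header declares; it doesn't reorder anything, so files whose columns differ in order or position (as the part-address files above do) still need their own header files.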

Otherwise you'd need to preprocess the CSVs or import via Cypher.
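
If the Cypher route is easier, a rough sketch via cypher-shell (credentials and property names here are just placeholders; the file has to sit in Neo4j's import directory, and the bulk-import style header names like ADD_ID:ID(ADDRESS-ID) come through LOAD CSV literally, so they need backticks):

cat <<'CYPHER' | bin/cypher-shell -u neo4j -p secret
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///part-address-6.csv' AS row FIELDTERMINATOR '\t'
MERGE (a:ADDRESS {addId: row.`ADD_ID:ID(ADDRESS-ID)`})
SET a.street = row.STREET_NAME, a.city = row.CITY, a.uuid = row.UUID;
CYPHER

That's fine for moderate volumes, but for very large files the offline importer will be much faster.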