Neo4j

christianne_gre · ‎09-24-2018

I’m loading data using the bulk importer but specifying the group name for the :ID field to avoid collisions if an ID from one node type is the same as the ID from another node type. I loaded two files for the same node type, and they have different headers and differing numbers of columns (each file has headers, there is no separate headers file). I think the load is complaining that it assumes the headers should be the same for both files because they are the same node/group names. Is this the expected behavior - and how do I get around it? I want to be able to load different headers for different files of the same node type.

stefan_armbrust · ‎09-25-2018

Please provide the header lines and a few data lines of the files you're trying to import. Provide the exact neo4j-admin import call as well.

christianne_gre · ‎09-25-2018

As you can see, these different address part files have varying headers. Scroll to the bottom to see the error -- the importer seems to keep using the header from the first file even though part-address-6 has its own header. Not sure... thanks.

part-address-1

ADD_ID:ID(ADDRESS-ID)	STREET	STREET_2	CITY	STATE	ZIP	PROVINCE	POSTAL_CODE	COUNTRY	:LABEL	UUID
60d4fa7d0e898b594f3a8e9863a00e386374da5aa499df8ca4e7240b26bf70c9	251 HWBZUFQEQ WAY	null	RXPSRKNHA	GT	10879	null	null	US	ADDRESS	0816af79b147ccce524fea360bdcace70abbfe046a3b8de7d591766d09a66d0f
116780b587174407a632103649492e92d5821b1de28f7b81a5ce1cd00371c081	569 OQAMUBYPW DRIVE	null	BPLQQRMMG	null	null	NDJGMI	T8U 1I9	null	ADDRESS	ee89d9b33764fdb71de80145213f70d2774a2054171424d8a9bdd1a21c8a4dcc
22d0c052db51fb080699e0ad93699f7911a4b863be81e9b23c2701bdaef19bf8	889 RUDXJWAYS WAY	null	PEPUZGQEV	ZR	14490	null	null	US	ADDRESS	48844f134ffb1526ce2ee60f207035adbdd63a8124eae0dfa0506dcf6538c98e

part-address-2

ADD_ID:ID(ADDRESS-ID)	STREET	CITY	STATE	ZIP	PROVINCE	POSTAL_CODE	COUNTRY	:LABEL	UUID
3bdb0ff92c470cba17096f925131b27960d136706848098834440abb180bb8d8	015 JJVGXPLCY BOULEVARD	null	YN	98925	null	null	US	ADDRESS	213983c2455bb1f397cac28d00f6470cb01453deaec2385032c86469c30f34b8
4aa235f4970a31b3029636182d52d76c8534658a9a0fca6b0c16b28d7bc8ee5d	443 XPRPOLWXY AVENUE	QHWSOEWRH	null	null	MUNKMZ	F9I 2U2	null	ADDRESS	39b2835f66c55271f0a8b4cc4e1cde72e373e065ebd32185cabfb5af2d55a248
c037161b7fa217b5e34dc4fcdf29574e4937193d07c8bb2dbf5663be907ff985	772 ZRCECZXME BOULEVARD	QMGZVMWIA	JQ	21742	null	null	US	ADDRESS	ee921c9bfedb00d19f22fa54719ce69addf4d8079b3b38100fedd7564f22d7eb

part-address-6

ADD_ID:ID(ADDRESS-ID)	STREET_NAME	STREET_NBR	BLDG_RM	CITY	STATE	ZIP	PROVINCE	POSTAL_CODE	COUNTRY	:LABEL	UUID
66fd10ebad515f2c8e815bfc2d7f672d6a89aff95c42200a468894f74c2a2a91	872 IAFTAMLOB SQUARE	14577	null	ESVYNXYEK	QX	15380	LXVSYE	M2O 4N0	US	ADDRESS	160bee756cd2ca1ff86c74b01e3bbcd7f11fca1ffb076ec300202cb5ba8a4f18
060bb2f8ccaacb24beaff893f25a2bca66c9eae9965b0a9f40a1ea8c9157abf9	863 ZFYVAWNAR STREET	72327	null	ONMKQGAGS	ON	69944	INQLGO	D4O 4W3	US	ADDRESS	ac8f99fbd49082583c032a2abe4cf53787abb8f9d5b7dbe607845eeb17b24464
d6ca0edc3b3144ca2b70d41c2a3590da6e8d7931e76a818ffd66d9345b3cd3ac	438 FVJQIFECW BOULEVARD	13000	null	FADNHWOTZ	BL	64916	GCDJUA	N2E 6X3	US	ADDRESS	139eb4866d53d020b7c933849ce19865f281a9381805bbab36da29ab9eb5adc2

Import file (bash script)
bin/neo4j-admin import \
  --nodes "$local_data_path/nodes/part-address.*.csv" --ignore-duplicate-nodes --ignore-missing-nodes --delimiter "TAB" --database="graph.db"

Here's the error:
for header: [ADD_ID:ID(ADDRESS-ID), STREET:string, STREET_2:string, CITY:string, STATE:string, ZIP:string, PROVINCE:string, POSTAL_CODE:string, COUNTRY:string, :LABEL, UUID:string]
raw field value: UUID
original error: Extra column not present in header on line 1 in /data/nodes/part-address-6.csv with value UUID
at org.neo4j.unsafe.impl.batchimport.input.BadCollector$ExtraColumnsProblemReporter.exception(BadCollector.java:272)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collect(BadCollector.java:140)
at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collectExtraColumns(BadCollector.java:106)
at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputParser.next(CsvInputParser.java:198)
at org.neo4j.unsafe.impl.batchimport.input.csv.LazyCsvInputChunk.next(LazyCsvInputChunk.java:96)
at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputChunkProxy.next(CsvInputChunkProxy.java:75)
at org.neo4j.unsafe.impl.batchimport.ExhaustingEntityImporterRunnable.run(ExhaustingEntityImporterRunnable.java:57)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
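
The header printed in the error has 11 columns and matches part-address-1, while the header of part-address-6 has 12, so the importer appears to be reading the first line of part-address-6 as a data row with one field too many (the value UUID). A quick way to check this is to count the header columns per file. The following is a minimal sketch; the function name `header_widths` is made up for illustration and the example path is assumed from the error message.

```shell
#!/usr/bin/env bash
# Print the number of tab-separated columns on the first line of each file.
# header_widths is a hypothetical helper, not part of neo4j-admin.
header_widths() {
  local f
  for f in "$@"; do
    # NR == 1 is the header line; NF is its field count when splitting on tabs.
    awk -F'\t' -v name="$f" 'NR == 1 { print name ": " NF " columns"; exit }' "$f"
  done
}

# Example call (path assumed from the error message above):
# header_widths /data/nodes/part-address-*.csv
```

Files that report different counts cannot share the header of the first file.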

michael_hunger · ‎01-15-2019

You could cut the header line off each file and supply it as a separate header file (or several header files, to massage the columns).
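
As a rough sketch of the separate-header-file approach: each input file is split into a header file and a data file, and each pair is passed as its own --nodes source. This relies on the assumption that every --nodes source is read with its own header, while files listed inside a single --nodes value are treated as one file under the first header. The function names, the `.header.csv`/`.data.csv` suffixes, and the output directory are illustrative choices, and the command is only printed, not run.

```shell
#!/usr/bin/env bash
# Split one CSV into <name>.header.csv (line 1) and <name>.data.csv (the rest).
# Writing to a separate out_dir keeps the new files out of the original
# "part-address.*.csv" pattern.
split_header() {
  local src="$1" out_dir="$2" base
  base="$(basename "$src" .csv)"
  head -n 1 "$src" > "$out_dir/$base.header.csv"
  tail -n +2 "$src" > "$out_dir/$base.data.csv"
  # Emit the pair as "header,data", the comma-separated form --nodes accepts.
  printf '%s,%s\n' "$out_dir/$base.header.csv" "$out_dir/$base.data.csv"
}

# Print (do not run) an import command with one --nodes source per input file.
print_import_cmd() {
  local out_dir="$1" f
  shift
  printf 'bin/neo4j-admin import --database=graph.db --delimiter TAB'
  for f in "$@"; do
    printf ' --nodes "%s"' "$(split_header "$f" "$out_dir")"
  done
  printf '\n'
}

# Example call (directories are assumptions):
# print_import_cmd /data/split /data/nodes/part-address-*.csv
```

Because all headers declare the same ADDRESS-ID group, IDs should still be checked against each other across the separate sources.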

You can also mark columns with :IGNORE in the header to skip them, and use --ignore-extra-columns:

--ignore-extra-columns <true/false>
        Whether or not to ignore extra columns in the data not specified by the header.
        Skipped columns will be logged, containing at most number of entities specified
        by bad-tolerance, unless otherwise specified by skip-bad-entries-logging option.
        Default value: false
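
To illustrate the :IGNORE idea, here is a small sketch that rewrites a header line so that one named column is replaced by :IGNORE. The helper name `ignore_column` is hypothetical, and the sketch assumes a tab-delimited header file like the ones in this thread.

```shell
#!/usr/bin/env bash
# Replace the header field named $1 with :IGNORE in the tab-separated file $2.
# Lines after the first are passed through unchanged. Output goes to stdout.
ignore_column() {
  local col="$1" header_file="$2"
  awk -F'\t' -v OFS='\t' -v col="$col" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) $i = ":IGNORE" }
    { print }
  ' "$header_file"
}

# Example call (file name is an assumption):
# ignore_column STREET_2 part-address-1.header.csv > part-address-1.header.ignored.csv
```

Note that --ignore-extra-columns only drops fields beyond the end of the header, so on its own it would not realign files whose columns differ in the middle.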

Otherwise you'd need to preprocess the CSVs or import via Cypher.
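
One possible preprocessing step, sketched here under assumptions, is to rewrite every file so that its columns follow a single target header. Columns missing from a file are left empty, and columns not named in the target are dropped. The function name `conform_to_header` and the choice of target header are illustrative and would need to be adapted to the real data.

```shell
#!/usr/bin/env bash
# Rewrite the tab-separated file $2 so its columns follow the target header $1
# (a tab-separated string). Output goes to stdout.
conform_to_header() {
  local target="$1" src="$2"
  awk -F'\t' -v OFS='\t' -v target="$target" '
    BEGIN { n = split(target, want, "\t") }
    NR == 1 {
      # Remember where each column name sits in this particular file.
      for (i = 1; i <= NF; i++) pos[$i] = i
      print target
      next
    }
    {
      line = ""
      for (j = 1; j <= n; j++) {
        # Use the value from the matching source column, or empty if absent.
        val = (want[j] in pos) ? $(pos[want[j]]) : ""
        line = line (j > 1 ? OFS : "") val
      }
      print line
    }
  ' "$src"
}

# Example call (target header and paths are assumptions):
# target="$(head -n 1 /data/nodes/part-address-1.csv)"
# conform_to_header "$target" /data/nodes/part-address-2.csv > /data/fixed/part-address-2.csv
```

Once all files share one layout, only the first file needs to keep its header line when they are passed together in a single --nodes value.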

Bulk loader using group name