12-16-2021 12:23 AM
Hey there,
I ran into a problem while importing data with the Neo4j importer (neo4j-admin import) from CSV files that contain duplicate IDs.
The problem seems to depend on the length and/or the characters of the IDs. I ran several tests to find a specific pattern, but I can't see one. Maybe you can, or maybe there are other restrictions on IDs that I'm not aware of.
neo4j 4.4.0 (Ubuntu Desktop AppImage)
VM Name: OpenJDK 64-Bit Server VM
VM Vendor: Azul Systems, Inc.
VM Version: 11.0.8+10-LTS
JIT compiler: HotSpot 64-Bit Tiered Compilers
VM Arguments: [-Xmx6291456k, -XX:+UseG1GC, -XX:-OmitStackTraceInFastThrow, -XX:+AlwaysPreTouch, -XX:+UnlockExperimentalVMOptions, -XX:+TrustFinalNonStaticFields, -XX:+DisableExplicitGC, -XX:MaxInlineLevel=15, -XX:-UseBiasedLocking, -Djdk.nio.maxCachedBufferSize=262144, -Dio.netty.tryReflectionSetAccessible=true, -Djdk.tls.ephemeralDHKeySize=2048, -Djdk.tls.rejectClientInitiatedRenegotiation=true, -XX:FlightRecorderOptions=stackdepth=256, -XX:+UnlockDiagnosticVMOptions, -XX:+DebugNonSafepoints, -Dlog4j2.disable.jmx=true, -Dfile.encoding=UTF-8]
./neo4j-admin import \
--verbose \
--skip-duplicate-nodes=true \
--nodes nodes.csv
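One possible workaround, if the failure is tied to --skip-duplicate-nodes, is to remove duplicate IDs from the CSV before the import so the flag is not needed at all. This is a minimal, unofficial pre-processing sketch, not a confirmed fix. It assumes the ID column is named "id:ID" (as in the test files further down) and that keeping the first row for each ID is the desired behaviour. The function name dedupe_nodes is hypothetical.

```python
import csv
import io


def dedupe_nodes(src, dst, id_column="id:ID"):
    """Copy CSV rows from src to dst, keeping only the first row per ID.

    Returns the list of IDs that were dropped as duplicates.
    """
    reader = csv.DictReader(src)
    # QUOTE_ALL mirrors the quoting style of the test files in this post
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    seen = set()
    dropped = []
    for row in reader:
        node_id = row[id_column]
        if node_id in seen:
            dropped.append(node_id)  # later rows with an already-seen ID are skipped
            continue
        seen.add(node_id)
        writer.writerow(row)
    return dropped


if __name__ == "__main__":
    sample = io.StringIO(
        '"id:ID","name",":LABEL"\n'
        '"abc","Tom","Person"\n'
        '"abc","Tina","Person"\n'
    )
    out = io.StringIO()
    print(dedupe_nodes(sample, out))  # ['abc']
    print(out.getvalue())
```

For real files, open both with newline="" (as the csv module documentation recommends) and run the import on the cleaned file. The sketch only checks a single file, so duplicates spread across several node files are not caught.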
Available resources:
Total machine memory: 31.14GiB
Free machine memory: 5.117GiB
Max heap memory : 6.000GiB
Processors: 12
Configured max memory: 22.63GiB
High-IO: true
Nodes, started 2021-12-16 07:55:24.521+0000
[*Nodes:?? 1.004GiB---------------------------------------------------------------------------] 0 ∆ 0
Done in 28ms
Prepare node index, started 2021-12-16 07:55:24.556+0000
Critical error occurred! Shutting down the import...
[*RESOLVE (~2 collisions):1.004GiB------------------------------------------------------------] 0 ∆ 0
Done in 61ms
IMPORT FAILED in 216ms.
Data statistics is not available.
Peak memory usage: 1.004GiB
Import error: DynamicRecord[1,used=false,(0),type=-1,data=byte[],start=true,next=-1] not in use
Caused by:DynamicRecord[1,used=false,(0),type=-1,data=byte[],start=true,next=-1] not in use
org.neo4j.kernel.impl.store.InvalidRecordException: DynamicRecord[1,used=false,(0),type=-1,data=byte[],start=true,next=-1] not in use
at org.neo4j.kernel.impl.store.record.RecordLoad.verify(RecordLoad.java:141)
at org.neo4j.kernel.impl.store.CommonAbstractStore.verifyAfterReading(CommonAbstractStore.java:1074)
at org.neo4j.kernel.impl.store.CommonAbstractStore.readRecordFromPage(CommonAbstractStore.java:897)
at org.neo4j.kernel.impl.store.CommonAbstractStore.readIntoRecord(CommonAbstractStore.java:850)
at org.neo4j.kernel.impl.store.CommonAbstractStore.getRecordByCursor(CommonAbstractStore.java:830)
at org.neo4j.kernel.impl.store.CommonAbstractStore.streamRecords(CommonAbstractStore.java:1003)
at org.neo4j.kernel.impl.store.CommonAbstractStore.getRecords(CommonAbstractStore.java:979)
at org.neo4j.kernel.impl.store.PropertyStore.ensureHeavy(PropertyStore.java:300)
at org.neo4j.kernel.impl.store.PropertyStore.getTextValueFor(PropertyStore.java:700)
at org.neo4j.kernel.impl.store.PropertyType$9.value(PropertyType.java:129)
at org.neo4j.kernel.impl.store.record.PropertyBlock.newPropertyValue(PropertyBlock.java:280)
at org.neo4j.internal.batchimport.NodeInputIdPropertyLookup.lookupProperty(NodeInputIdPropertyLookup.java:59)
at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.buildCollisionInfo(EncodingIdMapper.java:527)
at org.neo4j.internal.batchimport.cache.idmapping.string.EncodingIdMapper.prepare(EncodingIdMapper.java:263)
at org.neo4j.internal.batchimport.IdMapperPreparationStep.process(IdMapperPreparationStep.java:54)
at org.neo4j.internal.batchimport.staging.LonelyProcessingStep.lambda$receive$0(LonelyProcessingStep.java:53)
at java.base/java.lang.Thread.run(Thread.java:834)
WARNING Import failed. The store files in /home/dbr/.config/Neo4j Desktop/Application/relate-data/dbmss/dbms-49a06a75-bf53-4257-b193-a193c9e41556/data/databases/neo4j are left as they are, although they are likely in an unusable state. Starting a database on these store files will likely fail or observe inconsistent records so start at your own risk or delete the store manually
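The trace shows that the failure happens while the importer resolves ID collisions (the "RESOLVE (~2 collisions)" stage, EncodingIdMapper.buildCollisionInfo), at the point where NodeInputIdPropertyLookup reads an original ID string back from the property store. As a simplified mental model only (this is not Neo4j's code, and the real encoder is not shown in this thread), the sketch below groups IDs by a compact encoded key and reads the original strings back only for keys shared by more than one node. Two identical IDs always share a key, so presumably every test case below reaches this read-back step. The encode function (a truncated SHA-256 hash) and the names find_collisions and lookup are hypothetical.

```python
import hashlib
from collections import defaultdict


def encode(node_id: str, bits: int = 16) -> int:
    # Hypothetical stand-in for the importer's string-ID encoder: a hash
    # truncated to `bits` bits, kept small so distinct IDs can also collide.
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


def find_collisions(ids, lookup, encoder=encode):
    """Resolve entries whose encoded keys clash.

    `lookup(i)` returns the original ID string of node index i. In this model
    it is only called for clashing keys, roughly the step where the reported
    trace fails (NodeInputIdPropertyLookup.lookupProperty).
    Returns (duplicates, encoding_collisions) as lists of index groups.
    """
    buckets = defaultdict(list)
    for index, node_id in enumerate(ids):
        buckets[encoder(node_id)].append(index)

    duplicates, encoding_collisions = [], []
    for indexes in buckets.values():
        if len(indexes) < 2:
            continue  # unique key: the original string is never read back
        by_value = defaultdict(list)
        for i in indexes:
            by_value[lookup(i)].append(i)
        # identical strings under one key are true duplicates
        duplicates.extend(g for g in by_value.values() if len(g) > 1)
        # different strings under one key are encoding collisions
        if len(by_value) > 1:
            encoding_collisions.append(indexes)
    return duplicates, encoding_collisions


if __name__ == "__main__":
    ids = ["abcdef", "abcdef", "xyz"]
    dups, _ = find_collisions(ids, lambda i: ids[i])
    print(dups)  # [[0, 1]]
```

In this model a failing lookup(i) would abort the whole resolution step, which matches how the import stops during "Prepare node index"; whether the real cause is in the lookup or in how the duplicate's ID property was written is not something the sketch can tell.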
As mentioned above, I tried to find a pattern. Here are some (admittedly contrived) test cases; each block below is a separate test file:
"id:ID","name",":LABEL"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr","Tom","Person"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqr","Tina","Person"
"id:ID","name",":LABEL"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopq","Tom","Person"
"abcdefghijklmnopqrstuvwxyzabcdefghijklmnopq","Tina","Person"
"id:ID","name",":LABEL"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"
"id:ID","name",":LABEL"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"
"id:ID","name",":LABEL"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"
"id:ID","name",":LABEL"
"abcdefaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"abcdefaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"
"id:ID","name",":LABEL"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"
"id:ID","name",":LABEL"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tom","Person"
"abcdefgaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa","Tina","Person"
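To search for the pattern more systematically, the test files can be generated instead of written by hand. A minimal sketch, assuming one file per ID with the same header and the same two rows (Tom and Tina) as the cases above; the function names case_text and make_ids, the output directory id-cases and the file naming are hypothetical. Each generated file can then be passed to the same neo4j-admin import command via --nodes.

```python
import csv
import io
from pathlib import Path

HEADER = ["id:ID", "name", ":LABEL"]


def case_text(node_id: str) -> str:
    """Return the CSV content of one test case: two rows sharing the same ID."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerow([node_id, "Tom", "Person"])
    writer.writerow([node_id, "Tina", "Person"])
    return buf.getvalue()


def make_ids(prefix: str, filler: str, lengths):
    """Build IDs of the given total lengths: `prefix` padded with a 1-char `filler`."""
    ids = []
    for n in lengths:
        if n < len(prefix):
            raise ValueError(f"length {n} is shorter than prefix {prefix!r}")
        ids.append(prefix + filler * (n - len(prefix)))
    return ids


if __name__ == "__main__":
    out_dir = Path("id-cases")  # hypothetical output directory
    out_dir.mkdir(exist_ok=True)
    # IDs of 40..56 characters, once all 'a' and once with an 'abcdefg' prefix
    for prefix in ("", "abcdefg"):
        for node_id in make_ids(prefix, "a", range(40, 57)):
            path = out_dir / f"nodes-{prefix or 'a'}-{len(node_id)}.csv"
            path.write_text(case_text(node_id), encoding="utf-8")
```

Varying one property at a time (length with a fixed prefix, or prefix with a fixed length) should make it easier to see whether the failures line up with ID length or with the characters used.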
12-16-2021 04:53 AM
This looks like a bug related to the --skip-duplicate-nodes flag. Could you report an issue on the neo4j/neo4j GitHub issue tracker and link it back here?
Best,
ABK