11-12-2020 05:30 AM
Dear all,
I am using Neo4j v4.1.3, both the Windows and the Docker versions (I have tried both the Community and Enterprise editions), and I have the following issue:
If I run neo4j-admin import with a set of CSV files (around 20 node files and 1 relationship file, 8M nodes and 6M relationships in total) on my Windows machine, everything works fine and the data gets imported into a new database.
But when I run the exact same command with the exact same CSV files on the Docker version, the import finishes without errors but the data is corrupted.
If I run neo4j-admin check-consistency after the import, I get tons of errors saying that the next property record does not have this record as its previous record, plus many other errors that prevent me from using the database.
Same product version, same CSV files and same command, yet two completely different outcomes on two different platforms (Windows and Linux).
Any ideas?
Thanks
11-13-2020 01:40 PM
Could you elaborate on the setup? Are you using neo4j-admin inside Docker to create the database, or another copy somewhere else (e.g. installed on the host)? I use a multi-step process to avoid installing any Neo4j components on the host. You didn't mention that, so I guess you are using a separate Neo4j installation. Could you double-check that the versions are the same for both Docker and the neo4j-admin you are using?
11-23-2020 07:12 AM
Hi,
Thanks for your message.
I am using neo4j-admin inside Docker, that's right. This is my setup:
- I have a Docker container running the latest Enterprise image (v4.2.0), persisting the data, plugins and import folders to an external volume.
- I copy the CSV files into the import folder.
- I connect to the Docker container and run the neo4j-admin import command directly there (is there any other way of executing this command?)
The neo4j-admin version is the one contained in the Neo4j Docker image; both are v4.2.0.
The import process returns no errors. I am using the --skip-duplicate-nodes and --skip-bad-relationships options on both Windows and Docker. My Windows machine has more resources available (32 GB RAM and 4 cores), while my Docker instance has 5 GB RAM and 2 cores.
So, after running the exact same command on both, the output is the same and no critical errors are reported. But when I run neo4j-admin check-consistency, I get many critical errors on the Docker instance reporting existing relationships whose source or target node is missing.
Any clues?
Many thanks.
11-23-2020 09:24 AM
I've never used the consistency check; I guess it is primarily for checking backups?
Below is a snippet from my script that creates a Neo4j database with the neo4j-admin utility inside Docker.
Note: EPHEMERALCONTAINER is not the same as CONTAINER, since one cannot run neo4j-admin import against the running database that the Neo4j Docker image creates as soon as it starts up.
docker exec -t \
${EPHEMERALCONTAINER} bin/neo4j-admin import \
--database="${CONTAINER}" \
--verbose \
--skip-bad-relationships=true \
--skip-duplicate-nodes=true \
--ignore-empty-strings=true \
--normalize-types=true \
--trim-strings=false \
--delimiter "\t" \
...
...
I just tried to run a consistency check on my database right after creation, but it throws an error (something about index files not existing, which they probably don't yet, since this newly created database has never been in the "running" state). Now I have a bit of a catch-22: what I normally do is shut down the ephemeral Neo4j Docker instance and then start up the actual one (using the database just created), which would probably create the remaining required files. But one must run neo4j stop before running a consistency check, and Docker immediately exits if I do that.
How are you able to run the consistency check? Do you start up the database, stop it, and then use another, separate Docker instance?
11-23-2020 10:53 PM
Hi, thanks for your prompt response.
Yes, this is the only way I have found to check consistency. First I run the import to create the new database, then I start the database so the indexes are created as well, then I stop it, and finally I run check-consistency.
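The stop / check / restart sequence can be scripted roughly as below, using a throwaway container instead of stopping Neo4j inside the long-running one (which makes Docker exit). This is a sketch under assumptions, not a tested recipe: it assumes the official Neo4j image, whose entrypoint runs commands other than neo4j as-is, a host directory mounted at /data, and that the Enterprise image needs NEO4J_ACCEPT_LICENSE_AGREEMENT=yes. The container name, data path, image tag and database name are hypothetical placeholders, and the run helper (which only prints commands when DRY_RUN=1) is added purely for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# check_offline CONTAINER DATA_DIR IMAGE DB  (all four are placeholders)
check_offline() {
  local container="$1" data_dir="$2" image="$3" db="$4"
  local status=0
  # 1. Stop the long-running container so the store is no longer in use.
  run docker stop "$container"
  # 2. Run the checker in a throwaway container mounting the same data directory;
  #    --rm deletes the container when the check finishes.
  run docker run --rm \
    --env NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
    --volume "$data_dir:/data" \
    "$image" \
    neo4j-admin check-consistency --database="$db" || status=$?
  # 3. Bring the original container back up even if the check failed.
  run docker start "$container"
  return "$status"
}
```

For example, `check_offline neo4j-main /srv/neo4j/data neo4j:4.2.0-enterprise mydb`. The checker's exit status is kept and returned after the restart, so a script calling this can still react to a failed check.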
I have just checked: I am using the --skip-duplicate-nodes and --skip-bad-relationships options. I am not using --ignore-empty-strings or any of the other ones. The relevant issue is that the exact same command run on Windows creates a consistent database, while the Docker version seems to introduce inconsistencies that corrupt the database.
I have double-checked the import.report file that the import command creates, and on both Windows and Docker the number of errors reported during the import is exactly the same, so I'm guessing that the number of skipped records is the same.
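Matching error counts do not necessarily mean that the same records were skipped. A quick way to compare the two reports entry by entry is sketched below; compare_reports is a hypothetical helper, and it strips carriage returns first so that a report written on Windows (CRLF line endings) does not make every line look different.

```shell
#!/usr/bin/env bash
set -euo pipefail

# compare_reports WINDOWS_REPORT DOCKER_REPORT  (hypothetical helper)
# Prints the number of lines in each report, then the entries found in only one
# of them: column 1 = only in the first file, tab-indented = only in the second.
compare_reports() {
  local a="$1" b="$2"
  echo "$(awk 'END { print NR }' "$a") lines in $a"
  echo "$(awk 'END { print NR }' "$b") lines in $b"
  # comm needs sorted input; sort and comm both use the C locale so they agree.
  # -3 suppresses the lines the two reports have in common.
  LC_ALL=C comm -3 \
    <(tr -d '\r' < "$a" | LC_ALL=C sort) \
    <(tr -d '\r' < "$b" | LC_ALL=C sort)
}
```

If the counts match but comm still prints lines, the two imports skipped different records even though the totals agree.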
The main errors I keep getting on Docker are:
- The source node is not in use
- The target node is not in use
Any clues?
Thanks!
11-24-2020 06:53 AM
I'd like to quickly double-check something: when you start the database, have you looked inside to check it yourself? And have you checked that the import report file contains no errors? I write my import report to the mounted /import folder with
--report-file=importreport.txt
Manually run queries to check that the node and relationship counts are correct, spot-check some example properties, and then, while the database is still running, check the Docker logs:
docker logs containername
I'm going to guess the database is fine?
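The checks suggested above can be scripted along these lines, as a rough sketch. The helpers, container name, database name and password are hypothetical placeholders, and report_errors assumes each problem is written as one line of the report file. The counts go through cypher-shell inside the container and can be compared with the totals neo4j-admin import prints when it finishes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# report_errors FILE
# Number of non-empty lines in the import report (assumed: one line per problem).
report_errors() {
  # grep -c prints 0 when nothing matches but exits 1; '|| true' keeps set -e happy.
  grep -c . "$1" || true
}

# count_graph CONTAINER DB PASSWORD
# Node and relationship totals, queried with cypher-shell inside the container.
count_graph() {
  local container="$1" db="$2" password="$3"
  run docker exec "$container" cypher-shell -u neo4j -p "$password" -d "$db" \
    "MATCH (n) RETURN count(n) AS nodes"
  run docker exec "$container" cypher-shell -u neo4j -p "$password" -d "$db" \
    "MATCH ()-[r]->() RETURN count(r) AS relationships"
}
```

Passing the password on the command line is convenient for a one-off check but exposes it in the process list, so avoid it on shared machines.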
11-26-2020 04:40 AM
Hi,
If I start the database, everything works until I try to query some of the affected links. I am using a tool called Linkurious that consumes the database, and one of the first steps after setting up the connection is to index the database. Creating this index is when I get lots of errors and the whole process fails.
So the database seems to be partially corrupted: as long as you don't touch any of the affected nodes/links, there are no errors, but the moment you hit one of the corrupted links/nodes, you get errors. If you run the check-consistency command, you get a long list of errors.
My feeling is that, for some reason, the Docker version interprets some special characters in the CSV files slightly differently from the Windows one, so it does not skip the bad relationships the same way and leaves some bad relationships in the database.
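One way to test that hypothesis before re-importing is to scan the files for the kinds of content the import options deal with: Windows line endings, empty fields (relevant to --ignore-empty-strings) and fields with leading or trailing spaces (relevant to --trim-strings). This is a rough sketch assuming tab-delimited files, as in the script earlier in the thread; scan_csv is hypothetical, it does not understand quoted fields that contain tabs, and nothing in the thread confirms which characters actually differed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# scan_csv FILE  (hypothetical helper, tab-delimited input assumed)
# Counts lines with CRLF endings, empty fields, and space-padded fields.
scan_csv() {
  awk -F '\t' '
    # Count and strip a trailing carriage return; assigning $0 re-splits the fields.
    /\r$/ { crlf++; sub(/\r$/, "") }
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "") empty++
        else if ($i ~ /^ | $/) padded++
      }
    }
    END { printf "crlf=%d empty=%d padded=%d\n", crlf, empty, padded }
  ' "$1"
}
```

Running it on the same file on both machines (for example `scan_csv /import/nodes.tsv`, a placeholder name) should print identical counts; if it does not, the files were altered in transit, for instance by line-ending conversion.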
I will try again using the parameters you suggested in your previous comment: --ignore-empty-strings and --trim-strings.
Thanks
11-28-2020 07:04 PM
It is possible there is a rare bug, but in my experience (with a lot of different types of dirty data), every time I had an issue with the data, neo4j-admin import threw errors either to the console or into the import log. You may have done this already, but at times I've had to run each step one at a time to make sure I don't miss any error messages. For good measure I'd also check docker logs containername and the Neo4j logs inside the container (right after the import command). I map the logs folder to a volume, but if you haven't, you can bash into the container to check them like this:
docker exec -it containername bash
I have no name!@8cb123ebc39a:/var/lib/neo4j$ cd logs
I have no name!@8cb123ebc39a:/var/lib/neo4j/logs$ ls
debug.log security.log
cat debug.log
...
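The same checks can also be done without an interactive shell. A sketch, assuming the log path shown in the session above and Neo4j's usual ERROR/WARN level markers in debug.log; show_logs, the run helper and the container name are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# show_logs CONTAINER
# Last 50 lines of the container output, then warnings and errors from debug.log.
show_logs() {
  local container="$1"
  run docker logs --tail 50 "$container"
  # grep exits 1 when nothing matches, which is the good case here.
  run docker exec "$container" grep -E 'ERROR|WARN' /var/lib/neo4j/logs/debug.log || true
}
```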
11-30-2020 07:03 AM
Hi again,
I tried with the suggested options enabled:
--ignore-empty-strings=true
--normalize-types=true
--trim-strings=false
and this seems to have solved the issue. Why the Windows version worked fine without these parameters seems to be another discussion...
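For reference, an import invocation combining the options that worked here might look like the sketch below. The database name, file paths and report location are placeholders, the run helper is hypothetical, and the thread does not establish which of these options actually made the difference.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print commands instead of executing them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Options reported in this thread as producing a consistent store under Docker.
IMPORT_OPTS=(
  --skip-duplicate-nodes=true
  --skip-bad-relationships=true
  --ignore-empty-strings=true
  --normalize-types=true
  --trim-strings=false
)

# import_db DB NODES_CSV RELS_CSV  (all placeholders)
import_db() {
  local db="$1" nodes="$2" rels="$3"
  run neo4j-admin import --database="$db" "${IMPORT_OPTS[@]}" \
    --report-file=/import/importreport.txt \
    --nodes="$nodes" --relationships="$rels"
}
```

Keeping the options in one array makes it easy to run the identical flag set on Windows (e.g. under a bash shell) and in Docker, which is exactly the comparison this thread depended on.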
Many thanks for all the support!
Regards.