08-24-2020 05:05 AM
After exporting data containing UTF-8 characters to a Cypher script using APOC (the export itself works correctly), the characters are not imported correctly into neo4j.
A simple reproducible example - file 'utf8_test.cypher' (confirmed format as 'UTF-8' encoding):
:begin
CREATE CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;
:commit
:begin
UNWIND [{_id:0, properties:{x:"μ"}}] AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Test;
:commit
:begin
MATCH (n:`UNIQUE IMPORT LABEL`) WITH n LIMIT 20000 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;
:commit
:begin
DROP CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;
:commit
command line:
bin\cypher-shell -u neo4j -p [password] < import\utf8_test.cypher
query:
MATCH (x:Test) RETURN x.x
returns:
"μ"
Any feedback appreciated.
neo4j 4.1.1
neo4j Desktop 1.3.4
08-24-2020 01:29 PM
Hello @folterj
Can you try this on your database?
CREATE (n:Test {title: "μ"});
MATCH (n:Test) RETURN n.title
On my database, I get the right result.
Regards,
Cobra
08-24-2020 02:37 PM
Hi @Cobra,
Yes, running this from the browser works fine. We have a database with UTF-8 characters in it, which we can export correctly as well. However, importing the exported Cypher does not work.
As I understand this is the best way to import - our APOC exported cypher has millions of nodes/relationships, optimally batched and uses param unwind (and even importing that takes a surprisingly long time).
08-24-2020 07:03 PM
Hi @folterj
I created utf8_test.cypher based on the text from begin to commit.
Then I used the same cypher-shell command you did.
It works correctly.
My operating environment:
macOS Catalina 10.15.6
Neo4j 4.1.1
Neo4j Desktop 1.3.4 (1.3.4.27)
08-24-2020 10:46 PM
Could we see it?
Unique constraint and UNWIND will make the load faster
What is the power of the database and the computer?
Can we see how you load nodes and relations?
Regards,
Cobra
08-25-2020 09:03 AM
Hi, thanks for the quick responses.
@koji, I'm using Windows 10 64-bit. If you copied the text from the post above, then the 2-byte UTF-8 character should have been copied correctly from this page (I just checked), which is enough to reproduce the problem. So it seems this might be an issue specific to Windows.
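To illustrate what a Windows-specific encoding problem could look like, here is a minimal Python sketch. It assumes (this is not confirmed anywhere in this thread) that the file's UTF-8 bytes get decoded with a legacy Windows code page such as cp1252, and that the test character is the Greek small letter mu (U+03BC). It only shows the general mechanism, not what cypher-shell actually does internally.

```python
# Assumption: the character in the test file is GREEK SMALL LETTER MU (U+03BC).
original = "\u03bc"

# UTF-8 stores this character as two bytes.
utf8_bytes = original.encode("utf-8")

# Assumption: the reader decodes those bytes with cp1252 instead of UTF-8.
# Each byte then becomes its own (wrong) character.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex())      # hex dump of the two UTF-8 bytes
print(len(misread))          # number of characters after the wrong decode
print(misread == original)   # whether the round trip preserved the text
```

If the value returned by the query consists of two unrelated characters instead of one, that would be consistent with this kind of mismatch.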
@Cobra, my initial post shows exactly how to reproduce the issue, including its result (on Windows 10 64-bit). Regarding optimisation, from experience it seems CREATE is faster than MERGE. We create separate files for nodes and for relationships (the latter do indeed benefit from indexing), using unwind_batch_params and automatic batching. APOC already uses our internal unique uuids, so no generated _id fields are added to the exported Cypher. The batching makes a huge difference, and also avoids neo4j becoming overwhelmed and crashing (well, it worked before version 4.1 anyway). But it's still slower than I expected. We use cypher-shell for loading. But this is a different topic from the issue of this post.
08-25-2020 10:36 PM
Hello @folterj
I'm a bit confused, I just executed your queries on a database and it's working correctly,
the correct value is returned (I'm on Windows)
Regards,
Cobra
08-26-2020 02:27 AM
Hi @Cobra,
Thanks, that's interesting, did you use cypher-shell from the command line for the import as well?
Also, I assume you have the same version of neo4j & Windows 10 64-bit?
If so, maybe it's something specific to my local set-up, as I seem to keep hitting problems. neo4j uses its own Java distribution that comes with the installation, so I assume that's all fine. I can try to completely remove neo4j, including any settings in other locations etc., and reinstall once more.
Joost.
08-26-2020 02:30 AM
No, I did not, so if there is a problem I presume it comes from cypher-shell. I just created a local database in Neo4j Desktop and executed your queries there. I hope it helps.
08-26-2020 02:40 AM
Hi @Cobra,
The import is the main issue. Our exported Cypher is about 1.3GB, which is not feasible, let alone optimal, to paste into the browser and execute as a query.
We're looking into this avenue as we're currently not able to import a dump file from 4.0.4 to 4.1.x without neo4j crashing persistently. We hope to avoid this issue using cypher import, though apart from the UTF8 and automatic float to int conversion in APOC, we have not been able to find a working solution this way either unfortunately. But that's again another issue (Upgrade fails from Enterprise 4.0.4 to 4.1.0)
08-26-2020 02:45 AM
What is in your exported Cypher?
Looks like you got a lot of problems
03-30-2021 11:42 PM
Hello,
I can confirm that cypher-shell for Windows does not read UTF-8 encoded files properly. Please see this reported GitHub issue.
So far the only workaround I have found is to open my Cypher input file in Notepad++ and use "Encoding > Convert to ANSI". cypher-shell then processes the file correctly and I can see the special characters in the neo4j Browser.
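For files too large to open in an editor, the same kind of conversion can be scripted. The sketch below is only an illustration under stated assumptions: the function name `convert_utf8_to_codepage` is hypothetical, and "ANSI" is assumed to mean cp1252, although in Notepad++ it actually refers to the system's default code page, which differs between Windows installations. The conversion is strict, so it raises an error rather than silently replacing characters that the target code page cannot represent. Note that the Greek mu (U+03BC) is not part of cp1252, so this workaround cannot cover every UTF-8 character.

```python
from pathlib import Path


def convert_utf8_to_codepage(src, dst, codepage="cp1252"):
    """Re-encode a UTF-8 text file into a legacy code page (hypothetical helper).

    Raises UnicodeEncodeError if any character is not representable in
    `codepage`. Returns the number of bytes written.
    """
    # Decode from raw bytes so line endings are kept exactly as they are;
    # "utf-8-sig" also drops a leading byte order mark if one is present.
    text = Path(src).read_bytes().decode("utf-8-sig")

    # Strict encoding: fail loudly instead of writing "?" placeholders.
    data = text.encode(codepage)

    Path(dst).write_bytes(data)
    return len(data)


if __name__ == "__main__":
    # File names are placeholders; adjust them to your own paths.
    source = Path("utf8_test.cypher")
    if source.exists():
        written = convert_utf8_to_codepage(source, "utf8_test_ansi.cypher")
        print(f"wrote {written} bytes")
```

The converted file would then be piped into cypher-shell in place of the original. Since this reads the whole file into memory, a multi-gigabyte export may need to be processed in chunks instead.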