Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
11-18-2021 10:37 AM
My csv file has two columns:
nid embedding
001 [0.001, 0.002, 0.003]
002 [0.004, 0.005, 0.006]
My current load statement in Python driver is:
CALL apoc.periodic.iterate("
CALL apoc.load.csv('topic.tsv', {nullValues:['','na','NAN',false], sep:' '})
yield map as row",
"MERGE (m:Topic {nid: row.nid})
ON CREATE SET m += row
ON MATCH SET m += row
RETURN count(m) as mcount", {batchSize:1000, iterateList:true, parallel:true})
However, this will load the 2nd column embedding as a whole string, not an array of floats in Neo4j. How to load them as an array of floats? I can also change my embedding column format if it make things easier.
I am thinking something like this:
CALL apoc.periodic.iterate("
CALL apoc.load.csv('topic.tsv', {nullValues:['','na','NAN',false], sep:' '})
yield map as row",
"
**WITH row.embedding = apoc.convert.fromJsonList(row.embedding)**
MERGE (m:Topic {nid: row.nid})
ON CREATE SET m += row
ON MATCH SET m += row
RETURN count(m) as mcount", {batchSize:1000, iterateList:true, parallel:true})
But the added WITH before MERGE is disallowed. How to update the row's value before MERGE?
Solved! Go to Solution.
11-19-2021 03:07 AM
@lingvisa
I might be wrong, but the sep:' '
with multiple spaces looks weird to me,
because is equivalent to a sep:' '
(with a single space) because the load.csv
doesn't support multi char separator, so it considers only first char.
The provided file is a .tsv
, so it should be separated by tabs, and in this case the correct separator is sep:'TAB'
. Because with separator ' '
, would be considered also spaces between 0.001,
and 0.002
.
That is, if I have a csv file like yours, with tab
character (\t
) after nid
, 001
and 002
and other spaces are "real" whitespaces, I could be something like this (I used the apoc.map.setKey
to update a key):
CALL apoc.periodic.iterate("
CALL apoc.load.csv('topic.tsv', {nullValues:['','na','NAN',false], sep:'TAB'})
yield map as row",
"
WITH row
WITH apoc.map.setKey(row, 'embedding', apoc.convert.fromJsonList(row.embedding)) as row
MERGE (m:Topic {nid: row.nid})
ON CREATE SET m += row
ON MATCH SET m += row
RETURN count(m) as mcount",
{batchSize:1000, iterateList:true, parallel:true})
11-19-2021 03:07 AM
@lingvisa
I might be wrong, but the sep:' '
with multiple spaces looks weird to me,
because is equivalent to a sep:' '
(with a single space) because the load.csv
doesn't support multi char separator, so it considers only first char.
The provided file is a .tsv
, so it should be separated by tabs, and in this case the correct separator is sep:'TAB'
. Because with separator ' '
, would be considered also spaces between 0.001,
and 0.002
.
That is, if I have a csv file like yours, with tab
character (\t
) after nid
, 001
and 002
and other spaces are "real" whitespaces, I could be something like this (I used the apoc.map.setKey
to update a key):
CALL apoc.periodic.iterate("
CALL apoc.load.csv('topic.tsv', {nullValues:['','na','NAN',false], sep:'TAB'})
yield map as row",
"
WITH row
WITH apoc.map.setKey(row, 'embedding', apoc.convert.fromJsonList(row.embedding)) as row
MERGE (m:Topic {nid: row.nid})
ON CREATE SET m += row
ON MATCH SET m += row
RETURN count(m) as mcount",
{batchSize:1000, iterateList:true, parallel:true})
11-19-2021 10:43 AM
The sep:' ' is a display effect. When I compose the cypher statement from my code, delimiter is defined as '\t' in Python, but it shows as 2 white spaces on screen. I don't know the string 'TAB' can be used in here. Does it internally convert 'TAB' into '\t'?
It's good to know apoc.map.setKey() syntax, which is great. I also found another solution by modifying the apoc.load.csv() command:
CALL apoc.load.csv('topic.tsv', {nullValues:['','na','NAN',false], sep:' \t', mapping:{embedding:{array:true, arraySep:',', type:'float'}} })
The mapping parameter seems working well.
11-22-2021 06:24 AM
Exactly, the 'TAB'
is converted to '\t '
All the sessions of the conference are now available online