Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
01-28-2020 03:32 AM
Hi,
I want to create a database from a CSV file using LOAD CSV. A column called IDFILM conatins a unique value in each cell. So I'm going to generate a node from each one of the values, belonging to the p:Pellicula label. Two other columns -DIRECCIO and INTERPRETS- may have a unique value or multiple values in each of their cells (and, sometimes, are empty). So, I want to split cells containing multiple values in order to generate separated nodes. Nodes from the column DIRECCIO will belong to the label d:Director, and nodes from the INTERPRETS column will belong to the label i:Interpret.
And then, I want to relate these isolated nodes from the columns DIRECCIO and INTERPRETS with the nodes from the IDFILM column through two different relationships:
(p:Pellicula)-[:ES_DIRIGIDA_PER]->(d:Director)
(p:Pellicula)-[:ES_INTERPRETADA_PER]->(i)
I've prepared the following query to load and transform CSV data into the Neo4j database, in the way I have explained.
LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaDef/pellicules.csv' AS row
WITH row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal, row.VERSIO AS versioPellicula, row.SINOPSI AS sinopsi, row.ORIGINAL AS titolOriginal, row.QUALIFICACIO AS qualificacio, datetime(row.ANY) AS anyProduccio, row.IDFILM AS pelliculaId, row.CARTELL AS cartell, row.ESTRENA AS dataEstrena, row.TRAILER AS trailer, row.TITOL AS titolCatala, SPLIT(row.DIRECCIO, ",") AS nomDirector, SPLIT(row.INTERPRETS, ",") AS nomInterpret
CREATE (p:Pellicula {pelliculaId: pelliculaId})
SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala
MERGE (d:Director {nomDirector: nomDirector})
MERGE (i:Interpret {nomInterpret: nomInterpret})
CREATE (p)-[:ES_DIRIGIDA_PER]->(d)
CREATE (p)-[:ES_INTERPRETADA_PER]->(i)
;
The point is that orders like "SPLIT(row.DIRECCIO, ",") AS nomDirector" just transform multivalued cells into lists of values, but not into separated nodes. In addition, I have anothe doubt. In some cells, values can be separated by two different separators. For exemple: "a, b, c and d" where delimiters are "," and "and".
Thank you in advance.
Miquel Centelles
01-28-2020 05:29 AM
Hi Miquel,
one suggestion would be using something like this:
UNWIND nomDirectorList AS nomDirector
before MERGE
01-28-2020 07:15 AM
Thank you very much, Pilar.
I've been working around your suggestion of using the clause UNWIND to expand the list of values into a sequence of rows. In Importing CSV Data into Neo4j https://neo4j.com/developer/guide-import-csv/#import-load-csv, there is an example of Cypher query to do it:
//split string of employee skills into separate nodes
LOAD CSV FROM 'file:///data.csv' AS row
MERGE (e:Employee {employeeId: row.Id})
UNWIND split(row.skills, ',') AS skill
MERGE (s:Skill {name: skill})
MERGE (e)-[r:HAS_EXPERIENCE]->(s);
The point is that I don't find out how to insert this clause in my query. It always returns errors like "WITH is required between SET and UNWIND".
Miquel Centelles
01-28-2020 07:34 AM
Would you try runing this sentence:
LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaDef/pellicules.csv' AS row
WITH row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal, row.VERSIO AS versioPellicula, row.SINOPSI AS sinopsi, row.ORIGINAL AS titolOriginal, row.QUALIFICACIO AS qualificacio, datetime(row.ANY) AS anyProduccio, row.IDFILM AS pelliculaId, row.CARTELL AS cartell, row.ESTRENA AS dataEstrena, row.TRAILER AS trailer, row.TITOL AS titolCatala
CREATE (p:Pellicula {pelliculaId: pelliculaId})
SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala
UNWIND SPLIT(row.DIRECCIO, ",") AS nomDirector
MERGE (d:Director {nomDirector: nomDirector})
UNWIND SPLIT(row.INTERPRETS, ",") AS nomInterpret
MERGE (i:Interpret {nomInterpret: nomInterpret})
CREATE (p)-[:ES_DIRIGIDA_PER]->(d)
CREATE (p)-[:ES_INTERPRETADA_PER]->(i)
;
01-28-2020 07:48 AM
Many thanks. It returns the error message:
Error. Neo.ClientError.Statement.SyntaxError
WITH is required between SET and UNWIND (line 5, column 1 (offset: 749))
"UNWIND SPLIT(row.DIRECCIO, ",") AS nomDirector"
01-28-2020 08:16 AM
May you can suppress SET statment and put all the attributes inside the CREATE clause?
01-28-2020 12:53 PM
It keeps on error:
WITH is required between CREATE and UNWIND (line 4, column 1 (offset: 156))
"UNWIND SPLIT(row.DIRECCIO, ",") AS nomDirector"
^
01-28-2020 03:43 PM
Try this:
LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaDef/pellicules.csv' AS row
WITH
row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal,
row.VERSIO AS versioPellicula,
row.SINOPSI AS sinopsi,
row.ORIGINAL AS titolOriginal,
row.QUALIFICACIO AS qualificacio,
datetime(row.ANY) AS anyProduccio,
row.IDFILM AS pelliculaId,
row.CARTELL AS cartell,
row.ESTRENA AS dataEstrena,
row.TRAILER AS trailer,
row.TITOL AS titolCatala,
SPLIT(row.DIRECCIO, ",") AS nomDirector,
SPLIT(row.INTERPRETS, ",") AS nomInterpret
CREATE (p:Pellicula {pelliculaId: pelliculaId})
SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala
with p, nomDirector, nomInterpret
UNWIND RANGE(0,SIZE(nomDirector)-1) as i
MERGE (d:Director {normDirector: nomDirector[i]})
CREATE (p)-[:ES_DIRIGIDA_PER]->(d)
UNWIND RANGE(0,SIZE(nomInterpret)-1) as j
MERGE (i:Interpret {nomInterpret: nomInterpret[j]})
CREATE (p)-[:ES_INTERPRETADA_PER]->(i)
;
01-29-2020 01:32 AM
Thanks. I returns the following error:
Neo.ClientError.Statement.SyntaxError
WITH is required between CREATE and UNWIND (line 26, column 1 (offset: 998))
"UNWIND RANGE(0,SIZE(nomInterpret)-1) as j"
01-29-2020 05:58 AM
Finally, the Cypher query proposed by ameyasoft works, with just two chenges.
LOAD CSV WITH HEADERS FROM 'file:///CinemaEnCatalaCom/pellicules.csv' AS row FIELDTERMINATOR ';'
WITH
row.IDIOMA_x0020_ORIGINAL AS idiomaOriginal,
row.VERSIO AS versioPellicula,
row.SINOPSI AS sinopsi,
row.ORIGINAL AS titolOriginal,
row.QUALIFICACIO AS qualificacio,
datetime(row.ANY) AS anyProduccio,
row.IDFILM AS pelliculaId,
row.CARTELL AS cartell,
row.ESTRENA AS dataEstrena,
row.TRAILER AS trailer,
row.TITOL AS titolCatala,
split(row.DIRECCIO, ",") AS nomDirector,
split(row.INTERPRETS, ",") AS nomInterpret
CREATE (p:Pellicula {pelliculaId: pelliculaId})
SET p.idiomaOriginal = idiomaOriginal, p.versioPellicula = versioPellicula, p.sinopsi = sinopsi, p.titolOriginal = titolOriginal, p.qualificacio = qualificacio, p.anyProduccio = anyProduccio, p.cartell = cartell, p.dataEstrena = dataEstrena, p.trailer = trailer, p.titolCatala = titolCatala
WITH p, nomDirector, nomInterpret
UNWIND RANGE(0,SIZE(nomDirector)-1) AS i
MERGE (d:Director {nomDirector: nomDirector[i]})
CREATE (p)-[:ES_DIRIGIDA_PER]->(d)
WITH p, nomDirector, nomInterpret
UNWIND RANGE(0,SIZE(nomInterpret)-1) AS j
MERGE (i:Interpret {nomInterpret: nomInterpret[j]})
CREATE (p)-[:ES_INTERPRETADA_PER]->(i)
;
The changes are:
Thank you very much to you and Pilar Sáez!
Miquel Centelles
All the sessions of the conference are now available online