10-26-2018 10:38 AM
Guys,
I am trying to run the following code and it is giving me a "non-related" error:
USING PERIODIC COMMIT 3000 LOAD CSV WITH HEADERS
FROM "file:///tse-votacao_candidato_municipio_zona-municipal-2014-db.07.facts.csv" AS row
FIELDTERMINATOR ';'
MATCH
(c:City {tse_code: toInteger(row.cod_municipio_tse)}),
(:Publication {auto_name: row.publication})<-[:is_present_on]-(m:Metric {auto_name: row.cod_metrica}),
(:State {acronym: row.sigla_uf})<-[:belongs_to]-(z:ElectoralZone {code: row.cod_zona_eleitoral}),
(:Election {year: date(row.ano_eleicao), auto_name: row.cod_descricao_eleicao})<-[:round_of]-
(:ElectionRound {number: row.num_turno})<-[:runs_in]-(cand:Candidate {code: row.sq_candidato})
CREATE
(afe:Measurement)
SET
afe.value = toInteger(row.total_votos),
afe.unit = 'votes',
afe.date = date(row.data_arquivo)
WITH
m, cand, afe, c, z
CREATE UNIQUE
(afe)-[:taken_from]->(c),
(afe)-[:taken_of]->(m),
(afe)-[:filtered_by]->(cand),
(afe)-[:filtered_by]->(z);
I consider the error non-related because I have tried running the code above without the updating part for each of the MATCH clauses, and it runs without problems. Apparently, the problem occurs only when I put all of them together.
The header and the first 5 rows of the file (which has more than 7M lines) are listed below:
publication;cod_metrica;cod_descricao_eleicao;ano_eleicao;num_turno;sigla_uf;cod_zona_eleitoral;cod_municipio_tse;sq_candidato;data_arquivo;total_votos
votacao_candidato_municipio_zona-municipal-2014;votacao-nominal-por-canditato-por-eleicao-e-zona;eleicoes-gerais-2014;2014;1;AC;9;01007;10000000003;2018-05-17;1508
votacao_candidato_municipio_zona-municipal-2014;votacao-nominal-por-canditato-por-eleicao-e-zona;eleicoes-gerais-2014;2014;1;AC;9;01007;10000000001;2018-05-17;3027
votacao_candidato_municipio_zona-municipal-2014;votacao-nominal-por-canditato-por-eleicao-e-zona;eleicoes-gerais-2014;2014;1;AC;9;01007;10000000048;2018-05-17;0
votacao_candidato_municipio_zona-municipal-2014;votacao-nominal-por-canditato-por-eleicao-e-zona;eleicoes-gerais-2014;2014;1;AC;9;01007;10000000146;2018-05-17;21
votacao_candidato_municipio_zona-municipal-2014;votacao-nominal-por-canditato-por-eleicao-e-zona;eleicoes-gerais-2014;2014;1;AC;9;01007;10000000152;2018-05-17;2540
I get the following error:
Neo.DatabaseError.General.UnknownError: unknown value: (2014-01-01) of type class java.time.LocalDate)
What might be going on here?
Thanks in advance,
10-26-2018 01:03 PM
Guys, my suspicion that this is a bug has grown.
Since I am stuck with this error, I started trying different approaches to solve my problem (mainly refactoring my query). When I tried this query, the java.time.LocalDate error vanished!
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS
FROM "file:///tse-votacao_candidato_municipio_zona-municipal-2014-db.07.facts.csv" AS row
FIELDTERMINATOR ';'
MATCH
(c:City),
(p:Publication)<-[:is_present_on]-(m:Metric),
(s:State)<-[:belongs_to]-(z:ElectoralZone),
(e:Election)<-[:round_of]-(er:ElectionRound)<-[:runs_in]-(cand:Candidate)
WHERE
c.tse_code = toInteger(row.cod_municipio_tse)
and p.auto_name = row.publication
and m.auto_name = row.cod_metrica
and s.acronym = row.sigla_uf
and z.code = row.cod_zona_eleitoral
and e.year = date(row.ano_eleicao)
and e.auto_name = row.cod_descricao_eleicao
and er.number = row.num_turno
and cand.code = row.sq_candidato
CREATE
(afe:Measurement)
SET
afe.value = toInteger(row.total_votos),
afe.unit = 'votes',
afe.date = date(row.data_arquivo)
WITH
m, cand, afe, c, z
MERGE
(afe)-[:taken_from]->(c)
MERGE
(afe)-[:taken_of]->(m)
MERGE
(afe)-[:filtered_by]->(cand)
MERGE
(afe)-[:filtered_by]->(z);
Now I am struggling with an OutOfMemoryError, but at least I know what that one is related to...
10-26-2018 01:52 PM
I suspect you're suffering from the well-known "Eager" problem; see https://markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
10-26-2018 02:41 PM
Stefan,
I profiled my query earlier and I found no 'Eager' on the plan I saw.
It is more likely that my memory constraints are responsible here...
But the main question remains: why did the LocalDate error vanish with the code refactoring?
10-27-2018 01:38 AM
I've just tried your second statement, excluding PERIODIC COMMIT and prefixed with EXPLAIN. The query plan does indeed contain an Eager. So the whole CSV import will run in a single transaction, which is the cause of the OOM.
Split the action into multiple smaller statements that do not show an Eager, and iterate over the large file multiple times.
Regarding the date error: I couldn't reproduce this.
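A minimal sketch of the Eager check described above, assuming the same file and labels as in this thread. Prefixing a statement with EXPLAIN shows the plan without running it, so the plan can be inspected for an Eager operator before starting the import. The reduced statement (a single MATCH plus one relationship) only illustrates what one smaller pass might look like; it is not a drop-in replacement for the full import.

```cypher
// EXPLAIN only compiles the statement and shows its plan; nothing is written.
// In the resulting plan, look for an operator named Eager.
EXPLAIN
LOAD CSV WITH HEADERS
FROM "file:///tse-votacao_candidato_municipio_zona-municipal-2014-db.07.facts.csv" AS row
FIELDTERMINATOR ';'
// One reduced pass (illustrative only): link a new Measurement to its City.
MATCH (c:City {tse_code: toInteger(row.cod_municipio_tse)})
CREATE (afe:Measurement {
  value: toInteger(row.total_votos),
  unit: 'votes',
  date: date(row.data_arquivo)
})
CREATE (afe)-[:taken_from]->(c);
```

Note that a real multi-pass split would also need some key to find each Measurement again in later passes, which the statements in this thread do not define.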
10-27-2018 03:09 AM
Stefan,
I was able to run the statement without the OOM error by decreasing the batch size of the periodic commit.
As I said, I checked here and I did not find the Eager step when I profiled (not explained) the statement.
(If I am not mistaken, PROFILE actually runs the query, whereas EXPLAIN only shows the plan that would be used.)
Thank you for the article you sent (good material!!), but I would like to focus on the other error, if possible.
Best regards,
10-27-2018 03:25 AM
Just some additional thoughts on why we have different outputs:
I used profile, not explain.
The query optimizer takes a lot of information into account when choosing how to run the query. I guess the presence of my indexes and some statistics play an important role here.
Although the article you sent is really interesting, it is for an older version of Neo4j. I don't know if this Eager step is currently as common as it was before.
Regards,
10-27-2018 03:20 PM
It is not as common anymore, but it still shows up, and when it does it effectively disables periodic commit.
Perhaps the LocalDate issue came up because it got further into the data?
Do you have a value of (2014-01-01) in your data file (with the parentheses)?
10-27-2018 03:55 PM
No Michael,
The only dates with 2014 are related to :Election in the MATCH part of the statement.
They were imported into Neo4j earlier with zero problems. That's why I double-checked the MATCH clauses one by one.
Thanks,
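For reference, a quick way to see where the value in the error message may come from, assuming a Neo4j version that has the temporal date() function (3.4 or later). A year-only string, such as the values in the ano_eleicao column, parses to the first day of that year, so (2014-01-01) does not have to appear literally in the CSV file.

```cypher
// A year-only ISO string is accepted by date() and resolves to January 1st.
RETURN date('2014') AS from_ano_eleicao,        // 2014-01-01
       date('2018-05-17') AS from_data_arquivo; // 2018-05-17
```

This only shows how the value arises; it does not by itself explain why the first form of the statement fails.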
10-27-2018 06:03 PM
You can switch to
call apoc.periodic.iterate(
'LOAD CSV ... AS row RETURN row',
'MATCH ...',
{batchSize:10000, iterateList:true});
That should get rid of the OOM.
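Spelled out with the statements from this thread, the call above might look like the following. This is a sketch under assumptions, not a tested import: it assumes the APOC library is installed, reuses the file name, labels and properties from the original question, and replaces the MERGE / CREATE UNIQUE on relationships with plain CREATE, on the assumption that the Measurement node is always new. There is no USING PERIODIC COMMIT anywhere; batchSize takes over that role.

```cypher
CALL apoc.periodic.iterate(
  // Outer statement: streams one map per CSV row.
  'LOAD CSV WITH HEADERS
   FROM "file:///tse-votacao_candidato_municipio_zona-municipal-2014-db.07.facts.csv" AS row
   FIELDTERMINATOR ";"
   RETURN row',
  // Inner statement: executed once per batch of rows, each batch in its own transaction.
  'MATCH (c:City {tse_code: toInteger(row.cod_municipio_tse)})
   MATCH (:Publication {auto_name: row.publication})<-[:is_present_on]-(m:Metric {auto_name: row.cod_metrica})
   MATCH (:State {acronym: row.sigla_uf})<-[:belongs_to]-(z:ElectoralZone {code: row.cod_zona_eleitoral})
   MATCH (:Election {year: date(row.ano_eleicao), auto_name: row.cod_descricao_eleicao})<-[:round_of]-
         (:ElectionRound {number: row.num_turno})<-[:runs_in]-(cand:Candidate {code: row.sq_candidato})
   CREATE (afe:Measurement {value: toInteger(row.total_votos), unit: "votes", date: date(row.data_arquivo)})
   CREATE (afe)-[:taken_from]->(c)
   CREATE (afe)-[:taken_of]->(m)
   CREATE (afe)-[:filtered_by]->(cand)
   CREATE (afe)-[:filtered_by]->(z)',
  // 10000 rows per transaction; batches run one after another.
  {batchSize: 10000, iterateList: true, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

The returned row summarizes the run; a non-empty errorMessages map points at batches that failed.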
10-29-2018 09:26 AM
Hi @michael.hunger,
Can you please clarify the differences between the suggested apoc.periodic.iterate and the previous LOAD CSV?
By the way, should I keep the periodic commit in the inner statement, or is it useless in this approach?
Thanks!
11-21-2018 02:49 AM
The OOM happens because the transaction's memory footprint grows too large. apoc.periodic.iterate splits the work into batches.
Cypher itself has stricter guarantees about visibility, which is why it is possible to accidentally disable PERIODIC COMMIT.