Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-09-2022 06:07 AM - edited 08-09-2022 06:14 AM
Hi,
I'm trying to write a query that would delete Entities in my Database that are not present in the file I use to load daily.
E.G.:
DATABASE | FILE |
"A" | "A" |
"B" | - |
"C" | "C" |
"D" | - |
"E" | "E" |
I have 5 nodes in my database with the above codes in the "DATABASE" column and I also have 3 rows in the file I use to load my database with the above codes in the "FILE" column where "-" means not present.
I'd like to write a query that would return "B" and "D". I think that would be called a "Left Excluding Join" in the SQL world.
This is the query I came up with:
LOAD CSV FROM 'http://my.server.com/file.txt' AS row FIELDTERMINATOR '~'
WITH row[1] as code
WITH collect(code) as CodesToDelete
MATCH (n:Client) WHERE NOT n.code IN CodesToDelete
SET n.shouldDelete = true
RETURN n.code;
It actually works for the example I provided but when I use it against my Production db like this:
cat query.cypher | cypher-shell -u $user -p $pwd -a $address > codesToDelete.txt
The query won't finish.
I've tried to circumvent this by adding line 5 so I could then delete the marked nodes but it doesn't seem to be any more efficient. I know I could use the iterate Apoc procedure but I can't manage to write it properly.
How would I rewrite this query so it could be used against a db with 2 mill+ nodes and 2 mill+ relationships?
Thanks in advance
08-09-2022 06:50 AM
Hi @mrksph
I think your Cypher is correct.
This is the data with delimiter '~'.
"A"~"A"
"B"~-
"C"~"C"
"D"~-
"E"~"E"
Create the nodes.
LOAD CSV FROM 'file:///IRIS_Cpty_JQUEST.txt' AS row FIELDTERMINATOR '~'
CREATE (:Client {code: row[0]})
You can see five nodes.
Then, SET n.shouldDelete = true
LOAD CSV FROM 'file:///IRIS_Cpty_JQUEST.txt' AS row FIELDTERMINATOR '~'
WITH row[1] as code
WITH collect(code) as CodesToDelete
MATCH (n:Client) WHERE NOT n.code IN CodesToDelete
SET n.shouldDelete = true
RETURN n.code;
'B' and 'D' are results.
08-09-2022 11:14 AM
I think I understand...you want to delete the nodes that have a '-' in the FILE column. If so, try this for executing in the browser:
:auto load csv from "file:///Book1.csv" as row FIELDTERMINATOR '~'
call {
with row
with row
where row[1] = "-"
match(n:Client)
where n.code = row[0]
delete n
return row[0] as nodeCodeToDelete
} in transactions of 1000 rows
return nodeCodeToDelete
Remove the 'return' statements if you don't need to return the result.
If using cypher-shell, which does not support implicit transactions, you can try this. It worked for the test data your provided.
call apoc.periodic.iterate(
'
load csv from "file:///Book1.csv" as row FIELDTERMINATOR "~"
with row
where row[1] = "-"
return row[0] as codeToDelete
',
'
match(n:Client)
where n.code = codeToDelete
delete n
',
{batchSize: 1000}
)
All the sessions of the conference are now available online