cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

How to Left Exclude Join optimally

mrksph
Node Clone

Hi,

I'm trying to write a query that would delete Entities in my Database that are not present in the file I use to load daily. 
E.G.:

DATABASE FILE
"A" "A"
"B" -
"C" "C"
"D" -
"E" "E"

 

I have 5 nodes in my database with the above codes in the "DATABASE" column and I also have 3 rows in the file I use to load my database with the above codes in the "FILE" column where "-" means not present. 

I'd like to write a query that would return "B" and "D". I think that would be called a "Left Excluding Join" in the SQL world. 

This is the query I came up with: 

 

LOAD CSV FROM 'http://my.server.com/file.txt' AS row FIELDTERMINATOR '~'
WITH row[1] as code
WITH collect(code) as CodesToDelete
MATCH (n:Client) WHERE NOT n.code IN CodesToDelete
SET n.shouldDelete = true
RETURN n.code;

 

 

It actually works for the example I provided but when I use it against my Production db like this:

cat query.cypher | cypher-shell -u $user -p $pwd -a $address > codesToDelete.txt

The query won't finish.


I've tried to circumvent this by adding line 5 so I could then delete the marked nodes but it doesn't seem to be any more efficient. I know I could use the iterate Apoc procedure but I can't manage to write it properly.


How would I rewrite this query so it could be used against a db with 2 mill+ nodes and 2 mill+ relationships?


Thanks in advance

2 REPLIES 2

Hi @mrksph 
I think your Cypher is correct.

This is the data with delimiter '~'.

"A"~"A"
"B"~-
"C"~"C"
"D"~-
"E"~"E"

Create the nodes.

LOAD CSV FROM 'file:///IRIS_Cpty_JQUEST.txt' AS row FIELDTERMINATOR '~'
CREATE (:Client {code: row[0]})

You can see five nodes.
スクリーンショット 2022-08-09 22.36.43.png

Then, SET n.shouldDelete = true

LOAD CSV FROM 'file:///IRIS_Cpty_JQUEST.txt' AS row FIELDTERMINATOR '~'
WITH row[1] as code
WITH collect(code) as CodesToDelete
MATCH (n:Client) WHERE NOT n.code IN CodesToDelete
SET n.shouldDelete = true
RETURN n.code;

'B' and 'D' are results.
スクリーンショット 2022-08-09 22.35.31.png









I think I understand...you want to delete the nodes that have a '-' in the FILE column. If so, try this for executing in the browser:

:auto load csv from "file:///Book1.csv" as row FIELDTERMINATOR '~' 
call {
    with row
    with row
    where row[1] = "-"
    match(n:Client)
    where n.code = row[0]
    delete n
    return row[0] as nodeCodeToDelete
} in transactions of 1000 rows
return nodeCodeToDelete

Remove the 'return' statements if you don't need to return the result. 

If using cypher-shell, which does not support implicit transactions, you can try this.  It worked for the test data your provided. 

call apoc.periodic.iterate(
    '
        load csv from "file:///Book1.csv" as row FIELDTERMINATOR "~"
        with row
        where row[1] = "-"
        return row[0] as codeToDelete
    ',
    '
        match(n:Client)
        where n.code = codeToDelete
        delete n
    ',
    {batchSize: 1000}
)

 

Nodes 2022
Nodes
NODES 2022, Neo4j Online Education Summit

All the sessions of the conference are now available online