Neo4j

paolodipietro58 · ‎05-05-2022

I'm running neo4j server community 4.4.5 with browser 4.4.3 on Ubuntu 20.04 machine.

This is from the online manual:

This is the error:

And this is the query

:auto LOAD CSV WITH HEADERS FROM 'file:///transactions.csv' AS row FIELDTERMINATOR ';'

CALL {
WITH row
WITH row
    WHERE row.id IS NOT NULL

OPTIONAL MATCH (organisation:Organisation {id: toInteger(row.organisation_id)})

OPTIONAL MATCH (bankAccount:BankAccount {id: toInteger(row.bank_account_id)})

MERGE (transaction:Transaction { id : toInteger(row.id) })
		SET transaction = row,
			transaction.uuid = apoc.create.uuid(),
            transaction.id = toInteger(row.id),
            transaction.consumer_id = NULL // remove the attribute no more used

    WITH transaction, row, bankAccount, organisation

CALL {
    WITH transaction, row, bankAccount
    WITH transaction, row, bankAccount
        WHERE bankAccount IS NOT NULL

    MERGE (transaction)-[h:HAS]->(bankAccount)
}

CALL {
    WITH transaction, row, bankAccount
    WITH transaction, row, bankAccount
        WHERE bankAccount IS NULL
    
    MERGE (error:Error)
    MERGE (transaction)-[h:HAS_NO_BANK_ACCOUNT]->(error)
}

CALL {
    WITH transaction, row, organisation
    WITH transaction, row, organisation
        WHERE organisation IS NOT NULL

    MERGE (transaction)-[h:BELONGS_TO]->(organisation)
}

CALL {
    WITH transaction, row, organisation
    WITH transaction, row, organisation
        WHERE organisation IS NULL
    
    MERGE (error:Error)
    MERGE (transaction)-[h:HAS_NO_ORGANISATION]->(error)
}

} IN TRANSACTIONS;

I tried to make some changes but with no luck!

Any suggestion?

glilienfield · ‎05-06-2022

Try this code. See in the each cypher statement in each 'with' clause, it terminates with a 'RETURN 1' statement. This causes the apoc 'when' method to return a value in the 'yield value' statement, which gets merged with the current result. Without returning a value, the current result gets merged with 'null' and the query stops since there is no result to process. At least this is my hypothesis. I got it to work locally when I returned a value.

:auto 
LOAD CSV WITH HEADERS FROM 'file:///Book1.csv' AS row FIELDTERMINATOR ';'
WITH row
WHERE row.id IS NOT NULL

CALL {
    WITH row

    MERGE (transaction:Transaction { id : toInteger(row.id) })
		SET transaction = row,
			transaction.uuid = apoc.create.uuid(),
            transaction.id = toInteger(row.id),
            transaction.consumer_id = NULL // remove the attribute no more used

    WITH transaction, row

    OPTIONAL MATCH (bankAccount:BankAccount {id: toInteger(row.bank_account_id)})

    CALL apoc.do.when(bankAccount IS NOT NULL, 
    'WITH $tran as tran, $bankAcct as bankAcct MERGE (tran)-[h:HAS]->(bankAcct) RETURN 1', 
    'WITH $tran as tran  MERGE (error:Error) MERGE (tran)-[h:HAS_NO_BANK_ACCOUNT]->(error) RETURN 1',
    {tran: transaction, bankAcct: bankAccount}) yield value

    with transaction, row

    OPTIONAL MATCH (organisation:Organisation {id: toInteger(row.organisation_id)})

    CALL apoc.do.when(organisation IS NOT NULL, 
    'WITH $tran as tran, $org as org MERGE (tran)-[h:BELONGS_TO]->(org) RETURN 1',
    'WITH $tran as tran MERGE (error:Error) MERGE (tran)-[h:HAS_NO_ORGANISATION]->(error) RETURN 1',
    {tran: transaction, org: organisation}) yield value

    RETURN 1

} IN TRANSACTIONS

RETURN 1

View solution in original post

Rcolinp · ‎05-05-2022

Hey there!

From a brief peek at your cypher code snippet, it appears you are hitting that error due to a syntax error on row 5? I was able to reproduce your query above (different data) using v4.4.5 (I am in v4.4.5 browser though).

I believe you are seeing an error because you are following the CALL { with row with row with a WHERE clause.

i.e. -> You have:

:auto LOAD CSV WITH HEADERS FROM 'file:///transactions.csv' AS row 
CALL { 
WITH row
WITH row 
      WHERE row.id IS NOT NULL
...

What you may want is to tuck that where clause in your OPTIONAL MATCH. Something like this:

:auto LOAD CSV WITH HEADERS FROM 'file:///transactions.csv' AS row 
CALL {  WITH row WITH row 
OPTIONAL MATCH (organisation:Organisation {id: toInteger(row.organisation_id)}) WHERE ...

Reading through regarding auto transactions, I don't think you can use a WHERE clause there...

I hope this somewhat helps and there isn't much else to it. Post back here if you are seeing same thing. I'd be curious what is throwing that error if it isn't how you are stating WHERE row.id IS NOT NULL.

Cheers,
Rob

glilienfield · ‎05-05-2022

You may recall you had a query a few weeks back that had nested CALL subqueries and returned the same error. In that case, I was not able to get rid of the error and ended up refactoring the query not to have nested CALLs. I did the same in this case using the apoc 'when' method to replace the use of the inner CALL subqueries used to mimic if/else functionality.

The query executes, but I don't have test data to validate it is actually working. The 'RETURN 1' statements were necessary to make it execute, since I got errors about 'can't terminate without a RETURN statement'.

:auto LOAD CSV WITH HEADERS FROM 'file:///Book1.csv' AS row FIELDTERMINATOR ';'
WITH row
WHERE row.id IS NOT NULL

CALL {
    WITH row

    OPTIONAL MATCH (organisation:Organisation {id: toInteger(row.organisation_id)})

    OPTIONAL MATCH (bankAccount:BankAccount {id: toInteger(row.bank_account_id)})

    MERGE (transaction:Transaction { id : toInteger(row.id) })
		SET transaction = row,
			transaction.uuid = apoc.create.uuid(),
            transaction.id = toInteger(row.id),
            transaction.consumer_id = NULL // remove the attribute no more used

    WITH transaction, row, bankAccount, organisation

    CALL apoc.when(bankAccount IS NOT NULL, 
    'MERGE ($tran)-[h:HAS]->($bankAcct)', 
    'MERGE (error:Error) MERGE ($tran)-[h:HAS_NO_BANK_ACCOUNT]->(error)',
    {tran: transaction, bankAcct: bankAccount}) yield value

    with transaction, organisation

    CALL apoc.when(organisation IS NOT NULL, 
    'MERGE ($tran)-[h:BELONGS_TO]->($org)',
    'MERGE (error:Error) MERGE ($tran)-[h:HAS_NO_ORGANISATION]->(error)',
    {tran: transaction, org: organisation}) yield value

    RETURN 1

} IN TRANSACTIONS
RETURN 1

paolodipietro58 · ‎05-06-2022

Unfortunately, it looks nice but run badly

Why not simply this way? (We need apoc.do.when because we are writing ....)

glilienfield · ‎05-06-2022

That is what I originally had, but I thought it would not recognize the variables 'transaction', 'organization', and bankAccount' when the apoc routine executed the cypher. I did not try it to verify. Did you try it?

The error looks like you can't use the passed parameters in the match pattern.

bennu_neo · ‎05-06-2022

Hi @glilienfield ,

With $tran he will look for the variable within the outer scope. With tran alone should work.

Bennu

Oh, y’all wanted a twist, ey?

paolodipietro58 · ‎05-06-2022

The second WITH is used to filter the row.id, and it is trick because in a call the first WITH cannot be filtered.
It has ran until now, I don't think it is the problem, because if I remove it the problem remains.
I'm just moving to the latest 4.4.6, then let you know ....

glilienfield · ‎05-05-2022

I tried another approach that allowed me to eliminate the extra return statements. I used some 'list filtering' to simulate an 'if' capability. Each of the lists 'bankAcctNotNull', 'bankAcctIsNull', 'organisationNotNull', and 'organisationIsNull' will either be empty or contain a single value of '1', depending on the 'is null / is not null' conditions. As such, the 'foreach' loops will either execute one time or zero times.

Again, the query runs but I have not tested it.

:auto LOAD CSV WITH HEADERS FROM 'file:///Book1.csv' AS row FIELDTERMINATOR ';'
WITH row
WHERE row.id IS NOT NULL

CALL {
    WITH row

    OPTIONAL MATCH (organisation:Organisation {id: toInteger(row.organisation_id)})

    OPTIONAL MATCH (bankAccount:BankAccount {id: toInteger(row.bank_account_id)})

    MERGE (transaction:Transaction { id : toInteger(row.id) })
		SET transaction = row,
			transaction.uuid = apoc.create.uuid(),
            transaction.id = toInteger(row.id),
            transaction.consumer_id = NULL // remove the attribute no more used

    WITH transaction, row, bankAccount, organisation, 
    [i in [1] where bankAccount IS NOT NULL] as bankAcctNotNull,
    [i in [1] where bankAccount IS NULL] as bankAcctIsNull,
    [i in [1] where organisation IS NOT NULL] as organisationNotNull,
    [i in [1] where organisation IS NULL] as organisationIsNull

    foreach(i in bankAcctNotNull | 
        MERGE (transaction)-[h:HAS]->(bankAccount)
    )

    foreach(i in bankAcctIsNull | 
        MERGE (error:Error) 
        MERGE (transaction)-[h:HAS_NO_BANK_ACCOUNT]->(error)
    )

    foreach(i in organisationNotNull | 
        MERGE (transaction)-[h:BELONGS_TO]->(organisation)
    )

    foreach(i in organisationIsNull | 
        MERGE (error:Error) 
        MERGE (transaction)-[h:HAS_NO_ORGANISATION]->(error)
    )

} IN TRANSACTIONS

paolodipietro58 · ‎05-05-2022

@glilienfield The latter looks really interesting! I'll try it tomorrow!

paolodipietro58 · ‎05-06-2022

@glilienfield @Rcolinp I used the first solution from @glilienfield : it is much more readable of the second, despite the return statement.
You can remove the second return 1 statement: it runs without.

So, the problem key are the nested calls: if you have a running query with CALL {} IN TRANSACTIONS and add a nested CALL {} doing nothing, the entire query fails with an error.

While the same query without the inner call, succeeds (without any return!):

This latest to demonstrate the usage of the double WITH ROW inside the CALL. With only one it doesn't run with the WHERE clause:

Now a question is still open: What is the meaning of using :auto combined with using WITH TRANSACTIONS OF 100 ROWS? Who has the precedence? :auto or 100?

glilienfield · ‎05-06-2022

I had passed the variables from the outer scope as parameters to the apoc method, then referenced the parameters in the cypher. It looks like it does not like using parameters in replacement of match binder variables.

I just did an experiment. the apoc method did not recognize variables defined outside the method. This make sense to me, as the cypher will not have knowledge of these variables when it is executed on the server.

I tried this as a work around. I bound the passed parameters to variables in a 'with' clause, which can them be used in the match pattern.

:auto LOAD CSV WITH HEADERS FROM 'file:///Book1.csv' AS row FIELDTERMINATOR ';'
WITH row
WHERE row.id IS NOT NULL

CALL {
    WITH row

    OPTIONAL MATCH (organisation:Organisation {id: toInteger(row.organisation_id)})

    OPTIONAL MATCH (bankAccount:BankAccount {id: toInteger(row.bank_account_id)})

    MERGE (transaction:Transaction { id : toInteger(row.id) })
		SET transaction = row,
			transaction.uuid = apoc.create.uuid(),
            transaction.id = toInteger(row.id),
            transaction.consumer_id = NULL // remove the attribute no more used

    WITH transaction, row, bankAccount, organisation

    CALL apoc.do.when(bankAccount IS NOT NULL, 
    'WITH $tran as tran, $bankAcct as bankAcct MERGE (tran)-[h:HAS]->(bankAcct)', 
    'WITH $tran as tran  MERGE (error:Error) MERGE (tran)-[h:HAS_NO_BANK_ACCOUNT]->(error)',
    {tran: transaction, bankAcct: bankAccount}) yield value

    with transaction, organisation

    CALL apoc.do.when(organisation IS NOT NULL, 
    'WITH $tran as tran, $org as org MERGE (tran)-[h:BELONGS_TO]->(org)',
    'WITH $tran as tran MERGE (error:Error) MERGE (tran)-[h:HAS_NO_ORGANISATION]->(error)',
    {tran: transaction, org: organisation}) yield value

    RETURN 1

} IN TRANSACTIONS

RETURN 1

paolodipietro58 · ‎05-06-2022

I'll give you another point to think about: the :auto keyword doesn't like to the cypher-shell!

paolodipietro58 · ‎05-06-2022

Well, I gave it this version and it partially run:

But I have 895 nodes, and if I check I have 895 :HAS_NO_BANK_ACCOUNT but 0 :HAS_NO_ORGANIZATION AND 0 :BELONGS_TO. Still cannot explain

paolodipietro58 · ‎05-06-2022

@glilienfield @Rcolinp @bennu.neo Yes, I tried a lot of times, the first APOC.DO.WHEN runs fine, but the second APOC.DO.WHEN is never executed!

glilienfield · ‎05-06-2022

Can you please try reversing the order? Does the 'organisation' one now run and the 'bankAccount' when doesn't?

paolodipietro58 · ‎05-06-2022

Already done: it stops after the first one!

paolodipietro58 · ‎05-06-2022

I have already tried this too: moving the optional matches below doesn't change anything.

BTW: You other solutions run as a charme, but I would like to figure this out and understand this unusual behavior

glilienfield · ‎05-06-2022

with the return statements added to the 'when' cypher? I tested it locally with similar code and it executed both 'when' statements.

paolodipietro58 · ‎05-06-2022

Yes, It don't execute the second when. I changed the names to the relationships, so I can check if it creates one branch or the other, and I have no relationships from the second WHEN.

WITH row
    WHERE row.transactionid IS NOT NULL

MERGE (transaction:Transaction { transactionid : row.transactionid })
        SET transaction = row,
            transaction.uuid = apoc.create.uuid(),
            transaction.consumer_id = NULL // remove the attribute no more used

    with transaction, row
    
    OPTIONAL MATCH (organisation:Organisation {id: toInteger(row.organisation_id)})

    CALL apoc.do.when(organisation IS NOT NULL, 
    'WITH $tran as tran, $org as org MERGE (tran)-[h:TRANSACTION_BELONGS_TO_ORGANISATION]->(org)',
    'WITH $tran as tran MERGE (error:Error) MERGE (tran)-[h:TRANSACTION_HAS_NO_ORGANISATION]->(error)',
    {tran: transaction, org: organisation}) yield value AS X

    WITH transaction, row

    OPTIONAL MATCH (bankAccount:BankAccount {id: toInteger(row.bank_account_id)})

    CALL apoc.do.when(bankAccount IS NOT NULL, 
    'WITH $tran as tran, $bankAcct as bankAcct MERGE (tran)-[h:TRANSACTION_HAS_BANK_ACCOUNT]->(bankAcct)', 
    'WITH $tran as tran  MERGE (error:Error) MERGE (tran)-[h:TRANSACTION_HAS_NO_BANK_ACCOUNT]->(error)',
    {tran: transaction, bankAcct: bankAccount}) yield value AS Y


return 1

} IN TRANSACTIONS OF 5000 ROWS

return 1;

glilienfield · ‎05-06-2022

You can change the return values from all 1's to different values and return them. You can then see which cypher statements executed. maybe add a 'limit 10' or so after the first 'where' clause so it doesn't execute all the rows.

The above code you pasted does not have the 'return' statements in the apoc 'when' cypher. I was able to get both 'when' statements to execute in my prototype only when I returned something from the first 'when' cypher; otherwise, the query stops since it produces no results.

paolodipietro58 · ‎05-06-2022

Please, show me what you mean with a return statements in the apoc.when cypher.
Anyway, I can't try now because neo4j is not responding: it' loading millions transactions using your other query and, even I split the input file in chunks of 10000lines (with shell split) and I'm running in the cypher shell, the browser doesn't respond. I'll have to wait it will finish to make more tests!

glilienfield · ‎05-06-2022

Try this code. See in the each cypher statement in each 'with' clause, it terminates with a 'RETURN 1' statement. This causes the apoc 'when' method to return a value in the 'yield value' statement, which gets merged with the current result. Without returning a value, the current result gets merged with 'null' and the query stops since there is no result to process. At least this is my hypothesis. I got it to work locally when I returned a value.

:auto 
LOAD CSV WITH HEADERS FROM 'file:///Book1.csv' AS row FIELDTERMINATOR ';'
WITH row
WHERE row.id IS NOT NULL

CALL {
    WITH row

    MERGE (transaction:Transaction { id : toInteger(row.id) })
		SET transaction = row,
			transaction.uuid = apoc.create.uuid(),
            transaction.id = toInteger(row.id),
            transaction.consumer_id = NULL // remove the attribute no more used

    WITH transaction, row

    OPTIONAL MATCH (bankAccount:BankAccount {id: toInteger(row.bank_account_id)})

    CALL apoc.do.when(bankAccount IS NOT NULL, 
    'WITH $tran as tran, $bankAcct as bankAcct MERGE (tran)-[h:HAS]->(bankAcct) RETURN 1', 
    'WITH $tran as tran  MERGE (error:Error) MERGE (tran)-[h:HAS_NO_BANK_ACCOUNT]->(error) RETURN 1',
    {tran: transaction, bankAcct: bankAccount}) yield value

    with transaction, row

    OPTIONAL MATCH (organisation:Organisation {id: toInteger(row.organisation_id)})

    CALL apoc.do.when(organisation IS NOT NULL, 
    'WITH $tran as tran, $org as org MERGE (tran)-[h:BELONGS_TO]->(org) RETURN 1',
    'WITH $tran as tran MERGE (error:Error) MERGE (tran)-[h:HAS_NO_ORGANISATION]->(error) RETURN 1',
    {tran: transaction, org: organisation}) yield value

    RETURN 1

} IN TRANSACTIONS

RETURN 1

paolodipietro58 · ‎05-06-2022

Understood: will try ASAP. Thank you anyway!

paolodipietro58 · ‎05-09-2022

@glilienfield @bennu.neo @Rcolinp Justr for your info, I loaded 11 million transactions of different kind, using the latest approach proposed by @glilienfield.

This approach using apoc turned out to be by far the most readable one (after a bit of a sprucing up of the query!)

This is the result:

LOAD CSV WITH HEADERS FROM 'file:///transactions.csv' AS row FIELDTERMINATOR ';'

CALL {

WITH row

WITH row
    WHERE row.transactionid IS NOT NULL

MERGE (transaction:Transaction { transactionid : row.transactionid })
        SET transaction = row,
            transaction.uuid = apoc.create.uuid(),
            transaction.consumer_id = NULL // remove the attribute no more used

    with transaction, row
    
    OPTIONAL MATCH (organisation:Organisation {id: toInteger(row.organisation_id)})

    CALL apoc.do.when(organisation IS NOT NULL, 
    'WITH $tran as tran, $org as org 
        MERGE (tran)-[h:TRANSACTION_BELONGS_TO_ORGANISATION]->(org) 
        RETURN 1',
    'WITH $tran as tran 
        MERGE (error:Error) 
        MERGE (tran)-[h:TRANSACTION_HAS_NO_ORGANISATION]->(error) 
        RETURN 1',
    {tran: transaction, org: organisation}) yield value AS X

    WITH transaction, row

    OPTIONAL MATCH (bankAccount:BankAccount {id: toInteger(row.bank_account_id)})

    CALL apoc.do.when(bankAccount IS NOT NULL, 
    'WITH $tran as tran, $bankAcct as bankAcct 
        MERGE (tran)-[h:TRANSACTION_HAS_BANK_ACCOUNT]->(bankAcct)
        RETURN 1', 
    'WITH $tran as tran  
        MERGE (error:Error) 
        MERGE (tran)-[h:TRANSACTION_HAS_NO_BANK_ACCOUNT]->(error) 
        RETURN 1',
    {tran: transaction, bankAcct: bankAccount}) yield value AS Y

return transaction

} IN TRANSACTIONS OF 5000 ROWS

return count(transaction)

The story will follow with a new discussion .... but this is an achieved result I'd like to share!

Thank you to everybody!

bennu_neo · ‎05-09-2022

Super @paolodipietro58 !

So @glilienfield is suggesting APOC solutions now.

Oh, y’all wanted a twist, ey?

paolodipietro58 · ‎05-09-2022

Well, we make some trial, and the APOC.do.when looks better, also in terms of query readability.

paolodipietro58 · ‎05-27-2022

@glilienfield @bennu.neo @Rcolinp I have a little good new on this:

I discovered tha you can remove the WITH statement inside and this further simplify the query, just keep attention to the names!

    CALL apoc.do.when(organisation IS NOT NULL, 
    '   MERGE (tran)-[h:TRANSACTION_BELONGS_TO_ORGANISATION]->(org) 
        RETURN 1',
    '   MERGE (error:Error) 
        MERGE (tran)-[h:TRANSACTION_HAS_NO_ORGANISATION]->(error) 
        RETURN 1',
    {tran: transaction, org: organisation}) yield value AS X

glilienfield · ‎05-06-2022

I think it may be caused by not returning a value from the 'when' cypher statements. I added returns. I also rearranged some things, but that should have no effect.

Try this:

:auto 
LOAD CSV WITH HEADERS FROM 'file:///Book1.csv' AS row FIELDTERMINATOR ';'
WITH row
WHERE row.id IS NOT NULL

CALL {
    WITH row

    MERGE (transaction:Transaction { id : toInteger(row.id) })
		SET transaction = row,
			transaction.uuid = apoc.create.uuid(),
            transaction.id = toInteger(row.id),
            transaction.consumer_id = NULL // remove the attribute no more used

    WITH transaction, row

    OPTIONAL MATCH (bankAccount:BankAccount {id: toInteger(row.bank_account_id)})

    CALL apoc.do.when(bankAccount IS NOT NULL, 
    'WITH $tran as tran, $bankAcct as bankAcct MERGE (tran)-[h:HAS]->(bankAcct) return 1', 
    'WITH $tran as tran  MERGE (error:Error) MERGE (tran)-[h:HAS_NO_BANK_ACCOUNT]->(error) return 1',
    {tran: transaction, bankAcct: bankAccount}) yield value

    with transaction, row

    OPTIONAL MATCH (organisation:Organisation {id: toInteger(row.organisation_id)})

    CALL apoc.do.when(organisation IS NOT NULL, 
    'WITH $tran as tran, $org as org MERGE (tran)-[h:BELONGS_TO]->(org) return 1',
    'WITH $tran as tran MERGE (error:Error) MERGE (tran)-[h:HAS_NO_ORGANISATION]->(error) return 1',
    {tran: transaction, org: organisation}) yield value

    RETURN 1

} IN TRANSACTIONS

RETURN 1

glilienfield · ‎05-09-2022

I agree, it is much more readable than the other approach.

glilienfield · ‎05-27-2022

That is very interesting, so I looked at the source code to understand how that works. It turns out the apoc procedure passes the parameter map to the neo4j API 'execute' method as parameters, so they can be referenced as parameters, i.e. with a '$' sign. It also turns out the the procedure creates a 'WITH' clause with all the parameter values individually assigned to their parameter's names, i.e. 'WITH value1 as key1, value2 as key2, ...' This 'WITH' clause is prepended to both the 'if' cypher and the 'else' cypher before they are passed to the 'execute' method. This allows the parameter values to be referenced directly as cypher variables. In summary, the apoc.do.when allows passed parameters to be references as parameters (with '$') and directly as cypher variables. This explains why both methods work. This approach is used in all of the apoc cypher methods that pass a parameter map.

Good find.

Neo4j

:auto transaction doesn't run as described in the manual