Neo4j

emroberts95 · 04-01-2021

I'm trying to find the number of nodes that are missing a particular relationship.

This query does not work; it simply returns the number of all nodes of this type:

MATCH (e:Enzyme) 
MATCH (r:Reaction) 
WHERE NOT (e)-[:CATALYZES]->(r)
RETURN count(DISTINCT r)

However, this one works as expected:

MATCH (r:Reaction) 
WHERE NOT (:Enzyme)-[:CATALYZES]->(r)
RETURN count(DISTINCT r)

Why?

andrew_bowman · 04-01-2021

Your two MATCHes create a cartesian product: every :Enzyme node is paired with every :Reaction node (you can try a query RETURNing right after your matches to see it in action). Then, for each pairing, the WHERE checks whether that particular :Enzyme on that row has a :CATALYZES relationship to that particular :Reaction on that row, and if so removes the row. So all that's required for a :Reaction to remain as a result is that there exists some :Enzyme that doesn't catalyze it. That allows that particular pairing/row to remain, and the reaction gets counted.
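
To see the cartesian product for yourself, something like the following should do it. This is only a sketch using the labels from your query; the `pairedRows` alias and the `LIMIT 25` are arbitrary choices for illustration. Run the two queries separately.

```cypher
// 1) Count the rows produced by the two unconnected MATCHes.
//    This should equal (number of :Enzyme nodes) x (number of :Reaction nodes).
MATCH (e:Enzyme)
MATCH (r:Reaction)
RETURN count(*) AS pairedRows

// 2) Look at a sample of the pairings themselves.
MATCH (e:Enzyme)
MATCH (r:Reaction)
RETURN e, r
LIMIT 25
```

As a made-up example, with 3 enzymes and 2 reactions you get 6 rows. If only one enzyme catalyzes a given reaction, the WHERE NOT removes just that one row, and the reaction still appears in the other two, so it still gets counted.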

Your second query correctly captures the use case of "find all :Reactions where there is no such pattern where an :Enzyme catalyzes the reaction."

emroberts95 · 04-02-2021

Thanks! What can I do if I need another MATCH statement with the same node later? Is this when I should use "WITH"?

emroberts95 · 04-02-2021

For example, this query:

MATCH (l:Ligand) WHERE l.SMILES ='O=O'
MATCH (l2:Ligand) WHERE l2.SMILES ='OO'
MATCH (l3:Ligand) WHERE l3.InChiKey ='OUUQCZGPVNCOIJ-UHFFFAOYSA-M'
MATCH (e:Enzyme) WHERE (e)-[:BINDS]->(l) AND NOT (l2)<-[:BINDS]-(e)-[:BINDS]->(l3)
RETURN COUNT(DISTINCT e)

I believe this query is returning twice as many :Enzyme nodes as it should. How do I filter out :Enzyme nodes that don't have a :BINDS relationship to Ligand nodes l2 and l3?

andrew_bowman · 04-02-2021

Remember that operations in Cypher execute per row, so for however many l results there are, the next MATCH for l2 will execute for each of those (multiplying the rows). Then, for those results, the third MATCH executes, multiplying the number of results again, and for each of those rows the last MATCH on e executes. It checks, per row, that the enzyme binds to that particular l node and that it does not bind to both that particular l2 node and that particular l3 node for that row.

If your MATCHes can match more than one node each (that is, if SMILES and InChiKey are not unique properties on :Ligand nodes), then this will not work.
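
If you aren't sure whether those properties are unique, a quick check along these lines should tell you. It is a sketch that reuses the values from your query; the `smiles` and `nodes` aliases are just illustrative names. Run the two queries separately.

```cypher
// How many :Ligand nodes share each SMILES value used in the query?
MATCH (l:Ligand)
WHERE l.SMILES IN ['O=O', 'OO']
RETURN l.SMILES AS smiles, count(*) AS nodes

// Same check for the InChiKey value.
MATCH (l:Ligand {InChiKey: 'OUUQCZGPVNCOIJ-UHFFFAOYSA-M'})
RETURN count(l) AS nodes
```

Any count above 1 means that MATCH produces more than one row, and every later MATCH is multiplied by it.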

Rather than prematching and forming a cartesian product from your MATCHes, just do one MATCH and embed the properties inline, so they can filter during expansion of the pattern:

MATCH (e:Enzyme) 
WHERE (e)-[:BINDS]->(:Ligand {SMILES:'O=O'}) AND NOT (:Ligand {SMILES:'OO'})<-[:BINDS]-(e)-[:BINDS]->(:Ligand {InChiKey:'OUUQCZGPVNCOIJ-UHFFFAOYSA-M'})
RETURN COUNT(e)

Since in this one you're only doing a MATCH on e, not as part of a larger pattern, and the other patterns are only used as predicates, this will do a label scan on :Enzyme nodes. They will already be distinct, so there's no need to use DISTINCT.
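
If you prefer, the same predicates can also be written as existential subqueries. This is a sketch that assumes you are on Neo4j 4.0 or later, where the `EXISTS { ... }` syntax is available; it is meant to express the same check as the query above, not to replace it.

```cypher
MATCH (e:Enzyme)
// Keep enzymes that bind the 'O=O' ligand...
WHERE EXISTS {
  MATCH (e)-[:BINDS]->(:Ligand {SMILES: 'O=O'})
}
// ...and drop those that bind both the 'OO' ligand and the InChiKey ligand.
AND NOT EXISTS {
  MATCH (:Ligand {SMILES: 'OO'})<-[:BINDS]-(e)-[:BINDS]->(:Ligand {InChiKey: 'OUUQCZGPVNCOIJ-UHFFFAOYSA-M'})
}
RETURN count(e)
```

Either way, e is matched once and everything else is evaluated as a yes/no check per enzyme, so no extra rows are created.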

ameyasoft · 04-01-2021

Try this:

MATCH (e:Enzyme)  
WHERE NOT (e)-[:CATALYZES]->()
RETURN count(DISTINCT e)

This gives you the number of enzymes that do not participate in any catalysis.

WHERE NOT query not working as expected to find nodes without a specific type of relationship