Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
12-16-2020 07:18 AM
I have the following query:
MATCH (drug)-[r1:INHIBITS|DISRUPTS|INTERACTS_WITH]->(c2)-[r2:CAUSES|ASSOCIATED_WITH|PREDISPOSES|AUGMENTS|AFFECTS]->(covid:Concept)
USING INDEX covid:Concept(cui)
where drug.cui in ["C0031853"] and covid.cui in ["C5203670","C5203676","C5203671"]
WITH drug,r1,c2,r2,covid order by (r1.freq+r2.freq) desc limit 20
CALL {
WITH drug,r1,c2
match (c_xy)<-[r_in_xy:IS_IN]-(s_xy)<-[r_extr_xy:Extracted_From]-(i_xy:Instance {predicate: type(r1)})-[r_sub_x:Inst_Subject]->(drug),
(i_xy)-[r_obj_y:Inst_Object]->(c2)
WITH c_xy, s_xy
LIMIT 3
RETURN collect("PMID:"+c_xy.pmid+"-"+c_xy.pyear+": "+s_xy.sentence) as sent_xy
}
CALL {
WITH c2,r2,covid
match (c_yz)<-[r_in_yz:IS_IN]-(s_yz)<-[r_extr_yz:Extracted_From]-(i_yz:Instance{predicate:type(r2)})-[r_sub_y:Inst_Subject]->(c2),
(i_yz)-[r_obj_z:Inst_Object]->(covid)
WITH c_yz, s_yz
LIMIT 3
RETURN collect("PMID:"+c_yz.pmid+"-"+c_yz.pyear+": "+s_yz.sentence) as sent_yz
}
RETURN sent_xy,drug.cui+":"+drug.name+"-"+type(r1)+"-> "+c2.cui+":"+c2.name+"-"+type(r2)+"-> "+covid.cui+":"+covid.name as Drug_Y_Covid,sent_yz;
The query plan is as follows:
My question is how can I optimize this query. I think the two main issues are:
MATCH (drug)-[r1:INHIBITS|DISRUPTS|INTERACTS_WITH]->(c2)-[r2:CAUSES|ASSOCIATED_WITH|PREDISPOSES|AUGMENTS|AFFECTS]->(covid:Concept)
USING INDEX covid:Concept(cui)
where drug.cui in ["C0031853"] and covid.cui in ["C5203670","C5203676","C5203671"]
WITH drug,r1,c2,r2,covid order by (r1.freq+r2.freq) desc limit 20
As you can notice I have already used a hint index to start from covid instead of the drug. Are there any further improvements I could make? As of now, the query gets relatively fast from covid to c2, but then it needs to expand on all the relationships of c2 and that is quite slow and then also filter the drug by its cui id. In this case, index doesn't play any role.
The second issue is the filtering of the instance predicate. There are a lot of instance nodes, so this is not really fast. I have tried adding an index on the property, but the query planner doesn't use it. My question is how could I go about optimizing my query, and secondly maybe refactoring my graph for best performance. I was thinking of using labels instead of properties for predicate, but then again filtering by dynamic node label is not as effective.
match (c_xy)<-[r_in_xy:IS_IN]-(s_xy)<-[r_extr_xy:Extracted_From]-(i_xy:Instance {predicate: type(r1)})-[r_sub_x:Inst_Subject]->(drug)
Any help appreciated
12-16-2020 07:25 AM
In the middle column of your query plan, it looks like the database is expanding all nodes off of all possible drugs, which is a problem -- and not using an index. Drug isn't labeled either, which is contributing.
Try rewriting this:
MATCH (drug)-[r1:INHIBITS|DISRUPTS|INTERACTS_WITH]->(c2)-[r2:CAUSES|ASSOCIATED_WITH|PREDISPOSES|AUGMENTS|AFFECTS]->(covid:Concept)
USING INDEX covid:Concept(cui)
where drug.cui in ["C0031853"] and covid.cui in ["C5203670","C5203676","C5203671"]
To something like this:
MATCH (drug:Drug { cui: "C0031853" })
WITH drug
MATCH (covid:Concept)
WHERE covid.cui in ["C5203670","C5203676","C5203671"]
WITH drug, covid
MATCH (drug)-[r1:INHIBITS|DISRUPTS|INTERACTS_WITH]->(c2)-[r2:CAUSES|ASSOCIATED_WITH|PREDISPOSES|AUGMENTS|AFFECTS]->(covid)
With the rest as usual. Notice the pattern of matching BOTH endpoints of the path before expanding the path, see if this helps
12-16-2020 07:49 AM
It doesn't seem to improve performance, it actually makes it worse by almost doubling the db hits...
When looking at the query planner it seems that it starts expanding from the drug rather than the covid, which makes it worse.
I have also tried using
MATCH (drug:Drug { cui: "C0031853" })
WITH drug
MATCH (covid:Concept)
WHERE covid.cui in ["C5203670","C5203676","C5203671"]
WITH drug, covid
MATCH (drug)-[r1:INHIBITS|DISRUPTS|INTERACTS_WITH]->(c2)-[r2:CAUSES|ASSOCIATED_WITH|PREDISPOSES|AUGMENTS|AFFECTS]->(covid)
USING JOIN ON covid
but it is not obeying me and doesn't start to expand paths from covid, but sticks to the drug.
All the sessions of the conference are now available online