Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
09-04-2021 08:00 AM
Hi everyone please help me tune this cypher query below.
MATCH (s:Survey { is_active: true })
with s SKIP 0 LIMIT 25
MATCH (s)<-[:HAS_CLIENT]-(client:Client)
MATCH (s)-[:SAMPLE_TYPE]->(st:ProjectSettingElement)
MATCH (s)-[:CATEGORY]->(pc:ProjectSettingElement)
MATCH (s)-[:PROJECT_MODE]->(pmode:ProjectSettingElement)
OPTIONAL MATCH (s)-[:SALES_PERSON]->(sp:StaffUser)
OPTIONAL MATCH (s)-[:PROJECT_MANAGER]->(pman:StaffUser)
OPTIONAL MATCH (s)-[:CREATED_BY]->(staff:StaffUser)
OPTIONAL MATCH (s)-[:USES_CURRENCY]->(cy:ProjectSettingElement)
WITH DISTINCT s, sp, staff, pman, cy, st, pc, pmode, client
OPTIONAL MATCH (tot:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
WHERE (tot)-[:HAS_COMPLETE_STATUS]->()
WITH DISTINCT s, sp, staff, pman, cy, st, pc, pmode, client,tot
OPTIONAL MATCH (total_reach:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
WITH DISTINCT count(total_reach) as total, s, sp, staff, pman, cy, st, pc, pmode, client,tot
WITH CASE WHEN COUNT(tot) > 0 THEN ((SUM(toFloat(coalesce(tot.duration,0)))/1000)/60)/COUNT(tot) ELSE 0 END as avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client
//INCOMPLETE
OPTIONAL MATCH (incpm:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
WHERE NOT (incpm)-[:HAS_COMPLETE_STATUS]->()
WITH count(incpm) as incomplete, total, s, sp, staff, pman, cy, st, pc, pmode, client, avg_loi
//REACHED
OPTIONAL MATCH (pre_cmp:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
WITH count(pre_cmp) as reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
//RETURNED
OPTIONAL MATCH (ret:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_COMPLETE_STATUS]->()
WHERE (ret)-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
WITH count(ret) as returned, reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
//SPEEDER
OPTIONAL MATCH (speeder:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_RESTRICTION_STATUS]->(:Status{name:"speeder"})
WITH count(speeder) as speeder, returned, reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
OPTIONAL MATCH (cmp:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_COMPLETE_STATUS]->(:Status{name:"complete"})
WHERE (cmp)-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
WITH count(cmp) as complete, speeder, returned, reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
OPTIONAL MATCH (qta:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_COMPLETE_STATUS]-(:Status{name:"quotafull"})
WHERE (qta)-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
WITH count(qta) as quotafull, complete, speeder, returned, reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
OPTIONAL MATCH (trm:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_COMPLETE_STATUS]-(:Status{name:"terminate"})
WHERE (trm)-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
WITH count(trm) as terminate, quotafull, complete, speeder, returned, reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
OPTIONAL MATCH (qal:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_COMPLETE_STATUS]-(:Status{name:"quality"})
WHERE (qal)-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
WITH count(qal) as quality, terminate, quotafull, complete, speeder, returned, reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
// PRESCREENER STATS
OPTIONAL MATCH (pre_cmp:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
WITH count(pre_cmp) as prescreener_complete, quality, terminate, quotafull, complete, speeder, returned, reached, avg_loi, total,
s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
OPTIONAL MATCH (pre_qta:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"quotafull"})
WITH count(pre_qta) as prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, complete, speeder, returned,
reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
OPTIONAL MATCH (pre_trm:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"terminate"})
WITH count(pre_trm) as prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, complete, speeder, returned,
reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
OPTIONAL MATCH (pre_qal:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"quality"})
WITH count(pre_qal) as prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, complete, speeder, returned,
reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client, incomplete
OPTIONAL MATCH (tot_all:SurveySupUser{survey_id:s.survey_id , env:'live'})
WITH COUNT(tot_all) - (prescreener_quality + prescreener_complete + prescreener_terminate + prescreener_quotafull) as prescreener_incomplete, prescreener_quality, prescreener_terminate,
prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, complete, speeder, returned, reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
WITH CASE WHEN toInteger(complete) > toInteger(s.total_n) THEN toInteger(complete) - toInteger(s.total_n) ELSE 0 END as oa,
prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client,incomplete
WITH CASE WHEN toInteger(total) > 0 THEN (toFloat(complete) * toFloat(s.cpi)) / toFloat(total) * 100 ELSE 0
END as epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, pman, cy, st, pc, pmode, client, incomplete,
CASE WHEN complete > 0 OR terminate > 0 THEN (toFloat(complete) / toFloat((complete + terminate))) * 100 ELSE 0 END as ir,
CASE WHEN total > 0 THEN (toFloat(complete) / toFloat(total) ) * 100 ELSE 0 END as cr,
CASE WHEN toFloat(reached)>0 THEN toFloat(toFloat(incomplete)/toFloat(reached) *100) ELSE 0 END as br
OPTIONAL MATCH (s)-[:HAS_STATUS]->(pe)
WITH epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe
//LANGUAGE
OPTIONAL MATCH (s)-[:USE_LANGUAGE]->(l:Language)
WITH {
language_name: l.language_name,
language_code: l.language_code
} as language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe ORDER BY s.modified_on DESC
//LINKS
OPTIONAL MATCH (s)-[:HAS_SURVEY_LINKS]->(sl:SurveyLinks)
WITH sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe
OPTIONAL MATCH (sl)-[:HAS_LINKS]->(lnk:Link)
WITH collect({
link: lnk.link,
is_active: lnk.is_active,
type: lnk.type
}) as links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe
//DEVICES
OPTIONAL MATCH (s)-[:HAS_DEVICES]->(dv:Devices)
WITH dv, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe
//DEVICE OBJECTS
OPTIONAL MATCH (dv)-[:HAS_DEVICE]->(d:DeviceObject)
WITH d, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe ORDER BY d.device_name ASC
WITH collect({
device_name: d.device_name,
device_code: d.device_code,
is_active: d.is_active
}) as devices, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe
//RESTRICTIONS
OPTIONAL MATCH (s)-[:HAS_RESTRICTIONS]->(rs:Restrictions)
WITH rs, devices, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe
//RESTRICTION OBJECTS
OPTIONAL MATCH (rs)-[:HAS_RESTRICTION]->(ro:RestrictionObject)
WITH collect({
restriction_name: ro.restriction_name,
restriction_code: ro.restriction_code,
is_active: ro.is_active
}) as restrictions, rs, devices, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe
OPTIONAL MATCH(s)-[:HAS_PROJECT_TYPE]->(pt:ProjectSettingElement)
WITH pt, restrictions, rs, devices, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull,
complete, speeder, returned, reached, avg_loi, total, s, sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe ORDER BY s.survey_id DESC
RETURN collect({
client_survey: {
quality: quality,
terminate: terminate,
quotafull: quotafull,
complete: complete,
incomplete: incomplete,
total: total,
epc : toString(epc),
oa : oa,
cr : toString(cr) ,
br : toString(br) ,
reached : reached,
returned: returned,
speeder : speeder,
ir : toString(ir) ,
avg_loi : toString(avg_loi),
prescreener_complete : prescreener_complete,
prescreener_quotafull: prescreener_quotafull,
prescreener_terminate: prescreener_terminate,
prescreener_incomplete: prescreener_incomplete,
prescreener_quality : prescreener_quality
},
soft_launch : {
value: s.soft_launch_value,
pc: s.soft_launch_pc,
status: s.soft_launch_status
},
client_reached_traffic_limit: {
value: s.client_reached_traffic_limit_value,
status: s.client_reached_traffic_limit_status
},
epc_traffic_limit : {
value : s.epc_traffic_limit_value,
pc : s.epc_traffic_limit_pc,
status : s.epc_traffic_limit_status
},
links: links,
language: language,
allowed_devices: devices,
restrictions: restrictions,
survey_id: s.survey_id,
survey_name: s.survey_name,
status_id : s.status_id,
status_name : s.status_name,
additional_cost : s.additional_cost,
po_number : s.po_number,
status_color:pe.badge_color,
survey_url_is_multiple: s.survey_url_is_multiple,
tags: s.tags,
total_n: s.total_n,
cpi: s.cpi,
ir: s.ir,
loi: s.loi,
field_days: s.field_days,
survey_cost: s.survey_cost,
start_date: s.start_date,
end_date : s.end_date,
sales_person: sp.name,
sales_person_contact: sp.contact,
sales_person_uid: sp.u_id,
created_by: staff.u_id,
staff_contact: staff.contact,
staff_designation: staff.designation,
staff_uid: staff.u_id,
project_manager: pman.name,
project_manager_contact: pman.contact,
project_manager_designation: pman.designation,
project_manager_uid: pman.u_id,
currency_name: cy.name,
currency_badge_color: cy.badge_color,
currency_id: cy.p_id,
sample_name: st.name,
sample_badge_color: st.badge_color,
sample_id: st.p_id,
category_name: pc.name,
category_badge_color: pc.badge_color,
category_id: pc.p_id,
project_mode : pmode.p_id,
project_type_name : pt.name,
project_type_id: pt.p_id,
client_id:client.client_id,
company_name: client.company_name,
address: client.address,
billing_person: client.billing_person,
billing_email: client.billing_email,
main_contact_name: client.main_contact_name,
main_contact_mobile: client.main_contact_mobile,
main_contact_email: client.main_contact_email }) as survey_list
As a note: SurveySupUser(survey_id) is index and this particular node is having like 50k nodes now.
And the db hits for this query is
Cypher version: CYPHER 4.2, planner: COST, runtime: PIPELINED. 5139002 total db hits in 2630 ms.
Any suggestions will be helpfull
09-04-2021 10:00 PM
Hi,
There are a lot of match and optional match statements in this query and then the bits are carried forward from part to part using 'WITH'. At every 'WITH', I see the possibility of a large amount of cartesian products being possible which when pass to the next part increase them exponentially.
All this is the reason that is leading to such a high db hit and the cartesian product is also resulting in a high volume of result rows that pass from part to part all making this whole cypher script quite heavy in terms of performance.
The best suggestion would be, if possible to break into smaller queries and then combine their results by externally calling the results and passing it into next bit through some programing language support or APIs. This would reduce the Cartesian products that get created and would eventually improve performance.
Looking at the complexity of the query and seeing how much things this query is trying to traverse, I doubt, going with any implementation of a single cypher query doing all won't be able to achieve that level of performance improvement, that separate queries combined externally could do, in my opinion.
Hope this is useful in someway or at least could guide you somewhere.
Regards.
09-10-2021 10:08 AM
Not a solution but a hint:
A while ago we decided to rebuild our big queries and replaced OPTIONAL MATCH with Pattern Comprehension which can be concat easily in WITH clauses and don't inherit the risk of blowing up the results with Cartesian products. This step made our queries faster, shorter and much easier to maintain.
Instead of using
OPTIONAL MATCH (s)-[:SALES_PERSON]->(sp:StaffUser)
OPTIONAL MATCH (s)-[:PROJECT_MANAGER]->(pman:StaffUser)
and putting this together in a WITH clause, usually not knowing if result already is exploded, we do
WITH ....,
[(s)-[:SALES_PERSON]->(sp:StaffUser) | sp ] as sp,
[(s)-[:PROJECT_MANAGER]->(pman:StaffUser) | pman ] as pman,
....
here you are sure that you get the count of lines you had before - and it also seems to be faster and is easier to manage from our point of view.
Pattern Comprehension can also be used in conditions like
WHERE [(s)-[:PROJECT_MANAGER]->(pman:StaffUser) | pman ] <> []
The result of a Pattern Comprehension always is a list of the result value defined after the | or an empty list if no results found.
09-24-2021 05:40 PM
Consider using pattern comprehensions or subqueries to take the place of your OPTIONAL MATCH + collects. Also, do a PROFILE of your query and look at how the number of rows flow. Watch for spikes where the rows go multiplicative, those are likely areas in the query where you're getting a cartesian product.
Remember: operations yield rows, and operations execute per row, so the more rows you see, the more work is being done, and if rows are multiplying out in ways that may not make sense to you, it likely means that your OPTIONAL MATCHes are generating additional rows, and subsequent operations will execute upon them. Control that by using scoped aggregations (in subqueries) or using pattern comprehensions (if doing the equivalent of OPTIONAL MATCH ... WHERE ... collect(<results>)
)
All the sessions of the conference are now available online