cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Please help me tune my query

Hi everyone please help me tune this cypher query below.

MATCH (s:Survey { is_active: true })
      with s  SKIP 0 LIMIT 25
      MATCH (s)<-[:HAS_CLIENT]-(client:Client)
                         
      MATCH (s)-[:SAMPLE_TYPE]->(st:ProjectSettingElement)
      MATCH (s)-[:CATEGORY]->(pc:ProjectSettingElement)
      MATCH (s)-[:PROJECT_MODE]->(pmode:ProjectSettingElement)
    
      
      OPTIONAL MATCH (s)-[:SALES_PERSON]->(sp:StaffUser)
      OPTIONAL MATCH (s)-[:PROJECT_MANAGER]->(pman:StaffUser)
      OPTIONAL MATCH (s)-[:CREATED_BY]->(staff:StaffUser)
      OPTIONAL MATCH (s)-[:USES_CURRENCY]->(cy:ProjectSettingElement)
     
      WITH DISTINCT s, sp, staff, pman, cy, st, pc, pmode, client    

      OPTIONAL MATCH (tot:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
      WHERE (tot)-[:HAS_COMPLETE_STATUS]->() 
      WITH DISTINCT  s,  sp, staff, pman, cy, st, pc, pmode, client,tot
      OPTIONAL MATCH (total_reach:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
      WITH DISTINCT count(total_reach) as total, s,  sp, staff, pman, cy, st, pc, pmode, client,tot
      WITH CASE WHEN COUNT(tot) > 0 THEN ((SUM(toFloat(coalesce(tot.duration,0)))/1000)/60)/COUNT(tot)  ELSE 0 END as avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client

      //INCOMPLETE
      OPTIONAL MATCH (incpm:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
      WHERE NOT (incpm)-[:HAS_COMPLETE_STATUS]->()
      WITH  count(incpm) as incomplete, total, s,  sp, staff, pman, cy, st, pc, pmode, client, avg_loi
      //REACHED
      OPTIONAL MATCH (pre_cmp:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
	    WITH count(pre_cmp) as reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete
		
      //RETURNED
      OPTIONAL MATCH (ret:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_COMPLETE_STATUS]->()
      WHERE (ret)-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
      WITH count(ret) as returned, reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete
      
      //SPEEDER
      OPTIONAL MATCH (speeder:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_RESTRICTION_STATUS]->(:Status{name:"speeder"})
	    WITH count(speeder) as speeder, returned, reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete
      
      OPTIONAL MATCH (cmp:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_COMPLETE_STATUS]->(:Status{name:"complete"})
      WHERE (cmp)-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
      WITH count(cmp) as complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete

      OPTIONAL MATCH (qta:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_COMPLETE_STATUS]-(:Status{name:"quotafull"})
      WHERE (qta)-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
      WITH count(qta) as quotafull, complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete

      OPTIONAL MATCH (trm:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_COMPLETE_STATUS]-(:Status{name:"terminate"})
      WHERE (trm)-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
      WITH count(trm) as terminate, quotafull, complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete

      OPTIONAL MATCH (qal:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_COMPLETE_STATUS]-(:Status{name:"quality"})
      WHERE (qal)-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
      WITH count(qal) as quality, terminate, quotafull, complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete
      

      // PRESCREENER STATS

      OPTIONAL MATCH (pre_cmp:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"complete"})
      WITH count(pre_cmp) as prescreener_complete, quality, terminate, quotafull, complete, speeder, returned, reached, avg_loi, total, 
      s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete

      OPTIONAL MATCH (pre_qta:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"quotafull"})
      WITH count(pre_qta) as prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, complete, speeder, returned, 
      reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete

      
      OPTIONAL MATCH (pre_trm:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"terminate"})
      WITH count(pre_trm) as prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, complete, speeder, returned, 
      reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete

      OPTIONAL MATCH (pre_qal:SurveySupUser{survey_id:s.survey_id , env:'live'})-[:HAS_PRE_COMPLETE_STATUS]->(:Status{name:"quality"})
      WITH count(pre_qal) as prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, complete, speeder, returned, 
      reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client, incomplete

      OPTIONAL MATCH (tot_all:SurveySupUser{survey_id:s.survey_id , env:'live'})
      WITH COUNT(tot_all) - (prescreener_quality + prescreener_complete + prescreener_terminate + prescreener_quotafull) as prescreener_incomplete, prescreener_quality, prescreener_terminate, 
      prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete




      WITH CASE WHEN toInteger(complete) > toInteger(s.total_n) THEN toInteger(complete) - toInteger(s.total_n) ELSE 0 END as oa, 
      prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client,incomplete


      WITH CASE WHEN toInteger(total) > 0 THEN (toFloat(complete) * toFloat(s.cpi)) / toFloat(total) * 100 ELSE 0 
      END as epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, pman, cy, st, pc, pmode, client, incomplete,
      
      CASE WHEN complete > 0 OR terminate > 0 THEN (toFloat(complete) / toFloat((complete + terminate))) * 100 ELSE 0 END as ir,
      CASE WHEN total > 0 THEN (toFloat(complete) / toFloat(total) ) * 100 ELSE 0 END as cr, 
      CASE WHEN  toFloat(reached)>0 THEN toFloat(toFloat(incomplete)/toFloat(reached) *100) ELSE 0 END  as br
      OPTIONAL MATCH (s)-[:HAS_STATUS]->(pe)
      
      WITH epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe

      //LANGUAGE

      OPTIONAL MATCH (s)-[:USE_LANGUAGE]->(l:Language)
      WITH {
        language_name: l.language_name,
        language_code: l.language_code
      } as language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe ORDER BY s.modified_on DESC

      //LINKS
      OPTIONAL MATCH (s)-[:HAS_SURVEY_LINKS]->(sl:SurveyLinks)
      WITH sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe
      
      OPTIONAL MATCH (sl)-[:HAS_LINKS]->(lnk:Link)
      WITH collect({
          link: lnk.link,
          is_active: lnk.is_active,
          type: lnk.type
      }) as links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe
      
      //DEVICES
      OPTIONAL MATCH (s)-[:HAS_DEVICES]->(dv:Devices)
      WITH dv, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe
      
      //DEVICE OBJECTS
      OPTIONAL MATCH (dv)-[:HAS_DEVICE]->(d:DeviceObject)
      WITH d, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe ORDER BY d.device_name ASC
      WITH collect({
          device_name: d.device_name,
          device_code: d.device_code,
          is_active: d.is_active
      }) as devices, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe

      //RESTRICTIONS
      OPTIONAL MATCH (s)-[:HAS_RESTRICTIONS]->(rs:Restrictions)
      WITH rs, devices, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe


      //RESTRICTION OBJECTS
      OPTIONAL MATCH (rs)-[:HAS_RESTRICTION]->(ro:RestrictionObject)
      WITH collect({
        restriction_name: ro.restriction_name,
        restriction_code: ro.restriction_code,
        is_active: ro.is_active
      }) as restrictions, rs, devices, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe


      OPTIONAL MATCH(s)-[:HAS_PROJECT_TYPE]->(pt:ProjectSettingElement)
      WITH pt, restrictions, rs, devices, links, sl, language, epc, oa, prescreener_incomplete, prescreener_quality, prescreener_terminate, prescreener_quotafull, prescreener_complete, quality, terminate, quotafull, 
      complete, speeder, returned, reached, avg_loi, total, s,  sp, staff, ir ,pman, cy, st, pc, cr, br, pmode, client,incomplete, pe ORDER BY s.survey_id DESC 


      RETURN collect({

          client_survey: {
            quality: quality,
            terminate: terminate,
            quotafull: quotafull,
            complete: complete,
            incomplete: incomplete,
            total: total,
            epc : toString(epc),
            oa : oa,
            cr : toString(cr) ,
            br : toString(br) ,
            reached : reached,
            returned: returned,
            speeder :  speeder,
            ir : toString(ir) ,
            avg_loi : toString(avg_loi),
            prescreener_complete : prescreener_complete,
            prescreener_quotafull: prescreener_quotafull,
            prescreener_terminate: prescreener_terminate,
            prescreener_incomplete: prescreener_incomplete,
            prescreener_quality :  prescreener_quality

          },

          soft_launch : {
            value: s.soft_launch_value,
            pc: s.soft_launch_pc,
            status: s.soft_launch_status
          },

          client_reached_traffic_limit: {
            value: s.client_reached_traffic_limit_value,
            status: s.client_reached_traffic_limit_status
          },

          epc_traffic_limit : {
            value : s.epc_traffic_limit_value,
            pc : s.epc_traffic_limit_pc,
            status : s.epc_traffic_limit_status
          },

          links: links,
          language: language,
          allowed_devices: devices,
          restrictions: restrictions,

          survey_id: s.survey_id,
          survey_name: s.survey_name,
          status_id : s.status_id,
          status_name : s.status_name,
          additional_cost : s.additional_cost,
          po_number : s.po_number,
          status_color:pe.badge_color,
          survey_url_is_multiple: s.survey_url_is_multiple,
          tags: s.tags,
          total_n: s.total_n,
          cpi: s.cpi,
          ir: s.ir,
          loi: s.loi,
          field_days: s.field_days,
          survey_cost: s.survey_cost,
          start_date: s.start_date,
          end_date : s.end_date,

          sales_person: sp.name,
          sales_person_contact: sp.contact,
          sales_person_uid: sp.u_id,

          created_by: staff.u_id,
          staff_contact: staff.contact,
          staff_designation: staff.designation,
          staff_uid: staff.u_id,

          project_manager: pman.name,
          project_manager_contact: pman.contact,
          project_manager_designation: pman.designation,
          project_manager_uid: pman.u_id,

          currency_name: cy.name,
          currency_badge_color: cy.badge_color,
          currency_id: cy.p_id,

          sample_name: st.name,
          sample_badge_color: st.badge_color,
          sample_id: st.p_id,

          category_name: pc.name,
          category_badge_color: pc.badge_color,
          category_id: pc.p_id,

          project_mode : pmode.p_id,

          project_type_name : pt.name,
          project_type_id: pt.p_id,

          client_id:client.client_id,
          company_name: client.company_name,
          address: client.address,

          billing_person: client.billing_person,
          billing_email: client.billing_email,
          main_contact_name: client.main_contact_name,
          main_contact_mobile: client.main_contact_mobile,
          main_contact_email: client.main_contact_email }) as survey_list

As a note: SurveySupUser(survey_id) is index and this particular node is having like 50k nodes now.
And the db hits for this query is
Cypher version: CYPHER 4.2, planner: COST, runtime: PIPELINED. 5139002 total db hits in 2630 ms.

Any suggestions will be helpfull

3 REPLIES 3

Hi,

There are a lot of match and optional match statements in this query and then the bits are carried forward from part to part using 'WITH'. At every 'WITH', I see the possibility of a large amount of cartesian products being possible which when pass to the next part increase them exponentially.
All this is the reason that is leading to such a high db hit and the cartesian product is also resulting in a high volume of result rows that pass from part to part all making this whole cypher script quite heavy in terms of performance.

The best suggestion would be, if possible to break into smaller queries and then combine their results by externally calling the results and passing it into next bit through some programing language support or APIs. This would reduce the Cartesian products that get created and would eventually improve performance.

Looking at the complexity of the query and seeing how much things this query is trying to traverse, I doubt, going with any implementation of a single cypher query doing all won't be able to achieve that level of performance improvement, that separate queries combined externally could do, in my opinion.

Hope this is useful in someway or at least could guide you somewhere.

Regards.

Reiner
Graph Buddy

Not a solution but a hint:
A while ago we decided to rebuild our big queries and replaced OPTIONAL MATCH with Pattern Comprehension which can be concat easily in WITH clauses and don't inherit the risk of blowing up the results with Cartesian products. This step made our queries faster, shorter and much easier to maintain.

Instead of using

OPTIONAL MATCH (s)-[:SALES_PERSON]->(sp:StaffUser)
OPTIONAL MATCH (s)-[:PROJECT_MANAGER]->(pman:StaffUser)

and putting this together in a WITH clause, usually not knowing if result already is exploded, we do

WITH ...., 
    [(s)-[:SALES_PERSON]->(sp:StaffUser) | sp ] as sp,
    [(s)-[:PROJECT_MANAGER]->(pman:StaffUser) | pman ] as pman,
....

here you are sure that you get the count of lines you had before - and it also seems to be faster and is easier to manage from our point of view.

Pattern Comprehension can also be used in conditions like

WHERE  [(s)-[:PROJECT_MANAGER]->(pman:StaffUser) | pman ] <> []

The result of a Pattern Comprehension always is a list of the result value defined after the | or an empty list if no results found.

Consider using pattern comprehensions or subqueries to take the place of your OPTIONAL MATCH + collects. Also, do a PROFILE of your query and look at how the number of rows flow. Watch for spikes where the rows go multiplicative, those are likely areas in the query where you're getting a cartesian product.

Remember: operations yield rows, and operations execute per row, so the more rows you see, the more work is being done, and if rows are multiplying out in ways that may not make sense to you, it likely means that your OPTIONAL MATCHes are generating additional rows, and subsequent operations will execute upon them. Control that by using scoped aggregations (in subqueries) or using pattern comprehensions (if doing the equivalent of OPTIONAL MATCH ... WHERE ... collect(<results>))