Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
10-02-2022 09:44 AM
Hi
I would like to make an optiomal match for multiple nodes. My configuration is like follow :
If I want to do a left join (optimal match) on several nodes that will match the following sql query :
select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit from p_lineorder left join dates on lo_orderdate = d_datekey left join customer on lo_custkey = c_custkey left join supplier on lo_suppkey = s_suppkey left join part on lo_partkey = p_partkey where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation;
is it ideal to do like the 1st cypher query or like the 2nd
1st cypher query :
profile match (c:customer) where c.C_REGION starts with "AMERICA"
optional match (c)<-[:order_customer]-(l:lineorder)-[:order_part]->(p:part),(d:date)<-[:order_date]-(l)-[:order_supplier]->(s:supplier)
using join on l
where (p.P_MFGR = "MFGR#2" or p.P_MFGR = "MFGR#2")
and s.S_REGION starts with "AMERICA"
return d.D_YEAR, c.C_NATION, sum(l.LO_REVENUE) as revenu, sum(l.LO_SUPPLYCOST) as supplycost
ORDER BY d.D_YEAR, c.C_NATION;
2nd cypher query :
profile match (c:customer) where c.C_REGION starts with "AMERICA"
optional match (c)<-[:order_customer]-(l:lineorder)-[:order_part]->(p:part)
optional match (d:date)<-[:order_date]-(l)-[:order_supplier]->(s:supplier)
using join on l
where (p.P_MFGR = "MFGR#2" or p.P_MFGR = "MFGR#2")
and s.S_REGION starts with "AMERICA"
return d.D_YEAR, c.C_NATION, sum(l.LO_REVENUE) as revenu, sum(l.LO_SUPPLYCOST) as supplycost
ORDER BY d.D_YEAR, c.C_NATION;
Should I use one optimal match with a semi-colon in the middle or two optimal matches (or some other formulation of the query?!). If there were linear nodes the problem does not arise, but as there is a star structure with a central node, the query is not clear.
Thank you in advance
10-02-2022 06:30 PM
My understand is they will produce the same results, except for one scenario in which they are different. Neo4j will not traverse the same relationship twice when matching a pattern. As such, if you have two patterns that share a common relationship, then the results will be different. As an example, consider the following path:
create(a:Node{id:0})-[:REL]->(b:Node{id:1})-[:REL]->(c:Node{id:2})
The following query returns the results below. It results in one path: a--b--c
match(a:Node{id:0})--(b) , (b)--(c) return a, b, c
In contrast, the following query returns the results below. It results in two paths: a--b--c and a--b--a. The difference is there are two possibilities for the second match associated with the first match: b--c and b--a. Since the second match is independent of the first, the second match can traverse back towards 'a' from 'b'. This is not allowed when the two match patterns are on the same match.
match(a:Node{id:0})--(b:Node)
match (b)--(c)
return a, b, c
As a note, this should only be possible when specifying undirected match patterns, as these are. Providing a direction would not allow the reverse transversal. Modifying the second query with direction criteria eliminates the second match.
match(a:Node{id:0})-->(b:Node)
match (b)-->(c)
return a, b, c
I would think the two forms are equivalent in your case, since you have directed relationships. Do you get the same results with each form?
Does each line order have just one customer, one part, one date, and one supplier? If not, you will get the Cartesian product of the sequential matches.
In your query, the first match will give you all the line order nodes associated with the customer criteria. If there is only one part per lineorder, then there will be one row per lineorder since the part has only one relationship to lineorder. The second match will result in one row for each lineorder resulting from the first match, since, there is only one date and one supplier associated with the lineorder.
10-03-2022 01:07 AM
thanks glilienfield
Indeed, in my case, each lineorder is associated with one customer, one part and one date and my relationships are directed as shown in the following diagram:
In any case, your answer was very clear.
In my configuration, if the nodes were on the same path, the question does not arise but as they are not on the same path but all exit from the central lineorder node, I had to make a match between 3 and add a 4th afterwards but I wanted to find the optimal request
All the sessions of the conference are now available online