12-23-2020 06:15 AM
Hi all,
You can skip the setup and go straight to the problem; at the end I sketch the kind of query I'd like.
Background:
I have been using Neo4j to store and access a large multigraph of a particular form: nodes are relatively few (~30k), but there are many edges, each with multiple properties such as a timestamp.
In my use case, I need to query edges that fall into certain periods of time. For that reason, I have followed your advice and implemented edges as nodes in Neo4j. Typically, we have
(a:node)-[:onto]->(e:edge {weight:x, time:y})-[:onto]->(b:node)
This works great!
Additionally, I am now required to keep track of what I call "contextual ties". These are ties between edge-nodes and other normal nodes. I use these to further condition my queries, picking up only edge-nodes that also have a contextual tie to relevant nodes:
(a:node)-[:onto]->(e:edge {weight:x, time:y})-[:onto]->(b:node)
(e:edge {weight:x, time:y})-[:conto]->(c:context {weight:x})-[:conto]->(b:node)
A typical query looks like this:
MATCH p=(a:node)-[:onto]->(r:edge)-[:onto]->(b:node {idx:x})
WHERE ALL(r IN nodes(p) WHERE size([(r)-[:conto]->(:context)-[:conto]->(e:node) WHERE e.idx IN [k,z] | e]) > 0 OR (r:node))
RETURN ...
The second line was an idea from this board; it essentially means the query gives me a, r, b, but only if there is a "context" tie between edge-node r and at least one of the nodes in [k,z].
Again, this works great.
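Since a and b always pass the r:node branch of that condition, only the edge-node is really being tested, so the same filter can probably be written directly on r. A minimal sketch, assuming edge-nodes never carry the :node label, with $x and $ctx as hypothetical parameters standing in for x and [k,z]:

```cypher
MATCH (a:node)-[:onto]->(r:edge)-[:onto]->(b:node {idx: $x})
// keep r only if it has at least one context path to a node listed in $ctx
WHERE size([(r)-[:conto]->(:context)-[:conto]->(e:node) WHERE e.idx IN $ctx | e]) > 0
RETURN a, r, b
```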
Problem:
For each insert operation, the contextual ties are the same for every edge-node. The way I insert contextual ties duplicates them for every edge-node, so my database grows far faster than it should.
Assume I have one focal node a, and two edges to b and c.
Let's say I have one contextual tie to node k.
What I'd like to have is
(a)-(e1)-(b)
(a)-(e2)-(c)
(e1)-(c0)-k
(e2)-(c0)-k (same path c0 to k)
What I get is
(a)-(e1)-(b)
(a)-(e2)-(c)
(e1)-(c1)-k
(e2)-(c2)-k
So there are two, instead of one, context nodes.
In reality, I'll have up to 10 context nodes. So you can imagine if I insert a large number of edges e, then I get n(e)*10 instead of 10 new context nodes.
I obviously need to rely heavily on parameters, since I am adding a lot of connections starting at some node a and adding up to 50 edges to different nodes b,c etc.
Here is my parameterized query, starting always at some ego node and adding alters:
Parameters (not in correct Cypher, sorry, you get the idea):
ego: "a"
ties: [ {alter: "b", weight:0.5, time: 100}, {alter: "c", weight:0.7, time: 100} ....]
contexts: [{alter: "k", weight:0.3, time: 100}, {alter: "z", weight:0.4, time: 100} ... ]
Query
MATCH (a:node {idx: $ego})
WITH a UNWIND $ties as tie
MATCH (b:node {idx: tie.alter})
CREATE (b)<-[:onto]-(r:edge {weight:tie.weight, time:tie.time})<-[:onto]-(a)
WITH r UNWIND $contexts as con
MATCH (q:node {idx: con.alter}) WITH r,q,con
MERGE (r)-[:conto]->(c:context {weight:con.weight, time:con.time })-[:conto]->(q)
Again, this works great, but even though I use MERGE to create contextual ties, it adds a new c:context node for each r:edge node.
I cannot really come up with a way to get it working otherwise while still relying on one collection of parameterized lists that I can pass from my application.
I really need to avoid using two queries and re-matching all edges. The first operation is only performant if it is a pure CREATE. Edges never need to be merged, though; they are always unique. Performance of the above query is good, but it fills the database exceptionally quickly.
I can start with either edges or contexts; however, WITH always seems to carry only the current row of the UNWIND. Instead, I'd need a WITH that returns ALL edges (or all contexts) that I have created, so I can use those to create the other ties.
Something like
MATCH (a:node {idx: $ego})
WITH a UNWIND $ties as tie
MATCH (b:node {idx: tie.alter})
CREATE (b)<-[:onto]-(r:edge {weight:tie.weight, time:tie.time})<-[:onto]-(a)
SAVE ALL r
UNWIND $contexts as con
MATCH (q:node {idx: con.alter}) WITH r,q,con
CREATE (c:context {weight:con.weight, time:con.time })-[:conto]->(q)
SAVE ALL c
FOR EACH PRODUCT (r,c)
CREATE (r)-[:conto]->(c)
I just can't get this to work without re-querying for r and c, which would be too slow.
There is probably a simple solution to this. Would you have any idea?
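One direction that might express the SAVE ALL / FOR EACH PRODUCT idea in a single query is to aggregate the new edge-nodes with collect() and then loop over that list with FOREACH. This is an untested sketch, not a confirmed fix; it assumes the same $ego, $ties and $contexts parameters as above and that every alter already exists as a :node.

```cypher
MATCH (a:node {idx: $ego})
UNWIND $ties AS tie
MATCH (b:node {idx: tie.alter})
CREATE (a)-[:onto]->(r:edge {weight: tie.weight, time: tie.time})-[:onto]->(b)
// collect() is an aggregation: the per-tie rows collapse into one row holding all new edge-nodes
WITH collect(r) AS edges
UNWIND $contexts AS con
MATCH (q:node {idx: con.alter})
// one context node per entry in $contexts, not one per edge-node
CREATE (c:context {weight: con.weight, time: con.time})-[:conto]->(q)
// attach every new edge-node to this shared context node
FOREACH (r IN edges | CREATE (r)-[:conto]->(c))
```

Because the rows from the first UNWIND are collapsed before the second one starts, each entry in $contexts should produce exactly one context node, giving 10 rather than n(e)*10. Nothing is re-matched, and both steps remain pure CREATE.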
Many thanks!