Neo4j

IngoMarquart · 12-23-2020

Hi all,

You can skip the setup and go straight to the problem; at the end is a sketch of the query I would like to write.

Background:
I have been implementing Neo4j to store and access a large multigraph of a particular form: nodes are relatively few (~30k), but there are many edges, each with multiple properties such as a timestamp.

In my use case, I need to query edges that fall into certain periods of time. For that reason, I have followed your advice and implemented edges as nodes in Neo4j. Typically, we have

(a:node)-[:onto]->(e:edge {weight:x, time:y})-[:onto]->(b:node)

This works great!
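For concreteness, a period query against this layout might look like the sketch below. The parameter names $start and $end are placeholders for illustration, and the half-open interval is only one possible convention.

```cypher
// Sketch: edge-nodes whose timestamp falls in [$start, $end).
// $start and $end are hypothetical parameters, not part of the setup above.
MATCH (a:node)-[:onto]->(e:edge)-[:onto]->(b:node)
WHERE e.time >= $start AND e.time < $end
RETURN a.idx, b.idx, e.weight, e.time
```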

Additionally, I am now required to keep track of what I call "contextual ties". These are ties between edge-nodes and other normal nodes. I use them to further condition my queries, picking up only edge-nodes that also have a contextual tie to relevant nodes.

(a:node)-[:onto]->(e:edge {weight:x, time:y})-[:onto]->(b:node)
(e:edge {weight:x, time:y})-[:conto]->(c:context {weight:x})-[:conto]->(b:node)

A typical query looks like this:

MATCH p=(a:node)-[:onto]->(r:edge)-[:onto]->(b:node {idx:x})
WHERE ALL(n IN nodes(p) WHERE n:node OR size([(n)-[:conto]->(:context)-[:conto]->(e:node) WHERE e.idx IN [k,z] | e]) > 0)
RETURN ...

The second line is an idea from this board. It essentially means the query returns a, r, b, but only if there is a "context" tie between the edge-node r and at least one node from the set [k,z].
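The same condition can probably be written more directly with an existential subquery (available since Neo4j 4.0), which avoids iterating over nodes(p). This is a sketch, assuming the placeholders x and [k,z] are passed as the hypothetical parameters $x and $contextIds:

```cypher
// Sketch only: $x and $contextIds are hypothetical parameter names.
MATCH (a:node)-[:onto]->(r:edge)-[:onto]->(b:node {idx: $x})
// Keep r only if it has a contextual tie to at least one listed node.
WHERE EXISTS {
  MATCH (r)-[:conto]->(:context)-[:conto]->(e:node)
  WHERE e.idx IN $contextIds
}
RETURN a, r, b
```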

Again, this works great.

Problem:

Within one insert operation, the contextual ties are the same for every edge-node. The way I insert contextual ties leads to duplication, so my database grows far larger than necessary.

Assume I have one focal node a, and two edges to b and c.
Let's say I have one contextual tie to node k.
What I'd like to have is

(a)-(e1)-(b)
(a)-(e2)-(c)
(e1)-(c0)-k
(e2)-(c0)-k (same path c0 to k)

What I get is

(a)-(e1)-(b)
(a)-(e2)-(c)
(e1)-(c1)-k
(e2)-(c2)-k

So there are two, instead of one, context nodes.
In reality, I'll have up to 10 context nodes. So you can imagine if I insert a large number of edges e, then I get n(e)*10 instead of 10 new context nodes.

I obviously need to rely heavily on parameters, since I am adding a lot of connections starting at some node a and adding up to 50 edges to different nodes b,c etc.

Here is my parameterized query, starting always at some ego node and adding alters:

Parameters (not in correct Cypher, sorry, you get the idea):

ego: "a" 
ties: [ {alter: "b", weight:0.5, time: 100}, {alter: "c", weight:0.7, time: 100} ....]
contexts: [{alter: "k", weight:0.3, time: 100}, {alter: "z", weight:0.4, time: 100} ... ]

Query

MATCH (a:node {idx: $ego}) 
WITH a UNWIND $ties as tie 
MATCH (b:node {idx: tie.alter}) 
CREATE (b)<-[:onto]-(r:edge {weight:tie.weight, time:tie.time})<-[:onto]-(a) 
WITH r UNWIND $contexts as con 
MATCH (q:node {idx: con.alter}) WITH r,q,con 
MERGE (r)-[:conto]->(c:context {weight:con.weight, time:con.time })-[:conto]->(q)

Again, this works great, but even though I use MERGE to create contextual ties, it adds a new c:context node for each e:edge node.

I cannot come up with a way to make this work otherwise while still relying on a single set of parameterized lists that I pass from my application.

I really need to avoid using two queries and re-matching all edges. The first operation is only performant if it is a pure CREATE. That is fine, because edges never need to be merged; they are always unique. Performance of the above query is good, but it fills the database exceptionally quickly.

I can start with either edges or contexts; however, WITH always seems to pass along only the current row of the UNWIND. Instead, I need a WITH that carries ALL the edges (or all the contexts) I have created, so that I can use them to create the other ties.

Something like

MATCH (a:node {idx: $ego}) 
WITH a UNWIND $ties as tie 
MATCH (b:node {idx: tie.alter}) 
CREATE (b)<-[:onto]-(r:edge {weight:tie.weight, time:tie.time})<-[:onto]-(a) 
SAVE ALL r
UNWIND $contexts as con 
MATCH (q:node {idx: con.alter}) WITH r,q,con 
CREATE (c:context {weight:con.weight, time:con.time })-[:conto]->(q)
SAVE ALL c

FOR EACH PRODUCT (r,c)
CREATE (r)-[:conto]->(c)

I just can't get this to work without re-querying for r and c, which would be too slow.

There is probably a simple solution to this. Would you have any idea?
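One candidate, offered as an untested sketch: the "SAVE ALL r" step in the pseudocode maps fairly closely onto Cypher's collect() aggregation, which folds all rows of the UNWIND into a single list. The parameter names are the ones from the query above. Whether this performs acceptably on the real data is an open question.

```cypher
// Untested sketch. collect() plays the role of "SAVE ALL r".
MATCH (a:node {idx: $ego})
UNWIND $ties AS tie
MATCH (b:node {idx: tie.alter})
CREATE (a)-[:onto]->(r:edge {weight: tie.weight, time: tie.time})-[:onto]->(b)
// Aggregating collapses the per-tie rows into one row holding all new edge-nodes.
WITH collect(r) AS edges
UNWIND $contexts AS con
MATCH (q:node {idx: con.alter})
// One context node per entry in $contexts, not one per edge-node.
CREATE (c:context {weight: con.weight, time: con.time})-[:conto]->(q)
WITH edges, c
// "FOR EACH PRODUCT (r,c)": link every new edge-node to every new context node.
UNWIND edges AS r
CREATE (r)-[:conto]->(c)
```

Because the edge-nodes are carried along in a list, nothing is re-matched. The list is held in memory for the duration of the query, which should be small with up to 50 ties per call.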

Many thanks!


Ideas to remove duplications in parameterized insert query