

Filter by relationship property and then connect unconnected nodes/subgraphs

Hi all!

I have a problem in which I have to filter a relationship r by a property, let's say r.weight > 7. This will mostly result in a graph with unconnected subgraphs, so I would also like to connect these subgraphs in the match by some other criterion, for example the relationship with the largest possible weight, to keep the graph connected:

In this figure the edges were filtered, resulting in two subgraphs. They are connected back by a relation whose weight is 5, because it's the largest weight connecting them.

My main difficulty is figuring out how to achieve this once I've already discarded the relationships that didn't satisfy the first matching clause. I don't even know if it's possible.
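For concreteness, the plain filter I start from would be something like this (just a sketch; only the weight property comes from my example above, the rest of the schema is generic):

// Sketch of the first filtering step: keep only the "strong" relationships.
// Whatever remains may fall apart into several disconnected subgraphs.
MATCH (n)-[r]-(m)
WHERE r.weight > 7
RETURN n, r, m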

Thanks in advance.

1 ACCEPTED SOLUTION


View solution in original post

3 REPLIES

Bennu
Graph Fellow

Hi @guinametal !

I guess you have a layer that orchestrates your DB calls. I haven't found a single query to do this, but there is a pretty standard logic you can follow; hopefully it's performant enough.

Consider your use case as follows (with some extra relationships):

:begin
CREATE CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;
:commit
CALL db.awaitIndexes(300);
:begin
UNWIND [{_id:29, properties:{name:"a"}}, {_id:30, properties:{name:"b"}}, {_id:31, properties:{name:"c"}}, {_id:32, properties:{name:"d"}}, {_id:33, properties:{name:"e"}}, {_id:34, properties:{name:"f"}}, {_id:35, properties:{name:"g"}}, {_id:36, properties:{name:"h"}}, {_id:37, properties:{name:"i"}}] AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:NODE;
:commit
:begin
UNWIND [{start: {_id:29}, end: {_id:30}, properties:{w:9}}, {start: {_id:29}, end: {_id:31}, properties:{w:7}}, {start: {_id:29}, end: {_id:32}, properties:{w:11}}, {start: {_id:30}, end: {_id:31}, properties:{w:12}}, {start: {_id:30}, end: {_id:32}, properties:{w:7}}, {start: {_id:31}, end: {_id:32}, properties:{w:8}}, {start: {_id:32}, end: {_id:33}, properties:{w:4}}, {start: {_id:32}, end: {_id:34}, properties:{w:5}}, {start: {_id:33}, end: {_id:34}, properties:{w:7}}, {start: {_id:33}, end: {_id:35}, properties:{w:9}}, {start: {_id:33}, end: {_id:36}, properties:{w:7}}, {start: {_id:33}, end: {_id:37}, properties:{w:8}}, {start: {_id:34}, end: {_id:35}, properties:{w:9}}, {start: {_id:34}, end: {_id:36}, properties:{w:10}}, {start: {_id:34}, end: {_id:37}, properties:{w:8}}, {start: {_id:35}, end: {_id:36}, properties:{w:10}}, {start: {_id:35}, end: {_id:37}, properties:{w:7}}, {start: {_id:36}, end: {_id:37}, properties:{w:11}}, {start: {_id:30}, end: {_id:37}, properties:{w:0}}] AS row
MATCH (start:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.start._id})
MATCH (end:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row.end._id})
CREATE (start)-[r:CON]->(end) SET r += row.properties;
:commit
:begin
MATCH (n:`UNIQUE IMPORT LABEL`)  WITH n LIMIT 20000 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;
:commit
:begin
DROP CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;
:commit

First we load our graph into memory.

CALL gds.graph.create.cypher(
    'gr_complete',
    'MATCH (n:NODE) RETURN id(n) AS id, labels(n) AS labels',
    'MATCH (n:NODE)-[r:CON]->(m:NODE)  RETURN id(n) AS source, id(m) AS target, r.w as w'
)
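(As a side note, and assuming a GDS 1.x setup consistent with the procedures used here, the same projection can usually be built with a native projection instead of the Cypher one; this is just an alternative sketch, not something the rest of the answer depends on.)

// Alternative sketch (assumption: GDS 1.x native projections via gds.graph.create).
// Projects NODE nodes and CON relationships, keeping w as a relationship property.
CALL gds.graph.create(
    'gr_complete',
    'NODE',
    { CON: { properties: 'w' } }
)
YIELD graphName, nodeCount, relationshipCount;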

Then we filter on the condition:

CALL gds.beta.graph.create.subgraph('gr_filtered', 'gr_complete', '*', 'r.w > 7')
YIELD graphName, fromGraphName, nodeCount, relationshipCount;

Now we can get the isolated components:

CALL gds.wcc.stream('gr_filtered')
YIELD nodeId, componentId
WITH componentId, collect(nodeId) AS g
RETURN componentId, g
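For the example data, after the w > 7 filter this should give two components: one containing a, b, c and d, and another containing e, f, g, h and i.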

Now comes the tricky part: here we have 2 clusters, but you may have more, and you need to execute the next step for every pair of clusters.

with [33, 34, 35, 36, 37] as c1, [29, 30, 31, 32] as c2
MATCH(n1:NODE)-[r:CON]-(n2:NODE)
where id(n1) in c1 and id(n2) in c2
return r order by r.w DESC limit 1

And this relationship is your bridge between the 2 clusters. Now you just have to add it back to the others.
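If you'd rather enumerate the cluster pairs in Cypher itself instead of looping in your application layer, something like the following sketch may work (an untested assumption on my side, reusing the 'gr_filtered' projection from above):

// Sketch (assumption): build the pair list from the WCC output and pick the
// heaviest cross-cluster relationship for each pair, all in one query.
CALL gds.wcc.stream('gr_filtered')
YIELD nodeId, componentId
WITH componentId, collect(nodeId) AS members
WITH collect({id: componentId, members: members}) AS clusters
UNWIND range(0, size(clusters) - 2) AS i
UNWIND range(i + 1, size(clusters) - 1) AS j
WITH clusters[i] AS c1, clusters[j] AS c2
MATCH (n1:NODE)-[r:CON]-(n2:NODE)
WHERE id(n1) IN c1.members AND id(n2) IN c2.members
WITH c1.id AS comp1, c2.id AS comp2, r
ORDER BY r.w DESC
RETURN comp1, comp2, collect(r)[0] AS bridge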

H

EDIT:

I was thinking about other ways to do this.

You can calculate the clusters with WCC, mutating the in-memory graph:

CALL gds.wcc.mutate('gr_filtered', { mutateProperty: 'componentId' })
YIELD nodePropertiesWritten, componentCount;

And then:

MATCH(n:NODE)-[r:CON]-(n2:NODE)
where gds.util.nodeProperty('gr_filtered', id(n), 'componentId') < gds.util.nodeProperty('gr_filtered', id(n2), 'componentId') 
with gds.util.nodeProperty('gr_filtered', id(n), 'componentId') as c1, gds.util.nodeProperty('gr_filtered', id(n2), 'componentId') as c2,collect(r) as rels
UNWIND rels as rel
with c1, c2, apoc.agg.maxItems(rel, rel.w) as r
return c1, c2, r

With r as the bridges.
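On the sample data above, this should return a single pair of components with the weight-5 relationship between d and f as the bridge, i.e. exactly the edge described in the figure of the question.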

Hello @Bennu !!
Thanks a lot. There is so much going on in your answer! I'm still figuring out all that new stuff!

Could you please clarify a couple of aspects of your answer?

The first one:
What I didn't really get is your way of creating the nodes and relationships. Could you please explain what is being done there? The result is a graph just like the drawing in my post, but I didn't understand why you used those statements.

The second:
After running WCC and finding the clusters, is it possible to get all cluster combinations ([0,1], [0,2], [0,3], [1,2], [1,3] and so on) and loop through them to get the bridging relationships in Cypher? Do you have a clue how to do that?

You are helping a lot!

Hi @guinametal !

About the first question: blame apoc.export! I created the dumped DB with standard queries, but I was too lazy to save them immediately (or dig them out of the logs), and I lost them after a couple of other query executions.

Anyway, I just redid it now, so it's something like:

CREATE(a:NODE {name : 'a'})
CREATE(b:NODE {name : 'b'})
CREATE(c:NODE {name : 'c'})
CREATE(d:NODE {name : 'd'})
CREATE(e:NODE {name : 'e'})
CREATE(f:NODE {name : 'f'})
CREATE(g:NODE {name : 'g'})
CREATE(h:NODE {name : 'h'})
CREATE(i:NODE {name : 'i'})
WITH *
CREATE(a)-[:CON{w:9}]->(b)
CREATE(a)-[:CON{w:7}]->(c)
CREATE(a)-[:CON{w:11}]->(d)
CREATE(b)-[:CON{w:12}]->(c)
CREATE(b)-[:CON{w:7}]->(d)
CREATE(c)-[:CON{w:8}]->(d)
CREATE(d)-[:CON{w:4}]->(e)
CREATE(d)-[:CON{w:5}]->(f)
CREATE(e)-[:CON{w:7}]->(f)
CREATE(e)-[:CON{w:9}]->(g)
CREATE(e)-[:CON{w:7}]->(h)
CREATE(e)-[:CON{w:8}]->(i)
CREATE(f)-[:CON{w:9}]->(g)
CREATE(f)-[:CON{w:10}]->(h)
CREATE(f)-[:CON{w:8}]->(i)
CREATE(g)-[:CON{w:10}]->(h)
CREATE(g)-[:CON{w:7}]->(i)
CREATE(h)-[:CON{w:11}]->(i)
CREATE(b)-[:CON{w:0}]->(i);

About the second question, and sticking to the second (better) solution: it's already iterating over every combination of clusters. You should know that clusters are not labeled in order; the componentId actually jumps according to the number of elements in each cluster. In our case there are just 2 clusters, 0 and 4. It may be interesting to test it on a multi-cluster configuration, though.
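If the non-consecutive ids bother you, recent GDS versions have (treat this as an assumption to verify against your version) a consecutiveIds flag for WCC that maps the components into a consecutive id space:

// Assumption: the consecutiveIds flag is available in your GDS version.
CALL gds.wcc.mutate('gr_filtered', {
  mutateProperty: 'componentId',
  consecutiveIds: true
})
YIELD nodePropertiesWritten, componentCount;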

So this second part will give you all the relationships that jump from one cluster to another. You may just need to do a UNION with all the relationships that satisfy your initial condition, in this case:

MATCH(n:NODE)-[r:CON]-(n2:NODE)
where gds.util.nodeProperty('gr_filtered', id(n), 'componentId') < gds.util.nodeProperty('gr_filtered', id(n2), 'componentId') 
with gds.util.nodeProperty('gr_filtered', id(n), 'componentId') as c1, gds.util.nodeProperty('gr_filtered', id(n2), 'componentId') as c2,collect(r) as rels
UNWIND rels as rel
with c1, c2, apoc.agg.maxItems(rel, rel.w) as r
return r
UNION
MATCH(:NODE)-[r:CON]->(:NODE)
WHERE r.w > 7
return r
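Note that both branches of the UNION return the same single column r, which UNION requires. On this data the two branches can't overlap anyway (every bridge has w <= 7, otherwise it would have survived the filter and its endpoints would be in the same cluster), so UNION ALL would give the same result; UNION just deduplicates defensively.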

H

PS: Looks like you already got the apoc.agg.maxItems part.