Visualize entire graph but with one condition filter

I can visualize my entire graph in two simple ways:
MATCH (n) RETURN n
or
MATCH p=()-->() RETURN p

My graph has about 500 nodes and roughly twice as many relationships. One of my dozen or so node labels dominates the graph, and for good reason: about 400 nodes carry that label. To simplify the visualization, I want to leave out some of those nodes, specifically the nodes with that label that have a degree of 1.

I do not want to create any new properties and would like to filter off those nodes with straight-up cypher, or even some APOC. I can find those offending nodes using something like

(u)-[:rel]->(s:Xxx)<-[:rel]-(t)

or using APOC
WHERE apoc.node.degree(s) = 1

The problem is that I just can't seem to find the combination of cypher to piece it all together. I can try:

MATCH p=()-->(), (s:Xxx)
WHERE apoc.node.degree(s) > 1 AND s IN nodes(p)
RETURN p

but that only gives me the s nodes and not the rest of the graph. This result does come back in a matter of seconds.
Switching the query around,

MATCH p=()-->(), (s:Xxx)
WHERE NOT apoc.node.degree(s) = 1 AND NOT s IN nodes(p)
RETURN p

shows that it may be the right solution, but the query runs way too long and eventually times out.

Trying it a different way:

MATCH (s:Xxx)
WHERE apoc.node.degree(s) = 1
WITH s as single_XXX
MATCH p=()-->()
WHERE NOT single_XXX IN nodes(p)
RETURN p

This also times out.
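One likely reason the last attempt runs so long, offered as an assumption rather than a profiled fact: the WITH passes one row per degree-1 node, so the second MATCH is repeated for each of them and every path is tested once per node. A path that contains one degree-1 node is still returned on the rows belonging to all the other degree-1 nodes, so the result is both very large and effectively unfiltered. The attempt before it has the same pairing shape, since it matches paths and Xxx nodes side by side. The sketch below only counts how many rows that pairing would produce; Xxx is the placeholder label used above.

```cypher
// Count the paths once.
MATCH p=()-->()
WITH count(p) AS path_count
// Count the Xxx nodes that have exactly one relationship.
MATCH (s:Xxx)
WHERE apoc.node.degree(s) = 1
WITH path_count, count(s) AS single_count
// In the last attempt above, each path is tested once per degree-1 node.
RETURN path_count, single_count, path_count * single_count AS rows_to_filter
```

If rows_to_filter comes back in the hundreds of thousands, sending that many paths to the browser is a plausible cause of the timeout.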

I am running Neo4j 4.0 Community Edition with the page cache at 6g and the heap size at 2g. Bloom is out of the question.

It seems simple, yet I cannot get it to work and any help would be appreciated.

1 ACCEPTED SOLUTION

What happens when you PROFILE the query?

I'm running the below on the Movies DB (~200 nodes, but running with way less memory than you are talking about here so your 500 node example shouldn't be the end of the world) and this is about as sensible as I can get the profile results to be.

PROFILE
MATCH (n:Person)
WHERE apoc.node.degree(n) <= 1
WITH COLLECT(id(n)) AS omit_nodes
MATCH (a)-[r]-(b)
WHERE NOT (
    id(a) IN omit_nodes OR id(b) IN omit_nodes
)
RETURN a, r, b
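
For a version without APOC, one possible sketch for Neo4j 4.x is to test each endpoint inline with size() over a pattern expression. It assumes the same Movies labels as the query above and has not been profiled against the original graph. In Neo4j 5 this form of size() was removed, and COUNT { (a)--() } would be the replacement.

```cypher
// Keep a relationship only if neither endpoint is a Person
// with at most one relationship.
MATCH (a)-[r]->(b)
WHERE NOT (a:Person AND size((a)--()) <= 1)
  AND NOT (b:Person AND size((b)--()) <= 1)
RETURN a, r, b
```

The directed pattern returns each relationship once, whereas the undirected (a)-[r]-(b) form returns it once per direction; for visualization the drawn graph should be the same either way.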


2 REPLIES

Thanks for the help, @youcef.kadri. Your solution executed in seconds and gave the correct result. I just need to practice some more Cypher thinking.