
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Selecting a sub graph of n nodes

I have a Neo4j graph of 600,000 nodes that are connected to each other in the form (a:item)-[r]-(b:item). How do I get a random sample of the graph with 10,000 nodes and the relationships between them?

(Basically, what I need is a random graph made of 10k (item) nodes and the relationships between them.)


@cortex3oct

Maybe you can try something like this:

MATCH (n:item)-[]->(:item)
WITH id(n) AS maxId ORDER BY maxId DESC LIMIT 1 // retrieve maxId on node with outgoing relationships

UNWIND range(1, 10000*3) AS x // change this number (3) if needed
WITH maxId, toInteger(rand() * (maxId + 1)) AS randomId // draw one random id per row, from 0 to maxId

MATCH p=(n:item)-[r]->(m:item)
WHERE id(n) = randomId
RETURN DISTINCT p
LIMIT 10000

That is, I loop 10000*3 times, and each time I pick a random id between 0 and maxId.
I multiply by 3 because a random id is not guaranteed to belong to a node that matches the path p=(n:item)-[r]->(m:item). It may belong to some other node, or to no node at all. Because this oversampling can produce more than 10000 matches, I put LIMIT 10000 to make sure no more than 10000 are returned.

You can change this 3 based on your dataset.
Of course, if this number is too small, fewer than 10,000 results could be returned.
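The following is a minimal in-memory Python sketch of the same random-id strategy, not Neo4j code. It shows why the multiplier matters. It assumes a hypothetical set `valid_ids` that stands in for the ids of nodes matching the path pattern. It also simplifies the query by counting distinct start nodes instead of distinct paths. A draw only counts when it lands on a valid id, so with a small `factor` the result can come up short of `k`.

```python
import random


def sample_by_random_ids(valid_ids, max_id, k, factor, seed=None):
    """Mimic the random-id probing of the first query on in-memory data.

    valid_ids: hypothetical set of ids of nodes that match the path pattern
               (an assumed stand-in for what the database would contain).
    Draws k * factor random ids in [0, max_id] and keeps the distinct ones
    that hit a valid id, stopping at k (the role LIMIT plays in the query).
    """
    rng = random.Random(seed)
    hits = []
    seen = set()
    for _ in range(k * factor):
        candidate = rng.randint(0, max_id)  # inclusive on both ends
        if candidate in valid_ids and candidate not in seen:
            seen.add(candidate)
            hits.append(candidate)
            if len(hits) == k:  # cap the result size
                break
    return hits
```

For example, if only every third id in 0..299 belongs to a matching node, then `factor=1` returns far fewer than 50 hits for `k=50`, while a generous factor such as 20 reaches the cap. The right multiplier depends on how densely the matching nodes fill the range 0 to maxId.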

@giuseppe.villani
Thanks for the answer, but there is one problem with this solution. What I actually want is all the relationships between 10,000 items (each item has multiple relationships to other items), and here we might miss many of them. Say we loop n times and find 10k (n:Item) nodes. Because of their relationships, we might then get 30k (m:Item) nodes, and the LIMIT 10000 makes me lose information.

So the solution I wanted was a random sample (a sub-network) of 10k items together with the relationships between them. I guess something like getting a list of 10k random item ids, then checking that both (n:Item) and (m:Item) are in that list, would work. But I'm not sure how to do it.
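The idea described here is usually called the induced subgraph of a random node sample. You pick a random list of ids, then keep only the relationships whose two endpoints are both in the list. Below is a minimal in-memory Python sketch of that idea. It assumes hypothetical `node_ids` and `edges` collections that stand in for the :Item nodes and their relationships. In Cypher, the analogous step would be to collect the sampled nodes into a list and filter the other endpoint with an `IN` predicate.

```python
import random


def random_induced_subgraph(node_ids, edges, k, seed=None):
    """Pick k distinct nodes uniformly at random, then keep only the
    relationships whose both endpoints are in the sample.

    node_ids: hypothetical iterable of node ids (stand-in for :Item nodes)
    edges:    hypothetical iterable of (a, b) pairs, one per relationship
    Raises ValueError if k is larger than the number of nodes.
    """
    rng = random.Random(seed)
    # Sampling without replacement gives exactly k distinct nodes
    sample = set(rng.sample(sorted(node_ids), k))
    # Keep a relationship only if it stays entirely inside the sample
    kept = [(a, b) for a, b in edges if a in sample and b in sample]
    return sample, kept
```

Unlike probing random ids, sampling from the actual node list always yields exactly k nodes. No relationship between two sampled nodes is dropped, and none leading outside the sample is included.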

@cortex3oct
OK, I get it: I thought you wanted to limit the paths, not the nodes.
So I would change the query like this (that is, I first limit the number of nodes that have at least one relationship with another :item, and only then match all their paths):

MATCH (n:item)-[]-(:item)
WITH id(n) AS maxId ORDER BY maxId DESC LIMIT 1 // retrieve maxId on node with at least one relationship to an :item

UNWIND range(1, 10000*5) AS x // change this number (5) if needed
WITH maxId, toInteger(rand() * (maxId + 1)) AS randomId // draw one random id per row, from 0 to maxId

MATCH (n:item)-[]-(:item)
WHERE id(n) = randomId
WITH DISTINCT n
LIMIT 10000 // limit nodes
MATCH p=(n)-[r]-(:item)
RETURN p
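For comparison, here is a minimal in-memory Python sketch of what this revised query computes. It is not Neo4j code. It assumes a hypothetical `edges` list of (a, b) id pairs and ids ranging from 0 to `max_id`. The sketch caps the number of distinct sampled nodes, then returns every relationship that touches them. The final MATCH does not require the other endpoint to be sampled, so the returned relationships can reach nodes outside the sample.

```python
import random


def sample_then_expand(edges, k, max_id, draws, seed=None):
    """Mimic the revised query on an in-memory graph.

    edges: hypothetical list of (a, b) id pairs, one per relationship.
    Draws `draws` random ids in [0, max_id], keeps the distinct ids of nodes
    that have at least one relationship (at most k of them), then returns
    every relationship touching a kept node, in either direction.
    """
    rng = random.Random(seed)
    # Nodes with at least one relationship, like MATCH (n:item)-[]-(:item)
    connected = {a for a, _ in edges} | {b for _, b in edges}
    chosen = set()
    for _ in range(draws):
        candidate = rng.randint(0, max_id)
        if candidate in connected:
            chosen.add(candidate)  # WITH DISTINCT n
            if len(chosen) == k:   # LIMIT k on nodes
                break
    # MATCH p=(n)-[r]-(:item): the far endpoint is NOT required to be sampled
    incident = [(a, b) for a, b in edges if a in chosen or b in chosen]
    return chosen, incident
```

With `k=1`, for instance, every returned relationship leads from the one sampled node to some node that was not sampled.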

Bennu
Graph Fellow

Hi @cortex3oct!

Actually, the problem is trickier than it seems. Just to confirm my understanding: you expect every relationship of a node inside the subgraph to lie inside the subgraph, don't you?

In other words, there is no relationship between a node inside the subgraph and one outside?
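The condition in this question can be checked directly. A sampled node set has no relationship to the outside exactly when no relationship has one endpoint inside the set and the other outside. Here is a minimal Python sketch of that check. It assumes the relationships are given as a hypothetical list of (a, b) id pairs.

```python
def crossing_relationships(sample, edges):
    """Return the relationships with exactly one endpoint inside `sample`.

    sample: iterable of node ids in the candidate subgraph
    edges:  hypothetical list of (a, b) id pairs, one per relationship
    An empty result means no relationship links a node inside the
    subgraph to a node outside it.
    """
    inside = set(sample)
    return [(a, b) for a, b in edges if (a in inside) != (b in inside)]
```

An empty result means the sample is closed under relationships, that is, it is made of whole connected components. So if the full graph happens to be connected, no proper subset of it (such as 10,000 of the 600,000 nodes) can satisfy this condition.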

Bennu