06-29-2022 11:19 AM - last edited on 07-25-2022 09:03 AM by TrevorS
Hi folks,
I am attempting to get a subgraph and its graph data (as '.txt' or another format) from a big graph.
Randomly sample all node types from the large graph:
MATCH (source: Node)-[r*..]-(target: Node)
WHERE source.name<>target.name
WITH source, target
SKIP 10
LIMIT 1+rand(10)
RETURN *
I couldn't get this to work because the estimated rows are large, and the connection times out frequently while streaming.
Get n-hop relationships between two kinds of nodes, then extract the path data (including the source and target nodes, the relationships, and node data such as the node degree and node type). For path length 1, I have tried:
MATCH (source:Node{type: 'typeA'}),(target:Node{type: 'typeB'})
WHERE source.name<>target.name
CALL apoc.algo.allSimplePaths(source, target, '', 3) YIELD path AS Paths
WITH Paths AS P
WHERE length(P)<2
SKIP 10
LIMIT 500
RETURN P, apoc.path.elements(P) as elements
For path length 2:
MATCH (source:Node{type: 'typeA'}),(target:Node{type: 'typeB'})
WHERE source.name<>target.name
CALL apoc.algo.allSimplePaths(source, target, '', 3) YIELD path AS Paths
WITH Paths AS P
WHERE length(P)>1 AND length(P)<3
SKIP 10
LIMIT 500
RETURN P, apoc.path.elements(P) as elements
Then for path length 3:
MATCH (source:Node{type: 'typeA'}),(target:Node{type: 'typeB'})
WHERE source.name<>target.name
CALL apoc.algo.allSimplePaths(source, target, '', 3) YIELD path AS Paths
WITH Paths AS P
WHERE length(P)>2
SKIP 10
LIMIT 500
RETURN P, apoc.path.elements(P) as elements
This yields about a million rows; however, I would like to sample the paths so that, for a three-hop subgraph, I get 3000 rows in total containing:
source | source type | relationship | target | target type | PathLength |
Any help will be greatly appreciated.
06-29-2022 05:01 PM
Please explain a little more about your data model. Does 'Node' have a 'name' property besides 'type'? At each level, are you expecting thousands of nodes? If so, one source node is connected to thousands of target nodes at level 1. I am trying to understand your model so that I can offer some solutions.
06-29-2022 05:28 PM
Try this and check the numbers:
07-01-2022 02:30 PM
I used your sample data and ran this query:
07-01-2022 02:34 PM
Please run the above query in your database. If there is too much data, run it for levels 1 and 2 only and let me know the node counts. Based on the node counts, we can try some methods to extract a subset of nodes from each level. This is not going to be a direct process and may involve several steps.
07-03-2022 12:02 AM
I deeply appreciate your help. Maybe a few more lines here will clarify my issue:
Say I have apoc.algo.allSimplePaths(A, B, '', 3) results that look like this:
Desired result: for each path length, randomly return one row
The result is representative of all path lengths:
The first row:
The second row:
The third row:
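The desired result above (a fixed number of random rows per path length) could also be obtained on the client side, after the rows leave the database. The following is only a minimal Python sketch under stated assumptions: the row shape (dicts with a `PathLength` key), the helper name `sample_per_length`, and the sample rows are all hypothetical and not taken from the real graph. It uses reservoir sampling per path length, so rows can be streamed without holding all of them in memory.

```python
import random
from collections import defaultdict


def sample_per_length(rows, k, seed=None):
    """Keep at most k uniformly chosen rows for each path length.

    rows: iterable of dicts with a 'PathLength' key (hypothetical row shape).
    """
    rng = random.Random(seed)
    reservoirs = defaultdict(list)  # path length -> rows kept so far
    seen = defaultdict(int)         # path length -> rows seen so far
    for row in rows:
        length = row["PathLength"]
        seen[length] += 1
        bucket = reservoirs[length]
        if len(bucket) < k:
            # The reservoir for this length is not full yet: always keep the row.
            bucket.append(row)
        else:
            # Keep the new row with probability k / seen[length],
            # overwriting a uniformly chosen slot.
            j = rng.randrange(seen[length])
            if j < k:
                bucket[j] = row
    return {length: reservoirs[length] for length in sorted(reservoirs)}


# Hypothetical rows standing in for the allSimplePaths output.
rows = [
    {"source": "mola", "target": "genee", "PathLength": 1},
    {"source": "mola", "target": "genee", "PathLength": 2},
    {"source": "mola", "target": "genel", "PathLength": 2},
    {"source": "molg", "target": "genel", "PathLength": 3},
    {"source": "mola", "target": "genel", "PathLength": 3},
    {"source": "mola", "target": "genee", "PathLength": 3},
]

sample = sample_per_length(rows, k=1, seed=42)
for length, picked in sample.items():
    print(length, picked)
```

With `k=1` this returns one random row per path length; with `k=1000` and three path lengths it would return at most 3000 rows in total.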
07-05-2022 06:02 PM
This code exports the results as JSON, grouped by path length (level). To select random rows for each level, export the data for each level, pick the rows you want from each, and then combine the selections.
MATCH (source:Node {type: 'Molecule'}), (target:Node {type: 'Gene'})
WHERE source.name <> target.name
CALL apoc.algo.allSimplePaths(source, target, '', 3) YIELD path
// Split each path into its relationships, its nodes, and its length (the level).
WITH relationships(path) AS rels, nodes(path) AS n1, length(path) AS lvl
WITH lvl, collect(DISTINCT n1) AS n2, collect(DISTINCT rels) AS r2
// Flatten and deduplicate the nodes and relationships found at each level.
WITH apoc.coll.toSet(apoc.coll.flatten(n2)) AS n12, apoc.coll.toSet(apoc.coll.flatten(r2)) AS r12, lvl
WITH n12 AS nodes, r12 AS relationships, lvl
// Project nodes and relationships into plain maps that serialize to JSON.
WITH lvl, [node IN nodes | node {.*, label: labels(node)[0], id: toString(id(node))}] AS nodes,
[rel IN relationships | rel {.*, fromNode: {label: labels(startNode(rel))[0], id: toString(id(startNode(rel)))}, type: type(rel), toNode: {label: labels(endNode(rel))[0], id: toString(id(endNode(rel)))}}] AS rels
WITH lvl, collect(DISTINCT rels) AS Allrels, collect(DISTINCT nodes) AS AllNodes ORDER BY lvl
WITH {nodes: AllNodes, relationships: Allrels, level: lvl} AS json
RETURN apoc.convert.toJson(json)
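To get from the per-level JSON to the table requested in the question (source | source type | relationship | target | target type | PathLength), something like the following Python sketch may help. It assumes each returned JSON document has the shape suggested by the map projections in the query: `nodes` and `relationships` as lists of lists, node maps carrying `name`, `type` and `id`, and relationship maps carrying `r`, `fromNode` and `toNode`. The helper names `flatten_level` and `to_text`, and the ids in the example document, are hypothetical.

```python
import csv
import io
import json

COLUMNS = ["source", "source type", "relationship", "target", "target type", "PathLength"]


def flatten_level(level_json):
    """Turn one per-level JSON document into flat table rows."""
    doc = json.loads(level_json)
    # collect() in the query wraps each list once more, so iterate two levels deep.
    nodes = {n["id"]: n for group in doc["nodes"] for n in group}
    rows = []
    for group in doc["relationships"]:
        for rel in group:
            src = nodes[rel["fromNode"]["id"]]
            dst = nodes[rel["toNode"]["id"]]
            rows.append([src["name"], src["type"], rel["r"],
                         dst["name"], dst["type"], doc["level"]])
    return rows


def to_text(rows):
    """Render the rows as pipe-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical level-1 document; the ids are made up for illustration.
example = json.dumps({
    "nodes": [[
        {"name": "mola", "type": "Molecule", "label": "Node", "id": "0"},
        {"name": "genee", "type": "Gene", "label": "Node", "id": "10"},
    ]],
    "relationships": [[
        {"r": "part_of", "type": "REL",
         "fromNode": {"label": "Node", "id": "0"},
         "toNode": {"label": "Node", "id": "10"}},
    ]],
    "level": 1,
})

table = to_text(flatten_level(example))
print(table)
```

The string returned by `to_text` can be written straight to a '.txt' file. Rows from several levels can be concatenated before rendering, which matches the "combine the results from each level" step.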
Result:
06-30-2022 04:22 AM
CREATE (a:Node {name: 'mola', type: 'Molecule'})
CREATE (g:Node {name: 'molg', type: 'Molecule'})
CREATE (b:Node {name: 'drgb', type: 'Drug'})
CREATE (h:Node {name: 'drgh', type: 'Drug'})
CREATE (c:Node {name: 'mola', type: 'Disease'})
CREATE (i:Node {name: 'disi', type: 'Disease'})
CREATE (j:Node {name: 'disj', type: 'Disease'})
CREATE (m:Node {name: 'dism', type: 'Disease'})
CREATE (d:Node {name: 'chemd', type: 'Chemical'})
CREATE (k:Node {name: 'chemk', type: 'Chemical'})
CREATE (e:Node {name: 'genee', type: 'Gene'})
CREATE (l:Node {name: 'genel', type: 'Gene'})
CREATE (f:Node {name: 'mola', type: 'DNA'})
MERGE (a)-[:REL {r: 'subclass_of'}]->(b)
MERGE (a)-[:REL {r: 'cure'}]->(c)
MERGE (a)-[:REL {r: 'inhibits'}]->(d)
MERGE (b)-[:REL {r: 'heals'}]->(d)
MERGE (c)-[:REL {r: 'causes'}]->(d)
MERGE (c)-[:REL {r: 'expands'}]->(e)
MERGE (d)-[:REL {r: 'kills'}]->(e)
MERGE (d)-[:REL {r: 'involved_in'}]->(f)
MERGE (b)-[:REL {r: 'heals'}]->(i)
MERGE (c)-[:REL {r: 'part_of'}]->(j)
MERGE (c)-[:REL {r: 'expands'}]->(k)
MERGE (f)-[:REL {r: 'kills'}]->(l)
MERGE (b)-[:REL {r: 'heals'}]->(i)
MERGE (c)-[:REL {r: 'part_of'}]->(j)
MERGE (c)-[:REL {r: 'expands'}]->(k)
MERGE (l)-[:REL {r: 'kills'}]->(l)
MERGE (m)-[:REL {r: 'heals'}]->(i)
MERGE (a)-[:REL {r: 'part_of'}]->(e)
MERGE (c)-[:REL {r: 'expands'}]->(m)
MERGE (e)-[:REL {r: 'interacts_with'}]->(f)
MATCH (source),(target)
WHERE source <> 'None' AND target <> 'None' AND id(source) < id(target)
CALL apoc.algo.allSimplePaths(source, target, '', 4)
YIELD path AS P
RETURN P, length(P)
06-30-2022 08:22 PM
Thanks for sharing the info. The solution is not straightforward, and I am working on it. Hopefully by this weekend I can send you the first steps of a solution. Note that the path level 2 results contain the nodes in levels 1 and 2, and so on.