06-10-2022 03:50 AM
Hi, great minds! I am new to neo4j and currently exploring an existing graph to extract data for downstream tasks.
I would like to get all pairs of nodes and their relationship from the graph.
MATCH (n)-[r]-(n1) WHERE n<>n1 AND n1>n RETURN *
This will return about 12,726,288 estimated rows.
Instead, I decided to extract the pairwise information between 2 node types
MATCH (n:Node{type:nodetypeA})-[r]-(n1: Node{type:nodetypeB}) WHERE n<>n1 AND id(n)<id(n1) RETURN *
with 653,022 estimated rows; sadly, Neo4j keeps timing out. I have increased the connection timeout (ms) in Neo4j Browser, yet nothing changes.
Any suggestion will be highly appreciated.
06-10-2022 07:09 AM
Hello @wumirose !
Why do you think this may be a timeout problem? This may be a Desktop OOM rendering problem: you may be asking for too much info to be displayed. Have you tried with a driver of your preference? Personally, I have used SDN6 with Webflux without problems.
Bennu
06-12-2022 04:20 PM
I agree; probably the 653,022 rows are too much to extract.
Thanks a lot for your suggestion about SDN6 and Webflux; it's my first time learning a bit about reactive programming. However, it appears the reactive clients provide no support for Python, and I currently run my queries in a Python environment and connect to Neo4j with py2neo.
Any further suggestions will be greatly appreciated.
06-13-2022 03:30 PM
Hi @wumirose
Have you tried the official Python driver (https://neo4j.com/docs/api/python-driver/current/api.html#graphdatabase)? It should stream the results, AFAIK.
Try something like
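from neo4j import GraphDatabase
# Placeholders: replace with your own connection details.
user = "yourUsername"
password = "yourPassword"
uri = "yourUri"
driver = GraphDatabase.driver(uri, auth=(user, password))
with driver.session() as session:
    # 'nodetypeA' and 'nodetypeB' are placeholder values for the type property.
    result = session.run("MATCH (n:Node {type: 'nodetypeA'})-[r]-(n1:Node {type: 'nodetypeB'}) WHERE n<>n1 AND id(n)<id(n1) RETURN n AS node1, n1 AS node2, r AS rel")
    # Records are pulled lazily while iterating over the result.
    for record in result:
        print("node1 {}".format(record["node1"]))
driver.close()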
Lemme know how it goes
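For a downstream export, it may be more convenient to return plain values and write each record out as it arrives instead of printing it. Below is a minimal sketch, assuming the nodes carry a `name` property (as the later queries in this thread do). The `write_pairs` helper, the `$type_a`/`$type_b` parameter names, and the column names are made up for illustration, and the sample rows stand in for a real result so the sketch runs without a database.

```python
import csv
import io

# Hypothetical query returning scalars instead of whole nodes.
# Assumes a `name` property on the nodes; adjust to your schema.
QUERY = (
    "MATCH (n:Node {type: $type_a})-[r]-(n1:Node {type: $type_b}) "
    "RETURN n.name AS name1, type(r) AS rel_type, n1.name AS name2"
)


def write_pairs(records, out):
    """Write one CSV row per record, consuming the iterable lazily."""
    writer = csv.writer(out)
    writer.writerow(["name1", "rel_type", "name2"])
    count = 0
    for record in records:
        writer.writerow([record["name1"], record["rel_type"], record["name2"]])
        count += 1
    return count


# Stand-in rows (plain dicts) in place of driver records.
sample = [
    {"name1": "a", "rel_type": "LINKS", "name2": "b"},
    {"name1": "a", "rel_type": "LINKS", "name2": "c"},
]
buffer = io.StringIO()
written = write_pairs(iter(sample), buffer)
print(written)
```

With the driver, the same helper could presumably be called inside the `with driver.session()` block as `write_pairs(session.run(QUERY, type_a="...", type_b="..."), f)`, where `f` is an open file; driver records support the same `record["key"]` lookup used here.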
06-14-2022 08:06 AM
The API does the trick! I'm so happy right now.
Thank you so much @bennu_neo, for your help. It means a lot!
06-16-2022 12:18 PM
I noticed that the query gives only the first-order connection between 2 nodes; however, I will need at least the second-order relationship for my downstream application. I have tried:
MATCH (n)-[r*1..2]-(n1)
WHERE n<>n1 AND id(n)<id(n1)
WITH n.name as Name1, n1.name as Name2, r AS rel
UNWIND rel AS rl
RETURN Name1, Name2, rl.id AS relId
which estimated about 2 million rows.
I would like to skip some rows so I can eventually end up with less than 1 million (~500,000). I have played with a few other queries like SKIP and LIMIT, but I can't seem to get a helpful result.
Your suggestions will be greatly appreciated.
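One way to thin roughly 2 million rows down to about 500,000 is to do it on the client while streaming, keeping every k-th row. This is only a sketch under that assumption: `every_kth` is a hypothetical helper, the row counts are the estimates quoted above, and `range(10)` stands in for a streamed result.

```python
from itertools import islice


def every_kth(rows, k):
    """Lazily yield rows 0, k, 2k, ... from any iterable."""
    return islice(rows, 0, None, k)


estimated_rows = 2_000_000  # estimate quoted in the post
target_rows = 500_000       # desired upper bound
stride = -(-estimated_rows // target_rows)  # ceiling division

# Small stand-in for a streamed result.
kept = list(every_kth(range(10), stride))
print(stride, kept)
```

Because `islice` consumes the iterable lazily, the full result never has to sit in memory; the trade-off is that the database still produces every row.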
06-17-2022 06:31 AM
Hi @wumirose !
This is technically another question, but let's do it 😄
In general, I don't agree with this whole-db export stream, but if it works for you, that's fine. Can you try a query like this?
MATCH p = (n)-[*1..2]-(n1)
WHERE id(n)<id(n1)
WITH n.name as Name1, n1.name as Name2, relationships(p) as rel
SKIP 10
LIMIT 10
UNWIND rel AS rl
RETURN Name1, Name2, rl.id AS relId
Keep in mind that SKIP and LIMIT apply at the WITH step, that is, to the paths, before UNWIND expands each path into one row per relationship.
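The fixed SKIP 10 / LIMIT 10 could be turned into parameters so that a client pages through the result in batches. The sketch below assumes a hypothetical `fetch(skip, limit)` callable that runs the query and returns a list of rows; `iter_pages` and `fake_fetch` are illustrative names, and the fake data lets it run without a database. Two caveats: without an ORDER BY, the order of rows is not guaranteed to be the same from one run to the next, so pages may overlap or miss rows, and each page re-runs the match.

```python
# Same query as above, with SKIP/LIMIT supplied as parameters.
PAGED_QUERY = """
MATCH p = (n)-[*1..2]-(n1)
WHERE id(n) < id(n1)
WITH n.name AS Name1, n1.name AS Name2, relationships(p) AS rel
SKIP $skip
LIMIT $limit
UNWIND rel AS rl
RETURN Name1, Name2, rl.id AS relId
"""


def iter_pages(fetch, page_size):
    """Yield rows page by page until a page comes back empty.

    `page_size` counts paths (the WITH rows), not the rows after UNWIND,
    so the loop stops on an empty page rather than on a short one.
    """
    skip = 0
    while True:
        rows = fetch(skip, page_size)
        if not rows:
            break
        yield from rows
        skip += page_size


# Stand-in for a database call: slices a small in-memory list.
data = list(range(7))


def fake_fetch(skip, limit):
    return data[skip:skip + limit]


rows = list(iter_pages(fake_fetch, 3))
print(rows)
```

With the driver, `fetch` could presumably be something like `lambda skip, limit: list(session.run(PAGED_QUERY, skip=skip, limit=limit))`.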
06-17-2022 07:45 AM
Actually, the match is between 2 node types, not the whole db😉.
MATCH p = (n:Node {type: 'typeA'})-[*1..2]-(n1:Node {type: 'typeB'})
This is so helpful! I hadn't explored using SKIP and LIMIT before RETURN. Thanks a bunch for that.