05-16-2020 01:14 PM
Hey
I'm using py2neo for the moment, but I'm trying to build most of the stuff via queries. In any case, I'm struggling with this code... How can I properly and efficiently create 100k+ relationships?
# My getter to return a list of nodes
def getNodesByType(type, limit=-1):
    if limit > 0:
        a = graph.run('''
        MATCH (n:{type})
        RETURN n
        LIMIT {limit}'''.format(type=type, limit=limit))
        return a
    return graph.run('''
    MATCH (n:{type})
    RETURN n'''.format(type=type))

def makeConnection(source, key, dest):
    tx = graph.begin()
    tx.run('''
    UNWIND $arrSource AS mSrc
    CALL apoc.create.relationship(mSrc,{k}, {d})
    YIELD node
    RETURN node
    '''
    , arrSource=source, k=key, d=dest)
    tx.commit()

nodeA = getNodesByType("Pets", 10).to_ndarray()
nodeB = getNodesByType("Objects", 3).to_ndarray()
makeConnection(nodeA, "livesIn", nodeB[0])
makeConnection(nodeA[0:5000], "livesNextTo", nodeB[1])
makeConnection(nodeA[50000:99999], "livedIn", nodeB[2])
I keep getting:
TypeError: Neo4j does not support JSON parameters of type ndarray
I'm lost. I mean, I'm passing Node objects to it, so maybe that's wrong? How should I execute this task?
Regards
Dariusz
Edit: now that I look at it, I'm not sure I can use APOC here, as I need CREATE UNIQUE: I only want to create a relationship if it's missing.
05-16-2020 01:32 PM
Hey
I'm staring at it and trying to wrap my head around it, thanks so much for the fast help!
I take it your Node has "iid" as its key? So Node(iid:"someid")?
So the iid:row.id1/id2 stands for dict entries inside row... hmm, OK, I see! So I should not pass the py2neo Node directly, but rather grab node["myId"] and pass that. OK, gracias!
I think I got it, will test it. As to the other thing, how can I make sure that the connection is unique? I mean, I want to create the relationship only if it's missing.
I don't see apoc.createRelationship.unique()
Regards
Dariusz
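The TypeError above comes from sending a numpy array of Node objects as a query parameter; plain lists, dicts, strings and numbers are what parameters can carry. A minimal sketch of the conversion described in this post, assuming each node exposes its id as node["icId"] (the property name used later in the thread). `to_rows` is a hypothetical helper, and plain dicts stand in for py2neo nodes so the snippet runs without a database:

```python
def to_rows(nodes, key="icId"):
    """Turn node-like objects into a plain list of {"id": ...} dicts.

    Only the id property is kept, so the result contains nothing but
    built-in Python types and can be passed as a query parameter.
    """
    return [{"id": node[key]} for node in nodes]


# Plain dicts stand in for py2neo Node objects here (assumption).
fake_nodes = [
    {"icId": "a1", "name": "geo_0"},
    {"icId": "b2", "name": "geo_1"},
]
rows = to_rows(fake_nodes)
print(rows)
```

The resulting list is what a query would then walk with UNWIND $arrSource AS row, matching each node by row.id.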
05-16-2020 01:36 PM
No problem. To be honest, I have never used py2neo; I prefer to use the Neo4j Python driver (https://neo4j.com/docs/api/python-driver/current/index.html).
I don't know if I'm the right person to help you with py2neo, but I can give you some advice :)
You can check whether the relationship exists before creating it, thanks to EXISTS (https://neo4j.com/docs/cypher-manual/current/functions/predicate/#functions-exists).
05-16-2020 02:39 PM
Hey
I'm not dead set on py2neo, it was just a lib that was suggested on the Neo4j YouTube channel > https://www.youtube.com/watch?v=3JMhX1sT98U
So I can move to the Neo4j driver too.
As to the issue...
I've written my command as follows:
cmd = '''
MATCH (dest:{destTag}{{icId:"{destId}"}})
UNWIND $arrSource AS row
MATCH (source:{srcType}{{icId:row.id}})
CALL apoc.create.relationship(source, "{k}", {{relation:"LINKED_TO"}}, dest)
YIELD rel
RETURN rel
'''.format(destId=dest[0]["icId"], destTag=destType, srcType=sourceType, k=key)
print(cmd)
c = tx.run(cmd, arrSource=data)
As far as I understand UNWIND, it's a "loop", so I decided to match dest outside it (I guess?) as dest is constant, and run the call above. It does work, which is great, but it does not handle existing connections as it should.
I started reading the docs but I'm lost, the 2 examples they show are not enough and the other stuff I read is just uhhh.... where is "if else...".... :- (((( :- )
"big brain time" :- ) Let's see if I can crack it ^^ brb.
05-16-2020 02:53 PM
For if-else, Cypher has CASE: https://neo4j.com/docs/cypher-manual/current/syntax/expressions/
And if you want to check whether the relation exists, you can add something like this after the second MATCH:
WHERE NOT EXISTS((dest)-[:LINKED_TO]->(source))
05-16-2020 05:23 PM
Hello
Nope, I can't get through that if statement system. It is making my head hurt :- )
This is as far as I got >
MATCH (ct:Pets{icId:"800a1031-1102-49c4-abdd-f294ae867821"})
MATCH (gt:Objects{name:"geo_98608"})
CALL apoc.when([NOT EXISTS((gt)-[:LINKED_TO]->(ct)),
'RETURN ct'],
'RETURN gt')
YIELD VALUE
RETURN VALUE
I was testing whether I can just "test" and return either object depending on the result. I think I need apoc.do.when for creating the relationship, but for now it's read-only.
Anyway all I get is
Regards
Dariusz
05-17-2020 03:28 AM
cmd = '''
MATCH (dest:{destTag}{{icId:"{destId}"}})
UNWIND $arrSource AS row
MATCH (source:{srcType}{{icId:row.id}})
WHERE NOT EXISTS((dest)-[:LINKED_TO]->(source))
CALL apoc.create.relationship(source, "{k}", {{relation:"LINKED_TO"}}, dest)
YIELD rel
RETURN rel
'''.format(destId=dest[0]["icId"], destTag=destType, srcType=sourceType, k=key)
c = tx.run(cmd, arrSource=data)
05-17-2020 03:37 AM
That makes no sense at all :- )
So you do WHERE NOT EXISTS((dest)-[:LINKED_TO]->(source))
And the next statement is executed if that is true. What happens if the relationship exists? What would the else statement look like?
I'm going over as much if/else info as I can, but as far as I can tell this language is "precious"...
05-17-2020 03:38 AM
You don't need if/else: you just want to create the relation when it does not exist, and that is what the Cypher query does.
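For readers coming from Python, one way to picture the WHERE NOT EXISTS query is as a filter in a pipeline rather than an if/else: rows that fail the WHERE test are dropped, so the clauses after it never run for them and no else branch is needed. A rough analogy in plain Python, with made-up ids and a set standing in for the relationships already stored (both are illustrative assumptions, not data from the thread):

```python
# Relationships that already exist, as (source_id, dest_id) pairs (made up).
existing = {("a", "dest")}

# The rows produced by UNWIND, one per source id (made up).
rows = ["a", "b", "c"]

# WHERE NOT EXISTS(...) keeps only the rows without a relationship yet;
# everything after the WHERE sees just those rows.
to_create = [(src, "dest") for src in rows if (src, "dest") not in existing]

# The CALL / CREATE part then runs once per surviving row.
for pair in to_create:
    existing.add(pair)

print(to_create)
```

Running the same steps a second time would leave to_create empty, which mirrors how the filtered query creates nothing for pairs that are already connected.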
05-17-2020 03:37 AM
But if you will always use the same relation type, for example LINKED_TO, you can use MERGE or CREATE UNIQUE to create the relation :)
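A minimal sketch of what the MERGE variant could look like when built from Python. MERGE creates the relationship only when the pattern is not already there. Labels and relationship types cannot be passed as Cypher parameters, so they are formatted into the text, while the ids travel as parameters. `build_merge_query` and its identifier check are hypothetical helpers, and the parameter names $destId and $rows are assumptions chosen for this example:

```python
import re

# Allow only simple identifiers, since these are pasted into the query text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_merge_query(src_label, rel_type, dest_label):
    """Return Cypher that links every row's source node to one dest node."""
    for name in (src_label, rel_type, dest_label):
        if not _IDENT.match(name):
            raise ValueError("unsafe identifier: %r" % name)
    return (
        "MATCH (dest:{dest} {{icId: $destId}})\n"
        "UNWIND $rows AS row\n"
        "MATCH (source:{src} {{icId: row.id}})\n"
        "MERGE (source)-[rel:{rel}]->(dest)\n"
        "RETURN count(rel) AS linked"
    ).format(src=src_label, rel=rel_type, dest=dest_label)


query = build_merge_query("Obj", "usedIn", "Cont")
print(query)
```

Following the call style used elsewhere in this thread, the text would then be sent with something like tx.run(query, destId=..., rows=data), where data is the list of {"id": ...} dicts.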
05-17-2020 03:38 AM
Yeh I was going to go with unique once I understand the if/else/case/where/when syntaxes...
05-17-2020 03:40 AM
In this doc, the two examples are pretty clear, I think :) There are 2 syntaxes for CASE:
https://neo4j.com/docs/cypher-refcard/current/
05-17-2020 03:59 AM
Nope not getting the CASE example to work at all.
MATCH (ct:Pets{icId:"800a1031-1102-49c4-abdd-f294ae867821"})
MATCH (gt:Objects{name:"geo_98608"})
CASE
WHEN NOT EXISTS((gt)-[:LINKED_TO]->(ct)) THEN ct
WHEN EXISTS ((gt)-[:LINKED_TO]->(ct)) THEN gt
ELSE THEN ct, gt
END
just gives me
Invalid input 'S': expected 'l/L' (line 3, column 3 (offset: 101))
"CASE"
^
😕
05-17-2020 04:06 AM
You don't need CASE, this request is good:
cmd = '''
MATCH (dest:{destTag}{{icId:"{destId}"}})
UNWIND $arrSource AS row
MATCH (source:{srcType}{{icId:row.id}})
WHERE NOT EXISTS((dest)-[:LINKED_TO]->(source))
CALL apoc.create.relationship(source, "{k}", {{relation:"LINKED_TO"}}, dest)
YIELD rel
RETURN rel
'''.format(destId=dest[0]["icId"], destTag=destType, srcType=sourceType, k=key)
c = tx.run(cmd, arrSource=data)
05-17-2020 04:07 AM
Yes I know, I'm just trying to understand how to use CASE at this point. To build more complex logic in future. :- )
05-17-2020 04:11 AM
MATCH (ct:Pets{icId:"800a1031-1102-49c4-abdd-f294ae867821"})
MATCH (gt:Objects{name:"geo_98608"})
RETURN
CASE
WHEN NOT EXISTS((gt)-[:LINKED_TO]->(ct))
THEN ct
ELSE ct, gt
END,
CASE WHEN EXISTS ((gt)-[:LINKED_TO]->(ct))
THEN gt
ELSE ct, gt
END
05-17-2020 04:21 AM
So that I can wrap my head around this...
MATCH (ct:Pets{icId:"800a1031-1102-49c4-abdd-f294ae867821"}) << variable allocator, allocate ct
MATCH (gt:Objects{name:"geo_98608"}) < variable allocator, allocate gt
RETURN { < return objects from function below
CASE < if statement begin
WHEN NOT EXISTS((gt)-[:LINKED_TO]->(ct)) < case 0
THEN ct < returns ct from this function meaning the "RETURN ct" above returns ct then
ELSE ct, gt, < direct return to above, "RETURN ct, gt"
, < why comma? what does it do ?
"CASE WHEN EXISTS ((gt)-[:LINKED_TO]->(ct)) could be written as >"
CASE < case 1
WHEN EXISTS ((gt)-[:LINKED_TO]->(ct)) - check statement, case 1?
THEN gt same return idea as case 0
ELSE ct, gt same return idea as case 0
END - end case? Which case ? we made 2 cases don't we need 2x END ?
This is how I see it, but I don't get it. We made 2 cases, with 2 if/else statements. Are they additive? If both cases are true, and case 0 returns ct and case 1 returns gt, is the return then ct, gt?
As to the code above, I get this >
Invalid input ',': expected an identifier character, whitespace, '{', node labels, a property map, a relationship pattern, '.', '(', '[', '^', '*', '/', '%', '+', '-', "=~", IN, STARTS, ENDS, CONTAINS, IS, '=', '~', "<>", "!=", '<', '>', "<=", ">=", AND, XOR, OR or END (line 7, column 8 (offset: 164))
"ELSE ct, gt"
Apologies for the "silly" questions, but I'm just so lost with this language... : / I come from C++/Python but this is just black magic :- )
05-17-2020 04:58 AM
You can think of it like a switch/case :)
You should try to understand the example in the docs first, before applying it to your code :)
CASE n.eyes
WHEN 'blue' THEN 1
WHEN 'brown' THEN 2
ELSE 3
END
CASE
WHEN n.eyes = 'blue' THEN 1
WHEN n.age < 40 THEN 2
ELSE 3
END
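For someone coming from Python, the two CASE forms above can be read roughly as the two functions below. This is only an analogy under the assumption that the values are never null (Cypher's null handling has no direct Python equivalent). The key point is that CASE is an expression: each CASE ... END evaluates to exactly one value, which is why a branch like ELSE ct, gt is rejected, whereas a single list such as [ct, gt] is one value.

```python
def simple_case(eyes):
    # CASE n.eyes WHEN 'blue' THEN 1 WHEN 'brown' THEN 2 ELSE 3 END
    if eyes == "blue":
        return 1
    elif eyes == "brown":
        return 2
    return 3


def generic_case(eyes, age):
    # CASE WHEN n.eyes = 'blue' THEN 1 WHEN n.age < 40 THEN 2 ELSE 3 END
    # Branches are tested in order; the first one that is true wins.
    if eyes == "blue":
        return 1
    elif age < 40:
        return 2
    return 3


print(simple_case("brown"), generic_case("green", 35))
```

Two CASE expressions separated by a comma in a RETURN therefore produce two independent columns, each with its own END, rather than one combined result.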
05-17-2020 12:46 PM
Hello
Apologies for the slow reply, but as I'm a new user I'm apparently limited in how much I can post :- )
Right, so I had a "fun" day of if/else tests. In any case I ended up running some tests, but I hit a bit of a snag...
When I tried to connect 100k nodes to 1 node with the query above, even without the WHERE NOT EXISTS test, it was unable to perform the task...
I'm not quite sure how to tackle it at this point. Could Neo4j not be the tool for me? I'm looking to have millions of nodes with relationships to each other that could be updated/changed at 50-100k per user request.
Regards
Dariusz
05-17-2020 01:00 PM
Hello,
I think Neo4j is the tool you need, but it takes time to learn how to use it correctly :)
You can ask your database for the nodes that have no relationship yet and then create the relations between them; that way you don't need to check at creation time. That's another solution :)
05-17-2020 01:21 PM
Hmm, no idea, maybe it's py2neo? All I did was send 100k {id:"vdsvds"} dicts with a query to create a relation to 1 node. No idea why it would not execute that command. It's the "simplest" thing I can think of to test after creating the 100k nodes in the first place.
How can I debug this? To see why it did not make the relationships?
05-17-2020 01:25 PM
Something I do when I work on a new Cypher query is to test it in Neo4j Desktop first, and once it works correctly, I adapt it to my Python program :)
05-19-2020 09:12 AM
Hmm, it's strange: when I create relationships for 20 nodes it works, for 500 it works, for 100k it does not.
05-19-2020 09:14 AM
Did you try 100k in Neo4j Desktop or in Python?
Is there an error message?
05-20-2020 04:49 AM
In Python, using the same code that generated the 100k nodes and was able to connect 20 nodes, but at 100k it breaks. No error code, it just hangs forever.
05-20-2020 04:50 AM
Could we see the code?
05-20-2020 04:55 AM
I've cleaned it up, will try running it again today, but for now >
import uuid
from json import dumps, loads
from py2neo import Node, Relationship, Graph, NodeMatcher
import time
graph = Graph("http://localhost:7474", password="graph")
globalStart = time.time()
print(graph)

def make3Manual():
    print("Deleting old graph")
    graph.delete_all()
    print("Deleted old graph")
    nData = {}
    nData["type"] = ""
    nData["name"] = "name"
    nData["reference"] = 1
    nData["icId"] = ""
    nodeSize = 100000 * [None]
    matSize = 100 * [None]
    containerSize = 100 * [None]
    sceneSize = 200 * [None]
    for id in range(0, len(nodeSize)):
        n = nData.copy()
        n["type"] = "Obj"
        n["name"] = "geo_" + str(id)
        n["refType"] = 1
        n["icId"] = str(uuid.uuid4())
        nodeSize[id] = n
    for id in range(0, len(matSize)):
        n = nData.copy()
        n["type"] = "animal"
        n["name"] = "ani_" + str(id)
        n["refType"] = 1
        n["icId"] = str(uuid.uuid4())
        matSize[id] = n
    for id in range(0, len(containerSize)):
        n = nData.copy()
        n["type"] = "Cont"
        n["name"] = "Cont_" + str(id)
        n["refType"] = 1
        n["icId"] = str(uuid.uuid4())
        containerSize[id] = n
    for id in range(0, len(sceneSize)):
        n = nData.copy()
        n["type"] = "Scene"
        n["name"] = "someani_" + str(id)
        n["refType"] = 1
        n["icId"] = str(uuid.uuid4())
        sceneSize[id] = n
    allData = nodeSize + matSize + containerSize + sceneSize
    print("Allocated arrays")
    tx = graph.begin()
    start = time.time()
    tx.run('''
    UNWIND $mapEntry AS mItem
    CALL apoc.create.node([mItem["type"]], {name:mItem["name"], art:mItem["atr"],icId:mItem["icId"], refType:mItem["refType"]})
    YIELD node
    RETURN node
    '''
    , mapEntry=allData)
    tx.commit()
    print("Processed in : ", time.time() - start)

def getNodes(nodeClass, lim=-1):
    nMatcher = NodeMatcher(graph)
    if (lim):
        a = nMatcher.match(nodeClass).limit(lim)
    else:
        a = nMatcher.match(nodeClass)
    print("Found nodes : ", len(a), " ", nodeClass)
    return list(a)

def buildRelation(source, key, dest):
    rela = len(source) * [None]
    x = 0
    for src in source:
        rela[x] = Relationship(src, key, dest)
        x = x + 1
    tx = graph.begin()
    for r in rela:
        tx.create(r)
    tx.commit()

def makeConnectionQuery(source, key, dest, sourceType, destType):
    for s in source:
        r = graph.run('''
        MATCH a={s}, b={d}
        CREATE UNIQUE (a)-[r-{k}]-()b
        RETURN a,b
        '''.format(s=s, k=key, d=dest))
        print("new con, key : ", key, " Reply : ", r)

def getNodesByType(type, limit=-1):
    if limit > 0:
        a = graph.run('''
        MATCH (n:{type})
        RETURN n
        LIMIT {limit}'''.format(type=type, limit=limit))
        return a
    return graph.run('''
    MATCH (n:{type})
    RETURN n'''.format(type=type))

def makeConnection(source, key, dest, sourceType, destType):
    data = len(source) * [None]
    for a, item in enumerate(source):
        data[a] = {"id": item[0]["icId"]}
    print(dest[0]["icId"])
    print(data[0]["id"])
    print("processing count : ", len(source))
    tx = graph.begin()
    cmd = '''
    MATCH (dest:{destTag}{{icId:"{destId}"}})
    UNWIND $arrSource AS row
    MATCH (source:{srcType}{{icId:row.id}})
    CALL apoc.create.relationship(source, "{k}", {{relation:"usedIn"}}, dest)
    YIELD rel
    RETURN rel
    '''.format(destId=dest[0]["icId"], destTag=destType, srcType=sourceType, k=key)
    # WHERE NOT EXISTS((source)-[:usedIn]->(dest))
    print(cmd)
    c = tx.run(cmd, arrSource=data)
    print("con result : ", c)
    tx.commit()

print("Building DB")
make3Manual()
print("Getting data back")
geos = getNodesByType("Obj").to_ndarray()
containers = getNodesByType("Cont", 3).to_ndarray()
print(containers)
makeConnection(geos, "usedIn", containers[0], "Obj", "Cont")
print("End job!")
globalEnd = time.time()
print("Done in :", globalEnd - globalStart)
05-20-2020 04:58 AM
You sent 100k in a single batch? You should try splitting it into 100 batches of 1k nodes 🙂
05-20-2020 09:27 AM
Yeh, I figured it would be better to send it all at once ? Else why batch... :- )
05-20-2020 09:29 AM
You should definitely use a batch to create nodes and relations:)
05-20-2020 10:14 AM
But not a big batch, because it will not work? :- )))
05-20-2020 10:15 AM
@Dariusz1989 Yeah, it may not work in some cases when the database receives too many things to create at once; try sending several smaller batches instead of one big one :)
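A minimal sketch of the "several batches instead of one big one" idea, assuming the rows are the list of {"id": ...} dicts built earlier in the thread. `batches` and `send_in_batches` are hypothetical helpers, and `run_batch` stands in for whatever actually runs the query (for example one transaction per chunk); here it only records the chunk sizes so the snippet runs without a database:

```python
def batches(items, size=1000):
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_batches(rows, run_batch, size=1000):
    """Call run_batch(chunk) once per chunk and return how many rows were sent."""
    sent = 0
    for chunk in batches(rows, size):
        run_batch(chunk)  # e.g. begin a transaction, run the query, commit
        sent += len(chunk)
    return sent


# Stand-in for the database call: just remember how big each chunk was.
sizes = []
rows = [{"id": str(i)} for i in range(2500)]
total = send_in_batches(rows, lambda chunk: sizes.append(len(chunk)))
print(total, sizes)
```

The batch size of 1000 is only the figure suggested in this thread; a suitable value depends on the data and the server and is best found by trying a few sizes.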