
Retrieving list of nodes & creating relationships

Hey

I'm using py2neo for the moment, but I'm trying to build most things via queries. In any case, I'm struggling with this code. How can I properly and efficiently create 100k+ relationships?

# My getter to return list of nodes
def getNodesByType(type, limit=-1):
    if limit > 0:
        a = graph.run('''
        MATCH (n:{type}) 
        RETURN n 
        LIMIT {limit}'''.format(type=type, limit=limit))
        return a
    return graph.run('''
    MATCH (n:{type})
    RETURN n'''.format(type=type))

def makeConnection(source, key, dest):
    tx = graph.begin()
    tx.run('''
    UNWIND $arrSource AS mSrc
    CALL apoc.create.relationship(mSrc,{k}, {d})    
    YIELD node
    RETURN node
                 '''
           , arrSource=source, k=key, d=dest)
    tx.commit()

nodeA = getNodesByType("Pets", 10).to_ndarray()
nodeB= getNodesByType("Objects", 3).to_ndarray()

makeConnection(nodeA,"livesIn",nodeB[0])
makeConnection(nodeA[0:5000],"livesNextTo",nodeB[1])
makeConnection(nodeA[50000:99999],"livedIn",nodeB[2])

I keep on getting

TypeError: Neo4j does not support JSON parameters of type ndarray

I'm lost. I'm passing Node objects to it, so maybe that's wrong? How should I execute this task?

Regards
Dariusz

Edit: now that I look at it, I'm not sure I can use APOC here, because I need CREATE UNIQUE: I only want to create a relationship if it's missing.


Hey
I'm staring at it and trying to wrap my head around it. Thanks so much for the fast help!

I take it your node has "iid" as its key? So Node(iid:"someid")?
So iid:row.id1/id2 stands for dict entries inside row. OK, I see! So I should not pass the py2neo Node directly, but rather grab node["myId"] and pass that. Thanks!
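
For anyone reading along, here is a minimal sketch of that idea: reduce each node to a plain dict holding only its ID before sending it as a query parameter, since parameters have to be plain lists, dicts, strings and numbers. The helper name `to_rows` is made up for this illustration, and plain dicts stand in for py2neo Node objects (both can be read with `node["icId"]`).

```python
from typing import Any, Dict, Iterable, List, Mapping


def to_rows(nodes: Iterable[Mapping[str, Any]], key: str = "icId") -> List[Dict[str, Any]]:
    """Reduce node-like objects to plain {"id": value} dicts usable as a query parameter."""
    return [{"id": node[key]} for node in nodes]


# Plain dicts stand in for py2neo Node objects in this sketch.
sample_nodes = [
    {"icId": "a-1", "name": "geo_0"},
    {"icId": "a-2", "name": "geo_1"},
]
rows = to_rows(sample_nodes)
print(rows)
```

The resulting list can then be passed as the UNWIND parameter, for example `tx.run(cmd, arrSource=rows)`.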

I think I got it, I will test it. As for the other thing, how can I make sure that the connection is unique? I mean, I want to create the relationship only if it's missing.

I don't see apoc.createRelationship.unique()

Regards
Dariusz

No problem. To be honest, I have never used py2neo; I prefer the Neo4j Python driver (https://neo4j.com/docs/api/python-driver/current/index.html)

I don't know if I'm the right person to help you with py2neo, but I can give you some advice :)

You can check whether the relationship exists before creating it by using EXISTS (https://neo4j.com/docs/cypher-manual/current/functions/predicate/#functions-exists)

Hey
I'm not dead set on py2neo, it was just a library suggested on the Neo4j YouTube channel: https://www.youtube.com/watch?v=3JMhX1sT98U
So I can move to the Neo4j driver too.

As to the issue...
I've written my command as follows:

    cmd = '''
        MATCH (dest:{destTag}{{icId:"{destId}"}})
        UNWIND $arrSource AS row
        MATCH (source:{srcType}{{icId:row.id}})
        CALL apoc.create.relationship(source, "{k}", {{relation:"LINKED_TO"}}, dest)
        YIELD rel
        RETURN rel
                         '''.format(destId=dest[0]["icId"], destTag=destType, srcType=sourceType, k=key)
    print(cmd)
    c = tx.run(cmd, arrSource=data)

As far as I understand UNWIND, it's a "loop", so I decided to match dest outside of it (I guess?) since dest is constant, and then run the procedure above. It does work, which is great, but it does not handle existing connections as it should.

I started reading the docs but I'm lost. The two examples they show are not enough, and the other material I read is just... where is the "if/else"? :- )

"big brain time" :- ) Lets see if I can crack it ^^ brb.

For the if-else, in Cypher it's called CASE https://neo4j.com/docs/cypher-manual/current/syntax/expressions/

And if you want to check if the relation exists, you can do something in this style after the second MATCH:

WHERE NOT EXISTS((dest)-[:LINKED_TO]->(source))

Hello

Nope, I can't get through that if statement system. It is making my head hurt :- )

This is as far as I got:

MATCH (ct:Pets{icId:"800a1031-1102-49c4-abdd-f294ae867821"})
MATCH (gt:Objects{name:"geo_98608"})
CALL apoc.when([NOT EXISTS((gt)-[:LINKED_TO]->(ct)),
'RETURN ct'],
'RETURN gt')
YIELD VALUE
RETURN VALUE

I was testing whether I can just run a check and return either object depending on the result. I think I need apoc.do.when for creating the relationship, but for now it's read-only.

Anyway, all I get is this:
[screenshot not available]

Regards
Dariusz

cmd = '''
         MATCH (dest:{destTag}{{icId:"{destId}"}})
         UNWIND $arrSource AS row
         MATCH (source:{srcType}{{icId:row.id}})
         WHERE NOT EXISTS((dest)-[:LINKED_TO]->(source))
         CALL apoc.create.relationship(source, "{k}", {{relation:"LINKED_TO"}}, dest)
         YIELD rel
         RETURN rel
     '''.format(destId=dest[0]["icId"], destTag=destType, srcType=sourceType, k=key)
c = tx.run(cmd, arrSource=data)

That makes no sense to me at all :- )

So you do WHERE NOT EXISTS((dest)-[:LINKED_TO]->(source)),
and the next statement is executed if that is true. What happens if the relationship already exists? What would the else branch look like?

I'm going over as much if/else material as I can, but as far as I can tell this language is "precious"...

You don't need if/else. You just want to create the relationship when it does not exist, and that is what the Cypher query does: rows that fail the WHERE filter are simply dropped, so nothing is created for them.

But if you always use the same relationship type, for example LINKED_TO, you can use MERGE or CREATE UNIQUE to create the relationship :)
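
A rough sketch of the MERGE variant, in case it helps. It assumes the `icId` property and the row shape `{"id": ...}` used earlier in the thread; the function name `build_merge_query` and the parameter names `$rows` and `$destId` are invented for this example. Labels and relationship types cannot be passed as query parameters, so they are still formatted into the string, but they are checked against a simple identifier pattern first. As far as I know, CREATE UNIQUE is deprecated and no longer available from Neo4j 4.0 on, which is why only MERGE is shown.

```python
import re

# Only plain identifiers are accepted for labels and relationship types,
# because they are interpolated into the query text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_merge_query(src_label: str, dest_label: str, rel_type: str) -> str:
    """Build a query that links every source row to one destination node, once."""
    for name in (src_label, dest_label, rel_type):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"MATCH (dest:{dest_label} {{icId: $destId}})\n"
        "UNWIND $rows AS row\n"
        f"MATCH (source:{src_label} {{icId: row.id}})\n"
        # MERGE creates the relationship only if this exact pattern is missing.
        f"MERGE (source)-[r:{rel_type}]->(dest)\n"
        "RETURN count(r) AS linked"
    )


query = build_merge_query("Obj", "Cont", "usedIn")
print(query)
```

With py2neo this could be run as `tx.run(query, rows=data, destId=dest_id)`, where `data` and `dest_id` are whatever you already collected. The relationship is written from source to dest, so any existence check should use the same direction.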

Yeh I was going to go with unique once I understand the if/else/case/where/when syntaxes...

In this doc, the two examples are pretty clear, I think :) There are two syntaxes for CASE:
https://neo4j.com/docs/cypher-refcard/current/

Nope not getting the CASE example to work at all.

MATCH (ct:Pets{icId:"800a1031-1102-49c4-abdd-f294ae867821"})
MATCH (gt:Objects{name:"geo_98608"})
CASE
WHEN NOT EXISTS((gt)-[:LINKED_TO]->(ct)) THEN ct
WHEN EXISTS ((gt)-[:LINKED_TO]->(ct)) THEN gt
ELSE THEN ct, gt
END

just gives me

Invalid input 'S': expected 'l/L' (line 3, column 3 (offset: 101))
"CASE"
   ^

😕

You don't need CASE, this query is fine:

cmd = '''
         MATCH (dest:{destTag}{{icId:"{destId}"}})
         UNWIND $arrSource AS row
         MATCH (source:{srcType}{{icId:row.id}})
         WHERE NOT EXISTS((dest)-[:LINKED_TO]->(source))
         CALL apoc.create.relationship(source, "{k}", {{relation:"LINKED_TO"}}, dest)
         YIELD rel
         RETURN rel
     '''.format(destId=dest[0]["icId"], destTag=destType, srcType=sourceType, k=key)
c = tx.run(cmd, arrSource=data)

Yes, I know. I'm just trying to understand how to use CASE at this point, so I can build more complex logic in the future. :- )

MATCH (ct:Pets{icId:"800a1031-1102-49c4-abdd-f294ae867821"})
MATCH (gt:Objects{name:"geo_98608"})
RETURN
CASE
WHEN NOT EXISTS((gt)-[:LINKED_TO]->(ct))
THEN ct
ELSE ct, gt
END,
CASE WHEN EXISTS ((gt)-[:LINKED_TO]->(ct))
THEN gt
ELSE ct, gt
END

So that I can wrap my head around this...

MATCH (ct:Pets{icId:"800a1031-1102-49c4-abdd-f294ae867821"}) << variable allocator, allocate ct
MATCH (gt:Objects{name:"geo_98608"}) < variable allocator, allocate gt
RETURN { < return objects from function below
    CASE < if statement begin
        WHEN NOT EXISTS((gt)-[:LINKED_TO]->(ct)) < case 0
            THEN ct  < returns ct from this function meaning the "RETURN ct" above returns ct then
        ELSE ct, gt, < direct return to above, "RETURN ct, gt"
        , < why comma? what does it do ?
        "CASE WHEN EXISTS ((gt)-[:LINKED_TO]->(ct)) could be written as >"
        CASE < case 1
            WHEN EXISTS ((gt)-[:LINKED_TO]->(ct)) - check statement, case 1?
                THEN gt same return idea as case 0
            ELSE ct, gt same return idea as case 0
    END - end case? Which case ? we made 2 cases  don't we need 2x END ?

This is how I see it, but I don't get it. We made 2 cases, with 2 if/else statements. Are they additive? If both cases are true, and case 0 returns ct and case 1 returns gt, is the result then ct, gt?

As for the code above, I get this:

Invalid input ',': expected an identifier character, whitespace, '{', node labels, a property map, a relationship pattern, '.', '(', '[', '^', '*', '/', '%', '+', '-', "=~", IN, STARTS, ENDS, CONTAINS, IS, '=', '~', "<>", "!=", '<', '>', "<=", ">=", AND, XOR, OR or END (line 7, column 8 (offset: 164))
"ELSE ct, gt"

Apologies for the "silly" questions, but I'm just so lost with this language... : / I come from C++/Python, but this is just black magic :- )

You can think of it as a switch/case :)
You should try to understand the examples in the docs first, before applying them to your code :)

CASE n.eyes
WHEN 'blue' THEN 1
WHEN 'brown' THEN 2
ELSE 3
END

CASE
WHEN n.eyes = 'blue' THEN 1
WHEN n.age < 40 THEN 2
ELSE 3
END
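
Since you come from Python, a loose analogy may help: each CASE ... END is a single expression that evaluates to exactly one value, much like a function that returns on the first matching branch. That is why `ELSE ct, gt` is rejected (it is two values), while two separate CASE expressions in one RETURN just give two columns. The function names below are made up, and the sketch ignores Cypher's null handling.

```python
def simple_case(eyes: str) -> int:
    """Analogy for: CASE n.eyes WHEN 'blue' THEN 1 WHEN 'brown' THEN 2 ELSE 3 END"""
    if eyes == "blue":
        return 1
    if eyes == "brown":
        return 2
    return 3


def generic_case(eyes: str, age: int) -> int:
    """Analogy for: CASE WHEN n.eyes = 'blue' THEN 1 WHEN n.age < 40 THEN 2 ELSE 3 END"""
    if eyes == "blue":
        return 1
    if age < 40:
        return 2
    return 3


# Two CASE expressions separated by a comma in RETURN behave like two columns:
row = (simple_case("brown"), generic_case("green", 35))
print(row)
```

To hand back both nodes from one branch, the branch has to produce one value, for example a list such as `[ct, gt]`.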

Hello

Apologies for the slow reply. As a new user, I'm apparently limited in how much I can post :- )

Right so I had a "fun" day of if else tests. In any case I ended up running some tests but I hit a bit of a snag...

When I tried to connect 100k nodes to 1 node with the query above, even without the WHERE NOT EXISTS test, it was unable to complete the task...

I'm not quite sure how to approach it at this point. Could Neo4j not be the right tool for me? I'm looking to have millions of nodes with relationships to each other, where 50-100k of them could be updated or changed per user request.

Regards
Dariusz

Hello,

I think Neo4j is the tool you need, but it takes time to learn how to use it correctly :)

Another solution: ask the database for the nodes that have no relationship yet, and then create the relationships only for those. That way you don't need to check at creation time :)

Hmm, no idea, maybe it's py2neo? All I did was send 100k {id:"vdsvds"} dicts with a query to create a relationship to 1 node. No idea why it would not execute that command. It's the "simplest" thing I can think of to test after creating the 100k nodes in the first place.

How can I debug this? To see why it did not make the relationships?

Something I do when I work on a new Cypher query is to test it in Neo4j Desktop first, and once it works correctly, I adapt it to my Python program :)

Hmm, it's strange: when I create relationships for 20 nodes it works, for 500 it works, for 100k it does not.

Did you try 100k in Neo4j Desktop or in Python?
Is there an error message?

In Python, using the same code that generated the 100k nodes and was able to connect 20 nodes. At 100k it breaks. No error message, it just hangs forever.

Could we see the code?

I've cleaned it up and will try running it again today, but for now:

import uuid
from json import dumps, loads

from py2neo import Node, Relationship, Graph, NodeMatcher
import time

graph = Graph("http://localhost:7474", password="graph")
globalStart = time.time()
print(graph)

def make3Manual():
    print("Deleting old graph")
    graph.delete_all()
    print("Deleted old graph")

    nData = {}
    nData["type"] = ""
    nData["name"] = "name"
    nData["reference"] = 1
    nData["icId"] = ""

    nodeSize = 100000 * [None]
    matSize = 100 * [None]
    containerSize = 100 * [None]
    sceneSize = 200 * [None]

    for id in range(0, len(nodeSize)):
        n = nData.copy()
        n["type"] = "Obj"
        n["name"] = "geo_" + str(id)
        n["refType"] = 1
        n["icId"] = str(uuid.uuid4())

        nodeSize[id] = n

    for id in range(0, len(matSize)):
        n = nData.copy()
        n["type"] = "animal"
        n["name"] = "ani_" + str(id)
        n["refType"] = 1
        n["icId"] = str(uuid.uuid4())

        matSize[id] = n

    for id in range(0, len(containerSize)):
        n = nData.copy()
        n["type"] = "Cont"
        n["name"] = "Cont_" + str(id)
        n["refType"] = 1
        n["icId"] = str(uuid.uuid4())

        containerSize[id] = n

    for id in range(0, len(sceneSize)):
        n = nData.copy()
        n["type"] = "Scene"
        n["name"] = "someani_" + str(id)
        n["refType"] = 1
        n["icId"] = str(uuid.uuid4())
        sceneSize[id] = n

    allData = nodeSize + matSize + containerSize + sceneSize
    print("Allocated arrays")
    tx = graph.begin()
    start = time.time()

    tx.run('''
    UNWIND $mapEntry AS mItem
    CALL apoc.create.node([mItem["type"]], {name:mItem["name"], art:mItem["atr"],icId:mItem["icId"], refType:mItem["refType"]})
    YIELD node
    RETURN node
                 '''
           , mapEntry=allData)

    tx.commit()
    print("Processed in : ", time.time() - start)


def getNodes(nodeClass, lim=-1):
    nMatcher = NodeMatcher(graph)
    if lim > 0:  # the default of -1 is truthy, so test explicitly for a positive limit
        a = nMatcher.match(nodeClass).limit(lim)
    else:
        a = nMatcher.match(nodeClass)
    print("Found nodes : ", len(a), "  ", nodeClass)
    return list(a)


def buildRelation(source, key, dest):
    rela = len(source) * [None]
    x = 0
    for src in source:
        rela[x] = Relationship(src, key, dest)
        x = x + 1
    tx = graph.begin()
    for r in rela:
        tx.create(r)
    tx.commit()


def makeConnectionQuery(source, key, dest, sourceType, destType):
    for s in source:
        r = graph.run('''
        MATCH a={s}, b={d}
        CREATE UNIQUE (a)-[r-{k}]-()b
        RETURN a,b
        '''.format(s=s, k=key, d=dest))
        print("new con, key : ", key, " Reply : ", r)


def getNodesByType(type, limit=-1):
    if limit > 0:
        a = graph.run('''
        MATCH (n:{type}) 
        RETURN n 
        LIMIT {limit}'''.format(type=type, limit=limit))
        return a
    return graph.run('''
    MATCH (n:{type})
    RETURN n'''.format(type=type))


def makeConnection(source, key, dest, sourceType, destType):
    data = len(source) * [None]
    for a, item in enumerate(source):
        data[a] = {"id": item[0]["icId"]}
    print(dest[0]["icId"])
    print(data[0]["id"])
    print("processing count : ", len(source))
    tx = graph.begin()
    cmd = '''
        MATCH (dest:{destTag}{{icId:"{destId}"}})
        UNWIND $arrSource AS row
        MATCH (source:{srcType}{{icId:row.id}})
        
        CALL apoc.create.relationship(source, "{k}", {{relation:"usedIn"}}, dest)
        YIELD rel
        RETURN rel
                         '''.format(destId=dest[0]["icId"], destTag=destType, srcType=sourceType, k=key)
    #        WHERE NOT EXISTS((source)-[:usedIn]->(dest))

    print(cmd)
    c = tx.run(cmd, arrSource=data)
    print("con result : ", c)
    tx.commit()


print("Building DB")
make3Manual()

print("Getting data back")

geos = getNodesByType("Obj").to_ndarray()
containers = getNodesByType("Cont", 3).to_ndarray()
print(containers)
makeConnection(geos, "usedIn", containers[0], "Obj", "Cont")

print("End job!")
globalEnd = time.time()
print("Done in :", globalEnd - globalStart)

You sent 100k in a single batch? You should try splitting it into 100 batches of 1k nodes each 🙂

Yeh, I figured it would be better to send it all at once ? Else why batch... :- )

You should definitely use batches to create nodes and relationships :)

But not one big batch, because that will not work? :- )))

@Dariusz1989 Yeah, it can fail in some cases when the database receives too many things to create at once. Try sending several smaller batches instead of one big one :)
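
A minimal sketch of that batching loop, assuming the rows are the `{"id": ...}` dicts built earlier in the thread. The names `chunked`, `link_in_batches` and `run_batch` are made up for this example; `run_batch` is whatever function sends one batch to the database, and the batch size of 1000 is just the figure suggested above, not a tuned value.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence

Row = Dict[str, Any]


def chunked(rows: Sequence[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of rows with at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


def link_in_batches(rows: Sequence[Row],
                    run_batch: Callable[[List[Row]], None],
                    size: int = 1000) -> int:
    """Send rows to run_batch one chunk at a time; return the number of batches sent."""
    batches = 0
    for batch in chunked(rows, size):
        # With py2neo, run_batch could do: tx = graph.begin(); tx.run(cmd, arrSource=batch); tx.commit()
        run_batch(batch)
        batches += 1
    return batches


# Stand-in for the database call: just record how large each batch was.
sizes: List[int] = []
all_rows = [{"id": str(i)} for i in range(2500)]
sent = link_in_batches(all_rows, lambda batch: sizes.append(len(batch)), size=1000)
print(sent, sizes)
```

Committing once per batch keeps each transaction small, so a failure or a slow batch is easier to spot than with a single 100k-row transaction.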