Neo4j

kalyan_b_aninda · ‎05-20-2020

Hello, I am trying to compute the preferential attachment score for a large sample of nodes.

def Prefer_Attachment_query2(listval):
    # listval is a (customer_id, merchant_id) pair of WALLETID values.
    customer_id, merchant_id = listval[0], listval[1]
    # Build one Cypher query that scores this single customer/merchant pair.
    prefquery = """MATCH (p1:CUSTOMER {WALLETID: '%s'})
                 MATCH (p2:MERCHANT {WALLETID: '%s'})
                 RETURN gds.alpha.linkprediction.preferentialAttachment(p1, p2, {relationshipQuery: "PAYMENT"}) AS score""" % (customer_id, merchant_id)
    return prefquery

This function is called from a nested loop. My sample has 1,000 customer IDs and nearly 50k merchant IDs, so each customer ID is compared against all 50k merchant IDs to get its preferential attachment score with each merchant. The code works, but performance is very slow: for 2 customer IDs against 50k merchant IDs, the scores took 500 seconds to calculate. I also tried the full sample of 1,000 customers, but it had not completed after more than 24 hours of running; my log showed that only 350 of them had been processed.
I have also tried the multiprocessing package, but it gave unstable output and produced this error:

Failed to read from defunct connection Address(host='localhost', port=7687) (Address(host='127.0.0.1', port=7687))
Failed to read from defunct connection Address(host='localhost', port=7687) (Address(host='127.0.0.1', port=7687))
ServiceUnavailable: Failed to read from defunct connection Address(host='localhost', port=7687) (Address(host='127.0.0.1', port=7687))

I searched and found that using the multiprocessing package with the Neo4j Bolt driver is a known issue:

https://github.com/neo4j/neo4j-python-driver/issues/260
Following issue 260, I changed the dictionary keys into strings, but that didn't solve the multiprocessing problem.

Is there any way to use the APOC library to run this query in parallel or batch-wise? Kindly help me out.
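
The per-pair lookups described above can be sketched batch-wise without APOC, by passing the IDs as query parameters and expanding them with UNWIND so that one query scores many merchants for a single customer. This is only a minimal sketch, assuming the same labels, property, and relationship type as the function above; `BATCH_QUERY`, `chunked`, `build_batches`, and the batch size of 1000 are hypothetical names and values chosen for illustration, and the query has not been run against this dataset.

```python
from itertools import islice

# Assumed query shape: same labels/property as the original function,
# but IDs arrive as parameters instead of being formatted into the string.
BATCH_QUERY = """
MATCH (p1:CUSTOMER {WALLETID: $customer_id})
UNWIND $merchant_ids AS merchant_id
MATCH (p2:MERCHANT {WALLETID: merchant_id})
RETURN merchant_id,
       gds.alpha.linkprediction.preferentialAttachment(
           p1, p2, {relationshipQuery: "PAYMENT"}) AS score
"""


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_batches(customer_id, merchant_ids, size=1000):
    """Yield (query, params) pairs, one per batch of merchant IDs."""
    for batch in chunked(merchant_ids, size):
        yield BATCH_QUERY, {"customer_id": customer_id, "merchant_ids": batch}
```

Each `(query, params)` pair could then be passed to the driver's `session.run(query, params)`, so one customer would need about 50 round trips for 50k merchants instead of 50k separate queries.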

Can I run Preferential Attachment for large sample inputs?