Neo4j

kalyan_b_aninda · 05-22-2020

Hello, I am trying to get the preferential attachment score between CUSTOMER and MERCHANT nodes that have at least one PAYMENT relationship between them. I want to use the APOC library for its parallelism features. So far I have seen the community post 'HOW best to do parallel processing' and came up with this code:

MATCH (p:CUSTOMER)-[r:PAYMENT]->(p2:MERCHANT)
WHERE p2.BMCC IN ['1','10','11','12','14','18','19','2','20','21','22','23','24','25','27','2741','28','29','30','32','3351','36','4','4131','4215','4511','4722','4812','4814','4816','4899','4900','5','5039','5047','5065','5111','5199','5200','5211','5251','5261','5399','5411','5441','5511','5533','5541','5641','5651','5661','5697','5712','5722','5732','5733','5734','5811','5813','5814','5912','5940','5941','5942','5944','5947','5948','5950','5977','5992','6','7','7011','7210','7221','7230','7379','7399','7531','7629','763','7832','7911','7996','7997','7999','8021','8062','8099','9']
with collect(distinct(p2))as grouplist 
call apoc.cypher.mapParallel2("OPTIONAL MATCH (c:CUSTOMER)-[r2:PAYMENT]->(m:MERCHANT) return distinct(c.WALLETID) as customer_wallet,m.WALLETID as merchant,m.BMCC as BMCC_CODE,gds.alpha.linkprediction.preferentialAttachment(p1, p2,{relationshipQuery: "PAYMENT"}) as score",{parallel:True, batchSize:5000, concurrency:20},grouplist,4)yield value
return value.customer_wallet,value.merchant,value.BMCC_CODE,value.score

In the first phase I filter the CUSTOMER-PAYMENT->MERCHANT matches by the merchant's BMCC code. Then, inside the APOC call, I calculate the preferential attachment score and return each distinct customer and merchant with their preferential attachment score and BMCC_CODE. I collect the distinct merchants as grouplist because distinct customer wallets can be connected to the same merchant with the same BMCC code.
I think I am doing something wrong here but could not figure it out. I am getting the syntax error below:

Invalid input 'P': expected whitespace, '.', node labels, '[', "=~", IN, STARTS, ENDS, CONTAINS, IS, '^', '*', '/', '%', '+', '-', '=', '~', "<>", "!=", '<', '>', "<=", ">=", AND, XOR, OR, ',' or ')' (line 4, column 253 (offset: 918))
"call apoc.cypher.mapParallel2("OPTIONAL MATCH (c:CUSTOMER)-[r2:PAYMENT]->(m:MERCHANT) return distinct(c.WALLETID) as customer_wallet,m.WALLETID as merchant,m.BMCC as BMCC_CODE,gds.alpha.linkprediction.preferentialAttachment(p1, p2,{relationshipQuery: "PAYMENT"}) as score",{parallel:True, batchSize:5000, concurrency:40},grouplist,4)yield value"

I am learning Neo4j right now, so I am not very good at debugging yet. Can anyone help me correct the query? My machine has 48 CPU cores and my data size is 5.8 million rows.
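For reference, the preferential attachment scoring described above can be sketched outside the database. This is a minimal Python sketch, not the APOC or GDS implementation: it assumes the payments are available as hypothetical `(customer, merchant)` tuples plus a hypothetical `merchant_bmcc` map, and it uses the textbook definition score(x, y) = |N(x)| * |N(y)|, where N is the set of distinct neighbours over PAYMENT edges with direction ignored. Whether GDS counts parallel PAYMENT relationships the same way is not confirmed here. The thread pool only mirrors the idea of splitting the pair list into batches; in CPython it is not expected to give a real speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def neighbours(edges):
    """Map each node to the set of distinct nodes it shares a PAYMENT edge with (direction ignored)."""
    nbrs = defaultdict(set)
    for customer, merchant in edges:
        nbrs[customer].add(merchant)
        nbrs[merchant].add(customer)
    return nbrs


def preferential_attachment(nbrs, x, y):
    """Textbook preferential attachment: |N(x)| * |N(y)|."""
    return len(nbrs[x]) * len(nbrs[y])


def score_batch(batch, nbrs):
    """Score one partition of (customer, merchant) pairs."""
    return [(c, m, preferential_attachment(nbrs, c, m)) for c, m in batch]


def parallel_scores(edges, merchant_bmcc, allowed_bmcc, batch_size=1000, workers=4):
    """Hypothetical helper: score every distinct paying pair whose merchant has an allowed BMCC code."""
    nbrs = neighbours(edges)
    # Distinct customer-merchant pairs with at least one PAYMENT, filtered on the merchant's BMCC.
    pairs = sorted({(c, m) for c, m in edges if merchant_bmcc.get(m) in allowed_bmcc})
    # Split the work into fixed-size batches, similar in spirit to partitioning a list for a parallel map.
    batches = [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the batch order, so the output stays sorted by (customer, merchant).
        results = pool.map(score_batch, batches, [nbrs] * len(batches))
        return [row for batch in results for row in batch]


if __name__ == "__main__":
    # Toy data: c1 paid m1 twice, which still counts as one neighbour.
    edges = [("c1", "m1"), ("c1", "m2"), ("c2", "m1"), ("c1", "m1")]
    merchant_bmcc = {"m1": "5411", "m2": "9999"}
    print(parallel_scores(edges, merchant_bmcc, {"5411"}, batch_size=1))
    # [('c1', 'm1', 4), ('c2', 'm1', 2)]
```

Here c1 has two distinct merchants and m1 has two distinct customers, so the (c1, m1) score is 2 * 2 = 4, while m2 is dropped because its BMCC code is not in the allowed set.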

intouch_vivek · 05-22-2020

Hi @kalyan.b.aninda,

In the call gds.alpha.linkprediction.preferentialAttachment(p1, p2,{relationshipQuery: "PAYMENT"}), p1 is not defined. It looks like a typo: line 1 binds the customer node as p, not p1.

Facing problem with parallelism with apoc