Neo4j

m-kiuchi · ‎09-23-2018

Hi, comms.

I have a machine with 8 cores and 32 GB of memory, and I am running the MATCH query below, but it uses only 1 core and takes a long time.

bar = df.to_dict(orient='records')  # df is a pandas DataFrame with 1M rows
with n4jses.begin_transaction() as tx:
    result = tx.run("""UNWIND {bar} as d
                       MATCH (a:AD_ID) WHERE a.adid = d.Ad_id RETURN a.adid""",
                    parameters={'bar': bar})
    print(list(result))

Is there any way to run it in parallel?

Regards,
MK

stefan_armbrust · ‎09-23-2018

It is by design that a single Cypher query runs on one CPU core. You can either split the work into multiple Cypher statements on the client side or use the parallel execution procedures from the APOC library; see https://neo4j-contrib.github.io/neo4j-apoc-procedures/.
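To illustrate the client-side option, here is a minimal sketch that splits the records into chunks and sends each chunk as its own query from a thread pool, so the server can work on several queries at once. The helper names (`chunks`, `match_chunk`, `match_parallel`) are hypothetical, `driver` is assumed to be an object created with the official Python driver's `GraphDatabase.driver(...)`, and the query uses the `$bar` parameter form (the `{bar}` form in the question is the older spelling). Each worker opens its own session because a driver can be shared between threads but a session should not be.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def match_chunk(driver, records):
    """Run the lookup for one chunk in its own session (assumed neo4j driver)."""
    with driver.session() as session:
        result = session.run(
            "UNWIND $bar AS d "
            "MATCH (a:AD_ID) WHERE a.adid = d.Ad_id RETURN a.adid AS adid",
            bar=records)
        return [record["adid"] for record in result]


def match_parallel(records, run_chunk, size=5000, workers=8):
    """Apply `run_chunk` to every chunk in a thread pool and flatten the results."""
    found = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        for part in pool.map(run_chunk, chunks(records, size)):
            found.extend(part)
    return found
```

It could be called as `match_parallel(bar, lambda part: match_chunk(driver, part))`. Whether 8 workers is the right number depends on the server and the data, so treat `size` and `workers` as starting points to tune.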

m-kiuchi · ‎09-23-2018

Whoa! Thanks very much! I split the source dataset into chunks and my query now works fine, like this:

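from datetime import datetime, timedelta

def matchNodes(pbar):
    # Look up one chunk of records in its own transaction.
    with n4jses.begin_transaction() as tx:
        # A Cypher query cannot end with MATCH, so return the matched ids.
        result = tx.run("""UNWIND {bar} as d
                  MATCH (a:AD_ID) WHERE a.adid = d.Ad_id RETURN a.adid""",
                parameters={'bar': pbar})
        return list(result)

start = datetime.now()
print(len(bar))
nbulk = 5000

# Step through bar in chunks of nbulk; the last chunk may be shorter.
for nstart in range(0, len(bar), nbulk):
    # Slice ends are exclusive, so nend is one past the last record of the chunk.
    nend = min(nstart + nbulk, len(bar))

    matchNodes(bar[nstart:nend])

    # Progress report: throughput so far and estimated completion time.
    dur = (datetime.now() - start).total_seconds()
    perf = nend / max(dur, 1e-9)  # guard against a zero duration
    est = datetime.now() + timedelta(seconds=int((len(bar) - nend) / perf))
    print("{0} nodes processed({1} ids per sec, est comp {2})".format(nend, int(perf), est))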

APOC is a new world to me, so I'll learn it later. Anyway, thanks again!

MK

michael_hunger · ‎09-24-2018

You should use this instead:

MATCH (a:AD_ID) WHERE a.adid IN [d IN {bar} | d.Ad_id] RETURN a.adid

Or, even better, just send in the IDs rather than the dicts.
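A rough sketch of the "send only the IDs" variant, in case it helps: it pulls the `Ad_id` values out of the records before the query is sent, so the parameter is a flat list instead of a list of dicts. The helpers `ad_ids` and `find_existing` are hypothetical names, `session` is assumed to be a session from the official Python driver, and the query uses the `$ids` parameter form rather than the older `{bar}` form shown above.

```python
QUERY = "MATCH (a:AD_ID) WHERE a.adid IN $ids RETURN a.adid AS adid"


def ad_ids(records, key="Ad_id"):
    """Return the distinct values of `key`, keeping first-seen order."""
    return list(dict.fromkeys(row[key] for row in records))


def find_existing(session, records):
    """Send only the ids to the server and collect the ids that matched."""
    result = session.run(QUERY, ids=ad_ids(records))
    return [record["adid"] for record in result]
```

With the DataFrame from the question, the same list could presumably be taken straight from the column (for example `df['Ad_id'].tolist()`), which avoids building the dicts at all.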

m-kiuchi · ‎09-24-2018

It looks clean and easy to use ;-) Thanks!

Run MATCH query for multi core machine