Run MATCH query for multi core machine

m-kiuchi

Hi, comms.

I have an 8-core machine with 32 GB of memory and am running the MATCH query below, but the query uses only one core and takes a long time.

bar = df.to_dict(orient='records') #df is Pandas dataframe and have 1M rows
with n4jses.begin_transaction() as tx:
    result = tx.run("""UNWIND {bar} as d
                       MATCH (a:AD_ID) WHERE a.adid = d.Ad_id RETURN a.adid""",
                    parameters={'bar': bar})
    print(list(result))

Is there any way to run it in parallel?

Regards,
MK

4 REPLIES

It is by design that a single Cypher query runs on one CPU core. You can either split the work into multiple Cypher statements on the client side or use the parallel execution procedures from the APOC library; see https://neo4j-contrib.github.io/neo4j-apoc-procedures/.
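
A minimal sketch of the client-side split, assuming a thread pool from Python's standard library. `run_batch`, `chunks`, `match_parallel`, and `KNOWN_ADIDS` are hypothetical names: `run_batch` stands in for a function that opens its own session and runs the UNWIND/MATCH query for one chunk, and here it only filters against an in-memory set so the example runs without a database. Driver sessions should not be shared across threads, so each worker would need its own.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the database: the adid values that exist.
KNOWN_ADIDS = {1, 2, 3, 5, 8}

def chunks(rows, size):
    """Yield consecutive slices of rows, each with at most size items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def run_batch(batch):
    # In real code this would open a session of its own and run the
    # UNWIND ... MATCH query with parameters={'bar': batch}.
    return [d['Ad_id'] for d in batch if d['Ad_id'] in KNOWN_ADIDS]

def match_parallel(rows, size=2, workers=4):
    """Run run_batch over all chunks concurrently and flatten the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order the chunks were submitted.
        per_batch = list(pool.map(run_batch, chunks(rows, size)))
    return [adid for batch in per_batch for adid in batch]

bar = [{'Ad_id': i} for i in range(1, 10)]
matched = match_parallel(bar)
print(matched)
```

Threads are probably enough here, since the client mostly waits on the server while each statement is executed on its own core. The chunk size and worker count are placeholders to tune.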

Whoa! Thanks very much! I divided the source dataset into batches and my query now works fine, like this:

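from datetime import datetime, timedelta

def matchNodes(pbar):
    # Run the lookup for one batch of rows in its own transaction.
    with n4jses.begin_transaction() as tx:
        # A Cypher query cannot end with MATCH, so return the matched IDs.
        result = tx.run("""UNWIND {bar} as d
                           MATCH (a:AD_ID) WHERE a.adid = d.Ad_id RETURN a.adid""",
                        parameters={'bar': pbar})
        return list(result)

start = datetime.now()
print(len(bar))
nbulk = 5000  # rows per batch

for nstart in range(0, len(bar), nbulk):
    # Slices exclude the end index, so this covers nbulk rows
    # (fewer for the final batch) without skipping any.
    nend = min(nstart + nbulk, len(bar))
    matchNodes(bar[nstart:nend])

    # Guard against a zero duration on very fast first batches.
    dur = max((datetime.now() - start).total_seconds(), 1e-6)
    perf = nend / dur
    est = datetime.now() + timedelta(seconds=(len(bar) - nend) / perf)
    print("{0} nodes processed({1} ids per sec, est comp {2})".format(nend, int(perf), est))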

APOC is a new world for me, so I'll learn it later. Anyway, thanks again!

MK

You should use this instead:

MATCH (a:AD_ID) WHERE a.adid IN [d IN {bar} | d.Ad_id] RETURN a.adid

or even better just send the IDs in, not the dicts.
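
Sending only the IDs could look like the sketch below. The parameter name `ids` is an assumption, `bar` is a small stand-in for the list of records from the question, and the `tx.run` call is left as a comment so the snippet runs without a database.

```python
# Small stand-in for df.to_dict(orient='records') from the question.
bar = [
    {'Ad_id': 10, 'other': 'x'},
    {'Ad_id': 11, 'other': 'y'},
    {'Ad_id': 10, 'other': 'z'},
]

# Keep only the ID values, dropping duplicates while preserving order.
ids = list(dict.fromkeys(d['Ad_id'] for d in bar))

query = "MATCH (a:AD_ID) WHERE a.adid IN {ids} RETURN a.adid"
parameters = {'ids': ids}

# With an open transaction tx, as in the question:
# result = tx.run(query, parameters=parameters)
print(parameters)
```

Neo4j 4.0 and later write parameters as `$ids` rather than `{ids}`, so the query string would need that change on newer servers.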

It looks clean and easy to use ;-). Thanks!