

Slow streaming with neo4j driver compared to py2neo

I ran gds.labelPropagation.stream on a virtual graph I created, which contains 1 million nodes, using both the neo4j and py2neo drivers.

The returned data and the number of fetched nodes were identical with both drivers, but the neo4j driver took significantly longer to fetch all the nodes than py2neo, almost 3 times as long. The same difference also occurred with gds.wcc.stream.

After creating the virtual graph, I used the following snippet to measure the durations (Python 3.9):

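# Imports used by both measurements; db_host, db_port, user, password and
# graph_name are assumed to be defined earlier.
import logging
import time

import neo4j
import py2neo

# py2neo driver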
graph = py2neo.Graph(f"bolt://{db_host}:{db_port}", auth=(user, password))
start = time.time()
graph.run(f"CALL gds.labelPropagation.stream('{graph_name}') YIELD nodeId").data()
logging.info("py2neo driver: nodes fetched after: %s seconds", time.time() - start)

# neo4j driver
graph = neo4j.GraphDatabase.driver(f"bolt://{db_host}:{db_port}", auth=(user, password))
session = graph.session()
start = time.time()
session.run(f"CALL gds.labelPropagation.stream('{graph_name}') YIELD nodeId").data()
logging.info("neo4j driver: nodes fetched after: %s seconds", time.time() - start)

The output is:

INFO:root:py2neo driver: nodes fetched after: 14.329415798187256 seconds
INFO:root:neo4j driver: nodes fetched after: 40.44703483581543 seconds
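
A single timed run per driver can be skewed by run order (for example, the first query may pay warm-up costs that the second does not). A minimal sketch of a repeatable timing helper follows; `time_fetch` and `summarize` are hypothetical names that are not part of either driver, and the number of repeats is an arbitrary assumption.

```python
import statistics
import time


def time_fetch(fetch, repeats=3):
    """Call fetch() `repeats` times and return the elapsed seconds of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to its minimum and median."""
    return {"min": min(durations), "median": statistics.median(durations)}
```

Each measurement above could then be wrapped in a callable, e.g. `time_fetch(lambda: session.run(query).data())`, and the two drivers compared on their median rather than on one run each.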


I tried increasing the fetch_size of the neo4j.Session object, but it barely changed the result.

Driver versions:
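
Besides fetch_size, another variable worth isolating is the cost of `.data()`, which converts every record into a dict before returning. The sketch below iterates the result and reads only the `nodeId` field instead. `stream_node_ids` is a hypothetical helper, not part of the neo4j driver, and whether this narrows the gap to py2neo is an assumption that would need to be measured.

```python
def stream_node_ids(session, graph_name):
    """Yield nodeId values one record at a time instead of building a list of dicts.

    `session` is expected to behave like a neo4j session: run() returns an
    iterable of records that support lookup by field name.
    """
    # The graph name is passed as a query parameter rather than via an f-string.
    query = "CALL gds.labelPropagation.stream($graph_name) YIELD nodeId"
    for record in session.run(query, graph_name=graph_name):
        yield record["nodeId"]
```

With the 4.x driver, the session could be opened as `graph.session(fetch_size=10000)` (the value is an arbitrary example) and the nodes counted with `sum(1 for _ in stream_node_ids(session, graph_name))`, timed in the same way as the original snippet.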
py2neo==2021.1.5
neo4j==4.3.3 and also tested with 4.3.2


I used a local Neo4j Docker image: neo4j:4.3.2-community
The GDS library version is 1.6.2.

I created the graph from the attached data, duplicated 1000 times in order to create 1 million nodes:
graph_dataset.txt (31.0 KB)

I would appreciate your help in understanding why the neo4j driver is so slow compared to py2neo, and whether there is any way to improve its performance.

Thanks a lot
