07-26-2021 06:16 AM
I ran gds.labelPropagation.stream on a virtual graph I created, containing 1 million nodes, with both the neo4j and py2neo drivers.
The returned data and the number of fetched nodes were identical with both drivers, but the neo4j driver took significantly longer than py2neo to fetch all the nodes, almost 3 times as long. The same difference also occurred with gds.wcc.stream.
After creating the virtual graph, I used the following snippet to measure the durations (Python 3.9):
import logging
import time
import neo4j
import py2neo

logging.basicConfig(level=logging.INFO)
# db_host, db_port, user, password and graph_name are defined elsewhere

# py2neo driver
graph = py2neo.Graph(f"bolt://{db_host}:{db_port}", auth=(user, password))
start = time.time()
graph.run(f"CALL gds.labelPropagation.stream('{graph_name}') YIELD nodeId").data()
logging.info("py2neo driver: nodes fetched after: %s seconds", time.time() - start)

# neo4j driver
graph = neo4j.GraphDatabase.driver(f"bolt://{db_host}:{db_port}", auth=(user, password))
session = graph.session()
start = time.time()
session.run(f"CALL gds.labelPropagation.stream('{graph_name}') YIELD nodeId").data()
logging.info("neo4j driver: nodes fetched after: %s seconds", time.time() - start)
The output is:
INFO:root:py2neo driver: nodes fetched after: 14.329415798187256 seconds
INFO:root:neo4j driver: nodes fetched after: 40.44703483581543 seconds
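A single timed run can be skewed by connection setup or cache warm-up, so it may be worth repeating each measurement a few times. The following is only a minimal sketch of a reusable timing helper; the name time_call and the repeats parameter are assumptions for illustration and are not part of either driver.

```python
import time


def time_call(fn, repeats=3):
    """Call fn() `repeats` times.

    Returns (result_of_last_call, list_of_durations_in_seconds).
    """
    durations = []
    result = None
    for _ in range(repeats):
        # perf_counter is a monotonic, high-resolution clock intended for timing
        start = time.perf_counter()
        result = fn()
        durations.append(time.perf_counter() - start)
    return result, durations


# Hypothetical usage with the objects from the snippet above:
#   rows, durations = time_call(lambda: session.run(query).data())
#   logging.info("neo4j driver: %s", durations)
```

Comparing the per-run durations rather than one number makes it easier to see whether the gap between the two drivers is stable across runs.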
I tried increasing the fetch_size of the neo4j.Session object, but it barely changed the result.
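For readers who want to reproduce this, here is a minimal sketch of how fetch_size can be passed when opening a session, assuming the 4.x neo4j driver, where it is a session configuration option. The helper name stream_node_ids is hypothetical, and the graph name is passed as a query parameter instead of being formatted into the string. This only illustrates the setting; it is not claimed to close the gap.

```python
def stream_node_ids(driver, graph_name, fetch_size=1000):
    """Collect nodeId values from gds.labelPropagation.stream.

    `driver` is expected to be a neo4j.Driver; `fetch_size` sets how many
    records the driver requests from the server per batch.
    """
    query = "CALL gds.labelPropagation.stream($graph_name) YIELD nodeId"
    # Open the session with an explicit fetch_size and close it when done
    with driver.session(fetch_size=fetch_size) as session:
        result = session.run(query, graph_name=graph_name)
        # Iterate the result directly instead of building dicts with .data()
        return [record["nodeId"] for record in result]


# Hypothetical usage:
#   driver = neo4j.GraphDatabase.driver(f"bolt://{db_host}:{db_port}", auth=(user, password))
#   node_ids = stream_node_ids(driver, graph_name, fetch_size=10000)
```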
Driver versions:
py2neo==2021.1.5
neo4j==4.3.3 (also tested with 4.3.2)
I used a local Neo4j Docker image: neo4j:4.3.2-community
The GDS library version is: 1.6.2
I created the graph with the attached data, duplicated x1000 in order to create 1 million nodes:
graph_dataset.txt (31.0 KB)
I would appreciate your help in understanding why the neo4j driver is so slow compared to py2neo, and whether there is any way to improve its performance.
Thanks a lot