Neo4j

jfisher12 · 03-04-2021

I'm working my way through ~50 metrics for potential use as machine learning features. I want to compare the different queries that retrieve this information, and to compare resource use when the same data is stored under different graph models and indexing schemes.

I've created three versions of the data and stored them in Neo4j Desktop (Neo4j version 4.1.3). I access the databases individually using the Neo4j Python driver and dump the results into a Jupyter Notebook.

The Cypher Workflow document indicates the queries return a stream of records, header and footer metadata, and a result summary that contains additional information relating to query execution and result content (which includes the information for EXPLAIN and PROFILE).

The Knowledge Base has an article on how to get much of this information using Cypher Shell, but I can't find the instructions or functions that would let me access it through the Python driver.

To help optimize my queries, graph models, and indexing, I want to compare query performance information such as heap size, physical memory, run time, caching, CPU use, thread count, and records returned.

This article on Data Science Stack Exchange references something similar done in Neo4j in Action.

I'd like to pull this information programmatically, in real time, into the Jupyter Notebook each time I run a specific query (either the performance information alone or together with the query results), without having to watch a separate application such as the Halin monitoring tool.

jfisher12 · 03-12-2021

For those interested, I was able to work out the answer.

In the Jupyter notebook I ran the query and stored the Result object, then was able to access the metadata:

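from neo4j import GraphDatabase

# Connect to the local database over Bolt
uri = "bolt://localhost:7687"
driver = GraphDatabase.driver(uri, auth=('neo4j', 'password'))
session = driver.session()

# Prefixing the query with PROFILE makes the server return the execution plan
count1 = session.run('''PROFILE MATCH (c)
RETURN count(c) as node_count''')

# consume() discards any unread records and returns the result summary
count1.consume().metadata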

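The .consume() method returns a result summary object, and metadata is only one of its attributes. Replacing metadata with result_available_after or result_consumed_after gives the query timings (in milliseconds), and profile gives the execution plan of a PROFILE query.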
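To collect these values for every query in a notebook, the steps above can be wrapped in a small helper. This is only a minimal sketch: the function name run_with_stats and the keys of the returned dictionary are my own choices, not part of the driver, and it assumes the session object behaves like a 4.x driver session (run() returns an iterable Result with a consume() method).

```python
def run_with_stats(session, query, **params):
    """Run a query and return (records, stats) taken from the result summary.

    `session` is assumed to be an open neo4j driver session.
    """
    result = session.run(query, **params)
    records = list(result)       # read all records before consuming the result
    summary = result.consume()   # summary object described above

    stats = {
        "records_returned": len(records),
        # Timings reported by the server, in milliseconds
        "available_after_ms": summary.result_available_after,
        "consumed_after_ms": summary.result_consumed_after,
        # Execution plan; expected to be empty unless the query starts with PROFILE
        "profile": summary.profile,
    }
    return records, stats
```

Calling records, stats = run_with_stats(session, "PROFILE MATCH (c) RETURN count(c) AS node_count") would then give both the query results and the timing information in one step. Note that this only covers what the summary exposes; figures such as heap size, CPU use, or thread count are not part of it.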

Additional details can be found in the Result section of the Python Driver API Documentation.
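The profile attribute comes back as a nested plan, so comparing queries usually means walking it. The sketch below assumes the plan is a dictionary whose nodes carry "operatorType", "rows", "dbHits", and "children" keys, which is what I see with the 4.x driver; check the keys in your own output, since they may differ between versions. The helper names and the sample dictionary are made up for illustration and are not real server output.

```python
def total_db_hits(plan):
    """Sum dbHits over a plan node and all of its children."""
    hits = plan.get("dbHits", 0)
    for child in plan.get("children", []):
        hits += total_db_hits(child)
    return hits


def operators(plan, depth=0):
    """Yield (depth, operatorType, rows, dbHits) for every operator in the plan."""
    yield depth, plan.get("operatorType"), plan.get("rows"), plan.get("dbHits")
    for child in plan.get("children", []):
        yield from operators(child, depth + 1)


# Hand-written stand-in for summary.profile (illustrative values only)
sample_plan = {
    "operatorType": "ProduceResults",
    "rows": 1,
    "dbHits": 0,
    "children": [
        {"operatorType": "NodeCountFromCountStore",
         "rows": 1,
         "dbHits": 1,
         "children": []},
    ],
}

for depth, op, rows, hits in operators(sample_plan):
    print("  " * depth + f"{op}: rows={rows}, dbHits={hits}")
print("total dbHits:", total_db_hits(sample_plan))
```

With a real query, sample_plan would be replaced by count1.consume().profile, and the total can be stored next to the timings to compare the same query across graph models or indexes.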

A working example of the code can be found in a notebook on my GitHub page.


Assess query performance with python driver