Neo4j

bears · ‎04-01-2021

Hi, I'm currently using the official python driver in an application.

A part of this app allows people to upload files that will then be merged into neo4j, usually a CSV, or other consistent structure. Right now, for every file type I come across, we write a new custom cypher query for that datatype going forward. We do that to be as efficient as possible, as everything online seems to indicate that parameterized queries are the only way to do so correctly.

We started out with a models approach, similar to how things like py2neo do it, but saw obscene performance issues, as it was effectively starting a new session / transaction for each column of each row.

What I would like to know is where is the "actual" optimization taking place with parameterization? Is it the database doing a type of caching of the query? Is there something else the driver is doing I don't know of? How this part functions seems to be difficult to find any documentation on.

My goal is to have a part of the application where a user can tell the application the names of their columns, what type of data is in them, and what the relationships between them are. Then I am hoping I can do something similar to string building of my query on the backend, to create a "dynamic", yet efficiently parameterized query to use for that upload.

dana_canzano · ‎04-01-2021

not exactly, maybe what you are after but

and to which we cache of generated query plans. We do not cache of query results.
Further with reference to you comment of

My goal is to have a part of the application where a user can tell the application the names of their columns, what type of data is in them, and what the relationships between them are. Then I am hoping I can do something similar to string building of my query on the backend, to create a "dynamic", yet efficiently parameterized query to use for that upload.

but parameters generally refer to values of properties. for example you might write a query similar to

match (n:Person) where n.age=$param1 return n;  {param1: 42}

and if you then resubmit

match (n:Person) where n.age=$param1 return n;  {param1: 49}

then we would not need to replan the query. But if you are suggesting someone might run

match (n:Person) where n.age=42

and someone else might run

match (n:Person) where n.zipode=02112;

then each of these 2 would need to be planned

View solution in original post

dana_canzano · ‎04-01-2021

not exactly, maybe what you are after but

and to which we cache of generated query plans. We do not cache of query results.
Further with reference to you comment of

My goal is to have a part of the application where a user can tell the application the names of their columns, what type of data is in them, and what the relationships between them are. Then I am hoping I can do something similar to string building of my query on the backend, to create a "dynamic", yet efficiently parameterized query to use for that upload.

but parameters generally refer to values of properties. for example you might write a query similar to

match (n:Person) where n.age=$param1 return n;  {param1: 42}

and if you then resubmit

match (n:Person) where n.age=$param1 return n;  {param1: 49}

then we would not need to replan the query. But if you are suggesting someone might run

match (n:Person) where n.age=42

and someone else might run

match (n:Person) where n.zipode=02112;

then each of these 2 would need to be planned

bears · ‎04-01-2021

Thanks for quickly replying @dana.canzano, this helps .

So from my understanding, if I pre-build a query for a file structure, which is set up with $params, then as long as that doesn't change it will be maintained in the cache and be performant?

So if I build it as something like..

label1 = "Person"
label2 = "Something"
rel = "has_rel"

query = "merge  (n:" + label1 + ") where n.data=$param "
query += "merge (n)-[:" + rel + "]->(m:" + label2 + " {data: $param2}) "
query += "return n"

...

tx.run(query, param= myAge , param2= myOtherData)
tx.run(query, param= myAge2 , param2= myOtherData2)

Then as long as the query doesn't change, it won't need to replan?

dana_canzano · ‎04-01-2021

correct. However and just for specifics, parameters can not be used for label names. i.e. one can not run

match (n:$param1) where n.age=42  {$param1: Person}

Not that your example is doing this but.

bears · ‎04-01-2021

Great, thank you! This will help us optimize how we structure a lot of queries.

Neo4j

Efficient query building