Neo4j

ap · 06-17-2022

I have two (very similar) questions regarding the WITH wildcard (*). Note that I'm not concerned about variable name clashes, as every variable introduced is guaranteed to have a unique name.

Let's say I have 50 variables in scope, but only a few need to be passed through a WITH clause. Would there be any performance implications of passing them all through using the wildcard (*), or are there any significant advantages to manually specifying the ones that need to be passed through?
Similarly, let's say I have 50 variables in scope, and I need a few of these to be imported into a CALL {} subquery. Would there be any performance implications of importing them all using the wildcard (*)?
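
For comparison, importing explicitly only requires intersecting the outer scope with the names the subquery body references. A minimal Python sketch, assuming the generator already knows which names the body mentions; `call_subquery` and its parameters are hypothetical, not part of any Neo4j API:

```python
def call_subquery(outer_scope: list[str], body_uses: set[str], body: str) -> str:
    """Wrap `body` in CALL { ... } and import only the outer variables it uses.

    Hypothetical helper: `outer_scope` is the ordered list of variables visible
    before the CALL; `body_uses` is every name the body mentions. Names the body
    defines itself are filtered out because they are not in `outer_scope`.
    """
    # Preserve outer-scope order so the generated text is deterministic.
    imports = [name for name in outer_scope if name in body_uses]
    lines = ["CALL {"]
    if imports:
        # Importing WITH: lists exactly the variables the subquery needs.
        lines.append("  WITH " + ", ".join(imports))
    lines.append("  " + body)
    lines.append("}")
    return "\n".join(lines)


print(call_subquery(
    ["var1", "var2", "var3"],
    {"var2", "var4", "var5"},
    "MATCH (var2)-[:KNOWS]->(var4) RETURN count(var4) AS var5",
))
```

Here the importing clause comes out as `WITH var2`; if the body references nothing from the outer scope, the import is omitted and the subquery is uncorrelated.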

The purpose of this is to find out whether I can simplify the query generation tool I'm building by always using the wildcard. Otherwise, I have to keep track of which variables are used where, and which need to be passed between scopes.
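
The tracking itself can be done mechanically with a backward liveness pass over the generated clauses. A minimal Python sketch, assuming each generated fragment records which variables it reads and introduces, and relying on the unique-name guarantee above; `Stage`, `with_projections` and `render` are hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One generated query fragment (hypothetical representation)."""
    text: str
    uses: set[str] = field(default_factory=set)     # variables it reads
    defines: set[str] = field(default_factory=set)  # variables it introduces


def with_projections(stages: list[Stage]) -> list[list[str]]:
    """For each boundary before stages[1:], the variables a WITH must carry.

    Backward liveness pass: a variable is needed at a boundary if some later
    stage reads it and no later stage introduces it (names are unique).
    """
    live: set[str] = set()
    needed: list[list[str]] = []
    for stage in reversed(stages[1:]):
        live = (live | stage.uses) - stage.defines
        needed.append(sorted(live))
    needed.reverse()
    return needed


def render(stages: list[Stage]) -> str:
    parts = [stages[0].text]
    for keep, stage in zip(with_projections(stages), stages[1:]):
        if keep:  # Cypher has no empty WITH; if nothing is needed, omit it
            parts.append("WITH " + ", ".join(keep))
        parts.append(stage.text)
    return "\n".join(parts)


stages = [
    Stage("MATCH (var1:Person {name: $name})", defines={"var1"}),
    Stage("MATCH (var1)-[:KNOWS]->(var2)", uses={"var1"}, defines={"var2"}),
    Stage("MATCH (var2)-[:LIVES_IN]->(var3:City)", uses={"var2"}, defines={"var3"}),
    Stage("RETURN var2.name AS friend, var3.name AS city", uses={"var2", "var3"}),
]
print(render(stages))
```

Each generated WITH then names only what later clauses read (`WITH var1`, `WITH var2`, `WITH var2, var3`), so the bookkeeping lives in one function rather than being done by hand at every scope boundary.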

michael_hunger · 06-17-2022

From a clarity perspective, it's better to call out the ones that are passed through.

I mostly use WITH * when I want to apply an in-between filter or pagination.
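
In a generator, those two uses of the wildcard reduce to small helpers. A minimal Python sketch; the helper names and the parameter names `$offset`, `$pageSize` and `$minAge` are hypothetical. Passing the page bounds as parameters keeps the query text identical for every page:

```python
def with_filter(condition: str) -> str:
    """In-between filter: keep every variable in scope, drop rows that fail."""
    return f"WITH * WHERE {condition}"


def with_page(order_by: str) -> str:
    """In-between pagination with hypothetical $offset / $pageSize parameters,
    so the same text is generated for every page."""
    return f"WITH * ORDER BY {order_by} SKIP $offset LIMIT $pageSize"


query = "\n".join([
    "MATCH (var1:Person)-[:KNOWS]->(var2)",
    with_filter("var2.age > $minAge"),
    with_page("var2.name"),
    "RETURN var1.name AS person, var2.name AS friend",
])
print(query)
```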

Performance-wise, listing only the variables you need can reduce the width of the "register" that Cypher has to carry through, so it can free up memory used by things that are no longer needed.

50 variables also sounds really dangerous, like a generated query. You should be careful with those and make sure you profile them.

Also, if you generate unique names (e.g. UUIDs), then the Cypher planner and parser cannot cache your query plans, as every query will be a new, unique one, and you'll lose a lot of performance from replanning on every request.


glilienfield · 06-17-2022

I don't know the answer, but I highly doubt it would matter for any practical query, where the number of variables would be small. Of course, the wildcard would not work for a WITH clause with aggregate functions.
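
In Cypher the non-aggregated items of a WITH are the grouping keys, so when a clause aggregates, the generator has to decide deliberately which variables to keep. A minimal Python sketch; `with_aggregate` and its alias-to-expression dictionary are hypothetical:

```python
def with_aggregate(group_keys: list[str], aggregates: dict[str, str]) -> str:
    """Build an aggregating WITH with explicit grouping keys.

    The non-aggregated items of a WITH are the grouping keys, so they are
    listed explicitly instead of using `*`.
    `aggregates` maps alias -> aggregate expression (hypothetical format).
    """
    if not aggregates:
        raise ValueError("no aggregate given; use a plain projection instead")
    items = list(group_keys)
    items += [f"{expr} AS {alias}" for alias, expr in aggregates.items()]
    return "WITH " + ", ".join(items)


print(with_aggregate(["var1"], {"var5": "count(var2)"}))
```

This prints `WITH var1, count(var2) AS var5`, i.e. one row per `var1` with its count.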

Not sure what your project is, but maybe the Neo4j Cypher-DSL will help you build your queries more easily than creating them as strings.

https://neo4j-contrib.github.io/cypher-dsl/current/

ap · 06-17-2022

Thanks, your point about the aggregation functions is an important one.

ap · 06-17-2022

Thanks, Michael. Your comments have led me to decide to avoid the wildcard, except for the use case you mention of pagination or filtering. Related to this, it made me wonder whether there's any reason, other than clarity and readability, why Cypher doesn't allow one to use a LIMIT or WHERE clause without a prior WITH (or RETURN/MATCH) clause?

I highly doubt a query will ever get close to 50 variables; I just invented a large number to demonstrate my question. Regarding the unique names, they will be created sequentially (e.g. var1, var2, ...), so there shouldn't be an issue with replanning.
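
One way to check this in the generator is to build the same query shape twice and compare the text. A minimal Python sketch, assuming the counter restarts for each query (a counter shared across queries would again make every text unique); the namer classes and `friends_query` are hypothetical:

```python
import itertools
import uuid


class SequentialNamer:
    """Hands out var1, var2, ...; create one per query so numbering restarts."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def new(self) -> str:
        return f"var{next(self._counter)}"


class UuidNamer:
    """Globally unique names: every generated query text is different."""

    def new(self) -> str:
        return "var_" + uuid.uuid4().hex


def friends_query(namer) -> str:
    # Hypothetical generator step: same query shape, names taken from `namer`.
    person, friend = namer.new(), namer.new()
    return (f"MATCH ({person}:Person {{name: $name}})-[:KNOWS]->({friend}) "
            f"RETURN {friend}.name AS name")


# Same shape -> identical text with sequential names, different text with UUIDs.
print(friends_query(SequentialNamer()) == friends_query(SequentialNamer()))  # True
print(friends_query(UuidNamer()) == friends_query(UuidNamer()))              # False
```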

Are there any performance implications of using the wildcard (*) in a WITH statement?