How to execute a query fully and discard the results for performance comparison?

Hi 🙂 TL;DR: I'm looking for a way to execute a query so that neo4j does all the work and reports the time needed to produce the first and last row, but discards the results without actually sending them to the client. Is there any way to measure the number of milliseconds until the first and last result row/object is available from neo4j without streaming the dataset to the client? I haven't found anything by searching.

Edit: I will probably be using dotnet core for these tests, so if there are any solutions for that scenario I would be happy to know about them 🙂

Background: I've recently tried out the Community edition to evaluate neo4j for a particular use case of ours involving Active Directory group memberships. The performance I see is simply astounding, and I'm pretty sure we'll end up using it in production. The graph is probably small (~300k objects and ~2.6m relationships), but the variable-length path logic we need turns out to be pretty resource-intensive for a regular RDBMS. The logic needed to make it performant there is also very complicated, but the main problem is the variable cardinality that comes with variable path lengths: a group can have ~350k indirect members and then suddenly become a member of another group itself.

However, when I'm tuning our cypher queries and the model, I like to be able to do quick-and-dirty comparisons of different methods by execution time, and I suspect that the numbers I see in the browser and cypher-shell aren't very helpful in that regard. I'd like to measure the number of milliseconds until the first and last result are available without streaming them to the client. Line bandwidth will be sufficient for the task when I'm done. For now I'm just interested in a quick way to compare how much time and resources neo4j requires for different variations of queries over a substantial set of test parameters in a local test environment.

I have a lot of experience tuning regular RDBMS queries, and have learnt how to evaluate neo4j plans. Still, over the years I've found that for tasks whose execution times vary widely with the input parameters, it can be useful to collect execution times for different variations of a query over a huge number of different input parameters in order to identify problem areas.
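The kind of parameter sweep described above can be sketched roughly as follows. This is a minimal illustration in Python, not a neo4j-specific tool: `sweep`, `summarize` and the two stand-in variants are hypothetical names, and each variant is assumed to be a callable that takes one input parameter and returns an iterable of rows. In a real test the callable would wrap a driver call.

```python
import statistics
import time
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical shape of a query variant: parameter in, iterable of rows out.
QueryFn = Callable[[Any], Iterable[Any]]


def sweep(variants: Dict[str, QueryFn],
          params: Sequence[Any]) -> Dict[str, List[Tuple[Any, float]]]:
    """Time every variant for every parameter, discarding all rows."""
    results: Dict[str, List[Tuple[Any, float]]] = {name: [] for name in variants}
    for p in params:
        for name, fn in variants.items():
            start = time.perf_counter()
            for _ in fn(p):  # consume the rows without keeping them
                pass
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            results[name].append((p, elapsed_ms))
    return results


def summarize(timings: List[Tuple[Any, float]]) -> Dict[str, Any]:
    """Median and worst case, plus the parameter that caused the worst case."""
    worst_param, worst_ms = max(timings, key=lambda pt: pt[1])
    return {
        "median_ms": statistics.median(ms for _, ms in timings),
        "max_ms": worst_ms,
        "worst_param": worst_param,
    }


# Stand-in variants (assumptions for the demo, not real queries):
def variant_flat(p):
    return range(10)  # cost does not depend on the parameter


def variant_degrading(p):
    time.sleep(p * 0.001)  # cost grows with the parameter
    return range(10)


if __name__ == "__main__":
    res = sweep({"flat": variant_flat, "degrading": variant_degrading}, [1, 5, 30])
    for name, timings in res.items():
        print(name, summarize(timings))
```

Keeping the parameter next to each timing is what makes the problem areas visible: the worst case points straight back at the input that triggered it.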

Thanks
-Anders Liane

2 REPLIES

Unfortunately I'm not aware of a toggle or similar option to discard results, but you could alter your query to return count() at the end. count() is cheap and ensures that only a single row with the count is returned, so the time to transfer the result should be minimal.

Thanks for your reply. It seems to change the plan, though, even if I try to make it count everything.

After some testing, it seems I'll get a pretty good idea of real-world performance by simply measuring wall-clock time from C# while not processing the results.
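A rough sketch of that wall-clock approach, written in Python for brevity rather than C#. The names `time_and_discard` and `fake_query` are hypothetical, and the generator only stands in for a driver result cursor. The rows still travel to the client here, so the numbers include transfer time. They are simply never stored or inspected.

```python
import time
from typing import Any, Callable, Iterable, NamedTuple, Optional


class Timing(NamedTuple):
    rows: int
    first_row_ms: Optional[float]  # None when the query returned no rows
    last_row_ms: float             # time until the stream was fully consumed


def time_and_discard(run_query: Callable[[], Iterable[Any]]) -> Timing:
    """Run a query, pull every row, and keep none of them."""
    start = time.perf_counter()
    first_row_ms: Optional[float] = None
    rows = 0
    for _ in run_query():  # rows are fetched but never stored
        if rows == 0:
            first_row_ms = (time.perf_counter() - start) * 1000.0
        rows += 1
    last_row_ms = (time.perf_counter() - start) * 1000.0
    return Timing(rows, first_row_ms, last_row_ms)


def fake_query():
    # Stand-in for a result cursor; the sleeps imitate server-side work.
    time.sleep(0.02)
    for i in range(3):
        yield {"n": i}
        time.sleep(0.01)


if __name__ == "__main__":
    print(time_and_discard(fake_query))
```

The same pattern should carry over to any client whose result object can be iterated lazily: start a stopwatch, note the elapsed time when the first row arrives, and note it again once the iteration ends.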

If I think I've found something useful I'll drop it here 🙂