03-17-2019 05:00 PM
For example, we want to measure the response time or latency of queries, or we want to see how many queries a given hardware configuration can handle.
To help out with that, I wrote a JavaScript module called graph-workload. I use this for some internal benchmark testing, to help verify that Neo4j’s cloud distributions are working properly, and for other debugging tasks.
This article describes how to use it, how to generate test data with it, and how to measure overall throughput. Questions? Comments? Come discuss this on the Neo4j Community Thread about graph-workload.
Generate light or heavy workloads with graph-workload!
Important tip: This tool writes test data to your database in order to simulate load. Do not run it against a production system, and please keep in mind that it will modify your database!
We’ll use the JavaScript method. Head on over to the GitHub repo, clone it, and then install the dependencies.
git clone https://github.com/moxious/graph-workload.git
cd graph-workload
npm install
To see usage info, run the following command. Below, we’ll explain what these options mean and how to use them.
node src/run-workload.js --help
With this tool, a workload is the combination of two things:
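Query or query table: A single Cypher query can work on its own. Alternatively, a query table tells the app which strategies you want to run, and how frequently you want each of them run, by probability.
Run configuration: This tells the workload generator how much to do. It includes several settings, but the two used in the examples in this article are the concurrency (--concurrency, how many queries run at the same time) and the run length (--ms, in milliseconds).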
If you start the program with defaults, it will choose a mixed workload that contains both reads and writes, and creates a fairly chaotic load pattern on your database at high levels of concurrency.
node src/run-workload.js -a my-neo4j-host.com -u neo4j -p secret \
--ms 5000
This runs queries against my-neo4j-host.com for 5000 ms (which is 5 seconds).
The output looks like this (with some benchmarking output omitted for space):
{ address: 'bolt://my-neo4j-host.com',
username: 'neo4j',
concurrency: 10,
ms: 5000,
checkpointFreq: 5000 }
Connecting to bolt://my-neo4j-host.com
Creating session pool with { min: 1, max: 10 }
Progress: 0.00% 0 completed; 0 running 0 error
Starting main promise pool
Starting timer at 2019-03-14T19:44:12Z to expire at 2019-03-14T19:44:17Z after 5000 ms
Progress: 100.00% 1493 completed; 10 running 0 error
Timeout
Shutting down
{ complete: 1505, running: 0, errors: 0 }
On this run, we ended up running 1,505 queries in 5 seconds against this host.
If we want to run a single query over and over (for example to simulate new transactional records coming in), we can do that like this:
node src/run-workload.js -a localhost \
-u neo4j \
-p admin \
--concurrency 55 \
--ms 30000 \
--query 'CREATE (order:ProductOrder { date: datetime(), someData: rand() });'
This is going to be a lot harder on our database. We’ll run 55 concurrent queries for 30 seconds. Each query will create new nodes. On our sample test database, we were able to run this query 62,939 times in 30 seconds, or about 2,100 nodes per second without really trying to optimize for speed.
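The throughput figures quoted so far are just completed queries divided by run time. As a small sanity check, here is a sketch of that arithmetic; the helper name queriesPerSecond is made up for this illustration and is not part of graph-workload.

```javascript
// Hypothetical helper (not part of graph-workload): turn a completed-query
// count and a run duration in milliseconds into queries per second.
function queriesPerSecond(completed, durationMs) {
  if (durationMs <= 0) {
    throw new RangeError('durationMs must be positive');
  }
  return completed / (durationMs / 1000);
}

// Figures quoted in this article.
const mixedRun = queriesPerSecond(1505, 5000);    // default mixed workload, 5 s
const createRun = queriesPerSecond(62939, 30000); // single CREATE query, 30 s

console.log(mixedRun.toFixed(0));  // 301
console.log(createRun.toFixed(0)); // 2098
```

The second figure is where the "about 2,100 nodes per second" comes from, since each query in that run creates one node.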
By using Halin to monitor the Neo4j instance while the workload is running, we can see the traffic spike while those queries were active:
(If you’d like to learn more about Halin, you can read this article)
Workloads can be specified as a simple JSON file that looks like this:
[
  [ 0.5, "randomLinkage" ],
  [ 1.0, "starWrite" ]
]
Graph-workload has a bunch of built-in strategies. The two strategy names here are given as examples; the full list is described below. This strategy table tells the app to run the first strategy 50% of the time, and the second strategy 50% of the time. The number in each row appears to act as a cumulative threshold rather than an individual weight, which would explain why the second row is 1.0 and not 0.5. Basically, the app rolls a random number from 0 to 1.0 for each query it wants to run, looks through the strategy table, and picks according to this distribution.
In your workload, you may specify any of these strategies, mixing and matching them to design any kind of workload you like.
The intent is to be able to use some combination of these strategies to simulate the kind of work that your database needs to do. If none of these fit, you can always either run your own custom queries as above, or implement a small JavaScript class that creates a new strategy. Strategies may have custom “setup actions” (for example, creating an index on an ID property prior to merging based on the ID property).
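To make the "roll a random number and look through the table" idea concrete, here is a minimal sketch of that kind of selection. It assumes each row's number is a cumulative threshold (so 0.5 and 1.0 give a 50/50 split); pickStrategy is a name invented for this example, and the real module may well do this differently.

```javascript
// Strategy table in the shape shown above: [cumulativeThreshold, strategyName].
const table = [
  [0.5, 'randomLinkage'],
  [1.0, 'starWrite'],
];

// Hypothetical helper: return the first strategy whose threshold is above the roll.
// `roll` defaults to Math.random(), which yields a number in [0, 1).
function pickStrategy(strategyTable, roll = Math.random()) {
  for (const [threshold, name] of strategyTable) {
    if (roll < threshold) {
      return name;
    }
  }
  // Fall back to the last row if the thresholds do not reach 1.0.
  return strategyTable[strategyTable.length - 1][1];
}

// Tally 10,000 picks; each strategy should come up roughly half the time.
const counts = {};
for (let i = 0; i < 10000; i++) {
  const name = pickStrategy(table);
  counts[name] = (counts[name] || 0) + 1;
}
console.log(counts);
```

With cumulative thresholds, adding a third strategy means choosing cut points such as 0.25, 0.75 and 1.0, rather than weights that must sum to one.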
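As a rough illustration of the setup-action idea, here is what a strategy class of that kind might look like. Everything in it is hypothetical: the class name, the setup and run method names, the TestRecord label, and the stand-in session object are assumptions made for this sketch and not graph-workload's actual interface, so check the repo for the real base class before writing your own. The index statement uses the syntax of newer Neo4j releases; older versions use a different form.

```javascript
// Hypothetical strategy shape; graph-workload's real strategy class may differ.
class MergeByIdStrategy {
  constructor(label = 'TestRecord', maxId = 1000) {
    this.label = label;
    this.maxId = maxId;
  }

  // Setup action: index the id property once, before any MERGE runs.
  // Index syntax varies by Neo4j version; this is the newer form.
  setup(session) {
    return session.run(
      `CREATE INDEX IF NOT EXISTS FOR (n:${this.label}) ON (n.id)`
    );
  }

  // One unit of work: merge a node by a random id in [0, maxId).
  run(session) {
    const id = Math.floor(Math.random() * this.maxId);
    return session.run(
      `MERGE (n:${this.label} { id: $id }) SET n.updated = timestamp()`,
      { id }
    );
  }
}

// Stand-in for a driver session, so the sketch runs without a database.
// It only records the queries it is asked to run.
const fakeSession = {
  queries: [],
  run(cypher, params = {}) {
    this.queries.push({ cypher, params });
    return Promise.resolve({ records: [] });
  },
};

const strategy = new MergeByIdStrategy();
strategy
  .setup(fakeSession)
  .then(() => strategy.run(fakeSession))
  .then(() => console.log(fakeSession.queries.map((q) => q.cypher)));
```

Keeping the index creation in a separate setup step means it runs once, rather than being repeated by every concurrent query.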
Happy graph hacking!
Generating Test Workloads on Neo4j was originally published in neo4j on Medium, where people are continuing the conversation by highlighting and responding to this story.