12-18-2018 12:07 PM
Hey all,
I was testing the speed of neo4j-driver (1.6.2 and 1.7.1) and py2neo (4.1.3) and I found that the simple HTTP requests that I was doing are 2-5 times faster for medium sized queries and up. I'll take you through what I did, so we can hopefully figure out what's going on here and when it makes sense to use the libraries.
Now that I hopefully have your attention, let's take a step back so I can give you some background. When I started working with neo4j, I learned by sending JSON HTTP requests to the API at the hostname:7474/db/data/transaction/commit endpoint. I like knowing the guts of what I'm dealing with, so processing the raw JSON responses works well for me, and over time I've added my own thin wrapper around the Python requests library for some quality-of-life improvements.
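A minimal sketch of such a wrapper might look like the following (the function names, URL, and auth values are illustrative, not the poster's actual code; the payload shape is the one the transactional endpoint expects):

```python
import requests

COMMIT_URL = "http://localhost:7474/db/data/transaction/commit"  # assumed host/port

def build_payload(queries):
    """Turn (cypher, parameters) pairs into the transactional API payload."""
    return {"statements": [{"statement": q, "parameters": p or {}}
                           for q, p in queries]}

def run(queries, auth=("neo4j", "password")):
    """Execute all queries in a single auto-committed transaction."""
    resp = requests.post(COMMIT_URL, json=build_payload(queries), auth=auth)
    resp.raise_for_status()
    body = resp.json()
    if body["errors"]:
        raise RuntimeError(body["errors"])
    return body["results"]
```

Because all statements go into one POST, this also covers the "many queries in one transaction" case discussed below.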
I saw that some colleagues are using py2neo, so I wondered if using a driver library would make sense for me. For one, they provide interactions through bolt, which sounds like it would be more efficient than sending raw JSON. Also, the results are a bit nicer to work with (I got a bit bored of writing for datum in response['results'][0]['data']), but the downside is that it seems a bit awkward to execute many queries in a single transaction (e.g. one type of query but with varying parameters).
So I set up a small test bench. In python (3.7) I created a script that iterates through the different drivers and test scenarios. It executes the queries 10 times and averages the result (while also showing the time for each individual execution of a test scenario). The graph for these tests is a copy of our 'production' database with order of magnitude millions of nodes and version 3.0.6 of neo4j (outdated yes, but it's realistic for my usecase). As my driver libraries I tested py2neo 4.1.3, neo4j-driver 1.6.2 (which comes with py2neo) and neo4j-driver 1.7.1.
I created 3 scenarios:
1. Retrieve the total node count.
2. Retrieve all nodes with a specific label, with varying response limits.
3. Execute many queries (one type of query, varying parameters) in a single transaction.
Note: I figured that the difference between reading and writing would be mostly down to neo4j, and not down to the library that I used to execute the commands, so I didn't go through the trouble of generating data to push.
Note 2: I chose the upper limits for scenarios 2 and 3 based on how long I had to wait for the 10 repetitions. I could've gone orders of magnitude larger on each scenario, but I didn't feel like waiting minutes.
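The timing harness itself can be sketched roughly like this (a simplified stand-in for the actual script, which isn't shown in this post):

```python
import time

def benchmark(fn, repetitions=10):
    """Run fn repeatedly; return the average and the individual times in ms."""
    times = []
    for _ in range(repetitions):
        start = time.perf_counter()
        fn()  # one execution of a test scenario against one driver
        times.append((time.perf_counter() - start) * 1000.0)
    return sum(times) / len(times), times
```

Each driver/scenario combination would be passed in as `fn`, giving the per-repetition numbers and the averages reported below.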
Let me start by saying that I know the order in which drivers are tested probably matters because of caching; I indeed see response times dropping after the first query. This only amounts to a few ms though, which isn't so important once we look at the queries that return more than 10 results.
Average request time in milliseconds after 10 repetitions.
HTTP requests wrapper: 2.2ms
py2neo (bolt): 0.5ms
neo4j-driver (1.6.2): 1.0ms
neo4j-driver (1.7.1): 0.7ms
As you can see, it's close together, but there's no question that py2neo and neo4j-driver are always faster than going through the HTTP requests wrapper. If we needed many different single queries it would add up to a big difference, but for a few queries the difference isn't noticeable to a human.
Average request time in milliseconds, after 10 repetitions. Response times given in order for response limits of 10, 100, 1k and 10k nodes.
HTTP requests wrapper: 3.0, 4.8, 14.3, 95.1
py2neo (bolt): 1.6, 5.6, 42.9, 435.5
neo4j-driver (1.6.2): 2.3, 5.2, 29.0, 283.2
neo4j-driver (1.7.1): 2.5, 4.5, 30.8, 296.3
This came as a huge surprise to me: the simple HTTP requests are not only a lot faster than both other libraries, but also faster than py2neo running over HTTP (results not shown, as it was a bit slower than py2neo over bolt). The difference between the simple requests and neo4j-driver is a factor of 2.5-3x, and py2neo is 4.5x slower. The driver libraries seem to scale linearly above 1k nodes, whereas the requests scale better than linearly.
Average request time in milliseconds, after 10 repetitions. Response times given in order for 10, 100 and 1k queries in 1 transaction.
HTTP requests wrapper: 2.8, 7.5, 44.0
py2neo (bolt): 2.8, 26.6, 243.0
neo4j-driver (1.6.2): 2.9, 14.0, 147.3
neo4j-driver (1.7.1): 3.3, 17.6, 170.8
Again, quite a big difference between the simple HTTP requests and the libraries. The neo4j-drivers are around 3.5x slower than simple requests, and py2neo is a whopping 5.5x slower. Again the driver libraries seem to scale linearly with the size of the test.
My first instinct was that the driver libraries present the results in a more usable way (list of dicts) than the raw JSON response through the HTTP API. So I built some additional logic to see how costly this transformation is. The result is that it takes 5-10% longer to return the nice list of dictionaries.
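That transformation is small: given the response shape of the transactional endpoint, each result's rows just need to be zipped with its column names. A sketch of the kind of logic involved (my own reconstruction, not the poster's code):

```python
def to_dicts(result):
    """Map one result's rows onto its column names.

    `result` is one entry of response['results'] from the transactional
    HTTP endpoint: {"columns": [...], "data": [{"row": [...]}, ...]}.
    """
    cols = result["columns"]
    return [dict(zip(cols, datum["row"])) for datum in result["data"]]
```

The 5-10% overhead mentioned above would come from building these dicts for every row.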
So, what is going on here? I imagine that the version of neo4j might play a role. Also the driver libraries might do some additional fancy processing, which can be nice but also costly if you just want to retrieve a lot of nodes and their properties.
12-19-2018 12:25 AM
Thanks a lot for the feedback.
I think the difference is mostly in the network & stream-processing code. For requests this goes down to C code, while the bolt drivers do it in Python.
It would be good if you could re-run it in 3.5.x to see if the server makes a difference.
Also, as far as I can remember, the python driver has an option to use the "compile python to C" thingy; not sure if that's activated by default.
In the future we might switch to the c-connector (seabolt) as an optional underlying connector for other languages too (like python) which should alleviate this issue.
12-19-2018 03:27 AM
Hey hey,
I copied the data that I was testing with to the current docker version of neo4j:3.5 and the results are very similar. This is only tested with neo4j-driver 1.6.2, because that's the version py2neo 4.1.3 requires, which lets me test both drivers in the same virtual env.
For completeness I now also include the results of each repetition, to get a better feel of the variation between them (and see the effect of the 1st query).
For all scenarios I added a variant of HTTP requests where the result is transformed from raw JSON into a list of dicts (for single queries per transaction) or a list of lists of dicts (for multiple queries per transaction). You'll see that it's faster for simple queries because it benefits from caching left by the raw-response test run just before it, but for bigger queries it takes a small hit.
For scenario 2 I added results for a limit of 100k nodes in the response.
For scenario 3 I added results for 'naively' performing the queries each in their own transaction with the py2neo library and with HTTP requests. py2neo actually doesn't perform all that badly, unlike HTTP requests, which are a total disaster with the naive approach. I guess that's the price of setting up a new HTTP connection over and over.
Scenario 1, retrieve total node count
HTTP requests average 3.69ms (14.26, 2.69, 3.12, 2.04, 2.02, 2.07, 2.23, 2.29, 3.20, 2.97)
HTTP requests dict average 2.44ms (2.77, 1.91, 1.97, 2.19, 2.45, 2.80, 2.56, 2.46, 2.58, 2.73)
py2neo-bolt average 0.61ms (1.57, 0.56, 0.50, 0.51, 0.50, 0.50, 0.52, 0.51, 0.46, 0.46)
py2neo-http average 1.52ms (2.50, 1.86, 1.76, 1.84, 1.71, 1.13, 1.05, 1.17, 1.14, 1.08)
neo4j-driver average 0.85ms (1.27, 0.60, 0.61, 0.62, 0.58, 0.54, 2.24, 0.57, 0.97, 0.50)
Scenario 2, retrieve all nodes with a specific label, limit 10
HTTP requests average 2.94ms (3.82, 3.61, 3.54, 2.87, 2.09, 2.13, 2.43, 2.59, 3.18, 3.15)
HTTP requests dict average 2.83ms (3.71, 2.31, 2.37, 2.19, 1.89, 2.05, 3.11, 4.12, 3.51, 3.03)
py2neo-bolt average 0.95ms (1.96, 0.80, 0.93, 0.79, 0.82, 0.97, 0.97, 0.71, 0.83, 0.67)
py2neo-http average 1.90ms (2.04, 2.51, 2.10, 2.10, 2.18, 1.69, 1.47, 1.22, 1.57, 2.07)
neo4j-driver average 1.19ms (2.39, 1.12, 1.16, 0.98, 1.01, 0.97, 1.07, 1.03, 1.16, 0.96)
Scenario 2, retrieve all nodes with a specific label, limit 100
HTTP requests average 3.18ms (4.45, 3.62, 3.67, 3.43, 3.11, 2.80, 2.70, 2.59, 2.56, 2.88)
HTTP requests dict average 3.61ms (4.35, 3.88, 3.95, 3.45, 2.82, 2.95, 2.81, 3.55, 4.19, 4.18)
py2neo-bolt average 5.29ms (11.01, 6.69, 4.00, 4.38, 5.53, 4.66, 4.09, 4.04, 3.66, 4.86)
py2neo-http average 5.78ms (9.05, 5.64, 5.12, 5.15, 5.17, 5.26, 4.84, 4.86, 5.43, 7.26)
neo4j-driver average 3.40ms (4.41, 3.77, 3.74, 3.76, 3.42, 3.26, 2.84, 2.88, 2.97, 2.94)
Scenario 2, retrieve all nodes with a specific label, limit 1k
HTTP requests average 11.68ms (21.70, 8.75, 8.52, 8.86, 9.07, 9.47, 9.73, 10.22, 10.06, 20.44)
HTTP requests dict average 10.77ms (13.03, 9.85, 9.51, 9.35, 9.15, 9.21, 8.86, 9.25, 20.41, 9.07)
py2neo-bolt average 36.86ms (52.81, 34.70, 35.79, 30.40, 30.35, 30.24, 30.22, 58.32, 36.02, 29.78)
py2neo-http average 43.15ms (44.46, 38.60, 38.10, 38.32, 76.22, 42.08, 37.55, 38.31, 40.00, 37.83)
neo4j-driver average 27.01ms (41.80, 25.98, 26.46, 25.23, 25.65, 25.36, 26.14, 25.27, 24.18, 24.00)
Scenario 2, retrieve all nodes with a specific label, limit 10k
HTTP requests average 97.91ms (118.27, 101.47, 99.13, 93.75, 94.61, 92.37, 105.23, 92.71, 91.13, 90.40)
HTTP requests dict average 101.82ms (98.47, 93.92, 114.48, 100.92, 98.12, 96.43, 94.82, 114.12, 101.78, 105.17)
py2neo-bolt average 405.88ms (537.93, 316.30, 320.56, 276.29, 537.42, 468.17, 279.08, 546.40, 495.03, 281.60)
py2neo-http average 711.81ms (755.73, 710.34, 726.62, 732.32, 706.83, 763.08, 640.70, 670.36, 681.37, 730.70)
neo4j-driver average 265.41ms (292.06, 274.46, 259.34, 261.72, 256.50, 258.42, 257.39, 272.20, 261.35, 260.72)
Scenario 2, retrieve all nodes with a specific label, limit 100k
HTTP requests average 1261.74ms (1349.77, 1276.86, 1302.08, 1256.24, 1188.57, 1224.66, 1235.40, 1270.25, 1242.93, 1270.70)
HTTP requests dict average 1405.74ms (1407.45, 1426.14, 1354.19, 1435.64, 1372.40, 1462.19, 1383.32, 1416.45, 1398.17, 1401.40)
py2neo-bolt average 5400.10ms (6125.14, 4507.51, 5423.34, 5682.48, 6076.66, 5330.21, 4629.22, 5446.57, 4215.68, 6564.16)
py2neo-http average 8652.15ms (8579.51, 8528.91, 8868.20, 8682.91, 8513.26, 8761.23, 8653.73, 8673.72, 8606.97, 8653.10)
neo4j-driver average 3084.31ms (3216.46, 3189.50, 3053.85, 3034.93, 3069.99, 3010.55, 3003.43, 3060.32, 3119.85, 3084.24)
Scenario 3, 10 queries in 1 transaction
HTTP requests average 2.47ms (3.48, 3.25, 3.36, 2.29, 1.98, 2.21, 2.05, 2.09, 1.98, 2.01)
HTTP requests dict average 3.43ms (4.34, 3.54, 3.16, 2.63, 2.96, 3.63, 3.42, 3.46, 3.61, 3.55)
HTTP requests naive average 19.6ms (27.48, 21.54, 18.54, 16.77, 16.18, 20.57, 17.61, 18.90, 18.00, 19.94)
py2neo-bolt average 4.16ms (6.10, 5.25, 3.51, 4.14, 3.47, 4.00, 3.68, 4.16, 3.48, 3.85)
py2neo-bolt-naive average 3.78ms (4.34, 3.87, 3.90, 3.31, 3.54, 3.45, 4.43, 3.86, 3.61, 3.50)
py2neo-http average 4.31ms (6.59, 4.39, 4.37, 5.29, 3.89, 3.60, 3.48, 3.92, 3.79, 3.81)
neo4j-driver average 3.09ms (3.98, 3.93, 3.89, 3.93, 3.45, 2.53, 2.29, 2.43, 2.34, 2.10)
Scenario 3, 100 queries in 1 transaction
HTTP requests average 7.74ms (9.36, 7.85, 7.77, 6.69, 6.87, 6.70, 7.16, 7.98, 9.86, 7.18)
HTTP requests dict average 8.46ms (10.77, 10.32, 8.02, 6.97, 6.89, 6.45, 7.37, 10.21, 10.53, 7.08)
HTTP requests naive average 196.5ms (208.14, 201.61, 213.52, 197.20, 183.17, 188.99, 178.00, 193.86, 200.97, 199.24)
py2neo-bolt average 23.41ms (35.94, 24.34, 20.70, 23.06, 21.63, 22.00, 20.55, 19.49, 23.58, 22.81)
py2neo-bolt-naive average 38.09ms (41.16, 39.47, 40.22, 38.37, 33.41, 40.30, 40.21, 36.62, 36.01, 35.10)
py2neo-http average 19.41ms (25.11, 17.31, 17.69, 19.93, 19.69, 17.91, 20.26, 17.98, 20.10, 18.12)
neo4j-driver average 12.96ms (15.12, 12.38, 12.29, 12.74, 13.25, 12.83, 12.33, 12.76, 12.35, 13.55)
Scenario 3, 1k queries in 1 transaction
HTTP requests average 65.89ms (266.10, 52.71, 39.58, 38.81, 53.00, 39.07, 39.89, 53.34, 38.16, 38.20)
HTTP requests dict average 47.20ms (42.49, 50.50, 40.36, 39.03, 53.77, 62.25, 47.82, 43.14, 52.22, 40.40)
HTTP requests naive average 1910.4ms (1934.91, 1887.20, 1867.37, 1874.45, 1945.37, 1945.95, 1912.87, 1899.61, 1894.48, 1941.94)
py2neo-bolt average 257.31ms (301.57, 218.61, 288.41, 250.30, 243.90, 277.40, 259.34, 239.23, 274.16, 220.14)
py2neo-bolt-naive average 389.61ms (427.79, 365.23, 365.64, 432.78, 350.89, 404.32, 371.76, 376.86, 431.84, 368.98)
py2neo-http average 204.95ms (222.85, 219.66, 198.05, 154.01, 222.29, 219.18, 221.86, 208.73, 170.42, 212.47)
neo4j-driver average 140.06ms (144.31, 121.12, 149.71, 149.34, 122.40, 161.14, 121.69, 157.97, 122.00, 150.92)
12-19-2018 03:31 AM
Also, I just noticed that if you're married to py2neo, it's better to use the http scheme when firing multiple queries in a single transaction, whereas the bolt scheme is preferable for retrieving many nodes in one query.
12-20-2018 02:32 AM
I have now also tested the neo4jrestclient library (v2.1.1), and it performs similarly to the HTTP requests, except for larger queries, where it's almost exactly a factor of 2 slower.
I've also tested what happens when I add the 'X-Stream: true' header in the HTTP request (as suggested in the developer documentation), but for the scenarios above this doesn't make a difference.
12-20-2018 05:36 AM
X-Stream: true was only needed for the old (now removed) REST API.
The transactional endpoint streams automatically (in and out) (but you should stream-process the results to benefit from that).
neo4jrestclient is no longer maintained though.
And you can use the tx endpoint with multiple requests per transaction. You start a transaction by POSTing against db/transaction; you get back a transaction id (url) that you continue to POST against (db/transaction/<id>) until you finish with /db/transaction/<id>/commit or /db/transaction/<id>/rollback.
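That flow can be sketched with the `requests` library roughly as follows (the 3.x path `/db/data/transaction`, host, and port are assumptions; a `requests.Session` keeps the connection alive between POSTs):

```python
import requests

BASE = "http://localhost:7474/db/data/transaction"  # assumed 3.x path and port

def tx_url_from_commit(commit_url):
    """The 'begin' response carries a commit URL; strip '/commit' for the tx URL."""
    return commit_url.rsplit("/commit", 1)[0]

def begin_tx(session):
    """Open a transaction with an empty statement list; return its URL."""
    resp = session.post(BASE, json={"statements": []})
    resp.raise_for_status()
    return tx_url_from_commit(resp.json()["commit"])

def run_in_tx(session, tx_url, cypher, parameters=None):
    """Run one statement inside the open transaction (keeps it alive)."""
    payload = {"statements": [{"statement": cypher, "parameters": parameters or {}}]}
    resp = session.post(tx_url, json=payload)
    resp.raise_for_status()
    return resp.json()

def commit_tx(session, tx_url):
    resp = session.post(tx_url + "/commit", json={"statements": []})
    resp.raise_for_status()
    return resp.json()
```

This is the "one request per query, one transaction overall" variant that gets benchmarked further down the thread.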
12-20-2018 03:00 PM
Hi @jjbankert
Thank you massively for the time and effort you've put into this. There's a lot of interesting data to sift through there and it's extremely valuable for us, and no doubt for other people too.
Firstly, it's worth pointing out that the original Bolt project was never about trying to outperform HTTP. The main focus, at least during the 3.x series, has been to embed a clean, Cypher-like type system throughout the stack (which JSON isn't ideal for) as well as promote usage of Neo4j in a number of significant non-Java ecosystems by providing officially supported drivers. Indeed HTTP is an extremely mature technology with a lot of high-performing implementations; we would have had to go a long way to beat it!
That said, we shouldn't settle for performance differences of the scale that you identify (and that we have seen internally as well). We definitely need to make sure we nail performance properly during the 4.x series of Neo4j.
It's worth noting a couple of things. There are a lot of variables at play that can affect performance; to that end, it's not always clear where to optimise. There are huge differences by client language, by size and shape of result and by the data types used (integers and nodes are significantly different on the wire, for example). There are also differences in how transactions are used and when network sync occurs. And then of course there's routing, which is available for clusters with Enterprise Edition. This brings extra processing complexity, and doesn't have a direct equivalent in any of the available HTTP drivers.
You may have seen our new Seabolt project (https://github.com/neo4j-drivers/seabolt). This is a low-level C Connector library that we've introduced as a high-performance component on which drivers can be built. This already powers our Go driver and we plan to underpin the Python driver with it as well in an upcoming release.
We're also working on plans for future versions of the Bolt protocol itself. We can definitely take on board the extensive information you've provided here as part of those designs.
All that said, it's still probably not unreasonable to assert that there are a multitude of reasons to choose one driver over another. Raw performance can certainly be one of those reasons, but isn't always the bottleneck. Type safety, feature set, usability, maturity, availability of support, documentation, existing skill sets, network policies and licensing are just some of the possible reasons to pick one over the other. Hopefully there's an option for everyone, and over time we can improve the overall experience across the board!
Thanks again
Nigel
12-21-2018 07:09 AM
I'll check out seabolt!
I agree that there are a multitude of reasons to choose one driver over another. My use-case is to perform big updates on many nodes at once. Batch queries that return 10k-100k nodes and batch updates of 10k-100k statements are pretty standard for me. I was hoping that there would be a faster/better maintained option than my solution with HTTP requests, but my current conclusion is that raw post requests in a single transaction outperform all the drivers that I tested. I'm thinking about open sourcing my driver, but it's only 50 lines of code, so shouldn't be too hard to build for yourself either.
As suggested, I also tested performing scenario 3 (executing 10^x queries) with multiple statements, each in its own request, but all together in a single transaction. This is different from the 'HTTP requests naive' result in that I now maintain a single transaction instead of opening a separate transaction per query. This performs even worse though, because for some reason some runs slow down a lot (even when I repeated the test 10 times more).
Data:
HTTP requests took 42.5ms (50.12, 35.68, 31.74, 30.64, 32.42, 33.02, 33.19, 68.16, 45.67, 64.58)
HTTP requests naive took 2239.2ms (2305.16, 2330.41, 2262.66, 2224.94, 2233.65, 2204.41, 2224.47, 2223.69, 2176.05, 2206.72)
HTTP requests transaction took 4734.4ms (2293.53, 3181.96, 9448.37, 9627.36, 9914.09, 3024.93, 2967.48, 2230.68, 2457.48, 2197.90)
With regard to the X-Stream header, that explains why I didn't see any difference haha. Maybe the documentation on the HTTP API should be updated then (since that's where I found the suggestion)?
01-08-2019 05:35 AM
Is there somewhere I can see your test code?
01-14-2019 11:33 AM
I performed some similar testing for API endpoints using a py2neo Bolt + Lambda/Chalice app in AWS versus the barebones HTTP request, and found the following results. The request being tested pulls entire subgraphs up to 4 hops away from a specific node id in the database. The ids are indexed and most subgraphs are below 1000 nodes. I believe the last rows are the most telling about the performance difference between these two APIs.
| Metric | Lambda/Chalice | Native Http | Best Method |
|---|---|---|---|
| Average time across all observations | 0.62666 | 0.30027 | Native Http |
| Average time of the sample means (s) | 0.62666 | 0.30027 | Native Http |
| Number of users not in graph or with runtime error | 6871 | 6945 | |
| The average cliquesize of the observations | 60.7148 | 144.287 | Native Http |
| The average cliquesize of the sample means | 60.7148 | 144.287 | Native Http |
| The fastest sample mean time | 0.43533 | 0.21119 | Native Http |
| The fastest time | 0.40276 | 0.19285 | Native Http |
| The largest cliquesize | 40912 | 141456 | |
| The number of users with cliquesize 1 | 38768 | 38949 | |
| The percent of users with cliquesize <= 5 | 0.92956 | 0.92728 | |
| The percent of users with cliquesize <= 25 | 0.97772 | 0.97612 | |
| The percent of users with cliquesize <= 50 | 0.98422 | 0.9831 | |
| The percent of users with cliquesize <= 75 | 0.9872 | 0.98562 | |
| The percent of users with cliquesize <= 150 | 0.98956 | 0.98786 | |
| The percent of users with cliquesize <= 300 | 0.99072 | 0.9894 | |
| The percent of users with cliquesize > 3000 | 0.00374 | 0.00518 | |
| The slowest sample mean time | 1.74987 | 31.6964 | Lambda/Chalice |
| The slowest time | 35.637 | 76.9479 | Lambda/Chalice |
| The smallest cliquesize | 0 | 0 | |
| The std.dev cliquesize of the observations | 1066.65 | 2619.64 | Native Http |
| The std.dev cliquesize of the sample means | 150.148 | 354.536 | Native Http |
| The std.dev time across all observations | 1.36392 | 1.59279 | Lambda/Chalice |
| The std.dev time of the sample means | 0.19972 | 0.99563 | Lambda/Chalice |
The native HTTP REST API is significantly faster than Lambda/Chalice for accessing the identity graph. Despite a more verbose HTTP call compared to a cleaner-designed API such as the Lambda/Chalice app, if latency is the main concern then the native HTTP REST API will be faster. The experiments also indicate that the size of the cliques returned has a direct effect on response time, so further reducing hyper nodes and limiting graph expansion is important to lower latency.
01-16-2019 08:12 AM
Interesting stuff @benjamin.squire!
I'm in the final stages of open-sourcing my driver (got the go-ahead from management), so expect that soon. First time making 'production' python code, so lots to figure out, and of course there's the problem of coming up with a good name.
@technige somehow I missed your question. I've put the most important code into a single file on pastebin.
The code is written for python 3.6+ (due to f-strings). Also it doesn't contain comments or documentation, but hopefully it has meaningful enough names. This is not 'production' code, but I think it could help. Please also give me feedback on code quality if you notice anything (through pm if it's not useful to a broad audience), as I'm interested in improving!
Test 'framework' (under MIT licence): https://pastebin.com/wP5u4Yk0
01-22-2019 09:43 AM
As promised, our project is now open-source and available: Announcing neo4j-connector 1.0.0 (python 3.5+)