Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
07-17-2019 05:00 PM
Over the last few weeks we’ve released new functionality for the Neo4j Graph Algorithms Library, in versions 3.5.6.1 and 3.5.7.0.
These releases see the introduction of a procedure to compute the memory requirements of the PageRank, Louvain, and Union Find algorithms, as well as the in memory graph. We’ve also added support for different concurrency values for read, write, and compute tasks, and there are a bunch of bug fixes as well.
You can install the latest release directly from the Neo4j Desktop in the ‘Plugins’ section of your project. Jennifer Reif also has a detailed post explaining how to install plugins if you haven’t used any yet.
If you’re installing the library on a server installation of Neo4j you can download the JAR from the Neo4j Download Center.
The graph algorithms library operates completely on the heap, which means we’ll need to configure our Neo4j Server with a much larger heap size than we would for transactional workloads.
When we execute one of the algorithm procedures, an in memory graph representation of the graph is created before the algorithm is executed against that graph. The diagram below shows the steps involved:
The in memory projected graph contains the following pieces of data:
The library supports two types of in memory graph:
The Memory Requirements Procedure returns an estimate of the amount of memory required to run graph algorithms.
This procedure has the following signature:
CALL algo.memrec(label, relationship, algorithm, config)
If we want to find out the amount of memory required to build an in memory graph containing all the nodes and all the relationships using the heavy graph, we could compute this by running the following query:
CALL algo.memrec(null, null, "graph.load", {graph: "heavy"})
If we wanted to compute the memory requirements for the huge graph we can do that as well:
CALL algo.memrec(null, null, "graph.load", {graph: "huge"})
Below we can see the amount of memory needed for Dbpedia and Twitter 2010, two well known benchmark datasets:
Twitter 2010 Memory Requirements
We can also compute the memory requirements for three individual algorithms: PageRank, Louvain, and Union Find.
The following query computes the amount of memory needed by the PageRank algorithm:
CALL algo.memrec(null, null, "pageRank", {
graph: "heavy", concurrency: 8})
YIELD mapView
UNWIND mapView.components AS component
RETURN component.name AS component,
component.memoryUsage AS memoryUsage
Below we see the memory requirements of our benchmark datasets:
Dbpedia PageRank Memory Requirements
As mentioned at the beginning of this section, the library stores all its data structures on the heap, so we need to configure our memory settings accordingly.
This can be done via the following configuration parameters in the Neo4j Settings (neo4j.conf) file:
dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=1G
So if we wanted to run PageRank on the Twitter 2010 dataset using the heavy graph, we’d need to make sure that we had a heap size of at least 10GB.
We’ve also made a usability improvement to all algorithms so that they won’t execute if provided non existent node labels or relationship types.
In previous versions the algorithm would have run against the full graph.
Before this release the number of threads to be used by graph algorithms could be controlled by the concurrency config parameter.
This parameter was used for reading data from Neo4j into the in memory projected graph, executing the algorithm itself, and writing results back to Neo4j.
This release adds two new config parameters:
CALL algo.pageRank(null, null, {
writeConcurrency: int,
readConcurrency: int,
concurrency: int
})
If these parameters aren’t specified they will default to the value for concurrency, which itself defaults to the number of cores on the machine on which the algorithm is executed.
Bug Fixed and Improvements
This release also contains bug fixes for relationship loading for the huge graph, and a null-pointer exception for PageRank when running on a graph loaded with incoming relationships.
We’ve also made performance improvements for node loading on huge graphs, increased histogram accuracy in community detection algorithms, and reduce memory allocation for the Louvain algorithm.
We hope these changes improve your experience with the library, and if you have any questions or suggestions please send us an email to devrel@neo4j.com
Free download: O’Reilly “Graph Algorithms on Apache Spark and Neo4j”
Neo4j Graph Algorithms Release — Memory Requirements, Concurrency Settings, Bug Fixes was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
All the sessions of the conference are now available online