Neo4j

mark_needham · ‎07-17-2019

Neo4j Graph Algorithms Release — Memory Requirements, Concurrency Settings, Bug Fixes

Over the last few weeks we’ve released new functionality for the Neo4j Graph Algorithms Library, in versions 3.5.6.1 and 3.5.7.0.

tl;dr

These releases see the introduction of a procedure to compute the memory requirements of the PageRank, Louvain, and Union Find algorithms, as well as the in memory graph. We’ve also added support for different concurrency values for read, write, and compute tasks, and there are a bunch of bug fixes as well.

You can install the latest release directly from the Neo4j Desktop in the ‘Plugins’ section of your project. Jennifer Reif also has a detailed post explaining how to install plugins if you haven’t used any yet.

Installing the Graph Algorithms Library

If you’re installing the library on a server installation of Neo4j you can download the JAR from the Neo4j Download Center.

Neo4j Download Center

Memory Requirements

The graph algorithms library operates completely on the heap, which means we’ll need to configure our Neo4j Server with a much larger heap size than we would for transactional workloads.

When we execute one of the algorithm procedures, an in memory graph representation of the graph is created before the algorithm is executed against that graph. The diagram below shows the steps involved:

Steps involved when executing graph algorithms

The in memory projected graph contains the following pieces of data:

In Memory Graph Model

The library supports two types of in memory graph:

heavy, which can process up to 2³¹ (2 billion) nodes
huge, which can process up to 2⁴⁵ (35 trillion) nodes

The Memory Requirements Procedure returns an estimate of the amount of memory required to run graph algorithms.

This procedure has the following signature:

CALL algo.memrec(label, relationship, algorithm, config)

If we want to find out the amount of memory required to build an in memory graph containing all the nodes and all the relationships using the heavy graph, we could compute this by running the following query:

CALL algo.memrec(null, null, "graph.load", {graph: "heavy"})

If we wanted to compute the memory requirements for the huge graph we can do that as well:

CALL algo.memrec(null, null, "graph.load", {graph: "huge"})

Below we can see the amount of memory needed for Dbpedia and Twitter 2010, two well known benchmark datasets:

Dbpedia Memory Requirements

Twitter 2010 Memory Requirements

We can also compute the memory requirements for three individual algorithms: PageRank, Louvain, and Union Find.

The following query computes the amount of memory needed by the PageRank algorithm:

CALL algo.memrec(null, null, "pageRank", { 
  graph: "heavy", concurrency: 8})
YIELD mapView
UNWIND mapView.components AS component
RETURN component.name AS component, 
       component.memoryUsage AS memoryUsage

Below we see the memory requirements of our benchmark datasets:

Dbpedia PageRank Memory Requirements

Twitter 2010 PageRank Memory Requirements

As mentioned at the beginning of this section, the library stores all its data structures on the heap, so we need to configure our memory settings accordingly.

This can be done via the following configuration parameters in the Neo4j Settings (neo4j.conf) file:

dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=1G

So if we wanted to run PageRank on the Twitter 2010 dataset using the heavy graph, we’d need to make sure that we had a heap size of at least 10GB.

Non existent node labels or relationship types

We’ve also made a usability improvement to all algorithms so that they won’t execute if provided non existent node labels or relationship types.

Running PageRank with a non existent node label

In previous versions the algorithm would have run against the full graph.

Concurrency Settings

Before this release the number of threads to be used by graph algorithms could be controlled by the concurrency config parameter.

This parameter was used for reading data from Neo4j into the in memory projected graph, executing the algorithm itself, and writing results back to Neo4j.

This release adds two new config parameters:

readConcurrency — the number of threads used to read data into the in memory projected graph
writeConcurrency — the number of threads used to write data into Neo4j

CALL algo.pageRank(null, null, {
  writeConcurrency: int,  
  readConcurrency: int,
  concurrency: int
})

If these parameters aren’t specified they will default to the value for concurrency, which itself defaults to the number of cores on the machine on which the algorithm is executed.

Bug Fixes and Improvements

Bug Fixed and Improvements

This release also contains bug fixes for relationship loading for the huge graph, and a null-pointer exception for PageRank when running on a graph loaded with incoming relationships.

We’ve also made performance improvements for node loading on huge graphs, increased histogram accuracy in community detection algorithms, and reduce memory allocation for the Louvain algorithm.

We hope these changes improve your experience with the library, and if you have any questions or suggestions please send us an email to devrel@neo4j.com

Free download: O’Reilly “Graph Algorithms on Apache Spark and Neo4j”

Neo4j Graph Algorithms Release — Memory Requirements, Concurrency Settings, Bug Fixes was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.