

Neo4j loses connections when performing depth-first/breadth-first searches on a large graph

1. Summary

I have 30 million nodes and 80 million relationships.

The nodes are spread unevenly across 5 labels; one label alone accounts for more than 20 million nodes.

The 80 million-plus relationships fall into two types: one has more than 80 million relationships and the other has about 10,000.

Neo4j is Community Edition 4.4.6, running on Ubuntu 20.04 with 128 GB of memory.

The database is 33 GB, with dbms.memory.pagecache.size=81g, dbms.memory.heap.max_size=32g and dbms.memory.heap.initial_size=32g.

When I try to traverse the whole graph, using apoc.path.expandConfig or gds.dfs.stream for depth-first or breadth-first traversal, the query runs for a long time without returning results until the connection to Neo4j is lost.

[Attached image: memory space.png]

2. The structure of database indexes

There is an index on a string property of the Compound label.

3. Cypher

// Query 1: k shortest paths with Yen's algorithm (GDS).
// Depending on the GDS version, sourceNode/targetNode may need id(p) and id(joe).
MATCH (p:Compound {property: "value1"})
MATCH (joe:Compound {property: "value2"})
CALL gds.shortestPath.yens.stream("cfGraph",
  {sourceNode: p, targetNode: joe, k: 3, relationshipWeightProperty: "score"})
YIELD index, totalCost, path
RETURN index, totalCost, length(path), relationships(path) AS relationships, nodes(path) AS nodes
ORDER BY index

// Query 2: breadth-first expansion from p towards joe with APOC.
MATCH (p:Compound {property: "value1"})
MATCH (joe:Compound {property: "value2"})
CALL apoc.path.expandConfig(p, {
  labelFilter: "*",
  minLevel: 1,
  maxLevel: 4,
  uniqueness: "NODE_GLOBAL",
  limit: 10,
  bfs: true,
  endNodes: [joe]
})
YIELD path
RETURN path, length(path), relationships(path) AS relationships

// Query 3: depth-first search from p towards joe (GDS).
MATCH (p:Compound {property: "value1"})
MATCH (joe:Compound {property: "value2"})
CALL gds.dfs.stream('cfGraph', {
  sourceNode: p,
  targetNodes: [joe],
  maxDepth: 4
})
YIELD path
RETURN path

Q:

The problem is that for some inputs, even with only 3 or 4 steps, the query does not return before Neo4j loses the connection.

Inputs involving small data sets, or only 2 steps, work fine.

4. Errors and warnings in logs

2022-11-16 13:46:15.163+0000 ERROR [o.n.b.t.p.ProtocolHandshaker] Fatal error occurred during protocol handshaking: [id: 0x0867cdc7, L:/172.17.0.2:7687 - R:/172.17.0.1:41420]

java.lang.NullPointerException: null

 

2022-11-16 13:41:34.179+0000 ERROR [o.n.b.t.p.HouseKeeper] Fatal error occurred when handling a client connection: [id: 0x7968cefc, L:/172.17.0.2:7687 ! R:/172.17.0.1:40356]

org.neo4j.bolt.runtime.BoltConnectionFatality: Terminated connection '[id: 0x7968cefc, L:/172.17.0.2:7687 ! R:/172.17.0.1:40356]' as the server failed to handle an authentication request within 30000 ms.

 

2022-11-16 13:47:50.842+0000 WARN  [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=95348, gcTime=95524, gcCount=2}
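
One way to read these log lines together: the reported stop-the-world pause is far longer than the 30000 ms authentication window mentioned in the Bolt error, so new connections could plausibly time out while the JVM is paused. The sketch below only parses the pause line and compares the numbers; the function names are made up for illustration, and it does not prove that the pause caused the disconnects.

```python
import re

# Authentication window quoted in the BoltConnectionFatality message above.
AUTH_TIMEOUT_MS = 30000

# Matches the payload of a VmPauseMonitorComponent warning.
PAUSE_RE = re.compile(r"pauseTime=(\d+), gcTime=(\d+), gcCount=(\d+)")


def parse_pause(line):
    """Return (pause_ms, gc_ms, gc_count) from a pause warning, or None."""
    match = PAUSE_RE.search(line)
    return tuple(int(group) for group in match.groups()) if match else None


def exceeds_auth_window(line, timeout_ms=AUTH_TIMEOUT_MS):
    """True if the logged pause alone is longer than the auth timeout."""
    parsed = parse_pause(line)
    return parsed is not None and parsed[0] > timeout_ms


line = (
    "2022-11-16 13:47:50.842+0000 WARN  [o.n.k.i.c.VmPauseMonitorComponent] "
    "Detected VM stop-the-world pause: {pauseTime=95348, gcTime=95524, gcCount=2}"
)
print(parse_pause(line), exceeds_auth_window(line))
```

A pause of roughly 95 seconds spent almost entirely in GC usually points at heap pressure rather than a network problem, which fits the symptom that only the larger traversals fail.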

Partial configuration:

#********************************************************************
#
# Memory settings are specified kilobytes with the 'k' suffix, megabytes with
# 'm' and gigabytes with 'g'.
# If Neo4j is running on a dedicated server, then it is generally recommended
# to leave about 2-4 gigabytes for the operating system, give the JVM enough
# heap to hold all your transaction state and query context, and then leave the
# rest for the page cache.

# Java Heap Size: by default the Java heap size is dynamically calculated based
# on available system resources. Uncomment these lines to set specific initial
# and maximum heap size.
dbms.memory.heap.initial_size=32g
dbms.memory.heap.max_size=32g

# The amount of memory to use for mapping the store files.
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 50% of RAM minus the Java heap size.
dbms.memory.pagecache.size=81g

# Limit the amount of memory that all of the running transaction can consume.
# By default there is no limit.
#dbms.memory.transaction.global_max_size=1g

# Limit the amount of memory that a single transaction can consume.
# By default there is no limit.
dbms.memory.transaction.max_size= 3g

# Transaction state location. It is recommended to use ON_HEAP.
dbms.tx_state.memory_allocation=ON_HEAP

#********************************************************************
# JVM Parameters
#********************************************************************

# G1GC generally strikes a good balance between throughput and tail
# latency, without too much tuning.
dbms.jvm.additional=-XX:+UseG1GC

# Have common exceptions keep producing stack traces, so they can be
# debugged regardless of how often logs are rotated.
dbms.jvm.additional=-XX:-OmitStackTraceInFastThrow

# Make sure that `initmemory` is not only allocated, but committed to
# the process, before starting the database. This reduces memory
# fragmentation, increasing the effectiveness of transparent huge
# pages. It also reduces the possibility of seeing performance drop
# due to heap-growing GC events, where a decrease in available page
# cache leads to an increase in mean IO response time.
# Try reducing the heap memory, if this flag degrades performance.
dbms.jvm.additional=-XX:+AlwaysPreTouch

# Trust that non-static final fields are really final.
# This allows more optimizations and improves overall performance.
# NOTE: Disable this if you use embedded mode, or have extensions or dependencies that may use reflection or
# serialization to change the value of final fields!
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions
dbms.jvm.additional=-XX:+TrustFinalNonStaticFields

# Disable explicit garbage collection, which is occasionally invoked by the JDK itself.
# dbms.jvm.additional=-XX:+DisableExplicitGC

#Increase maximum number of nested calls that can be inlined from 9 (default) to 15
dbms.jvm.additional=-XX:MaxInlineLevel=15

# Disable biased locking
dbms.jvm.additional=-XX:-UseBiasedLocking

# Restrict size of cached JDK buffers to 256 KB
dbms.jvm.additional=-Djdk.nio.maxCachedBufferSize=262144

# More efficient buffer allocation in Netty by allowing direct no cleaner buffers.
dbms.jvm.additional=-Dio.netty.tryReflectionSetAccessible=true

# Exits JVM on the first occurrence of an out-of-memory error. Its preferable to restart VM in case of out of memory errors.
dbms.jvm.additional=-XX:+ExitOnOutOfMemoryError

# Expand Diffie Hellman (DH) key size from default 1024 to 2048 for DH-RSA cipher suites used in server TLS handshakes.
# This is to protect the server from any potential passive eavesdropping.
dbms.jvm.additional=-Djdk.tls.ephemeralDHKeySize=2048
# This mitigates a DDoS vector.
dbms.jvm.additional=-Djdk.tls.rejectClientInitiatedRenegotiation=true

# Enable remote debugging
#dbms.jvm.additional=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005

# This filter prevents deserialization of arbitrary objects via java object serialization, addressing potential vulnerabilities.
# By default this filter whitelists all neo4j classes, as well as classes from the hazelcast library and the java standard library.
# These defaults should only be modified by expert users!
# For more details (including filter syntax) see: https://openjdk.java.net/jeps/290
#dbms.jvm.additional=-Djdk.serialFilter=java.**;org.neo4j.**;com.neo4j.**;com.hazelcast.**;net.sf.ehcache.Element;com.sun.proxy.*;org.openjdk.jmh.**;!*

# Increase the default flight recorder stack sampling depth from 64 to 256, to avoid truncating frames when profiling.
dbms.jvm.additional=-XX:FlightRecorderOptions=stackdepth=256

# Allow profilers to sample between safepoints. Without this, sampling profilers may produce less accurate results.
dbms.jvm.additional=-XX:+UnlockDiagnosticVMOptions
dbms.jvm.additional=-XX:+DebugNonSafepoints

# Disable logging JMX endpoint.
dbms.jvm.additional=-Dlog4j2.disable.jmx=true

# Limit JVM metaspace and code cache to allow garbage collection. Used by cypher for code generation and may grow indefinitely unless constrained.
# Useful for memory constrained environments
# dbms.jvm.additional=-XX:MaxMetaspaceSize=64g
dbms.jvm.additional=-XX:ReservedCodeCacheSize=512m

#********************************************************************

5. Recommended configuration

./neo4j-admin memrec
# Assuming the system is dedicated to running Neo4j and has 125.4GiB of memory,
# we recommend a heap size of around 31500m, and a page cache of around 80800m,
# and that about 16200m is left for the operating system, and the native memory
# needed by Lucene and Netty.
#
# Tip: If the indexing storage use is high, e.g. there are many indexes or most
# data indexed, then it might advantageous to leave more memory for the
# operating system.
#
# Tip: Depending on the workload type you may want to increase the amount
# of off-heap memory available for storing transaction state.
# For instance, in case of large write-intensive transactions
# increasing it can lower GC overhead and thus improve performance.
# On the other hand, if vast majority of transactions are small or read-only
# then you can decrease it and increase page cache instead.
#
# Tip: The more concurrent transactions your workload has and the more updates
# they do, the more heap memory you will need. However, don't allocate more
# than 31g of heap, since this will disable pointer compression, also known as
# "compressed oops", in the JVM and make less effective use of the heap.
#
# Tip: Setting the initial and the max heap size to the same value means the
# JVM will never need to change the heap size. Changing the heap size otherwise
# involves a full GC, which is desirable to avoid.
#
# Based on the above, the following memory settings are recommended:
dbms.memory.heap.initial_size=31500m
dbms.memory.heap.max_size=31500m
dbms.memory.pagecache.size=80800m
#
# It is also recommended turning out-of-memory errors into full crashes,
# instead of allowing a partially crashed database to continue running:
dbms.jvm.additional=-XX:+ExitOnOutOfMemoryError
#
# The numbers below have been derived based on your current databases located at: '/var/lib/neo4j/data/databases'.
# They can be used as an input into more detailed memory analysis.
# Total size of data and native indexes in all databases: 29800m

6. Main problems and questions

1. Can Neo4j support depth-first and breadth-first traversal of large graphs? If Neo4j Community Edition cannot, what is the upper limit on the number of nodes and relationships?

2. Can my Cypher or my configuration be improved?

3. Or do some nodes in my graph have a particularly high degree, so that the queries get stuck expanding them?

4. Can apoc.path.expandConfig limit the expansion at a given level, for example randomly selecting only 10 of the 400 nodes at that level?

5. If searching such a large graph is not feasible, is there a setting I am not aware of that cancels the transaction after a short time, without crashing Neo4j?

6. Or would adding more memory improve the situation?

7. What is the main difference between gds.dfs.stream and apoc.path.expandConfig?
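
To make questions 3 and 4 concrete, here is a small standalone sketch in plain Python. It is not APOC or GDS code and does not describe how either library works internally. It runs a breadth-first expansion with global node uniqueness on a synthetic tree where every node has 20 children, and shows how the number of reached nodes grows per level, with and without a hypothetical per-level cap that randomly keeps 10 frontier nodes. The fanout of 20 and all function names are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def make_tree(fanout, depth):
    """Build an adjacency list for a tree where every node has `fanout` children."""
    adj = defaultdict(list)
    next_id = 1
    level = [0]
    for _ in range(depth):
        new_level = []
        for node in level:
            for _ in range(fanout):
                adj[node].append(next_id)
                new_level.append(next_id)
                next_id += 1
        level = new_level
    return adj


def bounded_bfs(adj, start, max_level, per_level_cap=None, seed=0):
    """Breadth-first expansion that visits each node at most once.

    Returns the number of newly reached nodes per level. If per_level_cap is
    set, only a random sample of that many frontier nodes is expanded further.
    """
    rng = random.Random(seed)
    visited = {start}
    frontier = [start]
    reached_per_level = []
    for _ in range(max_level):
        if per_level_cap is not None and len(frontier) > per_level_cap:
            frontier = rng.sample(frontier, per_level_cap)
        next_frontier = []
        for node in frontier:
            for neighbour in adj[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_frontier.append(neighbour)
        reached_per_level.append(len(next_frontier))
        frontier = next_frontier
        if not frontier:
            break
    return reached_per_level


adj = make_tree(fanout=20, depth=4)
full = bounded_bfs(adj, 0, max_level=4)
capped = bounded_bfs(adj, 0, max_level=4, per_level_cap=10)
print("uncapped:", full)
print("capped:  ", capped)
```

With a fanout of 20 the uncapped run reaches 160000 new nodes at level 4, while 2 levels reach only 400, which matches the observation that 2-step queries are fine and 3 or 4 steps are not. A few high-degree nodes in the real graph would make the growth much steeper than this uniform example.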


I think it's better to switch it around.

Use 33G page cache and 90G heap. Can you check /var/log/neo4j/debug.log for any errors? I presume it's running out of memory.

@michael_hunger Hi, this is another account of mine.

I set the configuration according to the recommended configuration. Are you sure I need 90G for the heap and 33G for the page cache, rather than the opposite? Do I need to add more memory?

When I run some examples, the query keeps computing for a very long time and returns nothing. Why?

Is there a setting I can use to cut off a request once it times out?

I see the following in the logs:

2022-11-28 06:53:51.969+0000 WARN  [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=697, gcTime=817, gcCount=1}
2022-11-28 06:53:55.409+0000 WARN  [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=1838, gcTime=1982, gcCount=1}
2022-11-28 08:48:39.404+0000 WARN  [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=1440, gcTime=1554, gcCount=1}

After a while, I came back from dinner to find that Neo4j had lost the connection, and the log showed:

2022-11-28 09:59:01.994+0000 ERROR [o.n.b.t.p.ProtocolHandshaker] Fatal error occurred during protocol handshaking: [id: 0xba3c7fba, L:/172.17.0.2:7687 - R:/172.17.0.1:40024]
java.lang.NullPointerException: null

2022-11-28 10:16:11.773+0000 ERROR [o.n.b.t.p.ProtocolHandshaker] Fatal error occurred during protocol handshaking: [id: 0x5bc5520b, L:/172.17.0.2:7687 ! R:/172.17.0.1:43372]
org.neo4j.bolt.runtime.BoltConnectionFatality: Terminated connection '[id: 0x5bc5520b, L:/172.17.0.2:7687 ! R:/172.17.0.1:43372]' as the client failed to authenticate within 30000 ms.

bennu_neo (Neo4j)

Hi @Chevy_Xu ,

Considering your labelFilter, you may be loading your entire db (and more) onto the heap. Do you have any condition on the relationships to be traversed that might narrow down the amount of information that needs to be tracked?

Oh, y’all wanted a twist, ey?

@bennu_neo Hi, this is another account of mine.

Yes, I have narrowed the scope using label filters, but I still need to run graph searches over large data sets. Thank you.

Hi @Madeline_Han ,

So you have a version of the query that works between two specific points? Can you share it?

MATCH (p:Compound {property: "value1"})
MATCH (joe:Compound {property: "value2"})
CALL apoc.path.expandConfig(p, {
  labelFilter: "My_Compound",
  relationshipFilter: "Known_reaction>",
  minLevel: 1,
  maxLevel: 4,
  uniqueness: "NODE_GLOBAL",
  limit: 10,
  bfs: true,
  endNodes: [joe]
})
YIELD path
RETURN path, length(path), relationships(path) AS relationships

This works when the data is very small: My_Compound has 10k nodes and Known_reaction has 10k relationships. So what are the limits on the number of nodes and relationships?