

GDS Concurrency parameter doesn't impact performance

Hi,
I am running Neo4j Enterprise Edition with GDS Standard Edition on a machine with 16 cores and 122 GB of memory.
I'm running the DFS algorithm through GDS, and changing the concurrency parameter in the config seems to have no impact on performance.

I've checked OS things like Core Affinity, but everything looks ok.
Anyone have any ideas?

Thanks,
Naveh.


Hello @naveh and thanks for your question!

I don't think there is a problem with your setup. The issue is that the DFS algorithm is a single-threaded implementation. It accepts the concurrency parameter because of the general nature of the GDS API, but it does not respond to it.

To double-check that GDS can actually reach all your threads, I would recommend using an algorithm in the highest tier of maturity, such as WCC or Label Propagation, which definitely should respond to the concurrency parameter (you can read about tiers in the GDS Manual).
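
One way to run that check is a small timing loop. The following is only a minimal sketch in Python: `benchmark` and `run_algorithm` are hypothetical names, and the commented-out usage assumes the official `neo4j` Python driver, a local Bolt URL, placeholder credentials, and an already projected in-memory graph called `myGraph`, none of which come from this thread.

```python
import time
from statistics import mean


def benchmark(run_algorithm, concurrencies=(1, 2, 4), repeats=10):
    """Call run_algorithm(concurrency) `repeats` times per setting.

    Returns a dict mapping each concurrency value to its average
    wall-clock time in seconds.
    """
    averages = {}
    for concurrency in concurrencies:
        timings = []
        for i in range(1, repeats + 1):
            start = time.perf_counter()
            run_algorithm(concurrency)
            elapsed = time.perf_counter() - start
            timings.append(elapsed)
            print(f"concurrency={concurrency} #{i} took {elapsed:.3f} seconds")
        averages[concurrency] = mean(timings)
    return averages


# Hypothetical usage (assumed URL, credentials, and graph name):
#
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
#
# def run_wcc(concurrency):
#     with driver.session() as session:
#         session.run(
#             "CALL gds.wcc.stats('myGraph', {concurrency: $c})", c=concurrency
#         ).consume()
#
# print(benchmark(run_wcc))
```

Timing on the client side like this also includes driver and network overhead, so small differences between settings should be read with care.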

Hope this helps!
Mats

Hi, thanks for the detailed answer!

I couldn't find any documentation regarding DFS's single-thread execution. I don't see it in the official docs: Depth First Search - Neo4j Graph Data Science
Can you show me where it's written please?

Anyway, I went ahead and checked the performance of the Label Propagation algorithm and got similar results.

I ran it 10 times for each concurrency value, and the execution times were pretty much the same:

Concurrency = 1
#1 took 26.43699598312378 seconds
#2 took 29.63265037536621 seconds
#3 took 28.65972876548767 seconds
#4 took 34.18735432624817 seconds
#5 took 36.33429193496704 seconds
#6 took 29.397120475769043 seconds
#7 took 25.903148889541626 seconds
#8 took 26.159001350402832 seconds
#9 took 26.091997146606445 seconds
#10 took 25.92511487007141 seconds
**Concurrency = 1: Average time = 28.87274041175842**

Concurrency = 2
#1 took 26.201982021331787 seconds
#2 took 26.040571928024292 seconds
#3 took 25.8247971534729 seconds
#4 took 25.475574254989624 seconds
#5 took 26.346522569656372 seconds
#6 took 26.008761167526245 seconds
#7 took 26.390212535858154 seconds
#8 took 26.17672896385193 seconds
#9 took 25.59099769592285 seconds
#10 took 26.230574369430542 seconds
**Concurrency = 2: Average time = 26.02867226600647**

Concurrency = 4
#1 took 26.211217641830444 seconds
#2 took 25.623100996017456 seconds
#3 took 25.890212535858154 seconds
#4 took 25.458879709243774 seconds
#5 took 26.076539516448975 seconds
#6 took 25.747878074645996 seconds
#7 took 25.71400260925293 seconds
#8 took 25.68864107131958 seconds
#9 took 25.636645078659058 seconds
#10 took 25.96159338951111 seconds
**Concurrency = 4: Average time = 25.80087106227875**

What am I missing?
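
To put the averages above in perspective, here is a short Python sketch that turns them into speedups relative to the concurrency = 1 run. The numbers are copied from the results above; the "parallel efficiency" figure (speedup divided by thread count) is an extra metric added here for illustration only.

```python
# Average times in seconds, copied from the results above.
averages = {
    1: 28.87274041175842,
    2: 26.02867226600647,
    4: 25.80087106227875,
}

baseline = averages[1]

# Speedup relative to the single-threaded run.
speedups = {c: baseline / t for c, t in averages.items()}

for c, s in speedups.items():
    # Efficiency of 100% would mean perfect linear scaling.
    print(f"concurrency={c}: speedup {s:.2f}x, efficiency {s / c:.0%}")
```

Going from 1 to 4 threads gives a speedup of only about 1.12x, which is far from linear scaling.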