cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Concurrency while developing algorithms with Graph Data Science API

cuneyttyler
Ninja
Ninja

Hi,

I'm  trying to develop a multi-threaded version of Dijkstra Shortest-Path Algorithm for multiple source-target pairs using GDS Api. I used ParallelUtil class to run tasks in parallel. It seems to run correctly on tests. However, in my real graph which has about 1 million records, it runs very slowly. If I run the algorithm without multithreading with 5 source-target pairs it finishes in a reasonable time. With multi-threading it throws Transaction Closed error.

What I do in my code is simply to create a task for each pair and in each task, implement the Dijkstra's algorithm. So in each task, a copy of the graph is stored as well as huge long queues for algorithm's calculations. Maybe these cause a memory overhead and that's because the algorithm runs slower(even never finishes) than non-parallel version.

Do you have any ideas? Thanks.

4 REPLIES 4

Can you share your actual code? That would probably help a lot?

cuneyttyler
Ninja
Ninja

Here is my code : https://github.com/cuneyttyler/semanticspace-neo4j-gds/blob/master/algorithms/src/main/java/com/sema...

I simply followed the procedure in BetwennessCentrality.java to do parallelism.

EDIT: So I realised a few things and corrected my code. However, now a weird thing happens. It seems to work in normal speed but never finishes calculating because it never reaches the target node even if it should have. The weird thing is that, when I put a log line in line 220 with '% 100' to log in every 100 counts of processed relations, it converges in the right time - it manages to process the target node so it finishes. But when I increase this log condition to '% 10000', it somehow skips the target node and cannot find a path. So it's clearly related to multi-threading. That's why I put a syncronized block on queue, though it didn't work. Also I'm trying with one source-target pair so only one additional thread is running. I'm really confused here. What can cause the algorithm to skip some relations when working too fast(not printing anything that causes it to slow down)?

I found the issue : I defined local concurrent copy of graph as type 'Graph', although it wasn't like that in the other GDS core methods. I wanted to try something, but when I changed it to RelationshipIterator = graph.concurrentCopy(), now it works as desired and now we have a nice concurrent multiple pairs Dijkstra procedure.

You can access it here : https://github.com/cuneyttyler/semanticspace-neo4j-gds/ 

Procedure 

semanticspace.gds.dijkstraMultiplePairs.stream - parameters sourceNodes(long list), targetNodes(long list)

I wonder why using Graph instance creates such an issue though.