Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
06-02-2022 04:29 AM - edited 06-02-2022 11:47 AM
Hi,
I'm trying to develop a multi-threaded version of Dijkstra Shortest-Path Algorithm for multiple source-target pairs using GDS Api. I used ParallelUtil class to run tasks in parallel. It seems to run correctly on tests. However, in my real graph which has about 1 million records, it runs very slowly. If I run the algorithm without multithreading with 5 source-target pairs it finishes in a reasonable time. With multi-threading it throws Transaction Closed error.
What I do in my code is simply to create a task for each pair and in each task, implement the Dijkstra's algorithm. So in each task, a copy of the graph is stored as well as huge long queues for algorithm's calculations. Maybe these cause a memory overhead and that's because the algorithm runs slower(even never finishes) than non-parallel version.
Do you have any ideas? Thanks.
06-02-2022 12:00 PM
Can you share your actual code? That would probably help a lot?
06-02-2022 12:45 PM - edited 06-03-2022 12:59 AM
Here is my code : https://github.com/cuneyttyler/semanticspace-neo4j-gds/blob/master/algorithms/src/main/java/com/sema...
I simply followed the procedure in BetwennessCentrality.java to do parallelism.
EDIT: So I realised a few things and corrected my code. However, now a weird thing happens. It seems to work in normal speed but never finishes calculating because it never reaches the target node even if it should have. The weird thing is that, when I put a log line in line 220 with '% 100' to log in every 100 counts of processed relations, it converges in the right time - it manages to process the target node so it finishes. But when I increase this log condition to '% 10000', it somehow skips the target node and cannot find a path. So it's clearly related to multi-threading. That's why I put a syncronized block on queue, though it didn't work. Also I'm trying with one source-target pair so only one additional thread is running. I'm really confused here. What can cause the algorithm to skip some relations when working too fast(not printing anything that causes it to slow down)?
06-03-2022 04:11 AM
I found the issue : I defined local concurrent copy of graph as type 'Graph', although it wasn't like that in the other GDS core methods. I wanted to try something, but when I changed it to RelationshipIterator = graph.concurrentCopy(), now it works as desired and now we have a nice concurrent multiple pairs Dijkstra procedure.
You can access it here : https://github.com/cuneyttyler/semanticspace-neo4j-gds/
Procedure
I wonder why using Graph instance creates such an issue though.
06-03-2022 01:59 PM
I solved the issue, here is the answer https://community.neo4j.com/t5/neo4j-graph-platform/algorithm-developed-using-gds-core-api-gives-dif...
All the sessions of the conference are now available online