Neo4j

gigauser · ‎07-05-2021

Hi, I am trying to test almost all of the GDS algorithms to select the best one for each of about 50 different customer tasks, such as finding activity patterns on the internet.

Some tasks can be accomplished by several different algorithms, each with different results or run times.
Is there an existing comparison or report that tests different GDS algorithms on the same data and compares their quality and performance? That kind of information would be very helpful, even though I will have to test almost all of GDS anyway to build an application for each task.

For example, I found that NodeSimilarity quickly gives very good results when comparing thousands of sets of news content, but it cannot be used to compare hundreds of thousands of sentences, since it takes forever on my best test machine (Ryzen 5950X, 32 threads, 128 GB RAM), unless I am doing something wrong.

I am sorry that I cannot share more specific details of the tasks, since this is a very confidential project.

alicia_frame1 · ‎07-05-2021

If you're looking for run time estimates, you can check out our configuration guide, which includes run times for certain algorithms on a specified graph (LDBC100, ~300M relationships, 1B nodes) and provides the hardware we used to generate the benchmarks. It also provides some guidance on optimizing performance. In general, though, you want to set concurrency as high as possible (EE has unlimited concurrency), and make use of parameters like degreeCutoff, topK, and topN when available.
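As a rough illustration of the parameters mentioned above, here is a minimal Python sketch that builds a parameterized Cypher call to `gds.nodeSimilarity.stream`. The helper name `node_similarity_call`, the graph name `sentences`, and the specific values are assumptions made for the example; the exact procedure signature and the concurrency you are allowed to use depend on your GDS version and edition, so check the documentation for your release.

```python
def node_similarity_call(graph_name, concurrency=4, top_k=10, top_n=0, degree_cutoff=1):
    """Build a Cypher call and parameter map for gds.nodeSimilarity.stream.

    Hypothetical helper: it only assembles the query text and parameters,
    it does not connect to a database.
    """
    config = {
        "concurrency": concurrency,     # threads used by the algorithm
        "topK": top_k,                  # max similar neighbours kept per node
        "topN": top_n,                  # global cap on results (0 = no cap)
        "degreeCutoff": degree_cutoff,  # skip nodes with fewer relationships
    }
    query = (
        "CALL gds.nodeSimilarity.stream($graph, $config) "
        "YIELD node1, node2, similarity "
        "RETURN node1, node2, similarity "
        "ORDER BY similarity DESC"
    )
    return query, {"graph": graph_name, "config": config}


# Example: a 32-thread machine, keeping only the 5 best matches per node
# and ignoring nodes with fewer than 3 relationships.
query, params = node_similarity_call("sentences", concurrency=32, top_k=5, degree_cutoff=3)
print(query)
print(params)
```

The returned query and parameter map could then be passed to whatever driver session you use. Raising degreeCutoff and lowering topK both reduce the amount of work and output, which is usually where to start when a run does not finish.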

"Quality" is a much more nuanced metric - it's going to depend strongly on the data sets you're running an algorithm on, and the problem at hand. Usually we recommend tuning your algo call on a subset of the data to make sure that your parameter combination is giving you sensible results, before running over the full dataset.
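To make the "tune on a subset first" idea concrete, here is a small pure-Python sketch. It is not GDS code: it uses a brute-force Jaccard comparison as a stand-in for a similarity algorithm, and the function names, synthetic data, and parameter grid are all assumptions for illustration. The point is only the workflow: sample a subset, try a few parameter combinations, and look at run time and output size before committing to the full dataset.

```python
import random
import time
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def top_k_similar(sets, top_k, degree_cutoff):
    """Brute-force stand-in: compare every pair, keep the top_k per item."""
    # Drop items with fewer than degree_cutoff elements before comparing.
    kept = {k: s for k, s in sets.items() if len(s) >= degree_cutoff}
    scores = {k: [] for k in kept}
    for a, b in combinations(kept, 2):
        sim = jaccard(kept[a], kept[b])
        if sim > 0:
            scores[a].append((sim, b))
            scores[b].append((sim, a))
    return {k: sorted(v, reverse=True)[:top_k] for k, v in scores.items()}


def tune_on_subset(sets, sample_size, grid, seed=0):
    """Time each (top_k, degree_cutoff) combination on a random sample."""
    rng = random.Random(seed)
    keys = rng.sample(sorted(sets), min(sample_size, len(sets)))
    subset = {k: sets[k] for k in keys}
    report = []
    for top_k, degree_cutoff in grid:
        start = time.perf_counter()
        result = top_k_similar(subset, top_k, degree_cutoff)
        elapsed = time.perf_counter() - start
        report.append({
            "topK": top_k,
            "degreeCutoff": degree_cutoff,
            "nodes": len(result),
            "pairs": sum(len(v) for v in result.values()),
            "seconds": elapsed,
        })
    return report


# Synthetic data: 2000 items, each a small set of "word" ids.
rng = random.Random(42)
data = {i: set(rng.sample(range(200), rng.randint(1, 12))) for i in range(2000)}

for row in tune_on_subset(data, sample_size=300, grid=[(10, 1), (5, 3), (5, 6)]):
    print(row)
```

Because the stand-in compares every pair, its cost grows roughly with the square of the number of items kept, which is why a stricter cutoff or a smaller sample changes the run time so sharply. Whether the results are "sensible" still has to be judged by inspecting the returned pairs against the task at hand.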

gigauser · ‎07-05-2021

Thank you for replying to my question. The information in the guide is what I was searching for.

Thanks again, Alicia.

I am searching a performance comparison for all GDS algorithms