Neo4j

lavanya_kannan · ‎06-07-2020

I am wondering if the graph algorithms that are available for the cypher queries can be made "online".

For example, in the algorithm unionFind, when new nodes are added, the weakly connected components for the new nodes can be found by looking at the weakly connected components of the neighbors of the new nodes. If the cypher user is careful, they would collect all the weakly connected component of the neighbors of the new node and assign the same component if for all the neighboring connected component and for the newly added node. For other graph algorithms, it is possible to make the algorithms run online.

In summary, can we have online implementations of the graph algorithms readily available to run on the graph databases? https://en.wikipedia.org/wiki/Online_algorithm, https://www.amazon.com/Online-Algorithms-Lecture-Computer-Science/dp/3540649174. I guess such implementations would need storing and maintaining some additional data containers in the database - but I think it will be a greatly helpful for the users who focus on their usecases that deal with real data that can get updated on the graph every now and then.

Looking forward for your feedback and comments on this topic.

Best,
Lavanya

alicia_frame · ‎06-10-2020

@mark.needham's reply is spot on -- please see the documentation on our algorithms that support seeding (Louvain, WCC, label propagation). These use existing community labels to be used to "seed" a graph with new data, making it faster to assign new nodes to existing communities.

We endeavour to support seeding whenever practical, and it's typically added as a feature of our product supported algorithms.

GDS does not include the Newman-Girvan algorithm.

View solution in original post

mark_needham · ‎06-08-2020

Could this be achieved by using seeded components? https://neo4j.com/docs/graph-data-science/current/algorithms/wcc/#algorithms-wcc-examples-seeding

That could be combined with graph mutation to achieve something close to what you're suggesting I think - https://neo4j.com/docs/graph-data-science/current/common-usage/running-algos/#running-algos-mutate

lavanya_kannan · ‎06-08-2020

Hello Mark,

Thanks for sharing the example. To answer you question on using seeds on nodes to to make the wcc algorithm online : Yes (if re-running the algorithm for the whole graph is feasible), No (if the edges of the original graph will not be available for some reason, and running the algorithm for the whole graph becomes infeasible).

Here is the suggestion to make it run for the later case:
We can re-run the wcc algorithm only for the new nodes and its neighbors like so:
Set the myGraph-seeded only for the new user (Bridget) and her neighbor(s): Matt. This can be done by setting a new label (say) "new_user" to Bridget and Matt. However a complexity arises if Bridget is connected to more than one previous user. Let's say (Bridget) --> (Mark) with weight 2.0. In this case, running the algorithm on ONLY myGraph_seeded will group (Bridget), (Matt) and (Mark) in component 1 and (Michael) and (Doug) still left in component 3 - UNDISERABLE. To fix this issue, one can maintain a component node for each wcc and connect it to all the nodes in it. This approach will work when the myGraph will not be available anymore (there are use-cases for this scenario and I am using this approach for one such).

Additional suggestion: Can we identify algorithms that can be made "online" (where inputs arrive in piece-meals at a time) and those cannot be made "online" (where even seeding and preserving the already existing graph would not be helpful to run the algorithm only on new nodes and some neighbors of certain depth)? For example, wcc can be made online - as we just saw above; while I am not sure about lpa, Louvain and Newman-Girvan (that is newly available in Data Science library!). Btw, triangle centrality can be made "online", but other centrality measures will be very complex to compute online - and only computing some heuristics for them would be feasible.

Thanks,
Lavanya

alicia_frame · ‎06-10-2020

@mark.needham's reply is spot on -- please see the documentation on our algorithms that support seeding (Louvain, WCC, label propagation). These use existing community labels to be used to "seed" a graph with new data, making it faster to assign new nodes to existing communities.

We endeavour to support seeding whenever practical, and it's typically added as a feature of our product supported algorithms.

GDS does not include the Newman-Girvan algorithm.

Neo4j

Online implementations of graph algorithms