02-20-2020 12:50 PM
Hello,
I am trying to run closeness centrality on 624985 nodes and 54191395 edges with the following query (which completed within minutes on a smaller instance with 3566981 edges):
CALL algo.closeness('alias', 'through_citations', {graph:'huge', direction: 'BOTH', write:true, writeProperty:'GraphProperty_closeness.centrality_throughCitations'})
YIELD nodes,loadMillis, computeMillis, writeMillis;
This query has been running for more than 2 hours and has not written the property for any nodes. What next steps could I take to check whether it will complete successfully? Is there a way to run this with an APOC procedure?
Thanks,
Lavanya
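A quick way to confirm a long call like this is still alive (a sketch assuming Neo4j 3.x, where the algo.* procedures run) is to list the active queries:
// Show the running closeness call and how long it has been executing
CALL dbms.listQueries()
YIELD queryId, query, status, elapsedTimeMillis
WHERE query CONTAINS 'algo.closeness'
RETURN queryId, status, elapsedTimeMillis;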
02-20-2020 01:02 PM
Update: I tried
CALL algo.memrec('alias', 'through_citations', "algo.closeness", {graph: "huge"}) YIELD nodes, relationships, requiredMemory, bytesMin, bytesMax RETURN nodes, relationships, requiredMemory, bytesMin, bytesMax
to see whether I could check the memory requirements for running the algorithm, but had no luck:
Error
Neo.ClientError.Procedure.ProcedureCallFailed
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `algo.memrec`: Caused by: java.lang.IllegalArgumentException: The procedure [algo.closeness] does not support memrec or does not exist, the available and supported procedures are {beta.k1coloring, beta.modularityOptimization, beta.wcc, graph.load, labelPropagation, louvain, nodeSimilarity, pageRank, unionFind, wcc}.
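The error text itself lists graph.load as memrec-capable, so a rough lower bound for the in-memory graph can still be estimated; the call below is an assumption based on the memrec syntax used above:
// Estimate memory for loading the projected graph alone
CALL algo.memrec('alias', 'through_citations', 'graph.load', {graph: 'huge'})
YIELD requiredMemory, bytesMin, bytesMax
RETURN requiredMemory, bytesMin, bytesMax;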
02-21-2020 08:08 AM
This is probably not an answer; I am just sharing my experience with running graph algorithms against large graphs:
algo.memrec is not supported for all algorithms. You can see the list of supported procedures in the error message you got.
You can increase the concurrency parameter if you have the enterprise version (a sketch follows below).
You can run closeness separately on every subgraph.
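A minimal sketch of an explicit concurrency setting, reusing the call from the question; the value 8 is only an example:
// Same call as in the question, with concurrency set explicitly
CALL algo.closeness('alias', 'through_citations',
  {graph:'huge', direction: 'BOTH', concurrency: 8, write:true, writeProperty:'GraphProperty_closeness.centrality_throughCitations'})
YIELD nodes, loadMillis, computeMillis, writeMillis;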
02-21-2020 11:37 AM
You can also turn on the debug log and you'll see output/progress report as your algorithm executes.
02-21-2020 01:59 PM
@shan @alicia.frame Thanks for the suggestions.
I have now computed the connected components of the graph with the query below, and I want to speed up closeness centrality by running it on each individual component:
CALL algo.unionFind('alias', 'through_citations', {graph:'huge', seedProperty:'GraphProperty_wcc_throughCitations', write:true, writeProperty:'GraphProperty_wcc_throughCitations'})
YIELD nodes AS Nodes, setCount AS NbrOfComponents, writeProperty AS PropertyName;
Is a Cypher projection the only way to do this? I do not think the query below parallelizes the computation, does it?
CALL algo.closeness('MATCH (n:alias) RETURN id(n) AS id',
'MATCH (n)-[:through_citations]-(m:alias) WHERE n.GraphProperty_wcc_throughCitations = m.GraphProperty_wcc_throughCitations RETURN id(n) AS source, id(m) AS target', {graph:'cypher', direction: 'BOTH', write:true, writeProperty:'GraphProperty_closeness_centrality_coauthors'})
YIELD nodes,loadMillis, computeMillis, writeMillis;
Thanks,
Lavanya
02-24-2020 06:38 AM
You could filter both projection statements on the component value and loop over the components, for example:
CALL algo.closeness('MATCH (n:alias) WHERE n.GraphProperty_wcc_throughCitations = [value] RETURN id(n) AS id',
'MATCH (n)-[:through_citations]-(m:alias) where n.GraphProperty_wcc_throughCitations = [value] RETURN id(n) AS source, id(m) AS target',
{graph:'cypher', direction: 'BOTH', write:true, writeProperty:'GraphProperty_closeness_centrality_coauthors'})
Closeness centrality is parallelized, but if you want to make the loop over the communities itself parallel, you could use something like apoc.mapParallel2.
I would probably use a threshold and ignore any components with fewer than (for example) 5 members, just to limit the number of communities you inspect (and if they're small, the closeness scores will be low anyway).
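A rough sketch of how that parallel loop could look with apoc.mapParallel2, combined with the size threshold just mentioned; label, relationship, and property names follow the thread, and the minimum size of 5 and partition count of 8 are example values:
// Collect components with at least 5 members, then run closeness per component across parallel partitions
MATCH (n:alias)
WITH n.GraphProperty_wcc_throughCitations AS component, count(*) AS members
WHERE members >= 5
WITH collect(component) AS components
CALL apoc.mapParallel2("
  CALL algo.closeness(
    'MATCH (n:alias) WHERE n.GraphProperty_wcc_throughCitations = $component RETURN id(n) AS id',
    'MATCH (n)-[:through_citations]-(m:alias) WHERE n.GraphProperty_wcc_throughCitations = $component RETURN id(n) AS source, id(m) AS target',
    {graph:'cypher', params:{component:_}, write:true, writeProperty:'GraphProperty_closeness_centrality_throughCitations'})
  YIELD nodes RETURN nodes", {}, components, 8)
YIELD value
RETURN count(value) AS componentsProcessed;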
02-24-2020 09:03 AM
Here is what I am using:
MATCH (n:alias) WITH DISTINCT n.GraphProperty_wcc_coauthors as value
CALL algo.closeness('MATCH (n:alias) WHERE n.GraphProperty_wcc_coauthors = $value RETURN id(n) AS id',
'MATCH (n)-[:co_authors]-(m:alias) where m.GraphProperty_wcc_coauthors = $value RETURN id(n) AS source, id(m) AS target',
{graph:'cypher', params: {value: value}, write:true, writeProperty:'GraphProperty_closeness_centrality_coauthors'})
YIELD nodes,loadMillis, computeMillis, writeMillis
RETURN nodes,loadMillis, computeMillis, writeMillis
I will incorporate apoc.mapParallel2 soon.
Thanks
02-24-2020 10:15 AM
The above query returned the following error:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `algo.closeness`: Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
Kindly let me know how I can troubleshoot this.
Best,
Lavanya
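One thing that might be worth checking here (an observation about the statements above, not a confirmed cause of the exception): the node statement filters n on GraphProperty_wcc_coauthors, while the relationship statement filters only m, so it can return source ids that are not part of the projected node set. A quick check, with 0 as a placeholder component value:
// 0 is a placeholder; substitute a real GraphProperty_wcc_coauthors value
MATCH (n)-[:co_authors]-(m:alias)
WHERE m.GraphProperty_wcc_coauthors = 0
  AND NOT (n:alias AND n.GraphProperty_wcc_coauthors = 0)
RETURN count(*) AS sourcesOutsideNodeSet;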
02-25-2020 09:23 AM
@shan @alicia.frame @andrew.bowman
Update:
CALL apoc.periodic.iterate(
"MATCH (comp:GraphProperty_wcc_throughTopic) RETURN comp.GraphProperty_component AS component",
"CALL algo.closeness('MATCH (n:alias {GraphProperty_wcc_throughTopic : $component}) RETURN id(n) AS id',
'MATCH (n)-[r:through_topic]-(m:alias) RETURN id(n) AS source, id(m) AS target, r.weight as weight',
{graph:'cypher', params: {component: component}, write:true, writeProperty:'GraphProperty_closeness_centrality_throughTopic'})
YIELD nodes,loadMillis, computeMillis, writeMillis
RETURN nodes,loadMillis, computeMillis, writeMillis", {batchSize:5000, parallel:true})
YIELD batches, total, errorMessages;
I got the above query working for a smaller instance. For my bigger instance (624985 nodes and 54191395 edges, broken down into 390639 connected components on which closeness centrality is set to run), the query has been running for more than 30 minutes. Do I have to switch gears and try apoc.mapParallel2, as @alicia.frame suggested in this thread?
Thanks,
Lavanya
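With 390639 components across 624985 nodes, most components must be tiny, so one option in line with the earlier threshold suggestion is to drive apoc.periodic.iterate only with components above a minimum size; note also that the relationship statement above has no $component filter, so each iteration appears to project the full relationship set. A sketch with both changes, deriving the component values directly from the alias nodes (the minimum size of 5, batch size, and parallel flag are example values):
CALL apoc.periodic.iterate(
  "MATCH (n:alias)
   WITH n.GraphProperty_wcc_throughTopic AS component, count(*) AS members
   WHERE members >= 5
   RETURN component",
  "CALL algo.closeness('MATCH (n:alias {GraphProperty_wcc_throughTopic: $component}) RETURN id(n) AS id',
   'MATCH (n)-[r:through_topic]-(m:alias) WHERE n.GraphProperty_wcc_throughTopic = $component RETURN id(n) AS source, id(m) AS target, r.weight AS weight',
   {graph:'cypher', params: {component: component}, write:true, writeProperty:'GraphProperty_closeness_centrality_throughTopic'})
   YIELD nodes RETURN nodes",
  {batchSize:1000, parallel:true})
YIELD batches, total, errorMessages;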