03-02-2021 02:32 AM
Can anyone tell me roughly how long I should expect call apoc.meta.graph() to run on a graph with 500 million nodes and 1.5 billion relationships?
I ran into the bug with call db.schema.visualize(), and apoc.meta.graph() gave the correct answer before but is taking a while this time.
Thanks!
03-04-2021 04:07 AM
It's probably gonna take ages to return. db.schema.visualize is using pre-computed data, whereas apoc.meta.graph is computing it all from scratch. Maybe you can take a look at apoc.meta.graphSample instead?
03-04-2021 04:35 AM
I tried apoc.meta.graphSample but, similarly to db.schema.visualize (and as stated in the documentation), it returned extra relationships.
I also played around with apoc.meta.subGraph a bit, which I got to yield a satisfactory result in the end. I'm still a bit confused, though, about where the computational cost is coming from: for many subsets of nodes and relationships the result was instant, while including some labels with fairly small sets of nodes/relationships resulted in long runtimes that I stopped after a while.
03-04-2021 04:38 AM
I don't know this code off by heart, but this is the function that it's calling:
that then calls the metaGraph function:
And actually it doesn't look like it computes everything from scratch like I thought it did. It's kinda hard to say why it would be working better for some labels than others.
03-04-2021 04:58 AM
Thanks for the pointer, I don't really know any Java, though.
It actually only started being slow after I recently added some new labels that about doubled the number of existing nodes. With the already pretty large number of nodes before that it worked instantly and returned the correct result.
03-04-2021 06:07 AM
And on that graph you said apoc.meta.graphSample returns quickly but has extra relationships?
The only difference between apoc.meta.graphSample and apoc.meta.graph is a post-processing step where missing relationships are removed (or not), so that's where the time must be spent.
Reading the code of that function, I can see that it's doing a scan of all the nodes with each label and then checking all of the relationships for 1 in 1000 of those nodes, which would be time-consuming. You can configure the sampling rate via the sample key, e.g. sample: 10000 would make it check one in every 10,000 nodes instead of one in every 1,000 nodes.
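To make the cost argument concrete, here's a toy Python model of that post-processing step. This is not APOC's actual code: the in-memory graph layout, the verify_relationship helper, and the Person/LIKES/Movie names are all made up for illustration. It only shows the general idea that a candidate relationship which doesn't really exist forces a check of every sampled node, so the work scales with (nodes with that label) / sample.

```python
from typing import Dict, List, Tuple

# Toy graph (hypothetical layout): node id -> (label, [(rel_type, target id)]).
Graph = Dict[int, Tuple[str, List[Tuple[str, int]]]]


def verify_relationship(
    graph: Graph, start_label: str, rel_type: str, end_label: str, sample: int
) -> Tuple[bool, int]:
    """Check whether (start_label)-[rel_type]->(end_label) really exists.

    Scans all nodes with start_label but only inspects the relationships of
    one node in every `sample`. Returns (found, number_of_nodes_inspected).
    """
    seen = 0       # nodes with the start label encountered so far
    inspected = 0  # nodes whose relationships were actually looked at
    for label, rels in graph.values():
        if label != start_label:
            continue
        seen += 1
        if (seen - 1) % sample != 0:
            continue  # skip all but one node in every `sample`
        inspected += 1
        for r_type, target in rels:
            if r_type == rel_type and graph[target][0] == end_label:
                return True, inspected  # stop early once it is confirmed
    return False, inspected


# 10,000 Person nodes with no relationships, plus a single Movie node.
graph: Graph = {i: ("Person", []) for i in range(10_000)}
graph[10_000] = ("Movie", [])

# Worst case: the relationship does not exist, so every sampled node is checked.
print(verify_relationship(graph, "Person", "LIKES", "Movie", sample=1))     # (False, 10000)
print(verify_relationship(graph, "Person", "LIKES", "Movie", sample=1000))  # (False, 10)

# Best case: the first sampled node already has the relationship.
graph[0] = ("Person", [("LIKES", 10_000)])
print(verify_relationship(graph, "Person", "LIKES", "Movie", sample=1000))  # (True, 1)
```

In this toy model the slow case is a relationship that is absent (or rare) between two labels, since nothing lets the check stop early. Whether that's what's happening on the real graph is only a guess.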
03-10-2021 02:14 AM
Sorry for the delay.
Yes, for the full graph, apoc.meta.graphSample runs quickly but has extra relationships. I tried running it with different sample sizes but I must have been doing it wrong, as there was no difference in either runtime or result. Is call apoc.met.graphSample({sample: 1000}) the correct syntax?
03-10-2021 03:11 AM
Yup, just gotta fix the typo on here:
call apoc.meta.graphSample({sample: 1000})
03-10-2021 06:32 AM
Ah my bad. Still, even if I call it with sample: 1 (which I guess would mean it checks every node), it returns instantly and contains additional relationships.
03-10-2021 06:58 AM
Can you try:
call apoc.meta.graph({sample: 1000})
03-10-2021 07:22 AM
That seems to run slowly irrespective of what I set sample to. I haven't let it run longer than a minute or so, though.
03-10-2021 08:43 AM
I'm playing around with it on a dummy graph with 40m nodes/relationships and I can see different speeds of response when specifying sample.
03-11-2021 12:42 AM
That's strange, not exactly sure what's going on. Anyways, it's not a pressing issue for me at the moment so I don't want to steal too much of your time. If I can help by providing more info I'd be happy to. Thanks again for your help!