Call apoc.meta.graph() expected runtime

JaHo

Can anyone tell me how long I should roughly expect call apoc.meta.graph() to run on a graph with 500 million nodes and 1.5 billion relationships?

I ran into the bug with call db.schema.visualize(). apoc.meta.graph() gave the correct answer before, but it is taking a while this time.

Thanks!

12 REPLIES

It's probably gonna take ages to return. db.schema.visualize is using pre-computed data, whereas apoc.meta.graph is computing it all from scratch. Maybe you can take a look at apoc.meta.graphSample instead?

I tried apoc.meta.graphSample but, like db.schema.visualize (and as stated in the documentation), it returned extra relationships.
I also played around with apoc.meta.subGraph a bit, which eventually yielded a satisfactory result. I'm still a bit confused about where the computational cost is coming from, though: for many subsets of nodes and relationships the result was instant, while including some labels with fairly small sets of nodes/relationships resulted in long runtimes, and I stopped those runs after a while.

I don't know this code off by heart, but this is the function that it's calling:

that then calls the metaGraph function:

And actually it doesn't look like it computes everything from scratch like I thought it did. It's kinda hard to say why it would be working better for some labels than others.

Thanks for the pointer, I don't really know any Java, though.
It actually only started being slow after I recently added some new labels that roughly doubled the number of existing nodes. Before that, even with the already pretty large number of nodes, it worked instantly and returned the correct result.

And on that graph you said apoc.meta.graphSample returns quickly but has extra relationships?

The only difference between apoc.meta.graphSample and apoc.meta.graph is a post processing step where missing relationships are removed (or not) so that's where the time must be spent.

Reading the code of that function, I can see that it scans all the nodes with each label and then checks all of the relationships for 1 in 1000 of those nodes, which would be time-consuming. You can configure the sampling rate via the sample key, e.g. sample: 10000 would make it check 1 in every 10,000 nodes instead of 1 in every 1,000.
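
To make the sampling idea concrete, here is a toy Python model of the post-processing step as described in the reply above. It is not APOC's actual code, only a sketch under simplifying assumptions: every node has exactly one label, the graph lives in plain Python collections, and the function name sampled_meta_graph, the candidate triples, and the example data are all made up for illustration.

```python
from collections import defaultdict


def sampled_meta_graph(nodes, rels, candidates, sample=1000):
    """Keep a candidate (start_label, rel_type, end_label) triple only if
    a sampled node with start_label really has such a relationship.

    nodes:      dict of node_id -> label (one label per node, an assumption)
    rels:       list of (start_id, rel_type, end_id)
    candidates: set of triples to verify
    sample:     inspect 1 in every `sample` nodes of each label
    Returns (confirmed_triples, number_of_nodes_inspected).
    """
    # Index outgoing relationships by their start node.
    outgoing = defaultdict(list)
    for start, rel_type, end in rels:
        outgoing[start].append((rel_type, end))

    # Group node ids by label (stands in for the per-label scan).
    by_label = defaultdict(list)
    for node_id, label in nodes.items():
        by_label[label].append(node_id)

    confirmed = set()
    inspected = 0
    for label, ids in by_label.items():
        for node_id in ids[::sample]:  # 1 in every `sample` nodes
            inspected += 1
            for rel_type, end in outgoing.get(node_id, ()):
                triple = (label, rel_type, nodes[end])
                if triple in candidates:
                    confirmed.add(triple)
    return confirmed, inspected


# Hypothetical toy graph: 10,000 Person nodes and 100 Movie nodes.
nodes = {i: "Person" for i in range(10000)}
nodes.update({10000 + i: "Movie" for i in range(100)})
rels = [(i, "ACTED_IN", 10000 + i % 100) for i in range(10000)]
rels.append((5, "DIRECTED", 10000))  # a rare relationship on a single node

candidates = {
    ("Person", "ACTED_IN", "Movie"),
    ("Person", "DIRECTED", "Movie"),
    ("Movie", "ACTED_IN", "Person"),  # an "extra" relationship that does not exist
}

confirmed, inspected = sampled_meta_graph(nodes, rels, candidates, sample=1000)
confirmed_all, inspected_all = sampled_meta_graph(nodes, rels, candidates, sample=1)
```

In this toy model a larger sample value inspects far fewer nodes, at the cost of possibly missing a relationship that only a few nodes have, while sample=1 inspects every node. Whether the real procedures behave the same way on a given graph is a separate question.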

Sorry for the delay.
Yes, for the full graph, apoc.meta.graphSample runs quickly but has extra relationships. I tried running it with different sample sizes, but I must have been doing it wrong, as there was no difference in either runtime or result. Is call apoc.met.graphSample({sample: 1000}) the correct syntax?

Yup, just gotta fix the typo on here:

call apoc.meta.graphSample({sample: 1000})

Ah my bad. Still, even if I call it with sample: 1 (which I guess would mean it checks every node), it returns instantly and contains additional relationships.

Can you try:

call apoc.meta.graph({sample: 1000})

That seems to run slowly irrespective of what I set sample to. I haven't let it run longer than a minute or so, though.

I'm playing around with it on a dummy graph with 40m nodes/relationships and I can see different speeds of response when specifying sample.

That's strange, not exactly sure what's going on. Anyways, it's not a pressing issue for me at the moment so I don't want to steal too much of your time. If I can help by providing more info I'd be happy to. Thanks again for your help!