04-11-2020 05:34 AM
Hello,
I am using Neo4j Community Edition 3.2.2. I am trying to identify synthetic identity fraud in the following data. I have nodes with 4 major labels: :Person, :Email, :Phone and :Identifier. Every :Person node also has either a :Fraud or a :Non-Fraud label. There are 150 million :Person nodes in total, of which 1.1 million are fraud nodes. The data is confidential. I need to find the number of neighbour nodes for every :Person node, so I run the following query in cypher-shell and redirect its output to a CSV file:
profile MATCH (n:Person) return n.fid, size((n)-[]-());
fid is an ID property of the :Person nodes. The query runs endlessly and never returns. However, if I run the same query on the much smaller :Fraud label, it finishes in 15 seconds:
profile MATCH (n:Fraud) return n.fid, size((n)-[]-());
The profile of this query shows that both db hits and rows scale linearly with the number of nodes. Scaling the 15 seconds for 1.1 million :Fraud nodes up to 150 million :Person nodes (15 × 150 / 1.1 ≈ 2045), the :Person query should take about 2000 seconds, well under an hour. So it seems the problem is caused by the size of the output.
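The extrapolation above is plain proportional scaling. A minimal sketch in Python, assuming (as the profile suggests) that runtime grows linearly with the number of nodes scanned, and using the node counts and the 15-second timing from this post; the function name is illustrative only.

```python
def extrapolate_runtime(sample_nodes: int, sample_seconds: float, total_nodes: int) -> float:
    """Scale a measured runtime linearly to a larger node count.

    Assumes a constant cost per node (linear db hits and rows),
    which is what the PROFILE of the :Fraud query suggests.
    """
    return sample_seconds * total_nodes / sample_nodes


fraud_nodes = 1_100_000      # :Fraud nodes, from the post
person_nodes = 150_000_000   # :Person nodes, from the post
fraud_seconds = 15.0         # measured runtime of the :Fraud query

estimate = extrapolate_runtime(fraud_nodes, fraud_seconds, person_nodes)
print(f"estimated runtime: {estimate:.0f} s ({estimate / 60:.0f} min)")
# prints: estimated runtime: 2045 s (34 min)
```

The estimate only covers the linear per-node work; it says nothing about the memory cost of holding all 150 million result rows before they reach the client.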
Max and initial heap size are set to 30 GB, and the page cache size setting is left commented out, which I assume means it defaults to 50% of RAM minus the max heap size (0.5 × 500 GB - 30 GB = 220 GB). Also, this is not a dedicated Neo4j server. Please correct me if I am wrong anywhere, and help me with this problem.
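The assumed page cache default can be checked the same way. A minimal sketch, assuming 500 GB of RAM (implied by the formula in the post) and the "50% of RAM minus max heap" heuristic the post describes; the function name is hypothetical and nothing here reads a real neo4j.conf. As far as I know, the Neo4j 3.x manual describes that heuristic as assuming a machine dedicated to Neo4j, which this server is not, so setting dbms.memory.pagecache.size explicitly may be the safer choice.

```python
def default_pagecache_gb(ram_gb: float, max_heap_gb: float) -> float:
    """Heuristic default page cache: 50% of RAM minus the max heap size.

    Mirrors the assumption stated in the post; it does not inspect
    any Neo4j configuration.
    """
    return 0.5 * ram_gb - max_heap_gb


ram_gb = 500.0       # total RAM implied by the post's formula (assumption)
max_heap_gb = 30.0   # max heap as configured in the post

pagecache = default_pagecache_gb(ram_gb, max_heap_gb)
print(f"page cache: {pagecache:.0f} GB")                        # prints: page cache: 220 GB
# Heap plus page cache, i.e. roughly what Neo4j would claim on this shared host:
print(f"heap + page cache: {pagecache + max_heap_gb:.0f} GB")   # prints: heap + page cache: 250 GB
```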
Thanks and Regards,
Kevin Kunnapilly
04-11-2020 06:03 AM
My suspicion is that your query is using the compiled runtime. This implementation is pretty fast, but it materializes the results, which is something you want to avoid when returning 150M rows.
Therefore you need to force a different runtime implementation. On Enterprise Edition you would choose slotted. Community Edition has no slotted runtime, so interpreted is the next best choice.
To do so, prefix your statement:
cypher runtime=interpreted MATCH (n:Person) return n.fid, size((n)-[]-());
I expect this to perform much better. Looking forward to hearing your feedback.
On a different note: 3.2.2 is pretty outdated, so please consider an upgrade.
04-12-2020 11:55 AM
Hi Stefan,
Thank you for your speedy reply. It worked wonderfully: the query ran in under 1100 seconds, as expected.
Sorry for the extra trouble, but I have two more questions:
Thanks and Regards,
Kevin Kunnapilly
04-12-2020 12:19 PM
The main difference between the compiled and interpreted runtimes is that the former processes the query server-side and collects all results into in-memory data structures; only once it has finished are the results streamed to the client. The interpreted runtime can stream results directly.
Note that things have changed in more recent versions, so retry your statement on an up-to-date release without a runtime prefix.
If you use a runtime that streams directly, the memory overhead of that query will be close to zero.
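The difference between materializing and streaming can be illustrated with a plain Python analogy. This is a sketch of the two strategies, not Neo4j's runtime code: `materialized_rows` builds the complete result list before anything is handed over (like the compiled runtime as described above), while `streamed_rows` is a generator that yields one row at a time (like the interpreted runtime). `make_row` and the in-memory CSV buffers are hypothetical stand-ins for the (fid, degree) rows and the CSV file from the question.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, int]  # hypothetical (fid, degree) pair


def make_row(i: int) -> Row:
    # Stand-in for computing n.fid and size((n)-[]-()) for one node.
    return (i, i % 7)


def materialized_rows(n: int) -> List[Row]:
    # Collect every row in memory first; nothing is available until all n exist.
    return [make_row(i) for i in range(n)]


def streamed_rows(n: int) -> Iterator[Row]:
    # Produce rows lazily; only the current row has to be held in memory.
    for i in range(n):
        yield make_row(i)


def write_csv(rows: Iterable[Row], out) -> int:
    # Write rows as they arrive and return how many were written.
    writer = csv.writer(out)
    writer.writerow(["fid", "degree"])
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


buf_a, buf_b = io.StringIO(), io.StringIO()
print(write_csv(materialized_rows(1_000), buf_a))  # prints: 1000
print(write_csv(streamed_rows(1_000), buf_b))      # prints: 1000
print(buf_a.getvalue() == buf_b.getvalue())        # prints: True
# The generator hands over its first row immediately, without building 150M rows:
print(next(streamed_rows(150_000_000)))            # prints: (0, 0)
```

Both strategies write identical output; the difference is that the materialized version needs memory for every row at once, while the streamed version's memory use stays flat no matter how many rows there are.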
04-12-2020 12:27 PM
Hi Stefan,
I have switched to 3.5.17 for now; 4.0 wasn't possible on my current server because its Java version is outdated. Will that work?
Also, please advise me on any server-side optimisations, or on how much RAM is advisable for my situation (around 0.5 to 1.5 billion nodes, or about 50 to 1000 times more db hits per query).
Thanks,
Kevin
04-12-2020 12:54 PM
The compiled runtime has been removed in 4.0 (see https://neo4j.com/docs/cypher-manual/current/deprecations-additions-removals-compatibility/), so you'll still have it in 3.5.
Precise sizing estimates are normally the result of a workshop. For the basics, you can read up on https://neo4j.com/docs/operations-manual/3.5/performance/.