Slowness in graph retrieval

Hi,
I am using the Neo4j community edition for my POC, and I have configured Neo4j in HA. There are two high-end servers, one acting as master and the other as slave. I have a data simulator that pumps data continuously into Kafka, and a Kafka consumer collects the data in batches and inserts it into Neo4j.

We have more than 2 billion nodes, with relationships of the form (card:Card)-[r:Transaction]->(terminal:Terminal). I have read the official Neo4j documentation, which states that it is a good fit for fraud detection. I am trying out the CPP (compromised point of purchase) use case.

I have some properties on the Transaction relationship, such as txndatetime, isFraud, location, etc. Now, out of those 2 billion nodes in my graph database, I am trying to find 200 cards (these 200 card numbers are the input to my Cypher query) and their relationships.

For example, say there are 2 cards, c1 and c2, as the input to the Cypher query. In the DB, I have relations such as:
c1-[Transaction]->(t1) [isFraud = false],
c1-[Transaction]->(t3) [isFraud = true],
c1-[Transaction]->(t11) [isFraud = true],
c1-[Transaction]->(t100) [isFraud = false],
c2-[Transaction]->(t10) [isFraud = false],
c2-[Transaction]->(t100) [isFraud = false],
c2-[Transaction]->(t150) [isFraud = true],
c2-[Transaction]->(t500) [isFraud = true],
.....
and so on (relations for other cards exist as well), which together make up the 2 billion nodes.
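
For anyone who wants to try the queries in this thread on a small scale, the eight sample relations listed above could be loaded into a scratch database roughly like this. It is only a sketch: the `Terminalno` property name is an assumption made for illustration, while `Cardno` and `isFraud` are the names used in the thread's queries.

```cypher
// Hypothetical sample data mirroring the eight relations listed above.
// Terminalno is an assumed property name; Cardno and isFraud come from the thread.
UNWIND [
  ['c1', 't1',   false],
  ['c1', 't3',   true],
  ['c1', 't11',  true],
  ['c1', 't100', false],
  ['c2', 't10',  false],
  ['c2', 't100', false],
  ['c2', 't150', true],
  ['c2', 't500', true]
] AS row
// MERGE keeps one node per card number and one per terminal (t100 is shared).
MERGE (card:Card {Cardno: row[0]})
MERGE (terminal:Terminal {Terminalno: row[1]})
// One Transaction relationship per listed row.
CREATE (card)-[:Transaction {isFraud: row[2]}]->(terminal)
```

With this data, a lookup for c1 and c2 restricted to isFraud = false should match four relationships: c1 to t1, c1 to t100, c2 to t10, and c2 to t100.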

Now, given the input c1 and c2, I want to retrieve the relations whose card number is c1 or c2 and for which isFraud = false.
Here is my Cypher:

WITH {batch_list} AS batch UNWIND batch AS row
MATCH (card)-[transact:Transaction]->(anotherTerminal)
WHERE card.Cardno = row[0] 
AND transact.isFraud = false 
WITH card, transact, anotherTerminal RETURN *

The above query takes 12 hours or more if I pass 200 card numbers in batch_list, so displaying the graph is not possible. Still, if I can get the data, I can prepare reports from it.
Note that I was under the impression that the results would come back within 3 to 5 minutes and that I could display them as a graph.

Kindly let me know what I am doing wrong. I would really appreciate any help.

1 ACCEPTED SOLUTION

MuddyBootsCode
Graph Steward

Hello, welcome to the community. A basic step toward a performant query is making sure your schema has indexes. The issue I see immediately with your query is that the card node in your MATCH has no label, so no index can be used and you're searching for the pattern across the whole graph every single time before the card number is used. I think if you have your indexes set up correctly, i.e. your card number is a unique key, you'd probably have a much faster query doing something like:

WITH {batch_list} AS batch UNWIND batch AS row
MATCH (c:Card {Cardno: row[0]})-[transact:Transaction]->(anotherTerminal)
WHERE transact.isFraud = false
RETURN c, transact, anotherTerminal

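This looks up each card by its number first and then expands only that card's Transaction relationships, rather than matching the pattern first. You can also find good information in the query tuning docs: https://neo4j.com/docs/cypher-manual/current/query-tuning/.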
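
As a rough sketch of the index setup mentioned above, the statements below create a uniqueness constraint on the card number and then inspect the query plan. They assume the label is `Card` and the property is `Cardno`, as in the question, and use the Neo4j 3.x syntax that matches the `{batch_list}` parameter style in this thread. On Neo4j 4.4 and later the constraint is written as `CREATE CONSTRAINT FOR (c:Card) REQUIRE c.Cardno IS UNIQUE`, and parameters are written as `$batch_list`. Run each statement separately if your client does not accept several at once.

```cypher
// Neo4j 3.x syntax. A uniqueness constraint also creates the index
// that backs lookups by Cardno. It fails if duplicate card numbers exist.
CREATE CONSTRAINT ON (c:Card) ASSERT c.Cardno IS UNIQUE;

// Alternative if card numbers are not guaranteed to be unique:
// CREATE INDEX ON :Card(Cardno);

// EXPLAIN shows the plan without executing the query.
// A literal list stands in for the batch parameter here.
EXPLAIN
UNWIND [['c1'], ['c2']] AS row
MATCH (c:Card {Cardno: row[0]})-[transact:Transaction]->(anotherTerminal)
WHERE transact.isFraud = false
RETURN c, transact, anotherTerminal;
```

If the index is being used, the plan should start from an index seek operator such as NodeUniqueIndexSeek or NodeIndexSeek. An AllNodesScan or NodeByLabelScan at the bottom of the plan suggests the card lookup is still scanning.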

2 REPLIES

Thanks Michael. Your suggestion really helps in optimising the query and improving the performance.