Neo4j

rich2 · 04-07-2022

Hi all,

Newbie question here about modeling so thanks in advance for any help. I'm building an ontology laying out our company's information and I have what I think is a simple question but I can't convince myself to go one way or the other.

Here is the graph from the GraphGists Fraud Detection article.

My question is around the "HAS_*" relationships. Why is having "HAS_CREDITCARD", "HAS_BANKACCOUNT", and "HAS_ADDRESS" better than simply having a single "HAS" type that points to things like CREDITCARD, BANKACCOUNT, and ADDRESS?

In other words, why the specificity when I can easily tell from the associated node type?

I can certainly understand the need for the additional relationship types if there are properties on the relationship that make it specific to the end node type. For example, if the HAS_BANKACCOUNT has a property on it that identifies the branch or something associated with the link.

But in general, is it better to have more specific or more general relationships if possible?

Thanks for your help,
Rich

glilienfield · 04-07-2022

One reason I can think of is performance. Let’s say you want to know the number of bank accounts a person has. You can write the following query (ignoring the ‘where’ condition that would select a specific account holder):

MATCH (n:AccountHolder)-[r:HAS_BANK_ACCOUNT]->()
RETURN count(r)

If you used only ‘HAS’ relationships, you would need to specify a bank account node on the other end, as follows:

MATCH (n:AccountHolder)-[r:HAS]->(:BankAccount)
RETURN count(r)

So what is the difference? In the first case, only the relationship types on the account holder node need to be checked to find the HAS_BANK_ACCOUNT relationships for the count. In the latter case, the end node of each HAS relationship also has to be read to determine whether it carries the BankAccount label. That means more processing and more data retrieval.
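If you want to check this against your own data, one option is to prefix each query with PROFILE and compare the db hits reported for each plan. This is only a sketch: it assumes the AccountHolder and BankAccount labels and the relationship types used above, each query is run on its own against a graph modeled that way, and the actual numbers will depend on your data and Neo4j version.

// run against the model with specific relationship types
PROFILE
MATCH (n:AccountHolder)-[r:HAS_BANK_ACCOUNT]->()
RETURN count(r)

// run separately, against the model with a single generic HAS type
PROFILE
MATCH (n:AccountHolder)-[r:HAS]->(:BankAccount)
RETURN count(r)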

The same holds for any complex match pattern: the traversal can decide which of a node's relationships to follow by their type alone, instead of having to inspect the end node of each one to decide which paths to include in the match.
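As an illustration of a longer pattern, here is a sketch that looks for pairs of account holders sharing an address. The HAS_ADDRESS type comes from the question; the Address label and the variable names are assumptions, so adjust them to your model.

// specific types: only HAS_ADDRESS relationships are followed, no label needed on the middle node
MATCH (a:AccountHolder)-[:HAS_ADDRESS]->(addr)<-[:HAS_ADDRESS]-(b:AccountHolder)
WHERE a <> b
RETURN a, b, addr

// generic type: every HAS relationship is followed, then the end node is checked for the Address label
MATCH (a:AccountHolder)-[:HAS]->(addr:Address)<-[:HAS]-(b:AccountHolder)
WHERE a <> b
RETURN a, b, addr

In the second form, an account holder with many credit cards and bank accounts contributes all of those HAS relationships to the expansion before the label filter discards them.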

It can also make the code more understandable. Let’s assume you want the account holders that have a bank account. You could write the following query for each scenario:

MATCH (n:AccountHolder)
WHERE exists( (n)-[:HAS_BANK_ACCOUNT]->() )
RETURN n

Instead of

MATCH (n:AccountHolder)
WHERE exists( (n)-[:HAS]->(:BankAccount) )
RETURN n

Of the two reasons, I think performance is the more important one.

