Neo4j

fabio · ‎08-28-2022

TL;DR: Is there a way to create a node (or, better said, a hyperedge) only if one doesn't already exist with exactly the specified set of relationships?

The long version: suppose you have the following graph model, which describes different attacks. Each attack is related to its consequences (i.e. one "Attack" can have multiple "Consequence" nodes). Each consequence is in turn related to particular aspects such as security properties or impacts.

1)

So, to create this particular example I would do something like this (assuming the "Attack" node already exists):

MATCH (a:Attack {name:"Attack1"})
MERGE (i:Impact {name:"Execute unauthorized code"})
MERGE (p1:Property {name:"Confidentiality"})
MERGE (p2:Property {name:"Availability"})
MERGE (c:Consequence)-[:HAS_IMPACT]->(i)
MERGE (c)-[:AFFECTS]->(p1)
MERGE (c)-[:AFFECTS]->(p2)
MERGE (a)-[:HAS_CONSEQUENCE]->(c)

Now, suppose you want to add a second attack like this one:

2)

If I run the following Cypher code I don't get the expected result:

MATCH (a:Attack {name:"Attack2"})
MERGE (i:Impact {name:"Execute unauthorized code"})
MERGE (p:Property {name:"Confidentiality"})
MERGE (c:Consequence)-[:HAS_IMPACT]->(i)
MERGE (c)-[:AFFECTS]->(p)
MERGE (a)-[:HAS_CONSEQUENCE]->(c)

Basically, because the consequence of "Attack2" is a subset of the consequence of "Attack1", MERGE matches the existing "Consequence" node and I get this graph:

3)

To solve this problem I could use "CREATE" instead of "MERGE" when creating the "Consequence" node, but even this solution is not perfect, since it will create a new node every time, even if the right one already exists (by "right one" I mean the one that already has exactly the same properties/impacts related to it).
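For reference, the "CREATE" variant described above might look like the following. This is only a sketch that reuses the labels and relationship types from the queries above; it shows the drawback mentioned, since running it twice produces two identical "Consequence" nodes.

```cypher
MATCH (a:Attack {name: "Attack2"})
// Impact and Property nodes are still shared, so MERGE is fine for them
MERGE (i:Impact {name: "Execute unauthorized code"})
MERGE (p:Property {name: "Confidentiality"})
// CREATE always makes a brand-new Consequence, even if an identical one exists
CREATE (c:Consequence)
CREATE (c)-[:HAS_IMPACT]->(i)
CREATE (c)-[:AFFECTS]->(p)
MERGE (a)-[:HAS_CONSEQUENCE]->(c)
```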

So, I was wondering if there's a specific approach to this kind of problem/situation.

glilienfield · ‎08-28-2022

Are you stating that you want to use an existing 'Consequence' node if it has the same set of 'impact' and 'property' nodes as the new 'Attack' node will have? In your example, Attack1 is related to all three nodes, while Attack2 is related to only two nodes, thus resulting in a new Consequence node?

fabio · ‎08-28-2022

Yeah, that's exactly my goal!

glilienfield · ‎08-28-2022

That is complicated. As an input to the query, can you provide a list of the 'impact' and 'property' nodes and the properties that can be used to determine whether they exist? Will there be additional properties to persist with these nodes beyond the ones used for matching? If so, they need to be differentiated.

fabio · ‎08-29-2022

At creation time I can easily provide a list with all the properties of the "impact" and "property" nodes related to a particular "consequence". Querying the graph is a different story, since I will only have the "attack" name (which is unique) and I would like to retrieve those nodes.

The only property that I didn't mention before and that might be present is a "description" on the "consequence" node. I say "might" because it's not always provided; sadly, in most cases it is missing.

glilienfield · ‎08-29-2022

I thought you wanted help with the query to build the graph, given a new Attack node. The query to get the consequence and other nodes for a given attack is straightforward. Is this what you were referring to?

What information is needed to identify the impact and property nodes? Is it just the label and the name, or are there more properties to match? Performance will degrade with more properties to match on.
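As an illustration of the retrieval side, a minimal sketch could look like this. It assumes the labels and relationship types used earlier in the thread and a hypothetical `$attack` parameter holding the unique attack name.

```cypher
// $attack is an assumed query parameter, e.g. "Attack1"
MATCH (a:Attack {name: $attack})-[:HAS_CONSEQUENCE]->(c:Consequence)
// OPTIONAL MATCH keeps consequences that have no impact or no property
OPTIONAL MATCH (c)-[:HAS_IMPACT]->(i:Impact)
OPTIONAL MATCH (c)-[:AFFECTS]->(p:Property)
RETURN c,
       collect(DISTINCT i.name) AS impacts,
       collect(DISTINCT p.name) AS properties
```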

Will the impact and property nodes exist or do they need to be created if they don’t exist?

If you have the description of the Consequence node and all had descriptions, would that be enough to know you had one that could be used or need to create a new one? Or, do you always need to compare the Impact and Property nodes?

fabio · ‎08-29-2022

Yeah, you're right, the help I need is with the query to build the nodes. I also mentioned retrieving them because I was misled by the term "query" in your last post, my bad 😅 (and I totally agree with you that retrieving them is quite easy given an "attack" node).

What information is needed to identify the impact and property nodes? Is it just the label and the name, or are there more properties to match? Performance will degrade with more properties to match on.

I have just the label and the name for those nodes, so no additional properties (sadly).

Will the impact and property nodes exist or do they need to be created if they don’t exist?

Yeah, if they don't exist they need to be created.

If you have the description of the Consequence node and all had descriptions, would that be enough to know you had one that could be used or need to create a new one? Or, do you always need to compare the Impact and Property nodes?

No, I don't think the "description" will be enough to decide that. I think the second way is better, especially because the "description" is only provided for a handful of attacks.

glilienfield · ‎08-29-2022

So, we will have something like this to define the query inputs:

{
  "attack": "attack1",
  "nodes": [
    {
      "label": "Impact",
      "name": "name1"
    },
    {
      "label": "Property",
      "name": "name2"
    },
    {
      "label": "Property",
      "name": "name3"
    }
  ]
}
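One way such inputs could drive the build query is sketched below. It is not a definitive answer, only an illustration under stated assumptions: the parameter names `$attack` and `$nodes` mirror the JSON above, only the "Impact" and "Property" labels occur, and the `CALL { ... }` subquery form requires Neo4j 4.1 or later. The idea is to reuse a "Consequence" only when its outgoing HAS_IMPACT/AFFECTS links point to exactly the requested nodes, and to create a new one otherwise.

```cypher
// Assumed parameters: $attack (string), $nodes (list of {label, name} maps)
MATCH (a:Attack {name: $attack})
// Make sure every requested node exists; one FOREACH per label,
// since labels are written literally in these MERGE clauses
FOREACH (n IN [x IN $nodes WHERE x.label = "Impact"] | MERGE (:Impact {name: n.name}))
FOREACH (n IN [x IN $nodes WHERE x.label = "Property"] | MERGE (:Property {name: n.name}))
WITH a
// Collect the requested target nodes
MATCH (t)
WHERE (t:Impact OR t:Property)
  AND any(n IN $nodes WHERE n.label IN labels(t) AND n.name = t.name)
WITH a, collect(t) AS targets
// Look for a Consequence linked to exactly these targets:
// same number of links, and every target is linked
OPTIONAL MATCH (c:Consequence)
WHERE size([(c)-[:HAS_IMPACT|AFFECTS]->(x) | x]) = size(targets)
  AND all(t IN targets WHERE exists((c)-[:HAS_IMPACT|AFFECTS]->(t)))
WITH a, targets, head(collect(c)) AS existing
CALL {
  // Branch 1: no exact match found, so create a new Consequence
  WITH existing, targets
  WITH existing, targets WHERE existing IS NULL
  CREATE (c:Consequence)
  FOREACH (t IN [x IN targets WHERE x:Impact] | MERGE (c)-[:HAS_IMPACT]->(t))
  FOREACH (t IN [x IN targets WHERE x:Property] | MERGE (c)-[:AFFECTS]->(t))
  RETURN c
  UNION
  // Branch 2: reuse the exact match
  WITH existing
  WITH existing WHERE existing IS NOT NULL
  RETURN existing AS c
}
MERGE (a)-[:HAS_CONSEQUENCE]->(c)
RETURN c
```

Note that the exact-match step scans all "Consequence" nodes, so this sketch favors clarity over performance on large graphs.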

fabio · ‎08-29-2022

Yeah, that sounds correct!


[Cypher] Creating an hyperedge with exact series of relationships