Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
11-21-2022 06:57 AM
Hi, I've just started a pre-study project for a data lineage use case.
Right now I'm using a rather simple graph of data and process objects that have an impact on each other (the graph model consists of a single node label "Object" with one self-referencing relationship type "has_impact_on").
There are only 1,548 Object nodes and 3,352 has_impact_on relationships in the graph (albeit with some cycles), but my queries don't terminate, e.g.:
MATCH p=(n)-[*]->()
where n.Obj_MEMBER = 'CT_MAPPING'
RETURN p

MATCH p=(n)-[*]->(m)
where n.Obj_MEMBER = 'CT_MAPPING'
RETURN m.Obj_TYPE_CAT, m.Obj_MEMBER, length(p), [x in nodes(p)|x.Obj_ID]
Could the cycles be the problem? Are there better alternatives, or is it because I'm currently testing on a free Aura instance?
Thanks a lot for your help.
Best regards, Thomas
11-21-2022 07:42 AM
Given the usage of
where n.Obj_MEMBER = 'CT_MAPPING'
this would lend itself to an index on Obj_MEMBER. But to create one we would need to know which label this property is associated with. For example, if the property `Obj_MEMBER` is associated with a label named `Label101`, then create an index as
CREATE INDEX FOR (n:Label101) ON (n.Obj_MEMBER);
and then also change the query to reference this label, thus:
MATCH p=(n:Label101)-[*]->()
where n.Obj_MEMBER = 'CT_MAPPING'
RETURN p
Your current Cypher is going to walk the entire graph and expand all paths.
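To see why "expand all paths" hurts even on a small graph with cycles, here is a minimal Python sketch (not Neo4j code). It assumes that `(n)-[*]->()` returns one row per distinct path and that, as in Cypher pattern matching, a relationship may appear at most once within a single path while nodes may repeat. `complete_digraph`, `count_trails` and `reachable` are hypothetical helpers for illustration; the breadth-first search is shown only to contrast "every path" with "every reachable node".

```python
from collections import deque

def complete_digraph(n):
    """Hypothetical test graph: one relationship from every node to every other node."""
    adj = {v: [] for v in range(n)}
    rel_id = 0
    for a in range(n):
        for b in range(n):
            if a != b:
                adj[a].append((rel_id, b))  # (relationship id, target node)
                rel_id += 1
    return adj

def count_trails(adj, start):
    """Count the paths a pattern like (n)-[*]->() would return from `start`.

    Assumes Cypher-style matching: each path is one row, a relationship may
    appear at most once per path, but nodes (and therefore cycles) may repeat.
    """
    used = set()

    def extend(node):
        total = 0
        for rel_id, nxt in adj[node]:
            if rel_id not in used:
                used.add(rel_id)
                total += 1 + extend(nxt)  # this path, plus all its extensions
                used.remove(rel_id)
        return total

    return extend(start)

def reachable(adj, start):
    """Breadth-first search: each node is visited once, with its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _, nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

g = complete_digraph(3)      # 3 nodes, 6 relationships
print(count_trails(g, 0))    # 32
print(reachable(g, 0))       # {0: 0, 1: 1, 2: 1}
```

With only 6 relationships, one start node already yields 32 paths but just 3 distinct nodes. Each additional cycle multiplies the number of paths, while the breadth-first search still touches every node once.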
11-22-2022 03:25 AM - edited 11-23-2022 09:01 AM
Thanks a lot! The nodes' label is "Object", so I've created an index using this statement:
CREATE INDEX IdxObjectMember
FOR (n:Object)
ON (n.Obj_MEMBER)
Then I slightly modified my query by adding the label to n, following your recommendation (I hope):
MATCH p=(n:Object)-[*]->()
where n.Obj_MEMBER = 'CT_MAPPING'
RETURN p LIMIT 1000
May I ask another noob question? I'm a little bit confused:
As far as I understand, Neo4j uses simple pointer operations to traverse relationships in the graph.
Why should an index on nodes improve execution if, as in my case, a unique starting point for the traversal is provided (here: Object.Obj_MEMBER = 'CT_MAPPING')?
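The question mixes two separate costs: locating the start node, and walking relationships from it. A minimal Python sketch with hypothetical data (a dict stands in for the real index, whose internal structure differs) suggests that an index only replaces the first step; the traversal itself is unaffected.

```python
from collections import defaultdict

# Hypothetical toy store: node id -> (label, Obj_MEMBER value)
nodes = {
    0: ("Object", "CT_MAPPING"),
    1: ("Object", "CT_SOURCE"),
    2: ("Object", "CT_TARGET"),
    3: ("Other", "CT_MAPPING"),
}
# Relationships as adjacency lists: from a known node, its neighbours
# are read directly (the "pointer" part of the question).
has_impact_on = {0: [1], 1: [2], 2: [0], 3: []}

def find_by_scan(value):
    """Without an index, finding the start node means inspecting every node."""
    hits, examined = [], 0
    for node_id, (_label, member) in nodes.items():
        examined += 1
        if member == value:
            hits.append(node_id)
    return hits, examined

def build_index(label):
    """A property index for one label: Obj_MEMBER value -> node ids."""
    index = defaultdict(list)
    for node_id, (lbl, member) in nodes.items():
        if lbl == label:
            index[member].append(node_id)
    return index

def neighbours(node_id):
    """Traversal step: no search is needed once you stand on a node."""
    return has_impact_on[node_id]

print(find_by_scan("CT_MAPPING"))        # ([0, 3], 4)
idx = build_index("Object")
print(idx["CT_MAPPING"])                 # [0]
print(neighbours(idx["CT_MAPPING"][0]))  # [1]
```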
Thanks a lot for your help and understanding 🙂
11-22-2022 05:15 AM
@harmonie_shorts the original
MATCH p=(n)-[*]->()
where n.Obj_MEMBER = 'CT_MAPPING'
RETURN p
effectively says `match p=(n) .....`, i.e. look at every node. If your graph has 100 million nodes but only 20k have the label `:Object`, then we look at all 100 million nodes, whereas `match p=(n:Object)....` says to look only at the 20k nodes with the label `:Object`.
Further, if you create an index on `:Object` as described in the posts above, then we use the index to find which nodes have the given property value. And since this index has a B-tree-like form, with 20k `:Object` nodes even descending to the bottom leaf of the index takes only a very small number of comparisons (i.e. with 1 comparison you can eliminate 10k entries, with 2 comparisons an additional 5k, with 3 comparisons another 2.5k, etc.).
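The halving argument above is the binary-search picture; a real B-tree stores many keys per node, so it needs even fewer levels. A minimal Python sketch counting probes over a hypothetical sorted list of 20,000 keys:

```python
def binary_search_steps(sorted_keys, target):
    """Classic binary search; returns (found, number of probes)."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1  # each probe roughly halves the remaining range
        if sorted_keys[mid] == target:
            return True, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, probes

keys = list(range(20_000))  # hypothetical stand-in for 20k index entries
worst = max(binary_search_steps(keys, k)[1] for k in keys)
print(worst)  # 15
```

So locating any one of 20,000 entries takes at most 15 probes, against up to 20,000 checks when every node is examined one by one.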
11-22-2022 05:19 AM - edited 11-22-2022 08:02 AM
Hmm... but I've only got 1,550 Object nodes and 3,350 relationships?
Here's the simple graph model with the query and index; it's been running for 4.5 hours now.
What's wrong?
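A graph of this size can still contain an enormous number of paths. As a minimal Python sketch on a hypothetical toy graph (not the data in this thread): chaining k "diamonds" (a node that fans out to two nodes, which both lead to the next node) uses 3k + 1 nodes and 4k relationships, yet the number of paths starting at the first node is 2^(k+2) - 4, without a single cycle. `diamond_chain` and `count_paths_from` are illustrative names; the counting uses dynamic programming with `graphlib` (Python 3.9+), so it only works on acyclic graphs.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def diamond_chain(k):
    """Hypothetical toy graph: k diamonds in a row, as adjacency lists."""
    adj = {}
    for i in range(k):
        j, a, b, nxt = f"j{i}", f"a{i}", f"b{i}", f"j{i + 1}"
        adj.setdefault(j, []).extend([a, b])  # fan out to two nodes
        adj[a] = [nxt]                        # both branches rejoin
        adj[b] = [nxt]
    adj.setdefault(f"j{k}", [])
    return adj

def count_paths_from(adj, start):
    """Count non-empty paths starting at `start` without enumerating them.

    Only valid for acyclic graphs: TopologicalSorter raises CycleError otherwise.
    """
    paths_from = {}
    # Passing successor lists as the "predecessor" mapping makes
    # static_order() emit every node after all of its successors.
    for v in TopologicalSorter(adj).static_order():
        # One path per outgoing relationship, plus every continuation.
        paths_from[v] = sum(1 + paths_from[w] for w in adj.get(v, []))
    return paths_from[start]

g = diamond_chain(500)
print(len(g), sum(len(out) for out in g.values()))  # 1501 2000
print(count_paths_from(g, "j0") == 2 ** 502 - 4)    # True
```

The sketch can count the paths without listing them, but a query that returns every path has to produce each one as a row, and cycles in the real graph only add more.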
11-22-2022 11:45 PM - edited 11-22-2022 11:46 PM
Here's the EXPLAIN plan for the query:
11-22-2022 05:10 AM
PS: It didn't quite work as expected; either my approach is still incorrect, or something else is missing:
This Cypher query still hasn't terminated after more than 1 hour: