Neo4j

undefined21 · 05-25-2021

Hi, I am quite new to this technology, and I would like to extract the following session nodes (the blue ones) from my small graph.

What I want is to get the sessions between 2021-05-20 and 2021-05-28, so I execute the following query:

MATCH (u:Profile { uuid: "xxxxxxxxx" })-[rel:HAS_AUDIT]->(a:Audit)
MATCH 
  (a)-[:HAS_YEAR]->(:Year {year: 2021})-[:HAS_MONTH]->(:Month {month: 5})-[:HAS_DAY]->(:Day {day: 20})-[:ACTIVITY_AT]->(s1:Session),
  (a)-[:HAS_YEAR]->(:Year {year: 2021})-[:HAS_MONTH]->(:Month {month: 5})-[:HAS_DAY]->(:Day {day: 28})-[:ACTIVITY_AT]->(s2:Session),
  (s1)-[:NEXT*]->(d)-[:NEXT*]->(s2)
RETURN d

But because the day nodes for the 20th and the 28th do not exist, I don't get any nodes back. If I change 20 to 24 and 28 to 27, the result is the same: no nodes come back.

Can anyone shed light on why this is happening?

andrew_bowman · 05-25-2021

If no such :Day nodes exist, then this is expected.

We can simplify the query:

WITH range(20, 28) as days
MATCH (u:Profile { uuid: "xxxxxxxxx" })-[rel:HAS_AUDIT]->(a:Audit)
MATCH (a)-[:HAS_YEAR]->(:Year {year: 2021})-[:HAS_MONTH]->(:Month {month: 5})-[:HAS_DAY]->(d:Day)-[:ACTIVITY_AT]->(s:Session)
WHERE d.day IN days
RETURN s

If you need the nodes ordered, order them by whatever datetime property exists on the :Session nodes before returning. If you don't want the sessions on the start and end days, filter those out after matching the sessions but before returning.
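A minimal sketch of both refinements on the existing time tree, assuming each :Session carries a datetime property named `activityAt` (the name that comes up later in this thread; adjust it to your model). Exclusive bounds on `d.day` drop the start and end days; switch to `>=` and `<=` to keep them.

```cypher
// Sketch: sessions strictly between May 20 and May 28, 2021, ordered by time.
// Assumes :Session nodes have a datetime property `activityAt`.
MATCH (:Profile {uuid: "xxxxxxxxx"})-[:HAS_AUDIT]->(a:Audit)
MATCH (a)-[:HAS_YEAR]->(:Year {year: 2021})-[:HAS_MONTH]->(:Month {month: 5})
      -[:HAS_DAY]->(d:Day)-[:ACTIVITY_AT]->(s:Session)
// Exclusive bounds drop the start and end days; use >= and <= to keep them.
WHERE d.day > 20 AND d.day < 28
RETURN s
ORDER BY s.activityAt
```

Filtering on the day number rather than on consecutive day nodes means missing :Day nodes no longer cause an empty result.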

In any case, it would be far simpler to ditch the time tree and use indexed temporal properties; the query becomes much easier.
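For comparison, a minimal sketch of the indexed temporal property approach, assuming Neo4j 4.x index syntax and the same hypothetical datetime property `activityAt` on :Session. Note that this version is not scoped to a single audit or profile.

```cypher
// Run once, separately from data queries: index the session timestamp.
CREATE INDEX session_activity_at IF NOT EXISTS
FOR (s:Session) ON (s.activityAt);

// Range predicate the planner can answer with an index seek.
// Half-open interval: includes all of May 28, excludes May 29.
MATCH (s:Session)
WHERE s.activityAt >= datetime("2021-05-20T00:00:00Z")
  AND s.activityAt <  datetime("2021-05-29T00:00:00Z")
RETURN s
ORDER BY s.activityAt;
```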

undefined21 · 05-25-2021

So the solution might be to redesign the graph rather than to change the query.

But isn't it more expensive to set an index on a session property than to traverse through the nodes as we do now (year => month => day)? Also, wouldn't it be too messy to have all the sessions hanging from a single Audit node, and wouldn't we lose one of the strengths of graph databases: the visualisation? These are all assumptions on my part, and I could be wrong.

The questions I would ask of that graph are:

Get each user's last session
Get user sessions between dates
Get user session events (API requests, GET, POST, DELETE,...)
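To make the first of these questions concrete, a sketch under an index-based model. It assumes a hypothetical `(:Audit)-[:HAS_SESSION]->(:Session)` relationship and the hypothetical `activityAt` datetime property; neither is part of the original graph.

```cypher
// Sketch: each profile's most recent session.
// HAS_SESSION and activityAt are hypothetical names for illustration.
MATCH (u:Profile)-[:HAS_AUDIT]->(:Audit)-[:HAS_SESSION]->(s:Session)
WITH u, s
ORDER BY s.activityAt DESC
// After sorting newest first, the first collected session is the latest.
WITH u, collect(s)[0] AS lastSession
RETURN u.uuid AS profile, lastSession
```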

As you suggested, would this be the new graph?

What do you think?

Thanks for your time!

andrew_bowman · 05-26-2021

Ah, I see. I confess I had overlooked that this was specific to a given audit.

It could still work if each :Session node also stored the Audit or Profile id/uuid; then you could create a composite index on that id and the timestamp. Your lookup would then find only the sessions in the given date range associated with that audit or profile.
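A sketch of that composite index, assuming a hypothetical `auditUuid` property copied onto each :Session and the hypothetical `activityAt` timestamp. Whether the planner can use the composite index for the range part of the predicate depends on the Neo4j version, so it is worth checking the plan with PROFILE.

```cypher
// Run once: composite index on the owning audit's id plus the timestamp.
// auditUuid is a hypothetical property; activityAt a hypothetical datetime.
CREATE INDEX session_audit_time IF NOT EXISTS
FOR (s:Session) ON (s.auditUuid, s.activityAt);

// Equality on the first indexed property, range on the second.
MATCH (s:Session)
WHERE s.auditUuid = $auditUuid
  AND s.activityAt >= datetime("2021-05-20T00:00:00Z")
  AND s.activityAt <  datetime("2021-05-29T00:00:00Z")
RETURN s
ORDER BY s.activityAt;
```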

If all :Session nodes are connected to the :Audit, then you could do without the index and just MATCH from audit to connected sessions and filter to those with activityAt within the given date range.
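A sketch of the index-free variant, assuming a hypothetical `HAS_SESSION` relationship from :Audit to :Session. Its cost grows with the number of sessions attached to that one audit, since each of them is expanded and checked against the range.

```cypher
// Sketch: expand from the audit to its sessions, then filter by time.
// HAS_SESSION is hypothetical; activityAt is a hypothetical datetime.
MATCH (:Profile {uuid: "xxxxxxxxx"})-[:HAS_AUDIT]->(a:Audit)
MATCH (a)-[:HAS_SESSION]->(s:Session)
WHERE s.activityAt >= datetime("2021-05-20T00:00:00Z")
  AND s.activityAt <  datetime("2021-05-29T00:00:00Z")
RETURN s
ORDER BY s.activityAt
```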

If a visualization of your data including the time tree is important to you, then you can stick with your existing model, the query I provided should work for it.

undefined21 · 05-28-2021

I think I get the point.

Now suppose we have millions of sessions. In one design, each session has its own timestamp (which would be indexed) and is related to its own audit node; in the other, the sessions are connected through the time tree as in the design above (first post).

Now I would like to discuss performance a bit:

Query: Search sessions between dates

Wouldn't the second option (the time-tree design above) be faster, because we just follow the path? Or would it be more expensive, because we have to check a property at each node level (.year, .month, .day) to reach the desired sessions?
Meanwhile, with indexes on the session nodes, and assuming those indexes are in memory, we would go straight to those nodes and get the desired sessions faster.
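One way to settle this empirically is to prefix each candidate query with PROFILE and compare db hits and rows on realistic data volumes. A sketch, reusing the hypothetical `activityAt` property; the indexed version here is not scoped to one audit, so scope it (for example with a composite index) before comparing like for like.

```cypher
// Time-tree version: expect Expand operators walking each tree level.
PROFILE
MATCH (:Profile {uuid: "xxxxxxxxx"})-[:HAS_AUDIT]->(a:Audit)
MATCH (a)-[:HAS_YEAR]->(:Year {year: 2021})-[:HAS_MONTH]->(:Month {month: 5})
      -[:HAS_DAY]->(d:Day)-[:ACTIVITY_AT]->(s:Session)
WHERE d.day >= 20 AND d.day <= 28
RETURN count(s);

// Indexed version: look for a NodeIndexSeekByRange operator in the plan.
PROFILE
MATCH (s:Session)
WHERE s.activityAt >= datetime("2021-05-20T00:00:00Z")
  AND s.activityAt <  datetime("2021-05-29T00:00:00Z")
RETURN count(s);
```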

Thanks for your time!

andrew_bowman · 06-02-2021

Presumably the execution time for the traversals through the time tree would be bounded, since there would be a limited number of years to filter, a year has at most 12 month nodes, and a month has at most 31 day nodes.

I don't know how many activities per day you're anticipating at the high end, though, so that is worth considering. Also, each audit has its own time tree, which keeps its data separate from other audits' data; that is good.

Probably the biggest impact is the number of nodes needed to maintain a time tree per audit in your graph. The standard store format allows about 34 billion nodes and 34 billion relationships, so you may want to estimate the rate of growth as data is added; if that approaches the limits, you may need to convert to the high limit format, which is effectively unbounded.
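A sketch for tracking that growth: simple counts like these are answered from Neo4j's count store rather than by scanning, so they stay cheap as the graph grows. Running them periodically gives a rough growth rate to compare against the store-format limits.

```cypher
// Each statement is served from the count store, not a full scan.
MATCH (n) RETURN count(n) AS totalNodes;
MATCH ()-[r]->() RETURN count(r) AS totalRelationships;
MATCH (d:Day) RETURN count(d) AS dayNodes;
MATCH (s:Session) RETURN count(s) AS sessionNodes;
```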

Index lookup cost grows logarithmically with the number of entries in the index, so keep that in mind; also consider whether the indexed properties are enough to pinpoint the relevant nodes or whether additional filtering is needed. Comparatively, an index lookup is easier to express in Cypher.
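A rough way to state the trade-off, as a back-of-the-envelope estimate rather than a measured cost model. The symbols are introduced here for illustration: $Y$ is the number of :Year nodes under one audit, $S$ the number of sessions in the requested range, and $N$ the total number of entries in the session index. Reaching the :Day nodes for a range within one month expands at most $Y + 12 + 31$ relationships before touching any session, while an index seek grows roughly logarithmically with $N$:

$$
C_{\text{tree}} \lesssim Y + 12 + 31 + S,
\qquad
C_{\text{index}} \approx O(\log N) + S
$$

Both designs pay for the $S$ sessions actually returned; they differ in how they get to the first one.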

undefined21 · 06-03-2021

Thanks, Andrew, that was a helpful answer.

One weak point I see with the time-tree design is that, for each user, I would most probably duplicate the year and month nodes, because users will access the system at least once a month. In that case, the indexed-session approach would need fewer nodes and relationships to track all the session events.

From a performance standpoint, the index design again looks like the way to go. If lookup cost grows logarithmically with the number of index entries, each additional session has a diminishing impact on performance, while in the time-tree design we would need more traversals to get the desired sessions.

Thanks for your time, I really appreciate it.

Neo4j TimeTrees cannot extract the result