Neo4j

mojo2go · 01-31-2020

I need some help with a model.

I want to model walking trails (or paths) in Neo4j where trails can cross one another at points. Where the trails overlap, the points are simply shared by both. In fact a 'run' or sequence of points can be shared between trails, as shown with the blue trail and yellow trail below. Each trail would have a name, and the user should be able to know the distance to the next point of the trail she is following (the trail she started on). The model is meant for a mobile app that tracks the user's progress along a trail, keeps her from accidentally switching trails, and shows opportunities to explore a different trail.

Also, a user should be able to add or record her own trails using existing points plus one or more new points. There are thousands of trails, and that number will grow as users create their own. These all need to coexist in the graph.

Below is a white-board style drawing of how two trails might intersect, but the drawing does not necessarily represent how I would model it in Neo4j.

What is the best way to model this?

mike_r_black · 02-03-2020

I think I'd model it exactly how you've white-boarded it. Each node would be a distinct point along the route. I'd take advantage of Neo4j's spatial point type for latitude and longitude so you can use the distance functions. The relationships build a linked list of nodes that defines a trail. I'd keep the meat of the trail information in the nodes (lat & long, etc.) and keep the attributes on the relationships light. Then if you need to introduce a third trail that splits off, you just insert a node with its lat & long into the linked list and branch off from that node.

You could apply secondary labels to the nodes that participate in a named trail, since nodes can have many labels. Or you could apply a secondary label to just the starting and ending nodes and let the match query return all the nodes in between. Or you can introduce a different node type that contains the trail name, description, etc., and points to the beginning and ending nodes. Not every node has to carry the same attributes, so whether you add extra information on the starting node or make a new type of node for the trail description comes down to personal preference and how your coding layer handles the data model.
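
A minimal sketch of that layout, assuming the labels Location and Trail, the relationship types NEXT, STARTS_AT and ENDS_AT, and a property called coords (all of these names and the coordinates are made up for illustration, not taken from the thread):

```cypher
// Hypothetical names and made-up coordinates throughout.
CREATE (a:Location {name: 'a', coords: point({latitude: 59.3293, longitude: 18.0686})})
CREATE (b:Location {name: 'b', coords: point({latitude: 59.3326, longitude: 18.0649})})
CREATE (c:Location {name: 'c', coords: point({latitude: 59.3360, longitude: 18.0620})})
// The relationships form the linked list and carry no properties.
CREATE (a)-[:NEXT]->(b)-[:NEXT]->(c)
// Optional separate node holding the trail name, pointing at both ends.
CREATE (t:Trail {name: 'Blue'})
CREATE (t)-[:STARTS_AT]->(a)
CREATE (t)-[:ENDS_AT]->(c);

// Distance in metres from each point to the next one in the list.
// point.distance() is the name in Neo4j 4.4 and later; earlier versions call it distance().
MATCH (p:Location)-[:NEXT]->(q:Location)
RETURN p.name AS from_point, q.name AS to_point,
       point.distance(p.coords, q.coords) AS metres;
```

Keeping the coordinates on the nodes means a shared point is stored once, however many trails pass through it.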

mojo2go · 02-16-2020

Thanks @mike.r.black. That first model does basically follow what normally works well for Neo4j, namely making the nodes nouns and the relationships verbs. So in this case it would have been something like (a:Location)-[:IS_FOLLOWED_BY]->(b:Location). But the overlapping paths really caused problems. Just like me walking a real trail, when it overlaps with another trail there's a chance I might follow the wrong path.

So I'm trying something I never have before: flipping the paradigm and making the relationships the nouns, with the nodes just connecting them. Take a look at the image below. It's just a different take on the one above. But now all yellow relationships share the same relationship id, so it's not a unique id, but it is unique to a trail. And now that we have the Lucene index option we can index these relationships.

Blue and yellow paths cross and share part of their paths, but internally they do not share the segments.

Locations are a ‘natural resource’ even in the graph. They are not duplicated, they are owned by nobody, and shared by all. Finding a trailhead or trail end would just be a pattern match: the trailhead has no incoming relationship carrying the trail id, and similarly the trail end has no outgoing relationship carrying that id.
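
Roughly what that could look like, as a sketch only. It assumes a single relationship type SEGMENT with the trail id stored in a trail_id property, and node names that mirror the diagram (node2 to node4 being the shared run); none of these names are fixed by the model:

```cypher
// Hypothetical names: Location, SEGMENT, trail_id.
CREATE (n1:Location {name: 'node1'}), (n2:Location {name: 'node2'}),
       (n3:Location {name: 'node3'}), (n4:Location {name: 'node4'}),
       (n5:Location {name: 'node5'}), (n6:Location {name: 'node6'}),
       (n7:Location {name: 'node7'})
// Yellow: node1 -> node2 -> node3 -> node4 -> node5
CREATE (n1)-[:SEGMENT {trail_id: 'yellow'}]->(n2)-[:SEGMENT {trail_id: 'yellow'}]->(n3)
       -[:SEGMENT {trail_id: 'yellow'}]->(n4)-[:SEGMENT {trail_id: 'yellow'}]->(n5)
// Blue: node6 -> node2 -> node3 -> node4 -> node7 (same nodes, separate segments)
CREATE (n6)-[:SEGMENT {trail_id: 'blue'}]->(n2)-[:SEGMENT {trail_id: 'blue'}]->(n3)
       -[:SEGMENT {trail_id: 'blue'}]->(n4)-[:SEGMENT {trail_id: 'blue'}]->(n7);

// Trailhead: has an outgoing yellow segment but no incoming one.
MATCH (head:Location)-[:SEGMENT {trail_id: 'yellow'}]->()
WHERE NOT ()-[:SEGMENT {trail_id: 'yellow'}]->(head)
// Follow only yellow segments until a node with no outgoing yellow segment.
MATCH path = (head)-[:SEGMENT* {trail_id: 'yellow'}]->(tail)
WHERE NOT (tail)-[:SEGMENT {trail_id: 'yellow'}]->()
RETURN [n IN nodes(path) | n.name] AS waypoints;
```

On this sample data the query should return a single row, node1 through node5, even though node2 to node4 also sit on the blue trail.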

Opinion(s) welcomed. I will try it out and maybe report back.

D_C · 07-05-2021

Can you just add a unique constraint for the trail on each of the nodes and edges? That way you get to keep the individual data points in the graph, but you can just do queries that pick out the waypoints if that's all you want.

inverting nodes to be the links seems a bit counter-intuitive?

I'm trying a similar thing to model user flows where multiple 'sessions' go through the same pages. The question is if I should aggregate in advance, and just put totals on each page and path into neo4j, or keep every single session and page view in the DB separately. not clear yet how much value the 2nd option gives despite exploding the data requirements and complicating queries.

mojo2go · 07-06-2021

Hi David,

It's a cool analogy and totally works; hikers walking from landmark to landmark vs users surfing between web pages.

I can elaborate more if this doesn't help, but in my example above, when you search for and return a relationship you also get both of its nodes. So the uniqueness need not be with the nodes...rather you use the uniqueness of the relationships to access them: relationships AND their nodes. In the diagram above, searching for either the blue trail or the yellow trail will return that middle series of connected nodes where the trails overlap--node2, node3, node4--along with the rest of the nodes of the yellow or blue trail.

But to directly answer your question, yes you can. And you have a couple of choices. A unique trail name would be best implemented as a label, and a node can have as many labels as you wish to give it. So node2 could be defined as CREATE (n:TrailBlue:TrailYellow {name: 'node2'}). That would allow you to use the same node for different trails. But then the next question is whether you want to store additional trail information. Since it's a graph you have infinite choices...an extra node hanging off of the trailhead node solely to hold metadata about that trail, or you could create a key:value pair of data for each trail a given node is part of.
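
As a rough sketch of the label option combined with per-trail key:value pairs, assuming hypothetical properties blue_order and yellow_order that record a node's position within each trail:

```cypher
// Hypothetical names: Location, TrailBlue, TrailYellow, blue_order, yellow_order.
CREATE (:Location:TrailBlue {name: 'node6', blue_order: 1})
CREATE (:Location:TrailYellow {name: 'node1', yellow_order: 1})
// node2 belongs to both trails, so it carries both labels and both positions.
CREATE (:Location:TrailBlue:TrailYellow {name: 'node2', blue_order: 2, yellow_order: 2});

// All waypoints of the blue trail, in trail order.
MATCH (n:TrailBlue)
RETURN n.name AS waypoint, labels(n) AS all_labels
ORDER BY n.blue_order;
```

The label picks out the trail's nodes; the sequence still has to come from somewhere, here from the per-trail property.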

I personally would resist aggregating in advance. It's cheap for Neo4j to run down the appropriate path and sum up the numbers for you on the fly. Exploding the model is often what allows you to expand what you use your graph for.

Your domain is a bit more complicated than mine; a user will likely hit the back button and pass over a web page multiple times in a session, and in that scenario you would probably want to know what the sequence was. The node labels alone won't tell you that. If you use the relationship TYPEs like I did above, as the unique trail (session) identifier, and then put a timestamp on each directed relationship as a property, then you can be sure you capture even the most gnarly websurfing sessions.
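
A small sketch of that idea, assuming one relationship type per session (SESSION_42 here) and a timestamp property called at; the Page label and url property are also just placeholders:

```cypher
// Hypothetical names: Page, url, SESSION_42, at.
CREATE (home:Page {url: '/home'}), (docs:Page {url: '/docs'})
// The visitor goes home -> docs, hits back, then goes to docs again.
CREATE (home)-[:SESSION_42 {at: datetime('2021-07-06T10:00:00Z')}]->(docs)
CREATE (docs)-[:SESSION_42 {at: datetime('2021-07-06T10:01:30Z')}]->(home)
CREATE (home)-[:SESSION_42 {at: datetime('2021-07-06T10:02:10Z')}]->(docs);

// Replay the session in the order the hops happened.
MATCH (a:Page)-[hop:SESSION_42]->(b:Page)
RETURN a.url AS from_page, b.url AS to_page, hop.at AS at
ORDER BY hop.at;
```

Because each hop is its own relationship, the repeated home -> docs step shows up twice rather than being collapsed.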

D_C · 07-06-2021

interesting idea using multiple labels to indicate the trail / session. I wonder how that would scale? I saw this thread that talked about the pros/cons of using route labels vs intermediate nodes

It just seems the name of the label is a 'key' whereas your trails are more like the 'value', so using labels in the way you're proposing seems like it would run out of steam, to say nothing about how indexes work?

For my project, with cypher I'm having trouble just getting back info on arbitrary sized paths with a different number of nodes in-between, as per:

If you're going with a node-first approach I'd be interested to see your queries to read that stuff back (and what driver you're using).

mojo2go · 07-06-2021

You make a good point regarding scaling. I don't know the whole of what you want to achieve but if you are recording visitor sessions then you're likely going to be storing data from millions of visits. That's probably not exactly perfect for labels. You could index a property that is the name of a particular visit. And for that matter you may want to use a combination of label and property to identify different visits if using a label can provide more context.

Regarding "node first approach"..... I assume it means that every time a user registers a page 'hit' during a visit it generates a new node for that hit. For me, it doesn't really take advantage of the connectedness of a graph.

Maybe you are looking for something that is no more than what a web log reporting tool does. You could do this entirely without relationships. Would it work to simply generate one node per visit, and in that node hold an array that records all the pages visited by the visitor (in sequence)? And you can store any other properties on that node, such as user_id, start-time, end-time. It's treating the 'hit' node as a document, as in a document store. Of course there's plenty of opportunity for using the graph features; for example, all sessions could be connected to their user node.
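
Sketched out, and assuming the names User, Visit, MADE, pages, started_at and ended_at (none of which are prescribed by anything above), that could be:

```cypher
// Hypothetical names: User, Visit, MADE, pages, started_at, ended_at.
MERGE (u:User {user_id: 'u-1'})
// One node per visit; the page sequence lives in a list property.
CREATE (v:Visit {
  started_at: datetime('2021-07-06T10:00:00Z'),
  ended_at:   datetime('2021-07-06T10:12:00Z'),
  pages: ['/home', '/docs', '/home', '/pricing']
})
CREATE (u)-[:MADE]->(v);

// Read the pages back together with their position in the visit.
MATCH (:User {user_id: 'u-1'})-[:MADE]->(v:Visit)
UNWIND range(0, size(v.pages) - 1) AS step
RETURN step, v.pages[step] AS page
ORDER BY step;
```

The trade-off is that pages are now plain strings inside a list, so questions like "which visits passed through /docs" become list scans rather than graph traversals.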

D_C · 07-06-2021

for my application I'm working more on conversation analysis, for chatbots, so that's a sequence of user utterances, which are classified as intents, which navigate through different nodes or pages.

so the intents map very well onto edges.

by "node first" - I have considered using nodes where there were edges, in order to add extra properties and hopefully get the queries to work better.

mojo2go · 07-06-2021

I see. So you are doing NLP.
You may not need to make such a dramatic change as replacing your edges with nodes. If you are happy with your model but simply want to store more information about an edge, you can add properties to the edge. Or you can connect an additional node to existing nodes to give you a 'place' to put additional data.
Often we want to see a pattern match as a single sequence of nodes and edges, but Neo4j is not limited to one dimension. You can design a pattern that has nodes hanging off of your main sequences (like branches, or perhaps little buds).
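
For instance, a sketch with made-up names (State, INTENT, Note, HAS_NOTE, confidence) showing both options, properties on the edge and an extra node hanging off the sequence:

```cypher
// Hypothetical names: State, INTENT, Note, HAS_NOTE, confidence.
// Option 1: extra information stored as properties on the edge itself.
CREATE (greet:State {name: 'greeting'})
       -[:INTENT {name: 'ask_price', confidence: 0.92}]->
       (price:State {name: 'pricing'})
// Option 2: a side node that gives you a 'place' for additional data.
CREATE (price)-[:HAS_NOTE]->(:Note {text: 'user asked twice'});

// Match the main sequence and pick up whatever hangs off each target node.
MATCH (a:State)-[i:INTENT]->(b:State)
OPTIONAL MATCH (b)-[:HAS_NOTE]->(n:Note)
RETURN a.name AS from_state, i.name AS intent, i.confidence AS confidence,
       b.name AS to_state, collect(n.text) AS notes;
```

OPTIONAL MATCH keeps the rows for states that have no side node; their notes list simply comes back empty.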

DL100 · 07-07-2021

Hello, sorry if I'm breaking etiquette here but I have a question related to the earlier posts.
Can you suggest a sample query to return the different paths, say if you chose a specific starting point?
I'm trying to make something which would also have distinct paths like this, but this is new to me and have so far not found much to go on.

mojo2go · 07-07-2021

Okay DL100, run the two code snippets below to 1.) generate an example graph that has multiple paths and 2.) find those paths given only a starting point and following the direction of the relationships. And this is really just one way to model that thought. If you have further questions I suggest starting a new thread. I figure this might be useful to Dcollier too.

generate a sample graph

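// Main route: start -> x -> w -> end
CREATE (start:Loc {name:"start"})-[:TO]->(x:Loc {name:"x"})-[:TO]->(w:Loc {name:"w"})-[:TO]->(end:Loc {name:"end"})
// Side branches off w and x
CREATE (w)-[:TO]->(m:Loc {name:"m"})
CREATE (x)-[:TO]->(j:Loc {name:"j"})
// m loops back to x, and j rejoins the main route at end
MERGE (m)-[:TO]->(x)
MERGE (j)-[:TO]->(end)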

find all the paths

MATCH path = (:Loc {name:'start'})-[:TO*]->(leaf)
WHERE NOT (leaf)-->()
RETURN nodes(path)

view sequences of nodes

You might want to check the documentation here: Patterns - Neo4j Cypher Manual
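
A variant of the path query that may be easier to read, returning node names instead of whole nodes. The 1..10 upper bound on path length is an arbitrary assumption, added only to keep the search from running away on a bigger graph:

```cypher
// Same sample graph as above; the hop limit of 10 is an arbitrary choice.
MATCH path = (:Loc {name: 'start'})-[:TO*1..10]->(leaf:Loc)
WHERE NOT (leaf)-->()
RETURN [n IN nodes(path) | n.name] AS route, length(path) AS hops
ORDER BY hops;
```

On the sample graph this should give three routes: start, x, w, end and start, x, j, end at 3 hops each, plus start, x, w, m, x, j, end at 6 hops. The loop through m appears only once because Cypher does not reuse a relationship within a single path.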

D_C · 07-07-2021

thanks for posting, I'm just learning cypher and find these snippets very helpful! A cypher cookbook would be a good thing. maybe github copilot can help

DL100 · 07-08-2021

Thanks for the prompt response, mojo2go. What you have posted is clear and makes sense to me, but I'm not sure how I'd implement it on a larger scale. If I go down this route, I'll maybe make a new thread to ask about that like you suggest. Thanks again

Modelling trails that share point locations and segments