
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Classification of nodes and normalization of path lengths

chris23
Node Clone

I am relatively new to Neo4j and graph databases, so I hope someone can steer me in the right direction.

In general, we need a multi-label classification of nodes, based on certain criteria or rules, so that we can build a normalized reasoning mechanism between node classes. Classified nodes will be connected by weighted edges.

Example:

Node A has the classes (labels) A’ and A’’; node B has only class B’.

In our knowledge domain:

  • A’ -> links to, weight = 0.8 -> B’

  • A’’ -> links to, weight = 0.4 -> B’
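The example leaves open how the two links from A’s classes should combine when reasoning from A to B. Here is a small sketch of two common choices. Both are assumptions, not anything stated in the thread. The first takes the strongest link (max, giving 0.8). The second treats the links as independent evidence and combines them with noisy-OR, 1 - (1 - 0.8)(1 - 0.4) = 0.88. The dictionaries and function names are hypothetical, and ASCII primes stand in for ’.

```python
# Hypothetical encoding of the example: class-to-class weights in the
# knowledge domain, and the classes assigned to each node.
CLASS_WEIGHTS = {
    ("A'", "B'"): 0.8,
    ("A''", "B'"): 0.4,
}
NODE_CLASSES = {
    "A": ["A'", "A''"],
    "B": ["B'"],
}


def class_links(src: str, dst: str) -> list[float]:
    """Weights of every direct link from one of src's classes to one of dst's classes."""
    return [
        CLASS_WEIGHTS[(cs, cd)]
        for cs in NODE_CLASSES[src]
        for cd in NODE_CLASSES[dst]
        if (cs, cd) in CLASS_WEIGHTS
    ]


def combine_max(weights: list[float]) -> float:
    # The strongest single link wins; no links means no relation (0.0).
    return max(weights, default=0.0)


def combine_noisy_or(weights: list[float]) -> float:
    # Treat each link as independent evidence: 1 - product of (1 - w).
    remaining = 1.0
    for w in weights:
        remaining *= 1.0 - w
    return 1.0 - remaining


links = class_links("A", "B")
print(links)                              # [0.8, 0.4]
print(combine_max(links))                 # 0.8
print(round(combine_noisy_or(links), 2))  # 0.88
```

Both rules keep the result in [0, 1] as long as every individual weight is in [0, 1]. Noisy-OR rewards a node for having several supporting classes, while max ignores everything except the best one.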

Our questions:

  • How can we normalize path lengths or manually assigned weights for our reasoning approach, i.e. rescale them into the range [0, 1]?

  • We could perhaps also use some kind of cluster similarity between classes. But then we would need to compute the similarity between every pair of classes, and we suspect that would be slow.
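Here is a minimal sketch of both ideas, under stated assumptions. Raw weights are rescaled with min-max normalization, (w - min) / (max - min). Class similarity is taken to be the Jaccard similarity of the classes' member sets, |A ∩ B| / |A ∪ B|, which already lies in [0, 1]. The function names and sample data are hypothetical. The pairwise step makes k(k - 1)/2 comparisons for k classes, so it grows quadratically. For a few hundred classes that is still only tens of thousands of set comparisons. The Neo4j Graph Data Science library also ships a Jaccard-based Node Similarity algorithm that may be worth checking.

```python
from itertools import combinations


def min_max_normalize(weights: dict[str, float]) -> dict[str, float]:
    """Rescale raw weights linearly into [0, 1]: (w - min) / (max - min)."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All weights are equal, so there is no spread to rescale: map them all to 1.0.
        return {k: 1.0 for k in weights}
    return {k: (w - lo) / (hi - lo) for k, w in weights.items()}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_class_similarity(members: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of classes (k*(k-1)/2 pairs)."""
    return {
        (c1, c2): jaccard(members[c1], members[c2])
        for c1, c2 in combinations(sorted(members), 2)
    }


# Hypothetical raw weights on an arbitrary scale.
raw = {"A'->B'": 8.0, "A''->B'": 4.0, "C'->B'": 2.0}
print(min_max_normalize(raw))  # A'->B' 1.0, A''->B' 0.333..., C'->B' 0.0

# Hypothetical class memberships (which nodes carry which class).
members = {"A'": {"n1", "n2", "n3"}, "A''": {"n2", "n3"}, "B'": {"n4"}}
print(pairwise_class_similarity(members))
```

Min-max maps the smallest weight to exactly 0.0. If the weakest link should still count for something, dividing by the maximum instead (w / max) keeps every positive weight above zero.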

If anyone has done something similar, I would very much appreciate any thoughts or pointers to resources.

 


You can have multiple labels on a node. In your case, you might have ‘A’ and ‘B’ labels (and any other classifications). You can then mark any of them with a ‘Prime’ or ‘DoublePrime’ label. Your match for an A’ node would then be MATCH (n:A:Prime).

What do you want to normalize?

At the moment we are thinking about having a node for each class and connecting each classified node to its class node via a relationship, or via several relationships if it is classified into more than one class. That way we can give the relationships a weight property. We are currently trying to figure out how best to classify nodes and how to do the weighting.

The weighting is also what we want to normalize, because we need normalized weights to compare different taxonomies. We want a longer path to mean a more specific concept. But if the maximum path length is 3 in one taxonomy and 5 in another, the raw lengths are hard to compare.

I think you will have to perform the normalization at the time you compare multiple taxonomies or extract metrics from a taxonomy. Otherwise, every time you modify a taxonomy and its maximum path length changes, you will have to renormalize every weight. Normalizing at comparison time would not be hard with a custom procedure.
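Here is a minimal sketch of normalizing at comparison time. It assumes each taxonomy is a tree stored as a parent-to-children mapping and that "path length" means the number of edges from the root. Each node's depth is divided by the taxonomy's current maximum depth. Depth 3 therefore becomes 1.0 in a taxonomy whose deepest path is 3, and 0.6 in one whose deepest path is 5. The function names and taxonomies are hypothetical. In Neo4j the same arithmetic would live in a query or custom procedure rather than in Python.

```python
from collections import deque


def depths(children: dict[str, list[str]], root: str) -> dict[str, int]:
    """Breadth-first search from the root; the root has depth 0."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depth:  # first visit is the shortest path from the root
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


def normalized_specificity(children: dict[str, list[str]], root: str) -> dict[str, float]:
    """Depth divided by the taxonomy's current maximum depth, so every value is in [0, 1].

    Nothing is stored, so nothing has to be rewritten when the taxonomy grows.
    """
    depth = depths(children, root)
    max_depth = max(depth.values())
    if max_depth == 0:  # a taxonomy consisting of the root alone
        return {node: 0.0 for node in depth}
    return {node: d / max_depth for node, d in depth.items()}


# Two hypothetical taxonomies: deepest path 3 and deepest path 5.
taxonomy_1 = {"root": ["a"], "a": ["b"], "b": ["c"]}
taxonomy_2 = {"root": ["a"], "a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
print(normalized_specificity(taxonomy_1, "root")["c"])  # 1.0
print(normalized_specificity(taxonomy_2, "root")["c"])  # 0.6
```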

Are there any built-in functions in Neo4j for that?

Regarding the classification, I am looking into using the Graph Data Science library (GDSL) and training a model. Do you think that is a viable approach?

Sorry, I don’t have any experience with GDSL. Do you have a reference?

chris23
Node Clone

It seems there is a small rendering problem on Windows machines, so I just want to clarify: if odd symbols appear in the first two bullet points, they should be a right-pointing arrow ( -> ).

chris23
Node Clone

Maybe these two illustrations help:

[Images: GeneralGraph.jpg, WeightsGraph.jpg]

Assuming you have the classification nodes and weights, what is it you want to do with the above graph?

Basically, the general graph for this task is the one with the round nodes. Based on the classification, it should produce a fairly sophisticated product recommendation for the user based on personal characteristics. For example, we classify the user as overweight (the general class) based on weight and height. On the other side, our products have to be classified as well, primarily based on their description and keywords, so that products related to overweight fall into the same general class. We then bring these two general classes together so that we can recommend those products.

The weighting should determine the order in which the products are recommended. A person can be associated with more than one general class, and each general product class is of course associated with many products. So we need some kind of ordering other than one based purely on time (what's the newest thing we know about the person).
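Here is one possible way to turn these weights into an order. It rests on assumptions the thread does not state. Each user and each product is taken to have a weight in [0, 1] for every general class it belongs to. A product's score is then the sum, over the general classes it shares with the user, of user weight times product weight. The class names, products, and weights below are hypothetical.

```python
def rank_products(
    user_classes: dict[str, float],
    product_classes: dict[str, dict[str, float]],
) -> list[tuple[str, float]]:
    """Rank products by the sum of user_weight * product_weight over shared general classes."""
    ranked = []
    for product, classes in product_classes.items():
        score = sum(
            user_weight * classes[cls]
            for cls, user_weight in user_classes.items()
            if cls in classes
        )
        if score > 0:  # skip products that share no general class with the user
            ranked.append((product, score))
    # Highest score first; ties are broken alphabetically so the order is deterministic.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Hypothetical example data.
user = {"Overweight": 0.9, "LowActivity": 0.5}
products = {
    "low-calorie meal plan": {"Overweight": 0.8},
    "step counter": {"LowActivity": 1.0},
    "running shoes": {"Overweight": 0.3, "Athlete": 0.9},
    "protein bar": {"Athlete": 0.7},
}
for name, score in rank_products(user, products):
    print(f"{name}: {score:.2f}")
# low-calorie meal plan: 0.72
# step counter: 0.50
# running shoes: 0.27
```

Summing over classes lets a person who belongs to several general classes accumulate evidence for the same product. For the sum to be meaningful, weights coming from different taxonomies would need to be normalized first. A time-based factor could still be added as a secondary sort key.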
