Neo4j

MeghanaS · 07-19-2022

Hi,

Kindly clarify the following:
1) Does projecting all the nodes and relationships of a graph also project their properties?

2) Can string-type properties be used as features when training a graph algorithm?

3) I have a knowledge graph (KG) from an e-commerce website, with user, session and order nodes and the relationships between them. I am trying to use a link prediction algorithm to predict the link between an order node and a recipient node. Is there a useful resource I could refer to?

I am a beginner with Neo4j GDS. Any help is appreciated. Thanks in advance!

Cobra · 07-19-2022

Hello @MeghanaS 😊

1) No, you must specify the properties you want to project (see the example).
2) Not directly, but you can use the one-hot encoding function to convert string values into numeric features for machine learning algorithms.
3) You can start with the "Product recommendation engine using FastRP and kNN" example.
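For reference, a minimal sketch of a native projection that lists properties explicitly, assuming GDS 2.x, the lowercase labels from the schema described later in this thread, and hypothetical numeric properties `price` and `quantity`. GDS projects only numeric (or numeric array) properties, which is why string values need an encoding step first.

```cypher
// Sketch: properties are only projected when they are listed explicitly.
// 'ordersGraph', 'price' and 'quantity' are hypothetical names.
CALL gds.graph.project(
  'ordersGraph',
  {
    `order`: { properties: ['price'] },
    product: { properties: ['price'] }
  },
  {
    has: { properties: ['quantity'] }
  }
)
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
```

Because `customer` and `session` are not in the node projection, only `has` relationships whose endpoints are both projected (here order to product) end up in the in-memory graph.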

In addition, I advise you to take these free courses before starting:

Regards,
Cobra

MeghanaS · 07-19-2022

Thanks @Cobra for the reply,
3. I actually went through the example and was able to understand quite a bit. The issue is that I cannot relate it to my use case, which is link prediction. The main problem is how to split the graph for training and testing. I have a particular relationship that is missing between a few nodes. I need my training set to be a graph with the relationship and my test set to be a graph without it, so that the model can predict the missing relationships in the test graph. The few examples I found are very simple.

Cobra · 07-19-2022

I think you have two options:

If you want to use the GDS link prediction algorithms, you don't need to split your graph into train/test/validation sets.
If you want to use machine learning, have a look here, where everything is explained.

Regards,
Cobra

MeghanaS · 07-19-2022

1. This makes sense, but there it is only used between pairs of nodes of the same type. My use case is to predict relationships between nodes of different types. Let me define my problem statement first.
I have a graph like this:
(:session)-[:contains]->(:order), (:customer)-[:has]->(:session), (:order)-[:has]->(:product), (:order)-[:to]->(:relation)
Many customers have placed orders. Some orders specify whom the order was intended for (the relation, e.g. mother, father), and some do not. For the orders that do not, I want to predict whom they were most likely intended for. Any suggestions?
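To see how many orders are affected, a sketch like the following counts orders without a recipient. It assumes the labels and relationship types are stored exactly as written in the schema above (label and type names are case-sensitive in Neo4j); `order` is backtick-quoted only to be safe around the ORDER keyword.

```cypher
// Orders with no :to relationship to a relation node; these are the orders
// for which a recipient would have to be predicted.
MATCH (o:`order`)
WHERE NOT (o)-[:to]->(:relation)
RETURN count(o) AS ordersWithoutRecipient
```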

Cobra · 07-19-2022

I guess you can start without a machine learning pipeline. The last time I worked on a recommendation engine, I used Adamic Adar to recommend training sessions. Here are some articles if you need more explanation:

You can compute the score between orders, then create a relationship between the order and the relation node that stores the score. The issue with this method is that you will have the same score for everyone. For a heterogeneous graph, I think you will have to use a machine learning pipeline.
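A rough sketch of that idea, assuming GDS 2.x, where Adamic Adar is available as the alpha-tier function `gds.alpha.linkprediction.adamicAdar(node1, node2)`, which runs against the database rather than a projection. Orders without a recipient are scored against orders that have one, and the recipient of the best-scoring order is proposed. The `PREDICTED_TO` relationship type is hypothetical. The pairwise comparison grows with (orders without a recipient) times (orders with one), so this only suits small graphs.

```cypher
// Sketch: propose a recipient for each order that lacks one, based on the
// most similar order (Adamic Adar over shared neighbours such as products
// and sessions). PREDICTED_TO is a hypothetical relationship type.
MATCH (o1:`order`)
WHERE NOT (o1)-[:to]->(:relation)
MATCH (o2:`order`)-[:to]->(r:relation)
WITH o1, r, gds.alpha.linkprediction.adamicAdar(o1, o2) AS score
WHERE score > 0
// Best score per (order, candidate recipient)
WITH o1, r, max(score) AS best
ORDER BY best DESC
// Keep only the top candidate per order
WITH o1, collect({recipient: r, score: best})[0] AS top
WITH o1, top.recipient AS r, top.score AS score
MERGE (o1)-[p:PREDICTED_TO]->(r)
SET p.score = score
RETURN count(p) AS predictedLinks
```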

Regards,
Cobra

MeghanaS · 07-19-2022

Yes. I have a heterogeneous graph and need to use a pipeline.

Cobra · 07-19-2022

Then, if you follow this example, it should help you solve your use case.

MeghanaS · 07-19-2022

Yes, correct. But again, there are two issues here:
1) I want the training set to contain only positive samples, i.e. a graph containing the relationship between order and relation, and the test set to contain only negative samples, i.e. a graph without the relationship between order and relation.
This is how we do it in classic ML and DL: the model learns from the training set and predicts on the test set. But here I see that both the training and test sets contain negative samples.
2) I do not want the model to predict missing links between every pair of nodes; it should only predict links between order nodes and relation nodes.
How do I achieve this?

Cobra · 07-20-2022

You should have a look at this setting.
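For context, a sketch of how the split of a GDS 2.x link prediction pipeline is configured; `lp-pipe` is a hypothetical pipeline name and the values are illustrative. Existing relationships are the positive examples, and `negativeSamplingRatio` sets how many non-existing node pairs are sampled as negatives in both the train and the test set. A classifier needs both classes to learn from, and the test set needs both to measure how well the model separates them, which is why negatives appear on both sides.

```cypher
// Sketch (GDS 2.x): create a pipeline and configure its relationship split.
CALL gds.beta.pipeline.linkPrediction.create('lp-pipe');

// testFraction: share of relationships held out for the test set
// trainFraction: share of the remaining relationships used for training
// validationFolds: folds used for model selection on the train set
// negativeSamplingRatio: negatives sampled per positive, in train and test
CALL gds.beta.pipeline.linkPrediction.configureSplit('lp-pipe', {
  testFraction: 0.25,
  trainFraction: 0.6,
  validationFolds: 3,
  negativeSamplingRatio: 1.0
})
YIELD splitConfig
RETURN splitConfig;
```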

MeghanaS · 07-20-2022

I ran a memory estimate before training, and the available memory is less than what is required.

Cobra · 07-20-2022

You need to increase the memory allocated to the database, and you should estimate the procedure before starting it.
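A sketch of such an estimate, assuming GDS 2.x and the hypothetical graph, pipeline and model names below; the exact train configuration keys differ between GDS versions, so check the documentation for yours. On Neo4j 4.x, projected graphs and training run on the JVM heap, so the limit to raise is typically `dbms.memory.heap.max_size` in neo4j.conf.

```cypher
// Sketch: estimate training memory before calling train.
// 'ordersGraph', 'lp-pipe' and 'lp-model' are hypothetical names.
CALL gds.beta.pipeline.linkPrediction.train.estimate('ordersGraph', {
  pipeline: 'lp-pipe',
  modelName: 'lp-model'
})
YIELD requiredMemory, bytesMin, bytesMax
RETURN requiredMemory, bytesMin, bytesMax
```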

MeghanaS · 07-20-2022

I have deleted the unused nodes/relationships from the projected graph. Let's see 😊
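Listing the named projections and dropping the ones that are no longer needed also frees heap. A sketch, assuming GDS 2.x; `oldGraph` is a hypothetical name:

```cypher
// Sketch: inspect the named projections and their memory footprint.
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount, memoryUsage
RETURN graphName, nodeCount, relationshipCount, memoryUsage
ORDER BY graphName;

// Drop a projection that is no longer needed ('oldGraph' is hypothetical).
CALL gds.graph.drop('oldGraph')
YIELD graphName
RETURN graphName;
```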

MeghanaS · 07-22-2022

So, I was able to train the model, and it is now ready for predictions.
There are two ways to predict: exhaustive search and approximate search. The first predicts links for all pairs of unconnected nodes, and the second uses kNN to predict. I want neither; rather, I want the model to predict links only between two specific node types, 'order' and 'relation'. Is there any way to achieve this?

Cobra · 07-22-2022

With the stream method, you can add a WHERE clause after the YIELD to only return the score between the nodes you are interested in.
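A sketch of that filter, assuming GDS 2.x exhaustive search, the graph and model names from the docs example quoted below (`myGraph`, `lp-pipeline-model`), and the lowercase labels from the schema above. Note that `topN` is applied before the WHERE clause: the model still scores every candidate pair, and the filter only narrows the output, so `topN` must be large enough for order/relation pairs to survive.

```cypher
// Sketch: stream predictions, then keep only order/relation pairs.
CALL gds.beta.pipeline.linkPrediction.predict.stream('myGraph', {
  modelName: 'lp-pipeline-model',
  topN: 1000,
  threshold: 0.45
})
YIELD node1, node2, probability
WITH gds.util.asNode(node1) AS a, gds.util.asNode(node2) AS b, probability
// Pairs may come back in either order
WHERE (a:`order` AND b:relation) OR (a:relation AND b:`order`)
RETURN a, b, probability
ORDER BY probability DESC
```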

MeghanaS · 07-22-2022

Okay. Is it not possible to make the model predict only for specified nodes beforehand?

Also,

Below is an example of exhaustive-search prediction from the docs. Here, relationshipTypes means that the graph is filtered on the KNOWS relationship. In that case, there would be no disconnected nodes to predict for, am I right? Please correct me if I am wrong.
CALL gds.beta.pipeline.linkPrediction.predict.mutate('myGraph',
{ modelName: 'lp-pipeline-model',
relationshipTypes: ['KNOWS'],
mutateRelationshipType: 'KNOWS_EXHAUSTIVE_PREDICTED',
topN: 5, threshold: 0.45 })
YIELD relationshipsWritten, samplingStats

Cobra · 07-22-2022

I know link prediction algorithms can predict between two given nodes, but I don't know whether the machine learning pipeline can.

Yes. According to the documentation, relationshipTypes means: "Filter the named graph using the given relationship types."

Regards,
Cobra

MeghanaS · 07-25-2022

Hi again,
How do I query the relationships in a projected graph? I have a few relationships predicted by my LP model, and I want to see how many node pairs this new relationship was created between.

Cobra · 07-25-2022

You can use this, but it is in the alpha tier.
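A sketch that counts mutated relationships without any alpha-tier procedure, using `gds.graph.streamRelationshipProperty` (the procedure used later in this thread). It assumes `predict.mutate` wrote `KNOWS_EXHAUSTIVE_PREDICTED` relationships, as in the docs example above, and that the score property is named `probability`, which is assumed here to be the default `mutateProperty`.

```cypher
// Sketch: count predicted relationships and the distinct nodes they touch.
CALL gds.graph.streamRelationshipProperty(
  'myGraph',
  'probability',
  ['KNOWS_EXHAUSTIVE_PREDICTED']
)
YIELD sourceNodeId, targetNodeId, propertyValue
RETURN count(*) AS predictedRelationships,
       count(DISTINCT sourceNodeId) AS distinctSources,
       count(DISTINCT targetNodeId) AS distinctTargets
```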

MeghanaS · 08-02-2022

Hi,
I resumed the work today and am able to stream my predicted relationships along with their probabilities. Below is the code:

CALL gds.graph.streamRelationshipProperty(
  'mygraph',
  'predictied_probablity_score',
  ['predicted_relationship_name']
)
YIELD sourceNodeId, targetNodeId, relationshipType, propertyValue
RETURN gds.util.asNode(sourceNodeId).name AS source,
       gds.util.asNode(targetNodeId).name AS target,
       relationshipType, propertyValue
ORDER BY source ASC, target ASC

The issue now is that all the source and target values come back as NULL.

Cobra · 08-02-2022

Hello @MeghanaS 😊

Do you have a name property on your source and target node?

Regards,
Cobra
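One way to check this is to look at the labels and all properties of the nodes behind the internal IDs. The graph, property and relationship type names below are taken from the query above:

```cypher
// Sketch: inspect the nodes behind the IDs to see which properties exist.
CALL gds.graph.streamRelationshipProperty(
  'mygraph',
  'predictied_probablity_score',
  ['predicted_relationship_name']
)
YIELD sourceNodeId, targetNodeId, propertyValue
WITH gds.util.asNode(sourceNodeId) AS source,
     gds.util.asNode(targetNodeId) AS target,
     propertyValue
RETURN labels(source) AS sourceLabels, properties(source) AS sourceProps,
       labels(target) AS targetLabels, properties(target) AS targetProps,
       propertyValue
LIMIT 10
```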

MeghanaS · 08-03-2022

Yes; the mistake was something else. Thank you 🙂

MeghanaS · 08-04-2022

Is gds.alpha.ml.oneHotEncoding() not present in Neo4j 4.4.2 with GDS 2.1.4? I am getting the error below. I have installed all gds.* procedures and can find the other alpha procedures, but not this one.

There is no procedure with the name `gds.alpha.ml.oneHotEncoding` registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.

Cobra · 08-04-2022

That's weird; GDS 2.1 includes it. Can you share your query?

MeghanaS · 08-04-2022

I just ran `call gds.alpha.ml.oneHotEncoding()` without any parameters. It should have thrown a missing-parameter error, but instead it says the procedure was not found.

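Cobra · 08-04-2022

Remove the CALL keyword in front of it. gds.alpha.ml.oneHotEncoding is a function, not a procedure, so it is used inside an expression (for example in RETURN) instead of being invoked with CALL:

Example:

RETURN gds.alpha.ml.oneHotEncoding(['Chinese', 'Indian', 'Italian'], ['Italian']) AS embedding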
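Building on that, a sketch of how a string property could be turned into a numeric one-hot vector stored on the nodes, so that it can then be projected and used as a feature. `category` on product nodes and the target property `categoryOneHot` are hypothetical names.

```cypher
// Sketch: collect the distinct values once so every node is encoded
// against the same ordered list, then store the vector as a property.
MATCH (p:product)
WITH collect(DISTINCT p.category) AS categories
MATCH (p:product)
WHERE p.category IS NOT NULL
SET p.categoryOneHot = gds.alpha.ml.oneHotEncoding(categories, [p.category])
RETURN count(p) AS encodedProducts
```

The resulting list property can then be listed under `properties` in the projection like any other numeric array.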

MeghanaS · 08-04-2022

Yes, it works now. Thanks!

MeghanaS · 08-04-2022

How do I add existing node properties from the projection to the ML pipeline? The gds.beta.pipeline.linkPrediction.addNodeProperty procedure adds a step that computes new node properties, such as node embeddings, in the pipeline, but how do I add existing ones? I am getting the error below even though the properties are projected in the in-memory graph.

Failed to invoke procedure `gds.beta.pipeline.linkPrediction.train`: Caused by: java.lang.IllegalArgumentException: Node properties [property1, property2, property3] defined in the feature steps do not exist in the graph or part of the pipeline

Cobra · 08-04-2022

With the gds.beta.pipeline.linkPrediction.addNodeProperty() procedure?

MeghanaS · 08-04-2022

I have used that to create a new node property. What I want is to add an existing node property from my projected graph to the pipeline.

Cobra · 08-04-2022

What about the gds.beta.pipeline.linkPrediction.addFeature() procedure? Otherwise, I don't know.
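For reference, a sketch assuming GDS 2.x, where the `nodeProperties` of a feature step may name properties computed by node property steps as well as properties already present in the projected graph (the two cases the error message above mentions). `property1` is the placeholder from the error, assumed here to be numeric and projected on the node labels used for training; the pipeline name is hypothetical.

```cypher
// Sketch (GDS 2.x); 'lp-pipe' is a hypothetical pipeline name.
CALL gds.beta.pipeline.linkPrediction.create('lp-pipe');

// Node property step: computes a new property ('embedding') during training.
CALL gds.beta.pipeline.linkPrediction.addNodeProperty('lp-pipe', 'fastRP', {
  mutateProperty: 'embedding',
  embeddingDimension: 64,
  randomSeed: 42
});

// Feature step: combines a pipeline-computed property with one that
// already exists in the projection.
CALL gds.beta.pipeline.linkPrediction.addFeature('lp-pipe', 'hadamard', {
  nodeProperties: ['embedding', 'property1']
});
```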

MeghanaS · 08-08-2022

Okay, thank you.
