Neo4j

jasonyip · 12-14-2022

Hi,

I am trying to follow the example below with a different graph. The graph can be found at the following location:

https://github.com/grandintegrator/gnn-db-blogpost

I am pretty sure I built the right graph:

// Nodes
LOAD CSV WITH HEADERS FROM 'file:///db_suppliers.csv' AS row
MERGE (n:Supplier{SupplierId:row.Supplier});

// Edges
LOAD CSV WITH HEADERS FROM 'file:///gnn_blog_full.csv' AS row
MATCH (a:Supplier{`SupplierId`:row.Purchaser}), (b:Supplier{`SupplierId`:row.Seller})
MERGE (b)-[r:SUPPLIES_TO{Probability:toFloat(row.probability)}]->(a);

However, all the probabilities came out the same:

https://colab.research.google.com/drive/1fscHls4qVOtW5-1pOaNDK9atQXvIm7rP?usp=sharing

Is there anything I did wrong here? Please help.

Thanks!

zach_blumenfeld · 12-22-2022

Hi @jasonyip, a couple of things to check:

1. How do your embeddings look? If many of the embeddings are all zeros, that can reduce model quality and cause calibration issues. You can use the stream node properties operation to read all of them into your notebook. Alternatively, if the graph is really large, you can run FastRP in `write` execution mode instead of `mutate`, which writes the embeddings back to the database, and then sample them with Cypher.

2. Training the model longer could help with calibration. Things you can try tuning: increase max epochs, increase patience, increase batchSize, and decrease tolerance.
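For the first check, here is a minimal sketch in plain Python. It assumes the embeddings have already been read into the notebook as a list of float vectors. The function name `zero_embedding_report`, the `eps` threshold, and the sample vectors are all hypothetical and only for illustration. None of them are part of GDS or taken from the notebook above.

```python
from typing import Sequence


def zero_embedding_report(
    embeddings: Sequence[Sequence[float]], eps: float = 1e-12
) -> dict:
    """Count how many embedding vectors are (numerically) all zeros.

    `eps` is an assumed tolerance; a component counts as zero when its
    absolute value is at most `eps`.
    """
    total = len(embeddings)
    zero_count = sum(
        1 for vec in embeddings if all(abs(x) <= eps for x in vec)
    )
    return {
        "total": total,
        "all_zero": zero_count,
        # Guard against division by zero when no embeddings were passed in.
        "all_zero_fraction": zero_count / total if total else 0.0,
    }


# Made-up vectors for illustration only; replace with the streamed embeddings.
sample = [
    [0.0, 0.0, 0.0],
    [0.12, -0.4, 0.9],
    [0.0, 0.0, 0.0],
    [0.3, 0.0, -0.1],
]
report = zero_embedding_report(sample)
print(report)
```

If a large share of the vectors are all zeros, the link features built from them will be nearly identical for many node pairs, which could help explain predictions that barely vary.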

Link prediction pipeline output the same probability for all pairs