
Is the graphSage epochLosses wrong in the documentation?

lingvisa
Graph Fellow

For this training procedure:

CALL gds.beta.graphSage.train(
  'persons',
  {
    modelName: 'exampleTrainModel',
    featureProperties: ['age', 'heightAndWeight'],
    aggregator: 'mean',
    activationFunction: 'sigmoid',
    sampleSizes: [25, 10]
  }
) YIELD modelInfo as info
RETURN
  info.name as modelName,
  info.metrics.didConverge as didConverge,
  info.metrics.ranEpochs as ranEpochs,
  info.metrics.epochLosses as epochLosses

The results shown in the documentation are:

didConverge  ranEpochs  epochLosses
yes          1          [186.0494816886275, 186.04946806237382]

For only one epoch, how can there be two losses in the list? Shouldn't there be only one?
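
For what it's worth, here is one reading that would fit the documented numbers; it is purely my guess at the implementation, nothing the docs state. If the trainer records the loss once before the first epoch and once after each epoch, then one epoch yields two entries, and since the two values differ by only about 1.4e-5, below the tolerance parameter the train procedure exposes (default 1e-4, if I read the docs right), the run would count as converged after that single epoch. A minimal sketch of that guess:

# Sketch of my guess only; train, run_one_epoch and evaluate_loss are
# hypothetical stand-ins, not GDS APIs. Whether an initial loss is recorded
# and whether the check is absolute are both assumptions.
def train(run_one_epoch, evaluate_loss, max_epochs, tolerance=1e-4):
    losses = [evaluate_loss()]          # loss before the first epoch
    for epoch in range(1, max_epochs + 1):
        run_one_epoch()
        losses.append(evaluate_loss())  # one more entry per epoch
        if abs(losses[-1] - losses[-2]) < tolerance:
            return True, epoch, losses  # didConverge, ranEpochs, epochLosses
    return False, max_epochs, losses

# Replaying the documented numbers: converged after 1 epoch, 2 losses recorded.
doc_losses = iter([186.0494816886275, 186.04946806237382])
print(train(lambda: None, lambda: next(doc_losses), max_epochs=10))
# (True, 1, [186.0494816886275, 186.04946806237382])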

Also, I am using GDS 1.7 and Neo4j 4.3.4. I copied the same training code from the tutorial but got this result:

didConverge  ranEpochs  epochLosses
false        1          [186.0494681481392]

So there is only one loss for one epoch, which makes more sense. But 'didConverge' is false instead of 'true'. I also tried all the other examples in the GraphSAGE documentation, and 'didConverge' is always 'false', even though the embedding numbers look the same as in the examples. On my own dataset, 'didConverge' is also always 'false' across different hyperparameter settings. Based on this, could the implementation of didConverge be not quite right?
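
One way to sanity-check didConverge is to replay the reported epochLosses against a tolerance-based stopping rule. gds.beta.graphSage.train does expose a tolerance parameter; the exact rule below is my assumption, not the GDS source:

def replay_convergence(epoch_losses, tolerance=1e-4):
    # Converged if some consecutive pair of recorded losses differs by less
    # than tolerance -- an assumed rule, not taken from the GDS source.
    return any(abs(b - a) < tolerance
               for a, b in zip(epoch_losses, epoch_losses[1:]))

# My GDS 1.7 run recorded a single loss, so there is no consecutive pair
# to compare, and the replay cannot report convergence either:
print(replay_convergence([186.0494681481392]))  # False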

Is there a real dataset of reasonable size that demonstrates the GraphSAGE implementation largely matches the original paper?

1 REPLY

lingvisa
Graph Fellow

One experiment on my dataset:

CALL gds.beta.graphSage.train(
  'nodeGraph',
  {
    modelName: 'graphSageModel',
    aggregator: 'mean',
    batchSize: 32,
    activationFunction: 'relu',
    epochs: 10,
    searchDepth: 3,
    sampleSizes: [5, 10],
    learningRate: 0.1,
    embeddingDimension: 128,
    featureProperties: ['degree', 'pageRank'],
    projectedFeatureDimension: 2,
    randomSeed: 46,
    concurrency: 4
  }
) YIELD modelInfo as info
RETURN
  info.modelName as modelName,
  info.metrics.didConverge as didConverge,
  info.metrics.ranEpochs as ranEpochs,
  info.metrics.epochLosses as epochLosses

The model info is:
{'modelName': 'graphSageModel', 'didConverge': True, 'ranEpochs': 3, 'epochLosses': [849.3584970289646, 849.5691272174374, 849.5691272174374]}
The epochs setting is 10, but it only ran 3 epochs. Is this because of early stopping? Although the loss doesn't decrease, 'didConverge' is True. Are these two indicators consistent?
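
If the assumed stopping rule from above applies, these numbers are at least self-consistent: epochs 2 and 3 report the identical loss, so the epoch-over-epoch change is exactly 0 and training would stop early with didConverge True, even though the loss never improved. Under that reading, 'converged' means 'the loss stopped changing', not 'the loss reached a good value'. A quick check:

losses = [849.3584970289646, 849.5691272174374, 849.5691272174374]
deltas = [abs(b - a) for a, b in zip(losses, losses[1:])]
print(deltas)  # [0.2106..., 0.0] -- the last change is 0, below any tolerance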

Now I change 'relu' to 'sigmoid' in the hyperparameters above (nothing else changed). The model info becomes:
{'modelName': 'graphSageModel', 'didConverge': False, 'ranEpochs': 10, 'epochLosses': [662.4599686179197, 570.060768166288, 564.1066471036239, 556.3447809177796, 558.2486011867961, 533.2741633777479, 514.223316328711, 511.2540127347001, 508.9211852755711, 507.7682415264585]}

This shows that it ran 10 epochs, the loss is decreasing, and didConverge is False, which seems to make more sense.
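
The replay from above agrees: the smallest epoch-over-epoch change in this run is about 1.15, far above a tolerance like 1e-4, so an assumed tolerance-based rule would also report no convergence after 10 epochs.

losses = [662.4599686179197, 570.060768166288, 564.1066471036239,
          556.3447809177796, 558.2486011867961, 533.2741633777479,
          514.223316328711, 511.2540127347001, 508.9211852755711,
          507.7682415264585]
deltas = [abs(b - a) for a, b in zip(losses, losses[1:])]
print(min(deltas))  # ~1.153 -- well above 1e-4, so no early stop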