Neo4j

gigauser · ‎05-18-2022

Hi, I created a graph with Document nodes and Word (lemmatized) nodes, where each Word node is connected to the documents that contain it, so Documents are connected through common Word nodes.

I projected part of the graph like this:
call gds.graph.project.cypher(
'documentRootword',
'
match (n) where (n:Document and date("20220416")<=n.datePublished<=date("20220515")) or n:RootWord
return id(n) as id
',
'
match (d)-[:FROM_DOCUMENT]->(r:RootWord) where date("20220416")<=d.datePublished<=date("20220515")
return id(d) as source, id(r) as target
'
)

Then I ran FastRP like this:
call gds.fastRP.write(
'documentRootword',
{
embeddingDimension: 128,
iterationWeights: [1.0, 0.5, 0.5],
normalizationStrength: 0,
writeProperty: 'embedRootword',
randomSeed: 7
}
)

I know FastRP is designed for homogeneous graphs, but I was able to create embeddings on a heterogeneous graph like this one and get useful similarities from them when I did it with GDS last year.

match (d:Document) where date("20220416")<=d.datePublished<=date("20220515")
return d.embedRootword limit 100
d.embedRootword
[0.049813542515039444, -0.033209025859832764, -0.06641805171966553, 0.03320902958512306, 0.016604511067271233, 3.2323665966060844e-9, 0.09962708503007889, -0.08302256464958191, 0.03320902958512306, 3.2323665966060844e-9, 0.14944063127040863, 3.2323665966060844e-9, 0.033209025859832764, 0.033209025859832764, 0.016604512929916382, -0.14944063127040863, -0.09962708503007889, 0.049813542515039444, -0.049813542515039444, 0.04981353506445885, 0.049813542515039444, -0.08302256464958191, 0.049813542515039444, -0.03320903703570366, -0.11623159050941467, 0.06641805171966553, 0.049813542515039444, 0.09962708503007889, 0.049813542515039444, 3.2323665966060844e-9, 0.11623159050941467, 0.23246318101882935, 0.13283610343933105, 0.06641805171966553, 0.06641805171966553, -0.033209025859832764, -0.016604512929916382, 0.0, -0.08302255719900131, -0.01660451665520668, -0.03320902958512306, -0.19925417006015778, 0.01660451665520668, 0.14944063127040863, -0.09962708503007889, -0.03320902958512306, -0.016604511067271233, -0.16604512929916382, 0.03320903703570366, 0.049813542515039444, -0.03320902958512306, 0.03320902958512306, -0.04981353506445885, 0.0830225721001625, -0.24906770884990692, -0.049813542515039444, 0.016604511067271233, 0.1826496422290802, 0.24906770884990692, 0.0830225721001625, -0.033209025859832764, 0.016604511067271233, -0.016604511067271233, 0.21585866808891296, -0.09962708503007889, -0.049813542515039444, 9.697100011862858e-9, -0.01660451665520668, -0.03320902958512306, -0.08302256464958191, 0.03320902958512306, -0.049813542515039444, 0.03320902958512306, -0.11623158305883408, -0.06641805171966553, 0.11623159050941467, 0.06641805171966553, -0.16604512929916382, 0.09962707757949829, -0.24906770884990692, -0.11623158305883408, -0.11623159050941467, -0.049813542515039444, -0.1826496422290802, 0.049813542515039444, 0.06641805171966553, 0.01660451665520668, 0.11623158305883408, -0.016604511067271233, -0.01660451665520668, 0.03320902958512306, 0.08302255719900131, 0.0, 
-0.049813542515039444, 0.09962708503007889, -0.049813542515039444, 0.049813542515039444, 0.033209025859832764, 0.016604511067271233, -0.03320903703570366, -3.2323665966060844e-9, 0.016604511067271233, -0.049813542515039444, 0.016604512929916382, -0.03320902958512306, 0.049813542515039444, 0.033209025859832764, -0.09962707757949829, 0.13283610343933105, 0.06641805917024612, 0.0830225721001625, 0.08302256464958191, -0.08302256464958191, -0.03320902958512306, 3.2323665966060844e-9, 0.06641805171966553, -0.14944063127040863, -0.06641805917024612, 0.0, -0.03320903703570366, 0.16604512929916382, -0.11623159050941467, -0.06641805171966553, -0.016604511067271233, -3.2323665966060844e-9, -0.033209025859832764, -0.0830225721001625, -0.08302256464958191]
....

But now, with GDS 2.0.3 or 2.0.4, I cannot get the same embeddings in memory from the mutate procedure, even though I can get the embeddings with the write procedure:
call gds.fastRP.mutate(
'documentRootword',
{
embeddingDimension: 128,
iterationWeights: [1.0, 0.5, 0.5],
normalizationStrength: 0,
mutateProperty: 'embedRootword',
randomSeed: 7
}
)

call gds.graph.streamNodeProperty(
'documentRootword',
'embedRootword'
)
yield nodeId, propertyValue
return nodeId, propertyValue limit 10
==> It returns all zeros for the values in the embeddings.

nodeId | propertyValue

25

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

27

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

...
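
A minimal sketch of how the zero vectors could be broken down by label, assuming the in-memory graph 'documentRootword' and the mutated property 'embedRootword' from above still exist. The write-mode check above only returned Document nodes, while the stream returns every projected node, so this groups the streamed values by node label and by whether the vector is entirely zero:

```cypher
// Group the mutated embeddings by node label and by whether they are all zeros.
CALL gds.graph.streamNodeProperty('documentRootword', 'embedRootword')
YIELD nodeId, propertyValue
WITH gds.util.asNode(nodeId) AS n, propertyValue
RETURN labels(n) AS nodeLabels,
       all(x IN propertyValue WHERE x = 0.0) AS allZero,
       count(*) AS nodeCount
ORDER BY nodeLabels, allZero
```

If the all-zero rows turn out to be RootWord nodes only, the zeros may come from the relationship direction in the projection rather than from the mutate mode itself.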

Am I doing something wrong?

florentin_dorre · ‎05-31-2022

Hello @gigauser ,
At first glance, your queries look correct.
Do I understand correctly that exactly the same workflow works if you use the write mode of FastRP instead of mutate?

Otherwise, my first guess is that the affected nodes are orphan nodes with no relationships.
To check that, you can use gds.degree.stream.
You could also try a non-zero nodeSelfInfluence to avoid all-zero embeddings for orphan nodes (see Fast Random Projection - Neo4j Graph Data Science).
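
The two suggestions above could be sketched as follows, assuming the projected graph is still named 'documentRootword'. The property name 'embedRootwordSelf' and the nodeSelfInfluence value of 1.0 are illustrative assumptions, not recommended settings:

```cypher
// Out-degree per projected node (the default orientation is NATURAL).
// A score of 0 means the node has no outgoing relationships in the projection.
CALL gds.degree.stream('documentRootword')
YIELD nodeId, score
WITH labels(gds.util.asNode(nodeId)) AS nodeLabels, score = 0.0 AS noOutgoing
RETURN nodeLabels, noOutgoing, count(*) AS nodeCount;

// Same FastRP settings as in the question, plus a non-zero nodeSelfInfluence.
// 'embedRootwordSelf' is a hypothetical property name; 1.0 is an arbitrary value.
CALL gds.fastRP.mutate(
  'documentRootword',
  {
    embeddingDimension: 128,
    iterationWeights: [1.0, 0.5, 0.5],
    normalizationStrength: 0,
    nodeSelfInfluence: 1.0,
    mutateProperty: 'embedRootwordSelf',
    randomSeed: 7
  }
)
YIELD nodePropertiesWritten
RETURN nodePropertiesWritten
```

A new property name is used because mutate cannot overwrite a node property that already exists in the in-memory graph.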

batkhishig · ‎11-01-2022

Hi @gigauser ,

This problem has been bugging me for the past few days and I think I have found the solution for my case.

I changed the relationship orientation to 'UNDIRECTED' when projecting the graph, and this somehow fixed the issue. I guess FastRP prefers 'UNDIRECTED' graphs, as mentioned in the docs.
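
For the Cypher projection used in the original question, a rough sketch of the same idea follows. As far as I know, gds.graph.project.cypher in GDS 2.0 has no orientation setting, so one way to emulate an undirected graph is to return every relationship in both directions. The graph name 'documentRootwordUndirected' is a hypothetical choice; the date filter is copied from the question:

```cypher
// Hypothetical graph name; each FROM_DOCUMENT relationship is returned in both directions.
CALL gds.graph.project.cypher(
  'documentRootwordUndirected',
  '
  MATCH (n)
  WHERE (n:Document AND date("20220416") <= n.datePublished <= date("20220515")) OR n:RootWord
  RETURN id(n) AS id
  ',
  '
  MATCH (d)-[:FROM_DOCUMENT]->(r:RootWord)
  WHERE date("20220416") <= d.datePublished <= date("20220515")
  RETURN id(d) AS source, id(r) AS target
  UNION ALL
  MATCH (d)-[:FROM_DOCUMENT]->(r:RootWord)
  WHERE date("20220416") <= d.datePublished <= date("20220515")
  RETURN id(r) AS source, id(d) AS target
  '
)
```

With both directions present, RootWord nodes also have outgoing neighbours, so FastRP has something to aggregate for them. If no date filter is needed, a native projection with orientation: 'UNDIRECTED' on FROM_DOCUMENT would be the shorter route.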

gds.fastRP.mutate creates all zeros!?