07-21-2021 05:54 AM
Hello,
I am trying to run a node classification on a fraud dataset.
The relevant properties are split across two node labels: Customer [age, gender, fastRP] and Transaction [amount, fraud]. When I run the code below, I get this error:
The feature properties ['age_group', 'amount', 'fastrp_embedding', 'gender_group'] are not present for all requested labels. Requested labels: ['Customer', 'Transaction']. Properties available on all requested labels: ['']
CALL gds.alpha.ml.nodeClassification.train('fraud_model_data', {
  nodeLabels: ['Transaction', 'Customer'],
  modelName: 'fraud-model-properties',
  featureProperties: ['age_group', 'fastrp_embedding', 'gender_group', 'amount'],
  targetProperty: 'fraud',
  metrics: ['F1_WEIGHTED', 'ACCURACY'],
  holdoutFraction: 0.2,
  validationFolds: 5,
  randomSeed: 2,
  params: [
    {penalty: 0.0625, maxIterations: 1000},
    {penalty: 0.125, maxIterations: 1000},
    {penalty: 0.25, maxIterations: 1000},
    {penalty: 0.5, maxIterations: 1000}
  ]
}) YIELD modelInfo
If I only select a single nodeLabel ('Transaction' or 'Customer'), I am able to see the properties of the selected node but not the properties from the other node.
This is the code to create the in-memory graph:
CALL gds.graph.create(
  'fraud_model_data',
  {
    Customer: {
      label: 'Customer',
      properties: {
        fastrp_embedding: {property: 'fastRPExtended-embedding', defaultValue: 0},
        gender_group: {property: 'gender_group', defaultValue: 0},
        age_group: {property: 'age_group', defaultValue: 0}
      }
    },
    Transaction: {
      label: 'Transaction',
      properties: {
        fraud: {property: 'fraud', defaultValue: 0},
        amount: {property: 'amount', defaultValue: 0},
        category_group: {property: 'category_group', defaultValue: 0}
      }
    },
    Bank: {
      label: 'Bank',
      properties: {}
    }
  },
  '*'
)
YIELD graphName, nodeCount, relationshipCount;
Do you have any solution for this problem? Thank you very much!
07-21-2021 01:41 PM
You'll need to either:
1. Create a mono-partite projection, where a single node label carries all the feature properties and the target property, or
2. Make sure the properties are present on every node label in the projection (Bank nodes have a fraud property but it's always 0, for example).
If you choose the second option, you'll likely need to post-process your predictions, because there's no easy way to tell the node classification model not to predict that banks could be fraudulent. That said, using bank nodes as part of your negative dataset, and making sure they aren't incorrectly predicted to be fraudsters, could be part of your model tuning and evaluation.
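The post-processing step for the second option could look roughly like the sketch below. It assumes the stream mode of the alpha predict procedure (gds.alpha.ml.nodeClassification.predict.stream) is available in your GDS version, reuses the graph and model names from the question, and assumes only Transaction predictions are of interest. Treat it as an illustration and adjust it to your setup.

```cypher
// Sketch only: stream the predictions, then keep Transaction nodes
// and discard every other label (Customer, Bank).
CALL gds.alpha.ml.nodeClassification.predict.stream('fraud_model_data', {
  modelName: 'fraud-model-properties',
  includePredictedProbabilities: true
})
YIELD nodeId, predictedClass, predictedProbabilities
WITH gds.util.asNode(nodeId) AS node, predictedClass, predictedProbabilities
WHERE 'Transaction' IN labels(node)
RETURN id(node) AS transactionId, predictedClass, predictedProbabilities
ORDER BY transactionId;
```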
07-22-2021 04:31 AM
Thanks @alicia.frame1 for the tips.
Unfortunately, I am not sure how to create a mono-partite projection, since, for example, the same customer may have made 5 normal transactions and 1 fraudulent transaction. In theory, I would need to replicate the customer node once per transaction and copy every attribute of the Transaction node (fraud, amount) onto that specific Customer node. Do I understand that correctly?
Sadly I don't know how to implement it in Neo4j - could you help me with this?
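One possible way to get a single-label dataset without duplicating customer nodes is to go in the other direction: copy each customer's features onto that customer's transactions, then project only Transaction nodes. The following is a minimal sketch, not a tested solution. The relationship type MAKES and the graph name fraud_model_tx are hypothetical, and the property names are taken from the projection in the question.

```cypher
// Hypothetical relationship type: replace MAKES with whatever links
// customers to their transactions in your database.
MATCH (c:Customer)-[:MAKES]->(t:Transaction)
SET t.age_group = c.age_group,
    t.gender_group = c.gender_group,
    t.fastrp_embedding = c.`fastRPExtended-embedding`;

// Project only Transaction nodes. Each one now carries the customer
// features alongside its own amount and fraud properties.
CALL gds.graph.create(
  'fraud_model_tx',
  {
    Transaction: {
      label: 'Transaction',
      properties: ['amount', 'fraud', 'age_group', 'gender_group', 'fastrp_embedding']
    }
  },
  '*'
)
YIELD graphName, nodeCount, relationshipCount;
```

Training would then use nodeLabels: ['Transaction'] with the same featureProperties and targetProperty as in the original call, so every requested property exists on the one requested label.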
11-15-2021 09:24 PM
Hi Philip,
I encountered the same problem. Did you ever resolve this issue? If so, may I know how you did it?
Thank you.