How to retrieve missing relationship property in Neo4j graph

I am facing a strange situation. The scenario is this:
I have a large database with over 10 million nodes and relationships. I am loading the nodes and relationships from CSV files using LOAD CSV with APOC. My relationship CSV file has several attributes, one of which is TIMESTAMP. The timestamp column is present and populated in the CSV, but after loading into the graph db, the nodes are connected by relationships while some of the timestamp values that exist in the CSV file are missing. I discovered this when I tried to filter on a timestamp value.

  1. Is there any way to retrieve these missing timestamp values in Neo4j, or any way to find the cause? I upload a CSV file to the database daily using a cron job, so tracking down each individual CSV file would be tedious. Smart guidance, diagnosis, and a solution would be very helpful.
15 REPLIES

intouch_vivek
Graph Steward

Hi Kalyan,
Is this problem with all properties of datatype Timestamp?
Could you please share a few records and the way you are loading this property into the graph?

Regards
vivek

Hello @intouch.vivek, I am loading the relationships with the query below:
"""CALL apoc.periodic.iterate('

load csv with headers from "file:///relationshiptest20190610.csv" AS row return row
','
MERGE (p1:A {ID: row.AID})
MERGE (p2:C {ID: row.CID})
WITH p1, p2, row
CALL apoc.create.relationship(p1, row.RELATIONSHIP, {}, p2) YIELD rel return rel
',{batchSize:10000, iterateList:true, parallel:true})"""

The columns are RELATIONSHIP, TIMESTAMP, CID, AID, REGDATE, TYPES.
This is what I am using right now; I think I could also include row.TIMESTAMP here. Some relationships got their attributes but some did not. I have counted all relationships with a missing Timestamp property: 312527595 are missing out of 1033404681 relationships in total. How can I get those back without tracking down the CSV files? Kindly help me.

anthapu
Graph Fellow

You could try

MATCH ()-[r]->() WHERE not exists(r.timestamp) RETURN r

This will return all relationships where timestamp property does not exist.

Be aware that this can take some time, since it needs to scan every relationship in the database.
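
Before returning every relationship, a count grouped by relationship type may be a more practical first step. A minimal sketch, assuming the property key is TIMESTAMP (property names are case-sensitive in Cypher, so adjust it to match your data); `IS NULL` is used here as an equivalent of `not exists(...)` for this check:

```cypher
// Count relationships lacking the property, grouped by relationship type.
// Assumption: the property key is TIMESTAMP; change it if yours is lowercase.
MATCH ()-[r]->()
WHERE r.TIMESTAMP IS NULL
RETURN type(r) AS relType, count(r) AS missing
ORDER BY missing DESC
```

This still scans all relationships, but it returns only a handful of rows instead of millions.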

Thank you for your guidance. I am trying it and will let you know 🙂

roberto1
Graph Buddy

As Anthapu said, that could be the best solution, and it is going to take some time. I would recommend using Kettle to have more control over the files that you are uploading.
I could share some links with you.

Let me know if that could help or if you need help using it.

@anthapu I have counted the missing Timestamp values:
312527595
MATCH ()-[r]->() WHERE not exists(r.TIMESTAMP) RETURN count(r)
My total relationship count was
1033404681
It may seem a bit silly to ask,
but is there any way to retrieve these without tracking down the CSV files?

Could it be that there is no Timestamp value for those rows in CSV?

In fact, the operation was timestamp-related, so every CSV I get has a timestamp column with values like "2019-09-21T22:37:23Z" (sample).

Could you please post a sample CSV and the Cypher you are using?

@anthapu

RELATIONSHIP  AID  CID  TXNID      TIMESTAMP            AMOUNT  CHANNEL
C2W           303  343  abfrt#tyt  2019-03-12T10:55:24  100     APP

code:

"""CALL apoc.periodic.iterate('
load csv with headers from "file:///relationshiptest20190610.csv" AS row return row
','
MERGE (p1:A {ID: row.AID})
MERGE (p2:C {ID: row.CID})
WITH p1, p2, row
CALL apoc.create.relationship(p1, row.RELATIONSHIP, {}, p2) YIELD rel return rel
',{batchSize:10000, iterateList:true, parallel:true})"""

Can you modify the query to

"""CALL apoc.periodic.iterate('
load csv with headers from "file:///relationshiptest20190610.csv" AS row return row
','
MERGE (p1:A {ID: row.AID})
MERGE (p2:C {ID: row.CID})
WITH p1, p2, row
CALL apoc.create.relationship(p1, row.RELATIONSHIP, {timestamp: row.TIMESTAMP}, p2) YIELD rel return rel
',{batchSize:10000, iterateList:true, parallel:true})"""

Since you are passing an empty property map ({}) to apoc.create.relationship, no properties are added to the relationship.

You can add any other properties you need next to the timestamp property in that map.
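
For example, here is a sketch that also stores the transaction columns from the sample CSV header posted above (TXNID, AMOUNT, CHANNEL). The property names txnid, amount, and channel are assumptions chosen for illustration. LOAD CSV reads every field as a string, so the amount is converted explicitly. parallel is set to false here as a cautious choice, since concurrent batches that MERGE and connect the same nodes can contend for locks:

```cypher
CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM "file:///relationshiptest20190610.csv" AS row RETURN row
','
MERGE (p1:A {ID: row.AID})
MERGE (p2:C {ID: row.CID})
WITH p1, p2, row
// Property names below are illustrative assumptions, not a fixed schema.
CALL apoc.create.relationship(p1, row.RELATIONSHIP,
  {timestamp: row.TIMESTAMP, txnid: row.TXNID,
   amount: toInteger(row.AMOUNT), channel: row.CHANNEL}, p2)
YIELD rel RETURN rel
',{batchSize:10000, iterateList:true, parallel:false})
```

If TXNID is unique per row, storing it on each relationship would also give later repair queries a way to match one CSV row to exactly one relationship.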

Thanks for the help. Yes, I figured that out as well. But how do I recover the timestamp properties that are already missing? Will it require manually reloading and rewriting from the CSV files, or is there another way? I can apply this modification from now on. Actually, my previous code set timestamp values too, but some of them are now missing, so I didn't notice it before.

// Expects row to come from LOAD CSV, as in the load query
MATCH (p1:A {ID: row.AID})
MATCH (p2:C {ID: row.CID})
WITH p1, p2, row
MATCH (p1)-[r]->(p2)
WHERE not exists(r.timestamp) AND type(r) = row.RELATIONSHIP
SET r.timestamp = row.TIMESTAMP

This should fill in the missing timestamp values.
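
The fragment above relies on a row variable, so it has to run inside the same LOAD CSV wrapper as the original load. A minimal sketch of the full call, assuming the same file name as in the load query and a lowercase timestamp property key (property names are case-sensitive, so use TIMESTAMP if that is what your other relationships carry). `IS NULL` stands in for `not exists(...)`, and parallel is set to false as a cautious choice to avoid lock contention between batches:

```cypher
CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM "file:///relationshiptest20190610.csv" AS row RETURN row
','
// Find existing relationships of the row type that lack the property.
MATCH (p1:A {ID: row.AID})-[r]->(p2:C {ID: row.CID})
WHERE type(r) = row.RELATIONSHIP AND r.timestamp IS NULL
SET r.timestamp = row.TIMESTAMP
',{batchSize:10000, iterateList:true, parallel:false})
```

This only uses MATCH, so it does not create nodes or relationships; rows with no matching relationship are skipped. An index on ID for both labels, if not already present, should keep the node lookups fast.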

Well, I am a bit confused here. Wouldn't it only create a blank Timestamp field? Would it recover the desired "missing" timestamp values for the particular nodes and relationships? Maybe my understanding is wrong, but can you explain it please? I will try it out, of course.

This query finds each relationship of the given type that does not have a timestamp property and sets that property from the corresponding CSV row.

One caveat is that if the relationship is not unique between these nodes (you are using create relationship), the only option is to delete and recreate the relationships. This is because there is no way to uniquely identify which relationship belongs to which CSV row.
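
To see whether this caveat applies to your data, you could count the node pairs that have more than one relationship of the same type. A rough sketch, assuming the A and C labels from the load query; on a graph of this size the aggregation may be slow and memory-hungry, so it might be worth trying on a subset first:

```cypher
// Node pairs connected by more than one relationship of the same type.
// Assumption: labels A and C, as used in the load query.
MATCH (a:A)-[r]->(c:C)
WITH a, c, type(r) AS relType, count(r) AS n
WHERE n > 1
RETURN relType, count(*) AS ambiguousPairs, sum(n) AS relationshipsAffected
```

If this returns no rows, each relationship can be identified by its end nodes and type, and the update query is safe to use.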

It might be easier to drop the db and re-run the whole ingest script with corrections. That might also be faster.