02-17-2020 08:50 PM
I am facing a strange situation. The scenario is this:
I have a large database with over 10 million nodes and relationships. I am loading the nodes and relationships from CSV files using LOAD CSV with APOC. My relationship CSV file has several attributes, one of which is Timestamp. The Timestamp column is populated in the CSV, but after loading into the graph database the nodes are connected by relationships, yet some Timestamp values that exist in the CSV file are missing. I noticed this when I tried to filter on a timestamp value.
02-18-2020 12:22 AM
Hi Kalyan,
Does this problem occur with every property of datatype Timestamp?
Could you please share a few records and the way you are trying to load this property into the graph?
Regards
vivek
02-19-2020 07:01 AM
Hello @intouch.vivek, I am loading the relationships with the query below:
"""CALL apoc.periodic.iterate('
load csv with headers from "file:///relationshiptest20190610.csv" AS row return row
','
MERGE (p1:A {ID: row.AID})
MERGE (p2:C {ID: row.CID})
WITH p1, p2, row
CALL apoc.create.relationship(p1, row.RELATIONSHIP, {}, p2) YIELD rel return rel
',{batchSize:10000, iterateList:true, parallel:true})"""
The columns are Relationship, Timestamp, CID, AID, REGDATE, and TYPES.
This is the query I am using right now; I think I could also include row.TIMESTAMP here. The relationship attributes were added for some records but not for others. I have counted the relationships missing the Timestamp property: 312527595 out of 1033404681 relationships in total. How can I recover those values without tracking them down in the CSV? Kindly help me.
02-18-2020 04:33 AM
You could try
MATCH ()-[r]->() WHERE not exists(r.timestamp) RETURN r
This will return all relationships where timestamp property does not exist.
Be aware that it can take some time, because it has to scan every relationship in the database.
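To see where the gaps are concentrated, the same scan can be grouped by relationship type. This is a minimal sketch, not a query tested against this dataset. It assumes the property is stored under the key timestamp (property keys are case-sensitive in Cypher, so TIMESTAMP would be a different property), and it still scans every relationship.

```cypher
// Count relationships that have no timestamp property, per relationship type.
// A missing property evaluates to null, so IS NULL matches the same
// relationships as "not exists(r.timestamp)".
MATCH ()-[r]->()
WHERE r.timestamp IS NULL
RETURN type(r) AS relType, count(r) AS missing
ORDER BY missing DESC
```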
02-19-2020 02:23 AM
Thank you for your guidance. I am trying it and will let you know 🙂
02-18-2020 11:29 AM
As Anthapu said, that could be the best solution, though it is going to take some time. I would recommend using Kettle to have more control over the files that you are uploading.
I could share some links with you.
Let me know if that would help or if you need help using it.
02-19-2020 03:09 AM
@anthapu I have counted the relationships with a missing TIMESTAMP value:
312527595
MATCH ()-[r]->() WHERE not exists(r.TIMESTAMP) RETURN count(r)
My total relationship count is
1033404681
It may seem a bit silly to ask,
but is there any way to recover these values without tracking them down in the CSV?
02-19-2020 07:48 AM
Could it be that there is no Timestamp value for those rows in CSV?
02-19-2020 07:58 AM
In fact, the operation was timestamp-related, so every CSV I get has a Timestamp column with values like "2019-09-21T22:37:23Z" (sample).
02-19-2020 08:22 AM
Could you please post a sample CSV and the Cypher you are using?
02-19-2020 08:34 AM
RELATIONSHIP | AID | CID | TXNID | TIMESTAMP | AMOUNT | CHANNEL |
---|---|---|---|---|---|---|
C2W | 303 | 343 | abfrt#tyt | 2019-03-12T10:55:24 | 100 | APP |
Code:
"""CALL apoc.periodic.iterate('
load csv with headers from "file:///relationshiptest20190610.csv" AS row return row
','
MERGE (p1:A {ID: row.AID})
MERGE (p2:C {ID: row.CID})
WITH p1, p2, row
CALL apoc.create.relationship(p1, row.RELATIONSHIP, {}, p2) YIELD rel return rel
',{batchSize:10000, iterateList:true, parallel:true})"""
02-19-2020 08:42 AM
Can you modify the query to
"""CALL apoc.periodic.iterate('
load csv with headers from "file:///relationshiptest20190610.csv" AS row return row
','
MERGE (p1:A {ID: row.AID})
MERGE (p2:C {ID: row.CID})
WITH p1, p2, row
CALL apoc.create.relationship(p1, row.RELATIONSHIP, {timestamp: row.TIMESTAMP}, p2) YIELD rel return rel
',{batchSize:10000, iterateList:true, parallel:true})"""
Because you are passing an empty property map ({}), no properties are added to the relationship.
You can add any other properties you need to the same map, next to the timestamp property.
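As an illustration of passing several properties, here is a sketch based on the sample CSV columns posted in this thread (TXNID, TIMESTAMP, AMOUNT, CHANNEL). The property names txnId, amount, and channel are assumptions chosen for the example. LOAD CSV reads every field as a string, so the sketch converts AMOUNT with toFloat(); the timestamp is kept as a string to match the query above, and datetime(row.TIMESTAMP) would be an alternative if a temporal value is wanted.

```cypher
// Sketch only: property names other than timestamp are illustrative assumptions.
// parallel:false is a cautious choice made for this example; parallel batches
// that MERGE the same nodes can contend for locks.
CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM "file:///relationshiptest20190610.csv" AS row RETURN row
','
MERGE (p1:A {ID: row.AID})
MERGE (p2:C {ID: row.CID})
WITH p1, p2, row
CALL apoc.create.relationship(p1, row.RELATIONSHIP, {
  timestamp: row.TIMESTAMP,
  txnId: row.TXNID,
  amount: toFloat(row.AMOUNT),
  channel: row.CHANNEL
}, p2) YIELD rel RETURN rel
',{batchSize:10000, parallel:false})
```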
02-19-2020 08:48 AM
Thanks for the help. Yes, I had figured that out as well. But how can I recover the timestamp property that is already missing? Will it need manual CSV loading and rewriting, or is there another way to do it? I can make this modification from now on. Actually, my previous code wrote timestamp values too, but some of them are now missing, so I didn't notice it before.
02-19-2020 09:05 AM
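// row comes from the same CSV file that was used for the original load
LOAD CSV WITH HEADERS FROM "file:///relationshiptest20190610.csv" AS row
MATCH (p1:A {ID: row.AID})
MATCH (p2:C {ID: row.CID})
WITH p1, p2, row
MATCH (p1)-[r]->(p2)
WHERE not exists(r.timestamp) and type(r) = row.RELATIONSHIP
SET r.timestamp=row.TIMESTAMP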
This should fill in the missing timestamp values.
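With around a billion relationships, running the fix as a single transaction may not be practical. One option is to wrap it in apoc.periodic.iterate, the same way as the original load. The following is a sketch, assuming the CSV file name used earlier in the thread and the lowercase timestamp property key; parallel:false is an assumption made to avoid lock contention between batches. If several relationships of the same type exist between two nodes, all of them that lack a timestamp match the same row, so they cannot be told apart.

```cypher
// Batched backfill sketch: the first statement streams CSV rows, the second
// sets the property on matching relationships that do not have it yet.
CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM "file:///relationshiptest20190610.csv" AS row RETURN row
','
MATCH (p1:A {ID: row.AID})-[r]->(p2:C {ID: row.CID})
WHERE type(r) = row.RELATIONSHIP AND r.timestamp IS NULL
SET r.timestamp = row.TIMESTAMP
',{batchSize:10000, parallel:false})
```

The call returns one summary row; its failed-batch and error columns are worth checking before assuming every row was applied.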
02-19-2020 09:23 AM
I am a bit confused here. Wouldn't this only create a blank Timestamp field? Would it recover the desired "missing" timestamp value for the particular nodes and relationship? Maybe my understanding is wrong, but can you explain it please? I will try it out, of course.
02-19-2020 09:32 AM
This query finds each relationship of the given type that does not have a timestamp property and sets that property from the CSV row.
One caveat: if the relationship is not unique between these nodes (you are using create relationship), the only option is to delete and recreate the relationships, because there is no way to uniquely identify the relationship.
It might be easier to drop the database and re-run the whole ingest script with the corrections. That might also be faster.
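To gauge whether the caveat about non-unique relationships applies before choosing between a backfill and a full reload, a query along these lines could list node pairs that have more than one relationship of the same type. It is a sketch that assumes the :A and :C labels and the ID property from the load query. On a graph of this size it is expensive, and the LIMIT only caps the output, not the work.

```cypher
// List (A, C) pairs connected by more than one relationship of the same type.
MATCH (a:A)-[r]->(c:C)
WITH a, c, type(r) AS relType, count(r) AS n
WHERE n > 1
RETURN a.ID AS AID, c.ID AS CID, relType, n
LIMIT 25
```

If this returns no rows, each CSV row maps to at most one relationship and the backfill can identify it unambiguously.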