cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

How do you handle first seen and last seen properties on import when not in first import statement?

I am working with email data and importing the data in 5 parts in this order: NO ATTACHMENT OR LINK , URL ONLY , ATTACHMENT ONLY , URL AND ATTACHMENT , and Attachment to Attachment Name, FileName Node. The following is the import that worked fine for the NO ATTACHMENT OR LINK portion:

// NO ATTACHMENT OR LINK
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_1.csv") AS row
MERGE (a:Sender { name: row.From, domain: row.Sender_Sub_Fld})
MERGE (c:Recipient { name: row.To})
WITH a,c,row
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {}, c) YIELD rel as rel1
RETURN a,c

I was requested to add a count property of times seen for relationships between nodes, and a second requirement to add a first seen and last seen to each relationship and every node but the recipient(the data is external to internal so the recipient does not require the first or last seen).

So I started working with the following imports. This seems to work fine for the NO ATTACHMENT OR LINK if the sender is in the first import, but if the sender is not in the first import the first and last seen portion is messed up because the initial set is in the first import.

// NO ATTACHMENT OR LINK - FIRST IMPORT
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_1.csv") AS row
MERGE (a:Sender { name: row.From, domain: row.Sender_Sub_Fld, last_seen: datetime(row.DateTime) })
SET a.first_seen = coalesce(a.last_seen)
MERGE (b:Recipient { name: row.To, last_seen: datetime(row.DateTime) })
SET b.first_seen = coalesce(a.last_seen)
WITH a,b,row
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {last_seen: datetime(row.DateTime)}, b, {}) YIELD rel as rel1
SET rel1.first_seen = coalesce(rel1.last_seen)
SET rel1.times_seen = coalesce(rel1.times_seen, 0) + 1
RETURN a,b
// NO ATTACHMENT OR LINK - REST OF IMPORTS
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_2.csv") AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
SET a.last_seen = dt
MERGE (b:Recipient {name: row.To})
SET b.last_seen = dt
WITH a, b, row, dt
WHERE row.Url = "false" AND row.FileHash = "false" 
CALL apoc.merge.relationship(a, row.Outcome2, {}, {}, b) YIELD rel
SET rel.last_seen = dt
SET rel.times_seen = coalesce(rel.times_seen, 0) + 1
RETURN a, b

Anyways for the way I am importing this data, is there a better way to do this, so that I dont have to break up the data into an initial import and following imports with a different import statement. And how should I handle the first seen and last seen properties if I go about it this way.

1 ACCEPTED SOLUTION

This seems to address the problem I was having:

// NO ATTACHMENT OR LINK
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_3.csv") AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
ON CREATE SET a.firstseen = dt
SET a.lastseen = dt
SET a.timesseen = coalesce(a.timesseen, 0) + 1
MERGE (b:Recipient {name: row.To})
ON CREATE SET b.firstseen = dt
SET b.lastseen = dt
WITH a, b, row, dt 
, CASE WHEN row.Outcome2 = 'FAILED_TO' THEN [1] ELSE [] END AS fail
, CASE WHEN row.Outcome2 = 'TO' THEN [1] ELSE [] END AS success
WHERE row.Url = "false" AND row.FileHash = "false"
FOREACH ( x in fail | 
  MERGE (a)-[rel1:FAILED_TO]->(b)
    ON CREATE SET rel1.firstseen = dt
  SET rel1.lastseen = dt
  SET rel1.timesseen = coalesce(rel1.timesseen, 0) + 1
)
FOREACH ( x in success | 
  MERGE (a)-[rel2:TO]->(b)
    ON CREATE SET rel2.firstseen = dt
  SET rel2.lastseen = dt
  SET rel2.timesseen = coalesce(rel2.timesseen, 0) + 1
)
RETURN a, b

View solution in original post

1 REPLY 1

This seems to address the problem I was having:

// NO ATTACHMENT OR LINK
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_3.csv") AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
ON CREATE SET a.firstseen = dt
SET a.lastseen = dt
SET a.timesseen = coalesce(a.timesseen, 0) + 1
MERGE (b:Recipient {name: row.To})
ON CREATE SET b.firstseen = dt
SET b.lastseen = dt
WITH a, b, row, dt 
, CASE WHEN row.Outcome2 = 'FAILED_TO' THEN [1] ELSE [] END AS fail
, CASE WHEN row.Outcome2 = 'TO' THEN [1] ELSE [] END AS success
WHERE row.Url = "false" AND row.FileHash = "false"
FOREACH ( x in fail | 
  MERGE (a)-[rel1:FAILED_TO]->(b)
    ON CREATE SET rel1.firstseen = dt
  SET rel1.lastseen = dt
  SET rel1.timesseen = coalesce(rel1.timesseen, 0) + 1
)
FOREACH ( x in success | 
  MERGE (a)-[rel2:TO]->(b)
    ON CREATE SET rel2.firstseen = dt
  SET rel2.lastseen = dt
  SET rel2.timesseen = coalesce(rel2.timesseen, 0) + 1
)
RETURN a, b