Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
01-09-2020 08:26 AM
I am working with email data and importing the data in 5 parts in this order: NO ATTACHMENT OR LINK
, URL ONLY
, ATTACHMENT ONLY
, URL AND ATTACHMENT
, and Attachment to Attachment Name, FileName Node
. The following is the import that worked fine for the NO ATTACHMENT OR LINK
portion:
// NO ATTACHMENT OR LINK
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_1.csv") AS row
MERGE (a:Sender { name: row.From, domain: row.Sender_Sub_Fld})
MERGE (c:Recipient { name: row.To})
WITH a,c,row
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {}, c) YIELD rel as rel1
RETURN a,c
I was requested to add a count property of times seen for relationships between nodes, and a second requirement to add a first seen and last seen to each relationship and every node but the recipient(the data is external to internal so the recipient does not require the first or last seen).
So I started working with the following imports. This seems to work fine for the NO ATTACHMENT OR LINK
if the sender is in the first import, but if the sender is not in the first import the first and last seen
portion is messed up because the initial set is in the first import.
// NO ATTACHMENT OR LINK - FIRST IMPORT
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_1.csv") AS row
MERGE (a:Sender { name: row.From, domain: row.Sender_Sub_Fld, last_seen: datetime(row.DateTime) })
SET a.first_seen = coalesce(a.last_seen)
MERGE (b:Recipient { name: row.To, last_seen: datetime(row.DateTime) })
SET b.first_seen = coalesce(a.last_seen)
WITH a,b,row
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {last_seen: datetime(row.DateTime)}, b, {}) YIELD rel as rel1
SET rel1.first_seen = coalesce(rel1.last_seen)
SET rel1.times_seen = coalesce(rel1.times_seen, 0) + 1
RETURN a,b
// NO ATTACHMENT OR LINK - REST OF IMPORTS
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_2.csv") AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
SET a.last_seen = dt
MERGE (b:Recipient {name: row.To})
SET b.last_seen = dt
WITH a, b, row, dt
WHERE row.Url = "false" AND row.FileHash = "false"
CALL apoc.merge.relationship(a, row.Outcome2, {}, {}, b) YIELD rel
SET rel.last_seen = dt
SET rel.times_seen = coalesce(rel.times_seen, 0) + 1
RETURN a, b
Anyways for the way I am importing this data, is there a better way to do this, so that I dont have to break up the data into an initial import and following imports with a different import statement. And how should I handle the first seen
and last seen
properties if I go about it this way.
Solved! Go to Solution.
01-14-2020 05:34 AM
This seems to address the problem I was having:
// NO ATTACHMENT OR LINK
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_3.csv") AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
ON CREATE SET a.firstseen = dt
SET a.lastseen = dt
SET a.timesseen = coalesce(a.timesseen, 0) + 1
MERGE (b:Recipient {name: row.To})
ON CREATE SET b.firstseen = dt
SET b.lastseen = dt
WITH a, b, row, dt
, CASE WHEN row.Outcome2 = 'FAILED_TO' THEN [1] ELSE [] END AS fail
, CASE WHEN row.Outcome2 = 'TO' THEN [1] ELSE [] END AS success
WHERE row.Url = "false" AND row.FileHash = "false"
FOREACH ( x in fail |
MERGE (a)-[rel1:FAILED_TO]->(b)
ON CREATE SET rel1.firstseen = dt
SET rel1.lastseen = dt
SET rel1.timesseen = coalesce(rel1.timesseen, 0) + 1
)
FOREACH ( x in success |
MERGE (a)-[rel2:TO]->(b)
ON CREATE SET rel2.firstseen = dt
SET rel2.lastseen = dt
SET rel2.timesseen = coalesce(rel2.timesseen, 0) + 1
)
RETURN a, b
01-14-2020 05:34 AM
This seems to address the problem I was having:
// NO ATTACHMENT OR LINK
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM ("file:///sessions/new_neo_test_3.csv") AS row
WITH row, datetime(row.DateTime) AS dt
MERGE (a:Sender {name: row.From, domain: row.Sender_Sub_Fld})
ON CREATE SET a.firstseen = dt
SET a.lastseen = dt
SET a.timesseen = coalesce(a.timesseen, 0) + 1
MERGE (b:Recipient {name: row.To})
ON CREATE SET b.firstseen = dt
SET b.lastseen = dt
WITH a, b, row, dt
, CASE WHEN row.Outcome2 = 'FAILED_TO' THEN [1] ELSE [] END AS fail
, CASE WHEN row.Outcome2 = 'TO' THEN [1] ELSE [] END AS success
WHERE row.Url = "false" AND row.FileHash = "false"
FOREACH ( x in fail |
MERGE (a)-[rel1:FAILED_TO]->(b)
ON CREATE SET rel1.firstseen = dt
SET rel1.lastseen = dt
SET rel1.timesseen = coalesce(rel1.timesseen, 0) + 1
)
FOREACH ( x in success |
MERGE (a)-[rel2:TO]->(b)
ON CREATE SET rel2.firstseen = dt
SET rel2.lastseen = dt
SET rel2.timesseen = coalesce(rel2.timesseen, 0) + 1
)
RETURN a, b
All the sessions of the conference are now available online