06-16-2019 05:59 AM
I have a graph with User nodes and FOLLOWS relationships.
There are duplicate relationships in the graph, and I want to remove the oldest duplicates.
I want to check the relationships against a CSV file like this:
START_ID, END_ID
1 , 2
1 , 3
1 , 4
2 , 1
2 , 5
4 , 3
This CSV file has 3,000,000 lines, and my Cypher query takes a long time. Can I write a faster query?
My query is:
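LOAD CSV WITH HEADERS FROM 'file:///user_followers.csv' AS line
MATCH (a:User)-[f:FOLLOWS]->(b:User)
WHERE a.id = toInt(line["START_ID"]) AND b.id = toInt(line["END_ID"])
// Group by node pair: without grouping keys, collect() aggregates over every CSV row.
// DISTINCT guards against the same pair appearing on more than one CSV line.
WITH a, b, collect(DISTINCT f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t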
The graph also has an index on the node ids.
06-16-2019 12:00 PM
You can probably get rid of the toInt() in the query so you're not doing function calls within your match query. Keep in mind that LOAD CSV reads every field as a string, so this only matches if the id properties are stored as strings too. If you suspect you have nodes using inconsistent data types, correct that first, before your duplicate clean-up operation. You may also want to consider using apoc.periodic.commit() to commit batches of completed work.
LOAD CSV WITH HEADERS FROM 'file:///user_followers.csv' as line
MATCH (a:User)-[f:FOLLOWS]->(b:User)
WHERE a.id = line["START_ID"]
AND b.id = line["END_ID"]
// Group by node pair so each list holds only the relationships between a and b
WITH a, b, collect(DISTINCT f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) as t
DELETE t
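A minimal sketch of the batching idea with apoc.periodic.commit(), assuming APOC is installed and that the duplicates can be found by scanning the graph directly rather than row by row from the CSV. The batch size of 10000 is an arbitrary choice. The inner statement must return a count; the procedure re-runs it in a new transaction until that count is 0.

```cypher
CALL apoc.periodic.commit(
  "MATCH (a:User)-[f:FOLLOWS]->(b:User)
   WITH a, b, collect(f) AS rels
   WHERE size(rels) > 1
   // Handle at most $limit duplicated pairs per transaction
   WITH rels LIMIT $limit
   // Keep the first relationship of each pair, delete the rest
   FOREACH (r IN tail(rels) | DELETE r)
   // Number of pairs cleaned in this pass; 0 ends the loop
   RETURN count(*)",
  {limit: 10000}
)
```

Each pass repeats the full aggregation, so this trades some extra scanning for smaller transactions.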
12-06-2020 06:05 AM
Hello Mike,
Do you know whether tail() is evaluated first, or UNWIND?
I just used your code to solve the same issue, and I am trying to understand the order in which Neo4j does things.
Here is how I repurposed it:
MATCH (t:Toy)-[rel:SOLD_BY]->(s:Supplier)
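// Group by both end nodes so only relationships between the same toy and supplier count as duplicates
WITH t, s, COLLECT(rel) AS RELS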
WHERE SIZE(RELS) >1
UNWIND tail(RELS) as reltail
DELETE reltail
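On the ordering: tail() is an ordinary list expression, so it is evaluated first and yields the list without its first element; UNWIND then turns that shorter list into rows. A minimal sketch with literal values (no graph data needed, the names xs and x are only for illustration):

```cypher
WITH [1, 2, 3] AS xs
// tail(xs) evaluates to [2, 3]; UNWIND then emits one row per element
UNWIND tail(xs) AS x
RETURN x
// two rows: 2, then 3
```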
12-06-2020 07:58 AM
I have a general question about the SIZE() function. It seems that if SIZE() gets a nested list from COLLECT(), it first unwinds the list and then counts what is in each row.
Am I seeing this correctly?
I can't find an explanation of this anywhere in the manual. The manual frustrates me on this kind of detail: the things Cypher does implicitly.
Thanks in advance,
Jeffrey
12-07-2020 08:25 AM
Here's the documentation.
The explanation here is a bit skimpy though.... and doesn't explain your question. It should be improved.
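A small sketch of what seems to be going on, using made-up literal data instead of a graph: collect() does not build one nested list. It is an aggregation, so it returns one row per grouping key, each with its own flat list, and size() is then evaluated once per row.

```cypher
// Hypothetical sample rows standing in for (toy)-[:SOLD_BY]->(supplier) matches
UNWIND [
  {toy: 'car',  supplier: 'A'},
  {toy: 'car',  supplier: 'A'},
  {toy: 'doll', supplier: 'B'}
] AS row
// toy is the grouping key, so collect() produces one list per toy
WITH row.toy AS toy, collect(row.supplier) AS suppliers
RETURN toy, suppliers, size(suppliers) AS n
ORDER BY toy
// 'car',  ['A', 'A'], 2
// 'doll', ['B'],      1
```

A WHERE size(...) > 1 filter placed after such a WITH therefore keeps or drops whole groups, one row at a time.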
12-07-2020 08:54 AM
Hello Clem,
Thank you for the reply. I read that entry in the manual, and that is why I came here, hoping someone would have better insight into my question. This is exactly my point: the manual is so scarce on information.
Is there someone from the company on the forum, or is there a support department that I can contact concerning the manual? The manual needs to be fixed. I have been reading it since version 3 and it hasn't gotten better.
06-06-2021 12:41 AM
I use this to delete duplicate relationships between two types (labels) of nodes, but I think you could just use two aliases of the same label:
MATCH (l:Location)-[r:LOC_IN_DIV]-(d:Division)
// Group by node pair so only relationships between the same two nodes are merged
WITH l, d, collect(r) AS rels
WHERE size(rels) > 1
CALL apoc.refactor.mergeRelationships(rels) YIELD rel
RETURN COUNT(*)
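Before merging, it can be worth checking how many pairs are affected. A read-only sketch reusing the labels above (the variable names l, d and n are only for illustration):

```cypher
MATCH (l:Location)-[r:LOC_IN_DIV]-(d:Division)
// Count relationships per node pair
WITH l, d, count(r) AS n
WHERE n > 1
// Pairs with duplicates, and how many relationships a clean-up would remove
RETURN count(*) AS duplicatedPairs, sum(n - 1) AS extraRelationships
```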
06-17-2019 11:31 PM
"There are duplicate relationships in the graph, and I want to remove the oldest duplicates.
I want to check the relationships against a CSV file like this"
Could you please explain this with an example so we can give you better suggestions?
06-18-2019 12:57 AM
I found this solution:
I extracted the end IDs from the relationship file, then sorted and de-duplicated them (sort, uniq) into follower-end.csv:
END_ID
1
2
3
4
5
Then I ran this query:
LOAD CSV WITH HEADERS FROM 'file:///tmp/follower-end.csv' AS line
WITH toInt(line["END_ID"]) AS e_id
MATCH (s:User)-[f:FOLLOWS]->(e:User)
WHERE e.id = e_id
WITH e_id, s.id AS s_id, collect(f) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS t
DELETE t
10-06-2021 07:46 AM
Hi, I am trying to deduplicate relationships that have attributes.
How can I do that while keeping one relationship for each unique combination of attributes?
Thanks in advance
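One possible approach, sketched under assumptions: add the distinguishing property to the grouping key, so that relationships count as duplicates only when they connect the same pair of nodes and carry the same value. The User and FOLLOWS names are borrowed from the first post, and the since property is hypothetical.

```cypher
MATCH (a:User)-[f:FOLLOWS]->(b:User)
// `since` is a hypothetical property; list every property that defines uniqueness here
WITH a, b, f.since AS since, collect(f) AS rels
WHERE size(rels) > 1
// Keep the first relationship of each group and delete the rest
FOREACH (r IN tail(rels) | DELETE r)
```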