Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
03-25-2020 04:08 AM
Hi everybody!
I'm new at Neo4j and i'm trying to learn it in order to start a project for my thesis, i spent a lot of time to manually reorganize a large dataset from xml to csv, then i imported a piece of this db inside Neo4j, i would avoid those duplicates but i canno't find a proper query, can you help me?
Thanks!
Please tell me if you need to know anything about this database or other stuff to undestand better the situation.
03-25-2020 06:02 AM
Hi @alessio9899,
Welcome to the Community!!
I guess post Creating Nodes using Create Statement your applied
CALL apoc.refactor.mergeNodes(nodes, {properties: "combine"}) YIELD node
If that is the case try something like
MATCH (p:Person)
WITH p ORDER BY p.created DESC // newest one first
WITH p.email, collect(p) as nodes
CALL apoc.refactor.mergeNodes(nodes, {properties: {name:'discard', age:'overwrite', kids:'combine', `addr.*`, 'overwrite',`.*`: 'discard'}}) YIELD node
RETURN node
given @ http://neo4j-contrib.github.io/neo4j-apoc-procedures/3.5/graph-refactoring/merge-nodes/
If your already have that data in the DB then you can try split function on the property and take first index [0]
03-25-2020 08:38 AM
thanks you for answer!
i'll try to explaine you step by step what exactly i did:
editorInfo:COALESCE(line.EditorsInfo,$defaultValue),
editorPersonId:COALESCE(toInteger(line.EditorsPersonId),$defaultInteger),
artistInfo:COALESCE(line.ArtistsInfo,$defaultValue),
artistPersonId:COALESCE(toInteger(line.ArtistsPersonId),$defaultInteger),
illustratorInfo:COALESCE(line.IllustratorsInfo,$defaultValue),
illustratorPersonId:COALESCE(toInteger(line.IllustratorsPersonId),$defaultInteger),
printerInfo:COALESCE(line.PrintersInfo,$defaultValue),
printerPersonId:COALESCE(toInteger(line.PrintersPersonId),$defaultInteger),
printerOrnament:COALESCE(line.PrinterOrnament,$defaultValue),
engraverReference:COALESCE(line.EngraverReference,$defaultValue),
essayTitle:COALESCE(line.EssayTitle,$defaultValue),
essayContributorInfo:COALESCE(line.EssayContributorInfo,$defaultValue),
essayContributorPersonId:COALESCE(toInteger(line.EssayContributorPersonId),$defaultInteger),
citation:COALESCE(line.Citation,$defaultValue),
workId:COALESCE(toInteger(line.WorkId),$defaultInteger),
notes:COALESCE(line.Notes,$defaultValue)
})
RETURN o
i obtained this result: https://ibb.co/6mvYKgS
as you can see there is a lot of nodes while i just need 3 nodes in total, based on uniqueness of WorkId.
then i executed this:
MATCH (o:Work)
WITH o.shortTitle AS title, collect(o) AS node2Merge
WITH node2Merge, extract(x IN node2Merge | x.dedicateePersonId) AS dedicateePersonId,
extract(x IN node2Merge | x.dedicateeInfo) AS dedicateeInfo,
extract(x IN node2Merge | x.authorPersonId) AS authorPersonId,
extract(x IN node2Merge | x.authorInfo) AS authorInfo,
extract(x IN node2Merge | x.censorPersonId) AS censorPersonId,
extract(x IN node2Merge | x.censorInfo) AS censorInfo,
extract(x IN node2Merge | x.contributorPersonId) AS contributorPersonId,
extract(x IN node2Merge | x.contributorInfo) AS contributorInfo,
extract(x IN node2Merge | x.editorPersonId) AS editorPersonId,
extract(x IN node2Merge | x.editorInfo) AS editorInfo,
extract(x IN node2Merge | x.artistPersonId) AS artistPersonId,
extract(x IN node2Merge | x.artistInfo) AS artistInfo,
extract(x IN node2Merge | x.illustratorPersonId) AS illustratorPersonId,
extract(x IN node2Merge | x.illustratorInfo) AS illustratorInfo,
extract(x IN node2Merge | x.printerPersonId) AS printerPersonId,
extract(x IN node2Merge | x.printerInfo) AS printerInfo,
extract(x IN node2Merge | x.essayContributorPersonId) AS essayContributorPersonId,
extract(x IN node2Merge | x.essayContributorInfo) AS essayContributorInfo,
extract(x IN node2Merge | x.essayTitle) AS essayTitle
CALL apoc.refactor.mergeNodes(node2Merge) YIELD node
SET node.dedicateePersonId = dedicateePersonId
SET node.dedicateeInfo = dedicateeInfo
SET node.authorPersonId = authorPersonId
SET node.authorInfo = authorInfo
SET node.contributorPersonId = contributorPersonId
SET node.contributorInfo = contributorInfo
SET node.censorPersonId = censorPersonId
SET node.censorInfo = censorInfo
SET node.editorPersonId = editorPersonId
SET node.editorInfo = editorInfo
SET node.artistPersonId = artistPersonId
SET node.artistInfo = artistInfo
SET node.illustratorPersonId = illustratorPersonId
SET node.illustratorInfo = illustratorInfo
SET node.printerPersonId = printerPersonId
SET node.printerInfo = printerInfo
SET node.essayContributorPersonId = essayContributorPersonId
SET node.essayContributorInfo = essayContributorInfo
SET node.essayTitle = essayTitle
any help?
All the sessions of the conference are now available online