Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-30-2021 11:04 AM
I am trying to load a XML file with apoc and merge different nested elements into nodes. Unfortunately, in case there are nested elements, my query returns multiple elements per unique ID where I only want one. A different cypher query only returns a Type missmatch.
XML file:
<?xml version="1.0" encoding="UTF-8"?>
<document>
<objects type="w">
<spec>
<id>some_id</id>
<shortdesc>sd</shortdesc>
<status>approved</status>
<version>1</version>
<sourcefile>source</sourcefile>
<sourceline>1</sourceline>
<description>description</description>
<needs>
<needso>f</needso>
</needs>
<provides>
<provcov>
<L2>some_random_number</L2>
<dstv>1</dstv>
</provcov>
<provcov>
<L2>some_random_number_1</L2>
<dstv>1</dstv>
</provcov>
</provides>
</spec>
My cypher query I:
call apoc.load.xml("file:/import.xml",'/document/objects',{}, false) yield value as ids
unwind ids._children as RA
unwind RA._children as RA2
unwind RA2._children as RA3
return ids.type as Type,
[item in RA3._children where item._type = 'L2'|item._text] as L2,
[item in RA._children where item._type = 'id'|item._text] as id
As you can see in the results, there two rows (nodes) for the same ID where I actually only want one. I know the unwind transforms the elements to rows, but otherwise I get an error (see cypher query 2 at the end for details).
Type | L2 | id |
---|---|---|
sw | null | [some_id] |
sw | [some_random_number] | [some_id] |
Any idea to have only one row (node) per parent (id)? Skipping the 'null' cells is not an option because there are some IDs which have a L2 'null' attribute.
Thank you in advance.
Cypher query 2:
call apoc.load.xml("file:/import.xml",'/document/objects,{}, true) yield value
unwind value as RA
unwind RA._objects as RA1
return RA.type as Type,
[item in RA1._spec where item._type='id'|item._text] as ID,
[item in RA1._spec._provides._provcov where item._type='L2'] as L2;
Result:
Type mismatch: expected a map but was List
08-30-2021 04:21 PM
Hi,
It's not really clear from your post which are your expected rows.
If the one not desired is the one with null as tag, you can try:
call apoc.load.xml("file:/t.xml",'/document/objects',{}, false) yield value as ids
unwind ids._children as RA
unwind RA._children as RA2
unwind RA2._children as RA3
with *
where any(item in RA3._children where item._type = 'L2')
return ids.type as Type,
[item in RA3._children where item._type = 'L2'|item._text] as L2,
[item in RA._children where item._type = 'id'|item._text] as id
Bennu
08-31-2021 02:40 AM
Thx for your reply. My expected result would look sth like this:
Type | L2 | id |
---|---|---|
sw | null, [some_random_number, some_random_number_1, ...] | [some_id] |
Basically that all duplicated rows for the same ID are reduced and aggregated into one row so I can process them more consistent. Skipping the null rows per default is not an option because there are some elements that have no L2 entry.
All the sessions of the conference are now available online