Neo4j

rubinho101 · ‎08-30-2021

I am trying to load a XML file with apoc and merge different nested elements into nodes. Unfortunately, in case there are nested elements, my query returns multiple elements per unique ID where I only want one. A different cypher query only returns a Type missmatch.

XML file:

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <objects type="w">
    <spec>
      <id>some_id</id>
      <shortdesc>sd</shortdesc>
      <status>approved</status>
      <version>1</version>
      <sourcefile>source</sourcefile>
      <sourceline>1</sourceline>
      <description>description</description>
      <needs>
        <needso>f</needso>
      </needs>
      <provides>
        <provcov>
          <L2>some_random_number</L2>
          <dstv>1</dstv>
        </provcov>
        <provcov>
          <L2>some_random_number_1</L2>
          <dstv>1</dstv>
        </provcov>
      </provides>
    </spec>

My cypher query I:

call apoc.load.xml("file:/import.xml",'/document/objects',{}, false) yield value as ids
unwind ids._children as RA
unwind RA._children as RA2
unwind RA2._children as RA3
return ids.type as Type,
[item in RA3._children where item._type = 'L2'|item._text] as L2,
[item in RA._children where item._type = 'id'|item._text] as id

As you can see in the results, there two rows (nodes) for the same ID where I actually only want one. I know the unwind transforms the elements to rows, but otherwise I get an error (see cypher query 2 at the end for details).

Type	L2	id
sw	null	[some_id]
sw	[some_random_number]	[some_id]

Any idea to have only one row (node) per parent (id)? Skipping the 'null' cells is not an option because there are some IDs which have a L2 'null' attribute.

Thank you in advance.

Cypher query 2:

call apoc.load.xml("file:/import.xml",'/document/objects,{}, true) yield value
unwind value as RA
unwind RA._objects as RA1
return RA.type as Type,
    [item in RA1._spec where item._type='id'|item._text] as ID,
    [item in RA1._spec._provides._provcov where item._type='L2'] as L2;

Result:

Type mismatch: expected a map but was List

Bennu · ‎08-30-2021

Hi,

It's not really clear from your post which are your expected rows.

If the one not desired is the one with null as tag, you can try:

call apoc.load.xml("file:/t.xml",'/document/objects',{}, false) yield value as ids
unwind ids._children as RA
unwind RA._children as RA2
unwind RA2._children as RA3
with *
where any(item in RA3._children where item._type = 'L2')
return ids.type as Type,
[item in RA3._children where item._type = 'L2'|item._text] as L2,
[item in RA._children where item._type = 'id'|item._text] as id

Bennu

rubinho101 · ‎08-31-2021

Thx for your reply. My expected result would look sth like this:

Type	L2	id
sw	null, [some_random_number, some_random_number_1, ...]	[some_id]

Basically that all duplicated rows for the same ID are reduced and aggregated into one row so I can process them more consistent. Skipping the null rows per default is not an option because there are some elements that have no L2 entry.

Neo4j

Loading nested XML elements not working properly