cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Importing nested XML Elements

Michel
Node Clone

Hi,

the example  https://neo4j.com/labs/apoc/4.4/overview/apoc.load/apoc.load.xml/ works fine.

But how to impoert nested XML Elements, e.g. if the https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms762271(v=vs.85)?redirectedfrom=... 

looks like:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <prices>
      	<price1>44.95</price1>
      	<price2>77.95</price2>
      </prices>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>

How to get instead one price the two prices (price1 and price2) ?

 

Thanks

Michel

 

3 ACCEPTED SOLUTIONS

I agree, the XML parsing is not intuitive. I modified your data to have multiple prices as shown in your example. Here is the test data. I extracted each attribute into a map that represents a single book. The query products a list of books.

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <prices>
      	<price1>44.95</price1>
      	<price2>77.95</price2>
      </prices>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <prices>
      	<price1>144.95</price1>
      	<price2>177.95</price2>
      </prices>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <prices>
      	<price1>244.95</price1>
      	<price2>277.95</price2>
      </prices>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
   </catalog>

Query:

call apoc.load.xml("file:///Sample.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book, children, 
[x in children where x._type = 'prices'| x._children][0] as book_prices
with book.id as id,
[x in book_prices | x._text] as prices,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: prices
} as books

Result:

Screen Shot 2022-08-16 at 6.12.38 PM.png

View solution in original post

Try this.  I broke out each price and showed them separately. 

call apoc.load.xml("file:///Sample.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book, children, 
[x in children where x._type = 'prices'| x._children][0] as book_prices
with book.id as id,
[x in book_prices where x._type = 'price1' | x._text][0] as price1,
[x in book_prices where x._type = 'price2' | x._text][0] as price2,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: {price1: price1, price2: price2}
} as books

Example book from query:

{
  "author": "Gambardella, Matthew",
  "genre": "Computer",
  "description": "An in-depth look at creating applications with XML.",
  "id": "bk101",
  "prices": {
    "price1": "44.95",
    "price2": "77.95"
  },
  "title": "XML Developer's Guide",
  "publish_date": "2000-10-01"
}

View solution in original post

Ok, I think this is getting to become very convoluted.  I would suggest you write a utility in java or other language in your skill set to extract the xml data into csv format for easier importing.  That being said, I got something to work. It gets more complex the greater the hierarchy. Look at the price extraction logic.

call apoc.load.xml("file:///Sample2.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book.id as id,
[z in [y in [x in children where x._type = 'prices'| x._children][0] | y._children] | {
    currency: [i in z where i._type = 'currency' | i._text][0],
    price: [i in z where i._type = 'netprice' | i._text][0],
    vat: [i in z where i._type = 'add' | i._children][0][0]._text
}] as book_prices,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: book_prices
} as books

Result for first book:

{
  "author": "Gambardella, Matthew",
  "genre": "Computer",
  "description": "An in-depth look at creating applications with XML.",
  "id": "bk101",
  "prices": [
    {
      "price": "54.95",
      "vat": "0.00",
      "currency": "USD"
    },
    {
      "price": "52.95",
      "vat": "0.30",
      "currency": "EUR"
    }
  ],
  "title": "XML Developer's Guide",
  "publish_date": "2000-10-01"
}

 

View solution in original post

5 REPLIES 5

I agree, the XML parsing is not intuitive. I modified your data to have multiple prices as shown in your example. Here is the test data. I extracted each attribute into a map that represents a single book. The query products a list of books.

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <prices>
      	<price1>44.95</price1>
      	<price2>77.95</price2>
      </prices>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <prices>
      	<price1>144.95</price1>
      	<price2>177.95</price2>
      </prices>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <prices>
      	<price1>244.95</price1>
      	<price2>277.95</price2>
      </prices>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
   </catalog>

Query:

call apoc.load.xml("file:///Sample.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book, children, 
[x in children where x._type = 'prices'| x._children][0] as book_prices
with book.id as id,
[x in book_prices | x._text] as prices,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: prices
} as books

Result:

Screen Shot 2022-08-16 at 6.12.38 PM.png

Michel
Node Clone

Dear Ninja,

thank you so much for your valuable help, it works !  Just an other question: How could I access the two prices separatelly ( I tried a lot of combinations - no success 😞 ) . I can access them with [0] and [1] , but it is not secure way... I want to know, which one was price1 and price2.


Warm regards,

Michel

Try this.  I broke out each price and showed them separately. 

call apoc.load.xml("file:///Sample.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book, children, 
[x in children where x._type = 'prices'| x._children][0] as book_prices
with book.id as id,
[x in book_prices where x._type = 'price1' | x._text][0] as price1,
[x in book_prices where x._type = 'price2' | x._text][0] as price2,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: {price1: price1, price2: price2}
} as books

Example book from query:

{
  "author": "Gambardella, Matthew",
  "genre": "Computer",
  "description": "An in-depth look at creating applications with XML.",
  "id": "bk101",
  "prices": {
    "price1": "44.95",
    "price2": "77.95"
  },
  "title": "XML Developer's Guide",
  "publish_date": "2000-10-01"
}

Michel
Node Clone

Hi Ninja,

I have a further more complex XML and I adjusted accordingly "our" test data.  For every currency should be generated a node with all book data. 

Coudl you help me please in that case as well ? The focus is on the nested elements...

Thanks you so much.  If there is a appropriate documentation for that: Please refer me them.

 

<?xml version="1.0"?>
<catalog>
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
        <prices>
            <price>
                <currency>USD</currency>
                <netprice>54.95</netprice>
                <add>
                    <VAT>
                        0.00
                    </VAT>
                </add>
            </price>
            <price>
                <currency>EUR</currency>
                <netprice>52.95</netprice>
                <add>
                    <VAT>
                        0.30
                    </VAT>
                </add>
            </price>
        </prices>
        <publish_date>2000-10-01</publish_date>
        <description>An in-depth look at creating applications 
            with XML.</description>
    </book>
    <book id="bk102">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
        <genre>Fantasy</genre>
        <prices>
            <price>
                <currency>USD</currency>
                <netprice>74.95</netprice>
                <add>
                    <VAT>
                        0.00
                    </VAT>
                </add>
            </price>
            <price>
                <currency>EUR</currency>
                <netprice>72.95</netprice>
                <add>
                    <VAT>
                        0.50
                    </VAT>
                </add>
            </price>
        </prices>
        <publish_date>2000-12-16</publish_date>
        <description>A former architect battles corporate zombies, 
            an evil sorceress, and her own childhood to become queen 
            of the world.</description>
    </book>
    <book id="bk103">
        <author>Corets, Eva</author>
        <title>Maeve Ascendant</title>
        <genre>Fantasy</genre>
        <prices>
            <price>
                <currency>USD</currency>
                <netprice>24.95</netprice>
                <add>
                    <VAT>
                        0.00
                    </VAT>
                </add>
            </price>
            <price>
                <currency>EUR</currency>
                <netprice>24.95</netprice>
                <add>
                    <VAT>
                        0.20
                    </VAT>
                </add>
            </price>
        </prices>
        <publish_date>2000-11-17</publish_date>
        <description>After the collapse of a nanotechnology 
            society in England, the young survivors lay the 
            foundation for a new society.</description>
    </book>
</catalog>

 

 

 

 

 

Ok, I think this is getting to become very convoluted.  I would suggest you write a utility in java or other language in your skill set to extract the xml data into csv format for easier importing.  That being said, I got something to work. It gets more complex the greater the hierarchy. Look at the price extraction logic.

call apoc.load.xml("file:///Sample2.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book.id as id,
[z in [y in [x in children where x._type = 'prices'| x._children][0] | y._children] | {
    currency: [i in z where i._type = 'currency' | i._text][0],
    price: [i in z where i._type = 'netprice' | i._text][0],
    vat: [i in z where i._type = 'add' | i._children][0][0]._text
}] as book_prices,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: book_prices
} as books

Result for first book:

{
  "author": "Gambardella, Matthew",
  "genre": "Computer",
  "description": "An in-depth look at creating applications with XML.",
  "id": "bk101",
  "prices": [
    {
      "price": "54.95",
      "vat": "0.00",
      "currency": "USD"
    },
    {
      "price": "52.95",
      "vat": "0.30",
      "currency": "EUR"
    }
  ],
  "title": "XML Developer's Guide",
  "publish_date": "2000-10-01"
}

 

Nodes 2022
Nodes
NODES 2022, Neo4j Online Education Summit

All the sessions of the conference are now available online