Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-16-2022 12:46 PM - edited 08-16-2022 12:52 PM
Hi,
the example https://neo4j.com/labs/apoc/4.4/overview/apoc.load/apoc.load.xml/ works fine.
But how to impoert nested XML Elements, e.g. if the https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms762271(v=vs.85)?redirectedfrom=...
looks like:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<prices>
<price1>44.95</price1>
<price2>77.95</price2>
</prices>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
How to get instead one price the two prices (price1 and price2) ?
Thanks
Michel
Solved! Go to Solution.
08-16-2022 03:13 PM
I agree, the XML parsing is not intuitive. I modified your data to have multiple prices as shown in your example. Here is the test data. I extracted each attribute into a map that represents a single book. The query products a list of books.
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<prices>
<price1>44.95</price1>
<price2>77.95</price2>
</prices>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<prices>
<price1>144.95</price1>
<price2>177.95</price2>
</prices>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<prices>
<price1>244.95</price1>
<price2>277.95</price2>
</prices>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
</catalog>
Query:
call apoc.load.xml("file:///Sample.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book, children,
[x in children where x._type = 'prices'| x._children][0] as book_prices
with book.id as id,
[x in book_prices | x._text] as prices,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: prices
} as books
Result:
08-17-2022 06:59 AM
Try this. I broke out each price and showed them separately.
call apoc.load.xml("file:///Sample.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book, children,
[x in children where x._type = 'prices'| x._children][0] as book_prices
with book.id as id,
[x in book_prices where x._type = 'price1' | x._text][0] as price1,
[x in book_prices where x._type = 'price2' | x._text][0] as price2,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: {price1: price1, price2: price2}
} as books
Example book from query:
{
"author": "Gambardella, Matthew",
"genre": "Computer",
"description": "An in-depth look at creating applications with XML.",
"id": "bk101",
"prices": {
"price1": "44.95",
"price2": "77.95"
},
"title": "XML Developer's Guide",
"publish_date": "2000-10-01"
}
08-18-2022 09:26 AM
Ok, I think this is getting to become very convoluted. I would suggest you write a utility in java or other language in your skill set to extract the xml data into csv format for easier importing. That being said, I got something to work. It gets more complex the greater the hierarchy. Look at the price extraction logic.
call apoc.load.xml("file:///Sample2.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book.id as id,
[z in [y in [x in children where x._type = 'prices'| x._children][0] | y._children] | {
currency: [i in z where i._type = 'currency' | i._text][0],
price: [i in z where i._type = 'netprice' | i._text][0],
vat: [i in z where i._type = 'add' | i._children][0][0]._text
}] as book_prices,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: book_prices
} as books
Result for first book:
{
"author": "Gambardella, Matthew",
"genre": "Computer",
"description": "An in-depth look at creating applications with XML.",
"id": "bk101",
"prices": [
{
"price": "54.95",
"vat": "0.00",
"currency": "USD"
},
{
"price": "52.95",
"vat": "0.30",
"currency": "EUR"
}
],
"title": "XML Developer's Guide",
"publish_date": "2000-10-01"
}
08-16-2022 03:13 PM
I agree, the XML parsing is not intuitive. I modified your data to have multiple prices as shown in your example. Here is the test data. I extracted each attribute into a map that represents a single book. The query products a list of books.
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<prices>
<price1>44.95</price1>
<price2>77.95</price2>
</prices>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<prices>
<price1>144.95</price1>
<price2>177.95</price2>
</prices>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<prices>
<price1>244.95</price1>
<price2>277.95</price2>
</prices>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
</catalog>
Query:
call apoc.load.xml("file:///Sample.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book, children,
[x in children where x._type = 'prices'| x._children][0] as book_prices
with book.id as id,
[x in book_prices | x._text] as prices,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: prices
} as books
Result:
08-17-2022 05:24 AM - edited 08-17-2022 05:26 AM
Dear Ninja,
thank you so much for your valuable help, it works ! Just an other question: How could I access the two prices separatelly ( I tried a lot of combinations - no success 😞 ) . I can access them with [0] and [1] , but it is not secure way... I want to know, which one was price1 and price2.
Warm regards,
Michel
08-17-2022 06:59 AM
Try this. I broke out each price and showed them separately.
call apoc.load.xml("file:///Sample.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book, children,
[x in children where x._type = 'prices'| x._children][0] as book_prices
with book.id as id,
[x in book_prices where x._type = 'price1' | x._text][0] as price1,
[x in book_prices where x._type = 'price2' | x._text][0] as price2,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: {price1: price1, price2: price2}
} as books
Example book from query:
{
"author": "Gambardella, Matthew",
"genre": "Computer",
"description": "An in-depth look at creating applications with XML.",
"id": "bk101",
"prices": {
"price1": "44.95",
"price2": "77.95"
},
"title": "XML Developer's Guide",
"publish_date": "2000-10-01"
}
08-18-2022 02:47 AM - edited 08-18-2022 04:22 AM
Hi Ninja,
I have a further more complex XML and I adjusted accordingly "our" test data. For every currency should be generated a node with all book data.
Coudl you help me please in that case as well ? The focus is on the nested elements...
Thanks you so much. If there is a appropriate documentation for that: Please refer me them.
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<prices>
<price>
<currency>USD</currency>
<netprice>54.95</netprice>
<add>
<VAT>
0.00
</VAT>
</add>
</price>
<price>
<currency>EUR</currency>
<netprice>52.95</netprice>
<add>
<VAT>
0.30
</VAT>
</add>
</price>
</prices>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<prices>
<price>
<currency>USD</currency>
<netprice>74.95</netprice>
<add>
<VAT>
0.00
</VAT>
</add>
</price>
<price>
<currency>EUR</currency>
<netprice>72.95</netprice>
<add>
<VAT>
0.50
</VAT>
</add>
</price>
</prices>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<prices>
<price>
<currency>USD</currency>
<netprice>24.95</netprice>
<add>
<VAT>
0.00
</VAT>
</add>
</price>
<price>
<currency>EUR</currency>
<netprice>24.95</netprice>
<add>
<VAT>
0.20
</VAT>
</add>
</price>
</prices>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
</catalog>
08-18-2022 09:26 AM
Ok, I think this is getting to become very convoluted. I would suggest you write a utility in java or other language in your skill set to extract the xml data into csv format for easier importing. That being said, I got something to work. It gets more complex the greater the hierarchy. Look at the price extraction logic.
call apoc.load.xml("file:///Sample2.xml")
yield value
with value._children as books
unwind books as book
with book, book._children as children
with book.id as id,
[z in [y in [x in children where x._type = 'prices'| x._children][0] | y._children] | {
currency: [i in z where i._type = 'currency' | i._text][0],
price: [i in z where i._type = 'netprice' | i._text][0],
vat: [i in z where i._type = 'add' | i._children][0][0]._text
}] as book_prices,
[x in children where x._type = 'title' | x._text][0] as title,
[x in children where x._type = 'author' | x._text][0] as author,
[x in children where x._type = 'genre' | x._text][0] as genre,
[x in children where x._type = 'publish_date' | x._text][0] as date,
[x in children where x._type = 'description' | x._text][0] as description
return {id: id, author: author, title: title, genre: genre, publish_date: date, description: description, prices: book_prices
} as books
Result for first book:
{
"author": "Gambardella, Matthew",
"genre": "Computer",
"description": "An in-depth look at creating applications with XML.",
"id": "bk101",
"prices": [
{
"price": "54.95",
"vat": "0.00",
"currency": "USD"
},
{
"price": "52.95",
"vat": "0.30",
"currency": "EUR"
}
],
"title": "XML Developer's Guide",
"publish_date": "2000-10-01"
}
All the sessions of the conference are now available online