cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

XML-File - max.-value for levels ?

Hello, 

I have some XML-Files and I'm trying to get some values from it and put it on new Nodes.

Part of my Code:

call apoc.load.xml("file:///NCT0017xxxx/NCT00170157.xml") yield value
with value._children as root
......

with [x in title where x._type = 'title'][0] as title,
[x in nctid where x._type = 'nct_id'][0] as nctid,
[x in milestone where x._type = 'title'][0] as mstitle,
[x in milestone where x._type = 'participants_list'][0]._children as participantslist

with [x in title where x._type = 'title'][0] as title,
[x in nctid where x._type = 'nct_id'][0] as nctid,
[x in mstitle where x._type = 'title'][0] as mstitle,
[x in participantslist where x._type = 'participants'][0] as participants

........

Now I have the issue, that some level have multiple fields and I can only get one of those ([0] = first ).

XMLBild.png

 One solution would be to unwind a range(start,max), but I dont know how to find out the value: maximum of each lvl on an xml-file. Thx for your help. 

10 REPLIES 10

'participantslist' will be a list of the participants. You can get the number of participants in the list with 'size(participantslist)'. A single 'participant' should look something like:

{_type:"participants", group_id:'P1', count:'54'},

so the list 'participantslist' should look like:

[{_type:"participants", group_id:'P1', count:'54'}, {_type:"participants", group_id:'P2', count:'58'}]

What information do you want from the list?  You can use list comprehension to extract the group_ids, such as:

 [x in participantslist | x.group_id]

The list comprehension worked well, so I tried this: 

....

with [in title where x._type = 'title'][0] as title,
     [in nctid where x._type = 'nct_id'][0] as nctid, 
     [in mstitle where x._type = 'title'][0] as mstitle, 
     [in participantslist where x._type = 'participants'][0] as participants,
    [in participantslist | x.group_id] as groupids,
    [in participantslist | x.count] as counts,
    unwind groupids as groupid
    unwind counts as count
...
MERGE(pa:Participants{group_id: groupid, count: count})
MERGE(t:Trial{nct_id:nctid._text })
MERGE (t)-[:CONTAIN]->(pa)
 
But the output is an cartesian product with both properties(groupid,count). So I got 4 Nodes instead of 2. 
Then I tried that with a counter-attribute: 
.... 
unwind range(0,size(participantslist)) as counters 
with
    counters as counters,
  ..... ,
[in participantslist where x._type = 'participants'][0] as participants, 
MERGE(pa:Participants{group_id: participants.groupids[counters], count: participants.count[counters]}) 
 
counters value is: 0 1 2 in a column but when it comes to the merge I get this Error: (54 is the first count-value) . 
The line toInteger(counters) as counters also does not affect to the output
`String("54")` is not a collection or a map. Element access is only possible by performing a collection lookup using an integer index, or by performing a map lookup using a string key (found: String("54")[Long(0)])

The first one combined with the counter also doesnt work: 

unwind range(0,size(participantslist)) as counters 
with
  [in participantslist where x._type = 'participants'][0] as participants,
    [in participantslist | x.group_id] as groupids,
    [in participantslist | x.count] as counts,
    unwind groupids as groupid
    unwind counts as count
MERGE(pa:Participants{group_id: groupid[counters], count: count[counters]}) 
...
Error: (P1 is a group_id-value) 
`String("P1")` is not a collection or a map. Element access is only possible by performing a collection lookup using an integer index, or by performing a map lookup using a string key (found: String("P1")[Long(0)])

Yes, the double 'unwind' will do that.  For illustration purpose, the following query gives the results below. The reason is the first unwind gives you each value of the first list with the entire second list appended to each element. When you unwind the second list, it is done for each row, which represents a different value of the first list, thus you cross product of both lists.

with ['a', 'b'] as list1,  ['c', 'd'] as list2
unwind list1 as itemFromList1
unwind list2 as itemFromList2
return itemFromList1, itemFromList2

Screen Shot 2022-07-07 at 9.21.21 AM.png 

 

 

 

 

it is getting a little complicated following the code without having access to the xml and your full query, but I can make this suggestion. Instead of extracting the 'group_id' values into its own list and the 'count' values into its own list, you can process the each pair of 'group_id' and 'count' together. 

Assuming 'participantslist' looks like the following:

[{_type:"participants", group_id:'P1', count:'54'}, {_type:"participants", group_id:'P2', count:'58'}]

you could do something like the following:

forEach(x in participantslist |
MERGE(pa:Participants{group_id: x.group_id, count: x.count})
)

BTW- when you did this 'unwind range(0,size(participantslist)) as counters ', you will need to set the max range to size(participantslist)-1. 

Thank you for your advice, I am a step forward, but how can I use the variable "pa", to use it? 

My plan was: 

merge(pa:Participants)
forEach(in participantslist |MERGE(pa{group_id: x.group_id, count: x.count}) )
MERGE(t:Trial{nct_id:nctid._text })
MERGE (pa)-[:IN]->(m)-[:IN]->(p) 
........
The good thing is that the 2 Nodes got the right properties but without a Label(Participants). And they are still not bound to the other Nodes, because I have no access to the inside of the foreach(pa).
 

This code gave me back an empty Participants-Node, so it does not merge well : 

merge(pa:Participants)
forEach(in participantslist |MERGE(pa{group_id: x.group_id, count: x.count}) )
return pa 
 
Here I have a shot of a part of the file: 
XMLBild2.png

It would be best to help if you provided the entire query and relevant xml.  

Anyways, to address your specific issue, you can try using this snippet of cypher:

MERGE(m:M_Label{id: 3})
MERGE(p:P_Label{id: 4})
MERGE(m)-[:IN]->(p) 
forEach(x in participantslist |
    MERGE(pa:Participants{group_id: x.group_id, count: x.count})
    MERGE (pa)-[:IN]->(m)
)

Replace the 'merge' on 'm' and 'p' with your specific node requirements. 

This is good, it worked like I wanted. Now I have to do this for all the values. Overall my plan is it to "traverse" the file and create Nodes/Properties, but not with every value. For example I dont want the milestonelist as a Node, BUT I need all the milestonelists to get through all the data, so I am working on smt like this, with the index:   

unwind range(0,size(milestonelist)-1) as y
forEach(in milestonelist[y] | ??????)) 

I have a link to the file as you asked for: 

https://1drv.ms/u/s!AkWorWNsEQW9gRFEDp74MKCrRt4Y?e=kybFCa

Here is the code I'm on. You can ignore the :Resource, its part of something I imported with neosemantics. 
 
call apoc.load.xml("file:///NCT0017xxxx/NCT00170157.xml") yield value

with value._children as root
with [in root where x._type = "clinical_results"][0]._children as clinicalresults,
     [in root where x._type = "id_info"][0]._children as idinfo

with [in clinicalresults where x._type = "participant_flow"][0]._children as participantflow,
     [in idinfo where x._type = 'nct_id'][0] as nctid

with [in participantflow where x._type = 'period_list'][0]._children as periodlist,
     nctid as nctid  
    
with [in periodlist where x._type = 'period'][0]._children as period, 
     nctid as nctid 
    
with [in period where x._type = 'title'][0] as title,
     nctid as nctid , 
     [in period where x._type = 'milestone_list'][0]._children as milestonelist 


with title as title,
     nctid as nctid , 
     [in milestonelist where x._type = 'milestone'][0]._children as milestone,
     milestonelist as milestonelist

with title as title,
     nctid as nctid, 
     [in milestone where x._type = 'title'][0] as mstitle, 
     [in milestone where x._type = 'participants_list'][0]._children as participantslist,
     milestonelist as milestonelist
 


with 
    [in participantslist | x.group_id] as groupids,
    [in participantslist | x.count] as counts,
    nctid as nctid,
    title as title,
    mstitle as mstitle,
    participantslist as participantslist,
    milestonelist as milestonelist  

unwind range(0,size(milestonelist)-1) as y

//Milestonelist
forEach(x in milestonelist[y] | ????)

MERGE(p:Period{title_name: title._text})  
MERGE(m:Milestone{title_name: mstitle._text}) 


 
MERGE(t:Trial{nct_id:nctid._text })
MERGE(r1:Resource{ns1__title:"Period"})
MERGE(r2:Resource{ns1__title:"name title"})
MERGE(r3:Resource{ns1__title:"Milestone"})
MERGE(r4:Resource{ns1__title:"Participant-Flow"})
MERGE(r5:Resource{ns1__title:"group id"})
MERGE(r6:Resource{ns1__title:"participant counts"})

//RELATIONS



MERGE (t)-[:CONTAIN]->(p) 
MERGE (t)-[:CONTAIN]->(m) 
MERGE (t)-[:CONTAIN]->(pa) 

MERGE (p)-[:SCO]->(r1)
MERGE (pa)-[:SCO]->(r4)
MERGE (m)-[:SCO]->(r3)

MERGE (p)-[:INCLUDE]->(r2)               
MERGE (m)-[:INCLUDE]->(r2)
MERGE (pa)-[:INCLUDE]->(r5)
MERGE (pa)-[:INCLUDE]->(r6)

MERGE (m)-[:IN]->(p)   



//Participants
forEach(in participantslist |
MERGE(pa:Participants{group_id: x.group_id, count: x.count})
MERGE (pa)-[:IN]->(m) 
MERGE (t)-[:CONTAIN]->(pa) 
)
*/


Thank you...seeing the xml makes it easier.  

Well, I don't think you need process the milestone_list with an index and a forEach loop. The milestone_list, once extracted, should contain a list of _children, which are the individual milestones. Thus, milestone_list._children will be a list of milestones. You can use the forEach loop on this list directly. 

with milestone_list._children as milestones
forEach(i in milestones |
 //process each milestone
)

It is not possible to do "milestone_list._children" , because of a type missmatch (expected a map, but was list) . 

This would work for example for the index 0  of milestonelist: 
with milestonelist[0]._children as milestone
forEach(in milestone | merge(z:Z) )
I just want to go through all the milestonelsts 
 
The participants worked because they was end-lvl-parts, with this code: 
forEach(in participantslist |
MERGE(pa:Participants{group_id: x.group_id, count: x.count})
MERGE (pa)-[:IN]->(m) 
MERGE (t)-[:CONTAIN]->(pa) 
)

 

Nodes 2022
Nodes
NODES 2022, Neo4j Online Education Summit

All the sessions of the conference are now available online