Neo4j

dionisios95 · ‎07-04-2022

Hello,

I have some XML-Files and I'm trying to get some values from it and put it on new Nodes.

Part of my Code:

call apoc.load.xml("file:///NCT0017xxxx/NCT00170157.xml") yield value
with value._children as root
......

with [x in title where x._type = 'title'][0] as title,
[x in nctid where x._type = 'nct_id'][0] as nctid,
[x in milestone where x._type = 'title'][0] as mstitle,
[x in milestone where x._type = 'participants_list'][0]._children as participantslist

with [x in title where x._type = 'title'][0] as title,
[x in nctid where x._type = 'nct_id'][0] as nctid,
[x in mstitle where x._type = 'title'][0] as mstitle,
[x in participantslist where x._type = 'participants'][0] as participants

........

Now I have the issue, that some level have multiple fields and I can only get one of those ([0] = first ).

One solution would be to unwind a range(start,max), but I dont know how to find out the value: maximum of each lvl on an xml-file. Thx for your help.

glilienfield · ‎07-05-2022

'participantslist' will be a list of the participants. You can get the number of participants in the list with 'size(participantslist)'. A single 'participant' should look something like:

{_type:"participants", group_id:'P1', count:'54'},

so the list 'participantslist' should look like:

[{_type:"participants", group_id:'P1', count:'54'}, {_type:"participants", group_id:'P2', count:'58'}]

What information do you want from the list? You can use list comprehension to extract the group_ids, such as:

[x in participantslist | x.group_id]

dionisios95 · ‎07-06-2022

The list comprehension worked well, so I tried this:

....

with [x in title where x._type = 'title'][0] as title,

[x in nctid where x._type = 'nct_id'][0] as nctid,

[x in mstitle where x._type = 'title'][0] as mstitle,

[x in participantslist where x._type = 'participants'][0] as participants,

[x in participantslist | x.group_id] as groupids,

[x in participantslist | x.count] as counts,

unwind groupids as groupid

unwind counts as count

...

MERGE(pa:Participants{group_id: groupid, count: count})

MERGE(t:Trial{nct_id:nctid._text })

MERGE (t)-[:CONTAIN]->(pa)

But the output is an cartesian product with both properties(groupid,count). So I got 4 Nodes instead of 2.

Then I tried that with a counter-attribute:

....

unwind range(0,size(participantslist)) as counters

with

counters as counters,

..... ,

[x in participantslist where x._type = 'participants'][0] as participants,

MERGE(pa:Participants{group_id: participants.groupids[counters], count: participants.count[counters]})

counters value is: 0 1 2 in a column but when it comes to the merge I get this Error: (54 is the first count-value) .

The line toInteger(counters) as counters also does not affect to the output

`String("54")` is not a collection or a map. Element access is only possible by performing a collection lookup using an integer index, or by performing a map lookup using a string key (found: String("54")[Long(0)])

dionisios95 · ‎07-06-2022

The first one combined with the counter also doesnt work:

unwind range(0,size(participantslist)) as counters

with

[x in participantslist where x._type = 'participants'][0] as participants,

[x in participantslist | x.group_id] as groupids,

[x in participantslist | x.count] as counts,

unwind groupids as groupid

unwind counts as count

MERGE(pa:Participants{group_id: groupid[counters], count: count[counters]})

...

Error: (P1 is a group_id-value)

`String("P1")` is not a collection or a map. Element access is only possible by performing a collection lookup using an integer index, or by performing a map lookup using a string key (found: String("P1")[Long(0)])

glilienfield · ‎07-07-2022

Yes, the double 'unwind' will do that. For illustration purpose, the following query gives the results below. The reason is the first unwind gives you each value of the first list with the entire second list appended to each element. When you unwind the second list, it is done for each row, which represents a different value of the first list, thus you cross product of both lists.

with ['a', 'b'] as list1,  ['c', 'd'] as list2
unwind list1 as itemFromList1
unwind list2 as itemFromList2
return itemFromList1, itemFromList2

it is getting a little complicated following the code without having access to the xml and your full query, but I can make this suggestion. Instead of extracting the 'group_id' values into its own list and the 'count' values into its own list, you can process the each pair of 'group_id' and 'count' together.

Assuming 'participantslist' looks like the following:

[{_type:"participants", group_id:'P1', count:'54'}, {_type:"participants", group_id:'P2', count:'58'}]

you could do something like the following:

forEach(x in participantslist |
MERGE(pa:Participants{group_id: x.group_id, count: x.count})
)

BTW- when you did this 'unwind range(0,size(participantslist)) as counters ', you will need to set the max range to size(participantslist)-1.

dionisios95 · ‎07-07-2022

Thank you for your advice, I am a step forward, but how can I use the variable "pa", to use it?

My plan was:

merge(pa:Participants)

forEach(x in participantslist |MERGE(pa{group_id: x.group_id, count: x.count}) )

MERGE(t:Trial{nct_id:nctid._text })

MERGE (pa)-[:IN]->(m)-[:IN]->(p)

........

The good thing is that the 2 Nodes got the right properties but without a Label(Participants). And they are still not bound to the other Nodes, because I have no access to the inside of the foreach(pa).

This code gave me back an empty Participants-Node, so it does not merge well :

merge(pa:Participants)

forEach(x in participantslist |MERGE(pa{group_id: x.group_id, count: x.count}) )

return pa

Here I have a shot of a part of the file:

glilienfield · ‎07-08-2022

It would be best to help if you provided the entire query and relevant xml.

Anyways, to address your specific issue, you can try using this snippet of cypher:

MERGE(m:M_Label{id: 3})
MERGE(p:P_Label{id: 4})
MERGE(m)-[:IN]->(p) 
forEach(x in participantslist |
    MERGE(pa:Participants{group_id: x.group_id, count: x.count})
    MERGE (pa)-[:IN]->(m)
)

Replace the 'merge' on 'm' and 'p' with your specific node requirements.

dionisios95 · ‎07-08-2022

This is good, it worked like I wanted. Now I have to do this for all the values. Overall my plan is it to "traverse" the file and create Nodes/Properties, but not with every value. For example I dont want the milestonelist as a Node, BUT I need all the milestonelists to get through all the data, so I am working on smt like this, with the index:

unwind range(0,size(milestonelist)-1) as y

forEach(x in milestonelist[y] | ??????))

I have a link to the file as you asked for:

https://1drv.ms/u/s!AkWorWNsEQW9gRFEDp74MKCrRt4Y?e=kybFCa

dionisios95 · ‎07-08-2022

Here is the code I'm on. You can ignore the :Resource, its part of something I imported with neosemantics.

call apoc.load.xml("file:///NCT0017xxxx/NCT00170157.xml") yield value

with value._children as root

with [x in root where x._type = "clinical_results"][0]._children as clinicalresults,

[x in root where x._type = "id_info"][0]._children as idinfo

with [x in clinicalresults where x._type = "participant_flow"][0]._children as participantflow,

[x in idinfo where x._type = 'nct_id'][0] as nctid

with [x in participantflow where x._type = 'period_list'][0]._children as periodlist,

nctid as nctid

with [x in periodlist where x._type = 'period'][0]._children as period,

nctid as nctid

with [x in period where x._type = 'title'][0] as title,

nctid as nctid ,

[x in period where x._type = 'milestone_list'][0]._children as milestonelist

with title as title,

nctid as nctid ,

[x in milestonelist where x._type = 'milestone'][0]._children as milestone,

milestonelist as milestonelist

with title as title,

nctid as nctid,

[x in milestone where x._type = 'title'][0] as mstitle,

[x in milestone where x._type = 'participants_list'][0]._children as participantslist,

milestonelist as milestonelist

with

[x in participantslist | x.group_id] as groupids,

[x in participantslist | x.count] as counts,

nctid as nctid,

title as title,

mstitle as mstitle,

participantslist as participantslist,

milestonelist as milestonelist

unwind range(0,size(milestonelist)-1) as y

//Milestonelist

forEach(x in milestonelist[y] | ????)

MERGE(p:Period{title_name: title._text})

MERGE(m:Milestone{title_name: mstitle._text})

MERGE(t:Trial{nct_id:nctid._text })

MERGE(r1:Resource{ns1__title:"Period"})

MERGE(r2:Resource{ns1__title:"name title"})

MERGE(r3:Resource{ns1__title:"Milestone"})

MERGE(r4:Resource{ns1__title:"Participant-Flow"})

MERGE(r5:Resource{ns1__title:"group id"})

MERGE(r6:Resource{ns1__title:"participant counts"})

//RELATIONS

MERGE (t)-[:CONTAIN]->(p)

MERGE (t)-[:CONTAIN]->(m)

MERGE (t)-[:CONTAIN]->(pa)

MERGE (p)-[:SCO]->(r1)

MERGE (pa)-[:SCO]->(r4)

MERGE (m)-[:SCO]->(r3)

MERGE (p)-[:INCLUDE]->(r2)

MERGE (m)-[:INCLUDE]->(r2)

MERGE (pa)-[:INCLUDE]->(r5)

MERGE (pa)-[:INCLUDE]->(r6)

MERGE (m)-[:IN]->(p)

//Participants

forEach(x in participantslist |

MERGE(pa:Participants{group_id: x.group_id, count: x.count})

MERGE (pa)-[:IN]->(m)

MERGE (t)-[:CONTAIN]->(pa)

)

*/

glilienfield · ‎07-09-2022

Thank you...seeing the xml makes it easier.

Well, I don't think you need process the milestone_list with an index and a forEach loop. The milestone_list, once extracted, should contain a list of _children, which are the individual milestones. Thus, milestone_list._children will be a list of milestones. You can use the forEach loop on this list directly.

with milestone_list._children as milestones
forEach(i in milestones |
 //process each milestone
)

dionisios95 · ‎07-09-2022

It is not possible to do "milestone_list._children" , because of a type missmatch (expected a map, but was list) .

This would work for example for the index 0 of milestonelist:

with milestonelist[0]._children as milestone

forEach(x in milestone | merge(z:Z) )

I just want to go through all the milestonelsts

The participants worked because they was end-lvl-parts, with this code:

forEach(x in participantslist |

MERGE(pa:Participants{group_id: x.group_id, count: x.count})

MERGE (pa)-[:IN]->(m)

MERGE (t)-[:CONTAIN]->(pa)

)

Neo4j

XML-File - max.-value for levels ?