cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Unwind Multiple Arrays to Set Property

RedInk
Node Link

Greetings,

I have 2 different arrays to Unwind. I'm trying to use the first Unwind and a Where clause to later establish what nodes will be set with the second Unwind data.

Here are my arrays to unwind...

datasourcefieldname = ['field1', 'field2','field3','field4']

sourcedatabasename = ['database1','database2','database1','database2']

Here is the cypher I'm using

Match (ds:DataSource {Title: $datasourcename})-->(dsf:DSField)-->(dsfv:DSFieldVersion)
With ds, dsfv
Unwind $datasourcefieldname as fieldname
With ds, dsfv, fieldname
Where dsfv.Title = fieldname
Unwind $sourcedatabasename as sourcedatabase
Set dsfv.SourceDatabase = sourcedatabase

This partially works in that a dsfv.SourceDatabase is set for every dsfv, but my query is only giving the first property of 'database1' to every dsfv instead of giving 'database2' to the second and fourth dsfv.

What am I missing?

Many Thanks!

1 ACCEPTED SOLUTION

If I understand correctly, each element in datasourcefieldname corresponded with the similarly indexed element of sourcedatabasename. If so, try the following. Again, it assumes size(datasourcefieldname)=size(sourcedatabasename).

match (ds:DataSource {Title: $datasourcename})
unwind range(0,size($datasourcefieldname)-1) as index
Match (ds)-->(:DSField)-->(dsfv:DSFieldVersion{Title: $datasourcefieldname[index]})
Set dsfv.SourceDatabase = $sourcedatabasename[index]

The issue in your implementation is that you are looping over datasourcefieldname with the first 'with', and then looping over all values of sourcedatabasename for each value of datasourcefieldname. As a result, the value of 'dsfv.SourceDatabase' should be equal to the last element in sourcedatabasename for every 'dsfv' node. 

View solution in original post

7 REPLIES 7

ameyasoft
Graph Maven

Try this:

When the two arrays are supposed to be one-to-one matching, here is a way to identify the correct matches.

with ['field1', 'field2','field3','field4'] as f1,
['database1','database2','database1','database2'] as d1
//replace 'field2' with dsfv.Title
with apoc.coll.indexOf(f1, 'field2') as i1, d1
return d1[i1]
// Result:"database2"
Set dsfv.SourceDatabase = d1[i1]

Hi @ameyasoft , thanks for your response. I don't quite follow, the content in my arrays is not static/in my control, I can't replace anything in them.
Are you suggesting removing my Unwind statements altogether, if so, is what you're suggesting a solution to my issue or just a way to see if things match up?

Yes, that's correct.

If I understand correctly, each element in datasourcefieldname corresponded with the similarly indexed element of sourcedatabasename. If so, try the following. Again, it assumes size(datasourcefieldname)=size(sourcedatabasename).

match (ds:DataSource {Title: $datasourcename})
unwind range(0,size($datasourcefieldname)-1) as index
Match (ds)-->(:DSField)-->(dsfv:DSFieldVersion{Title: $datasourcefieldname[index]})
Set dsfv.SourceDatabase = $sourcedatabasename[index]

The issue in your implementation is that you are looping over datasourcefieldname with the first 'with', and then looping over all values of sourcedatabasename for each value of datasourcefieldname. As a result, the value of 'dsfv.SourceDatabase' should be equal to the last element in sourcedatabasename for every 'dsfv' node. 

RedInk
Node Link

Thanks so much @glilienfield, this works. If you have the time, I have a few follow up questions.

1. When you say "As a result, the value of 'dsfv.SourceDatabase' should be equal to the last element in sourcedatabasename for every 'dsfv' node.", what does this mean? I'm not sure what the "last element" is?

Aren't there 4 rows of $sourcedatabasename after Unwinding it? And every element is used?

2. What is appending [index] doing? Can you point me to documentation?

3. How might you rewrite this query if the sizing were different between size(datasourcefieldname) != size(sourcedatabasename)? I think this is why (in poor fashion) I was trying to say where dsfv.Title = fieldname, then the sourcedatabase would be applied where dsfv.Title = fieldname.

Many Thanks!

Let's go through what is happening. You have two lists of equal size (4 elements): datasourcefieldname and sourcedatabasename. In your code, you unwind datasourcefieldname first. What the unwind does is create a row of data for each element in the list your are unwinding. All the other data at the time gets appended to the row. The result of the first unwind is shown in the screenshot. As you can see, you get four rows, one row for each value of datasourcefieldname. Also shown is that the sourcedatabasename list is repeated for each row. 

Screen Shot 2022-08-26 at 1.31.20 PM.png

After the second unwind on sourcedatabasename, you will get the result shown in the following screenshot. As you can see, for each of the four rows above, the unwinding of sourcedatabasename creates four rows each, one row for each element in the sourcedatabasename list. We now have 16 rows, resulting from the 4 rows from datasourcefieldname and 4 rows from sourcedatabasename. 

with ['field1', 'field2','field3','field4'] as datasourcefieldname, ['database1','database2','database1','database2'] as sourcedatabasename
unwind datasourcefieldname as d
unwind sourcedatabasename as s
return *

Screen Shot 2022-08-26 at 1.40.02 PM.png

In your code, you use the value of datasourcefieldname from the first unwind to filter for the node represented by 'dsfv' (using the 'where' clause). That node is either matched to 'field1', 'field2', 'field3', or 'field4'. For each of these values, the second unwind results in four rows each, iterating over 'database1', 'database2', 'database1', and 'database2'; therefore, the 'SET' clause on the last line is executed four times for 'field1', four times for 'field2', etc. The final value of 'dsfv.SourceDatabase' will be the last value set, since the prior three will have been overwritten by the last one. The last element in the 'sourcedatabasename' list is 'database2'.  

As you can see, a double unwind does not work in your scenario. What you really want is to process the first element of the datasourcefieldname list with the first element of the sourcedatabasename list, and the second with the second, etc. I accomplished this by creating a list of four integers ([0,1,2,3]) using the 'range' function and unwinding the integer list. This gives the following result, were we have four rows, one for each index value. The datasourcefieldname and sourcedatabasename lists are repeated on each row.

Screen Shot 2022-08-26 at 1.55.47 PM.png

I can now process each corresponding element from both lists on each row by using the value of 'index' to extract each from their list. You can see this when I add datasourcefieldname[index] and sourcedatabasename[index] to the output. These values can now be used to perform the matches to get the correct nodes and create the relationship between them.

Screen Shot 2022-08-26 at 2.00.32 PM.png

It would be a different problem if the sizes of the two lists were different. In your case the first list represented the node to find and the second list represent a property value to set on the node from the first list. The solution would depend on what you are trying to accomplish. 

Thanks so much for the detailed explanation, and that was what was happening with the duplication.
I'll use your solution and amend how I bring these arrays into the post function (meaning there shouldn't be any nulls).
Many Thanks!