Neo4j

mcentellesv · ‎12-28-2020

Hi,

I'm writing a LOAD CSV clause to import a data set from a csv file. There is a column (called ORIGINAL) including some values within this column are like:

J'irai où tu iras (I'll Go Where You Go)

I want to split the string into two or more values based on the separator "(". So I pass the clause

split(row.ORIGINAL, "(") AS titolOriginal,

As a result, I get a set of lists containing the splitted titles, like:

["J'irai où tu iras ", "I'll Go Where You Go)"]

I want to know how can I get a result like

["J'irai où tu iras", "I'll Go Where You Go"]

That is: erasing the blank at the end of the first value, and erasing the ending ")" at the end of the second value.

TIA

Miquel Centelles

ameyasoft · ‎12-28-2020

Try this:

with "J'irai où tu iras (I'll Go Where You Go)" as p1
with replace(p1, ")", "") as p2
with split(p2, "(") AS titolOriginal
return titolOriginal

result: ["J'irai où tu iras ", "I'll Go Where You Go"]

clem · ‎12-28-2020

A more general solution is to use APOC regular expression groups (which is similar to regular expression groups in other regex libraries).

WITH "J'irai où tu iras (I'll Go Where You Go)" AS text
WITH apoc.text.regexGroups(text, "([^(]*)\\(([^)]*)" ) AS ListOfList
with ListOfList[0] as groups
return groups[1], groups[2]

Some explanation:
([^(]*) is group 1. The outer open/close parens surround the text to be grouped (the French text). The [^(]* matches text includes everything up to but not including the first open paren.

\\( indicates the literal open paren (you need to have \\ before it so it doesn't get interpreted as a grouping open paren.

([^)]*) is group 2. The outer open/close parens surround the text to be grouped (the English text). The [^)]* matches text includes everything up to but not including the first close paren.

I'm a little surprised/annoyed by the return value of the APOC. It's a list of a list (which is not pointed out in the documentation.)

[["J'irai où tu iras (I'll Go Where You Go","J'irai où tu iras ","I'll Go Where You Go"]]

So, you have to take the first element of the list of the list to get the list of groups. Group 0 is the full match.

You can also trim() the leading and trailing spaces:

trim(group[1])

Neo4j

Splitting a value into two different values