Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
01-29-2021 05:50 PM
Ideally, I want to keep the original form of strings in the graph, so I want to store the case sensitive form. However, at query time. I want to ignore the case sensitivity by using regular expression search as below:
WHERE m.name =~ '(?i)neo'
The problem with regular expression search is that it doesn't take usage of indexes, thus reducing speed a lot. I tested one query and the time spent are 1ms VS 120 ms respectively observed in the browser.
If I always store lowercase forms of string and at query time I convert query string into lower case, this works best for performance, but it looks bad in the graph, i.e. netbase instead of NetBase.
Is there any suggestion on handling case more efficiently?
01-29-2021 09:40 PM
Try this:
Use apoc.text.clean
apoc.text.clean(x.value) = apoc.text.clean('Hxb')
apoc.text.clean('Hxb') = 'hub'
01-31-2021 08:08 AM
This converts lower case to upper case and store the lower case in the graph. My question is whether there is a way to store the original form and at query time still do case insensitive matching without using regular expression matching?
01-31-2021 10:38 AM
Say you have a node property as: a.name = "NetBase".
You can query this:
WHERE apoc.text.clean(a.name) = apoc.text.clean("net base")
apoc.text.clean(a.name) = "netbase"
apoc.text.clean("net base") = "netbase"
apoc.text.clean removes blank spaces, special characters and gives the result all in lower case as a continuous string.
02-01-2021 10:46 AM
I see. But the testing shows that adding a function apoc.text.clean() or ToLower() significantly degraded the same query performance compared with the pure equality checking in the Where clause, almost the same as using regular expression. So probably it doesn't make use of the index either.
02-01-2021 12:42 PM
Another approach that is working for me is to add another property and store the lower case value. Then query against this lower case value.
a.name = "NetBase", a.lowerName = "netbase"
02-01-2021 01:28 PM
Yes. That's probably is a good approach to go.
02-01-2021 05:54 PM
Alternately you can use fulltext schema indexes, which you can create and query using procedures, which support case insensitive lookup. That way you won't have to store multiple properties.
02-01-2021 06:50 PM
@andrew.bowman The fulltext search schema indexes use Lucene to create indexes and return results based on probabilistic ranking. That's a keyword based search and is useful for a lot of cases. However, if I just want to do exact match except for case insensitivity, I can't use fulltext search. Right, because fulltext does partial matches?
02-01-2021 07:06 PM
You can almost get there. If we quote the string you're searching for, it will do a phrase search, case insensitive, but in the field there may be words before or after the phrase.
So you could use this to quickly narrow down possibilities, and then do one additional round of filtering on the results to find your exact match:
CALL db.index.fulltext.queryNodes("title", '"the matrix"') YIELD node, score
WHERE toLower(node.title) = 'the matrix'
RETURN node
02-01-2021 07:33 PM
@andrew.bowman That's a smart way to do this, and thanks. But when I tested it in the browser, the warning says that
One of the property names in your query is not available in the database, make sure you didn't misspell it or that the label is available when you run this statement in your application (the missing property name is: NameIndex)
'NameIndex' is the index name I created for the 'name' property. And the full query is below:
CALL db.index.fulltext.queryNodes("NameIndex", '"the matrix"') YIELD node, score
WHERE toLower(node.NameIndex) = “the matrix"
return node, score
If I remove the WHERE clause, I will get names containing the keyword 'matrix'.
Anything wrong in my query?
02-02-2021 03:21 PM
@andrew.bowman Please take a look at my issue (comment) above and see whether there is a solution to that. Thanks!
02-03-2021 01:48 PM
You're confusing the name of the index (NameIndex
) with the name of the property on the node (name
) which is used to populate the NameIndex.
The CALL is against the NameIndex, that's done properly.
The WHERE clause is acting on the node, and you want to filter on the node's property, so you should use node.name
here.
02-03-2021 02:11 PM
However, my NameIndex is built on a few properties, 'name', 'alias_name', 'foreign_name', just like this example, in which the index is built upon both 'title' & 'description':
fulltext.createNodeIndex("titlesAndDescriptions",["Movie", "Book"],["title", "description"])
In such a case, the filtering based on WHERE clause may be impossible?
02-03-2021 02:18 PM
It's possible, but you may need to make use of coalesce()
or have more complex boolean logic on your filtering.
If, on a node, only one of those properties exists, but not the others, then you can use coalesce()
across the properties, and the first non-null found will be used.
Otherwise, you can use a more complex boolean logic with OR'd terms, or use a list predicate, something like:
...
WHERE any(prop IN ['name', 'alias_name', 'foreign_name'] WHERE toLower(node[prop]) = 'the matrix')
...
02-03-2021 02:56 PM
I will try the 2nd option and it should work. Thanks!
02-03-2021 05:38 PM
I received this message:
Expected a string value for
toLower, but got: StringArray[Basire]; consider converting it to a string with toString().
I think the reason is that, my 'name' and 'foreign_name' property is string, but 'alias_name' is created as an array property. Then how to handle this situation? I tried to add another embedded WHERE clause but I failed, as below:
CALL db.index.fulltext.queryNodes("NameIndex", "the matrix")
YIELD node
WHERE any(prop IN ['name', 'foreign_name', 'alias_name'']
WHERE any (p in node[prop])
WHERE p ='the matrix')
RETURN properties(node) as properties LIMIT 1
Also, the above query assumes all are array properties, but 'name' is a string property.
02-03-2021 07:13 PM
@andrew.bowman I just realized that if I created array properties, i.e.
node.alias_name = ['US', 'USA', 'U.S.A']
node.name = 'United States'
Even if I created NameIndex to include the alias_name, it won't work because full_text only works with string values, not array. Is that right?
CALL db.index.fulltext.createNodeIndex('NameIndex', ['Country'], ['name', 'foreign_name', 'alias_name'])
To take advantage of the full_text index, I need to change my model, for example, to create separate nodes for array values, instead of treating them as properties.
02-05-2021 09:12 AM
I believe that's so.
02-02-2021 05:13 PM
Regarding " If we quote the string you're searching for, it will do a phrase search, case insensitive, but in the field there may be words before or after the phrase.". In Lucene, isn't a phrase search for strings with double quotes an exact search? If yes, why 'in the field there may be words before or after the phrase'?
According to this documentation, the quotes should return exact matches, not partial matches:
But my test shows that even if the string is quoted, it still returns non-exact matches.
All the sessions of the conference are now available online