Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
05-23-2020 12:39 PM
MATCH (n:Product) WHERE n.name =~ "(?i)Apple" OR n.type =~ "(?!)phone" OR n.model =~ "(?i)plus" RETURN n
Solved! Go to Solution.
05-24-2020 04:54 PM
@lingvisa You are using Py2neo right?
You can pass channel and name as parameters:
cypher = "MATCH (n:"+label+") WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
data = {
"statements": [
{
"statement": cypher,
"parameters": {
"channel": "(?i)" + channel,
"name": "(?i)" + name,
"nproduct": int(limit_count)
},
}
]
}
Moreover, if you don't have a lot of different Labels, I advice you to do a ifelse condition and to write directly the label in cypher request (to avoid injection for example), for this request, I don't think there is any problem but it's better to use good practices:)
For example if you have only a Product and item labels:
label = str(label).lower()
if label == "product":
cypher = "MATCH (n:Product) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
elif label == "item":
cypher = "MATCH (n:Item) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
else:
print("Label invalid!")
05-23-2020 01:12 PM
05-23-2020 03:05 PM
Hi lingvisa,
i'm not an expert, but maybe you can explore indexes to improve speed of the query.
https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-search-performance/index.htm...
I'm sure Maxime can help you on that if that's what you are looking for.
brgs
JAS
05-23-2020 03:09 PM
Yeah @a10554, you right
If you want your index to be unique like a product id, use UNIQUE CONSTRANITS
DOC: https://neo4j.com/docs/cypher-manual/current/administration/constraints/
Regards,
Cobra
05-23-2020 05:50 PM
Yes, Cobra and a10554:
I have indexes and constraints. The reason I asked this question was that my program suddenly slowed down a lot and I suspect it was because I made this change:
WHERE n.name =~ "(?i)" + name + " OR n.type =~ "(?!)" + type + " OR n.model =~ "(?i)" + model"
name, type and model are parameters passed into a function.
Originally, it was exact match like:
Where n.name =
But I don't think this change should cause slow down too much. I need to investigate more on the change of speed.
05-24-2020 02:49 AM
Hello @lingvisa,
Indeed, regexs are faster , one more thing you can try is to build regexs in parameters and not in the cypher request
Regards,
Cobra
05-24-2020 03:35 AM
Hi lingvisa,
if you find the slowdown is because of the changes you made to the query, and you really want to speed up things, may i suggest you create a name_2 and model_2 properties with lowercase data and then match them with your parameters also in lowercase ?
05-24-2020 11:56 AM
05-24-2020 11:59 AM
@cobra was thinking if there are millions of nodes, going through regex over those millions of nodes would be computacionaly intensive, i guess. Having seen some practices done in business intelligence to achieve better performances by having redundant info doesn't shock me anymore. (check "Star schema" from Kimball)
05-24-2020 12:03 PM
If it's like other languages and I think it is, regex will be always faster than anything
I see what you mean but you can have performance and a clean database In this case maybe when he load the data, he can format them directly correctly to avoid to do this in the query, there are lot of improvments around that could be possible
05-24-2020 04:31 PM
def get_nodes_by_name(channel, label, name, limit_count=500000):
"""
Get node by matching n.name, n.znname, n.enname As long as one matches, return
"""
cypher = "MATCH (n:" + label + ") WHERE n.channel = \"" + channel + "\" AND (n.name =~ \"(?i)" + name + "\" OR n.znname =~ \"(?i)" + name + "\" OR n.enname =~ \"(?i)" + name + "\") RETURN n LIMIT $nproduct"
data = {
"statements": [
{
"statement": cypher,
"parameters": {
"name_value": name,
"nproduct": limit_count
},
}
]
}
r = requests.post(URL_V4, headers=headers, json=data)
#print(label, name)
rows = r.json()["results"][0]["data"]
nodes = []
for row in rows:
node= row['row'][0]
nodes.append((node))
return nodes
Cobra, The above is one of my typical queries composed through a function call. How would you improve it in this case?
05-24-2020 04:54 PM
@lingvisa You are using Py2neo right?
You can pass channel and name as parameters:
cypher = "MATCH (n:"+label+") WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
data = {
"statements": [
{
"statement": cypher,
"parameters": {
"channel": "(?i)" + channel,
"name": "(?i)" + name,
"nproduct": int(limit_count)
},
}
]
}
Moreover, if you don't have a lot of different Labels, I advice you to do a ifelse condition and to write directly the label in cypher request (to avoid injection for example), for this request, I don't think there is any problem but it's better to use good practices:)
For example if you have only a Product and item labels:
label = str(label).lower()
if label == "product":
cypher = "MATCH (n:Product) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
elif label == "item":
cypher = "MATCH (n:Item) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
else:
print("Label invalid!")
05-24-2020 04:59 PM
Cobra, this looks a lot cleaner. Actually, I am using the original Neo4j HTTP API. It's my own function definition. I will use this syntax.
I have a few labels in a specific channel (domain). So the ifelse should work for better performance.
05-24-2020 05:09 PM
Oh I see, I never used the Neo4j HTTP API
These things will make your code better, cleaner and resilient
05-24-2020 05:20 PM
That makes sense. I will have another query improvement in a similar fashion which is composed very similar but is used in a Neo4j Driver API scenario. I will ask later.
05-24-2020 05:21 PM
No problem, I will be happy to help you
I always use Python Neo4j driver
Regards,
Cobra
05-29-2020 12:26 PM
I agree with @a10554 that regex useage is going to be slower than native match terms. I love regex’s when they can get me out of a bind, or parse unstructured/semi-structured data, but I do t think they benefit much from the index. I would guess that they are at least as slow as a CONTAINS, which has to check every character in a string. In contrast STARTS WITH is fast ant totally makes use of the index. I think ENDS WITH is finally benefiting from the index, via a trick where the the search reverses word character sequence...so not as fast as STARTS WITH. But that’s a hint that you don’t have to make the match in a single step. You may be able to do an intermediate search that reduces the size of your matched objects, then use the surgical regex to make the final match.
05-30-2020 01:58 AM
You should use fulltext indexes which are optimized for this can handle multiple properties and have full lucene syntax support
https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-full-text-search/
05-30-2020 03:09 AM
05-30-2020 04:15 AM
No they don’t have to be unique and can also be larger texts
05-30-2020 11:38 PM
@lingvisa, I don’t know, all these OR =? seems pretty sql’y to me. If you really want to query against the properties like that, why not move them to unique nodes instead?
All the sessions of the conference are now available online