Neo4j

lingvisa · ‎05-23-2020

MATCH (n:Product) WHERE n.name =~ "(?i)Apple" OR n.type =~ "(?!)phone" OR n.model =~ "(?i)plus" RETURN n

Cobra · ‎05-24-2020

@lingvisa You are using Py2neo right?

You can pass channel and name as parameters:

cypher = "MATCH (n:"+label+") WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
data = {
    "statements": [
        {
             "statement": cypher,
             "parameters": {
                 "channel": "(?i)" + channel,
                 "name": "(?i)" + name,
                 "nproduct": int(limit_count)
              },
         }
     ]
}

Moreover, if you don't have a lot of different Labels, I advice you to do a ifelse condition and to write directly the label in cypher request (to avoid injection for example), for this request, I don't think there is any problem but it's better to use good practices:)

For example if you have only a Product and item labels:

label = str(label).lower()
if label == "product":
    cypher = "MATCH (n:Product) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
elif label == "item":
    cypher = "MATCH (n:Item) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
else:
    print("Label invalid!")

View solution in original post

Cobra · ‎05-23-2020

Hello @lingvisa,

I think you cannot do more efficient

Regards,
Cobra

a10554 · ‎05-23-2020

Hi lingvisa,

i'm not an expert, but maybe you can explore indexes to improve speed of the query.
https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-search-performance/index.htm...

I'm sure Maxime can help you on that if that's what you are looking for.

brgs
JAS

Cobra · ‎05-23-2020

Yeah @a10554, you right

If you want your index to be unique like a product id, use UNIQUE CONSTRANITS
DOC: https://neo4j.com/docs/cypher-manual/current/administration/constraints/

Regards,
Cobra

lingvisa · ‎05-23-2020

Yes, Cobra and a10554:

I have indexes and constraints. The reason I asked this question was that my program suddenly slowed down a lot and I suspect it was because I made this change:
WHERE n.name =~ "(?i)" + name + " OR n.type =~ "(?!)" + type + " OR n.model =~ "(?i)" + model"

name, type and model are parameters passed into a function.

Originally, it was exact match like:
Where n.name =

But I don't think this change should cause slow down too much. I need to investigate more on the change of speed.

Cobra · ‎05-24-2020

Hello @lingvisa,

Indeed, regexs are faster , one more thing you can try is to build regexs in parameters and not in the cypher request

Regards,
Cobra

a10554 · ‎05-24-2020

Hi lingvisa,

if you find the slowdown is because of the changes you made to the query, and you really want to speed up things, may i suggest you create a name_2 and model_2 properties with lowercase data and then match them with your parameters also in lowercase ?

Cobra · ‎05-24-2020

@lingvisa, why did you add + " after each conditions?

This should be enough normally? WHERE n.name =~ "(?i)" + name OR n.type =~ "(?!)" + type OR n.model =~ "(?i)" + model

@a10554 I don't think that is a good practise and the regex which starts by (?i) is case insensitive

Regards,
Cobra

a10554 · ‎05-24-2020

@cobra was thinking if there are millions of nodes, going through regex over those millions of nodes would be computacionaly intensive, i guess. Having seen some practices done in business intelligence to achieve better performances by having redundant info doesn't shock me anymore. (check "Star schema" from Kimball)

Cobra · ‎05-24-2020

If it's like other languages and I think it is, regex will be always faster than anything

I see what you mean but you can have performance and a clean database In this case maybe when he load the data, he can format them directly correctly to avoid to do this in the query, there are lot of improvments around that could be possible

lingvisa · ‎05-24-2020

def get_nodes_by_name(channel, label, name, limit_count=500000):
    """
    Get node by matching  n.name, n.znname, n.enname As long as one matches, return
    """
    cypher = "MATCH (n:"  + label + ") WHERE n.channel = \"" + channel + "\" AND (n.name =~ \"(?i)" + name + "\" OR n.znname =~ \"(?i)" + name + "\" OR n.enname =~ \"(?i)" + name + "\") RETURN n LIMIT $nproduct"
    data = {
        "statements": [
            {
                "statement": cypher,
                "parameters": {
                    "name_value": name,
                    "nproduct": limit_count
                },
            }
        ]
    }
    r = requests.post(URL_V4, headers=headers, json=data)
    #print(label, name)
    rows = r.json()["results"][0]["data"]
    nodes = []

    for row in rows:
        node= row['row'][0]
        nodes.append((node))

    return nodes

Cobra, The above is one of my typical queries composed through a function call. How would you improve it in this case?

Cobra · ‎05-24-2020

@lingvisa You are using Py2neo right?

You can pass channel and name as parameters:

cypher = "MATCH (n:"+label+") WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
data = {
    "statements": [
        {
             "statement": cypher,
             "parameters": {
                 "channel": "(?i)" + channel,
                 "name": "(?i)" + name,
                 "nproduct": int(limit_count)
              },
         }
     ]
}

Moreover, if you don't have a lot of different Labels, I advice you to do a ifelse condition and to write directly the label in cypher request (to avoid injection for example), for this request, I don't think there is any problem but it's better to use good practices:)

For example if you have only a Product and item labels:

label = str(label).lower()
if label == "product":
    cypher = "MATCH (n:Product) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
elif label == "item":
    cypher = "MATCH (n:Item) WHERE n.channel =~ $channel AND (n.name =~ $name OR n.znname =~ $name OR n.enname =~ $name) RETURN n LIMIT $nproduct"
else:
    print("Label invalid!")

lingvisa · ‎05-24-2020

Cobra, this looks a lot cleaner. Actually, I am using the original Neo4j HTTP API. It's my own function definition. I will use this syntax.

I have a few labels in a specific channel (domain). So the ifelse should work for better performance.

Cobra · ‎05-24-2020

Oh I see, I never used the Neo4j HTTP API

the parameters are here to speed up your query because the database will recognize your request
regexs make your query robustness because it is case insentive for channel and name parameters in your case
you can cast the limit_count to be sure to have an int and not something else
the ifelse on labels will prevent injection and unwanted behaviour

These things will make your code better, cleaner and resilient

lingvisa · ‎05-24-2020

That makes sense. I will have another query improvement in a similar fashion which is composed very similar but is used in a Neo4j Driver API scenario. I will ask later.

Cobra · ‎05-24-2020

No problem, I will be happy to help you

I always use Python Neo4j driver

Regards,
Cobra

mojo2go · ‎05-29-2020

I agree with @a10554 that regex useage is going to be slower than native match terms. I love regex’s when they can get me out of a bind, or parse unstructured/semi-structured data, but I do t think they benefit much from the index. I would guess that they are at least as slow as a CONTAINS, which has to check every character in a string. In contrast STARTS WITH is fast ant totally makes use of the index. I think ENDS WITH is finally benefiting from the index, via a trick where the the search reverses word character sequence...so not as fast as STARTS WITH. But that’s a hint that you don’t have to make the match in a single step. You may be able to do an intermediate search that reduces the size of your matched objects, then use the surgical regex to make the final match.

michael_hunger · ‎05-30-2020

You should use fulltext indexes which are optimized for this can handle multiple properties and have full lucene syntax support

https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-full-text-search/

Cobra · ‎05-30-2020

Hello @michael.hunger

To use them, string properties must be unique?

Regards,
Cobra

michael_hunger · ‎05-30-2020

No they don’t have to be unique and can also be larger texts

peter_osterdahl · ‎05-30-2020

@lingvisa, I don’t know, all these OR =? seems pretty sql’y to me. If you really want to query against the properties like that, why not move them to unique nodes instead?

Neo4j

Is there a more efficient version of this query?