Neo4j

lingvisa · ‎10-22-2020

New Neo4j version.

My graph db has only 130,000 nodes and 5 labels. I do this simple counting:

MATCH (m:Product) WHERE m.effect='coloring' AND m.source='web' RETURN count(m) as count

In the Neo4j Browser, I see this below the query after it runs:
Started streaming 1 records in less than 1 ms and completed after 107 ms.

107 ms seems long for a single query on such a tiny graph, and I am on a powerful Mac Pro. Is this normal for a Neo4j query? It would be fine for offline queries.

If I build indexes on the 2 properties in the WHERE clause, would it be much faster?

jggomez · ‎10-22-2020

Hi. The query may take longer the first time it runs and less time afterwards. Indexes can speed up the query; you could create a composite index.

Try it!!

Thanks

mdfrenchman · ‎10-22-2020

Yes that's normal. Putting an index on the two properties you're using to filter will speed that up.
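
A minimal sketch of creating such an index from Python, assuming Neo4j 4.x index syntax and a `session` object from the official Python driver. The index name `product_effect_source` and the helper `create_product_index` are made up for illustration.

```python
# Composite index on the two properties used in the WHERE clause.
# The syntax assumes Neo4j 4.x; the index name is an arbitrary choice.
CREATE_INDEX = (
    "CREATE INDEX product_effect_source "
    "FOR (m:Product) ON (m.effect, m.source)"
)


def create_product_index(session):
    """Run the index statement on an open driver session (hypothetical helper)."""
    # consume() waits for the statement to finish and discards the result.
    session.run(CREATE_INDEX).consume()
```

Prefixing the original query with PROFILE in Neo4j Browser should then show whether the planner actually uses the index (for example, a NodeIndexSeek operator instead of NodeByLabelScan).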

As @jggomez mentioned, the first time the query runs it'll typically be a lot slower. On the second and subsequent runs the query plan is cached, so it returns much more quickly.

Also, to benefit from query caching, if you are going to pass in values for effect or source other than just "coloring" and "web", you will want to use parameters, as they allow the query to be reused via the cache.

Example of what I mean:
This is better

MATCH (m:Product) WHERE m.effect = $effect AND m.source = $source 
RETURN count(*)

than different queries like this

MATCH (m:Product) WHERE m.effect = 'coloring' AND m.source = 'web' 
RETURN count(*)

MATCH (m:Product) WHERE m.effect = 'coloring' AND m.source = 'print' 
RETURN count(*)
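
With the Python driver, the parameterized form might be run roughly as in the sketch below. `count_products` is a hypothetical helper, and `tx` is assumed to be a driver transaction object such as the one passed to a transaction function.

```python
COUNT_QUERY = (
    "MATCH (m:Product) "
    "WHERE m.effect = $effect AND m.source = $source "
    "RETURN count(*) AS count"
)


def count_products(tx, effect, source):
    """Count matching products; values travel as parameters, not as query text."""
    # The query string is identical on every call; only the parameters change.
    result = tx.run(COUNT_QUERY, effect=effect, source=source)
    return result.single()["count"]
```

Calling `count_products(tx, "coloring", "web")` and then `count_products(tx, "coloring", "print")` sends the same query text both times.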

lingvisa · ‎10-22-2020

Do $effect and $source make a difference? In my code they are already variables, since they are passed into the function via the generic variable names 'property_name' and 'property_value'. In this case, property_name is 'effect' and property_value is 'coloring'.

Part of the function looks like:

def create_where_clause(WHERE, channel, properties):
    property_clause = ''
    for property in properties:
        property_clause += " m." + property.name + "='" + property.value + "'" + " AND"

    if property_clause:
        if not WHERE:
            WHERE = "WHERE " + property_clause
        else:
            WHERE += property_clause

    if WHERE and WHERE.endswith(' AND'):
        WHERE = WHERE[:-4]

    return WHERE

I don't know whether this is what you meant by using variables. As I understand it, the Cypher statement composed this way is still ultimately a raw string.

cypher = match_clause + where_clause + return_clause + limit_clause

By the time the query is executed with tx.run(cypher), it has already become an explicit Cypher query without variables, even though variables were used to parse and compose the query statement.

mdfrenchman · ‎10-22-2020

It has to be a variable when it reaches neo4j for it to work as a variable.
So what you have, as you said, is still an explicit string with no variables and would not take advantage of the query caching.

Although, depending on size, and indexes, you might not need it.
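
If you do want the cache, one way to adapt a string-building helper is to have it emit `$` placeholders and return the values separately. The following is only a sketch: `build_where_clause` and the `Property` namedtuple are stand-ins invented here, not the code from the post above. Property names cannot be sent as parameters in `m.name = ...` form, so the sketch validates them and keeps them in the query text, while values go into a dict.

```python
import re
from collections import namedtuple

# Stand-in for the property objects used by the original function (assumption).
Property = namedtuple("Property", ["name", "value"])

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_where_clause(properties, alias="m"):
    """Return (where_clause, params) with values kept out of the query text."""
    conditions = []
    params = {}
    for i, prop in enumerate(properties or []):
        # Names stay in the query text, so reject anything that is not
        # a plain identifier before interpolating it.
        if not _IDENTIFIER.fullmatch(prop.name):
            raise ValueError(f"unsafe property name: {prop.name!r}")
        key = f"p{i}"
        conditions.append(f"{alias}.{prop.name} = ${key}")
        params[key] = prop.value
    where = "WHERE " + " AND ".join(conditions) if conditions else ""
    return where, params
```

The returned dict can be passed as the second argument of `tx.run(cypher, params)`. Queries that filter on the same property names then share one query text regardless of the values.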

lingvisa · ‎10-23-2020

So, as you can see, my whole Cypher query is composed from function arguments, and these arguments can optionally be null. Taking the code above as an example, both 'properties' and 'channel' can be None, 'properties' is a list of property.name and property.value pairs, and the whole WHERE clause is optional. The template for my query composition is:

cypher = match_clause + where_clause + return_clause + limit_clause

This allows me to design a generic user interface like:

@not_all_none
def node_of_relation(*, property: str=None, relation: RelType=None, node_type: NodeType=None, channel: WBChannel=None, limit: int=3) -> List:
    return KG_APP.node_of_relation(property, relation, node_type, channel, limit)

Users then can do:

Query nodes by relation: any node pair connected by this relation. The label is the node type.
node_of_relation(relation=RelType.brandHasTopProd, label=WBChannel.Computer, limit=3)
Query the neighbors of a node identified by a property, whatever the relation between that node and the target nodes. The property identifies the starting node.
node_of_relation(property='id=00001', label=WBChannel.Computer, limit=3)
Query nodes by a specific relation, with the starting node identified by a property. The property string is parsed into a list of property_name and property_value pairs.
node_of_relation(relation=RelType.competitiveProduct, property='color=White;price=$100')

Any of the node_of_relation arguments can be optional, which makes it very flexible for users. That is why my Cypher statement is composed with the template:

cypher = match_clause + where_clause + return_clause + limit_clause

The problem might be that I cannot make use of query caching like this:

query = (
    "MATCH (p:Computer) "
    "WHERE p.name = $name AND p.price = $price "
    "RETURN p.name AS name"
)
result = tx.run(query, name=name, price=price)
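
Optional arguments and parameters are not mutually exclusive, though. A rough sketch, under the assumption that only values need to vary: assemble the query text from whichever arguments are present, and collect the values in a dict. `parse_properties` and `build_query` are hypothetical names, the `RETURN p.name AS name` part is borrowed from the example above, and `str.isidentifier()` is used only as a crude safety check for names that must stay in the query text.

```python
def parse_properties(spec):
    """Split a string like 'color=White;price=$100' into (name, value) pairs."""
    pairs = []
    for item in (spec or "").split(";"):
        if item:
            name, _, value = item.partition("=")
            pairs.append((name.strip(), value.strip()))
    return pairs


def build_query(label, property_spec=None, limit=None):
    """Return (cypher, params); optional arguments only change the query shape."""
    # Labels and property names cannot be parameters, so check them first.
    if not label.isidentifier():
        raise ValueError(f"unsafe label: {label!r}")
    params = {}
    conditions = []
    for i, (name, value) in enumerate(parse_properties(property_spec)):
        if not name.isidentifier():
            raise ValueError(f"unsafe property name: {name!r}")
        conditions.append(f"p.{name} = $p{i}")
        params[f"p{i}"] = value

    parts = [f"MATCH (p:{label})"]
    if conditions:
        parts.append("WHERE " + " AND ".join(conditions))
    parts.append("RETURN p.name AS name")
    if limit is not None:
        parts.append("LIMIT $limit")
        params["limit"] = limit
    return " ".join(parts), params
```

The pair can then be executed with `tx.run(cypher, params)`. Each combination of supplied arguments yields its own query text, but that text stays fixed while the values change.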

Is this query speed normal?