08-02-2020 05:48 AM
Hi,
I am a new user of Neo4j, currently on version 3.5.
My first question is how to show a query plan like the one below, which presents the performance data more clearly.
Compiler CYPHER 4.1
Planner COST
Runtime INTERPRETED
Runtime version 4.1
+-----------------+-------------------------------------------------------------------------------------------------+----------------+------+---------+-----------------+-------------------+----------------------+
| Operator | Details | Estimated Rows | Rows | DB Hits | Page Cache Hits | Page Cache Misses | Page Cache Hit Ratio |
+-----------------+-------------------------------------------------------------------------------------------------+----------------+------+---------+-----------------+-------------------+----------------------+
| +ProduceResults | person | 1 | 1 | 0 | 0 | 0 | 0.0000 |
| | +-------------------------------------------------------------------------------------------------+----------------+------+---------+-----------------+-------------------+----------------------+
| +NodeIndexSeek | person:Person(firstname, surname) WHERE firstname STARTS WITH $autostring_0 AND exists(surname) | 1 | 1 | 2 | 0 | 0 | 0.0000 |
+-----------------+-------------------------------------------------------------------------------------------------+----------------+------+---------+-----------------+-------------------+----------------------+
Total database accesses: 2, total allocated memory: 0
Second question: I created an index on id, but it does not seem to be used.
CREATE INDEX ON :Artifact(id)
Then I ran this Cypher query:
explain Match (some:Artifact)-[:DEPEND_ON*]->(a:Artifact {gav:"org.slf4j:slf4j-api:1.7.21"}) where not exists(()-[:DEPEND_ON]->(some)) and 0<ID(some)<1000 return distinct some.gav
The result is shown below:
08-02-2020 08:08 AM
You have to create the index on a property, not on the ID. The internal graph id "ID(some)" does not need an index.
Also, it is not a good idea to use IDs in queries like this, because node IDs may not be as sequential as you think.
You need an index on the "gav" property here.
CREATE INDEX ON :Artifact(gav)
With this in place, the lookup on gav can use the index.
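To check that the index exists and that the planner picks it up, something like the following should work on 3.5. This is only a sketch: the gav value is the one from the question, and the two statements are meant to be run one at a time.

```cypher
// List the indexes known to the database and their state
// (built-in procedure in Neo4j 3.5).
CALL db.indexes();

// Show the plan without executing the query. A NodeIndexSeek on
// :Artifact(gav) as the leaf operator means the index is used.
EXPLAIN
MATCH (a:Artifact {gav: "org.slf4j:slf4j-api:1.7.21"})
RETURN a;
```

If the plan shows NodeByLabelScan followed by a Filter instead, the index is not online yet or the label/property in the query does not match the index.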
08-02-2020 09:40 AM
Thanks.
The requirement is to query all the Artifact nodes that directly or indirectly depend on the Artifact with gav:org.slf4j:slf4j-api:1.7.21.
This graph is built from the Maven Central Repository (a Java library repository), and a great many other Artifacts depend on org.slf4j:slf4j-api:1.7.21, either directly or indirectly.
The simplest data model is like this:
My approach to the requirement is this:
Step 1: Find all the Artifact nodes that no other Artifact depends on. If something does depend on them, that something also depends, indirectly, on org.slf4j:slf4j-api:1.7.21. Call this result a.
Step 2: Filter result a to keep only the nodes that really DEPEND_ON* org.slf4j:slf4j-api:1.7.21. Call this result b.
Cypher:
Match (some:Artifact) where 0<id(some)<1000 and not exists(()-[:DEPEND_ON]->(some))with collect(some) as col
Match (a:Artifact) where id(a)=179110 with a
FOREACH (n IN col| match p=shortestpath((n)-[:DEPEND_ON*]->(a))
return case p when p then n end AS result)
The result is shown below:
Step 3: Iterate over result b to collect the distinct nodes on all the paths from each node in result b to org.slf4j:slf4j-api:1.7.21. That is the final result for the requirement.
For example:
match (a:Artifact {gav:"net.wessendorf.kafka:kafka-cdi-extension:0.0.9"}) ,(b:Artifact {gav:"org.slf4j:slf4j-api:1.7.21"}),p=allshortestpaths((a)-[:DEPEND_ON*]->(b)) return p
I also tried skip...limit to implement pagination before, but as the skip number went up, the query became extremely slow.
MATCH (a:Artifact {gav: "org.slf4j:slf4j-api:1.7.21"})<-[:DEPEND_ON*]-(some)
where a<>some return distinct some skip n limit m
I would like to ask whether there is a better way to do this.
Or, failing that, how can I make my Cypher above run?
08-04-2020 11:09 AM
You need to understand that WITH changes what variables are in scope. Only the variables you include will stay in scope, any others will be left out.
So for the query that's erroring out, the problem is this: with a. Change it to with a, col so col remains in scope.
Also, you can't use MATCH inside a FOREACH (only write clauses are allowed), so you'll need to use UNWIND on col instead.
08-04-2020 11:21 AM
Thanks, Andrew.
So what should the correct Cypher look like?
I am a new user of Neo4j.
Thanks for your help.
08-04-2020 11:25 AM
This compiles:
MATCH (some:Artifact)
WHERE 0 < id(some) < 1000 and not ()-[:DEPEND_ON]->(some)
WITH collect(some) as col
MATCH (a:Artifact)
WHERE id(a)=179110
WITH a, col
UNWIND col as n
MATCH p = shortestpath((n)-[:DEPEND_ON*]->(a))
RETURN CASE p WHEN p THEN n END AS result
Though I'm not sure it will do what you want. I'm not quite sure what you're intending by that RETURN.
08-04-2020 11:48 AM
The requirement is to query all the Artifact nodes that directly or indirectly depend on the Artifact with property gav: org.slf4j:slf4j-api:1.7.21, whose id is 179110.
I want to get all the nodes like the diagram below.
This graph is built from the Maven Central Repository (a Java library repository), and a great many other Artifacts depend on org.slf4j:slf4j-api:1.7.21, either directly or indirectly.
08-04-2020 01:27 PM
Do you need paths to each of these nodes, or do you just need the nodes?
You also did some pre-matching based on ids. Is that still needed, or is it enough to get all connected :Artifact nodes that don't have anything depending on them?
08-04-2020 02:22 PM
Hi, Andrew,
I want to get all the :Artifact nodes related to the target node: not only the nodes that have nothing depending on them, but also the nodes on the paths from the start nodes to the target node in this diagram, excluding the target node itself.
In short, all the :Artifact nodes that depend on the target node, either directly or indirectly.
And the pre-matching is not necessary.
Does that make it clear?
08-04-2020 03:39 PM
That sounds like it should be a fairly simple query, so let me know if there's something I've overlooked:
MATCH (a:Artifact)<-[:DEPEND_ON*]-(n:Artifact)
RETURN DISTINCT n
On a larger graph it may be easier to use APOC path finder procs:
MATCH (a:Artifact)
CALL apoc.path.subgraphNodes(a, {relationshipFilter:'<DEPEND_ON', labelFilter:'>Artifact'}) YIELD node
RETURN node
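If the target itself should be left out of the result, a variation along these lines may help. It is a sketch under two assumptions: that the target is looked up by its gav property (the value is the one mentioned earlier in the thread), and that subgraphNodes returns the start node together with the nodes reachable from it, which is worth confirming against your APOC version.

```cypher
// Anchor on a single target instead of every :Artifact node.
MATCH (a:Artifact {gav: "org.slf4j:slf4j-api:1.7.21"})
// Follow incoming DEPEND_ON relationships; each node is visited once.
CALL apoc.path.subgraphNodes(a, {
  relationshipFilter: '<DEPEND_ON',
  labelFilter: '>Artifact'
}) YIELD node
// Drop the start node so only the dependents remain.
WITH a, node
WHERE node <> a
RETURN node.gav AS gav
```

Returning only the gav property rather than whole nodes also keeps the result smaller when there are many dependents.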
08-04-2020 04:24 PM
Hi, Andrew
Thank you very much.
You saved my day.
I tried APOC before, but I used apoc.path.expand... That was the wrong direction.
I ran your Cypher like this:
MATCH (a:Artifact {gav:"org.slf4j:slf4j-api:1.7.21"})
CALL apoc.path.subgraphNodes(a, {relationshipFilter:'<DEPEND_ON', labelFilter:'>Artifact'}) YIELD node
RETURN node
And it's quite efficient:
Started streaming 164139 records after 6 ms and completed after 9674 ms, displaying first 1000 rows.
One more question:
How can I get the execution time of a Cypher query and the memory it allocated in Neo4j Desktop?
Or is there any other way to get this data?
I used profile match ..., but cannot get the total allocated memory.
I want to get information like the output below. My Neo4j version is 3.5.15.
Compiler CYPHER 4.1
Planner COST
Runtime INTERPRETED
Runtime version 4.1
+-------------------+---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| Operator | Details | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits | Page Cache Misses | Page Cache Hit Ratio | Order |
+-------------------+---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +ProduceResults | `p.name`, `count(m)` | 13 | 102 | 0 | | 0 | 0 | 0.0000 | p.name ASC |
| | +---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +Sort | `p.name` ASC | 13 | 102 | 0 | 22048 | 0 | 0 | 0.0000 | p.name ASC |
| | +---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +EagerAggregation | cache[p.name] AS `p.name`, count(m) AS `count(m)` | 13 | 102 | 0 | 13768 | 0 | 0 | 0.0000 | |
| | +---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +Filter | m:Movie | 172 | 172 | 172 | | 0 | 0 | 0.0000 | |
| | +---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +Expand(All) | (p)-[anon_17:ACTED_IN]->(m) | 172 | 172 | 297 | | 0 | 0 | 0.0000 | |
| | +---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
| +NodeIndexScan | p:Person(name) WHERE exists(name), cache[p.name] | 125 | 125 | 126 | | 0 | 0 | 0.0000 | |
+-------------------+---------------------------------------------------+----------------+------+---------+----------------+-----------------+-------------------+----------------------+------------+
Total database accesses: 595, total allocated memory: 32672