Neo4j

changecpl · 07-20-2021

Hi, everyone!
I am new to Neo4j. I use the procedure apoc.path.subgraphNodes to get all the nodes in a subgraph:

MATCH (p:Pkg {name: "express", version:'4.17.1'})
CALL apoc.path.subgraphNodes(p, {
	relationshipFilter: "DependOn",
	minLevel:1,maxLevel:10
})
YIELD node
RETURN node

However, I want to count these nodes grouped by the values of a property named license. The output should look like this:

license        count
name1           10
name2           12
...             ..

I know there is no GROUP BY clause in Neo4j. What I want to do is something like:

...
RETURN node.license, COUNT(node) GROUP BY node.license

Regards
Changecpl

tard_gabriel · 07-21-2021

Hi @changecpl

Any Neo4j staff member is welcome to disagree, and maybe I don't clearly understand the need here. But if I do, I think you don't need APOC at all in this case. For production use, readability, and speed, I would recommend sticking to plain Cypher where you can.

APOC is awesome and covers a lot of advanced use cases, but too many beginners jump into APOC way too fast. The Cypher language is built to be natural, fast, and efficient; it's a huge part of the graph philosophy.

MATCH (n)-[:DEPEND_ON*1..10]->(p:Pkg {name: "express", version:'4.17.1'})
WITH n.license AS license, count(n) AS count
RETURN license, count ORDER BY license

Easier to read, write, and understand, but it might not fit your needs.
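
For reference, Cypher has no GROUP BY clause because grouping is implicit: every non-aggregated expression in a WITH or RETURN acts as a grouping key. Here is a minimal, self-contained sketch of that behaviour; the three inline packages are made-up sample data, not taken from your graph:

```cypher
// Made-up sample rows standing in for Pkg nodes (illustration only).
UNWIND [
  {name: "a", license: "MIT"},
  {name: "b", license: "MIT"},
  {name: "c", license: "ISC"}
] AS pkg
// pkg.license is not aggregated, so it becomes the grouping key;
// count(*) is then computed once per distinct license value.
RETURN pkg.license AS license, count(*) AS count
ORDER BY license
```

This should return one row per license: ISC with a count of 1 and MIT with a count of 2.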

You can read more about variable path length here.

You also need to add a composite index for your query; I will let you dig into the Cypher manual or the Neo4j academy courses for that.
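
As a rough sketch of what such an index could look like, assuming Neo4j 4.x or later syntax. The index name pkg_name_version is a made-up placeholder; the label and properties come from the query above:

```cypher
// Hypothetical index name; adjust it to your own naming scheme.
// Composite index on the two properties used to look up the start node.
CREATE INDEX pkg_name_version IF NOT EXISTS
FOR (p:Pkg) ON (p.name, p.version)
```

With an index like this in place, the planner can find the starting Pkg node by name and version without scanning every node with that label.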

changecpl · 07-21-2021

Oh, thanks a lot @tard.gabriel! You are right, the Cypher language is easy to read! That is so close. I want to count the nodes that the package express depends on, both directly and indirectly, grouped by the values of the license property. I changed your code, but the count seemed to contain duplicate nodes because different paths intersect. Finally, I put the DISTINCT where it should be!
Thanks bro!

MATCH (p:Pkg {name: "express", version:'4.17.1'})-[:DependOn*1..]->(n:Pkg)
// due to the intersection of different paths, n contains duplicate nodes, use distinct
WITH n.license AS license, count(distinct n) AS count
RETURN license, count ORDER BY license

changecpl · 07-24-2021

Actually, I find that apoc.path.subgraphNodes is much faster than MATCH (A)-[:Relationship*1..]->(B) when the subgraph to search is big...
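
A likely reason: the variable-length pattern enumerates every distinct path, while apoc.path.subgraphNodes visits each reachable node only once. For anyone who wants the APOC version with the grouping, here is a sketch that combines the two queries from this thread. It assumes that only outgoing DependOn relationships should be followed (the trailing ">" in the filter) and leaves out maxLevel so the depth is unbounded:

```cypher
MATCH (p:Pkg {name: "express", version: '4.17.1'})
CALL apoc.path.subgraphNodes(p, {
  // Assumption: follow DependOn in the outgoing direction only.
  relationshipFilter: "DependOn>",
  // minLevel 1 excludes the start node itself.
  minLevel: 1
})
YIELD node
// Each reachable node is yielded once, so no DISTINCT is needed here.
RETURN node.license AS license, count(node) AS count
ORDER BY license
```

The result should match the DISTINCT version above, as long as the relationship direction assumption holds for the data.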

How to count the nodes group by their properties' values