cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Aggregate related nodes across all rows

I am relatively new to Neo4j/Cypher and getting the desired results for the following query has been escaping me for a while now.

If it helps, here is a Stack Overflow question where some setup and discussion has already taken place.

The nodes have a very simple relationship model: Books are grouped into Categories

The books and categories are all unique and a book can be related to multiple categories.

My base query collects the categories, resulting in a set of books with their associated categories:

match (c:Category)-[:contains]-(b:Book)
return b as book, collect(distinct c) as categories

I can then collect the books, resulting in a set of related books and categories:

match (c:Category)-[:contains]-(b:Book)
with b, collect(distinct c) as categories
return collect(distinct b) as books, categories

This seems to be going in the right direction but there are many duplicate books and categories throughout. Here is a pseudo example:

Books                         Categories
-----------------------------------------------
[Easy Home Updates]           [Home and Garden]
-----------------------------------------------
[Gardening Today,             [Outdoors,
 Gardening for Kids,           Hobbies,
 Green Thumb Made Easy]        Gardening]
-----------------------------------------------
[Conversational Spanish,      [Spanish,
 Spanish for Travelers,        Travel,
 Advanced Spanish]             Language]
-----------------------------------------------
[Gardening Today,             [Gardening,
 Gardening for Kids]           Kids]
-----------------------------------------------
[Home Improvement,            [Home Improvement,
 Easy Home Updates,            Home and Garden,
 Family Home Projects]         Family]
-----------------------------------------------
[Gardening Today]             [Gardening]
-----------------------------------------------
[Conversational Spanish,      [Language,
 Advanced Spanish]             Spanish]

I cannot seem to find a way to aggregate the duplicates either in the initial match with filtering or using a combination of the reduce and apoc functions.

The desired result would be to reduce both the related book and category collections. Something like this:

Books                         Categories
----------------------------------------------
[Gardening Today,             [Gardening,
 Gardening for Kids,           Outdoors,
 Green Thumb Made Easy]        Hobbies,
                               Kids,
                               Family]
----------------------------------------------
[Conversational Spanish,      [Spanish,
 Spanish for Travelers,        Language,
 Advanced Spanish]             Travel,
                               Education]
----------------------------------------------
[Home Improvement,            [Home and Garden,
 Easy Home Updates,            Home Improvement,
 Family Home Projects]         Construction]

I am hoping to build a data set that is more like how nodes are grouped in the graph view of the browser visualization tool. No matter how much duplicate data exists, it seems the graph view of the browser visualization tool always finds a way to group and remove duplicates perfectly while the data set in the table view is not grouped in the same way.

Is it possible to write a Cypher/Apoc query alone to achieve this or do I need to further reduce and aggregate the rows via custom programming? I am a developer by trade so I am not against coding the solution, but my first instinct is to always allow the database to do as much of the work as possible.

Any help you can provide to point me in the right direction would be greatly appreciated. Even just a high level explanation of how the visualization tool does it's magic would be helpful.

Please let me know if you need any further clarification.

2 REPLIES 2

ameyasoft
Graph Maven

Hi,
You can try this:

MATCH (c:Category)-[:CONTAINS]->(d:Book)
WHERE c.name IN ['Gardening','Outdoors','Hobbies','Family']
RETURN COLLECT(distinct d.name) as Books, COLLECT(distinct c.name) as Category;

Result:
2X_f_fec5417be7ac7fe6b30c6a6e4dcd1eb731cdc06b.png

-Kamal

Can you supply Cypher to create a sample graph so we can play around with it a bit?

Also, your example shouldn't be possible with your query.

match (c:Category)-[:contains]-(b:Book)
// you really should add this so the category lists use a common ordering: with b, c order by c.name ASC
with b, collect(distinct c) as categories
return collect(distinct b) as books, categories

At the second-to-last line, every book will be distinct with its own list of categories.
At the last line, you've collected those distinct books by the category lists, books should be distinct, but your example results show the same books appearing several times in multiple category sets. The only way that would happen is if you had multiple nodes for the same books, but connecting to different sets of categories.