How can I group nodes into groups based on common relationships?

Hi experts,

I'm trying to query my graph model to export the nodes that share the most relationships into groups.

I want to create lists of keywords that share the same URLs.
For example: create list1 ['keyword-1', 'keyword-2', 'keyword-3'] (they share URL-A, URL-BH, URL-CD), and so on. Keywords have to share at least 3 URLs to be in a list, a list must have at least 2 keywords, and each keyword should be in only one list.

Can you please help me generate lists of keywords that share the same URLs?

PS: each URL is an independent node. The result above is generated by the following query:

match (k:Keyword)-[a:APPEARS_IN]-(u:Url) return k.keyword as keyword, collect(u.url) as urls

Regards.

3 REPLIES

@wadie.almouhtadi

I think something like this should work (I used the APOC library):

match (k:Keyword)-[a:APPEARS_IN]-(u:Url) // match keywords and the URLs they appear in
with apoc.coll.toSet(collect(u.url)) as mySet, k // build the set of distinct URLs for each keyword
with apoc.coll.combinations(mySet, 3, 3) as combinations, k // build every combination of 3 URLs; this is a list of lists
unwind combinations as combination // for each combination...
with combination, collect(k.keyword) as keys // ...collect the keywords that have it
where size(keys) > 2 // keep only combinations with 3 or more keywords
return combination, keys

With the above query I receive:
combination                 | keys
["URL-D", "URL-C", "URL-A"] | ["keyword-1", "keyword-2", "keyword-6"]
["URL-C", "URL-B", "URL-A"] | ["keyword-1", "keyword-2", "keyword-5"]

using this dataset:

create (k:Keyword {keyword: "keyword-1"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-B'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-C'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-D'});

create (k:Keyword {keyword: "keyword-2"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-B'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-C'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-D'});

create (k:Keyword {keyword: "keyword-3"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-B'});

create (k:Keyword {keyword: "keyword-4"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-D'});

create (k:Keyword {keyword: "keyword-5"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-B'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-C'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-E'});

create (k:Keyword {keyword: "keyword-6"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-C'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-D'});
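
The original question asks for lists with at least 2 keywords, while the query in this reply keeps only combinations with 3 or more. A possible variation is sketched below. It assumes the same (:Keyword)-[:APPEARS_IN]-(:Url) model and an installed APOC library. It also sorts each keyword's URLs before building combinations, so that the same three URLs should always come out in the same order and be grouped together. The alias names (urls, keywords) are illustrative only.

```cypher
// Sketch only: assumes the (:Keyword)-[:APPEARS_IN]-(:Url) model and APOC.
MATCH (k:Keyword)-[:APPEARS_IN]-(u:Url)
// Distinct URLs per keyword, sorted so every combination has a canonical order
WITH k, apoc.coll.sort(apoc.coll.toSet(collect(u.url))) AS urls
// A keyword with fewer than 3 URLs cannot contribute a combination
WHERE size(urls) >= 3
// Every combination of exactly 3 URLs for this keyword
UNWIND apoc.coll.combinations(urls, 3, 3) AS combination
// Group keywords by the 3-URL combination they share
WITH combination, collect(DISTINCT k.keyword) AS keywords
// The question asks for at least 2 keywords per list
WHERE size(keywords) >= 2
RETURN combination, keywords
ORDER BY size(keywords) DESC;
```

This sketch does not enforce the rule that each keyword belongs to only one list; a keyword can still appear under several URL combinations.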

match (m:Keyword)-[:APPEARS_IN]-(u:Url)
// distinct URLs per keyword
with apoc.coll.toSet(collect(u.url)) as murls, m.keyword as mkeywords
with collect({keyword: mkeywords, urls: murls}) as all_
// every ordered pair of distinct keywords
unwind all_ as k1
unwind all_ as k2
with k1, k2
where k1 <> k2
// URLs shared by the pair
with k1.keyword as keyword1, k2.keyword as keyword2, apoc.coll.intersection(k1.urls, k2.urls) as urls
with keyword1, keyword2, urls, size(urls) as size_urls
// every combination of 3 or more shared URLs
with keyword1, keyword2, apoc.coll.combinations(urls, 3, size_urls) as url_combinations
unwind url_combinations as com
with keyword1, keyword2, com as urls
with keyword1, keyword2, urls, apoc.coll.frequencies([urls]) as nbr_urls_count
with keyword1, keyword2, urls, nbr_urls_count[0].count as nbr_urls_count
// group the keywords by URL combination
with urls, sum(nbr_urls_count) as nbr_urls_count, apoc.coll.toSet(collect(keyword1) + collect(keyword2)) as keywords
with urls, nbr_urls_count, keywords, size(urls) as size_url, size(keywords) as size_keywords
where size_url >= 3 and size_keywords >= 2
return urls, nbr_urls_count, size_url, size_keywords, keywords
order by nbr_urls_count desc

So I came to think of it this way, but it seems very hard-coded and slow: when I run it on 500k nodes, it even breaks after 3 hours of running.
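
One plausible contributor to the slowdown is the double unwind over the collected list, which builds every ordered pair of keywords before any filtering happens. A minimal sketch of a pair-first alternative follows. It assumes that each distinct URL is stored as exactly one shared (:Url) node (for example, loaded with MERGE rather than CREATE, which is not the case in the sample dataset above) and that the keyword property is unique per Keyword node. Both are assumptions, not facts stated in the thread.

```cypher
// Sketch only: assumes one shared (:Url) node per distinct URL
// and a unique keyword property on each (:Keyword) node.
MATCH (k1:Keyword)-[:APPEARS_IN]-(u:Url)-[:APPEARS_IN]-(k2:Keyword)
// Keep each unordered pair once and skip self-pairs
WHERE k1.keyword < k2.keyword
// URLs the two keywords have in common
WITH k1, k2, collect(DISTINCT u.url) AS sharedUrls
// The question requires at least 3 shared URLs
WHERE size(sharedUrls) >= 3
RETURN k1.keyword AS keyword1, k2.keyword AS keyword2, sharedUrls
ORDER BY size(sharedUrls) DESC;
```

This only visits pairs of keywords that share at least one URL, and it reports pairs rather than final lists. Merging pairs into larger lists and enforcing one list per keyword would need a further step.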

@wadie.almouhtadi

Sorry, I can't fully understand what this query should return in addition to your initial answer.
Can you add a brief description of this one?