12-02-2021 09:38 AM
Hi experts,
I'm trying to query my graph model to export groups of nodes that share the maximum number of relationships.
I want to create lists of keywords that share the same URLs.
For example: list1 = ['keyword-1', 'keyword-2', 'keyword-3'] (they share URL-A, URL-BH, URL-CD), and so on. Keywords have to share at least 3 URLs to be in a list, a list must have at least 2 keywords, and each keyword should be in only one list.
Can you please help me generate lists of keywords that share the same URLs?
PS: each URL is an independent node. The result above is generated by the following query:
match (k:Keyword)-[a:APPEARS_IN]-(u:Url) return k.keyword as keyword, collect(u.url) as urls
Regards.
12-17-2021 07:16 AM
I think something like this should work (I used the APOC library):
match (k:Keyword)-[a:APPEARS_IN]-(u:Url) // MATCH nodes
with apoc.coll.toSet(collect(u.url)) as mySet, k // build the set of URLs for each keyword
with apoc.coll.combinations(mySet, 3, 3) as combinations, k // all combinations of 3 URLs; this becomes a list of lists
unwind combinations as combination // for each combination...
with combination, collect(k.keyword) as keys // group the keywords by combination
where size(keys) > 2 // and keep only combinations with 3 or more keywords
return combination, keys
combination | keys
["URL-D", "URL-C", "URL-A"] | ["keyword-1", "keyword-2", "keyword-6"]
["URL-C", "URL-B", "URL-A"] | ["keyword-1", "keyword-2", "keyword-5"]
using this dataset:
create (k:Keyword {keyword: "keyword-1"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-B'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-C'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-D'});
create (k:Keyword {keyword: "keyword-2"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-B'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-C'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-D'});
create (k:Keyword {keyword: "keyword-3"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-B'});
create (k:Keyword {keyword: "keyword-4"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-D'});
create (k:Keyword {keyword: "keyword-5"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-B'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-C'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-E'});
create (k:Keyword {keyword: "keyword-6"})-[a:APPEARS_IN]->(u:Url {url: 'URL-A'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-C'})
with k
create (k)-[a:APPEARS_IN]->(:Url {url: 'URL-D'});
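To sanity-check the grouping logic outside the database, the query and dataset above can be mirrored in plain Python. This is only a sketch: the function name `group_by_url_triples` and the `min_keywords` parameter are made up for illustration, and the URLs are sorted before combining so that each 3-URL combination is an order-independent key (an assumption that the order inside a combination does not matter).

```python
from collections import defaultdict
from itertools import combinations

# Sample data copied from the CREATE statements above.
keyword_urls = {
    "keyword-1": {"URL-A", "URL-B", "URL-C", "URL-D"},
    "keyword-2": {"URL-A", "URL-B", "URL-C", "URL-D"},
    "keyword-3": {"URL-A", "URL-B"},
    "keyword-4": {"URL-A", "URL-D"},
    "keyword-5": {"URL-A", "URL-B", "URL-C", "URL-E"},
    "keyword-6": {"URL-A", "URL-C", "URL-D"},
}


def group_by_url_triples(keyword_urls, min_keywords=3):
    """Map each 3-URL combination to the keywords that appear in all three URLs."""
    groups = defaultdict(list)
    for keyword in sorted(keyword_urls):
        # Sorting the URLs gives every combination a canonical order.
        for combo in combinations(sorted(keyword_urls[keyword]), 3):
            groups[combo].append(keyword)
    # Keep only combinations shared by enough keywords.
    return {combo: keys for combo, keys in groups.items() if len(keys) >= min_keywords}


result = group_by_url_triples(keyword_urls)
for combo, keys in sorted(result.items()):
    print(list(combo), "|", keys)
```

With the default `min_keywords=3` this mirrors the `size(keys) > 2` filter. Passing `min_keywords=2` would instead follow the "at least 2 keywords" rule from the original question and also keep the combinations shared only by keyword-1 and keyword-2.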
12-29-2021 05:37 AM
match (m:Keyword)-[:APPEARS_IN]-(u:Url)
with apoc.coll.toSet(collect(u.url)) as murls, m.keyword as mkeywords
with collect({keyword: mkeywords, urls: murls}) as all_
unwind all_ as k1
unwind all_ as k2
with k1, k2
where k1 <> k2
with k1.keyword as keyword1, k2.keyword as keyword2, apoc.coll.intersection(k1.urls, k2.urls) as urls
with keyword1, keyword2, urls, size(urls) as size_urls
with keyword1, keyword2, apoc.coll.combinations(urls, 3, size_urls) as url_combinations
//where size(url_combinations) > 0
unwind url_combinations as com
with keyword1, keyword2, com as urls
with keyword1, keyword2, urls, apoc.coll.frequencies([urls]) AS nbr_urls_count
with keyword1, keyword2, urls, nbr_urls_count[0].count as nbr_urls_count
with urls, sum(nbr_urls_count) as nbr_urls_count, apoc.coll.toSet(collect(keyword1) + collect(keyword2)) as keywords
with urls, nbr_urls_count, keywords, size(urls) as size_url, size(keywords) as size_keywords
where size_url >= 3 and size_keywords >= 2
return urls, nbr_urls_count, size_url, size_keywords, keywords
order by nbr_urls_count desc
So I came to think of it this way, but it seems hard-coded and slow: when I run it on 500k nodes, it even breaks after 3 hours of running.
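One likely cost in the query above (an assumption, not something measured here) is the double unwind of all_, which builds every ordered pair of keywords before intersecting their URL lists. A minimal sketch of an alternative, written in Python rather than Cypher so the logic can be checked on its own: build an inverted index from URL to keywords, then count shared URLs only for keyword pairs that actually co-occur on some URL. The name `shared_url_pairs` and the `min_shared` parameter are hypothetical, and the sample data is the small test dataset from the earlier reply, not the 500k-node graph.

```python
from collections import defaultdict
from itertools import combinations

# Small sample taken from the test dataset posted earlier in the thread.
keyword_urls = {
    "keyword-1": {"URL-A", "URL-B", "URL-C", "URL-D"},
    "keyword-2": {"URL-A", "URL-B", "URL-C", "URL-D"},
    "keyword-3": {"URL-A", "URL-B"},
    "keyword-4": {"URL-A", "URL-D"},
    "keyword-5": {"URL-A", "URL-B", "URL-C", "URL-E"},
    "keyword-6": {"URL-A", "URL-C", "URL-D"},
}


def shared_url_pairs(keyword_urls, min_shared=3):
    """Return {(keyword1, keyword2): shared URLs} for pairs sharing >= min_shared URLs."""
    # Inverted index: URL -> set of keywords that appear in it.
    url_keywords = defaultdict(set)
    for keyword, urls in keyword_urls.items():
        for url in urls:
            url_keywords[url].add(keyword)

    # Visit only pairs that co-occur on at least one URL.
    shared = defaultdict(set)
    for url, keywords in url_keywords.items():
        for pair in combinations(sorted(keywords), 2):
            shared[pair].add(url)

    return {pair: urls for pair, urls in shared.items() if len(urls) >= min_shared}


pairs = shared_url_pairs(keyword_urls)
for pair, urls in sorted(pairs.items()):
    print(pair, sorted(urls))
```

This still does quadratic work for any single URL that has very many keywords, so it only helps when most URLs are shared by a modest number of keywords. It also stops at pairs: merging pairs into larger lists where each keyword appears only once would need an extra step that is not shown here.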
01-10-2022 05:35 AM
Sorry, I can't fully understand what this query should return in addition to your initial answer.
Can you add a brief description of this one?