09-08-2018 11:02 AM
I am trying to do some text analysis in Neo4j, and I want to write a query that sorts its results by count in descending order. My data is structured as (Word)-[:NEXT]->(Word)-[:NEXT]->(Word) and so on. I want a query that tells me which 3-word combinations, 4-word combinations, etc. are the most frequent. I tried this, but it always gives a count of one for every word combination:
MATCH p = (w1:Word)-[r:NEXT]->(w2:Word)-[r2:NEXT]->(w3:Word)
WITH [w1.name,w2.name,w3.name] AS word_pair
RETURN COUNT(word_pair) as frequency, word_pair
ORDER BY frequency DESC LIMIT 50
09-08-2018 05:04 PM
I think it's your data.
This should actually work if the w1-w2-w3 pattern appears in your graph more than once.
Can you share your data or an example?
09-09-2018 04:16 AM
This is how I loaded the data, so there is only one node per unique word, but the relationships repeat. I thought there was a way to take the output of a query and count how many times it repeats within the dataset.
WITH split(toLower("My cat eats fish on Saturday")," ") AS text
UNWIND range(0,size(text)-2) AS i
MERGE (w1:Word {name: text[i]})
ON CREATE SET w1.count = 1 ON MATCH SET w1.count=w1.count+1
MERGE (w2:Word {name: text[i+1]})
ON CREATE SET w2.count = 1 ON MATCH SET w2.count=w2.count+1
MERGE (w1)-[r:NEXT]->(w2)
ON CREATE SET r.count = 1 ON MATCH SET r.count=r.count+1
RETURN w1,r, w2
Second statement:
WITH split(toLower("My cat eats cat food on Mondays")," ") AS text
UNWIND range(0,size(text)-2) AS i
MERGE (w1:Word {name: text[i]})
ON CREATE SET w1.count = 1 ON MATCH SET w1.count=w1.count+1
MERGE (w2:Word {name: text[i+1]})
ON CREATE SET w2.count = 1 ON MATCH SET w2.count=w2.count+1
MERGE (w1)-[r:NEXT]->(w2)
ON CREATE SET r.count = 1 ON MATCH SET r.count=r.count+1
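To see what these two statements leave in the graph, a query along these lines can help. It is only a sketch, and the aliases `from_word`, `to_word` and `times_seen` are made up for this illustration. Because MERGE reuses an existing NEXT relationship instead of creating a second one, each word pair should appear as a single row with its accumulated count:

```cypher
// List every stored word pair with the number of times it was merged.
// from_word, to_word and times_seen are illustrative aliases only.
MATCH (a:Word)-[r:NEXT]->(b:Word)
RETURN a.name AS from_word, b.name AS to_word, r.count AS times_seen
ORDER BY times_seen DESC, from_word, to_word
```

Assuming only the two sentences above were loaded into an empty database, the pairs my -> cat and cat -> eats should show a count of 2 and every other pair a count of 1.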
09-09-2018 08:38 AM
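You're both right and wrong.
Because the data was loaded with MERGE, every word pair is stored as a single NEXT relationship carrying a count property, so each three-word path matches exactly once and COUNT is always 1. What you need to do is take either the word or the relationship frequencies into account and sum them up.
e.g.
MATCH p = (w1:Word)-[r:NEXT]->(w2:Word)-[r2:NEXT]->(w3:Word)
RETURN [w1.name,w2.name,w3.name] AS word_pair,
sum(r.count + r2.count) AS frequency
ORDER BY frequency DESC LIMIT 50
Note that this sum is a ranking score, not an exact count of how often the three words occurred together. If you want, you can additionally sum up the counts of the words, or use a proper formula for computing a relevance score.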
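As a rough sketch of that last suggestion, the query below reports the relationship score and the summed word counts side by side. The aliases `word_triple`, `relationship_score`, `word_score` and `max_occurrences` are made up for this illustration and are not part of the original answer. Since every occurrence of a three-word sequence increments both NEXT relationships, the smaller of the two relationship counts is an upper bound on how often the sequence can have appeared, so it is included as a sanity check:

```cypher
// Rank three-word paths by the summed relationship counts,
// using the summed word counts as a tie-breaker.
// All aliases below are illustrative, not from the original answer.
MATCH (w1:Word)-[r:NEXT]->(w2:Word)-[r2:NEXT]->(w3:Word)
RETURN [w1.name, w2.name, w3.name] AS word_triple,
       r.count + r2.count AS relationship_score,
       w1.count + w2.count + w3.count AS word_score,
       // The sequence cannot have occurred more often than its rarest pair.
       CASE WHEN r.count < r2.count THEN r.count ELSE r2.count END AS max_occurrences
ORDER BY relationship_score DESC, word_score DESC
LIMIT 50
```

For four-word combinations the same idea should carry over by adding another `-[r3:NEXT]->(w4:Word)` hop to the pattern and including `r3.count` in the score. Keep in mind that the load statements above increment a word's count once when it is matched as w1 and once as w2, so words in the middle of a sentence are counted twice per occurrence and `word_score` is only a relative measure.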
09-11-2018 02:28 AM
Thanks, that's really helpful!