11-30-2018 08:05 AM
Hi,
I have what should be a simple query to get a list of child nodes (and ideally grandchild nodes later) from a parent node. I have document nodes that are connected to words with a :BOW_OF relationship. The result I'm looking for is a row with a document ID and a list of the words in that document.
If I specify the document ID, it is very fast:
MATCH (word)-[r:BOW_OF]->(doc:Desc{id:'12345'}) RETURN doc.id, collect(word.word)
but if I take the id out, and add a LIMIT 1 on the end, it doesn't finish, so I think I'm doing something wrong.
MATCH (word)-[r:BOW_OF]->(doc:Desc) RETURN doc.id, collect(word.word) limit 1
What I would like to get to, without the limit, is:
MATCH (stem)-[s:STEM_OF]->(word)-[r:BOW_OF]->(doc:Desc) RETURN doc.id, collect(stem.stem)
Is there something I'm doing wrong? Thank you very much!
Oleg
11-30-2018 11:45 AM
Since you're using an aggregation (collect) grouped by doc.id, ALL results need to be expanded out first before the collect() can finish, and the LIMIT at the end is only applied after that. Is doc.id unique per :Desc node? If so, your aggregation should instead be by the doc node and not by its id property. That way, when you do the property access at the end, it happens once per node instead of once for every row in which the same node appears.
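As a rough sketch of what aggregating by the node looks like for the unrestricted query (using only the label, relationship type, and properties from your post):

```cypher
// Group rows by the doc node itself rather than by its id property.
// doc.id is then read once per document, in the final RETURN.
MATCH (word)-[:BOW_OF]->(doc:Desc)
WITH doc, collect(word.word) AS words
RETURN doc.id AS id, words
```

This still has to expand every :BOW_OF relationship, so on a large graph it will take a while; it only avoids the repeated property lookups.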
For your LIMIT 1 approach try this instead:
MATCH (doc:Desc)
WITH doc
LIMIT 1
MATCH (word)-[r:BOW_OF]->(doc)
WITH doc, collect(word.word) as words
RETURN doc.id, words
Alternatively, you could use pattern comprehension to get a list of results from a pattern:
MATCH (doc:Desc)
WITH doc
LIMIT 1
WITH doc, [(word)-[r:BOW_OF]->(doc) | word.word] as words
RETURN doc.id, words
How many :Desc nodes are in your db, and how many word and stem nodes? If the result set is huge, you may have some trouble executing this via the browser (especially if the browser is attempting to visualize it). You could try using cypher-shell instead.
For your full query, you would want to take a similar approach, but make sure to get only DISTINCT stems, as I'm guessing there are a lot of duplicates there.
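To get a feel for those sizes, something like the following counts could be run one statement at a time. This is only a sketch: the labels of the word and stem nodes aren't shown in your queries, so it counts them through their relationships instead.

```cypher
// Number of document nodes
MATCH (doc:Desc) RETURN count(doc) AS docs;

// Number of distinct word nodes attached to any document
MATCH (word)-[:BOW_OF]->(:Desc) RETURN count(DISTINCT word) AS words;

// Number of distinct stem nodes attached to any node via :STEM_OF
MATCH (stem)-[:STEM_OF]->() RETURN count(DISTINCT stem) AS stems;
```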
MATCH (stem)-[:STEM_OF]->()-[:BOW_OF]->(doc:Desc)
WITH doc, collect(DISTINCT stem) as stems
RETURN doc.id, stems
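If you only need the stem strings rather than the full stem nodes, a variant along these lines may be lighter to return. It assumes the stem property from your original query holds the string you want:

```cypher
// Collect the distinct stem strings per document instead of whole nodes.
// DISTINCT here de-duplicates by property value, not by node identity.
MATCH (stem)-[:STEM_OF]->()-[:BOW_OF]->(doc:Desc)
WITH doc, collect(DISTINCT stem.stem) AS stems
RETURN doc.id AS id, stems
```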
12-04-2018 09:10 AM
Thanks for replying! 🙂 I get it now about aggregating by node instead of property. Yes, every doc.id is unique. I have 200k doc nodes now, but eventually a few million. Each doc node can have ~100-4000 words/stems. No, there shouldn't be any duplicate words/stems, but that will be something to check.
What I'd like to do next is add the classification(s) of each document to the query, to get a result I can train on: classifications and a list of stems. Does this seem like a reasonable query to do that? I don't necessarily need to visualize it, but I'll try cypher-shell; I've just never used it before.
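For illustration, such a query might look something like the sketch below. The :CLASSIFIED_AS relationship, the :Classification label, and its name property are hypothetical placeholders, since the classification part of the model isn't shown in this thread.

```cypher
// Aggregate stems per document first, then attach classifications
// with a pattern comprehension so the two lists are not multiplied
// against each other.
// HYPOTHETICAL: :CLASSIFIED_AS, :Classification and c.name are assumed names.
MATCH (stem)-[:STEM_OF]->()-[:BOW_OF]->(doc:Desc)
WITH doc, collect(DISTINCT stem.stem) AS stems
RETURN doc.id AS id,
       [(doc)-[:CLASSIFIED_AS]->(c:Classification) | c.name] AS classifications,
       stems
```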