02-21-2020 01:15 PM
Hello everyone, I'm pretty new to Neo4j. Thank you in advance for any help!
Let me try to explain my problem here:
I have stories that have many tags; each tag has one element (an element might be a company, a location, ...).
Then I have newsletters: each newsletter has many interests, and every interest has one or more (has_many) elements.
On the Element model, the other ends of both relationships (towards the interests and towards the tags) are mapped as well.
To build a newsletter with simple interests (simple meaning each interest has only one element), performance is really fast right now and the query is pretty easy:
newsletter.interests.elements.tags.story.as(:s).where('s.published_date > ?', Date.yesterday)
This works fine and fast (it maps to the graph query below):
Gbrief#interests#elements#tags#story
MATCH (gbrief363386)
WHERE (ID(gbrief363386) = {ID_gbrief363386})
MATCH (gbrief363386)-[rel1:`interested_in`]->(node3:`Ginterest`)
MATCH (node3)-[rel2:`interested`]->(node4:`Gelement`)
MATCH (node4)<-[rel3:`element_tag`]-(node5:`GstoryTag`)
MATCH (node5)-[rel4:`storytag`]->(s:`Gstory`)
WHERE (s.published_date > {question_mark_param})
RETURN s | {:question_mark_param=>Tue, 12 Nov 2019 21:14:45 UTC +00:00, :ID_gbrief363386=>363386}
But that is not good enough: if an interest has more than one element, a story should match only when all of the interest's elements are among the story's tagged elements.
For example, say one interest has elements A and C:
storyA is tagged with A and B
storyB is tagged with A, B and C
Only storyB should match that interest.
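The matching rule described above is a subset test: an interest matches a story when every element of the interest is among the story's tagged elements. A minimal plain-Ruby sketch of that rule, independent of Neo4j, using the storyA/storyB example (the `interest_matches?` helper is hypothetical, not part of neo4j.rb):

```ruby
require 'set'

# Hypothetical helper: true when every element of the interest
# also appears among the story's tagged elements (a subset test).
def interest_matches?(interest_elements, story_elements)
  interest_elements.to_set.subset?(story_elements.to_set)
end

interest = %w[A C]
story_a  = %w[A B]     # tagged with A and B
story_b  = %w[A B C]   # tagged with A, B and C

puts interest_matches?(interest, story_a)  # => false
puts interest_matches?(interest, story_b)  # => true
```

Extra tags on a story (B here) do not matter; only missing interest elements do.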
I'm using this code:
newsletter.interests.query_as(:int)
  .match('(int)-[intelem:`interested`]->(elem:`Gelement`)')
  .with('int, collect(distinct elem) as ielements')
  .match('(elem)<-[rel3:`element_tag`]-(node5:`GstoryTag`), (node5)-[rel4:`storytag`]->(s:`Gstory`), (s)<-[rel5:`storytag`]-(node7:`GstoryTag`), (node7)-[rel6:`element_tag`]->(selem:`Gelement`)')
  .with('ielements, s, collect(distinct selem) as selements')
  .where('all(e in ielements where e in selements)')
  .where('s.published_date > ?', dt)
  .pluck(:s)
(it maps to the graph query below)
Gbrief#interests
MATCH (gbrief363386)
WHERE (ID(gbrief363386) = {ID_gbrief363386})
MATCH
(gbrief363386)-[rel1:`interested_in`]->(int:`Ginterest`),
(int)-[intelem:`interested`]->(elem:`Gelement`)
WITH int,collect(distinct elem) as ielements
MATCH
(elem)<-[rel3:`element_tag`]-(node5:`GstoryTag`),
(node5)-[rel4:`storytag`]->(s:`Gstory`),
(s)<-[rel5:`storytag`]-(node7:`GstoryTag`),
(node7)-[rel6:`element_tag`]->(selem:`Gelement`)
WITH ielements, s, collect(distinct selem) as selements
WHERE
(all(e in ielements where e in selements)) AND
(s.published_date > {question_mark_param})
RETURN s | {:question_mark_param=>Tue, 12 Nov 2019 21:14:45 UTC +00:00, :ID_gbrief363386=>363386}
This is really, really slow.
Any ideas on how I can improve this? Changing either the mapping or the query is fine.
Thank you very much!
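One detail in the second query is worth noting: in Cypher, `WITH` keeps only the variables it lists, so after `WITH int,collect(distinct elem) as ielements` the name `elem` is no longer bound. The following `MATCH (elem)<-[rel3:`element_tag`]-...` therefore introduces a fresh `elem` that matches every tagged element in the graph, so every tagged story is paired with every interest row before the `all(...)` filter runs, which is a likely contributor to the slowness. One alternative is to start from the interest's own elements and intersect the sets of stories tagged with each of them. The plain-Ruby sketch below illustrates that idea on an in-memory index; `stories_by_element` and `matching_stories` are hypothetical stand-ins for graph lookups, not neo4j.rb API.

```ruby
require 'set'

# Hypothetical in-memory index: element => set of stories tagged with it.
# In the graph this corresponds to (Element)<-[:element_tag]-(Tag)-[:storytag]->(Story).
stories_by_element = {
  'A' => Set['storyA', 'storyB'],
  'B' => Set['storyA', 'storyB'],
  'C' => Set['storyB']
}

# Hypothetical helper: stories tagged with ALL of the interest's elements.
# It only looks at stories reachable from the interest's own elements,
# so unrelated stories are never examined.
def matching_stories(interest_elements, stories_by_element)
  sets = interest_elements.map { |e| stories_by_element.fetch(e, Set.new) }
  return Set.new if sets.empty?
  # Start from the smallest set so intermediate results stay small.
  sets.sort_by(&:size).reduce { |acc, s| acc & s }
end

p matching_stories(%w[A C], stories_by_element).to_a  # => ["storyB"]
```

The same "anchor on the interest's elements first" idea is what a rewritten Cypher query would aim for, instead of collecting the element list of every story in the graph.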
02-21-2020 02:44 PM
Hey @urubatan,
Welcome to the community.
As it relates to your issue, I am trying to make sense of the graph structure that you currently have.
**(Newsletter)-[:INTERESTED_IN]->(Interest)**
**(Interest)-[:HAS]->(Element)**
**(Element)-[:HAS]->(Tag)**
**(Story)-[:HAS]->(Tag)**
Is that about right?
I guess without some real data, it is a bit challenging to know the relevance of each node label.
For example, I understand what each of the node labels are for, except for the Element node label. It seems to me that you might be able to get by without Element nodes in your graph. Again, without understanding your data a bit better, that is one thing that stands out for me.
Is it possible to get a sample of your data (screenshot, etc.) to provide some better direction?
Let us know.
-yyyguy
02-21-2020 03:01 PM
My graph is:
**(Newsletter)-[:INTERESTED_IN]->(Interest)**
**(Interest)-[:INTERESTED]->(Element)** # because one interest might have one or more elements; for example, I might be interested in all stories about Apple, or only in stories about Apple that happened in China
**(Story)-[:TAGGED_WITH]->(Tag)**
**(Tag)-[:HAS_ONE]->(Element)**
**(Element)<-[:INTERESTED]-(Interest)**
**(Element)<-[:HAS_MANY]-(Tag)** # the same Tag-to-Element relationship, seen from the Element side
Interest has a name property, which is used as a label in the newsletter.
For example:
Newsletter#1
Interest{name: 'Local News'} -[:INTERESTED]->(Element {name: 'Brazil'})
Interest{name: 'Mercosul Tech'} -[:INTERESTED]->(Element {name: 'Latin America'}, Element{name: 'Technology'})
Story#1
Tag-[]->(Element {name: 'Uruguai'})
Story#2
Tag-[]->(Element {name: 'Technology'})
Story#3
Tag-[]->(Element {name: 'Technology'})
Tag-[]->(Element {name: 'Brazil'})
Story#4
Tag-[]->(Element {name: 'Technology'})
Tag-[]->(Element {name: 'Latin America'})
Tag-[]->(Element {name: 'Uruguai'})
Story#3 will be in the "Local News" section of the newsletter
Story#4 will be in the "Mercosul Tech" section
Stories 1 and 2 will not be in the newsletter
This is a very simple example, but I think it is enough to show the problem.
Thank you very much for any help on this!
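The expected sections for this example can be checked with a small plain-Ruby sketch that applies the all-elements rule per interest. The hash layout and the `sections_for` helper are hypothetical; the interest names, element names and stories are taken from the example above.

```ruby
require 'set'

# Newsletter#1: interest name => elements that must ALL be present on a story.
interests = {
  'Local News'    => Set['Brazil'],
  'Mercosul Tech' => Set['Latin America', 'Technology']
}

# Story => elements it is tagged with (through its tags).
stories = {
  'Story#1' => Set['Uruguai'],
  'Story#2' => Set['Technology'],
  'Story#3' => Set['Technology', 'Brazil'],
  'Story#4' => Set['Technology', 'Latin America', 'Uruguai']
}

# Hypothetical helper: section name => stories whose elements include
# every element of that section's interest.
def sections_for(interests, stories)
  interests.transform_values do |required|
    stories.select { |_, tagged| required.subset?(tagged) }.keys
  end
end

sections = sections_for(interests, stories)
sections.each { |name, ids| puts "#{name}: #{ids.join(', ')}" }
# Local News: Story#3
# Mercosul Tech: Story#4
```

Story#1 and Story#2 land in no section, matching the expected result described above.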
02-21-2020 03:20 PM
Thanks for the example you provided. It appears that an Element node holds information relevant to the associated tag. Is this accurate?
It does seem to me that you are using relationship names to indicate the cardinality between nodes. Neo4j does not need or use cardinality on relationships, so I would not recommend naming any relationship to indicate cardinality. A relationship type should express the intent of the relationship, not whether it is one-to-one, one-to-many or many-to-one. You can easily find out by querying whether a node has one or multiple relationships of a given type.
If I understand what a tag is intended for, it seems like you could collapse the Element name into the Tag node.
Let me know if any of this makes sense to you.
Cheers,
-yyyguy
02-21-2020 04:33 PM
I can remove the tag and associate the story directly with the element.
Thanks for pointing that out; I'll update the model.
I was only using the cardinality to explain the associations. My actual problem is still how to match all elements/tags when an interest has more than one element.
Any help is welcome.
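With the Tag node removed and stories linked directly to elements, one common way to express "all elements must match" is counting: walk from each of the interest's elements to its stories, count how many distinct interest elements each story was reached from, and keep only the stories whose count equals the number of elements on the interest. In Cypher this usually takes the shape of comparing a `count(DISTINCT e)` per (interest, story) pair with the interest's element count. Below is a minimal plain-Ruby sketch of the counting logic, assuming a hypothetical direct element-to-story mapping built from the earlier Newsletter#1 example; `stories_matching_all` is an illustrative name, not neo4j.rb code.

```ruby
require 'set'

# Hypothetical direct mapping after removing Tag: element => stories linked to it.
stories_by_element = {
  'Brazil'        => ['Story#3'],
  'Technology'    => ['Story#2', 'Story#3', 'Story#4'],
  'Latin America' => ['Story#4'],
  'Uruguai'       => ['Story#1', 'Story#4']
}

# Hypothetical helper: starting from the interest's elements, record which
# distinct elements reach each story, then keep stories reached by all of them.
def stories_matching_all(interest_elements, stories_by_element)
  required = interest_elements.uniq
  hits = Hash.new { |h, k| h[k] = Set.new }
  required.each do |element|
    stories_by_element.fetch(element, []).each { |story| hits[story] << element }
  end
  hits.select { |_, matched| matched.size == required.size }.keys
end

p stories_matching_all(['Latin America', 'Technology'], stories_by_element)  # => ["Story#4"]
p stories_matching_all(['Brazil'], stories_by_element)                       # => ["Story#3"]
```

Only stories sharing at least one element with the interest are ever counted, which keeps the work proportional to the interest's neighbourhood rather than to the whole graph.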