Neo4j

urubatan · 02-21-2020

Hello everyone, I'm pretty new to Neo4j, so thank you in advance for any help.
Let me try to explain my problem here:
I have stories that have many tags, and each tag has one element (an element might be a company, a location, ...).
Then I have newsletters: each newsletter has many interests, and every interest has one or more (has_many) elements.
The Element model also maps the other end of the relationships to newsletters and tags.

To build a newsletter with simple interests (simple meaning each interest has only one element), the query is pretty easy and the performance right now is really fast. This works fine and fast:

```ruby
newsletter.interests.elements.tags.story.as(:s).where('s.published_date > ?', Date.yesterday)
```

(it maps to the graph query below)

```text
Gbrief#interests#elements#tags#story
  MATCH (gbrief363386)
  WHERE (ID(gbrief363386) = {ID_gbrief363386})
  MATCH (gbrief363386)-[rel1:`interested_in`]->(node3:`Ginterest`)
  MATCH (node3)-[rel2:`interested`]->(node4:`Gelement`)
  MATCH (node4)<-[rel3:`element_tag`]-(node5:`GstoryTag`)
  MATCH (node5)-[rel4:`storytag`]->(s:`Gstory`)
  WHERE (s.published_date > {question_mark_param})
  RETURN s | {:question_mark_param=>Tue, 12 Nov 2019 21:14:45 UTC +00:00, :ID_gbrief363386=>363386}
```

But that is not good enough: if an interest has more than one element, all of its elements must match the tagged elements for a story to match.

For example, one interest has elements A and C:
storyA is tagged with A and B
storyB is tagged with A, B and C
only storyB should match that interest
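To make the "all elements must match" rule concrete, here is a minimal in-memory sketch in plain Ruby (standard `set` library only, not the neo4j gem; `matches?` is a hypothetical helper). An interest matches a story only when the interest's element set is a subset of the story's tagged elements:

```ruby
require "set"

# Hypothetical helper: an interest matches a story only if EVERY element of
# the interest is among the story's tagged elements (a subset test),
# not merely one of them.
def matches?(interest_elements, story_elements)
  interest_elements.to_set.subset?(story_elements.to_set)
end

interest = %w[A C]
stories = {
  "storyA" => %w[A B],   # has A but not C, so no match
  "storyB" => %w[A B C]  # has both A and C, so it matches
}

matching = stories.select { |_name, elems| matches?(interest, elems) }.keys
puts matching.inspect # => ["storyB"]
```

A single-element interest is just the special case where the subset has one member, which is why the simple query above already behaves correctly for it.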

And I'm using this code:

```ruby
newsletter.interests.query_as(:int)
  .match('(int)-[intelem:interested]->(elem:Gelement)')
  .with('int,collect(distinct elem) as ielements')
  .match('(elem)<-[rel3:element_tag]-(node5:GstoryTag),(node5)-[rel4:storytag]->(s:Gstory),(s)<-[rel5:storytag]-(node7:GstoryTag),(node7)-[rel6:element_tag]->(selem:Gelement)')
  .with('ielements, s, collect(distinct selem) as selements')
  .where('all(e in ielements where e in selements)')
  .where('s.published_date > ?', dt)
  .pluck(:s)
```

(it maps to the graph query below)

```text
Gbrief#interests 
  MATCH (gbrief363386)
  WHERE (ID(gbrief363386) = {ID_gbrief363386})
  MATCH 
    (gbrief363386)-[rel1:`interested_in`]->(int:`Ginterest`), 
    (int)-[intelem:`interested`]->(elem:`Gelement`)
  WITH int,collect(distinct elem) as ielements
  MATCH (elem)<-[rel3:`element_tag`]-(node5:`GstoryTag`),(node5)-[rel4:`storytag`]->(s:`Gstory`),(s)<-[rel5:`storytag`]-(node7:`GstoryTag`),(node7)-[rel6:`element_tag`]->(selem:`Gelement`)
  WITH ielements, s, collect(distinct selem) as selements
  WHERE 
    (all(e in ielements where e in selements)) AND 
    (s.published_date > {question_mark_param})
  RETURN s | {:question_mark_param=>Tue, 12 Nov 2019 21:14:45 UTC +00:00, :ID_gbrief363386=>363386}
```

And this is really, really slow.
Any ideas on how I can improve this? Changing either the mapping or the query is an option.

Thank you very much
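One likely contributor to the slowness, going by Cypher's scoping rules rather than any profiling here: after `WITH int,collect(distinct elem) as ielements`, the variable `elem` is no longer in scope, so `(elem)` in the following `MATCH` is a fresh, unbound node. That pattern then matches every element with an `element_tag` relationship, so every tagged story in the graph is expanded and has its elements collected, once per interest, and `s.published_date > ?` is only checked after that aggregation. A cheaper strategy is to start from the interest's own elements and intersect the sets of stories reachable from each one. The sketch below shows that idea in memory over a hypothetical `stories_by_element` index (element name to story ids); it illustrates the algorithm only and is not neo4j gem code:

```ruby
require "set"

# Hypothetical in-memory inverted index: element name -> ids of stories tagged with it.
# In the graph this corresponds to walking Element <- Tag <- Story from one element.
stories_by_element = {
  "A" => Set["storyA", "storyB"],
  "B" => Set["storyA", "storyB"],
  "C" => Set["storyB"]
}

# Stories matching ALL elements of an interest: intersect the per-element story sets.
# Processing the smallest set first keeps the intermediate result small.
def stories_for_interest(interest_elements, index)
  sets = interest_elements.map { |e| index.fetch(e, Set.new) }
  return Set.new if sets.empty?

  sets.sort_by(&:size).reduce { |acc, s| acc & s }
end

result = stories_for_interest(%w[A C], stories_by_element)
puts result.to_a.inspect # => ["storyB"]
```

The work here is bounded by the stories attached to the interest's own elements, instead of by every story in the database.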

yyyguy · 02-21-2020

Hey @urubatan,

Welcome to the community.

As it relates to your issue, I am trying to make sense of the graph structure that you currently have.

**(Newsletter)-[:INTERESTED_IN]->(Interest)**
**(Interest)-[:HAS]->(Element)**
**(Element)-[:HAS]->(Tag)**
**(Story)-[:HAS]->(Tag)**

Is that about right?

I guess without some real data, it is a bit challenging to know the relevance of each node label.

For example, I understand what each of the node labels are for, except for the Element node label. It seems to me that you might be able to get by without Element nodes in your graph. Again, without understanding your data a bit better, that is one thing that stands out for me.

Is it possible to get a sample of your data (screenshot, etc.) to provide some better direction?

Let us know.

-yyyguy

urubatan · 02-21-2020

My graph is:

**(Newsletter)-[:INTERESTED_IN]->(Interest)**
**(Interest)-[:INTERESTED]->(Element)** # because an interest can have one or more elements; for example, I might be interested in all stories about Apple, or only in stories about Apple that happened in China
**(Story)-[:TAGGED_WITH]->(Tag)**
**(Tag)-[:HAS_ONE]->(Element)**
**(Element)<-[:INTERESTED]-(Interest)**
**(Element)<-[:HAS_MANY]-(Tag)** # (the inverse of the other relationship)

Interest has a name property that is used as a section label in the newsletter.
For example:

```text
Newsletter#1
  Interest{name: 'Local News'} -[:INTERESTED]->(Element {name: 'Brazil'})
  Interest{name: 'Mercosul Tech'} -[:INTERESTED]->(Element {name: 'Latin America'}, Element{name: 'Technology'})

Story#1
  Tag-[]->(Element {name: 'Uruguai'})
Story#2
  Tag-[]->(Element {name: 'Technology'})
Story#3
  Tag-[]->(Element {name: 'Technology'})
  Tag-[]->(Element {name: 'Brazil'})
Story#4
  Tag-[]->(Element {name: 'Technology'})
  Tag-[]->(Element {name: 'Latin America'})
  Tag-[]->(Element {name: 'Uruguai'})
```

Story#3 will be in the "Local News" section of the newsletter
Story#4 will be in the "Mercosul Tech" section
Stories 1 and 2 will not be in the newsletter
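The expected sections can be reproduced with a small in-memory sketch (plain Ruby, no database; `build_sections` is a hypothetical helper). It uses a hit-count formulation: for each (interest, story) pair, count how many distinct interest elements the story is tagged with, and keep the story only when that count equals the interest's element count. This is roughly the shape one might try in Cypher as well (count distinct matched elements per interest and story, then compare with the interest's element count), though no such query is shown or tested here:

```ruby
# Example newsletter from the post: interest name -> required element names.
interests = {
  "Local News"    => ["Brazil"],
  "Mercosul Tech" => ["Latin America", "Technology"]
}

# Story -> elements of its tags (each tag points at exactly one element).
stories = {
  "Story#1" => ["Uruguai"],
  "Story#2" => ["Technology"],
  "Story#3" => ["Technology", "Brazil"],
  "Story#4" => ["Technology", "Latin America", "Uruguai"]
}

# Hit-count formulation: a story belongs to an interest's section only when the
# number of distinct interest elements it is tagged with equals the interest's size.
def build_sections(interests, stories)
  interests.map do |name, required|
    wanted = required.uniq
    matched = stories.select do |_story, elems|
      hits = (elems & wanted).size # distinct interest elements found on the story
      hits == wanted.size
    end
    [name, matched.keys]
  end.to_h
end

sections = build_sections(interests, stories)
sections.each { |name, ids| puts "#{name}: #{ids.join(', ')}" }
# Local News: Story#3
# Mercosul Tech: Story#4
```

Story#1 and Story#2 end up in no section, matching the expected result above.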

This is a very simple example, but I think it is enough to show the problem.

Thank you very much for any help on this

yyyguy · 02-21-2020

Thanks for the example you provided. It appears that an Element node holds information relevant to the associated tag. Is this accurate?

It does seem to me that you are using relationship types to indicate the cardinality between nodes. Neo4j does not need cardinality encoded in relationship types, and I would not recommend naming any relationship after a cardinality. A relationship type should express the intent of the relationship, not whether it is one-to-one, one-to-many, or many-to-one. You can always find out from the graph itself whether a node has one or several relationships of a given type.

It seems like you could collapse the Element name into the Tag node if I am understanding what is intended with a tag.

Let me know if any of this makes sense to you.

Cheers,

-yyyguy

urubatan · 02-21-2020

I can remove the tag and associate the story directly with the element.
Thanks for pointing that out, I'll update the model.

I only described the cardinality to explain the associations. My actual problem is still this:

How to match all elements/tags when an interest has more than one element.

Any help is welcome.


Multiple interest/tag match performance