Neo4j

jamesolinhoffma · ‎06-02-2021

Hi, I'm trying to build v1 of a query (pagination and other stuff will come later) that will generate a social media timeline for a project I'm working on.

Here is the query I currently have. Right now, I'm able to get the most recent user that liked the site. However, like the title says, I want the most recent 3 users.

I'm having trouble wrapping my head around how collect/unwind work. Specifically, how does the apoc.agg.maxItems know to only look at the LIKED_SITE relationships that are tied to the current site it's looking at? If I do a collect on the ls inside that WITH, how does it know to group them based off the site they're pointing to? I thought I understood how this worked, but as soon as I UNWIND the collections to perform operations on them, I can't collect them back into lists for each site.

Sorry if this is confusing... Just having a hard time wrapping my head around how neo4j/cypher work. Would really appreciate just any general advice as well.

MATCH (u:User {id: $cypherParams.userId})-[:IS]->(pu:PublicUser)-[:FOLLOWS]->(following_pu:PublicUser)-[:HAD_SESSION]->(sesh:Session)-[ls:LIKED_SITE]->(site:Site)
        WITH
            site, 
            duration.inSeconds(max(ls.lastLikedOn), datetime()) AS dur, 
            max(ls.lastLikedOn) AS lastLikedOn,
            apoc.agg.maxItems(ls, ls.lastLikedOn, 1) as most_recent_LIKED_SITE
        MATCH (pu:PublicUser)-[:HAD_SESSION]->(:Session)-[:LIKED_SITE {id: most_recent_LIKED_SITE.items[0].id}]->(:Site) 
        RETURN {
            url: site.url,
            siteName: site.siteName,
            title: site.title,
            description: site.description,
            image: site.image,
            lastLikedBy: pu.username,
            lastLikedOn: lastLikedOn,
            secondsSinceLastLike: dur.seconds
        } ORDER BY dur

PS: This will generate the same dataset that I have

                CREATE (pu1:PublicUser) set pu1.username = "1"
                CREATE (u1)-[:IS]->(pu1)
CREATE (u2:User) SET u2.username = "2", u2.id = randomUUID()
                CREATE (pu2:PublicUser) set pu2.username = "2"
                CREATE (u2)-[:IS]->(pu2)
CREATE (u3:User) SET u3.username = "3", u2.id = randomUUID()
                CREATE (pu3:PublicUser) set pu3.username = "3"
                CREATE (u3)-[:IS]->(pu3)
CREATE (u4:User) SET u4.username = "4", u4.id = randomUUID()
                CREATE (pu4:PublicUser) set pu4.username = "4"
                CREATE (u4)-[:IS]->(pu4)
CREATE (u5:User) SET u5.username = "5", u5.id = randomUUID()
                CREATE (pu5:PublicUser) set pu5.username = "5"
                CREATE (u5)-[:IS]->(pu5)
CREATE (jim:User) SET jim.username = "jim", jim.id = randomUUID()
                CREATE (pujim:PublicUser) set pujim.username = "jim"
                CREATE (jim)-[:IS]->(pujim)
with u5 as poop

MATCH (pu:PublicUser {username: "1"})
        MERGE (pu)-[:HAD_SESSION]->(sesh:Session {date: date({year: 2017, month: 11, day: 11})})
        MERGE (site:Site {url: "twitter.com"})
        MERGE (sesh)-[r:LIKED_SITE]->(site)
            ON CREATE
                SET 
		r.id = randomUUID(),
                    r.timesLiked = 1,
                    r.lastLikedOn = datetime({year:2017, month:11, day:11, hour:12, minute:31, second:14, nanosecond: 645876123, timezone: '+01:00'})
            ON MATCH
                SET 
                    r.timesLiked = r.timesLiked + 1,
                    r.lastLikedOn = datetime()

WITH pu as poop

MATCH (pu:PublicUser {username: "2"})
        MERGE (pu)-[:HAD_SESSION]->(sesh:Session {date: date({year: 2017, month: 11, day: 11})})
        MERGE (site:Site {url: "twitter.com"})
        MERGE (sesh)-[r:LIKED_SITE]->(site)
            ON CREATE
                SET 
		r.id = randomUUID(),
                    r.timesLiked = 1,
                    r.lastLikedOn = datetime({year:2017, month:11, day:11, hour:13, minute:31, second:14, nanosecond: 645876123, timezone: '+01:00'})
            ON MATCH
                SET 
                    r.timesLiked = r.timesLiked + 1,
                    r.lastLikedOn = datetime()
WITH pu as poop

MATCH (pu:PublicUser {username: "3"})
        MERGE (pu)-[:HAD_SESSION]->(sesh:Session {date: date({year: 2017, month: 11, day: 11})})
        MERGE (site:Site {url: "twitter.com"})
        MERGE (sesh)-[r:LIKED_SITE]->(site)
            ON CREATE
                SET 
		r.id = randomUUID(),
                    r.timesLiked = 1,
                    r.lastLikedOn = datetime({year:2017, month:11, day:11, hour:14, minute:31, second:14, nanosecond: 645876123, timezone: '+01:00'})
            ON MATCH
                SET 
                    r.timesLiked = r.timesLiked + 1,
                    r.lastLikedOn = datetime()

WITH pu as poop
MATCH (pu:PublicUser {username: "3"})
        MERGE (pu)-[:HAD_SESSION]->(sesh:Session {date: date({year: 2017, month: 11, day: 11})})
        MERGE (site:Site {url: "nastyboys.com"})
        MERGE (sesh)-[r:LIKED_SITE]->(site)
            ON CREATE
                SET 
		r.id = randomUUID(),
                    r.timesLiked = 1,
                    r.lastLikedOn = datetime({year:2017, month:11, day:11, hour:14, minute:51, second:14, nanosecond: 645876123, timezone: '+01:00'})
            ON MATCH
                SET 
                    r.timesLiked = r.timesLiked + 1,
                    r.lastLikedOn = datetime()

WITH pu as poop

MATCH (pu:PublicUser {username: "4"})
        MERGE (pu)-[:HAD_SESSION]->(sesh:Session {date: date({year: 2017, month: 11, day: 11})})
        MERGE (site:Site {url: "twitter.com"})
        MERGE (sesh)-[r:LIKED_SITE]->(site)
            ON CREATE
                SET 
		r.id = randomUUID(),
                    r.timesLiked = 1,
                    r.lastLikedOn = datetime({year:2017, month:11, day:11, hour:15, minute:31, second:14, nanosecond: 645876123, timezone: '+01:00'})
            ON MATCH
                SET 
                    r.timesLiked = r.timesLiked + 1,
                    r.lastLikedOn = datetime()

WITH pu as poop
MATCH (pu:PublicUser {username: "4"})
        MERGE (pu)-[:HAD_SESSION]->(sesh:Session {date: date({year: 2017, month: 11, day: 11})})
        MERGE (site:Site {url: "pornhub.com"})
        MERGE (sesh)-[r:LIKED_SITE]->(site)
            ON CREATE
                SET 
		r.id = randomUUID(),
                    r.timesLiked = 1,
                    r.lastLikedOn = datetime({year:2017, month:11, day:11, hour:16, minute:31, second:14, nanosecond: 645876123, timezone: '+01:00'})
            ON MATCH
                SET 
                    r.timesLiked = r.timesLiked + 1,
                    r.lastLikedOn = datetime()
with pu as poop
MATCH (pu:PublicUser {username: "5"})
        MERGE (pu)-[:HAD_SESSION]->(sesh:Session {date: date({year: 2017, month: 11, day: 11})})
        MERGE (site:Site {url: "twitter.com"})
        MERGE (sesh)-[r:LIKED_SITE]->(site)
            ON CREATE
                SET 
		   r.id = randomUUID(),
                    r.timesLiked = 1,
                    r.lastLikedOn = datetime({year:2017, month:11, day:11, hour:11, minute:31, second:14, nanosecond: 645876123, timezone: '+01:00'})
            ON MATCH
                SET 
                    r.timesLiked = r.timesLiked + 1,
                    r.lastLikedOn = datetime()

return  pu as poop

MATCH (pu:PublicUser {username: "jim"}) 
WITH pu
MATCH(u1:PublicUser{ username: "1"})
MERGE (pu)-[:FOLLOWS]->(u1)
WITH pu
MATCH(u2:PublicUser{ username: "2"})
MERGE (pu)-[:FOLLOWS]->(u2)
WITH pu
MATCH(u3:PublicUser{ username: "3"})
MERGE (pu)-[:FOLLOWS]->(u3)
WITH pu
MATCH(u4:PublicUser{ username: "4"})
MERGE (pu)-[:FOLLOWS]->(u4)
WITH pu
MATCH(u5:PublicUser{ username: "5"})
MERGE (pu)-[:FOLLOWS]->(u5)
WITH pu
return pu

type or paste code here

andrew_bowman · ‎06-02-2021

I can help clarify some things here.

Aggregations in Cypher use the non-aggregation terms as their grouping key, which provides the context for the aggregation (there are some exceptions to this, but they don't apply here). This applies to both built-in aggregation functions as well as custom ones, such as apoc.agg.maxItems().

Expressions that are based on an aggregation are tied to the aggregation, but to avoid more complex situations it's best to keep all elements of an aggregation as simple as possible.

In your WITH clause, the only non-aggregation term is site (since the expression for dur builds off the max() aggregation term), so site becomes the grouping key: per distinct site, the aggregations will be applied.

In order to get the last 3 who liked each site, we can no longer use the max() or maxItems() aggregations, since those will only get the last liked relationship. Instead, we need to use ORDER BY to get the last 3 :LIKED_SITE relationships based on the lastLikedOn relationship, and then based on the limited result set, get the rest of what we want.

While there are several ways to do this, here's one approach:

MATCH (u:User {id: $cypherParams.userId})-[:IS]->(pu:PublicUser)-[:FOLLOWS]->(following_pu:PublicUser)-[:HAD_SESSION]->(sesh:Session)-[ls:LIKED_SITE]->(site:Site)

WITH site, ls
ORDER BY ls.lastLikedOn DESC
WITH site, collect(ls)[..3] as last3
CALL {
 WITH last3
 UNWIND last3 as liked
 MATCH (pu:PublicUser)-[:HAD_SESSION]->(:Session)-[liked]->()
 WITH pu, pu.username as lastLikedBy, liked.lastLikedOn as lastLikedOn, duration.inSeconds(liked.lastLikedOn, dateTime()).seconds as secondsSinceLastLike
 ORDER BY secondsSinceLastLike
 RETURN collect(pu {lastLikedBy, lastLikedOn, secondsSinceLastLike}) as last3Likes
}
RETURN site {.url, .siteName, .title, .description, .image, last3Likes}  as siteInfo

We use ORDER BY followed by a collect() with a slice of that list to get the last 3 liked relationships per site.

We then use a subquery (executed per site) to work with each site's top 3 results, MATCHing back to the user who originated the like, and collecting the relevant info into a list (since the subquery is executing per site, that forms the scope, and this ensures that the collect() is with respect to the site, even if there is no grouping key explicitly present).

We then use map projection to get the properties we want from the site, including the list of the last 3 likes.

View solution in original post

andrew_bowman · ‎06-02-2021

I can help clarify some things here.

Aggregations in Cypher use the non-aggregation terms as their grouping key, which provides the context for the aggregation (there are some exceptions to this, but they don't apply here). This applies to both built-in aggregation functions as well as custom ones, such as apoc.agg.maxItems().

Expressions that are based on an aggregation are tied to the aggregation, but to avoid more complex situations it's best to keep all elements of an aggregation as simple as possible.

In your WITH clause, the only non-aggregation term is site (since the expression for dur builds off the max() aggregation term), so site becomes the grouping key: per distinct site, the aggregations will be applied.

In order to get the last 3 who liked each site, we can no longer use the max() or maxItems() aggregations, since those will only get the last liked relationship. Instead, we need to use ORDER BY to get the last 3 :LIKED_SITE relationships based on the lastLikedOn relationship, and then based on the limited result set, get the rest of what we want.

While there are several ways to do this, here's one approach:

MATCH (u:User {id: $cypherParams.userId})-[:IS]->(pu:PublicUser)-[:FOLLOWS]->(following_pu:PublicUser)-[:HAD_SESSION]->(sesh:Session)-[ls:LIKED_SITE]->(site:Site)

WITH site, ls
ORDER BY ls.lastLikedOn DESC
WITH site, collect(ls)[..3] as last3
CALL {
 WITH last3
 UNWIND last3 as liked
 MATCH (pu:PublicUser)-[:HAD_SESSION]->(:Session)-[liked]->()
 WITH pu, pu.username as lastLikedBy, liked.lastLikedOn as lastLikedOn, duration.inSeconds(liked.lastLikedOn, dateTime()).seconds as secondsSinceLastLike
 ORDER BY secondsSinceLastLike
 RETURN collect(pu {lastLikedBy, lastLikedOn, secondsSinceLastLike}) as last3Likes
}
RETURN site {.url, .siteName, .title, .description, .image, last3Likes}  as siteInfo

We use ORDER BY followed by a collect() with a slice of that list to get the last 3 liked relationships per site.

We then use a subquery (executed per site) to work with each site's top 3 results, MATCHing back to the user who originated the like, and collecting the relevant info into a list (since the subquery is executing per site, that forms the scope, and this ensures that the collect() is with respect to the site, even if there is no grouping key explicitly present).

We then use map projection to get the properties we want from the site, including the list of the last 3 likes.

jamesolinhoffma · ‎06-02-2021

Andrew, thank you so much! I have a much better understanding of how this all works now. Using a subquery was the final piece of the puzzle.

Thanks for taking the time to explain how the grouping keys work with aggregations.

EDIT: just realized some of the stuff I put in that dummy dataset....

jamesolinhoffma · ‎06-02-2021

1 more question:

Would you mind explaining (or linking somewhere) how the "collect(pu {lastLikedBy, lastLikedOn, secondsSinceLastLike})" works? I don't think I've seen that syntax before. Specifically, I'm not sure what the pu does (although I do know that taking it out seems to break it, haha)

andrew_bowman · ‎06-02-2021

Sure, so this is a neat little feature called map projection, which lets us more easily select the properties we want for our map. When the map is based upon another map, or upon a node's properties, we can refer to existing properties and it will do the work of using those properties for the key and value. If we include a variable, then the variable name will become the key and the variable's value becomes the value. We can also add our own custom key/values the way we can with map literals.

Documentation here:

jamesolinhoffma · ‎06-02-2021

That is neat. Thank you!

Neo4j

Return the last 3 users you are following who liked a site