Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
02-07-2020 06:49 AM
I have a schema representing users that make comments on websites:
(:Website)<-[:POSTED_IN]-(:Comment)<-[:POSTED]-(:User)
I would like to count, for each pair of websites, how many distinct users commented on both sites. I tried this:
Match (w1:Website)<-[:POSTED_IN]-(:Comment)<-[:POSTED]-(u:User)-[:POSTED]->(:Comment)-[:POSTED_IN]->(w2:Website)
RETURN w1.name, w2.name, Count(Distinct u) as weight
There are around 6000 Website nodes in my db, so around 36 million rows need to be returned, and this seems to cause the db to crash.
One workaround I tried was to do a single query for each Website in the db:
for w in websites:
q = '''Match (w1:Website {name:$w})<-[:POSTED_IN]-(:Comment)<-[:POSTED]-(u:User)-[:POSTED]->(:Comment)-[:POSTED_IN]->(w2:Website)
RETURN w1.name, w2.name, Count(Distinct u) as weight'''
tx.run(q, w=w)
This gets the job done, (I think), but is till very slow.
Is there a better way to do this?
02-07-2020 08:41 AM
The first issue with your query is that for every user who commented on two websites, you get repetitive rows like this:
user1, website1, website2
user1, website2, website1
you can fix that by adding a WHERE
condition like this:
WHERE id(w1)>id(w2)
02-10-2020 12:26 PM
About how many comments does a user typically make per website? Unless the typical user only leaves a single comment per website, you may need to remodel this and your query for better performance.
If a user posts 20 comments on a website, in your query this will multiply out 20 times however many comments they posted on each other site, in terms of paths explored and rows generated that need to be processed with DISTINCT.
If you created and maintained these distinct paths: (:Website)<-[:POSTED_AT]-(:User)
, such that only one such relationship can exist between a user and a website, that should prevent those multiplicative path explorations and may increase your query efficiency.
All the sessions of the conference are now available online