cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Different results when I use DISTINCT

leelandclay
Graph Buddy

Hi...I have no idea what is going on. I'm hoping someone can help. I wrote a query using the Count function so I could see when it got down to a point where I wouldn't have huge return set. After I added all of the conditions in, I ran it with RETURN DISTINCT COUNT(other) and it said I had 60 results. So, I removed the COUNT so that it was only RETURN DISTINCT other....and it returned 12.

Here's the query...Can someone spot what I'm doing wrong????

With date.truncate('day', date()- duration('P324M')) AS agemax, date.truncate('day', date()- duration('P636M')) AS agemin
Match (n:Person {userId: '7J4OwwnmQ1fMhavSLeLkDkKe9Kl2'})-[]->(ct:ConnectionType)<-[]-(other:Person)
Match (n: Person { userId: '7J4OwwnmQ1fMhavSLeLkDkKe9Kl2'})-[:ISLOOKINGFORGENDER]->(ilfGender: Gender) < -[:ISAGENDER] - (other: Person)
Match (n: Person { userId: '7J4OwwnmQ1fMhavSLeLkDkKe9Kl2'})-[:ISAGENDER]->(isaGender: Gender) < -[:ISLOOKINGFORGENDER] - (other: Person)
WHERE distance(n.location, other.location) < 1609344 AND ct.key IN [10,20,30,40,50] AND ilfGender.key IN [10,20] AND isaGender.key IN [20] AND other.dateOfBirth >= toString(agemin) AND other.dateOfBirth <= toString(agemax) 
RETURN DISTINCT COUNT(other)
1 ACCEPTED SOLUTION

Thanks for the update. Glad to help you!

View solution in original post

20 REPLIES 20

12kunal34
Graph Fellow

hey @leelandclay

could you please tell us about your scenario that what is your requirement ??
it is quite difficult to understand from your post

Sure. It's a dating and the user is able to specify various characteristics. The where clause is (hopefully) filtering down to nodes that match all of the requirements.
The query seemed to work before I added the distance filter. I initially just tacked it onto the end of the where clause in CypherClient for .Net but got an error. That's when I deconstructed a query and put it in the desktop browser to figure out. I added each entry going backwards and that's when I noticed the discrepancy.
Edit
Also, I was using the count only as a testing to see which query clause caused to most reductions so that I could put that as the first filter. If it wasn't for that, I would have never known of the difference

leelandclay
Graph Buddy

I believe I discovered the cause. Well...not the root cause...but why this might be happening.

I decided to revert the query back to it's working state and continued working. I soon noticed that I was starting to get weird results. Now, on a query that should return 9,936 entries...I got 2. I'm guessing that somehow I corrupted my data yesterday while I was trying to get the locations working...now I get to rebuild my data before I can actually test this.

Try this query:

With date.truncate('day', date()- duration('P324M')) AS agemax, date.truncate('day', date()- duration('P636M')) AS agemin

Match (n:Person {userId: '7J4OwwnmQ1fMhavSLeLkDkKe9Kl2'})-->(ct:ConnectionType)<--(other:Person)
WHERE distance(n.location, other.location) < 1609344 AND ct.key IN [10,20,30,40,50] AND other.dateOfBirth >= toString(agemin) AND other.dateOfBirth <= toString(agemax)

WITH COLLECT(n) as p1, ct, COLLECT(other) as o1
UNWIND p1 as p2, UNWIND o1 as o2

MATCH (p2)-[:ISLOOKINGFORGENDER]->(isaGender: Gender)<-[:ISAGENDER]-(o2)
WHERE isaGender.key IN [20]

RETURN DISTINCT COUNT(other);

Thanks!
I copied it into the browser and ran it. It came up with 6,384 records. I then ran it without the count and added a limit 50 on it...it returned 24 records. So, I ran it again, this time with returning the records AND a count. Count came back as 266...with 24 records.

Like I mentioned above, I think I may have corrupted my data while implementing locations. I know I had some datatype issues previously because I copied over data from the desktop version and everything came in as string. While fixing that, I think I must have done something wrong and not realized it until now.

Thanks for the quick reply. Try running the query without location condition and see if the numbers come out correctly. That way you can narrow down the problem.

What's the value for n.location or other.location?

They're Point's that I created on Monday from latitude and longitude settings that were string (had to convert to float, then to point). I just tried removing the "distance(n.location, other.location) < 1609344 AND " from the Where clause and it's just spinning now (for about 2 minutes now)

It finally returned 59130716. I only have 10k Person nodes in the graph.

Sorry for this big number. Please send the screenshot of this query:
call apoc.meta.graph() if it is okay to share.

I made a mistake in the last RETURN statement. Here is the correct one:

Use this: RETURN DISTINCT (o2). I typed it as 'other'. Rerun the query with this change and check the numbers.

Oh, I caught that. It actually complained about it...and also the comma between the two UNWIND statements (I put them on separate lines). The apoc.meta.graph is:

Run this query that uses other.Person node only. Just to see if this works.

With date.truncate('day', date()- duration('P324M')) AS agemax, date.truncate('day', date()- duration('P636M')) AS agemin

MATCH (other:Person)-->(ct:ConnectionType)
WHERE ct.key IN [10,20,30,40,50] AND other.dateOfBirth >= toString(agemin) AND other.dateOfBirth <= toString(agemax)
WITH COLLECT(other) as o1, ct
UNWIND o1 as o2
MATCH (o2)-[:ISAGENDER]->(isaGender: Gender) WHERE isaGender.key IN [20]
RETURN COUNT(DISTINCT o2);

OK...The query returned 2,443 for the count...and when I changed the return to "DISTINCT o2", it returned 2,443 records as well!

Would it be helpful to be able to log into the server? I can open it up if you want. It's still in development, so theres only about 22 nodes that I would want to backup first (they're testers).

Thanks for your database access. I can only log in around 10 PM PST time. You can also send me some dummy data with your LOAD CSV script then I can import and play with it. You can just reply to me only so that the info comes my email directly. I will be happy to help you.

Also, I did not understand the reason for selecting Person with that id as all the filters are on other.Person except that location.

Now that you ask that, I think I may have incorrect assumptions about the query. My background was SQL, so everything had to have a starting point for the joins.

If you don't specify a starting person, will that person be included in the results? It wouldn't be a big deal to exclude the node from rendering, just want to know if needed

MATCH other.Person should have included the Person with that id. To check that, change the last RETURN statement to this and run.

MATCH (o2) WHERE o2.userId = "7J4OwwnmQ1fMhavSLeLkDkKe9Kl2"
RETURN o2;

Sorry for the slow response. Had other problems pop up in the app.
I checked the results and it did have the requestors data in there. I'll make sure I add a filter before pushing it live.

I have a question about the way you structured the query. Should there be a COLLECT and UNWIND between each MATCH? I originally wrote my queries with just the MATCH (no UNWIND or COLLECT) because I didn't know about them

Nevermind 🙂 I found this: https://neo4j.com/blog/cypher-union-query-using-collect-clause/
Makes sense now!

Thanks for the update. Glad to help you!