Matching nodes with 2 or more properties in common

Hi everyone.

I am working with data from a CSV file that contains 6 columns: hotelName, hotelPool, hotelSpa, hotelGym, hotelParking, and hotelBreakfast. hotelName is the unique name of the hotel; the other columns list what kind of pool, spa, gym, parking facilities, and breakfast options the hotel has.

Sample data as follows:

Row 1 "California", "Full-size", "Full-size", "Fully-equipped", "On-site parking", "Continental"

Row 2 "Yorba", "Full-size", "None", "Fully-equipped", "On-site parking", "Continental"

Row 3 "Heartbreak", "None", "Full-size", "Fully-equipped", "On-site parking", "Continental"

Row 4 "Chelsea", "Full-size", "Full-size", "Fully-equipped", "On-site parking", "Continental"

I want to match these hotels if they share 2 or more features in common from hotelPool, hotelSpa, or hotelGym only. I don't want to match on hotelParking or hotelBreakfast.

For example, California has 2 features in common with Yorba - both hotels have full-size pools and fully-equipped gyms (the parking and breakfast are not counted).

California also has 2 features in common with Heartbreak - both hotels have full-sized spas and fully equipped gyms (the parking and breakfast are not counted)

California also has 3 features in common with Chelsea - both hotels have full-sized pools, full-sized spas, and fully equipped gyms (the parking and breakfast are not counted).

My desired output is pairs of hotels that have 2 or more features in common, as follows:

Row 1 California, Yorba, 2

Row 2 California, Heartbreak, 2

Row 3 California, Chelsea, 3

My code so far matches on all properties that the hotels have in common:

MATCH (h:Hotel)

WITH COLLECT(h) AS allHotels
UNWIND allHotels AS h1
UNWIND allHotels AS h2
WITH h1, h2 
WHERE id(h1)>id(h2)

WITH h1, h2,
     REDUCE(i=0, key in keys(h1)  |
            i
            + CASE WHEN h1[key] = h2[key] THEN 1 ELSE 0 END
     )  AS commonFeatures

WHERE commonFeatures >=2 

RETURN h1.hotelName, h2.hotelName, commonFeatures

What changes do I need to make to my code to match only on the above-mentioned properties, and not on every single property the hotels have in common?

Thanks!
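
One possible direction, offered as a minimal sketch rather than a definitive answer: iterate over a fixed list of property names instead of keys(h1), so that only hotelPool, hotelSpa, and hotelGym are compared. It assumes the properties are stored on :Hotel nodes under exactly those names, and that values are compared as exact strings (so they must be spelled identically across rows).

```cypher
// Pair every hotel with every other hotel once (id(h1) > id(h2) avoids
// self-pairs and mirrored duplicates).
MATCH (h1:Hotel), (h2:Hotel)
WHERE id(h1) > id(h2)

// Count matches over an explicit list of property names only.
// Assumption: these three names are the properties stored on the nodes.
WITH h1, h2,
     REDUCE(i = 0, key IN ['hotelPool', 'hotelSpa', 'hotelGym'] |
            i + CASE WHEN h1[key] = h2[key] THEN 1 ELSE 0 END
     ) AS commonFeatures

WHERE commonFeatures >= 2

RETURN h1.hotelName, h2.hotelName, commonFeatures
```

Note that this counts any equal value as a shared feature, including two hotels that both have "None" for the same property; if that should not count, the CASE condition could be extended with AND h1[key] <> 'None'. With the four sample rows, and provided the gym values are spelled identically, this would also pair Chelsea with Yorba and with Heartbreak (2 features each), in addition to the three California pairs.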
