Neo4j

amelia_monicad · ‎06-13-2021

Hi all,

I have a dataset for ratings , that has ratings from 1 to 5, with different count of ratings on each ratings.

Sample file -
sample.txt (102 Bytes)

Product	1	2	3	4	5	Total
Toyota 	10	15	22	40	55	142
BMW   	9	91	19	12	11	142
Audi	10	9	15	20	95	149

I have individual nodes created for ratings 1 to 5.

I want to create relationships as

Node = Toyota Connected to Node = Ratings (1) with property of rating count as 10.
Node = Toyota Connected to Node = Ratings (2) with property of rating count as 15.
Node = Toyota Connected to Node = Ratings (2) with property of rating count as 22.

i,e
(Car{name:Toyota})-[:HAS_RATINGS(count:10)]->(Ratings{id:1})

Currently I am manually hardcoded the values of ratings per columns using MATCH car and MATCH ratings, which results in 5 sets of 3 MATCH statements, totalling 15 MATCH statements. The original dataset has 30,000 rows, and it takes a lot of time to load the data.

Any help is appreciated.

@michael.hunger @maxdemarzi @craig.taverner

koji · ‎06-13-2021

Hi @amelia.monicad

I've copied your file here.

<NEO4J_HOME>/import/sample.txt

This is my Cypher.
I added a backtick where the item name starts with a number.
Ratings are CREATED for each car, but if they are different, please change them.

LOAD CSV WITH HEADERS FROM 'file:///sample.txt' AS line
CREATE (c:Car {name: line.Product})
CREATE (r1:Ratings {id: 1})
CREATE (r2:Ratings {id: 2})
CREATE (r3:Ratings {id: 3})
CREATE (r4:Ratings {id: 4})
CREATE (r5:Ratings {id: 5})
CREATE (c)-[:HAS_RATINGS {count:line.`1`}]->(r1)
CREATE (c)-[:HAS_RATINGS {count:line.`2`}]->(r2)
CREATE (c)-[:HAS_RATINGS {count:line.`3`}]->(r3)
CREATE (c)-[:HAS_RATINGS {count:line.`4`}]->(r4)
CREATE (c)-[:HAS_RATINGS {count:line.`5`}]->(r5)

amelia_monicad · ‎06-13-2021

Thanks, but this will work for small dataset, not for huge dataset with 100K rows. The individual CREATE relationships statements has to run 5 * 100K individually to create.

I would like to iterate 1 entire row, and based on the column name, it should create the relationship with the count as property.

koji · ‎06-13-2021

Hi @amelia.monicad

How about this one
With either Cypher, you can use the huge data as it is by simply changing the sample.txt file.

CREATE (:Ratings {id: 1}),
       (:Ratings {id: 2}),
       (:Ratings {id: 3}),
       (:Ratings {id: 4}),
       (:Ratings {id: 5});

CREATE INDEX index_rating_id FOR (n:Ratings) ON (n.id);

LOAD CSV WITH HEADERS FROM 'file:///sample.txt' AS line
MATCH (r1:Ratings {id: 1}),
      (r2:Ratings {id: 2}),
      (r3:Ratings {id: 3}),
      (r4:Ratings {id: 4}),
      (r5:Ratings {id: 5})
CREATE (c:Car {name: line.Product})
CREATE (c)-[:HAS_RATINGS {count:line.`1`}]->(r1),
       (c)-[:HAS_RATINGS {count:line.`2`}]->(r2),
       (c)-[:HAS_RATINGS {count:line.`3`}]->(r3),
       (c)-[:HAS_RATINGS {count:line.`4`}]->(r4),
       (c)-[:HAS_RATINGS {count:line.`5`}]->(r5);

However, as a graph design, it is not a good idea to have a large number of relations on a single Rating node.

amelia_monicad · ‎06-13-2021

is there any apoc procs to build the relationship more effectively ?

ameyasoft · ‎06-14-2021

Here is my take on the data model:
1. To keep the number of nodes to minimum, you can create one node for each row. Here is the Cypher:

LOAD CSV WITH HEADERS FROM 'file:///amelia.csv' AS line

MERGE (p:Product {name:"Product"})
MERGE (c:Car {name: line.Product, rating1:toInteger(line.`1`), rating2:toInteger(line.`2`), rating3:toInteger(line.`3`), rating4:toInteger(line.`4`), rating5:toInteger(line.`5`), total: toInteger(line.Total)})
MERGE (p)-[:PRODUCT_WITH_RATINGS]->(c)

Result:

2. A slight modification on koji's response:
LOAD CSV WITH HEADERS FROM 'file:///amelia.csv' AS line

MERGE (p:Product {name:"Product"})
MERGE (c:Car {name: line.Product})
CREATE (r:TotalRatings {total: toInteger(line.Total)})
MERGE (p)-[:PRODUCT]->(c)
CREATE (c)-[:TOTAL_RATINGS]->(r)

CREATE (r1:Ratings {id: 1})
CREATE (r2:Ratings {id: 2})
CREATE (r3:Ratings {id: 3})
CREATE (r4:Ratings {id: 4})
CREATE (r5:Ratings {id: 5})
CREATE (r)-[:RATING1 {count:toInteger(line.`1`)}]->(r1)
CREATE (r)-[:RATINGS2 {count:toInteger(line.`2`)}]->(r2)
CREATE (r)-[:RATINGS3 {count:toInteger(line.`3`)}]->(r3)
CREATE (r)-[:RATINGS4 {count:toInteger(line.`4`)}]->(r4)
CREATE (r)-[:RATINGS5 {count:toInteger(line.`5`)}]->(r5)

Result:

koji · ‎06-14-2021

@ameyasoft Thanks for adding toInteger(). I forgot to add it.

Neo4j

Create Relationship based on the column header