12-22-2022 06:56 AM - edited 12-22-2022 07:12 AM
Hi there,
I have a CSV file with 100k rows of data, which I can confirm by running this Cypher query:
LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
RETURN count(row)
But when I run the import query below, only 98,956 records end up in the Neo4j database. Is there a limit on this? What do I need to do to import the rest (1044 rows missing)?
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
MERGE (p:Place {uniqueAddressReference: toInteger(row.uniqueAddressReferenceNumber)})
ON CREATE SET
  p.fullAddress = CASE row.formated_full_address WHEN "" THEN null ELSE row.formated_full_address END,
  p.isHospitality = true,
  p.primaryDescriptionText = CASE row.primaryDescriptionText WHEN "" THEN null ELSE row.primaryDescriptionText END,
  p.firmsName = CASE row.firmsName WHEN "" THEN null ELSE toLower(row.firmsName) END,
  p.houseNumberOfName = CASE row.formated_numberOfName WHEN "" THEN null ELSE row.formated_numberOfName END,
  p.subStreetLevelLineThree = CASE row.formated_subStreetLevelThree WHEN "" THEN null ELSE row.formated_subStreetLevelThree END,
  p.subStreetLevelLineTwo = CASE row.formated_subStreetLevelTwo WHEN "" THEN null ELSE row.formated_subStreetLevelTwo END,
  p.subStreetLevelLineOne = CASE row.formated_subStreetLevelOne WHEN "" THEN null ELSE row.formated_subStreetLevelOne END,
  p.street = CASE row.formated_street WHEN "" THEN null ELSE row.formated_street END,
  p.town = CASE row.formated_town WHEN "" THEN null ELSE row.formated_town END,
  p.postalDistrict = CASE row.formated_postalDistrict WHEN "" THEN null ELSE row.formated_postalDistrict END,
  p.county = CASE row.formated_county WHEN "" THEN null ELSE row.formated_county END,
  p.postCode = CASE row.postCode WHEN "" THEN null ELSE row.postCode END,
  p.totalArea = CASE row.totalArea WHEN "" THEN null ELSE row.totalArea END
RETURN p
There is no error message, so I don't know where the problem is.
Thank you
12-22-2022 08:07 AM
I ran your first query four times. Each time the count was 100,000.
I modified your second query to 'create' each node, without saving all the properties. Unlike 'merge', 'create' allows duplicates on the 'uniqueAddressReference' property.
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
CREATE (p:Place {uniqueAddressReference: toInteger(row.uniqueAddressReferenceNumber)})
I then ran a query looking for duplicates:
MATCH (p:Place)
WITH p.uniqueAddressReference AS id, count(*) AS cnt
WHERE cnt > 1
RETURN id, cnt
There were 804 'uniqueAddressReference' values with a count greater than one, meaning those keys appear on more than one row in the data. I then counted the number of duplicate rows:
MATCH (p:Place)
WITH p.uniqueAddressReference AS id, count(*) AS cnt
WHERE cnt > 1
RETURN sum(cnt - 1)
The result was 1044.
Using 'merge' in your import accounts for the missing 1044 rows: 'merge' matches the existing node when a 'uniqueAddressReference' value has already been seen, so each duplicate row creates nothing new, and 100,000 - 1044 = 98,956 nodes remain.
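You can also check this without writing anything to the database, by counting duplicate keys straight from the CSV. A minimal sketch using the same URL as your import (the aliases duplicateKeys and duplicateRows are just illustrative names); if the duplicates fully explain the shortfall, it should return 804 and 1044:

LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
// group rows by the same key your MERGE uses
WITH row.uniqueAddressReferenceNumber AS id, count(*) AS cnt
WHERE cnt > 1
RETURN count(id) AS duplicateKeys, sum(cnt - 1) AS duplicateRows

If one node per address reference is what you want (which is exactly what 'merge' gives you), then nothing is actually missing: 98,956 is the number of distinct keys. In that case a uniqueness constraint makes the intent explicit and speeds up the 'merge' lookups. A sketch assuming Neo4j 4.x syntax (Neo4j 5 uses FOR ... REQUIRE instead of ON ... ASSERT):

// place_uar is an arbitrary constraint name
CREATE CONSTRAINT place_uar IF NOT EXISTS
ON (p:Place) ASSERT p.uniqueAddressReference IS UNIQUE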