12-22-2022 06:56 AM - edited 12-22-2022 07:12 AM
Hi there,
I have a CSV file with 100k rows of data, which I can confirm by running this Cypher query:
LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
RETURN count(row)
But when I run the import query below, only 98,956 records end up in the Neo4j database. Is there a limit on this? What do I need to do to import the rest (1044 rows missing)?
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
MERGE (p:Place {uniqueAddressReference: toInteger(row.uniqueAddressReferenceNumber)})
ON CREATE SET
  p.fullAddress = CASE row.formated_full_address WHEN "" THEN null ELSE row.formated_full_address END,
  p.isHospitality = true,
  p.primaryDescriptionText = CASE row.primaryDescriptionText WHEN "" THEN null ELSE row.primaryDescriptionText END,
  p.firmsName = CASE row.firmsName WHEN "" THEN null ELSE toLower(row.firmsName) END,
  p.houseNumberOfName = CASE row.formated_numberOfName WHEN "" THEN null ELSE row.formated_numberOfName END,
  p.subStreetLevelLineThree = CASE row.formated_subStreetLevelThree WHEN "" THEN null ELSE row.formated_subStreetLevelThree END,
  p.subStreetLevelLineTwo = CASE row.formated_subStreetLevelTwo WHEN "" THEN null ELSE row.formated_subStreetLevelTwo END,
  p.subStreetLevelLineOne = CASE row.formated_subStreetLevelOne WHEN "" THEN null ELSE row.formated_subStreetLevelOne END,
  p.street = CASE row.formated_street WHEN "" THEN null ELSE row.formated_street END,
  p.town = CASE row.formated_town WHEN "" THEN null ELSE row.formated_town END,
  p.postalDistrict = CASE row.formated_postalDistrict WHEN "" THEN null ELSE row.formated_postalDistrict END,
  p.county = CASE row.formated_county WHEN "" THEN null ELSE row.formated_county END,
  p.postCode = CASE row.postCode WHEN "" THEN null ELSE row.postCode END,
  p.totalArea = CASE row.totalArea WHEN "" THEN null ELSE row.totalArea END
RETURN p
There is no error message, so I don't know where the problem is.
Thank you
12-22-2022 08:07 AM
I ran your first query four times. Each time the count was 100,000.
I modified your second query to 'create' each node, without saving all the properties. Unlike 'merge', 'create' allows duplicates on the 'uniqueAddressReference' property.
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
CREATE (p:Place {uniqueAddressReference: toInteger(row.uniqueAddressReferenceNumber)})
I then ran a query looking for duplicates:
MATCH (p:Place)
WITH p.uniqueAddressReference AS id, count(*) AS cnt
WHERE cnt > 1
RETURN id, cnt
There were 804 'uniqueAddressReference' values with a count greater than one, meaning those keys appear on more than one row in the data. I then counted the number of duplicate rows:
MATCH (p:Place)
WITH p.uniqueAddressReference AS id, count(*) AS cnt
WHERE cnt > 1
RETURN sum(cnt - 1)
The result was 1044.
Using 'merge' in your import accounts for the missing 1044 rows: 'merge' matches the existing node when a 'uniqueAddressReference' value has already been seen, so each duplicate row creates nothing new, and 100,000 - 1044 = 98,956 nodes remain.
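You can also check this without writing anything to the database, by counting duplicate keys straight from the CSV. A minimal sketch using the same URL as your import (the aliases duplicateKeys and duplicateRows are just illustrative names); if the duplicates fully explain the shortfall, it should return 804 and 1044:

LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
// group rows by the same key your MERGE uses
WITH row.uniqueAddressReferenceNumber AS id, count(*) AS cnt
WHERE cnt > 1
RETURN count(id) AS duplicateKeys, sum(cnt - 1) AS duplicateRows

If one node per address reference is what you want (which is exactly what 'merge' gives you), then nothing is actually missing: 98,956 is the number of distinct keys. In that case a uniqueness constraint makes the intent explicit and speeds up the 'merge' lookups. A sketch assuming Neo4j 4.x syntax (Neo4j 5 uses FOR ... REQUIRE instead of ON ... ASSERT):

// place_uar is an arbitrary constraint name
CREATE CONSTRAINT place_uar IF NOT EXISTS
ON (p:Place) ASSERT p.uniqueAddressReference IS UNIQUE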