
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Neo4j Desktop doesn't import all the data from my CSV file

jcct100
Node Clone

Hi there,

I have a CSV file with 100k rows of data, which I can confirm when I run the Cypher query below.

LOAD CSV WITH HEADERS FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv' AS row RETURN count(row)

When I try to import all the data into the Neo4j database, only 98,956 rows end up in the database. Is there a limit on this? What do I need to do to import the rest (1044 rows missing)?


:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
MERGE (p:Place {uniqueAddressReference: toInteger(row.uniqueAddressReferenceNumber)})
ON CREATE SET
  p.fullAddress = CASE row.formated_full_address WHEN "" THEN null ELSE row.formated_full_address END,
  p.isHospitality = true,
  p.primaryDescriptionText = CASE row.primaryDescriptionText WHEN "" THEN null ELSE row.primaryDescriptionText END,
  p.firmsName = CASE row.firmsName WHEN "" THEN null ELSE toLower(row.firmsName) END,
  p.houseNumberOfName = CASE row.formated_numberOfName WHEN "" THEN null ELSE row.formated_numberOfName END,
  p.subStreetLevelLineThree = CASE row.formated_subStreetLevelThree WHEN "" THEN null ELSE row.formated_subStreetLevelThree END,
  p.subStreetLevelLineTwo = CASE row.formated_subStreetLevelTwo WHEN "" THEN null ELSE row.formated_subStreetLevelTwo END,
  p.subStreetLevelLineOne = CASE row.formated_subStreetLevelOne WHEN "" THEN null ELSE row.formated_subStreetLevelOne END,
  p.street = CASE row.formated_street WHEN "" THEN null ELSE row.formated_street END,
  p.town = CASE row.formated_town WHEN "" THEN null ELSE row.formated_town END,
  p.postalDistrict = CASE row.formated_postalDistrict WHEN "" THEN null ELSE row.formated_postalDistrict END,
  p.county = CASE row.formated_county WHEN "" THEN null ELSE row.formated_county END,
  p.postCode = CASE row.postCode WHEN "" THEN null ELSE row.postCode END,
  p.totalArea = CASE row.totalArea WHEN "" THEN null ELSE row.totalArea END
RETURN p
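One quick sanity check before suspecting a limit: count the distinct values of the merge key straight from the CSV. If distinctKeys comes back lower than totalRows, MERGE will collapse the difference into fewer nodes (a sketch reusing the same URL and the column name from the import query above):

LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
RETURN count(row) AS totalRows,
       count(DISTINCT row.uniqueAddressReferenceNumber) AS distinctKeys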


There is no error message, so I don't know where the problem is.

Thank you

1 ACCEPTED SOLUTION

I ran your first query four times. Each time the count was 100,000.

I modified your second query to CREATE each node and not save all the properties. CREATE allows duplicate values on the uniqueAddressReference property, where MERGE would not.

:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
CREATE (p:Place {uniqueAddressReference: toInteger(row.uniqueAddressReferenceNumber)})

I then ran a query looking for duplicates:

MATCH (p:Place)
WITH p.uniqueAddressReference AS id, count(*) AS cnt
WHERE cnt > 1
RETURN id, cnt

There were 804 uniqueAddressReference values with a count greater than one, which means those keys had duplicate rows in the data. I then counted the number of duplicate rows:

MATCH (p:Place)
WITH p.uniqueAddressReference AS id, count(*) AS cnt
WHERE cnt > 1
RETURN sum(cnt - 1)

The result was 1044. 
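As a cross-check, the same numbers can be read straight from the CSV without creating any nodes (a sketch reusing the same URL; the column name is taken from the import query). Given the counts above, this should report 804 duplicated keys and 1044 extra rows:

LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
WITH row.uniqueAddressReferenceNumber AS id, count(*) AS cnt
WHERE cnt > 1
RETURN count(id) AS duplicatedKeys, sum(cnt - 1) AS extraRows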


Using MERGE in your import accounts for the 1044 'missing' rows: for each duplicate key, MERGE matched the existing node instead of creating a new one, so the duplicates were collapsed rather than lost.
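If one node per uniqueAddressReference is the intended model, MERGE is doing the right thing. To make the collapsing visible in the data, you can count how many CSV rows hit each node (a sketch of your import; the rowCount property is an illustration I've added, not part of the original model):

:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRpa9qfhHBWuCoCAN2SQS07kvJRAcdrOSDs5KTo_zVQCNQldz2ksgL6GWbEdNJFzVnT8FbHeheXglkB/pub?gid=1498785689&single=true&output=csv'
AS row
MERGE (p:Place {uniqueAddressReference: toInteger(row.uniqueAddressReferenceNumber)})
ON CREATE SET p.rowCount = 1
ON MATCH SET p.rowCount = p.rowCount + 1

Afterwards, MATCH (p:Place) WHERE p.rowCount > 1 RETURN sum(p.rowCount - 1) should equal the number of collapsed rows.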

