cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

How to get the rejected nodes when loading data from csv using MERGE

Hi Team,

I am using load csv to load the data in label using MERGE constraint on specific property.
I want to know list of nodes which failed while loading the data from csv.

Please suggest on how to achieve this!

Thanks !!

Regards
AM

14 REPLIES 14

Hi All,

Kindly suggest !

Regards
Akshat

Hello @akshat.mittal,

With the MERGE clause, you can use ON CREATE SET and ON MATCH SET: Documentation

So you could add a property to see if you match the node or not and after you will know which nodes have failed since they won't have a property Example

Regards,
Cobra

Hi Cobra,

Appreciate your help!

I am aware with Usage of Merge clause.
But here Problem is csv contains 1 lacs records and in every 8 hour new record gets introduced.

So I want to know the list of nodes which did not match the criteria given in Merge clause.

Regards
Akshat

What do you mean by contains 1 lacs records?

Do you have an example of criteria?

The second part of my previous message should do the trick no?

Regards,
Cobra

Hi Cobra,

CSV has more than 99000 records. let say if there are 10 records which will fail while loading.
Want to know a way to get the list of those 10 records.

Regards
Akshat

Hi @akshat.mittal,

Why don't you check if your records are complete in Python for example and after you load them in the database?

Regards,
Cobra

Hi Cobra,

I am not using python here. What do you mean by complete records ?
Unfortunately , what you are suggesting is not helping at all.

Thanks !

Regards
Akshat

Hi @akshat.mittal,

Why would the loading fail?

Are you using a driver or only Cypher?

Regards,
Cobra

Hi Cobra,

Loading will fail due to bad records. Using cypher for load csv and driver for jdbc load from database view.

Regards
Akshat

Why don't you check in Java if you don't have bad records? After you clean it, you can load it

Regards,
Cobra

Hi Cobra,

I am talking about cypher related stuff.
Is cypher good enough to handle this query-asked by me ?

Regards
Akshat

Can you show us some code you did to load your data?

Regards,
Cobra

Hi Cobra,

This is the code , I am using for loading data from csv.

USING PERIODIC COMMIT 1000
load csv with headers from "file:///MISSING_RAN_EQUIPMENTS.csv" as row with row
where row.EQUIP_NAME IS not null
MERGE(cont:CONTAINER { EQUIP_NAME: REPLACE(row.EQUIP_NAME,'"','')})
ON CREATE SET
cont.EQUIP_NAME=REPLACE(row.EQUIP_NAME,'"',''),
cont.BEARER_IPADDRESS=REPLACE(row.BEARER_IPADDRESS,'"',''),
cont.EQUIP_REFERENCE_MODULATION=REPLACE(row.EQUIP_REFERENCE_MODULATION,'"',''),
cont.EQUIP_RADIO_CONFIGURATION=REPLACE(row.EQUIP_RADIO_CONFIGURATION,'"',''),
cont.EQUIP_RX_LEVEL=REPLACE(row.EQUIP_RX_LEVEL,'"',''),
cont.EQUIP_HSI_SERVICEID=REPLACE(row.EQUIP_HSI_SERVICEID,'"',''),
cont.EQUIP_RINGNUMBER=REPLACE(row.EQUIP_RINGNUMBER,'"',''),
cont.MYCOM_NATIVENAME=REPLACE(row.MYCOM_NATIVENAME,'"',''),
cont.EQUIP_TRACKINGAREA=REPLACE(row.EQUIP_TRACKINGAREA,'"',''),
cont.EQUIP_TYPE=REPLACE(row.EQUIP_TYPE,'"',''),
cont.SIGNALING_IPADDRESS=REPLACE(row.SIGNALING_IPADDRESS,'"',''),
cont.PARENT_EQUIP_NAME=REPLACE(row.PARENT_EQUIP_NAME,'"',''),
cont.EQUIP_COVERAGE_CIRCLE=REPLACE(row.EQUIP_COVERAGE_CIRCLE,'"',''),
cont.EQUIP_DATA_PORT_NO=REPLACE(row.EQUIP_DATA_PORT_NO,'"',''),
cont.SERIAL_NUMBER=REPLACE(row.SERIAL_NUMBER,'"',''),
cont.TAKEOFF_SAPID=REPLACE(row.TAKEOFF_SAPID,'"',''),
cont.EQUIP_STATUS=REPLACE(row.EQUIP_STATUS,'"',''),
cont.EQUIP_NMS_EMS=REPLACE(row.EQUIP_NMS_EMS,'"',''),
cont.EQUIP_EQ_CLS=REPLACE(row.EQUIP_EQ_CLS,'"',''),
cont.EQUIP_SITE_NAME=REPLACE(row.EQUIP_SITE_NAME,'"',''),
cont.EQUIP_MAINTENANCEPOINT=REPLACE(row.EQUIP_MAINTENANCEPOINT,'"',''),
cont.TAKEOFF_FRIENDLY_NAME=REPLACE(row.TAKEOFF_FRIENDLY_NAME,'"',''),
cont.EQUIP_NAME=REPLACE(row.EQUIP_NAME,'"',''),
cont.EQUIP_MODEL=REPLACE(row.EQUIP_MODEL,'"',''),
cont.EQUIP_ECS_SERVICEID=REPLACE(row.EQUIP_ECS_SERVICEID,'"',''),
cont.EQUIP_LEVEL=REPLACE(row.EQUIP_LEVEL,'"',''),
cont.EQUIP_MM_SERVICEID=REPLACE(row.EQUIP_MM_SERVICEID,'"',''),
cont.ECGI=REPLACE(row.ECGI,'"',''),
cont.EQUIP_EMS_TYPE=REPLACE(row.EQUIP_EMS_TYPE,'"',''),
cont.EQUIP_NEID=REPLACE(row.EQUIP_NEID,'"',''),
cont.EQUIP_CUSTOMER=REPLACE(row.EQUIP_CUSTOMER,'"',''),
cont.EQUIP_CLEI=REPLACE(row.EQUIP_CLEI,'"',''),
cont.EQUIP_IPV6=REPLACE(row.EQUIP_IPV6,'"',''),
cont.EQUIP_IPV6_DEFAULT_GATEWAY=REPLACE(row.EQUIP_IPV6_DEFAULT_GATEWAY,'"',''),
cont.EQUIP_IPV4=REPLACE(row.EQUIP_IPV4,'"',''),
cont.EQUIP_NATIVENAME=REPLACE(row.EQUIP_NATIVENAME,'"',''),
cont.SAPID=REPLACE(row.SAPID,'"',''),
cont.EQUIP_EMS_INSTANCENAME=REPLACE(row.EQUIP_EMS_INSTANCENAME,'"',''),
cont.EQUIP_POLARIZATION=REPLACE(row.EQUIP_POLARIZATION,'"',''),
cont.EQUIP_CUSTOMERTYPE=REPLACE(row.EQUIP_CUSTOMERTYPE,'"',''),
cont.EQUIP_VENDOR=REPLACE(row.EQUIP_VENDOR,'"',''),
cont.EQUIP_DEVICE_CODE=REPLACE(row.EQUIP_DEVICE_CODE,'"',''),
cont.EQUIP_RING_TYPE=REPLACE(row.EQUIP_RING_TYPE,'"',''),
cont.EQUIP_PKEY=REPLACE(row.EQUIP_PKEY,'"','')
ON MATCH SET
cont.EQUIP_NAME=REPLACE(row.EQUIP_NAME,'"',''),
cont.BEARER_IPADDRESS=REPLACE(row.BEARER_IPADDRESS,'"',''),
cont.EQUIP_REFERENCE_MODULATION=REPLACE(row.EQUIP_REFERENCE_MODULATION,'"',''),
cont.EQUIP_RADIO_CONFIGURATION=REPLACE(row.EQUIP_RADIO_CONFIGURATION,'"',''),
cont.EQUIP_RX_LEVEL=REPLACE(row.EQUIP_RX_LEVEL,'"',''),
cont.EQUIP_HSI_SERVICEID=REPLACE(row.EQUIP_HSI_SERVICEID,'"',''),
cont.EQUIP_RINGNUMBER=REPLACE(row.EQUIP_RINGNUMBER,'"',''),
cont.MYCOM_NATIVENAME=REPLACE(row.MYCOM_NATIVENAME,'"',''),
cont.EQUIP_TRACKINGAREA=REPLACE(row.EQUIP_TRACKINGAREA,'"',''),
cont.EQUIP_TYPE=REPLACE(row.EQUIP_TYPE,'"',''),
cont.SIGNALING_IPADDRESS=REPLACE(row.SIGNALING_IPADDRESS,'"',''),
cont.PARENT_EQUIP_NAME=REPLACE(row.PARENT_EQUIP_NAME,'"',''),
cont.EQUIP_COVERAGE_CIRCLE=REPLACE(row.EQUIP_COVERAGE_CIRCLE,'"',''),
cont.EQUIP_DATA_PORT_NO=REPLACE(row.EQUIP_DATA_PORT_NO,'"',''),
cont.SERIAL_NUMBER=REPLACE(row.SERIAL_NUMBER,'"',''),
cont.TAKEOFF_SAPID=REPLACE(row.TAKEOFF_SAPID,'"',''),
cont.EQUIP_STATUS=REPLACE(row.EQUIP_STATUS,'"',''),
cont.EQUIP_NMS_EMS=REPLACE(row.EQUIP_NMS_EMS,'"',''),
cont.EQUIP_EQ_CLS=REPLACE(row.EQUIP_EQ_CLS,'"',''),
cont.EQUIP_SITE_NAME=REPLACE(row.EQUIP_SITE_NAME,'"',''),
cont.EQUIP_MAINTENANCEPOINT=REPLACE(row.EQUIP_MAINTENANCEPOINT,'"',''),
cont.TAKEOFF_FRIENDLY_NAME=REPLACE(row.TAKEOFF_FRIENDLY_NAME,'"',''),
cont.EQUIP_NAME=REPLACE(row.EQUIP_NAME,'"',''),
cont.EQUIP_MODEL=REPLACE(row.EQUIP_MODEL,'"',''),
cont.EQUIP_ECS_SERVICEID=REPLACE(row.EQUIP_ECS_SERVICEID,'"',''),
cont.EQUIP_LEVEL=REPLACE(row.EQUIP_LEVEL,'"',''),
cont.EQUIP_MM_SERVICEID=REPLACE(row.EQUIP_MM_SERVICEID,'"',''),
cont.ECGI=REPLACE(row.ECGI,'"',''),
cont.EQUIP_EMS_TYPE=REPLACE(row.EQUIP_EMS_TYPE,'"',''),
cont.EQUIP_NEID=REPLACE(row.EQUIP_NEID,'"',''),
cont.EQUIP_CUSTOMER=REPLACE(row.EQUIP_CUSTOMER,'"',''),
cont.EQUIP_CLEI=REPLACE(row.EQUIP_CLEI,'"',''),
cont.EQUIP_IPV6=REPLACE(row.EQUIP_IPV6,'"',''),
cont.EQUIP_IPV6_DEFAULT_GATEWAY=REPLACE(row.EQUIP_IPV6_DEFAULT_GATEWAY,'"',''),
cont.EQUIP_IPV4=REPLACE(row.EQUIP_IPV4,'"',''),
cont.EQUIP_NATIVENAME=REPLACE(row.EQUIP_NATIVENAME,'"',''),
cont.SAPID=REPLACE(row.SAPID,'"',''),
cont.EQUIP_EMS_INSTANCENAME=REPLACE(row.EQUIP_EMS_INSTANCENAME,'"',''),
cont.EQUIP_POLARIZATION=REPLACE(row.EQUIP_POLARIZATION,'"',''),
cont.EQUIP_CUSTOMERTYPE=REPLACE(row.EQUIP_CUSTOMERTYPE,'"',''),
cont.EQUIP_VENDOR=REPLACE(row.EQUIP_VENDOR,'"',''),
cont.EQUIP_DEVICE_CODE=REPLACE(row.EQUIP_DEVICE_CODE,'"',''),
cont.EQUIP_RING_TYPE=REPLACE(row.EQUIP_RING_TYPE,'"',''),
cont.EQUIP_PKEY=REPLACE(row.EQUIP_PKEY,'"','')
RETURN " PERFORMING CONTAINER DELTA LOAD FOR UPDATE RECORDS : " +count(cont);

Regards
Akshat

To be honest, Cypher is not the best tool to check if your CSV is clean, it will be very durty

The best practice is to try to handle complex cleanup/manipulation before loading. So in your case, in JAVA. But I prefer to do it python, it's 3 lines of code

Regards,
Cobra