LOAD CSV. Import relationships between columns with some blank cells

Hello,

I want to import relationships from a CSV file. I have extracted the two relevant columns (there are more):

[Image: screenshot of the two spreadsheet columns]

I have already created the nodes that correspond to the first and second columns ('Associations'
and 'tags'), and these columns contain the identifiers of these two kinds of nodes.

Examples of relationships:
ArochaCH --> #chrétiens
Alliancesudfr --> #à_gauche, #activistes
(some Associations have several tags!)

Why this layout with blanks? The blank cells in the first column make it more readable for humans, and using a separate cell for each tag, rather than a comma-separated list, allows for a validation filter in Google Sheets (the source).
However, I do not know how to import these relationships successfully with LOAD CSV, row by row, because of the blank cells.

-> Perhaps there is a Neo4j command to indicate that the value of a blank cell is the same as the first non-null value above it (e.g., cell 7 should be understood as "Alliancesudfr")
-> Perhaps there is a more standard spreadsheet layout, which is user-readable and robust (with validation)

  • Neo4j version: 3.5

Many thanks in advance!


intouch_vivek

When you load a CSV file into Neo4j, every line of the file is treated as a new, complete row. A blank cell in a row is read as a null value; it does not inherit the value from the row above, so the import ends up with a null node.
Even in Excel, one entity with multiple tags is usually laid out differently.
Try either of the following:

  1. Fill every cell with its respective value. You can do that manually or with any programming language.
  2. Put the tags in a single cell, separated by a delimiter such as "|" rather than a new line or ",", and then use the split function to store them in properties.
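
As a minimal sketch of option 1, the blank cells can be filled by carrying the last non-blank value of the first column downward. The example below uses only the Python standard library and assumes the sheet was exported as a two-column CSV with a header row; the function name `forward_fill_pairs` and the sample data are made up for illustration.

```python
import csv
import io


def forward_fill_pairs(csv_text):
    """Return (association, tag) pairs from a two-column CSV.

    A blank first cell inherits the last non-blank value seen above it.
    Assumes the first line of the CSV is a header row.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    pairs = []
    current = None  # last non-blank association seen so far
    for row in reader:
        if len(row) < 2:
            continue  # ignore empty or malformed lines
        association, tag = row[0].strip(), row[1].strip()
        if association:
            current = association
        if current and tag:
            pairs.append((current, tag))
    return pairs


# Sample data modelled on the examples in the question (illustrative only).
sample = (
    "Associations,tags\n"
    "ArochaCH,#chrétiens\n"
    "Alliancesudfr,#à_gauche\n"
    ",#activistes\n"
)

for association, tag in forward_fill_pairs(sample):
    print(association, "-->", tag)
```

The resulting pairs can be written back to a CSV file in which every row is complete, which LOAD CSV can then process row by row. With pandas, `Series.ffill()` on the first column should achieve the same result.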

Thank you for your answer.

  1. Using a programming language or an intelligent ETL tool could be an option. Do you have any suggestions on how to proceed, or links to documentation? One idea for me would be Python:
    https://towardsdatascience.com/accessing-google-spreadsheet-data-using-python-90a5bc214fd2
    Do you have a recommendation on how to connect Python and Neo4j? The Bolt driver, perhaps?
  2. Using a list inside a cell is not an option because it does not allow for Data Validation in Google Sheets, as explained above.

I suggest using Python to fill in each name appropriately. You can achieve this with pandas.
Once your dataset is ready, either use a MERGE clause to load the nodes, or use MATCH to load the nodes and relationships and then merge the nodes using APOC, so that you end up with one user with multiple edges for their tags.
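
One possible way to do the loading step from Python is sketched below, using the official `neo4j` Bolt driver. Since the nodes already exist, it uses MATCH for the nodes and MERGE only for the relationship. The labels `Association` and `Tag`, the property key `name`, the relationship type `HAS_TAG`, and the helper names are all assumptions made for illustration; adjust them to your actual model.

```python
# Hypothetical schema: labels, property key and relationship type are assumptions.
CYPHER = """
UNWIND $rows AS row
MATCH (a:Association {name: row.association})
MATCH (t:Tag {name: row.tag})
MERGE (a)-[:HAS_TAG]->(t)
"""


def to_rows(pairs):
    """Turn (association, tag) tuples into the list of maps passed as $rows."""
    return [{"association": a, "tag": t} for a, t in pairs]


def load_relationships(uri, user, password, pairs):
    """Send all pairs to Neo4j in a single query over Bolt."""
    # Imported here so the helpers above work without the driver installed
    # (install it with: pip install neo4j).
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            session.run(CYPHER, rows=to_rows(pairs)).consume()
    finally:
        driver.close()


# Example call (the URI and credentials are placeholders):
# load_relationships("bolt://localhost:7687", "neo4j", "password",
#                    [("Alliancesudfr", "#à_gauche"), ("Alliancesudfr", "#activistes")])
```

Because the relationship is created with MERGE, running the load twice should not create duplicate edges, and an association that appears in several rows ends up as one node with several tag relationships.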