cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

How to reference the relationship variable when loading from csv?

lingvisa
Graph Fellow
_start  _end _type _rank
 1      2   coachAt 20

This is the format of the relationship file, and this is my code:

CALL apoc.load.csv('sports.csv', {sep:'\t'}) 
yield map as row
MATCH (h:Coach{ID: row._start})
MATCH (t:Team{ID: row._end})
MERGE (h)-[:row._type]->(t)
return row._type

This reports an error:

Invalid input '.': expected an identifier character, whitespace, '|', a length specification, a property map or ']' (line 5, column 16 (offset: 243))
"MERGE (h)-[:row._type]->(t)"

It doesn't reports an error for the two variables:

row._start
row._end

But the 'row._type' is not allowed. So how to reference the relationship through the variable row._type? I think it has the same problem when I use the pure cypher 'LOAD CSV' without using apoc plugin.

For the same reason, how to reference the '_rank' variable of the row, which is a property of the relationship. I don't want to explicitly specify the relationship name and potentially any number of relation properties, which are the columns after the '_type' column. I want a way to create relationships with their properties in the merge clause. Something like:
MERGE (h)-[:row._type{_rank:5, _duration:1year}]->(t)

6 REPLIES 6

Cypher does not natively support dynamic labels. You have two options here:

1. Use a property instead

CALL apoc.load.csv('sports.csv', {sep:'\t'}) YIELD map AS row
MATCH (h:Coach{ID: row._start})
MATCH (t:Team{ID: row._end})
MERGE (h)-[r:COACHED_BY]->(t)
SET r.type = row._type
RETURN h, r, t

2. apoc.create.relationship

CALL apoc.load.csv('sports.csv', {sep:'\t'}) YIELD map AS row
MATCH (h:Coach{ID: row._start})
MATCH (t:Team{ID: row._end})
CALL apoc.create.relationship(h, row._type, {}, t) YIELD rel
RETURN rel

The first option 'COACHED_BY', which is a temporary and fake value, and reset it immediately. This is smart. The second one is neat. This solved my first part of question.

As for the 2nd part, if after the relation type column, '_type', there are a number of columns, which are relationship's properties. Is there a way to access all the properties without explicitly specifying each property name/value pair? Because I have to do it programmatically, whatever relationship properties will be added to the cypher query?

If I use either way you suggested to create relationship, how to achieve something like this:
MERGE (h)-[:row._type{_rank:5, _duration:1year}]->(t)

The '_rank' and '_duration' will be variables, not explicit specifications.

Okay, what are you trying to do? It appears that you're misunderstanding something about the way Neo4j nodes work, and how to interact with the data. I strongly advise going through the Neo4j intros available in the browser via :play intro, :play concepts, and :play cypher. Additionally, a exploration of the Neo4j starting guide, and getting to know graph data modeling.

Getting your data in a graph

If at all possible, you should avoid using dynamic name for labels and properties, which is what your inquiry suggests. If, instead, you're simply looking to store the _rank and _duration values in the relationship, then put those in properties on the relationship.

CALL apoc.load.csv('sports.csv', {sep:'\t'}) YIELD map AS row
MERGE (h:Coach{ID: row._start})
MERGE (t:Team{ID: row._end})
MERGE (h)-[r:COACHED_BY]->(t)
SET
    r.type = row._type,
    r.rank = row._rank,
    r.duration = duration({years: 1})
RETURN h, r, t

Some tips for better modeling, and understanding of Neo4j

There are a few basic patterns to follow that will make your life easier down the road:

  • Relationships from (child)-[to]->(parent) , or (many)-[to]->(one)
  • Label names descriptive of a single node of that label:
    • (:Person {name: "Bob"}) and (:People {group-name: "Matrix", people: ["Mouse", "Apoc", "Switch"]}) , not (:People {name: "Joe"}) .
  • Identities are key. If you can define a way to uniquely identify each node in your data, from your data source (csv, json, etc.), everything gets easier.
    • Define indices for those identity properties with CREATE INDEX
    • Import all data with MERGE (carefully), so you can re-run the same data without breaking anything.
      • MERGE (:Thing {id: 2})-[:REL]->(:Other {id: 9}) is not the same as MERGE (a:Thing {id: 2}) MERGE (b:Other {id: 9)) MERGE (a)-[:REL]->(b)

@tony.chiboucas Thanks for the resource pointers. Let me describe this way:

I have 10 CSV files, and each represents a label, with different properties, and relationships among some labels, and some relations may have properties as well. This is just normal situation.

One relationship file look like this:

_start  _end _type _rank
 1      2   coachAt 20

And another relationship file may look like this:

_start  _end _type _since.  _end
 1      2   participateIn 2009.  2020

When I load relationships, since the structure of the file may be different, I can't always explicitly specify the property names. In the example data, the first set of relationship property is '_rank', and the second is '_since, _end'. More specifically, what is for sure is that from the 4th columns, all are relationship properties, and I need a way to refer to represent them in the cypher, and I don't necessarily know their names.

Probably what I should do is to define all the node types, node properties, relationship names and relationship properties in some configuration file, and as I load each file, I exactly know what columns and relation types, properties exist and use them in the cypher query formation process, instead of trying to figure them out on the fly in the LOAD CSV statements.

Is this the right thinking?

Loading a CSV with dynamic columns is not something you can do with Cypher alone. IMO, manually curate a list of all the CSV formats you'll be loading, and create a different Cypher command for each of them. It may seem like more work now, but you'll thank me later.

Baring that, next best alternative would be to write your own procedure. Fair warning, it's tricky, but worth it. You'll probably end up spending 10x more time writing the procedure, than just tweaking 10 or 20 Cypher commands.

Yes, I will go for the pre-defined solution, which makes sense.