Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
05-27-2021 11:28 AM
Hello!
I am new to Neo4j and attempting to bring over another graph database that I have structured in a database as, Name1, Entity1, Relationship1, Relationship2, Name2, Entity2, Source. As an example, it might say: Julie, Person, has employer, has employee, Company, Costco, www.linkedin.com.
I have been toying with different ways of importing the data and the best I have found is to load a CSV file with each entity type/name. For example, one file for all the Person nodes with their labels and then one with Company nodes.
Then loaded a CSV with the relationships using the following CSV Load in attempts to get the following to get a basic structure of:
Name1, Relationship 1, Name2
Or
Julie has employer Costco
LOAD CSV FROM "http://localhost:11001/project-ba2a314e-1c09-4a9b-b568-3bc04a75ad42/COMPLETE.csv" AS line
MERGE (n1:Name1 {name:line[1]})
MERGE (n2:Name2 {name:line[5]})
WITH n1, n2, line
CALL apoc.merge.relationship(n1,line[1],{},{},n2) YIELD rel
RETURN n1, n2, line
Problem is, even with this basic load I am getting a duplicate node for both Julie and Costco. I am not sure how to correct this issue or if there is an entirely better way to load the data. I have nearly 30,000 nodes structured this way so I would prefer to not have to do any massive data massaging in possible.
Solved! Go to Solution.
05-28-2021 06:46 PM
Hi Kathleen.
So a few questions.
(n:Name1 {name:'Costco'})
and for (n:Name2 {name:'Costco'})
. This will create the duplication for all the values that show up in both Name1 and Name2 columns. Cypher will not let you dynamically set Labels out of the box, but you can use APOC.LOAD CSV WITH HEADERS FROM 'https:path.to.file' as line
CALL apoc.merge.node([line['Entity1']], {name:line['Name1']}) yield node as n1
CALL apoc.merge.node([line['Entity2']], {name:line['Name2']}) yield node as n2
CALL apoc.merge.relationship(n1,line['Relationship1'],{},{},n2) YIELD rel as r1
CALL apoc.merge.relationship(n2,line['Relationship2'],{},{},n1) YIELD rel as r2
return n1, n2, r1, r2
This should solve the duplication problem.
Lastly, if you don't mind me asking, what's the use case for having each the employee and the employer have a different relationship? Once the nodes are connected the single relationship should suffice, unless having the direction of the relationship is important to your use case.
05-28-2021 04:30 AM
Hi Kathleen,
I it sounds like you are on the right path when trying to split your entities and relationships into seperate .csv files. Speaking from experience, is it possible there are some trailing white spaces or some some other data issues in the .csv files? Do you mind posting what the first few lines of your .csv looks like if it fits here?
05-28-2021 04:47 AM
Also, you have line[1]
as both a name property and being set as the relationshipType
in the apoc.merge.relationship function.
05-28-2021 08:48 AM
Hi John,
Thank you for the reply and you are definitely right, line[1] should be line[3]. I made the adjustment. I also trimmed all the csv docs again and reloaded them. I am having the same issue. I believe the issue has to do with the relationship data csv not recognizing if the entity is a person or company and simply uploading the node type as Name1 and Name2, if that makes sense?
I attached here a sample of the data:
05-28-2021 04:49 AM
Hi John
You can try algorithm like louvain similarity and decide it by yourself.
Thanking you
Yours faithfully
Sameer SG
05-28-2021 04:53 AM
Hi Sameer,
I'm not sure I understand your question. Is this related to the possible issues in the data?
Thanks!
-Matt
05-28-2021 08:53 AM
[{'\ufeffID': '190004',
'Name1': 'Joe Smith',
'Entity1': 'Person',
'Relationship1': 'has employer',
'Relationship2': 'has employee',
'Name2': 'Costco',
'Entity2': 'Company',
'Source': 'costco website'},
{'\ufeffID': '190005',
'Name1': 'Daisy Duke',
'Entity1': 'Person',
'Relationship1': 'has employer',
'Relationship2': 'has employee',
'Name2': 'Trader Joes',
'Entity2': 'Company',
'Source': 'traderjoes website'},
{'\ufeffID': '190006',
'Name1': 'Costco',
'Entity1': 'Company',
'Relationship1': 'has competitor',
'Relationship2': 'has competitor',
'Name2': 'Walmart',
'Entity2': 'Company',
'Source': 'costco website'},
{'\ufeffID': '190007',
'Name1': 'Timothy Neal',
'Entity1': 'Person',
'Relationship1': 'shops at',
'Relationship2': 'has shopper',
'Name2': 'Costco',
'Entity2': 'Company',
'Source': 'costco website'},
{'\ufeffID': '190008',
'Name1': 'Trader Joes',
'Entity1': 'Company',
'Relationship1': 'has shopper',
'Relationship2': 'shops at',
'Name2': 'Daisy Duke',
'Entity2': 'Person',
'Source': 'traderjoes website'},
{'\ufeffID': '190009',
'Name1': 'Safeway',
'Entity1': 'Company',
'Relationship1': 'has employee',
'Relationship2': 'has employer',
'Name2': 'Timothy Neal',
'Entity2': 'Person',
'Source': 'safeway website'},
{'\ufeffID': '190010',
'Name1': 'Aldi',
'Entity1': 'Company',
'Relationship1': 'has employee',
'Relationship2': 'has employer',
'Name2': 'Bob Smith',
'Entity2': 'Person',
'Source': 'aldi website'},
{'\ufeffID': '190011',
'Name1': 'Lidl',
'Entity1': 'Company',
'Relationship1': 'has employee',
'Relationship2': 'has employer',
'Name2': 'Jordan Julie',
'Entity2': 'Person',
'Source': 'lidl website'},
{'\ufeffID': '190012',
'Name1': 'Grocery Store Association',
'Entity1': 'Company',
'Relationship1': 'has Members',
'Relationship2': 'Member of',
'Name2': 'Aldi',
'Entity2': 'Company',
'Source': 'groceries website'},
{'\ufeffID': '190013',
'Name1': 'Safeway',
'Entity1': 'Company',
'Relationship1': 'Member of',
'Relationship2': 'has Member',
'Name2': 'Grocery Store Association',
'Entity2': 'Company',
'Source': 'safeway website'}]
05-28-2021 09:43 AM
Great thank you! Let me check a few things and I'll check back with you.
05-28-2021 06:46 PM
Hi Kathleen.
So a few questions.
(n:Name1 {name:'Costco'})
and for (n:Name2 {name:'Costco'})
. This will create the duplication for all the values that show up in both Name1 and Name2 columns. Cypher will not let you dynamically set Labels out of the box, but you can use APOC.LOAD CSV WITH HEADERS FROM 'https:path.to.file' as line
CALL apoc.merge.node([line['Entity1']], {name:line['Name1']}) yield node as n1
CALL apoc.merge.node([line['Entity2']], {name:line['Name2']}) yield node as n2
CALL apoc.merge.relationship(n1,line['Relationship1'],{},{},n2) YIELD rel as r1
CALL apoc.merge.relationship(n2,line['Relationship2'],{},{},n1) YIELD rel as r2
return n1, n2, r1, r2
This should solve the duplication problem.
Lastly, if you don't mind me asking, what's the use case for having each the employee and the employer have a different relationship? Once the nodes are connected the single relationship should suffice, unless having the direction of the relationship is important to your use case.
05-31-2021 01:00 PM
Hi! Thank you! I am going to take a crack out of this later today and see if I can get it to work. I don't need to have both sides of the relationship in the data. I just have that set up in the database from old system I used to model data.
06-01-2021 11:12 AM
This worked perfectly! Thank you!
05-29-2021 10:30 PM
If you have already created Name1 and Name2 nodes then you should use 'MATCH' in place of 'MERGE'. Expecting that Name1 and Name2 have a property, 'entity' that stores 'person' /'company' values.
Using MERGE statement with only name , the system cannot find Name1 or Name2 node just with name property and hence creates a new nodes for Person and Company. Better approach would have been having nodes with labels 'Person' and 'Company'. Also I prefer to have one relationship between Person and Company as (joe Smith)-[:HAS_EMPLOYER]->(Costco) or (joe Smith)-[:EMPLOYEE_OF]->(Costco)
LOAD CSV FROM "http://localhost:11001/project-ba2a314e-1c09-4a9b-b568-3bc04a75ad42/COMPLETE.csv 1" AS line
MATCH (n1:Name1 {name:line[1]})
MATCH (n2:Name2 {name:line[5]})
WITH n1, n2, line
CALL apoc.merge.relationship(n1,line[1],{},{},n2) YIELD rel
RETURN n1, n2, line
06-01-2021 11:11 AM
Thank you for your input! It looks like I was able to fix it using johnmattgrogan's input.
All the sessions of the conference are now available online