
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

How to export data with the path id of each path in Neo4j

nishspeak

I have created a graph with 3 paths as below.

Data:
a,b
b,c
d,e
f,e
g,h

Graph
a->b->c
d->e<-f
g->h

Desired output
a,uuid_1
b,uuid_1
c,uuid_1
d,uuid_2
e,uuid_2
f,uuid_2
g,uuid_3
h,uuid_3

Note: I have 50 million nodes.


Wouldn't g and h return just 78, in keeping with the pattern? You have no 'node 9' from what I can see.

Aside from that, what is the problem you're actually trying to solve? There may be simpler options.

Actually, in the desired output the second column can be any unique id (a UUID), not something like a node id.
I just want to assign a unique id to each graph created in the database so I can export it to CSV.

Assigning an id to each node as you create them and ensuring the ids are unique would be straightforward:

// create a unique constraint (Neo4j 4.x syntax; Neo4j 5 uses FOR (n:Node) REQUIRE n.id IS UNIQUE)
CREATE CONSTRAINT unique_id ON (n:Node) ASSERT n.id IS UNIQUE

// set id as you create nodes
CREATE (n:Node) SET n.id = 1

But I still get the feeling that's not what you're looking for. Can you try rephrasing your question?

I want to assign a unique id at the group level, not the node level. Here, 3 groups have been created.
Each group will have a separate unique id, and each node of a group will share the unique id of its group. It can be added as a new property on the nodes.

I have made some changes in my desired output.
Thanks in advance.

With 50 million nodes, there must be a large number of groups too I guess?

What defines a group? I mean, what is the underlying logic you use to turn a->b->c into group 1?

Do you have group nodes with properties besides an id, or is it just a way of encapsulating the path a->b->c?

With 50 million nodes, there must be a large number of groups too I guess?
yes
What defines a group? I mean, what is the underlying logic you use to turn a->b->c into group 1?
yes
Do you have group nodes with properties besides an id, or is it just a way of encapsulating the path a->b->c?
What do you mean by group nodes?

Are there any (:Group) nodes in your graph? What defines a group from an outside perspective?

You could for example have (g:Group) where g.uuid = 1, and relate nodes (a), (b) and (c) to (g) somehow:

(a)-[:BELONGS_TO]->(g)

or alternatively attach a uuid property to the relationships you already have:

(a)-[:IS_GROUPED_WITH {uuid:1}]->(b)-[:IS_GROUPED_WITH {uuid:1}]->(c)

but it really comes down to how you plan out your data model.

I have only one type of node.

My logic:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS
FROM 'file:///headers.csv' as line
MERGE (per1:person1 {person1: line.p1})
MERGE (per2:person1 {person1: line.p2})
CREATE (per1)-[:knows]->(per2)

Your label & property names are a bit confusing; having a 'person1' property on a 'person1' node will be hard to manage. You'll also probably have an easier time if you use the Neo4j conventions (node labels begin with an uppercase letter, relationship types are all uppercase):

MERGE (p1:Person {id: line.p1})
MERGE (p2:Person {id: line.p2})
CREATE (p1)-[:KNOWS]->(p2)

But to try and solve your original issue - it looks like each "group" of people appears on 1 line from your CSV? If that's the case, and there is a property on each line to indicate the group number (e.g. p0), you could do:

MERGE (p1:Person {id: line.p1, groupId: line.p0})
MERGE (p2:Person {id: line.p2, groupId: line.p0})
CREATE (p1)-[:KNOWS]->(p2)

If there is no group id available, you could use apoc.load.csv to get a unique line number for each row of your csv, and make that the stand-in group id:

CALL apoc.load.csv('file:///headers.csv')
// map gives header-keyed access (line.p1); list would only allow line[0]
YIELD lineNo, map AS line
MERGE (p1:Person {id: line.p1, groupId: lineNo})
MERGE (p2:Person {id: line.p2, groupId: lineNo})
CREATE (p1)-[:KNOWS]->(p2)

Thanks a lot for the solution!

Actually, I have already loaded the file, relationships included, using the Neo4j import tool.
Now I just want to export the data with a group id, as below (in an optimized way).

node,group_id
a,uuid_1
b,uuid_1
c,uuid_1
d,uuid_2
e,uuid_2
f,uuid_2
g,uuid_3
h,uuid_3

The approach you are suggesting would take too long to load.

If all of your nodes & relationships already exist in the graph, and you have no existing value for the group ids (they just need to be unique), you can use the apoc.path.subgraphNodes procedure to identify each unique cluster, and then label them with a randomly generated UUID (through another apoc function) to indicate their group:

match (p:Person) where p.groupId is null
with p, apoc.create.uuid() as newGroupId
call apoc.path.subgraphNodes(p, {relationshipFilter:"KNOWS", labelFilter:"Person"}) yield node as sibling
set p.groupId = newGroupId, sibling.groupId = newGroupId
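To see what this query does, the same idea can be sketched in plain Python on the example edge list from the question: for each node that has no group yet, collect everything reachable over KNOWS and give the whole cluster one new id. Because nodes that already have an id are skipped, each cluster is walked once. This is an in-memory illustration that assumes the edges fit in a Python list, not how APOC implements subgraphNodes; `assign_group_ids` is a hypothetical name.

```python
import uuid
from collections import defaultdict, deque


def assign_group_ids(edges):
    """Map every node to one shared id per connected cluster (hypothetical helper)."""
    # Undirected adjacency: like relationshipFilter "KNOWS" with no arrow,
    # relationships are followed in both directions.
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    group_id = {}
    for start in list(adjacency):
        if start in group_id:  # mirrors "WHERE p.groupId IS NULL"
            continue
        new_id = str(uuid.uuid4())  # mirrors apoc.create.uuid()
        group_id[start] = new_id
        queue = deque([start])
        # Breadth-first walk over the whole cluster, like subgraphNodes.
        while queue:
            node = queue.popleft()
            for sibling in adjacency[node]:
                if sibling not in group_id:
                    group_id[sibling] = new_id
                    queue.append(sibling)
    return group_id


edges = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "e"), ("g", "h")]
groups = assign_group_ids(edges)
for node in sorted(groups):
    print(f"{node},{groups[node]}")
```

The printed ids are random, but a, b and c always share one id, d, e and f share another, and g and h a third.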

The above solution works fine on a small dataset,
but on a big dataset (50 million nodes) it runs forever.

I can load the CSV again if Neo4j has a better option.

Thanks a lot for the reply.

One of the periodic execution functions in apoc can probably help with that.

Thanks a lot, let me try this one.

nishspeak

I really appreciate your help.

I have made a minor change to the query, replacing the UUID with the node's id.

In the apoc.periodic.commit procedure, what does the limit clause mean?
In my case it only runs for 10000 nodes, the value I passed as the limit, but it is supposed to run for all nodes in batches of that size.

call apoc.periodic.commit(
'match (p:Person) where p.groupId is null with p limit $limit
call apoc.path.subgraphNodes(p, {relationshipFilter:"KNOWS", labelFilter:"Person"}) yield node as sibling
set p.groupId = id(p), sibling.groupId = id(p)', {limit:10000});

Thanks in advance.
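Going by the APOC documentation (apoc.periodic.commit "runs the given statement in separate transactions until it returns 0") and the fix reported later in this thread, the stopping rule explains the 10000-node behaviour: a statement that ends in SET returns no row, which is treated as 0, so the loop stops after the first batch. The small Python simulation below shows that control flow; it is not APOC's code, and `FakeStatement` and `periodic_commit` are hypothetical names. `FakeStatement` plays the role of the Cypher query with or without a trailing `RETURN count(*)`.

```python
class FakeStatement:
    """Hypothetical stand-in for the Cypher statement: labels up to $limit nodes per run."""

    def __init__(self, unlabelled, returns_count):
        self.unlabelled = unlabelled
        self.returns_count = returns_count

    def __call__(self, params):
        processed = min(params["limit"], self.unlabelled)
        self.unlabelled -= processed
        # With "RETURN count(*)" the batch size is reported; without RETURN, nothing is.
        return processed if self.returns_count else None


def periodic_commit(statement, params):
    """Re-run `statement` until it reports 0 updates; return how many runs happened."""
    runs = 0
    while True:
        runs += 1  # each run stands for one separate transaction
        updates = statement(params) or 0  # no result row is treated as 0
        if updates == 0:
            return runs


no_return = FakeStatement(25_000, returns_count=False)
print(periodic_commit(no_return, {"limit": 10_000}), no_return.unlabelled)  # 1 15000

with_return = FakeStatement(25_000, returns_count=True)
print(periodic_commit(with_return, {"limit": 10_000}), with_return.unlabelled)  # 4 0
```

With a returned count, the final run matches no unlabelled nodes, `count(*)` over zero rows is 0, and the loop ends with every node processed.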

Issue resolved.

I just added a RETURN clause at the end.

Thanks a lot for the solution.

I tried one more thing, and this one is also working fine,
but I need to check its performance.

CALL algo.unionFind('Person', 'KNOWS',
  {write: true, writeProperty: 'groupId'})
YIELD nodes RETURN nodes
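algo.unionFind computes weakly connected components (relationship direction is ignored), which is the grouping the question asks for; in the Graph Data Science library its successor is gds.wcc. For intuition on why a single union-find pass tends to beat one traversal per start node, here is a minimal disjoint-set sketch in Python that reads the example edge list once and writes the node,group_id CSV from the question. The `uuid_N` labels are sequential placeholders matching the desired output, not real UUIDs. `DisjointSet` and `export_groups` are hypothetical names. At 50 million nodes, Python dicts like these would be very memory-hungry, so treat this as an illustration only.

```python
import csv
import io


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen nodes as singleton sets.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Path halving: point every other node on the path at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        # Attach the smaller tree under the larger one to keep trees shallow.
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]


def export_groups(edges):
    """Return CSV text with one node,group_id row per node."""
    ds = DisjointSet()
    for src, dst in edges:  # edge direction is ignored (weakly connected)
        ds.union(src, dst)

    labels = {}  # root -> placeholder label, numbered in order of first appearance
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["node", "group_id"])
    for node in sorted(ds.parent):
        root = ds.find(node)
        if root not in labels:
            labels[root] = f"uuid_{len(labels) + 1}"
        writer.writerow([node, labels[root]])
    return out.getvalue()


edges = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "e"), ("g", "h")]
print(export_groups(edges), end="")
```

Each edge is processed once, with near-constant amortized cost per find/union, which is the property that makes union-find attractive for very large graphs.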