08-27-2020 08:53 AM
I would like to remove small networks (connected components) that have fewer than x nodes.
So, if a component has fewer than x nodes, the nodes and the edges that belong to it will be deleted.
Is that doable?
08-28-2020 02:13 AM
Hello @1113
Yeah, it looks possible. Can you show us a little example, with an image, of what you want to keep and what you want to delete?
Regards,
Cobra
08-28-2020 05:34 AM
Hi,
Thank you for your reply
What I would like to remove are the nodes/edges in the red area.
Regards,
08-28-2020 05:50 AM
Can you execute CALL db.schema.visualization() on your database and show us the result, please?
08-28-2020 07:06 AM
I reduced the amount of data so it will probably be clearer.
Below is an example of what I would like to do:
In the green circle are the nodes/edges I would like to keep.
I would like to remove the rest because those components have fewer than 6 nodes.
Attached is the result of CALL db.schema.visualization():
Schema.txt (1.1 KB)
And the csv files that I used as the data source:
Edges.txt (1.3 KB) Nodes.txt (869 Bytes)
Thanks in advance!
08-28-2020 08:20 AM
This query should delete, for example, connected components that have 10 nodes or fewer.
You will need the APOC plugin installed on the database.
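// For each node, build the sorted, deduplicated list of ids in its connected
// component, then batch-delete the components with at most 10 nodes
// (note: isolated nodes are not matched by this pattern).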
MATCH (a)-[*]-(b)
WITH id(a) AS id, apoc.coll.sortText(apoc.coll.toSet(collect(DISTINCT b.id) + [a.id])) AS nodes_list
WITH DISTINCT nodes_list, size(nodes_list) AS size
WHERE size <= 10
WITH nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.id IN $nodes_list
RETURN n
', '
DETACH DELETE n
', {batchSize:1000, iterateList:true, params:{nodes_list:nodes_list}}) YIELD batch, operations
RETURN 1
08-28-2020 02:00 PM
Thank you for your reply.
That looks great. I managed to install APOC and run the query; 1 is returned, so the query seems to have executed successfully. But the nodes and edges are still there:
I feel like this voodoo spell needs to be optimized 😉
08-28-2020 02:31 PM
Can you tell me the labels of your nodes and their properties?
08-28-2020 10:01 PM
08-29-2020 01:25 AM
I created an id property in my examples; that's why I'm asking.
Try this one, it uses the Neo4j id:
MATCH (a)-[*]-(b)
WITH id(a) AS id, apoc.coll.sort(apoc.coll.toSet(collect(DISTINCT id(b)) + [id(a)])) AS nodes_list
WITH DISTINCT nodes_list, size(nodes_list) AS size
WHERE size <= 10
WITH nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE id(n) IN $nodes_list
RETURN n
', '
DETACH DELETE n
', {batchSize:1000, iterateList:true, params:{nodes_list:nodes_list}}) YIELD batch, operations
RETURN 1
Regards,
Cobra
08-29-2020 07:03 AM
It works like a charm.
I will try to understand this query.
Thank you very much!
08-29-2020 07:26 AM
Happy to hear it!
Don't hesitate to ask if you have any trouble understanding my query.
08-30-2020 01:33 PM
Hi,
Thank you for your feedback. I launched the query yesterday on a large database (5 million nodes and 10 million edges) and the process is still running. So, I am not sure this approach will fit my needs.
What I do with Gephi: there is the possibility to run stats on a specific set of data. By running the stats based on network components, you obtain a component ID for each network. Then, you can filter out the networks that are smaller than a certain size. The problem with Gephi is that it can't handle big data.
Would it be possible to do more or less the same thing with Neo4j: first, obtain some stats on the data and then filter out the uninteresting data?
Another approach would be to obtain these stats and, rather than deleting small networks, write a query to obtain the list of specific nodes for all networks greater than a certain size.
I'm not sure I'm being very clear...
Based on your knowledge, what would be the best option to do that with a large database?
Best regards,
08-30-2020 01:55 PM
Hello @1113 😉
First, did you use UNIQUE CONSTRAINTS to create your nodes?
Yeah, your way should also be possible in Neo4j; I will try tomorrow.
Regards,
Cobra
08-30-2020 02:41 PM
Hi Cobra,
I didn't use unique constraints to create the nodes.
I will check the docs to see how to do that.
Looking forward to your feedback.
Best regards,
@1113 😉
08-30-2020 10:55 PM
The UNIQUE CONSTRAINT should speed up the query; it's something to have when you work with Neo4j.
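For reference, a uniqueness constraint looks like this (a sketch, assuming a label Entity with a property id; adjust both names to your schema):
// hypothetical example: enforce uniqueness of the id property on Entity nodes
CREATE CONSTRAINT ON (n:Entity) ASSERT n.id IS UNIQUE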
08-30-2020 11:45 PM
Hi, I added the UNIQUE CONSTRAINT on all the entities (all are unique) and relaunched the query.
Let's see 🙂
Have a great day!
08-30-2020 11:51 PM
Can you tell me which property is unique? That way we can use it in the query I gave you.
Thanks, you too!
08-31-2020 06:58 AM
Hi,
In fact I use several entities: Item1, Item2, ... They don't have any properties except their label, and they are all unique.
Best regards,
08-31-2020 07:15 AM
OK, so we will have to create a community for each size and tag each node with its community in order to delete them, but I don't know if it will be faster. In your case, the problem is that you don't have a unique property; that's why everything takes time, I think.
08-31-2020 07:40 AM
This query will set a community_id property on each node, where the community_id is the size of the network the node belongs to:
MATCH (a)-[*]-(b)
WITH id(a) AS id, apoc.coll.sort(apoc.coll.toSet(collect(DISTINCT id(b)) + [id(a)])) AS nodes_list
WITH DISTINCT nodes_list, size(nodes_list) AS size
WITH size, apoc.coll.flatten(collect(nodes_list)) AS nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE id(n) IN $nodes_list
RETURN n
', '
SET n.community_id = $community_id
', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1
After if you want to delete the connected components that have less than 5 nodes:
CALL apoc.periodic.iterate('MATCH (n) WHERE n.community_id < 5 RETURN n', 'DETACH DELETE n', {batchSize:1000})
Regards,
Cobra
08-31-2020 02:00 PM
The query has been running for a few hours now. I will keep you posted tomorrow 😉
Thank you very much!
09-01-2020 12:57 AM
Hi Cobra,
The query is still running this morning. Is there a way to know what % of the job is done?
Have a great day!
09-01-2020 01:30 AM
Hi @1113
I'm confused because it should not take so long.
Can you give me the configuration of your database?
How many nodes and relationships do you have?
Did you use the Hardware Sizing Calculator to size your database?
With this query, you can get the percentage:
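// ratio of tagged nodes to all nodes: count(a.community_id) only counts nodes
// where the property is already set, so 0.0 means tagging has not started yet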
MATCH (a) RETURN toFloat(count(a.community_id)) / toFloat(count(a)) * 100
Regards,
Cobra
09-01-2020 05:10 AM
Hi Cobra,
Attached is the DB configuration (I used the default config):
Neo4j-conf.txt (36.8 KB)
Regarding the number of nodes/relationships:
5,253,112 nodes (5 labels)
10,260,019 relationships (1 type)
I tried the Hardware Sizing Calculator and here's the result:
Recommended System Requirements:
Number of Cores: 1
Size on Disk: 1.0 GB
Summary:
Number of nodes: 5,000,000
Number of relationships: 10,000,000
Properties per node: 1
Properties per relationship: 1
Estimated graph size on disk: 1.0 GB
Concurrent requests per second: 1
Average request time: 1 ms
The result of the query to obtain the % is 0.0 (strange, isn't it?)
Best regards! 🙂
09-01-2020 05:22 AM
I think you should increase the RAM of your database.
09-01-2020 06:03 AM
I increased:
dbms.memory.heap.max_size=4G
and:
dbms.memory.pagecache.size=2G
Does that make sense?
I just relaunched the query with these new parameters.
Let's try!
09-01-2020 06:19 AM
Ok, nice
You should try to add a unique property, for example create an id equal to the Neo4j id and put a unique constraint on it. Then we could use it in the query, and it should be faster; see the sketch below.
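A minimal sketch of that idea (the property name id and the label Entity are illustrative assumptions):
// copy the internal Neo4j id into a regular property, in batches
CALL apoc.periodic.iterate(
'MATCH (n) WHERE NOT exists(n.id) RETURN n',
'SET n.id = id(n)',
{batchSize:10000})
// then make it unique and indexed, per label:
// CREATE CONSTRAINT ON (n:Entity) ASSERT n.id IS UNIQUE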
09-01-2020 06:51 AM
I'm not sure I understand. At the moment I have:
Entity:ID,description:LABEL
232ecace75a347258eb690c045322173,Item1
b4ca6276726c4bd9997ca2650b7177b0,Item2
Do you mean I should have something like:
Entity:ID,description:LABEL,Property:PROPERTY
232ecace75a347258eb690c045322173,Item1,UniqueString1
b4ca6276726c4bd9997ca2650b7177b0,Item2,UniqueString2
If so, can I use the concatenation of the ID and the label? So I would have:
Entity:ID,description:LABEL,Property:PROPERTY
232ecace75a347258eb690c045322173,Item1,232ecace75a347258eb690c045322173Item1
b4ca6276726c4bd9997ca2650b7177b0,Item2,b4ca6276726c4bd9997ca2650b7177b0Item2
09-01-2020 06:53 AM
Is Entity:ID unique for each node?
Can you execute CALL db.schema.visualization() on your database and show us the screenshot, please?
Can you also take a screenshot of your labels and properties on the left, please?
09-01-2020 12:27 PM
I reimported the data, structured as:
Entity:ID,UniqEntity,description:LABEL
e53628fb3f714cbc9eb2546cecc7064c,e53628fb3f714cbc9eb2546cecc7064c,Item1
34c075e8781244bdb933c3539cdf167c,34c075e8781244bdb933c3539cdf167c,Item3
I added a constraint on UniqEntity:
CREATE CONSTRAINT ON (l:UniqEntity) ASSERT l.id IS UNIQUE
(I hope the syntax is OK; it seems to be.)
And here's the screenshot:
Have a nice evening
09-01-2020 12:30 PM
Test this query:
MATCH (a)-[*]-(b)
WITH id(a) AS id, apoc.coll.sort(apoc.coll.toSet(collect(DISTINCT b.id)) + [a.id])) AS nodes_list
WITH DISTINCT nodes_list, size(nodes_list) AS size
WITH size, apoc.coll.flatten(collect(nodes_list)) AS nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.id IN $nodes_list
RETURN n
', '
SET n.community_id = $community_id
', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1
After if you want to delete the connected components that have less than 5 nodes:
CALL apoc.periodic.iterate('MATCH (n) WHERE n.community_id < 5 RETURN n', 'DETACH DELETE n', {batchSize:1000})
09-01-2020 03:12 PM
I feel like my new import is not correct because I get the following message:
Invalid input ')': expected whitespace, '.', node labels, '[', '^', '*', '/', '%', '+', '-', "=~", IN, STARTS, ENDS, CONTAINS, IS, '=', '~', "<>", "!=", '<', '>', "<=", ">=", AND, XOR, OR, AS, ',', ORDER, SKIP, LIMIT, WHERE, FROM GRAPH, USE GRAPH, CONSTRUCT, LOAD CSV, START, MATCH, UNWIND, MERGE, CREATE UNIQUE, CREATE, SET, DELETE, REMOVE, FOREACH, WITH, CALL, RETURN, UNION, ';' or end of input (line 2, column 83 (offset: 100))
"WITH id(a) AS id, apoc.coll.sort(apoc.coll.toSet(collect(DISTINCT b.id)) + [a.id])) AS nodes_list"
What is strange is that I don't have any special characters in the csv file (except ',' as the separator).
So, I'm a bit confused.
09-02-2020 01:59 AM
Can you tell me what the difference is between your 3 properties (Entity, UniqEntity, and id)?
There was a syntax error; try this:
MATCH (a)-[*]-(b)
WITH id(a) AS id, apoc.coll.sortText(apoc.coll.toSet(collect(DISTINCT b.id) + [a.id])) AS nodes_list
WITH DISTINCT nodes_list, size(nodes_list) AS size
WITH size, apoc.coll.flatten(collect(nodes_list)) AS nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.id IN $nodes_list
RETURN n
', '
SET n.community_id = $community_id
', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1
Regards,
Cobra
09-02-2020 02:17 AM
Hi Cobra,
Thank you, I just relaunched the Query.
Regarding your question, based on my CSV file:
Entity:ID,UniqEntity,description:LABEL
e53628fb3fy14cbc9eb2546cecc70645,e53628fb3fy14cbc9eb2546cecc70645,Item1
Have a great day!
09-02-2020 02:27 AM
Why did you not use Entity:ID for the unique constraint instead of duplicating it?
Have a great day too!
09-02-2020 03:24 AM
Ah sorry, I didn't understand correctly. So "UniqEntity" is useless; I'm going to remove it, put a constraint on ID, and relaunch the query.
Can we keep the same query or should it be updated?
Best regards,
09-02-2020 03:29 AM
If the property name is id, you can use this one:
MATCH (a)-[*]-(b)
WITH id(a) AS id, apoc.coll.sortText(apoc.coll.toSet(collect(DISTINCT b.id) + [a.id])) AS nodes_list
WITH DISTINCT nodes_list, size(nodes_list) AS size
WITH size, apoc.coll.flatten(collect(nodes_list)) AS nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.id IN $nodes_list
RETURN n
', '
SET n.community_id = $community_id
', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1
09-02-2020 03:54 AM
This requires more memory than the DB size, as Cypher needs to keep all these paths in memory, so it can kick off garbage collection.
Have you tried the weakly connected components algorithm in GDS?
https://neo4j.com/docs/graph-data-science/current/algorithms/wcc/
Maybe this can help identify the weakly connected components. Once you run it, you can identify the smaller communities and delete them.
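For instance, the component-size distribution can be inspected first with something like this (a sketch; the Entity/DEPENDS projection names are assumptions, substitute your own label and relationship type):
// stream WCC results and count nodes per component, largest first
CALL gds.wcc.stream({nodeProjection: 'Entity', relationshipProjection: 'DEPENDS'})
YIELD nodeId, componentId
RETURN componentId, count(*) AS size
ORDER BY size DESC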
09-02-2020 04:09 AM
Oh, I forgot about this one.
Yeah, it could work.
09-02-2020 04:45 AM
Thank you for the suggestion, I'm going to try that.
Best regards!
09-02-2020 05:13 AM
I found this:
CALL gds.wcc.stream({
nodeProjection: "Library",
relationshipProjection: "DEPENDS_ON"
})
YIELD nodeId, componentId
RETURN componentId, collect(gds.util.asNode(nodeId).id) AS libraries
ORDER BY size(libraries) DESC;
Would that be a good starting point?
09-02-2020 08:22 AM
Yeah, good start!
If you want to do everything in one go (you may have to change the nodeProjection and the relationshipProjection), this query will delete communities that have fewer than 6 nodes:
CALL gds.wcc.stream({
nodeProjection: "Item",
relationshipProjection: "BELONGS_TO"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH size(libraries) AS size, libraries
WHERE size < 6
WITH apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.id IN $nodes_list
RETURN n
', '
DETACH DELETE n
', {batchSize:1000, params:{nodes_list:nodes_list}}) YIELD batch, operations
RETURN 1
If you want to do it in two steps:
CALL gds.wcc.stream({
nodeProjection: "Item",
relationshipProjection: "BELONGS_TO"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.id IN $nodes_list
RETURN n
', '
SET n.community_id = $community_id
', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1
CALL apoc.periodic.iterate('MATCH (n) WHERE n.community_id < $community_id RETURN n', 'DETACH DELETE n', {batchSize:1000, params:{community_id:6}})
Regards,
Cobra
09-03-2020 01:47 AM
Hi,
I tried to delete all the nodes, in order to do another import, with the following command:
match (a) -[r] -> () delete a, r
And after a while, I got this error message :
Neo.DatabaseError.Transaction.TransactionCommitFailed
Makes me think of a DB settings issue (maybe the root cause of the issue with the query not ending?).
My settings:
dbms.directories.import=import
dbms.security.auth_enabled=true
dbms.memory.heap.initial_size=512m
dbms.memory.heap.max_size=4G
dbms.memory.pagecache.size=2G
dbms.tx_state.memory_allocation=ON_HEAP
dbms.connector.bolt.enabled=true
dbms.connector.http.enabled=true
dbms.connector.https.enabled=false
dbms.security.procedures.unrestricted=apoc.*
dbms.jvm.additional=-XX:+UseG1GC
dbms.jvm.additional=-XX:-OmitStackTraceInFastThrow
dbms.jvm.additional=-XX:+AlwaysPreTouch
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions
dbms.jvm.additional=-XX:+TrustFinalNonStaticFields
dbms.jvm.additional=-XX:+DisableExplicitGC
dbms.jvm.additional=-XX:MaxInlineLevel=15
dbms.jvm.additional=-Djdk.nio.maxCachedBufferSize=262144
dbms.jvm.additional=-Dio.netty.tryReflectionSetAccessible=true
dbms.jvm.additional=-Djdk.tls.ephemeralDHKeySize=2048
dbms.jvm.additional=-Djdk.tls.rejectClientInitiatedRenegotiation=true
dbms.jvm.additional=-XX:FlightRecorderOptions=stackdepth=256
dbms.jvm.additional=-XX:+UnlockDiagnosticVMOptions
dbms.jvm.additional=-XX:+DebugNonSafepoints
dbms.windows_service_name=neo4j
Does that seem OK to you?
Have a great day!
09-03-2020 01:49 AM
I tried another time and got:
Neo.DatabaseError.Statement.ExecutionFailed
Java heap space
09-03-2020 01:55 AM
To delete everything in the database, you should use:
CALL apoc.periodic.iterate('MATCH (n) RETURN n', 'DETACH DELETE n', {batchSize:1000})
09-03-2020 02:32 AM
Thank you!
BTW, I increased dbms.memory.heap.max_size to 16G and the delete query completed.
09-03-2020 04:01 AM
It's another way, but it's always better to use the query I gave you.
09-03-2020 06:02 AM
Hi Cobra,
The query is successful but it seems no nodes are deleted. The query also terminates very fast.
Query used:
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "DEPENDS"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH size(libraries) AS size, libraries
WHERE size < 16
WITH apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.id IN $nodes_list
RETURN n
', '
DETACH DELETE n
', {batchSize:1000, params:{nodes_list:nodes_list}}) YIELD batch, operations
RETURN 1
09-03-2020 06:04 AM
Can you show me what is returned by:
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "DEPENDS"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH size(libraries) AS size, libraries
RETURN *
09-03-2020 06:20 AM
Sure. Here's the result:
09-03-2020 06:24 AM
Can you show me your properties on the right, please? And tell me which one is unique, please.
09-03-2020 06:41 AM
Sure. Here you go:
Here are the headers of my csv file:
Entity:ID,description:LABEL
Entity and ID are the same data. I added a unique constraint on "Entity", even though I guess it is done automatically because Entity is used as the ID.
Best regards
09-03-2020 06:48 AM
What is returned by:
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "DEPENDS"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).Entity) AS libraries
WITH size(libraries) AS size, libraries
RETURN *
09-03-2020 06:54 AM
Same result:
09-03-2020 06:56 AM
Try this:
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "DEPENDS"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId)) AS libraries
RETURN *
09-03-2020 07:15 AM
Same result, no data returned 🙂
09-03-2020 07:34 AM
There is something weird...
And this one?
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "DEPENDS"
})
YIELD nodeId, componentId
RETURN *
09-03-2020 07:54 AM
Same.
I also tried:
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "DEPENDS"
})
YIELD nodeId
RETURN *
and
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "DEPENDS"
})
YIELD componentId
RETURN *
but the same: no output.
09-04-2020 12:32 AM
I have no idea...
You made it work before and now it's not working anymore...
Is GDS still installed?
Anyway, the queries in my message should solve the main issue, but maybe you will have to change the property depending on which one is unique.
Regards,
Maxime
09-04-2020 05:52 AM
Hi Maxime,
Yes, GDS is still installed. Another approach would be, for each node, to associate a network componentId and the number of nodes in that component. Do you think that would be possible?
Best regards
09-04-2020 06:43 AM
I'm sorry, but I already gave you the two approaches in a previous message, and both queries work on my local database using the same labels and properties as yours.
I don't know what to try anymore; maybe create a completely new database and retry. It was working on your database and now it doesn't...
09-04-2020 07:24 AM
The first query works, but not on a large dataset. I will keep searching for why the second approach doesn't work. Anyway, I would like to thank you very much for your precious help.
I will keep you posted if I find a solution 😉
09-09-2020 03:09 AM
Hi Maxime,
Good news, I found the source of the issue: my csv headers were not correct.
I fixed that, but the query is still very, very slow. I started it yesterday evening and this morning it was still running. I will create a new thread for this point with details 😉
Have a great day!
09-09-2020 03:17 AM
Oh nice @1113
Even with the GDS query it's slow?
Did you use UNIQUE CONSTRAINTS and change the query to use the unique property?
09-09-2020 04:21 AM
I applied a unique constraint:
CREATE CONSTRAINT ON (l:Entity) ASSERT l.EntityID IS UNIQUE
But I'm not 100% sure it is correctly reflected in the query:
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.EntityID IN $nodes_list
RETURN n
', '
SET n.community_id = $community_id
', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1
I use just one label: Entity.
EntityID is unique; it is the ID of the label.
Here's my csv for nodes:
EntityID:ID,description,:LABEL
232ec2ce7ea347258eb640c345322173,Item1,Entity
And the csv for edges:
Source:START_ID,Target:END_ID,:TYPE
e53628fb3f414cbc9eb2546cedc70645,34c073e8781244bdb934c3539cdf1674,IRW
09-09-2020 09:41 AM
Yeah, the query is good.
The only option I see now is to increase the power (RAM, CPU) of the database.
But at least you have two queries that work on smaller databases.
Regards,
Cobra
09-10-2020 01:17 AM
Hi Cobra,
Yes, and thank you once again for your help!
Best regards!
09-10-2020 03:32 PM
Hi Cobra, I have reduced the number of nodes, increased the resources on the machine, and the query completed in less than 2 hours.
That's really cool!
All the best!
12-14-2020 03:33 AM
Hi Cobra,
I need to identify each network component. I use this query to segment the network components by size:
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH size(libraries) AS size, libraries
WITH size, apoc.coll.flatten(collect(libraries)) AS nodes_list
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.EntityID IN $nodes_list
RETURN n
', '
SET n.community_id = $community_id
', {batchSize:1000, params:{nodes_list:nodes_list, community_id:size}}) YIELD batch, operations
RETURN 1
This part works like a charm.
I tried to add a uuid with apoc.create.uuid(), but I did something wrong: the same uuid ends up being assigned to all components of the same size. Here's the query that I ran:
MATCH (n)
WITH n.community_id as p, collect(n) as nodes
WITH p, nodes, apoc.create.uuid() as uuid
FOREACH (n in nodes | SET n.uuid = uuid)
Do you know how I could have one uuid per network component, so that every network component (even components of the same size) has a different uuid?
Thanks in advance!
12-14-2020 12:50 PM
Hello @1113
You could just use the same query but with a little modification:
CALL gds.wcc.stream({
nodeProjection: "Item",
relationshipProjection: "BELONGS_TO"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.id IN $nodes_list
RETURN n
', '
SET n.uuid = $uuid
', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1
Regards,
Cobra
12-14-2020 04:08 PM
Thank you, it looks very promising.
The query runs successfully (a lot of 1s returned) but it seems the uuid property is not set:
After the query, uuid doesn't appear in the property keys.
After the query, running MATCH (n) RETURN n LIMIT :
{"EntityID":"80f99c52240f432fbe396b091dedb0d6","community_id":6,"Description":"Entity1"}
...
I modified the query a bit to match my schema (renaming the nodeProjection and relationshipProjection).
Below is the query that I ran.
I would say that the variable $nodes_list is empty, but I'm really not sure. Any idea?
Thanks in advance!
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).id) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.id IN $nodes_list
RETURN n
', '
SET n.uuid = $uuid
', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1
12-15-2020 12:30 AM
You didn't replace the id property with EntityID:
CALL gds.wcc.stream({
nodeProjection: "Entity",
relationshipProjection: "IRW"
})
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).EntityID) AS libraries
WITH componentId, libraries, apoc.create.uuid() AS uuid
CALL apoc.periodic.iterate('
MATCH (n)
WHERE n.EntityID IN $nodes_list
RETURN n
', '
SET n.uuid = $uuid
', {batchSize:1000, params:{nodes_list:libraries, uuid:uuid}}) YIELD batch, operations
RETURN 1
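As a quick sanity check afterwards, something like this should show one distinct uuid per connected component (a sketch, assuming the Entity label from above):
// count nodes per uuid; each component should end up with its own uuid
MATCH (n:Entity)
RETURN n.uuid AS uuid, count(*) AS nodes
ORDER BY nodes DESC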
12-15-2020 01:10 AM
Brilliant! It works like a charm! Many thanks!