Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-31-2020 11:43 AM
Hello!
I have 2 node labels, one of them has around 750K nodes, and another one with almost 50millions nodes. I have a stand-alone neo4j 3.5 running with 8 core and 96g memory. I am creating a relationship between these node labels using apoc.periodic.itererate
but this query is been running for too long.
Some of the node label properties get refreshed every day and I want to be able to refresh the relationship once the node properties are updated in both labels. The last time when I ran the query to update the relationship, it was running for more than 8 hrs and I had to kill it.
Do you guys have any solution for this to speed up this process? Should I be doing this in a different way? If so, would you please let me know the approach.(I'm new to Neo4j).
Thanks!!
08-31-2020 12:35 PM
call apoc.periodic.iterate("match (i:Inventory) , (p:Products) where p.p3 <> '0' and i.p1= p.p1 OR where p.p3 <> '0' and i.p2= p.p2 return i,p","merge (i) -[:PRODUCES]->(p)",{batchSize:20000, parallel:false})
This is how the query looks like and I have indexes on p1,p2 and p3.
08-31-2020 03:14 PM
First, lest get this one running faster:
MATCH (i:Inventory), (p:Products)
WHERE p.p3 <> '0' AND i.p1= p.p1 OR # this one is gonna *kill* your cpy and ram
WHERE p.p3 <> '0' AND i.p2= p.p2
RETURN i,p
So... let's make sure we only need to do that negation once, and group the rest of the WHERE
clause more carefully
MATCH (p:Products)
WHERE p.p3 <> '0'
WITH p
MATCH (i :Inventory)
WHERE i.p1= p.p1 OR i.p2= p.p2
RETURN i,p
...which will get you to a faster solution:
CALL apoc.periodic.iterate("
MATCH (p:Products) WHERE p.p3 <> '0' WITH p
MATCH (i :Inventory) WHERE i.p1= p.p1 OR i.p2= p.p2
RETURN i,p
","
MERGE (i) -[:PRODUCES]->(p)
",{batchSize:20000, parallel:false})
However, these would probably be better handled when importing data, or changes. When you're updating the inventory, the additional Cypher to adjust these relationships wouldn't have to check the entire DB. I strongly advise taking a very close look at whatever is creating/updating this data, as it will be orders of magnitude more efficient to do it then, instead of after the fact.
08-31-2020 03:41 PM
Thank you @tony.chiboucas!! I'll try your query and see if it helps.
I'm sorry about the formatting. 🙂
I absolutely agree with you. I should be creating this relationship when the Product
nodes are created/updated batch-wise. I am using golang-neo4j
driver and session.WriteTransaction
to write/update 20000 Product nodes at a time using go-routine. I will just plug in this cypher query into that go-routine function.
Since I am still in the testing phase I wanted to give this query a try by running manually in the browser.
Thanks again!
09-01-2020 09:13 AM
Could you share the query/command that is creating/updating the nodes to begin with?
09-01-2020 11:11 AM
sure .. here is it is - I pasted only the data create portions
func InsertToNeo(data interface{}) error {
m := ArrayStructToMap(data)
session, err := NewSession(InfraObj.NeoDriver)
if err != nil {
return err
}
defer session.Close()
_, err = session.WriteTransaction(**execute**(m), neo4j.WithTxTimeout(300*time.Second))
if err != nil {
return err
}
return nil
}
func execute(data interface{}) func(tx neo4j.Transaction) (interface{}, error) {
return func(tx neo4j.Transaction) (interface{}, error) {
result, err := tx.Run(
`unwind $events as event MERGE (a:Products {key: event.key}) SET
a.p1 = event.p1,
a.p2 = event.p2,
a.key = event.key
.
.
.
.
.
a.p10 = event.p10`,
map[string]interface{}{"events": data})
if err != nil {
return nil, err
}
return nil, result.Err()
}
}
09-01-2020 11:13 AM
I am still not sure how to plug these events
(data) to relationship creation query. If you know how to do it please let me know. That would save a ton of my time Thanks in advance!
09-01-2020 02:21 PM
UNWIND $events as event
MERGE (a:Products {key: event.key})
SET
a.p1 = event.p1,
a.p2 = event.p2,
a.key = event.key,
...
# Just add at the end...
WITH a
MATCH (i :Inventory)
WHERE i.p1 = a.p1 OR i.p2 = a.p2
MERGE (i)-[:PRODUCES]->(a)
Note: This won't clean up old connections... so as your data changes, old i - p
relationships will persist. You'll want to come up with a strategy for when/how to clear old :PRODUCES
rels.
09-01-2020 02:28 PM
WOW, I honestly didn't think this is going to be this easy Thanks so much!!!!
Yes, I am going to wipe the DB and reload the data from scratch.
09-01-2020 02:36 PM
Ohh .. a qq - do you think I still need to plug in the below query to my Products update
function? or just enough running this query in the create
function like above?
MATCH (i :Inventory)
WHERE i.p1 = a.p1 OR i.p2 = a.p2
MERGE (i)-[:PRODUCES]->(a)
09-01-2020 02:38 PM
That's the point of doing it during CREATE
, the relationships are created then. No extra work needed. Everything better, faster, stronger.
09-01-2020 02:40 PM
gotcha.. thanks a ton, Sir!
Have a great day!
09-02-2020 12:05 PM
I'm sorry to bother you again.
It seems that it's still taking too long to create the relationship. Below is my code snippet
result, err := tx.Run(
`unwind $events as event MERGE (a:Products {key: event.key}) SET
a.p1 = event.p1,
a.p2 = event.p2,
a.key = event.key,
.
.
.
a.pN = event.pN
WITH a MATCH i:Inventory) WHERE i.p1 = a.p1 OR i.p2 = a.p2
MERGE (i) -[:PRODUCES]->(a)`,
map[string]interface{}{"events": data})
For testing purposes, I am only creating 25 a
records at a time and trying to create PRODUCES
rel with 750K i
records. It's taking forever.... (I just killed the program).
If you have any insight on this please let me know. Thank you!
09-02-2020 02:17 PM
CREATE INDEX inventory_p1 FOR (i:Inventory) ON (i.p1)
CREATE INDEX inventory_p2 FOR (i:Inventory) ON (i.p2)
You're missing a parenthesis: WITH a MATCH i:Inventory)
Isolate code steps, and get the profile of that Cypher.
result, err := tx.Run(
`UNWIND $events AS event CREATE (a:TempEvent) SET a = event`,
map[string]interface{}{"events": data})
Once that's run once, open a Neo4j Browser, and run the next step there, with EXPLAIN
. That'll show you where the bottleneck is.
EXPLAIN
MATCH (a:TempEvent)
MATCH (i:Inventory) WHERE i.p1 = a.p1 OR i.p2 = a.p2
MERGE (i) -[:PRODUCES]->(a)
If you really are trying to copy all $event
properties onto a :Product
node, and an $event
never contains an object in a property, you can simply do the following:
result, err := tx.Run(
`UNWIND $events AS event
MERGE (a:Product {key: event.key}) SET a = event`,
map[string]interface{}{"events": data}
)
If an event does not meet that criteria, I'd give that Class a method which returns the structure you want in Neo that does meet that criteria.
09-02-2020 02:58 PM
Thank you for your time.
$event
does not contain an object and your unwind
just worked fine. I'll follow this for creating new nodes.Here is the EXPLAIN
o/p. I didn't get much out of it. I am reading about EXPLAIN
now and meanwhile, I thought I would share the screenshot here. Thanks!
09-02-2020 03:23 PM
....i just tried below query in the browser without OR
condition.. it finished within 1ms.
MATCH (a:TempEvent)
MATCH (i:Inventory) WHERE i.p1 = a.p2
MERGE (i) -[:PRODUCES]->(a)
09-02-2020 03:37 PM
That profile looks good. It is using the index. These should be running pretty quick, even with such a large dataset.
... and how'd it do when you keep the OR
condition?
09-02-2020 03:42 PM
I had to kill it after 30 mins. I was tracking CPU usage as well while it was running. I did not see it's going beyond 13%
09-02-2020 03:59 PM
Okay, the for whatever reason, it's trying to load all the :Inventory
before applying the OR
to exclude them. Shouldn't be doing that, but that'll be a tough one to debug.
We can work around the problem by never using an OR, and just calling the MATCH ... MERGE
once for each property. The MERGE
prevents creating duplicate rels, so there will only ever be one or none [:PRODUCES]
for each i : p
pair.
result, err := tx.Run(`
UNWIND $events AS event
MERGE (a:Product {key: event.key})
SET a = event
WITH a MATCH (i:Inventory) WHERE i.p1 = a.p1 MERGE (i)-[:PRODUCES]-(a)
WITH a MATCH (i:Inventory) WHERE i.p2 = a.p2 MERGE (i)-[:PRODUCES]-(a)
`,
map[string]interface{}{"events": data}
)
...sorry there's not really a cleaner way to do that without writing your own Neo4j procedure.
Now at least we know why your process was taking forever... it was trying to load your entire :Inventory
into RAM.
09-02-2020 04:06 PM
Ok .. that makes sense. Thanks!!
(My earlier statement was incorrect so deleted it)
result, err := tx.Run(`
UNWIND $events AS event
MERGE (a:Product {key: event.key})
SET a = event
WITH a MATCH (i:Inventory) WHERE i.p1 = a.p1 MERGE (i)-[:PRODUCES]-(a)
WITH a MATCH (i:Inventory) WHERE i.p2 = a.p2 MERGE (i)-[:PRODUCES]-(a)
`,
map[string]interface{}{"events": data}
)
I tried this but it was still hanging and my program wouldn't complete. I killed it and interchanged the position of 2 queries in my code and voila it finished within a few ms.
result, err := tx.Run(`
UNWIND $events AS event
MERGE (a:Product {key: event.key})
SET a = event
WITH a MATCH (i:Inventory) WHERE i.p2 = a.p2 MERGE (i)-[:PRODUCES]-(a)
WITH a MATCH (i:Inventory) WHERE i.p1 = a.p1 MERGE (i)-[:PRODUCES]-(a)
`,
map[string]interface{}{"events": data}
)
I am going to test it against 5000 nodes now. Will see how it behaves.
Thanks!
09-03-2020 11:18 AM
That is very odd. There might be something off with your i.p1 data or index.
You're missing the direction on your :PRODUCES
relationship:
(i)-[:PRODUCES]-(a)
Which leads me to three graph-data-design questions:
:Inventory
or :Product
?i
and p
?The answers to these questions should inform the directionality of the :PRODUCES
relationship, because query performance starts with choosing either in or out relationships from initially selected nodes. When in doubt, point from many to few, or children to parents. For your case, I suspect you have many more Products than Inventory, and each Inventory item "produces" many different Products? If that is the case, you should consider (:Product)-[:PRODUCED_BY]->(:Inventory)
.
Code Standards Note: :Products
shouldn't be plural unless each node represents many products.
09-03-2020 01:55 PM
Sorry, my bad. There were some issues with i.p1
. I fixed it this morning and it's not hanging anymore!!
Your understanding is right. I have many more Products
(~10-20M) that are PRODUCED_BY
by Inventory
nodes.
I am very new to Neo4j and I was reading a blog last week where they recommended to go with one -> many
relationship so I chose Inventory -> Product
.
:Inventory
or :Product
?Inventory
. As I am planning to connect other node labels (Price, Owner, Location.. etc) with my Inventory
as well.Inventory
node and then it will pull all data connected to that inventory node.i
and p
?Inventory
node can produce
many products. Similarly, same product can be produced by more than one Inventory
node.09-03-2020 02:26 PM
Could you link me, so I can go yell at them?
Information Architecture, Entity Markup Language, Type Theory, Object Modeling, and many, many other notation systems which are inter-related all follow a standard rotational system of child --> parent
or many --> one
. For Neo4j and Cypher it doesn't matter much, as long as you stay consistent in your data and queries. However, in logic, object modeling, type theory, and information architecture (which can all inform or be informed by the graph) directionality can have significant impact.
09-24-2020 03:15 AM
Obsessed! This really makes sense and thanks to all parties involved in deriving this solution.
All the sessions of the conference are now available online