02-04-2023 11:03 AM
I am running the following query.
LOAD CSV WITH HEADERS FROM 'file:///records.csv' AS freq
MATCH (f:sFreq) WHERE f.slfreq = freq.fr
MATCH (d:Date) WHERE d.date = date(freq.dt)
CREATE (f)-[r:STATISTICS_ON]->(d)
I created indexes with the following two queries:
CREATE INDEX unique_id_index FOR (n:sFreq) ON (n.fr)
CREATE INDEX unique_id_index1 FOR (n:Date) ON (n.dt)
I have Date and sFreq nodes. I want to create relationships by matching the frequencies and dates from the CSV.
The CSV has a huge number of records (4,800,000 rows) and the query has been running for more than 8 hours now.
Is there any way to reduce this time? I have multiple CSV files from which I need to create relationships in the same database.
Regards
Vaishali
02-04-2023 01:29 PM
According to the query, you are searching on the properties 'slfreq' and 'date', not 'fr' and 'dt'. The names 'fr' and 'dt' are the column headings in your csv file. As such, I think you want the following indexes:
CREATE INDEX unique_id_index FOR (n:sFreq) ON (n.slfreq)
CREATE INDEX unique_id_index1 FOR (n:Date) ON (n.date)
That may help.
You can see what indexes are being used by looking at the query plan.
https://neo4j.com/docs/cypher-manual/current/query-tuning/query-profile/
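As a minimal sketch of checking the plan, reusing the labels and properties from the question: prefixing the statement with EXPLAIN shows the plan without running the import. A NodeIndexSeek operator for each match means the index is used; a NodeByLabelScan means it is not. Note that LOAD CSV reads every field as a string, so the toFloat() call below is an assumption that slfreq is stored as a number. Drop it if the property is stored as a string.

```cypher
// EXPLAIN only plans the query; nothing is read from the file or written.
EXPLAIN
LOAD CSV WITH HEADERS FROM 'file:///records.csv' AS freq
// Assumption: slfreq is stored as a float, so the CSV string is converted first.
MATCH (f:sFreq) WHERE f.slfreq = toFloat(freq.fr)
MATCH (d:Date) WHERE d.date = date(freq.dt)
CREATE (f)-[:STATISTICS_ON]->(d)
```

Replacing EXPLAIN with PROFILE executes the query and reports the actual rows and database hits per operator, so it is best tried on a small sample file first.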
02-04-2023 01:45 PM
That was a typo. I used slfreq and date. As I was trying various options, I pasted the wrong line here.
Regards
Vaishali
02-04-2023 01:55 PM
How is your csv file organized? Can you group the data so that you can process one frequency at a time with all its dates?
Can you share the query plan?
02-04-2023 02:19 PM
Actually, I can write the query to group the dates by frequency, so you don't have to search for the same frequency many times. That may help speed things up.
02-04-2023 04:16 PM - edited 02-04-2023 08:52 PM
Can you explain how I can write this grouping? I have the same frequencies on multiple dates.
Regards
Vaishali
02-05-2023 02:09 PM
Try this. I collected the dates by frequency in order to do the grouping. Let's see what happens to the memory requirements with the many rows you have:
LOAD CSV WITH HEADERS FROM 'file:///records.csv' AS freq
with freq.fr as freq_fr, collect(distinct freq.dt) as freq_dates
match (f:sFreq{slfreq:freq_fr})
with f, freq_dates
unwind freq_dates as freq_date
match (d:Date{date:date(freq_date)})
create (f)-[r:STATISTICS_ON]->(d)
It's also possible the grouping adds time to the query, negating any benefit. What the query avoids is matching on the same frequency node more than once.
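If memory becomes a problem, one alternative is to commit the import in batches instead of one large transaction. The sketch below assumes Neo4j 4.4 or later, where CALL { ... } IN TRANSACTIONS is available; the batch size of 10000 is an arbitrary starting point, not a tuned value, and the property names are the ones from the original question.

```cypher
// Must run as an implicit (auto-commit) transaction;
// in Neo4j Browser, prefix the statement with :auto.
LOAD CSV WITH HEADERS FROM 'file:///records.csv' AS freq
CALL {
  WITH freq
  MATCH (f:sFreq {slfreq: freq.fr})
  MATCH (d:Date {date: date(freq.dt)})
  CREATE (f)-[:STATISTICS_ON]->(d)
} IN TRANSACTIONS OF 10000 ROWS
```

Each batch is committed separately, so the transaction state held in memory stays bounded. Both matches still need the indexes on slfreq and date to be fast.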
02-05-2023 08:16 PM
│44.475 │-108.16310501098631│{"date":"2017-01-01"}│
│44.475 │-108.16310501098631│{"date":"2017-01-01"}│
│44.475 │-108.16310501098631│{"date":"2017-01-01"}│
│44.475 │-108.16310501098631│{"date":"2017-01-01"}│
│44.475 │-108.16310501098631│{"date":"2017-01-01"}│
02-05-2023 09:44 PM
Since you are using 'create' for the relationships, it will create a new one each time. Have you executed this script multiple times in an effort to troubleshoot it? If so, that would explain the duplicates. I agree with you that you should get one new relationship per row in the csv file.
02-05-2023 10:17 PM
No, I am running it only once. I also tried using MERGE with parallel set to false, but it is still creating duplicate relationships.
02-06-2023 05:53 AM
I saw one post saying that the following can avoid creating multiple relationships.
MATCH (n), (m) WHERE ID(n) = {id1} AND ID(m) = {id2}
SET n.locked = true, m.locked = true
MERGE (n)-[r:TEST]->(m)
SET n.locked = Null, m.locked = Null
What would my query look like with this approach?
02-06-2023 07:24 AM
That will not help, as you are not setting properties on nodes n nor m. The syntax on the 'SET' clause is not correct. You don't include the label in the set operations, just the variable and the property.
Are you sure you don't have multiple rows with the same frequency and date combination? You can change the 'create' to 'merge' so the relationship is not created twice. Just a note: merge is not multi-thread safe, so running the batches in parallel could result in multiple relationships if you have multiple rows with the same combination of frequency and date.
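A minimal sketch of the 'merge' variant, using the property names from the original question:

```cypher
LOAD CSV WITH HEADERS FROM 'file:///records.csv' AS freq
MATCH (f:sFreq {slfreq: freq.fr})
MATCH (d:Date {date: date(freq.dt)})
// MERGE reuses an existing STATISTICS_ON relationship between this f and this d
// and only creates one when none exists yet.
MERGE (f)-[:STATISTICS_ON]->(d)
```

MERGE only deduplicates per pair of nodes. If a MATCH returns several nodes for the same frequency or the same date, you still get one relationship for each matched pair.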
02-06-2023 07:37 AM
Also, using parallel:true when creating relationships can result in locking conditions, which may negatively impact performance. The same is true with parallel:false.
02-06-2023 08:03 AM
Can you run the following to ensure there are no duplicate rows?
LOAD CSV WITH HEADERS FROM 'file:///records.csv' as freq
WITH freq.freq as sfreq, freq.new_date as fdate, freq.mean as fmean, count(*) as cnt
WHERE cnt>1
RETURN *
02-06-2023 08:10 AM
I got this.
02-06-2023 04:13 PM
I can see a new issue now. I have split the CSV into a file with a small number of records.
mean | freq | occup | new_date |
-108.163 | 44.475 | 0.813008 | 1/1/2017 |
-108.433 | 44.478 | 0.813008 | 1/1/2017 |
-107.406 | 44.481 | 0.813008 | 1/1/2017 |
-108.059 | 44.484 | 0.813008 | 1/1/2017 |
-107.676 | 44.487 | 0.813008 | 1/1/2017 |
-107.018 | 44.577 | 0.813008 | 1/1/2017 |
-107.148 | 44.58 | 0.813008 | 1/1/2017 |
1/2/2017 | |||
-108.664 | 50.28 | 0.813008 | 1/3/2017 |
-108.369 | 50.283 | 0.813008 | 1/3/2017 |
-108.856 | 50.286 | 0.813008 | 1/3/2017 |
Now if I create the relationships with the following query
For date 2017-01-01 it creates a total of 203 relationships. It should create only 7 relationships to the same date node.
Can you please let me know what is wrong now?
Regards
Vaishali
02-06-2023 04:58 PM
I think I found the error. I have duplicate date nodes, which are the cause of the problem.
Thanks for all your help. I will get back if any new issues arise.
Regards
Vaishali
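For anyone hitting the same symptom, duplicate Date nodes fit the numbers above: 203 = 7 × 29, which is what you would see if 29 Date nodes shared the date 2017-01-01 and each of the 7 rows matched all of them. A sketch for checking this and guarding against it, assuming Neo4j 4.4 or later for the REQUIRE syntax; the constraint name date_value_unique is made up for illustration:

```cypher
// List date values that are carried by more than one Date node.
MATCH (d:Date)
WITH d.date AS date_value, count(*) AS node_count
WHERE node_count > 1
RETURN date_value, node_count
ORDER BY node_count DESC;

// After the duplicates are removed, prevent new ones.
// The constraint name is hypothetical.
CREATE CONSTRAINT date_value_unique IF NOT EXISTS
FOR (d:Date) REQUIRE d.date IS UNIQUE;
```

The constraint cannot be created while duplicates still exist, so run the first query and clean up before the second statement. A uniqueness constraint is also backed by an index, so it serves the lookups on d.date as well.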