12-06-2021 11:41 PM
Hi all,
It's me again.
I have some trouble loading a dataset that has over 10 million rows.
For now, this is what my query looks like:
:auto USING PERIODIC COMMIT 5500
//Load the has_positions table
LOAD CSV WITH HEADERS
FROM 'file:///has_positions.csv'
AS row
MERGE (p:POSITIONS {id: toInteger(row.device_id)})
ON CREATE SET
p.position_id = row.position_id,
p.device_id = row.device_id,
p.latitude = row.latitude,
p.longitude = row.longitude,
p.time = row.time
but it is taking more than 2 hours. I have added the following constraint:
CONSTRAINT ON (positions:POSITIONS) ASSERT (positions.device_id) IS UNIQUE
My desired outcome would be to have all my POSITIONS nodes in an interlinked chain of events sequenced by time, but I have no idea how to accomplish it.
Any suggestions would be helpful.
12-07-2021 06:29 AM
Hi @DMF194 ,
MERGE, SET, and CONSTRAINT are using inconsistent property names and values for device_id.

In the MERGE you are using the property name id with the value of row.device_id converted to an integer. Probably this should be changed to MERGE (p:POSITIONS {device_id: toInteger(row.device_id)}).

In the SET you are setting p.device_id to row.device_id as a string (without conversion to an integer). Probably this should be changed to p.device_id = toInteger(row.device_id).

In the CONSTRAINT you are using device_id. Probably that is correct.
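Putting those three fixes together, the corrected load might look like this (a sketch, keeping your original batch size and column names; note that device_id no longer needs an ON CREATE SET because it is now the merge key):

```cypher
:auto USING PERIODIC COMMIT 5500
LOAD CSV WITH HEADERS FROM 'file:///has_positions.csv' AS row
// Merge on the same property the unique constraint covers
MERGE (p:POSITIONS {device_id: toInteger(row.device_id)})
ON CREATE SET
  p.position_id = row.position_id,
  p.latitude    = row.latitude,
  p.longitude   = row.longitude,
  p.time        = row.time
```

With the merge key and the constraint aligned, each MERGE becomes an index lookup instead of a label scan, which is what makes the difference at 10M+ rows.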
Try that.
Best,
ABK
12-07-2021 05:58 PM
Hi @abk ,
Solved! I was definitely not consistent with my naming of property keys.
It took only 8 minutes to run for 14 million rows
In addition, are there any general examples of how to connect nodes in sequence, ordered by datetime?
Any suggestions would be appreciated.
12-07-2021 06:45 PM
Regarding connecting nodes in sequence ordered by datetime: I don't see a datetime value in your MERGE.
But I have had the same concern, for example when loading posts similar to these here at community.neo4j.com into Neo4j, where a single post might have N replies. One option is to build out a sequence, for example
(n:Post {id:1}) --> (n:Reply {id:1}) --> (n:Reply {id:2}) --> (n:Reply {id:3}) --> (n:Reply {id:4})
but I chose not to build out an N-relationship path, and rather simply had
(n:Post {id:1}) --> (n:Reply {id:1})
(n:Post {id:1}) --> (n:Reply {id:2})
(n:Post {id:1}) --> (n:Reply {id:3})
(n:Post {id:1}) --> (n:Reply {id:4})
and then ran Cypher along the lines of MATCH (n:Post {id:1})--(n2:Reply) RETURN n2 ORDER BY n2.datetime
so as to get the order.
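A minimal sketch of that fan-out-then-sort approach (the HAS_REPLY relationship type and the sample datetimes are my own illustration, not from a real dataset):

```cypher
// Fan-out model: each reply attaches directly to its post; no chain is maintained
CREATE (post:Post {id: 1})
CREATE (post)-[:HAS_REPLY]->(:Reply {id: 1, datetime: datetime('2021-12-01T09:00')})
CREATE (post)-[:HAS_REPLY]->(:Reply {id: 2, datetime: datetime('2021-12-01T10:30')})
CREATE (post)-[:HAS_REPLY]->(:Reply {id: 3, datetime: datetime('2021-12-01T11:15')});

// Recover the sequence at query time by sorting on datetime
MATCH (:Post {id: 1})-[:HAS_REPLY]->(r:Reply)
RETURN r.id, r.datetime
ORDER BY r.datetime;
```

The trade-off: writes stay simple (no chain to splice when a reply arrives), at the cost of an ORDER BY on every read.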
12-09-2021 08:52 AM
Substitute your date attribute for the n.order attribute to enforce the sort order you want and change the relationship type to yours. This will chain them by earliest to latest dates. Add DESC to the order by clause if you want them from latest to earliest. Also, add a where clause to the leading match clause if you don't want all the nodes chained together.
// Sort all nodes, then link each one to its successor
MATCH (n)
WITH n
ORDER BY n.order
WITH collect(n) AS list
WITH list, range(0, size(list) - 2) AS indexes
UNWIND indexes AS i
MATCH (x) WHERE id(x) = id(list[i])
MATCH (y) WHERE id(y) = id(list[i + 1])
MERGE (x)-[:RELATIONSHIP_TYPE]->(y)
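Applied to the POSITIONS nodes from the original question, the same pattern might look like this (a sketch; the :NEXT relationship type is my own choice, and the collected nodes are carried through WITH directly, so the re-MATCH by id() is unnecessary):

```cypher
// Chain POSITIONS nodes from earliest to latest time
MATCH (p:POSITIONS)
WITH p ORDER BY p.time
WITH collect(p) AS positions
UNWIND range(0, size(positions) - 2) AS i
WITH positions[i] AS a, positions[i + 1] AS b
MERGE (a)-[:NEXT]->(b)
```

Afterwards, a path query such as MATCH path = (first:POSITIONS)-[:NEXT*]->(last) can walk the chain in time order without sorting.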
12-09-2021 09:50 AM
@DMF194 I found the query below loads fast for me; posting here in case it's helpful:
call apoc.periodic.iterate('
  call apoc.load.csv("file:///has_positions.csv")
  yield map as row
  return row
','
  MERGE, SET queries
', {batchSize:10000, iterateList:true, parallel:true})
some other documentation is here - apoc.periodic.iterate - APOC Documentation
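Filled in with the corrected MERGE from earlier in the thread, the call might look like this (a sketch; I use parallel:false here because several CSV rows can touch the same device_id node, and parallel MERGE on the same node risks lock contention):

```cypher
CALL apoc.periodic.iterate(
  'CALL apoc.load.csv("file:///has_positions.csv") YIELD map AS row RETURN row',
  'MERGE (p:POSITIONS {device_id: toInteger(row.device_id)})
   ON CREATE SET
     p.position_id = row.position_id,
     p.latitude    = row.latitude,
     p.longitude   = row.longitude,
     p.time        = row.time',
  {batchSize: 10000, parallel: false}
)
```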
12-13-2021 12:00 AM
@mengjiakang2017 Thanks, appreciate the additional resources for importing large CSV files
12-16-2021 06:43 PM
Hi @mengjiakang2017,
I just tried out your apoc.periodic.iterate but I get the following error:
There is no procedure with the name apoc.periodic.iterate registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.
I am unable to find resources on this; even the Neo4j docs don't clearly explain it.
12-17-2021 07:48 AM
Hi @DMF194 ,
APOC is a commonly used Neo4j plugin which provides many super useful functions and procedures. It must be installed and enabled on the Neo4j DBMS.
For details, see Awesome Procedures On Cypher (APOC) - Neo4j Labs
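In Neo4j Desktop, APOC can usually be installed from the Plugins tab of your database. On a server install, the jar goes in the plugins directory, and a line like the following in neo4j.conf is often needed as well (the exact setting name varies between Neo4j versions, so check the APOC docs for yours):

```
# Allow APOC procedures that use internal APIs to run
dbms.security.procedures.unrestricted=apoc.*
```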
Best,
ABK
12-21-2021 10:15 PM
Thanks for the reply. The issue was due to upgrading the graph database from 4.3.5 to a higher version.
I uninstalled Neo4j Desktop and reinstalled it.