12-10-2018 04:33 AM
Hello,
I am trying to import data into Neo4j.
I have 1 million records and I am inserting them using a JAR.
The JAR calls a method that inserts the data into Neo4j. I am using a MERGE statement for that.
But after inserting 50k records it gets stuck.
I am not getting any error, but the data is not being inserted either.
What could be wrong here?
Could you all please suggest?
12-10-2018 08:15 AM
You may want to look at ways to batch transactions; 10k entries per batch tends to work well.
Are you using LOAD CSV (and if so, are you USING PERIODIC COMMIT)? Or are you using some other means? An example of your load code/query would help.
12-10-2018 11:08 PM
Hi Andrew,
Thank you for your reply.
Actually, I am not doing batch transactions, since there is a direct connectivity issue between Impala and Neo4j.
Right now I am using a kind of workaround: I fetch the full data from Impala, parse the result one record at a time, and insert each record into Neo4j.
When I started the process it was running fine, but after inserting 200,000 records the speed is now about 10 nodes per second.
Should I change some hardware configuration?
If yes, could you please let me know which configuration? I am using a separate cluster for Neo4j.
12-11-2018 01:54 AM
The most common cause for that kind of symptom, slowing down as data is entered, is that you're missing an index on the label/property of the MERGEd nodes from your insertion queries.
It would help to supply your insertion query and an EXPLAIN of the query (with all elements expanded), which should show how your match/merges are operating and if indexes are being used for lookup. That would also help us see if there are any other things to fix.
As for the insertion, you say you're getting the full data first. If so, this is a good opportunity to do batch insertion. A separate query per insert isn't going to be performant if this is meant to be a data load. See this blog entry for optimal approaches to batching modifications/insertions into the graph.
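As a rough illustration of the batching idea above, here is a minimal Java sketch that splits rows already held in memory into batches of 10,000. The class and method names (BatchSplitter, partition) are made up for illustration, and the step that actually sends each batch to Neo4j is left as a comment, since it depends on your driver setup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Neo4j or its drivers.
public class BatchSplitter {

    // Splits rows into consecutive batches of at most batchSize elements.
    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            // Copy the sublist view so each batch is independent of the source list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for rows fetched from the source system (for example, Impala).
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            Map<String, Object> row = new HashMap<>();
            row.put("id", i);
            row.put("name", "name-" + i);
            rows.add(row);
        }
        for (List<Map<String, Object>> batch : partition(rows, 10_000)) {
            // Each batch would be passed as a single query parameter
            // and written in its own transaction.
            System.out.println("batch of " + batch.size());
        }
    }
}
```

Each batch is a list of maps, which is the shape a single parameterized query can iterate over.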
12-11-2018 02:30 AM
Yeah, thank you Andrew. I saw your blog, but the problem in my case is that I can't create a CSV file on the file system from this data; I can only hold the data at runtime (for example, in a variable).
I need one suggestion from you: if I store my data in the format below, can I insert it in batches?
Data = [{name:"Alice",age:32},{name:"Bob",age:42}]
One more question: I have an index on node properties, but can we create an index on relationships? I cannot find any way to do this.
12-11-2018 03:08 AM
Yes, providing a list of maps like this as a parameter is the right approach for batching, see the blog entry I linked to earlier.
Even if you have an index present, you should EXPLAIN your cypher query in the browser so you can make certain your query will use that index. If the query plan doesn't show an index being used you should provide the query and plan so we can help figure out why.
Our schema indexes are currently for label/property combinations only, and don't apply to relationships. However, with the recent Neo4j 3.5 release, we introduced supported full text indexing (automatically updating as your graph data changes), which can be used to index based on relationship type + property.
12-11-2018 03:23 AM
Okay.
So if my table has 10 million records and I store them in a list of maps, will that work?
12-11-2018 08:28 AM
@andrew.bowman Sorry, I can't post the data since I am working with client data, but I can give you an overview of the query and index info.
I am trying to import some entity data and create relationships with properties between the entities.
My data contains columns like id, name, code, display_name, active, services, product.
I am creating an index on id, and my import query looks like this:
MERGE (n:Entity {id: 'id'})
SET n.name = 'name',
n.code = 'code',
n.display_name = 'display_name',
n.active = 'active'
Could you please suggest what I am doing wrong in the data import?
12-11-2018 08:49 AM
Batching 10k records at a time is usually our recommendation, so try to break up your batches accordingly.
As for your import query, make sure you have an index on :Entity(id) so your MERGEs are quick.
If you want to only set the properties when the MERGE results in a CREATE, then use ON CREATE SET instead of SET.
12-11-2018 08:52 PM
If I use ON CREATE SET and later want to update the properties, will it update them, or will it create another node or property?
12-11-2018 10:16 PM
Remember that ON CREATE SET and ON MATCH SET are both clauses you can only use after a MERGE. MERGE guarantees the node will be there, so no matter if the node was created or matched to an existing node, it is now in the graph. You can use SET on it if you want. These two just allow you to do different things depending on if MERGE resulted in node creation or simply matched to an existing node.
After reviewing the official MERGE documentation, please review this knowledge base article on using MERGE to get some better clarity on what it's doing and how to work with it.
12-13-2018 03:18 AM
Hi @andrew.bowman, thank you for all your help.
I need some help with the batch import.
Using Java code, I now have my JSON data in a "MY_DATA" variable.
e.g.
MY_DATA =[{ tk_locon=0, repble=0, design=1, id=7196979,version=1, type_name=SALE, conc=0, name=SEQU,security=0, location=0},{ tk_locon=0, repble=0, design=1, id=1222,version=1, type_name=SALE, conc=0, name=qwe,security=0, location=1},--------100 Records]
Now, what would the query be to import this data?
12-13-2018 09:24 AM
You would pass the list as one of the parameters to the query, UNWIND the parameter list within the query and then start working with the properties of each record. Something like this, assuming the list is available under the parameter MY_DATA:
UNWIND $MY_DATA as data
Merge(n:Entity{id:data.id})
Set n.name= data.name,
n.version = data.version
...
12-13-2018 10:11 AM
'You would pass the list as one of the parameters'
Sorry, I am very confused about this.
My main concern is how I would pass my variable as a parameter.
12-13-2018 12:16 PM
If you're using the Neo4j Browser, you can use :help param to get the syntax of how to use :param. You would set this before executing your query, and not as part of the query itself.
If you're using a Neo4j driver, you should consult the language guides on using the appropriate driver for your language, and how to submit a query with parameters. Typically when you execute the query, the parameter map is submitted as an additional parameter to the execution call.
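For the driver case, a minimal Java sketch of assembling that parameter map might look like the following. The class name ParamExample and the helper methods are made up for illustration, and the sample rows are trimmed versions of the records posted earlier. The commented-out call assumes the official Neo4j Java driver's Session.run(String, Map<String, Object>) overload, so check it against the driver version you use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class; only the parameter-building part is runnable here.
public class ParamExample {

    // The query refers to the data by parameter name; the data itself
    // is never pasted into the query text.
    static final String QUERY =
        "UNWIND $MY_DATA AS data "
        + "MERGE (n:Entity {id: data.id}) "
        + "SET n.name = data.name, n.version = data.version";

    // Builds one record as a map of property name to value.
    static Map<String, Object> row(long id, String name, long version) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", id);
        row.put("name", name);
        row.put("version", version);
        return row;
    }

    // Wraps the list of records under the key the query expects ($MY_DATA).
    static Map<String, Object> buildParams(List<Map<String, Object>> rows) {
        Map<String, Object> params = new HashMap<>();
        params.put("MY_DATA", rows);
        return params;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row(7196979L, "SEQU", 1L));
        rows.add(row(1222L, "qwe", 1L));

        Map<String, Object> params = buildParams(rows);

        // Assumption: with the official Neo4j Java driver and an open session,
        // the query and the parameter map would be submitted together:
        // session.run(QUERY, params);
        System.out.println(QUERY);
        System.out.println(params.keySet());
    }
}
```

The key in the map (MY_DATA) has to match the name after the $ in the query; the value stays a real list of maps rather than a string.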
12-21-2018 01:22 AM
I am getting my data in the format below. Here is the data together with my query:
UNWIND '[{ id=12345, project_ids=50, has=ABC10}, {entity_id=859685, project_ids=50, has=DCV12}]' as row MERGE (c:TEst{ID: row.id}) ON CREATE SET c.PROJECT_IDS=row.project_ids,c.HAS=row.has
But when I run my query, I hit an issue with the '=' in the data, and it expects the values as strings. It gives an error for has: Variable ABC10 not defined.
Is there any way to handle this on the query side only?
12-21-2018 12:30 PM
Keep in mind that's not valid JSON format. You need to replace your = with :, and JSON also requires double quotes around keys and around string values such as ABC10.
And if this is a string as opposed to an actual JSON object (lists and maps), then you need to transform it into a format Neo4j can use. Try using apoc.convert.fromJsonList() using APOC Procedures.
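If the data starts out on the Java side, another option is to turn it into a list of maps there and pass that as a parameter, instead of converting it inside the query. The following is only a sketch under that assumption: KeyValueParser is a made-up helper, and it assumes that no key or value contains a comma, brace, or equals sign.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Neo4j or APOC.
public class KeyValueParser {

    // Matches one "{ ... }" record and captures the text between the braces.
    private static final Pattern RECORD = Pattern.compile("\\{([^}]*)\\}");

    // Parses text like "[{ id=1, has=ABC10}, {id=2, has=DCV12}]" into a list of maps.
    public static List<Map<String, Object>> parse(String text) {
        List<Map<String, Object>> rows = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (String pair : m.group(1).split(",")) {
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    continue; // skip fragments that are not key=value
                }
                String key = pair.substring(0, eq).trim();
                String value = pair.substring(eq + 1).trim();
                row.put(key, toValue(value));
            }
            rows.add(row);
        }
        return rows;
    }

    // Whole numbers become Long; everything else stays a String.
    private static Object toValue(String value) {
        try {
            return Long.parseLong(value);
        } catch (NumberFormatException e) {
            return value;
        }
    }

    public static void main(String[] args) {
        String raw = "[{ id=12345, project_ids=50, has=ABC10}, "
            + "{entity_id=859685, project_ids=50, has=DCV12}]";
        System.out.println(parse(raw));
    }
}
```

The resulting list can then be handed to the query as a parameter and consumed with UNWIND, so values like ABC10 arrive as strings rather than being read as Cypher variables.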
12-23-2018 05:56 AM
Hi @andrew.bowman
Thank you for all your help. I am following your suggestions and making good progress.
Now I am using batch import (creating a map of 20k records). I tried it with 500 records and it works fine.
But when I pass something like 200,000 records, it gets stuck, or maybe it is just taking a very long time.
Do I need to increase my cluster configuration?
In the future I need to insert 8 million records into Neo4j, so what is the ideal system configuration for Neo4j to handle this much data?
12-24-2018 01:48 AM
20k should certainly be doable for your batch sizes. You may want to do an EXPLAIN on your query and make sure it's using index lookups for your starting nodes, and not label scans or all node scans. That would indicate you need to create an index on the relevant label/property combinations that you're merging on.
For large inserts batching is a must. We usually recommend 10k-50k for batch sizes per transaction.
12-24-2018 10:15 AM
As I mentioned above, I am using the query below to insert data:
UNWIND '[{ id=12345, project_ids=50, has=ABC10}, {entity_id=859685, project_ids=50, has=DCV12}]' as row MERGE (c:Test{ID: row.id}) ON CREATE SET c.PROJECT_IDS=row.project_ids,c.HAS=row.has
Here I have an index on id, and I have tried with 10k batches as well, but with the same result. It takes so much time; I even waited for 10 minutes, but no records were inserted.
Then, for testing, I tried with 1000 records and they were imported.
That's why I asked about the system configuration.
Please suggest.
12-24-2018 10:45 AM
You have an odd spelling in the label of your match: :TEst. As labels are case sensitive, this would require your index to be on :TEst(ID).
You can confirm in an EXPLAIN of the plan if an index will be used for the merge. Verify that first. The timing differences are still pointing to a (lack of) index issue.
12-24-2018 10:48 AM
Sorry, that was a spelling mistake.
I am just replicating my query since I can't disclose my data.
But I will provide the EXPLAIN of my query; maybe that will help you understand better.