06-22-2019 06:22 AM
Hi, I am new to Neo4j but have searched and tried to reach a resolution for a week now with no success. I have a DB with the OffShore_Leaks data in it. I have imported the nodes of Bahamas_Leaks and am now trying to get the relationships of Bahamas inserted.
I have filtered the data and created a relationship CSV with the following header:
node_1,rel_type,node_2,sourceID,valid_until,start_date,end_date
23000001,intermediary_of,20000035,Bahamas Leaks,The Bahamas Leaks data is current through early 2016.,,
23000001,intermediary_of,20000033,Bahamas Leaks,The Bahamas Leaks data is current through early 2016.,,
23000001,intermediary_of,20000041,Bahamas Leaks,The Bahamas Leaks data is current through early 2016.,,
....
And I have checked that these IDs exist in the Intermediary and Entity nodes.
I have written a number of Cypher queries to do the import, as the bulk importer must have nodes and seems intended mainly for instantiating a new DB.
LOAD CSV WITH HEADERS FROM "http://IP_ADDRESS/bulk/import/intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Node) WHERE n1.node_id = row.node_1
MATCH (n2:Node) WHERE n2.node_id = row.node_2
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);
Syntactically this looks correct, but when run via the Desktop I get "(no changes, no records)".
Going in circles on this one.
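As a sanity check, something like this (a sketch reusing the same URL as above) should show what LOAD CSV actually parses from each row; note that LOAD CSV reads every field as a string:
LOAD CSV WITH HEADERS FROM "http://IP_ADDRESS/bulk/import/intermediary_of.csv" AS row
RETURN row.node_1, row.rel_type, row.node_2
LIMIT 5;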
06-22-2019 08:55 AM
Hi,
Looks like you imported bahamas_leaks_nodes only. Did you import bahamas_leaks_intermediary?
MATCH (n1:Node) WHERE n1.node_id = "23000001" is failing as this id does not exist in bahamas_leaks_nodes. This id exists in bahamas_leaks_intermediary.
Here is the schema that I used for offshore_leaks:
06-22-2019 10:26 AM
I have the correct schema and all the data in from offshore_leaks, and I have the nodes from bahamas_leaks and can search and find them individually.
I have changed my Cypher and gone to the command line, hoping to get a better error code.
neo4j> LOAD CSV WITH HEADERS FROM "http://IP_ADDRESS/bulk/import/intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Node) WHERE n1.node_id = row.node_1
MATCH (n2:Node) WHERE n2.node_id = row.node_2
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);
Connection to the database terminated. This can happen due to network instabilities, or due to restarts of the database
Every time I run this Cypher the database dies.
06-22-2019 10:27 AM
Note that I also changed the import to use file:/// with the same results.
06-22-2019 11:25 AM
Did you label both the bahamas_leaks_nodes and the intermediary nodes as 'Node'?
Intermediary nodes should have a different label. In your MATCH the label is the same in both MATCH statements.
Run this
MATCH (n1:Node) WHERE n1.node_id = "23000001" RETURN n1
and see if you get any result.
06-22-2019 11:57 AM
Yes, I have tried multiple queries over the week.
I have named the labels in line with the following Cypher.
neo4j> LOAD CSV WITH HEADERS FROM "file:///intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Intermediary) WHERE n1.node_id = row.node_1
MATCH (n2:Entity) WHERE n2.node_id = row.node_2
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);
....but still get the following error.
Connection to the database terminated. This can happen due to network instabilities, or due to restarts of the database
06-22-2019 11:59 AM
The Cypher
MATCH (n1:Node) WHERE n1.node_id = 23000001 RETURN n1
returns
n1
{
"sourceID": "Bahamas Leaks",
"name": "Internal User",
"valid_until": "The Bahamas Leaks data is current through early 2016.",
"node_id": 23000001
}
06-22-2019 01:18 PM
If the node_id is stored as an integer then try this:
MATCH (n1:Node) WHERE n1.node_id = toInteger(row.node_1)
MATCH (n2:Node) WHERE n2.node_id = toInteger(row.node_2)
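To confirm how the property is actually stored, a quick sketch that needs no plugins: a value equals its own toString() only when it is stored as a string, since Cypher never treats an integer as equal to a string.
MATCH (n:Intermediary)
RETURN n.node_id, n.node_id = toString(n.node_id) AS stored_as_string
LIMIT 5;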
06-22-2019 01:43 PM
neo4j> MATCH (n1:Intermediary) WHERE n1.node_id = 23000001 RETURN n1;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Intermediary {sourceID: "Bahamas Leaks", name: "Internal User", valid_until: "The Bahamas Leaks data is current through early 2016.", node_id: 23000001}) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row available after 103 ms, consumed after another 187 ms
neo4j> MATCH (n2:Entity) WHERE n2.node_id =20000035 RETURN n2;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n2 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Entity {sourceID: "Bahamas Leaks", name: "TINU HOLDINGS LIMITED", valid_until: "The Bahamas Leaks data is current through early 2016.", node_id: 20000035}) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row available after 98 ms, consumed after another 572 ms
neo4j> LOAD CSV WITH HEADERS FROM "file:///intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Intermediary) WHERE n1.node_id = row.node_1
MATCH (n2:Entity) WHERE n2.node_id = row.node_2
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);
Connection to the database terminated. This can happen due to network instabilities, or due to restarts of the database
Caused the DB to exit.
neo4j> USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Intermediary) WHERE n1.node_id = row.node_1
MATCH (n2:Entity) WHERE n2.node_id = row.node_2
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);
0 rows available after 1207271 ms, consumed after another 2 ms
Although this did not cause the DB to exit this time, the session at the prompt was dead.
neo4j> USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///intermediary_of.csv" AS row
WITH row WHERE row.rel_type = "intermediary_of"
MATCH (n1:Intermediary) WHERE n1.node_id = toInteger(row.node_1)
MATCH (n2:Entity) WHERE n2.node_id = toInteger(row.node_2)
CREATE (n1)-[:INTERMEDIARY_OF]->(n2);
Connection to the database terminated. This can happen due to network instabilities, or due to restarts of the database
VERY Frustrating!
06-22-2019 02:52 PM
Please share the LOAD CSV code that you used to create the Entity and Intermediary nodes. I will use that in my DB and check.
06-22-2019 04:59 PM
Hi, I used neo4j-admin import to initially set up the DB with the nodes and a set of edges.
I am now trying to use LOAD CSV to add some additional edges/relationships.
This is not straightforward, and very unstable.
06-22-2019 07:01 PM
To load additional nodes I use
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'http://IP/bahamas_leaks.nodes.intermediary.csv' AS line
CREATE (:Intermediaries {name: line.name, internal_id: line.internal_id, address: line.address, valid_until: line.valid_until, country_codes: line.country_codes, countries: line.countries, status: line.status, node_id: toInt(line.node_id), sourceID: line.sourceID})
It's all importing right and I can get data out correctly..... the schema is in, as understood.
Very strange.
I have the Enterprise Version on an AWS cluster.
06-22-2019 10:10 PM
OK, looks like I have a partial import of the edges/relationships now when I slow down the processing via the periodic commit.
Maybe a bad character in the CSV input stream.
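To see exactly how far it got, a quick count (a sketch using the relationship type and labels from the queries above):
MATCH (:Intermediary)-[r:INTERMEDIARY_OF]->(:Entity)
RETURN count(r) AS imported;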
06-22-2019 11:34 PM
Good to hear that. All's well that ends well!
06-23-2019 04:54 AM
Not quite solved yet. The output from Neo4j on the state of the import gives few clues to what the issue is. Looked at the input stream and can't find an issue. Very painful!
06-23-2019 05:19 AM
Looked at the data and it is clean, no special chars.
It stops at exactly 6500 entries.
Now I am wondering if I am hitting a Neo4j limitation.
06-23-2019 10:32 AM
This looks to be a heap size limitation with Neo4j.
Will look to do the following:
Break the import up into multiple smaller imports.
Increase the Java and Neo4j heap size.
Increase the periodic commit even further.
Looks like Neo4j tries to do everything in memory before committing... that will always be a limiting factor in any system's architecture, esp. when dealing with big data... they should probably look at doing some of this via virtual memory.
I will next work out how to use apoc.periodic.iterate to see if that helps.
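Roughly along these lines (a sketch, assuming the APOC plugin is installed, and reusing the file, labels, and toInteger() conversion from above):
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///intermediary_of.csv' AS row
   WITH row WHERE row.rel_type = 'intermediary_of'
   RETURN row",
  "MATCH (n1:Intermediary) WHERE n1.node_id = toInteger(row.node_1)
   MATCH (n2:Entity) WHERE n2.node_id = toInteger(row.node_2)
   CREATE (n1)-[:INTERMEDIARY_OF]->(n2)",
  {batchSize: 1000, parallel: false});
The first statement streams the rows, the second runs once per row, and APOC commits every 1000 rows so each transaction stays small.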
06-23-2019 02:34 PM
There's no mention of indexes or constraints here. If you were running into heap issues when doing periodic commit CSV loading with MATCHes, more than likely you don't have indexes on the label/property used for lookup, meaning for each row you're doing an entire label scan, which would explain the heap pressure.
Please use an EXPLAIN on your load query, and if you see NodeByLabelScan it means you aren't using index lookups, and should create indexes (or unique constraints) to make your matches quick and ease up heap pressure.
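For example (Neo4j 3.x syntax, using the labels and property from your queries):
CREATE INDEX ON :Intermediary(node_id);
CREATE INDEX ON :Entity(node_id);
EXPLAIN
MATCH (n1:Intermediary) WHERE n1.node_id = 23000001
RETURN n1;
Once the indexes are online, the plan should show NodeIndexSeek instead of NodeByLabelScan.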
06-23-2019 10:11 PM
Hi Andrew,
OK understood.
I did start to look at the indexing over the weekend as I also thought that maybe an issue.
I will get this in place and feedback as required.
Thank you for the feedback.
All the Best
Mike
06-23-2019 10:32 PM
Hi Andrew,
Thank God!
It works and was fast!
Now I can move forward!
Thank you so much for that feedback.
All the Best
Mike
06-24-2019 09:25 AM
Glad to help! Profiling in the future may help you identify these issues faster; it's a valuable tool!
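For instance, prefixing a query with PROFILE executes it and reports the actual rows and db hits per operator (a sketch reusing a lookup from earlier in the thread):
PROFILE
MATCH (n1:Intermediary) WHERE n1.node_id = 23000001
RETURN n1;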
Best of luck!
06-24-2019 10:05 AM
Hi Andrew,
Yes, I will use EXPLAIN and PROFILE when debugging in future.
Not used these tools before so again thank you for the help.
All the Best
Mike