Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
11-20-2019 05:45 AM
Dear comunity,
when loading a good amount of datasets from a MySql Database into a Neo4J Database (batches of 300 000 entries with connected data) actually load works fine until roughly 4.5 mio connected nodes and then with the next batch fails with the following error (there is 30 more batches of 300000 connected datasets which each results in one or more connected nodes).
Database versions:
MySQL: 5.1.47
Neo4J: 3.5.2
Process as follows:
Logging:
129000:Committing and Reopening Neo4J-Transaction.
....................................................................................................
130000:Committing and Reopening Neo4J-Transaction.
....................................................................................................
131000:Committing and Reopening Neo4J-Transaction.
....................................................................................................
132000:Committing and Reopening Neo4J-Transaction.
.................................registerShutdownHook....about to close Graph DB
java.lang.Exception
at eu.whitewolf2000.mysql.MySQLDataLoader$1.run(MySQLDataLoader.java:252)
java.lang.Exception
type
ENTITY
Rolling back Neo4J-Transaction.
This database is shutdown. - 132325
org.neo4j.graphdb.DatabaseShutdownException: This database is shutdown.
at org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.assertInUnterminatedTransaction(ThreadToStatementContextBridge.java:96)
at org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.getKernelTransactionBoundToThisThread(ThreadToStatementContextBridge.java:87)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacade.createNode(GraphDatabaseFacade.java:260)
at eu.whitewolf2000.mysql.MySQLDataLoader.loadData(MySQLDataLoader.java:102)
at eu.whitewolf2000.mysql.MySQLDataLoader.main(MySQLDataLoader.java:403)
Closing Neo4J-Transaction.
Tue Nov 19 05:19:49 CET 2019
Graph database closed.
The following Method is invoked by the calling main method:
MySQLDataLoader.java (class):
package eu.tester.sql (dummy)
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Date;
import java.util.Properties;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.MultipleFoundException;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
...
private void loadData(String tableName, String idColumn, String nameColumn, String labelColumn) {
System.out.println("loadData");
System.out.println(con);
// String query = "SELECT * from " + tableName + " LIMIT 100";
String query = "SELECT * from " + tableName;
System.out.println(query);
String pst_query = "select * from node_properties where node_id = ?";
String pstc3query = "select * from country_codes where country_code = ?";
String pstc2query = "select * from country_codes where country_code_iso2 = ?";
// int i = 0;
String labelName = null;
int j = 0;
Transaction tx = null;
try (Statement st = con.createStatement();
// System.out.println("1");
ResultSet rs = st.executeQuery(query);
// System.out.println("1");
PreparedStatement pstNodeAttribs = con.prepareStatement(pst_query);
PreparedStatement pstc3 = con.prepareStatement(pstc3query);
PreparedStatement pstc2 = con.prepareStatement(pstc2query);
){
tx = graphDB.beginTx();
// System.out.println("1");
ResultSetMetaData rsmd = rs.getMetaData();
while (rs.next()) {
// System.out.println("2");
// TODO Knoten anlegen
String value = rs.getString(idColumn);
//String labelName = rs.getString(labelColumn);
labelName = rs.getString(labelColumn);
if (getNode(idColumn, value, labelName) != null) {
continue;
}
Label l = Label.label(labelName);
Node node = graphDB.createNode(l);
node.setProperty("source", rs.getString("source"));
node.setProperty("processingDate", rs.getString("processing_date"));
for (int i = 1; i <= rsmd.getColumnCount(); i++) {
String columnName = rsmd.getColumnName(i);
String columnValue = rs.getString(columnName);
if (columnValue == null || columnValue.trim().isEmpty()) {
continue;
}
// Eigenschaften anlegen
String propertyKey = columnName.equals(nameColumn) ? "name"
: columnName.equals(idColumn) ? "node_id" : columnName;
node.setProperty(propertyKey, columnValue);// einkommentieren
// if (j%10000==0) {
// System.out.println(columnName + ":" + columnValue + ", propertyKey=" +
// propertyKey);
// System.out.println(j+".");
// }
// if (j % 1000 == 0) {
// System.out.println(j+".");
// System.out.print(".");
// }
}
// if (j%10000==0) {
// System.out.println(columnName + ":" + columnValue + ", propertyKey=" +
// propertyKey);
// System.out.println(j+".");
// }
if (j % 1000 == 0 && j!=0) {
System.out.println();
System.out.print(j + ":");
System.out.println("Committing and Reopening Neo4J-Transaction.");
tx.success();
tx.close();
tx = graphDB.beginTx();
}
if (j % 10 == 0) {
System.out.print(".");
}
++j;
pstNodeAttribs.setString(1, rs.getString("node_id"));
ResultSet rsNodeAttribs = pstNodeAttribs.executeQuery();
// Node attribNode = graphDB.createNode(Label.label("LabelAttrib"));
// attribNode.setProperty("source", rs.getString("source"));
// attribNode.setProperty("processingDate", rs.getString("processing_date"));
while (rsNodeAttribs.next()) {
Node attribNode = graphDB.createNode(Label.label("ATTRIBUTE"));
Node attribNode1 = null;
if (rsNodeAttribs.getString("s_key") == "country_codes"
|| rsNodeAttribs.getString("s_key") == "jurisdiction"
|| rsNodeAttribs.getString("s_key") == "jurisdiction_2") {
// lookup für countries machen mit prepstatment und dann gib ihm
// setzte
if (rsNodeAttribs.getString("s_key") == "country_codes"
|| rsNodeAttribs.getString("s_key") == "jurisdiction") {
pstc3.setString(1, rsNodeAttribs.getString("s_key"));
ResultSet rs3 = pstc3.executeQuery();
while (rs3.next()) {
// Node mit country key
attribNode.setProperty(rsNodeAttribs.getString("name"),
rsNodeAttribs.getString("s_value"));
attribNode.setProperty(rsNodeAttribs.getString("type"),
rsNodeAttribs.getString("s_key"));
attribNode.setProperty(rsNodeAttribs.getString("country"), rs3.getString("country"));
// Node mit country name
attribNode1 = graphDB.createNode(Label.label("ATTRIBUTE"));
attribNode1.setProperty("source", rs.getString("source"));
attribNode1.setProperty("processingDate", rs.getString("processing_date"));
attribNode1.setProperty("name", rs3.getString("country"));
}
}
if (rsNodeAttribs.getString("s_key") == "jurisdiction_2") {
pstc2.setString(1, rsNodeAttribs.getString("s_key"));
ResultSet rs2 = pstc2.executeQuery();
while (rs2.next()) {
// Node mit country key
attribNode.setProperty(rsNodeAttribs.getString("name"),
rsNodeAttribs.getString("s_value"));
attribNode.setProperty(rsNodeAttribs.getString("type"),
rsNodeAttribs.getString("s_key"));
attribNode.setProperty(rsNodeAttribs.getString("country"), rs2.getString("country"));
// Node mit country name
attribNode1 = graphDB.createNode(Label.label("ATTRIBUTE"));
attribNode1.setProperty("source", rs.getString("source"));
attribNode1.setProperty("processingDate", rs.getString("processing_date"));
attribNode1.setProperty("name", rs2.getString("country"));
}
}
} else {
attribNode.setProperty("source", rs.getString("source"));
attribNode.setProperty("processingDate", rs.getString("processing_date"));
// attribNode.setProperty(rsNodeAttribs.getString("name"),
// rsNodeAttribs.getString("s_value"));
// attribNode.setProperty(rsNodeAttribs.getString("type"),
// rsNodeAttribs.getString("s_key"));
attribNode.setProperty("name", rsNodeAttribs.getString("s_value"));
attribNode.setProperty("type", rsNodeAttribs.getString("s_key"));
}
if (attribNode != null) {
Relationship in1Path = node.createRelationshipTo(attribNode,
RelationshipType.withName("Attributes"));
in1Path.setProperty("source", rs.getString("source"));
}
if (attribNode1 != null) {
Relationship in2Path = node.createRelationshipTo(attribNode1,
RelationshipType.withName("Attributes"));
in2Path.setProperty("source", rs.getString("source"));
}
}
// System.out.println();
}
System.out.println("Nach der While-Loop: "+j);
System.out.println("Committing Neo4J-Transaction.");
tx.success();
} catch (Exception ex) {
if (tx != null) {
System.out.println(labelColumn);
System.out.println(labelName);
//System.out.println(rs.getString(labelColumn));
System.out.println("Rolling back Neo4J-Transaction.");
tx.failure();
}
System.out.println(ex.getMessage()+" - "+j);
ex.printStackTrace(System.out);
} finally {
if (tx != null) {
System.out.println("Closing Neo4J-Transaction.");
tx.close();
}
}
}
...
public static void main(String[] args) throws SQLException {
// TODO Auto-generated method stub
System.out.println(new Date());
String url = null;
String user = null;
String password = null;
try {
InputStream input = new FileInputStream("./config.properties");
Properties prop = new Properties();
prop.load(input);
url = prop.getProperty("url");
user = prop.getProperty("user");
password = prop.getProperty("password");
if (url==null || user==null || password== null) {
throw new IOException();
}
}catch(IOException e) {
System.out.println("config.properties missing or incomplete!");
System.exit(0);
}
if (args != null && args.length > 0) {
for (int i = 0; i < args.length; i++) {
String string = args[i];
System.out.println(i + ": " + string);
}
MySQLDataLoader loader = new MySQLDataLoader();
try {
System.out.println(new Date());
loader.initialize("test2graphDB");
// System.out.println(new Date());
loader.buildConnection(url, user, password);
// System.out.println(new Date());
if (args[0].equalsIgnoreCase("0")){
if (args[1]!=null) {
System.out.println(args[1]);
loader.loadData(args[1], "node_id", "name", "type");
System.out.println(new Date());
}
}
if (args[0].equalsIgnoreCase("1")){
if (args[1]!=null) {
System.out.println(args[1]);
loader.createEdges(args[1]);
System.out.println(new Date());
}
}
// System.out.println(new Date());
// loader.createEdges("relationships");
// System.out.println(new Date());
loader.closeConnection();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
statistics of data already loaded:
+----------+-------+
| count(*) | 300K |
+----------+-------+
| 563543 | 300K |
| 566324 | 600K |
| 352020 | 900K |
| 600036 | 1200K |
| 2391745 | 1500K |
| 4473668 | ALL |
+----------+-------+
6 rows in set (7.96 sec)
I do a commit on the neo4J every 1000 entries as shown in line 131.
Anyone, any idea how to do further tracing where the error is originated from?
I wouldn't consider this already big data.
Thank you very much in advance.
Kind regards,
MalteR
11-21-2019 02:05 PM
This is really hard to read, can you put in some effort with code blocks and indentation to make it more readable.
11-22-2019 06:00 AM
Hi Michael,
actually the first couple of edits did work and now I seem to be no longer able to edit.
Is there any hint you can provide me with how to do formatting accordingly?
Thanks.
Malte
11-24-2019 10:56 AM
I did the formatting for you, perhaps a limit in the forum system to prevent spamming.
Your code looks ok, I would remove all the System.exit(0)
and see how / when the shutdown hook could be called elsewhere (it's called when the JVM shuts down.).
I hope the tx.success()
is after your loop ends, was not 100% visible in the long code block (try to use smaller methods)
11-25-2019 03:21 AM
Hi Michael,
right now I do, when calling the "tx.sucess()", try to do a transaction based approach.
In SQL I would understand this as a commit each 1000 entries to be more memory efficient.
So this is wrong, if I do get you right? How do I just add a new session then each 1000 loops?
The code part looks like this:
if (j % 1000 == 0 && j!=0) {
System.out.println();
System.out.print(j + ":");
System.out.println("Committing and Reopening Neo4J-Transaction.");
tx.success();
tx.close();
tx = graphDB.beginTx();
}
Btw.: how do I actually do an indentation of code blocks? Is this all done by enclosing code segments in ``` & ´´´ and then it does code formatting automatically and accepts all standard formatting attempts?
Thank you very much and kind regards,
Malte
11-27-2019 02:56 AM
Hello community,
hello Michael,
we changed the code accordingly in the main() - method as follows:
But actualy this didn't change anything regarding the error which still remains as follows:
public static void main(String[] args) throws SQLException {
// TODO Auto-generated method stub
System.out.println(new Date());
String url = null;
String user = null;
String password = null;
try {
InputStream input = new FileInputStream("./config.properties");
Properties prop = new Properties();
prop.load(input);
url = prop.getProperty("url");
user = prop.getProperty("user");
password = prop.getProperty("password");
if (url == null || user == null || password == null) {
throw new IOException();
}
if (args != null && args.length > 0) {
for (int i = 0; i < args.length; i++) {
String string = args[i];
System.out.println(i + ": " + string);
}
MySQLDataLoader loader = new MySQLDataLoader();
System.out.println(new Date());
loader.initialize("test2graphDB");
loader.buildConnection(url, user, password);
if (args[0].equalsIgnoreCase("0")) {
if (args[1] != null) {
System.out.println(args[1]);
loader.loadData(args[1], "node_id", "name", "type");
System.out.println(new Date());
}
}
if (args[0].equalsIgnoreCase("1")) {
if (args[1] != null) {
System.out.println(args[1]);
loader.createEdges(args[1]);
System.out.println(new Date());
}
}
// System.out.println(new Date());
// loader.createEdges("relationships");
// System.out.println(new Date());
loader.closeConnection();
}
} catch (Exception e) {
e.printStackTrace();
}
}
´´´
**Logging as follows:**
194000:Committing and Reopening Neo4J-Transaction.
....................................................................................................
195000:Committing and Reopening Neo4J-Transaction.
....................................................................................................
196000:Committing and Reopening Neo4J-Transaction.
...................registerShutdownHook....about to close Graph DB
java.lang.Exception
at eu.whitewolf2000.mysql.MySQLDataLoader$1.run(MySQLDataLoader.java:250)
java.lang.Exception
type
ENTITY
Rolling back Neo4J-Transaction.
This database is shutdown. - 196182
org.neo4j.graphdb.DatabaseShutdownException: This database is shutdown.
at org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.assertInUnterminatedTransaction(ThreadToStatementContextBridge.java:96)
at org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.getKernelTransactionBoundToThisThread(ThreadToStatementContextBridge.java:87)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacade.createNode(GraphDatabaseFacade.java:260)
at eu.whitewolf2000.mysql.MySQLDataLoader.loadData(MySQLDataLoader.java:100)
at eu.whitewolf2000.mysql.MySQLDataLoader.main(MySQLDataLoader.java:397)
Closing Neo4J-Transaction.
Wed Nov 27 06:08:20 CET 2019
Graph database closed.
11-28-2019 01:54 AM
Dear community,
the next try was closing (shutting down) the whole neo4j databse as follows:
if (j % 2500 == 0 && j != 0) {
System.out.println();
System.out.print(j + ":");
System.out.println("Committing and Reopening Neo4J-Transaction.");
tx.success();
tx.close();
graphDB.shutdown();
graphDB = new GraphDatabaseFactory().newEmbeddedDatabase(new File("data/" + "test2graphDB"));
registerShutdownHook(graphDB);
tx = graphDB.beginTx();
}
actually my question here is how can I open a database (the same) freshly?
actually following errors occur:
Wed Nov 27 14:10:02 CET 2019
0: 0
1: nodes_1800000
Wed Nov 27 14:10:02 CET 2019
buildConnection
nodes_1800000
loadData
com.mysql.jdbc.JDBC4Connection@43034809
SELECT * from nodes_180000
..........................................................................................................................................................................................................................................................
2500:Committing and Reopening Neo4J-Transaction.
.type
ADDRESS
Rolling back Neo4J-Transaction.
This database is shutdown. - 2501
org.neo4j.graphdb.DatabaseShutdownException: This database is shutdown.
at org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.assertInUnterminatedTransaction(ThreadToStatementContextBridge.java:96)
at org.neo4j.kernel.impl.core.ThreadToStatementContextBridge.getKernelTransactionBoundToThisThread(ThreadToStatementContextBridge.java:87)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacade.kernelTransaction(GraphDatabaseFacade.java:942)
at org.neo4j.kernel.impl.core.NodeProxy.safeAcquireTransaction(NodeProxy.java:496)
at org.neo4j.kernel.impl.core.NodeProxy.createRelationshipTo(NodeProxy.java:542)
at eu.whitewolf2000.mysql.MySQLDataLoader.loadNodeData(MySQLDataLoader.java:470)
at eu.whitewolf2000.mysql.MySQLDataLoader.main(MySQLDataLoader.java:542)
Closing Neo4J-Transaction.
Wed Nov 27 14:18:07 CET 2019
registerShutdownHook....about to close Graph DB
registerShutdownHook....about to close Graph DB
java.lang.Exception
at eu.whitewolf2000.mysql.MySQLDataLoader$1.run(MySQLDataLoader.java:250)
java.lang.Exception
at eu.whitewolf2000.mysql.MySQLDataLoader$1.run(MySQLDataLoader.java:250)
java.lang.Exception
java.lang.Exception
Graph database closed.
Graph database closed.
11-29-2019 08:10 AM
Is there anything more in your neo4j.log or debug.log
It really looks as if something is shutting down your JVM?
Sometimes Linux does this if the VM exceeds memory limits.
How much JVM memory config do you give your java program?
I recently also had an issue with the postgres java driver materializing the whole resultsset in memory blowing it up and not streaming according to the fetch-size set on the statement.
12-03-2019 06:49 AM
Hi Michael, hi community,
Actually what I don't understand is the following:
Trying to shutdown and restart only brings me further if there's no internal limit of nodes and relationships, can it have to do with community version, is there any limit?
This is the log of a run that failed without hard shutdown within the java code:
Thank you very much in advance!
12-19-2019 12:49 AM
The limit is 34 billion or such in CE.
Can you just not do the shutdown hook at all, and just shut down the db when the import completes (e.g. in a try-finally block) ?
Do you by any chance create a previous transaction e.g. in setup or around loadData() which is is not properly closed? if so that would cause the inner transactions to accumulate memory until it dies.
01-09-2020 01:51 AM
Dear Michael,
I have managed to load the nodes with LOAD CSV.
The issue that I am struggeling right now with is creation of the relationships accordingly. Is this a thing which is done quicker with Java due to the internal structure of neo4j?
Thank you very much.
Kindest Regards,
Malte
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///relationship_officeraa' AS row
FIELDTERMINATOR ';'
MATCH (m),(n)
WHERE m.node_id = trim(row.start_id) and n.node_id = trim(row.end_id) and n.source = trim(row.source) and m.source = trim(row.source)
MERGE (m)-[r:OFFICER_OF {source: row.source, status: row.status, start_date: row.start_date, end_date: row.end_date, type: row.links }]-(n)
return r;
This is how the relationship files do look like:
start_id;end_id;source;status;start_date;end_date;links
605;729;RISE NETWORK;N;N;N;ALEXANDRU SODRINGA partner AQEEL MOHAMED DHUYAB
Loading relationships like this takes ages:
for 3000 created relationships something like 5 days.
indexes are online and on each node type combined node_id and source.
description | indexName | tokenNames | properties | state | type | progress | provider | id | failureMessage |
---|---|---|---|---|---|---|---|---|---|
"INDEX ON :ADDRESS(node_id, source)" | "index_19" | ["ADDRESS"] | ["node_id", "source"] | "ONLINE" | "node_label_property" | 100.0 | { "version": "1.0", "key": "native-btree" } | 19 | "" |
match(n) return count(n);
"count(n)"
58570409
01-19-2020 03:06 PM
You forgot to add/use labels for your nodes.
Without those you cannot have indexes on :Label(id)
and the database has to scan the full db for each id lookup.
You probably also want to only do MERGE on the relationship. and set the properties afterwards ?
All the sessions of the conference are now available online