03-03-2021 10:49 AM
I am using the latest Neo4j.Driver package (4.2.0) and the latest community edition of the Neo4j server (4.2.3).
I must be doing something wrong, because my query takes hours to complete.
I have 4 CSV files: XyzTypes.csv, XyzMethods.csv, XyzTypeTypeDependencies.csv and XyzTypeMethods.csv.
The code should be very simple: it just loads the CSV files and creates the corresponding Type and Method nodes and the relationships between them.
Here is my code:
var driver = GraphDatabase.Driver("bolt://localhost:7687", AuthTokens.Basic("neo4j", "1"));
var session = driver.AsyncSession(o => o.WithDatabase("neo4j"));
try
{
    Console.Write("[DI");
    await session.RunAsync("DROP INDEX type_id_index IF EXISTS");
    await session.RunAsync("DROP INDEX method_id_index IF EXISTS");
    Console.Write("][C");
    await session.WriteTransactionAsync(async tx =>
    {
        await tx.RunAsync("match ()-[r]->() delete r");
        await tx.RunAsync("match (n) delete n");
        return default(object);
    });
    Console.Write("][T");
    await session.WriteTransactionAsync(async tx =>
    {
        await tx.RunAsync(@"
            LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/XyzTypes.csv' AS line
            CREATE (:Type {
                typeId: toInteger(line.id),
                name: line.name,
                fullName: line.fullName,
                isCompilerGenerated: toBoolean(line.isCompilerGenerated),
                asmName: line.asmName
            })");
        return default(object);
    });
    Console.Write("][M");
    await session.WriteTransactionAsync(async tx =>
    {
        await tx.RunAsync(@"
            LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/XyzMethods.csv' AS line
            CREATE (:Method {
                methodId: toInteger(line.id),
                name: line.name,
                fullName: line.fullName,
                isCompilerGenerated: toBoolean(line.isCompilerGenerated)
            })");
        return default(object);
    });
    Console.Write("][CI");
    await session.RunAsync("CREATE INDEX type_id_index FOR (t:Type) ON (t.typeId)");
    await session.RunAsync("CREATE INDEX method_id_index FOR (m:Method) ON (m.methodId)");
    Console.Write("][TT");
    await session.RunAsync(@"
        USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/XyzTypeTypeDependencies.csv' AS line
        MATCH (src:Type), (dst:Type)
        WHERE src.typeId = toInteger(line.src) AND dst.typeId = toInteger(line.dst)
        CREATE (src)-[:DEPENDS_ON]->(dst)
        ");
    Console.Write("][TM");
    await session.RunAsync(@"
        USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/XyzTypeMethods.csv' AS line
        MATCH (src:Type), (dst:Method)
        WHERE src.typeId = toInteger(line.src) AND dst.methodId = toInteger(line.dst)
        CREATE (src)-[:DECLARES]->(dst)
        ");
    Console.Write("] ... ");
}
finally
{
    await session.CloseAsync();
    await driver.CloseAsync();
}
The CREATE INDEX queries return immediately. That may be legitimate; I do not know how fast Neo4j can index a numeric property on about 1M nodes. Running :schema in the browser confirms the two indexes, but I have a feeling they are not being used.
Running the above code takes almost 3 hours. What am I doing wrong?
EDIT 1
So I changed the last two queries to use the MERGE clause:
Console.Write("][TT");
await session.RunAsync(@"
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/XyzTypeTypeDependencies.csv' AS line
MERGE (src:Type {typeId: toInteger(line.src)})-[:DEPENDS_ON]->(dst:Type {typeId: toInteger(line.dst)})
");
Console.Write("][TM");
await session.RunAsync(@"
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/XyzTypeMethods.csv' AS line
MERGE (src:Type {typeId: toInteger(line.src)})-[:DECLARES]->(dst:Method {methodId: toInteger(line.dst)})
");
This should be much better, because I think my previous queries produced a cartesian product between the nodes. Yet the last query still runs for an unknown amount of time (I have no idea how long at the moment), which is still bad.
I also asked this question on SO - https://stackoverflow.com/questions/66450859/net-neo4j-drivers-asyncsession-runasync-seems-to-block-...
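One caveat about the MERGE version in EDIT 1: MERGE matches or creates the whole pattern, so for every row whose DEPENDS_ON (or DECLARES) relationship does not exist yet, which after the cleanup step is every row, it creates new Type/Method nodes carrying only the id property instead of connecting the existing ones. A sketch of the usual alternative keeps each endpoint lookup as its own MATCH and merges only the relationship. It reuses the file paths and property names from the question, assumes the indexes are already online and the Neo4j.Driver 4.x NuGet package (C# 9 top-level statements), and `ImportRelationshipsAsync` is a hypothetical helper name.

```csharp
using System.Threading.Tasks;
using Neo4j.Driver; // Neo4j.Driver 4.x NuGet package (assumed)

// Each endpoint gets its own MATCH, which can be an index seek once the
// indexes are ONLINE; only the relationship is merged, so no nodes are created.
const string TypeTypeQuery = @"
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/XyzTypeTypeDependencies.csv' AS line
MATCH (src:Type {typeId: toInteger(line.src)})
MATCH (dst:Type {typeId: toInteger(line.dst)})
MERGE (src)-[:DEPENDS_ON]->(dst)";

const string TypeMethodQuery = @"
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/XyzTypeMethods.csv' AS line
MATCH (src:Type {typeId: toInteger(line.src)})
MATCH (dst:Method {methodId: toInteger(line.dst)})
MERGE (src)-[:DECLARES]->(dst)";

// Hypothetical helper: runs both imports as auto-commit queries (USING PERIODIC
// COMMIT is rejected inside an explicit transaction) and consumes each result,
// so the second import only starts after the first one has finished.
async Task ImportRelationshipsAsync(IAsyncSession session)
{
    foreach (var query in new[] { TypeTypeQuery, TypeMethodQuery })
    {
        var cursor = await session.RunAsync(query);
        await cursor.ConsumeAsync();
    }
}
```

With the session from the question, `await ImportRelationshipsAsync(session);` would take the place of the two TT/TM queries. On a freshly cleared database CREATE works as well as MERGE for the relationship; MERGE only keeps re-runs from duplicating relationships.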
03-04-2021 02:33 AM
Did the reply on StackOverflow sort it out?
03-04-2021 07:18 AM
Yes, it did. Thank you very much.
03-04-2021 09:02 PM
To tie this one up: the critical piece was calling CALL db.awaitIndexes() after creating the indexes. CREATE INDEX returns as soon as the index is registered, while the index itself is populated in the background, so this call makes sure both indexes are online before running the queries that rely on them.
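A minimal sketch of that ordering with the 4.x .NET driver: run the two CREATE INDEX statements from the question, then `CALL db.awaitIndexes()`, consuming every result so each statement has finished on the server before the next one is sent. It assumes the Neo4j.Driver NuGet package and C# 9 top-level statements; `RunInOrderAsync` and `indexSetup` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Neo4j.Driver; // Neo4j.Driver 4.x NuGet package (assumed)

// Index statements from the question, followed by the procedure from the answer,
// which blocks until all indexes are ONLINE (or its default timeout expires).
string[] indexSetup =
{
    "CREATE INDEX type_id_index FOR (t:Type) ON (t.typeId)",
    "CREATE INDEX method_id_index FOR (m:Method) ON (m.methodId)",
    "CALL db.awaitIndexes()"
};

// Hypothetical helper: runs each statement as an auto-commit query and consumes
// its result, so the next statement is only sent once the previous one is done.
static async Task RunInOrderAsync(IAsyncSession session, IEnumerable<string> queries)
{
    foreach (var query in queries)
    {
        var cursor = await session.RunAsync(query);
        await cursor.ConsumeAsync();
    }
}
```

With the session from the question, `await RunInOrderAsync(session, indexSetup);` would replace the two CREATE INDEX calls, and the relationship LOAD CSV queries would run after it returns.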
The cartesian product warning can also be disregarded: it is expected when you match the two nodes separately with the intent of creating a relationship between them. Once the indexes are online, each side resolves to a single node through an index lookup, so it ends up being just a 1 x 1 cartesian product per row and cardinality is not an issue.
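To see why the missing wait can turn into hours, here is a rough back-of-the-envelope model in plain C#. It is a simplification, not Neo4j's actual cost model: it only counts node reads per CSV row and ignores caching and I/O, and the row count is hypothetical because the question does not give the size of the dependency file.

```csharp
using System;

// Rough, hypothetical model: node reads per CSV row, nothing else.

// Without an online index, each side of MATCH (src:Type), (dst:Type) WHERE ...
// is a label scan plus a filter, so every row reads all Type nodes twice.
static long LabelScanReads(long rows, long typeNodes) => rows * 2 * typeNodes;

// With an online index and unique ids, each side is an index seek returning a
// single node, so the per-row cartesian product is just 1 x 1.
static long IndexSeekReads(long rows) => rows * 2;

long rows = 1_000_000;  // hypothetical number of dependency rows
long types = 1_000_000; // order of magnitude mentioned in the question
Console.WriteLine($"label scans: {LabelScanReads(rows, types):N0} node reads");
Console.WriteLine($"index seeks: {IndexSeekReads(rows):N0} node reads");
```

With these assumed sizes that is 2 × 10^12 node reads without a usable index versus 2 × 10^6 with one.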