Fastest way to read in a DirectoryTree?

I am reading a directory tree from disk. While traversing the directories, I stream them via IEnumerable directly to my asynchronous database methods. Because the writes are asynchronous, I can't rely on the order (parent before child directory), so I create all the nodes first and connect them with relationships in a later step. This makes use of all neo4j threads and seems a lot faster than writing synchronously.

But it's still really slow.

Would it be faster to write the nodes to disk first, e.g. into a CSV file, and then bulk insert them into neo4j?

Any suggestions would be appreciated.

1 ACCEPTED SOLUTION

Hi Thypari,

If I were you I would collect the directories into lists of 1,000 to 2,000, then make one query call per list, using UNWIND to handle the whole list in one go.

For example, if creating a Dir was:

CREATE (:Directory $param)

it would now be:

UNWIND $param AS dir
CREATE (d:Directory) SET d = dir

That's typically a lot faster.

All the best

Chris
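
To make the batching step concrete, here is a minimal sketch of splitting a streamed sequence into fixed-size lists. The names BatchingSketch and InBatches are made up for illustration and are not part of the Neo4j driver; the batch size is whatever works for your data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Neo4j.Driver.
public static class BatchingSketch
{
    // Lazily groups a sequence into lists of at most batchSize items,
    // so a streamed directory walk never has to be held in memory at once.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, partially filled batch if there is one.
        if (batch.Count > 0) yield return batch;
    }
}
```

Each list returned by InBatches would then be passed as the list parameter of a single UNWIND query, so a tree of 100,000 directories becomes roughly 100 round trips at a batch size of 1,000 instead of 100,000.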

3 REPLIES

Do you still have to convert all user-defined types to Dictionaries?

Directory { long size, string name, ShareInformation shareInformation }
ShareInformation { string someProperty1, int someProperty2 }

So I can't just pass a List<Directory> into the Cypher query because it contains a ShareInformation property? The same goes for types the driver doesn't handle natively, such as GUIDs.

Is the recommended approach still to convert objects into nested Dictionaries?

Yep, unfortunately so - you'd need to convert the objects yourself, something like this:

// LINQPad-style script. Needs the Neo4j.Driver NuGet package and the
// System.IO, System.Linq and Neo4j.Driver namespaces.
async Task Main()
{
	var directory = new DirectoryInfo("d:\\Projects\\");
	var directories = directory
		.GetDirectories()
		.Select(d => new Directory
		{
			Name = d.Name,
			// Number of entries (files + subdirectories), used here as a stand-in for size.
			Size = d.GetFiles().Length + d.GetDirectories().Length,
			ShareInformation = new ShareInformation { PropInt = 1, PropString = d.FullName }
		})
		.ToList();

	// A single query creates one node per element of $directories.
	var query = new Query(
		@"UNWIND $directories AS dir
		  CREATE (d:Directory) SET d = dir",
		new Dictionary<string, object> {
			{ "directories", ConvertToDriverFormatFromCollection(directories) }
		});

	var driver = GraphDatabase.Driver("neo4j://localhost:7687", AuthTokens.Basic("neo4j", "neo"), config => config.WithEncryptionLevel(EncryptionLevel.None));
	var session = driver.AsyncSession();
	var cursor = await session.RunAsync(query);
	await cursor.ConsumeAsync();
	await session.CloseAsync();
}

public IEnumerable<IDictionary<string, object>> ConvertToDriverFormatFromCollection<T>(IEnumerable<T> items)
{
	return items.Select(i => ConvertToDriverFormat(i));
}

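// Keeps only simple properties (value types and strings); nested objects such as
// ShareInformation are skipped because a node property cannot hold a nested map.
public IDictionary<string, object> ConvertToDriverFormat<T>(T item)
{
	return item.GetType().GetProperties()
		.Where(p => p.CanRead && (p.PropertyType.IsValueType || p.PropertyType == typeof(string)))
		.ToDictionary(p => p.Name, p => p.GetValue(item));
}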

public class Directory
{
	public long Size { get; set; }
	public string Name { get; set; }
	public ShareInformation ShareInformation { get; set; }
}

public class ShareInformation
{
	public string PropString { get; set; }
	public int PropInt { get; set; }
}
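
The converter above drops ShareInformation altogether. If you want to keep that data on the node, one option is to flatten nested objects into prefixed keys and turn GUIDs into strings. The following is only a sketch under those assumptions: FlattenSketch, SampleDirectory, SampleShare and the "_" key separator are invented for illustration, and it does not guard against cyclic object graphs.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample types, standing in for Directory and ShareInformation.
public class SampleShare
{
    public string PropString { get; set; }
    public int PropInt { get; set; }
}

public class SampleDirectory
{
    public long Size { get; set; }
    public string Name { get; set; }
    public Guid Id { get; set; }
    public SampleShare Share { get; set; }
}

// Hypothetical helper, not part of Neo4j.Driver.
public static class FlattenSketch
{
    // Builds a flat dictionary: nested objects become "Parent_Child" keys.
    public static IDictionary<string, object> Flatten(object item, string prefix = "")
    {
        var result = new Dictionary<string, object>();
        var props = item.GetType().GetProperties()
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0);

        foreach (var prop in props)
        {
            var value = prop.GetValue(item);
            var key = prefix + prop.Name;

            if (value == null) continue;            // skip unset properties

            if (value is Guid guid)
                result[key] = guid.ToString();      // assumption: store GUIDs as strings
            else if (value is string || value.GetType().IsValueType || value is IEnumerable)
                result[key] = value;                // simple values and lists pass through unchanged
            else
                foreach (var nested in Flatten(value, key + "_"))
                    result[nested.Key] = nested.Value;
        }
        return result;
    }
}
```

With this, a SampleDirectory whose Share is set ends up with keys such as Share_PropString and Share_PropInt next to Name, Size and Id, which SET d = dir can store as plain node properties.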