05-15-2020 07:16 AM
Hello,
My company is currently building a Neo4j model for a large amount of data, where queries must be extremely fast to answer some specific questions.
To give some context, one of the things our model has to support is associating a person with the files they own, using the person's computer information.
One idea we had for speeding up queries is to add a direct link between a person and a file, as shown in this image.
As you can see, we put the file's name inside the relationship, so we can quickly check whether a user owns a file with a given name.
We have doubts about this solution because there can be many users (say around 50,000), each with at least one computer, and each computer can hold tens of thousands of files. We read that iterating over a node's relationships is extremely fast, and we tried to design a model that takes advantage of this.
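Since the image is not available here, a rough Cypher sketch of the two paths being compared may help. It assumes hypothetical relationship types `USES` (person to computer) and `CONTAINS` (computer to file) and a hypothetical `hostname` property. Following the description above and the relationship-type limit mentioned later in the thread, it stores the file name as the relationship type of the shortcut. Plain Cypher cannot take a relationship type as a query parameter, so every distinct file name ends up as its own relationship type.

```cypher
// Hypothetical sketch: labels, relationship types and properties are illustrative only.
// Indirect path: person -> computer -> file
MERGE (u:User {name: 'sunny'})
MERGE (c:Computer {hostname: 'sunny-laptop'})
MERGE (f:File {filename: 'readme.txt'})
MERGE (u)-[:USES]->(c)
MERGE (c)-[:CONTAINS]->(f)
// Shortcut: the file name itself is the relationship type,
// so each distinct file name adds a new relationship type to the database
MERGE (u)-[:`readme.txt`]->(f);

// Ownership check by name, using the shortcut relationship type
MATCH (u:User {name: 'sunny'})-[:`readme.txt`]->(f:File)
RETURN f;
```

With tens of thousands of files per computer, the number of distinct relationship types grows with the number of distinct file names, which is what eventually runs into the limit.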
What we'd like to know is:
05-15-2020 08:23 AM
If I understand the situation correctly, I would index a filename property on the File node and a name property on the User node, then search like this:
```cypher
MATCH (u:User {name:'sunny'})-[:OWNS]->(f:File {filename:'readme.txt'})
RETURN u.name, f.filename
```
Or, using a WHERE clause:
```cypher
MATCH (u:User)-[:OWNS]->(f:File)
WHERE u.name='sunny' and f.filename='readme.txt'
RETURN u.name, f.filename
```
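The indexes mentioned above could be created along these lines. This is a sketch assuming Neo4j 4.0 or later, where the `CREATE INDEX name FOR (n:Label) ON (n.property)` form is available; the index names are made up. On a 3.5 server the equivalent form is `CREATE INDEX ON :File(filename)`.

```cypher
// Index names are hypothetical; this syntax needs Neo4j 4.0 or later
CREATE INDEX file_filename FOR (f:File) ON (f.filename);
CREATE INDEX user_name FOR (u:User) ON (u.name);
```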
Then I'd load up a large dataset and find out how it performs before attempting to optimize.
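One way to find out how a query performs is to prefix it with `PROFILE`, which executes it and reports the plan with rows and db hits per operator (`EXPLAIN` shows the plan without running it). With the indexes in place, the plan would be expected to contain a `NodeIndexSeek` rather than a `NodeByLabelScan`, though the planner makes the final choice.

```cypher
// PROFILE runs the query and shows the execution plan with per-operator db hits
PROFILE
MATCH (u:User {name: 'sunny'})-[:OWNS]->(f:File {filename: 'readme.txt'})
RETURN u.name, f.filename;
```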
Note: in general, a user will own several files with the same name in different directories, so I imagine the File node will also have a path property?
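If the File node does get a path property, a lookup that lists every copy of a given file name owned by a user could look roughly like this. Here `path` is the hypothetical property suggested above; the filename index still serves the lookup, and the path only distinguishes the copies.

```cypher
// f.path is a hypothetical property; one file name can appear in several directories
MATCH (u:User {name: 'sunny'})-[:OWNS]->(f:File {filename: 'readme.txt'})
RETURN f.path
ORDER BY f.path;
```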
I don't know the upper limit on the number of distinct relationship types, but it occurs to me that even if the database can handle that many, having so many relationship types will probably cause other unwanted side effects when working with the data outside the DB. It sounds easy enough to spin up and test, though; let me know the results if you do, I'm curious. If you have the data, trying different models is easy: the bulk loader is very fast.
05-15-2020 09:27 AM
While running some tests this morning, we realised there is actually a limit on the number of relationship types. We are now reworking the model slightly to index the file name property, as you proposed.
We will then test both queries to see how they perform.
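While comparing the two models, it can be useful to check how many relationship types a dataset actually produces. A minimal sketch: the built-in `db.relationshipTypes()` procedure lists relationship types (depending on the Neo4j version, possibly only those currently in use), and counting distinct file names estimates how many types the file-name-as-relationship-type model would need.

```cypher
// Number of relationship types the database reports
CALL db.relationshipTypes() YIELD relationshipType
RETURN count(relationshipType) AS relationshipTypeCount;

// Under the file-name-as-relationship-type model, roughly one type per distinct name
MATCH (f:File)
RETURN count(DISTINCT f.filename) AS distinctFileNames;
```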