Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
06-17-2020 04:35 PM
Hi,
I'm using neo4j to store filesystems and version them.
I have reimplemented the concept of git database with Trees
, Blobs
and SHA1 to reuse the same hierarchy if it already exists.
Git database concepts:
My implementation so far:
master
branch in green, 2 OS
commits in red, each one having their own filesystem root Tree
.Blob
is a new file that I inserted in the filesystem after the first commit, that's why only the new fs hierarchy is pointing to him.
What I would like to do now is generating a diff between these 2 hierarchies, find out what has changed:
I managed to write a Cypher query that can retrieve all the modified Trees from the new hierarchies (the Trees who have a new SHA1, because one or more of their children has changed), and to match in the old filesystem the existing Tree, in order to compare them later on.
// params
// old_root_sha1: sha1sum of the old filesystem root Tree
// new_root_sha1: sha1sum of the new filesystem root Tree
//
// 1. MATCH all the subTrees* starting from the old filesystem's root Tree
MATCH (old_root:Tree {sha1sum: $old_root_sha1})-[:HAS_CHILD_TREE*]->(old_child_t:Tree)
// 2. Get them as a list of sha1sum ["ad233c48940b8355e457e11af561c32f1309e6c9", ...]
WITH old_root, collect(old_child_t.sha1sum) as old_sha1
// 3. MATCH all the subTrees* starting from the new filesystem's root Tree
MATCH new_paths = (new_root:Tree {sha1sum: $new_root_sha1})-[:HAS_CHILD_TREE*]->(new_child_t:Tree)
// 4. filter by selecting only those who don't appear in the sha1sum list of the old Trees
// we have the modified Trees
WHERE NOT new_child_t.sha1sum IN old_sha1
// 5. associate modified Tree sha1sum and list of the modified Tree path parts
// "68804712c8f5f54b76b15c8f67fed90a51ddf5ea" ["bin"]
// "7eae3c09dd5c6642f5ee58e2d7d1bd79adc592a1" ["boot", "grub", "`i386-pc`"]
// up to this point, query is fast (200 ms)
WITH old_root, new_child_t.sha1sum as new_tree_sha1, [rel IN relationships(new_paths) | rel.name] as new_path_parts
// 6. query the old root again to get its children
MATCH old_path = (old_root:Tree)-[:HAS_CHILD_TREE*]->(old_child_t:Tree)
// build the old path parts list
WHERE [rel IN relationships(old_path) | rel.name] = new_path_parts
RETURN new_tree_sha1, old_child_t.sha1sum as old_tree_sha1, new_path_parts
Retrieving the list of modified Trees is fast and causes me no trouble.
But I'm struggling with the part where, for each one of these modified Trees, I have to query the existing Tree in the old filesystem.
The query above is building relationships lists: eg ["boot", "grub", "
i386-pc"]
, and building the same relationships for each child Tree
of the old filesystem: eg ["usr", "bin"]
, and see if both matches.
But this is extremely slow, and there are many useless comparison.
What I would like to do is to dynamically build the pattern that I'm looking for.
Example:
currenty modified Tree path: ["boot", "grub", "
i386-pc"]
pattern to build:
WITH xxxx as new_path_list // [["boot", "grub", "`i386-pc`"], ....]
// foreach path in new_path_list
// build pattern based on path items in the llist
// do the match and get the child Tree
MATCH (old_root:Tree)-[:HAS_CHILD_TREE {name: "boot"}->(:Tree)->[:HAS_CHILD_TREE {name: "grub"}]->(:Tree)->[:HAS_CHILD_TREE {name: "`i386-pc`"}]->(old_child_t:Tree)
RETURN old_child_t
Any idea how I could achieve this, and return the new_child_tree_sha1
, old_child_tree_sha1
, path_list
?
Please provide the following information if you ran into a more serious issue:
3.4
py2neo 4.3.0
Thanks you !
06-17-2020 04:40 PM
The query shown above works, but it's super slow:
Profiling:
All the sessions of the conference are now available online