Convert properties representing file paths to tree structure

Hi there,

I have a couple of nodes, each with a path property representing a file path:

/foo/File1.java
/foo/bar/File2.java
/foo/bar/File3.java

I'd like to convert this into a tree structure consisting of nodes/relationships, i.e.

(foo)-[:CONTAINS]->(file1)
(foo)-[:CONTAINS]->(bar)
(bar)-[:CONTAINS]->(file2)
(bar)-[:CONTAINS]->(file3)

I'm looking for an elegant Cypher/APOC-based solution. Any suggestions?

Cheers

Dirk

5 REPLIES

In terms of pseudo-code, you can split the string on the slash (split(path, "/")) and then work with the resulting array: unwind it, create a node for each element, and create relationships between them.

It gets a bit more complicated if you want to say that /foo contains /foo/bar, rather than just /foo contains bar.

I'd recommend trying some things yourself, and then coming back with what you tried and what doesn't work about it. It's easier for the community to support questions rather than to write the code.

Hi David,

Thanks for your response. I posted the question in that form because I hoped that someone had already solved this and could provide a solution directly.

Before that I had already tried on my own and ran into problems. I used the split function to get the path segments, created nodes using apoc.create.node, and linked them with apoc.nodes.link:

WITH split("foo/bar/File1.java","/") as segments
UNWIND segments AS segment
CALL apoc.create.node(['Path'], {path:segment}) YIELD node
WITH collect(node) as nodes
CALL apoc.nodes.link(nodes,'CONTAINS') 
RETURN nodes

This looks good at first, but there is a problem: if I now feed it an overlapping path (e.g. "foo/bar/File2.java"), it creates a completely new list, whereas the "foo" and "bar" nodes should be re-used/merged. So the correct solution would be something like "merge on every fully qualified path segment", e.g. "foo", "foo/bar", "foo/bar/File.java".

As a workaround, I could create all those independent lists, run a reduce query over the created nodes to compute the fully qualified paths, and merge the duplicates afterwards. But this feels awkward, so I'm looking for a more elegant solution.
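For illustration, the "fully qualified path segments" mentioned above can be computed in plain Cypher without APOC. This is only a sketch: the literal path is a stand-in for the real property, and the aliases segments and prefixes are made up for this example.

```cypher
// Sketch: build every fully qualified prefix of a slash-separated path.
// Assumes a relative path without a leading "/".
WITH split("foo/bar/File1.java", "/") AS segments
RETURN [i IN range(0, size(segments) - 1) |
  // Start from the first segment and append segments 1..i, joined by "/".
  reduce(acc = segments[0], s IN segments[1..i + 1] | acc + "/" + s)
] AS prefixes
// prefixes = ["foo", "foo/bar", "foo/bar/File1.java"]
```

Each prefix is unique per directory or file, so it could serve as the key to MERGE on.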

Cheers

Dirk

Here's my current solution:

Create linked lists of :Path-labeled nodes for each relativePath property (e.g. "/foo/bar") found on :Git:File nodes, e.g. "(foo)-[:CONTAINS]->(bar)":
MATCH
  (f:Git:File)
WHERE
  f.relativePath IS NOT NULL
WITH
  f, split(f.relativePath, "/") as segments
UNWIND
  segments AS segment
CALL
  apoc.create.node(['Path'], {path:segment}) YIELD node
WITH
  f, collect(node) as nodes
CALL
  apoc.nodes.link(nodes,'CONTAINS') 
RETURN
  count(nodes)

For each :Path node, compute a relativePath property representing the path from the root, e.g. "/foo/bar":

MATCH
  (root:Path)
WHERE NOT
  ()-[:CONTAINS]->(root)
WITH
  root
MATCH
  path=(root)-[:CONTAINS*0..]->(segment:Path)
SET
  segment.relativePath = reduce(result = "", n in nodes(path) | result + "/" + n.path)
RETURN
  count(path)

Merge :Path duplicates using APOC:

MATCH
  (p:Path)
WITH
  p.relativePath as relativePath, collect(p) as paths
CALL
  apoc.refactor.mergeNodes(paths, {mergeRels:true}) YIELD node
RETURN
  relativePath, count(paths)

Remove left-over duplicates of CONTAINS relationships between merged :Path nodes:

MATCH
  (p:Path)-[r:CONTAINS]->(c:Path)
WITH
  p,c, collect(r) as relations
WHERE
  size(relations) > 1
UNWIND
  tail(relations) as duplicate
DELETE
  duplicate
RETURN  
  p,c
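After the merge and clean-up steps, a quick sanity check could confirm that no duplicates remain. The query below is a sketch that assumes the :Path label and relativePath property used in the steps above; the alias copies is made up for this example.

```cypher
// Sketch: list any relativePath value still shared by more than one :Path node.
// An empty result suggests the merge step removed all duplicates.
MATCH (p:Path)
WITH p.relativePath AS relativePath, count(p) AS copies
WHERE copies > 1
RETURN relativePath, copies
```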

Any suggestion on how to improve that?

Cheers

Dirk

Here's a different approach that may work for you.

We use an APOC function to get the indexes of all slashes in the string (this gets us a list of indexes), then we use a list comprehension on that list to get the substring from the start of the path to each index, and we append the full path at the end:

WITH "path/to/the/thing.txt" as path
WITH path, apoc.text.indexesOf(path, "/") as delimiters
WITH path, [del in delimiters | substring(path, 0, del)] + path as paths
RETURN paths

This results in: ["path", "path/to", "path/to/the", "path/to/the/thing.txt"]
Now that you have the absolute path to each node, you can MERGE the nodes (with a FOREACH), and then MERGE the relationships between each node.

To avoid creating duplicates, you can use apoc.coll.pairsMin() on the list of nodes to get a list of pairs of adjacent nodes, then UNWIND those pairs and MERGE the relationship between the two nodes of each pair.
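Put together, the approach might look like the sketch below. It makes some assumptions that are not part of the reply above: the :Path label and relativePath property are borrowed from Dirk's posts, the literal path stands in for the real property, and the pairs are built from the path strings rather than from collected nodes, so the result does not depend on the order in which nodes are collected.

```cypher
// Sketch: merge one path into a tree of :Path nodes keyed by their full prefix.
WITH "path/to/the/thing.txt" AS path
WITH path, apoc.text.indexesOf(path, "/") AS delimiters
// Every prefix up to a slash, plus the full path itself.
WITH [del IN delimiters | substring(path, 0, del)] + path AS paths
// Adjacent pairs: ["path", "path/to"], ["path/to", "path/to/the"], ...
UNWIND apoc.coll.pairsMin(paths) AS pair
MERGE (parent:Path {relativePath: pair[0]})
MERGE (child:Path {relativePath: pair[1]})
MERGE (parent)-[:CONTAINS]->(child)
```

Because every clause is a MERGE, running this again for an overlapping path should re-use the existing prefix nodes and relationships. Two caveats: a path without any slash yields no pairs and therefore creates nothing, and a path with a leading slash produces an empty string as its first prefix, so it may be worth stripping the leading slash first.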

Looks good, will give it a try and come back with the results!

Thanks a lot,

Dirk