Neo4j Fabric - sharded subgraph reassembly for advanced querying?

Hi Graphistas!

I am experimenting with Fabric and am curious about enterprise-scale query federation and symbolic linking across shards.

I am using a configuration like this:

fabric.database.name=neo4jfabric

fabric.graph.0.uri=neo4j://localhost:7687
fabric.graph.0.database=neo4jshard1
fabric.graph.0.name=neo4jshard1

fabric.graph.1.uri=neo4j://localhost:7687
fabric.graph.1.database=neo4jshard2
fabric.graph.1.name=neo4jshard2

I split the movie graph into two shards, one with only the -[:ACTED_IN]- relationships and one with all the other relationships. I first loaded the whole :play-movies graph into each shard, then deleted what did not belong there:

//only (:Movie)<-[:ACTED_IN]-(:Person)
:use neo4jshard1;
MATCH (n:Person) WHERE NOT (n)-[:ACTED_IN]->() DETACH DELETE n;
MATCH (n:Person)-[r:DIRECTED|WROTE|PRODUCED]->() DELETE r;
MATCH (n:Movie) WHERE NOT (n)<--() DETACH DELETE n;

//every other type of role for (Movie)<-[*]-(Person), except ACTED_IN
:use neo4jshard2;
MATCH (n:Person) WHERE NOT (n)-[:DIRECTED|WROTE|PRODUCED]->() DETACH DELETE n;
MATCH (n:Person)-[r:ACTED_IN]->() DELETE r;
MATCH (n:Movie) WHERE NOT (n)<--() DETACH DELETE n;

I can do lots of queries that inspect these shards:

UNWIND neo4jfabric.graphIds() AS graphId
CALL {
  USE neo4jfabric.graph(graphId)
  MATCH (m:Movie {title: 'The Matrix'})<-[r]-(p:Person)
  RETURN m,r,p
}
RETURN *

What I'd really like to do, though, is create a new virtual graph that recognizes that the two instances of (m:Movie {title: 'The Matrix'}) are in fact (semantically) the same node, by creating a virtual node that bridges the two actual nodes: (movie_shard1)-(movie_virtual)-(movie_shard2). This would open the door to some really advanced federated queries using Fabric.
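To make the bridging idea concrete, here is a minimal client-side sketch in Python. It is not something Fabric provides. It assumes rows shaped like (graphId, properties(m)) have already been fetched from the shards, and that title is a reliable natural key; the function name bridge_by_key and the sample rows are hypothetical.

```python
from collections import defaultdict


def bridge_by_key(rows, key="title"):
    """Group shard-local node records that share the same natural key.

    rows: iterable of (graph_id, properties) pairs, one per shard-local node.
    This shape is an assumption, e.g. built from RETURN graphId, properties(m).
    Returns {key value: {"props": merged properties, "shards": sorted graph ids}}.
    """
    bridges = defaultdict(lambda: {"props": {}, "shards": set()})
    for graph_id, props in rows:
        bridge = bridges[props[key]]
        bridge["props"].update(props)   # on conflict, the later row wins
        bridge["shards"].add(graph_id)  # remember which shards hold a copy
    return {
        k: {"props": v["props"], "shards": sorted(v["shards"])}
        for k, v in bridges.items()
    }


# Hypothetical sample data: "The Matrix" exists in both shards.
rows = [
    (0, {"title": "The Matrix", "released": 1999}),
    (1, {"title": "The Matrix", "released": 1999}),
    (1, {"title": "Cloud Atlas", "released": 2012}),
]
virtual_movies = bridge_by_key(rows)
```

Each entry in virtual_movies plays the role of movie_virtual: one record per title, with a list of the shards that hold an actual copy.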

I've experimented with creating nodes in the Fabric graph, which failed:

UNWIND neo4jfabric.graphIds() AS graphId
CALL {
  USE neo4jfabric.graph(graphId)
  MATCH (m:Movie {title: 'The Matrix'})<-[r]-(p:Person)
  RETURN m,r,p
}
WITH DISTINCT m.title as title
CREATE (:Movie_V {title: title})

Invalid combination of query execution types: READ_WRITE, EXPLAIN:WRITE

I've also done some initial testing with apoc.create.virtual.fromNode(node, [propertyNames]), but it failed to load the Fabric node:

Failed to invoke function `apoc.create.virtual.fromNode`: Caused by: org.neo4j.internal.kernel.api.exceptions.EntityNotFoundException: Unable to load NODE with id 1125899906842625.

However, if I completely pull the nodes and relationships apart into labels, types, and properties, I can rebuild a workable virtual graph:

UNWIND neo4jfabric.graphIds() AS graphId
CALL {
  USE neo4jfabric.graph(graphId)
  MATCH g = (m:Movie {title: 'The Matrix'})<-[r]-(p:Person)
  RETURN
  labels(m) AS m_lbl,
  properties(m) AS m_prop,
  type(r) AS r_type,
  properties(r) AS r_prop,
  labels(p) AS p_lbl, 
  properties(p) AS p_prop
}
WITH m_lbl,m_prop,COLLECT([p_lbl,p_prop,r_type,r_prop]) AS rows
CALL apoc.create.vNode(m_lbl,m_prop) YIELD node AS movie
UNWIND rows AS row
CALL apoc.create.vNode(row[0],row[1]) YIELD node AS person
CALL apoc.create.vRelationship(person,row[2],row[3],movie) YIELD rel
RETURN movie,rel,person

So maybe there's a more efficient way to accomplish this directly from the Fabric nodes?

Any tips or suggestions would be welcome!

Thanks, Michael

2 REPLIES

To get a more complete graph (every movie rather than only The Matrix):

UNWIND neo4jfabric.graphIds() AS graphId
CALL {
  USE neo4jfabric.graph(graphId)
  MATCH (m:Movie)<-[r]-(p:Person)
  RETURN
  labels(m) AS m_lbl,
  properties(m) AS m_prop,
  type(r) AS r_type,
  properties(r) AS r_prop,
  labels(p) AS p_lbl, 
  properties(p) AS p_prop
}
WITH DISTINCT m_lbl,m_prop,r_type,r_prop,p_lbl,p_prop
WITH 
COLLECT(DISTINCT [m_lbl,m_prop]) AS movies,
COLLECT(DISTINCT [p_lbl,p_prop]) AS persons,
COLLECT([m_lbl,m_prop,r_type,r_prop,p_lbl,p_prop]) AS rels
UNWIND movies AS m
CALL apoc.create.vNode(m[0],m[1]) YIELD node AS movie
WITH persons, rels, COLLECT(movie) AS v_movies
UNWIND persons AS p
CALL apoc.create.vNode(p[0],p[1]) YIELD node AS person
WITH rels, v_movies, COLLECT(person) AS v_persons
UNWIND rels AS r
UNWIND v_movies AS movie
UNWIND v_persons AS person
WITH r,apoc.convert.toNode(movie) AS movie, apoc.convert.toNode(person) as person
WHERE r[1].title = movie.title AND r[5].name = person.name
CALL apoc.create.vRelationship(person,r[2],r[3],movie) YIELD rel
RETURN movie,rel,person
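The three UNWINDs above pair every relationship row with every virtual movie and every virtual person before the WHERE clause filters them, so the work grows with the product of the three lists. As a rough illustration of the alternative, here is a client-side Python sketch that does the same reassembly with keyed lookups. It assumes the rows have the six columns returned by the subquery, that title and name are unique keys, and that at most one relationship of a given type connects a person to a movie; the function name reassemble and the sample rows are hypothetical.

```python
def reassemble(rows):
    """Merge per-shard rows into deduplicated nodes and relationships.

    Each row is a dict with the subquery's columns:
    m_lbl, m_prop, r_type, r_prop, p_lbl, p_prop.
    Movies are keyed by title and persons by name (assumed unique).
    """
    movies, persons, rels = {}, {}, {}
    for row in rows:
        title = row["m_prop"]["title"]
        name = row["p_prop"]["name"]
        # setdefault keeps the first copy seen, so shard duplicates collapse.
        movies.setdefault(title, {"labels": row["m_lbl"], "props": row["m_prop"]})
        persons.setdefault(name, {"labels": row["p_lbl"], "props": row["p_prop"]})
        # Key relationships by (start, type, end) so each one is stored once.
        rels.setdefault((name, row["r_type"], title), row["r_prop"])
    return movies, persons, rels


# Hypothetical sample rows, as if returned from the two shards.
matrix = {"title": "The Matrix"}
rows = [
    {"m_lbl": ["Movie"], "m_prop": matrix, "r_type": "ACTED_IN",
     "r_prop": {"roles": ["Neo"]}, "p_lbl": ["Person"],
     "p_prop": {"name": "Keanu Reeves"}},
    {"m_lbl": ["Movie"], "m_prop": matrix, "r_type": "DIRECTED",
     "r_prop": {}, "p_lbl": ["Person"],
     "p_prop": {"name": "Lana Wachowski"}},
    {"m_lbl": ["Movie"], "m_prop": matrix, "r_type": "PRODUCED",
     "r_prop": {}, "p_lbl": ["Person"],
     "p_prop": {"name": "Joel Silver"}},
]
movies, persons, rels = reassemble(rows)
```

Each row is touched once and every endpoint is found by dictionary lookup, so the cost is linear in the number of rows. Whether the same keyed lookup can be expressed inside the Fabric query itself is not something this sketch shows.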

phill240

I am also interested in this.
In addition, is it possible to query a Fabric graph through the Traversal API?
That would be a convenient way to merge multiple subgraphs.
