How do I compare two graphs for equality

If you want to compare two graphs (or subgraphs) to determine whether they are equivalent, the following Cypher produces an MD5 hash of the nodes' properties that you can use for the comparison. For example, you may wish to compare a test/QA instance with a production instance.

Neo4j 3.1 and later

MATCH (n:Movie)
WITH n
ORDER BY n.title
WITH collect(properties(n)) AS propresult
RETURN apoc.util.md5(propresult);

Before Neo4j 3.1

MATCH (n:Movie)
WITH n
ORDER BY n.title
WITH collect(properties(n)) AS propresult
CALL apoc.util.md5(propresult) YIELD value AS md5_property
RETURN md5_property

When run against the default Movie Graph, which includes 38 nodes with the label Movie, this returns:

md5_property                              
3f8d4737d078783e12f7cf57a207dd67            

The Cypher above requires the APOC procedure library to be installed.

In the above example, we examine all nodes with the label :Movie, collect the properties of those nodes, and produce an MD5 hash of that collection.

To get correct results, we need to order the nodes by a property value that is both defined for each node and unique. For this reason, you might want to order by a property that has both a property existence constraint and a unique property constraint.
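Before relying on a property as the ordering key, it can help to check the data directly. The following is a minimal sketch, assuming the Movie Graph and title as the intended key; the aliases missing_title and occurrences are made up for illustration. Run each statement separately.

```cypher
// Count :Movie nodes where the ordering key is not defined.
// A result of 0 means every node has a title.
MATCH (n:Movie)
WHERE n.title IS NULL
RETURN count(n) AS missing_title;

// List titles shared by more than one :Movie node.
// An empty result means title is unique across :Movie nodes.
MATCH (n:Movie)
WITH n.title AS title, count(*) AS occurrences
WHERE occurrences > 1
RETURN title, occurrences;
```

If either query reports a problem, ordering by that property alone will probably not give a stable hash, and additional ORDER BY keys are needed.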

For example, if multiple :Movie nodes have the same title property, ordering by n.title alone does not determine their relative order, so those nodes are passed to the md5 stored procedure in the order they are found. This is typically based upon the order the nodes were created. If you had two :Movie nodes with title='The Matrix' created with the following Cypher:

CREATE (n:Movie {title:'The Matrix', genre:'Sci-Fi'})
CREATE (n1:Movie {title:'The Matrix', genre:'Action'})

then simply running the Cypher to produce the md5 hash will produce a md5_property of:

md5_property
5bc18a680ef59ba09466da4217166d30

However, if you reversed the order of the CREATE statements, like this:

CREATE (n1:Movie {title:'The Matrix', genre:'Action'})
CREATE (n:Movie {title:'The Matrix', genre:'Sci-Fi'})

the result of the same md5 hashing Cypher will yield a different md5_property:

md5_property
c3c565b45457d2182731050e0cbab221

In the above example, to get the same md5 value regardless of the order of the creates, we need to run Cypher that returns the data in a guaranteed order, using an ORDER BY clause that includes enough properties to break ties:

MATCH (n:Movie)
WITH n
ORDER BY n.title, n.genre
WITH collect(properties(n)) AS propresult
CALL apoc.util.md5(propresult) YIELD value AS md5_property
RETURN md5_property

which will always return:

md5_property                               
c3c565b45457d2182731050e0cbab221            

NOTE: We also cannot simply collect(n) (i.e. the entire node), because the node includes its internal node id (a unique internal identifier), which can differ between environments even when the properties are identical.
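To see what this note refers to, the following sketch lists the internal id next to the property map for a few nodes. It assumes the Movie Graph; the aliases are illustrative, and the id() function is used as in Neo4j 3.x (recent versions deprecate it, so treat that call as version-dependent).

```cypher
// Show the internal id alongside the properties that are actually hashed.
// The id is assigned by the database and is not part of properties(n).
MATCH (n:Movie)
RETURN n.title AS title, id(n) AS internal_id, properties(n) AS props
ORDER BY title
LIMIT 5;
```

Running this on two environments will typically show matching props but not necessarily matching internal_id values, which is why only properties(n) is collected for the hash.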

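If you run the same Cypher on two separate environments and get the same md5 sums, the matched nodes can be treated as the same in terms of their properties. Note that the hash covers only the properties of the nodes matched by the label; other labels and relationships are not compared.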
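If the two environments return different md5 sums, a per-node hash can help narrow down which nodes differ. This is a sketch rather than part of the original recipe: it assumes Neo4j 3.1 or later (where apoc.util.md5 is available as a function) and the Movie Graph, and the alias node_md5 is made up for illustration.

```cypher
// One hash per node instead of one hash for the whole collection.
// apoc.util.md5 expects a list, so the property map is wrapped in one.
MATCH (n:Movie)
RETURN n.title AS title, apoc.util.md5([properties(n)]) AS node_md5
ORDER BY title, node_md5;
```

Exporting this result from both environments and comparing the rows should point to the titles whose properties do not match. Ordering by node_md5 as a second key keeps the output stable when titles are duplicated.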
