Neo4j

adam_cowley · 08-26-2020

We’re now going to explore the graph embeddings using the Python programming language, the Neo4j Python driver, and some popular data science libraries. We’ll create a scatterplot of the embeddings and see whether it’s possible to work out which country a town belongs to by looking at its embedding.

The required libraries can be installed by running the following command:

pip install neo4j scikit-learn altair pandas numpy

Let’s create a file called roads.py and paste the following statements:

from neo4j import GraphDatabase
from sklearn.manifold import TSNE
import numpy as np
import altair as alt
import pandas as pd
driver = GraphDatabase.driver("bolt://localhost", auth=("neo4j", "neo"))

The first few lines import the required libraries and the last line creates a connection to the Neo4j database. You’ll need to change the Bolt URL and credentials to match those of your own database.
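
If you would rather not hard-code the URL and credentials in roads.py, one option is to read them from environment variables. The following is only a sketch: the variable names NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD and the helper load_connection_settings are made up for this example, and the fallbacks are the values used above.

```python
import os

def load_connection_settings(env=os.environ):
    """Return (uri, (user, password)) read from environment variables.

    NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD are hypothetical names chosen
    for this sketch; the fallbacks are the values used in roads.py.
    """
    uri = env.get("NEO4J_URI", "bolt://localhost")
    user = env.get("NEO4J_USER", "neo4j")
    password = env.get("NEO4J_PASSWORD", "neo")
    return uri, (user, password)

uri, auth = load_connection_settings()
# In roads.py the result would then be passed to the driver:
# driver = GraphDatabase.driver(uri, auth=auth)
```

Taking the environment as a parameter keeps the helper easy to try out with a plain dictionary.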

We’re going to use the driver to execute a Cypher query that returns the embeddings for towns in the most popular countries: Spain, Great Britain, France, Turkey, Italy, Germany, and Greece. Restricting the number of countries will make it easier to detect any patterns once we start visualizing the data. Once the query has run, we’ll convert the results into a Pandas data frame:

with driver.session(database="neo4j") as session:
    result = session.run("""
    MATCH (p:Place)-[:IN_COUNTRY]->(country)
    WHERE country.code IN $countries
    RETURN p.name AS place, p.embeddingNode2vec AS embedding, country.code AS country
    """, {"countries": ["E", "GB", "F", "TR", "I", "D", "GR"]})
    X = pd.DataFrame([dict(record) for record in result])

Now we’re ready to start analyzing the data.

At the moment our embeddings are of size 10, but we need them to be of size 2 so that we can visualize them in 2 dimensions. The t-SNE algorithm is a dimensionality reduction technique that reduces high-dimensional data to 2 or 3 dimensions so that it can be visualized more easily. We’re going to use it to create x and y coordinates for each embedding.
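
t-SNE needs every row to have the same number of values, so it can be worth checking the embeddings before reducing them. The sketch below is not part of the original tutorial: find_bad_embeddings is a hypothetical helper and the sample rows are made up. It assumes rows shaped like the query results (a "place" and an "embedding" key), which could be obtained with X.to_dict("records").

```python
def find_bad_embeddings(rows, expected_dim=10):
    """Return the names of places whose embedding is missing or has the wrong length."""
    bad = []
    for row in rows:
        embedding = row.get("embedding")
        if embedding is None or len(embedding) != expected_dim:
            bad.append(row["place"])
    return bad

# Made-up rows for illustration only; real rows would come from the query.
sample_rows = [
    {"place": "A", "embedding": [0.1] * 10},  # fine
    {"place": "B", "embedding": None},        # property missing on the node
    {"place": "C", "embedding": [0.1] * 3},   # wrong size
]
print(find_bad_embeddings(sample_rows))  # prints ['B', 'C']
```

An empty list means every place has an embedding of the expected size.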

The following code snippet applies t-SNE to the embeddings and then creates a data frame containing each place, its country, and its x and y coordinates.

X_embedded = TSNE(n_components=2, random_state=6).fit_transform(np.array(list(X.embedding)))
places = X.place
df = pd.DataFrame(data = {
    "place": places,
    "country": X.country,
    "x": [value[0] for value in X_embedded],
    "y": [value[1] for value in X_embedded]
})

The content of the data frame is as follows:

Table 3. Results

place       country  x           y
Larne       GB       23.597162   -3.478853
Belfast     GB       23.132071   -4.331254
La Coruña   E        -6.959006   7.212301
Pontevedra  E        -6.563524   7.505499
Huelva      E        -11.583806  11.094340

We can run the following code to create a scatterplot of our embeddings:

alt.Chart(df).mark_circle(size=60).encode(
    x='x',
    y='y',
    color='country',
    tooltip=['place', 'country']
).properties(width=700, height=400)

From a quick visual inspection of this chart we can see that the embeddings seem to have clustered by country.
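
A rough way to back up the visual impression with a number is to compute the centroid of each country's points and count how many places are closest to their own country's centroid. This is only a sketch of one possible check, not something the original tutorial does. The helper names are made up, and the sample rows are the five rows from Table 3. Because each point contributes to its own centroid, the score is optimistic.

```python
import math
from collections import defaultdict

def country_centroids(rows):
    """Average the x and y coordinates of the places in each country."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for row in rows:
        acc = sums[row["country"]]
        acc[0] += row["x"]
        acc[1] += row["y"]
        acc[2] += 1
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}

def nearest_centroid_agreement(rows):
    """Fraction of places whose nearest centroid is their own country's."""
    centroids = country_centroids(rows)
    hits = 0
    for row in rows:
        point = (row["x"], row["y"])
        nearest = min(centroids, key=lambda c: math.dist(point, centroids[c]))
        hits += nearest == row["country"]
    return hits / len(rows)

# The five rows shown in Table 3.
rows = [
    {"place": "Larne", "country": "GB", "x": 23.597162, "y": -3.478853},
    {"place": "Belfast", "country": "GB", "x": 23.132071, "y": -4.331254},
    {"place": "La Coruña", "country": "E", "x": -6.959006, "y": 7.212301},
    {"place": "Pontevedra", "country": "E", "x": -6.563524, "y": 7.505499},
    {"place": "Huelva", "country": "E", "x": -11.583806, "y": 11.094340},
]
print(nearest_centroid_agreement(rows))  # prints 1.0
```

For the full data set, the rows could be taken from df.to_dict("records"). A value close to 1 would be consistent with the clusters seen in the chart.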

This is a companion discussion topic for the original entry at https://neo4j.com/developer/graph-data-science/applied-graph-embeddings/

Tutorial: Applied Graph Embeddings