
Neo4j Aura & PubSub on Google Cloud: Image Annotation

greta
Graph Fellow

Not too long ago, Neo4j announced the ability for users to purchase Neo4j Aura through the GCP Marketplace. So now there is a fully managed graph database as a service available in GCP, which is sweet.

PubSub, on the other hand, is a native GCP messaging service used to send data between application components. In this post, we’re going to walk through how to make the two work together nicely: how can we take data from PubSub and get it into Neo4j Aura?

By the end of this post, we’re going to have a pipeline which will let us:

  1. Upload any image to a Google storage bucket
  2. Automatically annotate that image with Cloud Vision API — this lets us detect animals, objects, people and so on in image files
  3. Send the annotations to a PubSub topic, where they’ll be written into a connected graph within Neo4j Aura.
(Image: our pipeline)

Some of this material was discussed in this NODES 2020 talk, so if you’d like to watch a video with a deeper-dive on Google Cloud services, check it out.

Step 0: Provision an Aura Instance

If you don’t have an Aura instance, you can follow these instructions to get started on GCP quickly. It’s just a few clicks through the GCP Marketplace.

Step 1: File Upload Notification & Image Annotation

We will be using Cloud Functions, a functions-as-a-service (FaaS) offering that lets us trigger code in response to an event. Google Cloud conveniently lets you trigger functions automatically on Cloud Storage bucket events.

In this repo, we have all the code for this example, but let’s focus on the most important part:

const { PubSub } = require('@google-cloud/pubsub');
const annotateImage = require('./annotateImage');

const annotateImageToPubsubTopic = async (file, context) => {
  console.log(`Event: ${context.eventId}`);
  console.log(`Event Type: ${context.eventType}`);
  console.log(`Bucket: ${file.bucket}`);
  console.log(`File: ${file.name}`);
  console.log(`Metageneration: ${file.metageneration}`);
  console.log(`Created: ${file.timeCreated}`);
  console.log(`Updated: ${file.updated}`);

  const uri = `gs://${file.bucket}/${file.name}`;
  const pubsub = new PubSub();
  const topic = pubsub.topic(process.env.OUTPUT_TOPIC);

  // Use the Vision API to annotate the image
  const labels = await annotateImage(uri);

  // Add the URI into each label so we know which image it describes
  labels.forEach(label => {
    label.uri = uri;
  });

  // Serialize the labels and publish them to the output topic
  const messageBuffer = Buffer.from(JSON.stringify(labels), 'utf8');
  const res = await topic.publish(messageBuffer);
  console.log(`Labels published for ${uri} successfully`, res);
};

module.exports = {
  annotateImage: annotateImageToPubsubTopic,
};

The magic there is the annotateImage function, which looks like this:

const vision = require('@google-cloud/vision');

async function annotateImage(uri) {
  // Creates a Vision API client
  const client = new vision.ImageAnnotatorClient();
  // Performs label detection on the image file
  const [result] = await client.labelDetection(uri);
  return result.labelAnnotations;
}

module.exports = annotateImage;

This is the simplest way of using the Cloud Vision client libraries to run label detection. What we get back is a list of labelAnnotations like the one below, where the score, confidence, and topicality values tell us how certain the Cloud Vision model is in its identification.

{
  "uri": "gs://my-bucket/10016.jpg",
  "description": "Dog",
  "mid": "/m/0bt9lr",
  "confidence": 0.82,
  "score": 0.99,
  "topicality": 0.995
}

In our implementation we added the image URI to each label object (it doesn’t come from Cloud Vision) so that we know which image a label belongs to as it moves through the pipeline. This matters, as we’ll see in a later step.
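On the receiving side, PubSub hands a subscriber the payload base64-encoded in the message’s `data` field. Here is a minimal sketch of decoding it back into the label array; the message shape mirrors how we serialized it above with `Buffer.from(JSON.stringify(labels), 'utf8')`, and the function name is illustrative, not part of any library.

```javascript
// Hedged sketch: decode a PubSub message back into the label array.
function decodeLabels(pubsubMessage) {
  // PubSub delivers the published bytes base64-encoded in message.data
  const json = Buffer.from(pubsubMessage.data, 'base64').toString('utf8');
  return JSON.parse(json);
}

// Example round trip, simulating what the publisher side produces
const labels = [{ uri: 'gs://my-bucket/10016.jpg', description: 'Dog', score: 0.99 }];
const message = { data: Buffer.from(JSON.stringify(labels), 'utf8').toString('base64') };
console.log(decodeLabels(message)[0].description); // "Dog"
```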

When we deploy this image annotation function, we will do it like so:

gcloud functions deploy annotateImage \
--ingress-settings=all --runtime=nodejs12 \
--allow-unauthenticated \
--timeout=300 \
--service-account=my-sa-address@project.iam.gserviceaccount.com \
--set-env-vars GCP_PROJECT=my-project-id \
--set-env-vars OUTPUT_TOPIC=imageAnnotation \
--trigger-bucket my-bucket \
--project my-project-id

The important parts here are the --trigger-bucket argument, which will call the function whenever a file gets uploaded to my-bucket, and the --service-account argument, which runs our function with a particular account with the correct rights. Also notice the OUTPUT_TOPIC which tells the function where to send the messages; in this case to the imageAnnotation topic in PubSub.

Step 2: Getting Data into our Aura Graph

OK, so we have images getting uploaded, processed into an array of JSON labels, and sent to another PubSub topic. Now we need to get that data into Neo4j Aura. To do that, we’ll deploy a second Cloud Function that’s triggered by PubSub. We’ll use this code repo that contains serverless functions for working with Neo4j.

Following the directions for that repository, we want to deploy a custom Cypher function; it listens on a PubSub topic and uses a Cypher statement that we define to sink all of the incoming data into Aura.

First, we need to set up a list of environment variables, which will make our deploy easier. So I’ll create an env.yaml file that contains this:

GCP_PROJECT: graphs-are-everywhere
URI_SECRET: projects/graphs-are-everywhere/secrets/NEO4J_URI/versions/latest
USER_SECRET: projects/graphs-are-everywhere/secrets/NEO4J_USER/versions/latest
PASSWORD_SECRET: projects/graphs-are-everywhere/secrets/NEO4J_PASSWORD/versions/latest
CYPHER: "MERGE (i:Image {uri:event.uri}) MERGE (l:Label { mid: event.mid, description: event.description }) MERGE (i)-[:LABELED {score:event.score, confidence:event.confidence, topicality:event.topicality}]->(l)"

The variables that deal with secrets tell the function to fetch Aura credentials from Google Secret Manager; this is optional, and you can use regular environment variables if you prefer. The most important part is the Cypher statement. When a batch of messages arrives via PubSub, the function unwinds the batch for us as a variable called event, which we can use to access each message payload.
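To make that unwinding concrete, here is a sketch of the pattern: the incoming label objects are passed as an $events parameter and UNWIND binds each one to `event`, which is what lets our CYPHER statement refer to event.uri, event.mid, and so on. The function and parameter names below are illustrative; the actual neo4j-serverless-functions implementation may differ.

```javascript
// Hedged sketch of the batching pattern: wrap incoming labels as $events
// and prefix the user-supplied statement with UNWIND so each element is
// available as `event` inside it.
function buildCypherBatch(labels, cypher) {
  return {
    cypher: `UNWIND $events AS event ${cypher}`,
    params: { events: labels },
  };
}

const batch = buildCypherBatch(
  [{ uri: 'gs://my-bucket/10016.jpg', mid: '/m/0bt9lr', description: 'Dog' }],
  'MERGE (i:Image {uri: event.uri})'
);
console.log(batch.cypher); // "UNWIND $events AS event MERGE (i:Image {uri: event.uri})"
```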

Deploying the Function

From within the code repo, we execute this:

gcloud functions deploy imageAnnotationsToNeo4j \
--entry-point=customCypherPubsub \
--ingress-settings=all --runtime=nodejs12 \
--allow-unauthenticated --timeout=300 \
--service-account=my-sa-address@project.iam.gserviceaccount.com \
--env-vars-file env.yaml \
--trigger-topic imageAnnotation

This is similar to the first deployment: we again need a service account, we name our function, and we specify a custom entry point of customCypherPubsub to get the right implementation. The --trigger-topic argument effectively implements part of our workflow, because we know messages arriving on imageAnnotation are coming from our previous function.
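With both functions deployed, one way to exercise the pipeline end to end is to upload an image and tail each function’s logs. The bucket and file names below are placeholders matching the deploy commands above.

```shell
# Upload an image; this triggers the annotateImage function
gsutil cp animals_0820.jpg gs://my-bucket/

# Check that the annotation function ran and published labels
gcloud functions logs read annotateImage --limit 20

# Check that the sink function received the labels and wrote to Aura
gcloud functions logs read imageAnnotationsToNeo4j --limit 20
```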

Putting it All Together

Here’s what the resulting graph looks like, for a small sample of the images in it.

![](upload://uMt2jZ0IlQRdDWWs7mj6eWBymnm.jpeg)Labeled images in Neo4j Aura

We can see that the Mammal, Vertebrate, and Primate labels are central in this graph. That makes sense, since the image corpus I’m using is a collection of animal images.

Let’s take a single image, see how it was tagged, and look at it alongside the actual underlying image.

(Image: an individual labeled image in our graph)

(Image: animals_0820.jpg, which was labeled in the graph above)

All of this is driven by files in a bucket and just two deployed Cloud Functions.

Conclusion

  • The option to purchase Neo4j Aura through the GCP Marketplace lets you set up the best managed graph database in a few minutes, on your GCP bill.
  • Using neo4j-serverless-functions, you can quickly deploy Cloud Functions that take data from either HTTP or PubSub and get it into Neo4j Aura.
  • Those two things together give you the ability to use any of GCP’s cloud services together with graphs. In this example, we’ve used the Cloud Vision API to label images, but you could use the Translation API, the Google Docs API, or anything else in the same kind of deployment; the sky is the limit.

Happy graph hacking!


Neo4j Aura & PubSub on Google Cloud: Image Annotation was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
