Not too long ago, Neo4j announced the ability for users to purchase Neo4j Aura through the GCP Marketplace. So now there is a fully managed graph database as a service available in GCP, which is sweet.
PubSub, on the other hand, is GCP's native messaging service, which lets you send data between application components. In this post, we're going to walk through how to make the two work together nicely: how can we take data from PubSub and get it into Neo4j Aura?
By the end of this post, we're going to have a pipeline which will let us:
- Upload images to a Cloud Storage bucket
- Automatically annotate them with the Cloud Vision API
- Publish the resulting labels to a PubSub topic
- Sink those labels into Neo4j Aura as a graph
Some of this material was discussed in this NODES 2020 talk, so if you'd like to watch a video with a deeper dive into the Google Cloud services, check it out.
If you don’t have an Aura instance, you can follow these instructions to get started on GCP quickly. It’s just a few clicks through the GCP Marketplace.
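Once the instance is up, it's worth a quick sanity check that you can reach it from Node.js before building the rest of the pipeline. Here's a minimal sketch using the official neo4j-driver package; the connection URI and password below are placeholders for your own Aura credentials.

const neo4j = require('neo4j-driver');

// Placeholders: swap in your own Aura connection URI and password
const driver = neo4j.driver(
  'neo4j+s://xxxxxxxx.databases.neo4j.io',
  neo4j.auth.basic('neo4j', 'your-password')
);

// verifyConnectivity resolves if the driver can reach the database
driver
  .verifyConnectivity()
  .then(() => console.log('Connected to Aura'))
  .catch(err => console.error('Connection failed', err))
  .finally(() => driver.close());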
We will be using Cloud Functions, a FaaS offering that lets us trigger code in response to an event. Google Cloud conveniently lets you trigger functions like this automatically on Cloud Storage bucket events.
In this repo, we have all the code for this example, but let’s focus on the most important part:
const { PubSub } = require('@google-cloud/pubsub');
const annotateImage = require('./annotateImage');

const annotateImageToPubsubTopic = async (file, context) => {
  console.log(` Event: ${context.eventId}`);
  console.log(` Event Type: ${context.eventType}`);
  console.log(` Bucket: ${file.bucket}`);
  console.log(` File: ${file.name}`);
  console.log(` Metageneration: ${file.metageneration}`);
  console.log(` Created: ${file.timeCreated}`);
  console.log(` Updated: ${file.updated}`);

  const uri = `gs://${file.bucket}/${file.name}`;
  const pubsub = new PubSub();
  const topic = pubsub.topic(process.env.OUTPUT_TOPIC);

  // Use the Vision API to annotate the image
  const labels = await annotateImage(uri);

  // Add the URI into each label so we know what the label is for
  labels.forEach(label => {
    label.uri = uri;
  });

  // Construct a message to send via PubSub
  const messageBuffer = Buffer.from(JSON.stringify(labels), 'utf8');

  // Publishes a message
  const res = await topic.publish(messageBuffer);
  console.log(`Labels published for ${uri} successfully`, res);
};

module.exports = {
  annotateImage: annotateImageToPubsubTopic,
};
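Because the exported function is just an async function that takes a file object and a context, you can smoke-test it locally before deploying. The sketch below is a hypothetical local harness: it assumes the code above lives in index.js, that OUTPUT_TOPIC points at a topic you've already created, and that your local application default credentials are allowed to call the Vision API and publish to PubSub. The file and event values are made up.

// Hypothetical local smoke test for the function above
process.env.OUTPUT_TOPIC = 'imageAnnotation';

const { annotateImage: annotateImageToPubsubTopic } = require('./index');

// Fake Cloud Storage event payload, shaped like what the trigger delivers
const fakeFile = {
  bucket: 'my-bucket',
  name: '10016.jpg',
  metageneration: '1',
  timeCreated: new Date().toISOString(),
  updated: new Date().toISOString(),
};

const fakeContext = {
  eventId: 'local-test-1',
  eventType: 'google.storage.object.finalize',
};

annotateImageToPubsubTopic(fakeFile, fakeContext)
  .then(() => console.log('Done'))
  .catch(console.error);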
The magic there is the annotateImage function, which looks like this:
const vision = require('@google-cloud/vision');

async function annotateImage(uri) {
  // Creates a client
  const client = new vision.ImageAnnotatorClient();

  // Performs label detection on the image file
  const [result] = await client.labelDetection(uri);
  return result.labelAnnotations;
}
This is the simplest way of using the Cloud Vision client libraries to run label detection. What we get back is a list of labelAnnotations that look like this, with scores and confidences that tell us how sure the Cloud Vision model is about each identification.
{
  "uri": "gs://my-bucket/10016.jpg",
  "description": "Dog",
  "mid": "/m/0bt9lr",
  "confidence": "0.82",
  "score": "0.99",
  "topicality": "0.995"
}
In our implementation we added the image URI back onto this label object (it didn't come from Cloud Vision), so we know which image the label belongs to as it moves on through the pipeline. This matters, as we'll see in a later step.
When we deploy this image annotation function, we will do it like so:
gcloud functions deploy annotateImage \
--ingress-settings=all --runtime=nodejs12 \
--allow-unauthenticated \
--timeout=300 \
--service-account=my-sa-address@project.iam.gserviceaccount.com \
--set-env-vars GCP_PROJECT=my-project-id \
--set-env-vars OUTPUT_TOPIC=imageAnnotation \
--trigger-bucket my-bucket \
--project my-project-id
The important parts here are the --trigger-bucket argument, which calls the function whenever a file is uploaded to my-bucket, and the --service-account argument, which runs our function as a service account with the right permissions. Also notice OUTPUT_TOPIC, which tells the function where to send its messages; in this case, to the imageAnnotation topic in PubSub.
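If you want to peek at what actually lands on the topic before wiring up the Neo4j side, a small listener script is handy. This is purely a debugging aid, and it assumes you've created a pull subscription on the topic (called imageAnnotation-debug here for illustration), for example with gcloud pubsub subscriptions create imageAnnotation-debug --topic=imageAnnotation.

// Debug listener; assumes a subscription named 'imageAnnotation-debug' exists on the topic
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub({ projectId: 'my-project-id' });
const subscription = pubsub.subscription('imageAnnotation-debug');

subscription.on('message', message => {
  // The payload is the JSON array of labels produced by the annotation function
  console.log(JSON.parse(message.data.toString()));
  message.ack();
});

subscription.on('error', console.error);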
OK, so we have images getting uploaded, processed into an array of JSON labels, and published to a PubSub topic. Now we need to get that data into Neo4j Aura. To do that, we'll deploy a second Cloud Function, this time triggered by PubSub. We'll use this code repo, which contains serverless functions for working with Neo4j.
Following the directions for that repository, we want to deploy a custom Cypher function; it listens on a PubSub topic and uses a Cypher statement that we define to sink all of the incoming data into Aura.
First, we need to set up a list of environment variables, which will make our deploy easier. So I’ll create an env.yaml file that contains this:
GCP_PROJECT: graphs-are-everywhere
URI_SECRET: projects/graphs-are-everywhere/secrets/NEO4J_URI/versions/latest
USER_SECRET: projects/graphs-are-everywhere/secrets/NEO4J_USER/versions/latest
PASSWORD_SECRET: projects/graphs-are-everywhere/secrets/NEO4J_PASSWORD/versions/latest
CYPHER: "MERGE (i:Image {uri:event.uri}) MERGE (l:Label { mid: event.mid, description: event.description }) MERGE (i)-[:LABELED {score:event.score, confidence:event.confidence, topicality:event.topicality}]->(l)"
The variables that deal with secrets tell the function to fetch the Aura credentials from Google Secret Manager; this is optional, and you can use regular environment variables if you prefer. The most important part is the Cypher statement. When we get a batch of messages via PubSub, the function unwinds that list for us as a variable called event, which we can use to access each message's payload.
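Conceptually, what happens for each incoming PubSub message looks roughly like the sketch below. This is a simplification rather than the repo's actual code: the driver setup is omitted, and the handler signature is invented for illustration. The point is simply that the batch gets unwound so the configured Cypher can refer to each item as event.

// Rough sketch of the idea behind customCypherPubsub (not the repo's actual implementation)
async function handleMessage(pubsubMessage, driver) {
  // PubSub-triggered functions receive the payload as a base64-encoded string
  const events = JSON.parse(Buffer.from(pubsubMessage.data, 'base64').toString());

  // Unwind the batch so the user-supplied statement can refer to each item as `event`
  const cypher = `UNWIND $events AS event ${process.env.CYPHER}`;

  const session = driver.session();
  try {
    await session.run(cypher, { events });
  } finally {
    await session.close();
  }
}

That's also why we attached the image URI to every label back in the first function: each label object becomes event inside the MERGE statement, and without the URI we wouldn't know which Image node to connect it to.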
From within the code repo, we execute this:
gcloud functions deploy imageAnnotationsToNeo4j \
--entry-point=customCypherPubsub \
--ingress-settings=all --runtime=nodejs12 \
--allow-unauthenticated --timeout=300 \
--service-account=my-sa-address@project.iam.gserviceaccount.com \
--env-vars-file env.yaml \
--trigger-topic imageAnnotation
This is similar to deploying the first function: we need a service account, we specify a custom entry point of customCypherPubsub to pick the right implementation, and we give the function a name. The --trigger-topic argument effectively implements part of our workflow, because we know that anything arriving on imageAnnotation is a message coming from our previous function.
Here’s what the resulting graph looks like, for a small sample of the images in it.
![](upload://uMt2jZ0IlQRdDWWs7mj6eWBymnm.jpeg)

Labeled images in Neo4j Aura

We can see that mammals, vertebrates, and primates are central in this graph. This makes sense, since the image corpus I'm using is a collection of animal images.
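If you'd like to confirm that impression from the data itself, a quick query counting how many images carry each label does the job. Here's a sketch using the Node.js driver; the connection details are placeholders, and you could just as easily paste the Cypher into Neo4j Browser.

// Count the most frequently used labels (connection details are placeholders)
const neo4j = require('neo4j-driver');

const driver = neo4j.driver(
  'neo4j+s://xxxxxxxx.databases.neo4j.io',
  neo4j.auth.basic('neo4j', 'your-password')
);

async function topLabels() {
  const session = driver.session();
  try {
    const result = await session.run(`
      MATCH (:Image)-[:LABELED]->(l:Label)
      RETURN l.description AS label, count(*) AS images
      ORDER BY images DESC LIMIT 10
    `);
    result.records.forEach(r =>
      console.log(r.get('label'), r.get('images').toNumber()));
  } finally {
    await session.close();
    await driver.close();
  }
}

topLabels();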
Let's take a single image and how it was tagged, and look at it together with the actual underlying image.
![](upload://jcpOUTP5lO9nluVWwlHeyvr1f8G.png)

An individual labeled image in our graph

![](upload://3S5fsXD1EKlfkZeehXeWN96uF4P.jpeg)

animals_0820.jpg, which was labeled in the graph above

All driven by files in a bucket:
![](upload://A4ul7cWUB0730P8eu097OgWNaUR.png)

And just two deployed Cloud Functions:
![](upload://bncdgjkiYUY09SdHyqDlT6yMQdC.png)

Happy graph hacking!