02-11-2021 04:00 PM
We are continuing our journey through different use cases with GraalVM and Neo4j. In the last blog post, we covered a few of the different ways we could use Neo4j and GraalVM together. As a refresher, the list is shown below.
The previous post also covered the first scenario: connecting to Neo4j from a variety of languages by using GraalVM’s polyglot capabilities and the official Neo4j Java driver. This round, we will go a little deeper and walk through the third item: building a Neo4j database extension that runs code written in various languages from within a Cypher procedure call.
Let’s get started!
Many databases provide the ability to write custom code for handling functionality that isn’t provided out-of-the-box, and Neo4j is no exception. When certain capabilities are not available or are complex in Cypher, users can write their own procedures and functions, package them, and add them as a plugin to the database. Things like the APOC library, GDS, and more were built this way!
However, these extensions have traditionally been written in a JVM-based (Java virtual machine) language such as Java, Groovy, or Scala. GraalVM gives us a shared polyglot environment that lets a Java-based host (like Neo4j) work with code written in a non-JVM guest language (like Python). Languages implemented with GraalVM’s Truffle language implementation framework run on GraalVM and can interoperate with one another.
First, if you don’t already have it, we will need Neo4j. Unfortunately, Neo4j Desktop and Neo4j Sandbox both have predefined environments that make it difficult to run GraalVM alongside them, so the easiest approach for this example is to download Neo4j Server Community Edition.
Next, if you installed GraalVM and accompanying languages with the last blog post, feel free to skip to the next section. Otherwise, we will walk through the steps again here.
GraalVM is another JDK (Java Development Kit) distribution, which is a bundle of tools for developing Java applications. If you are already comfortable with this, feel free to manage Java versions and JDKs however you prefer. If you’re new to JDKs, an article I found explains the components of the Java environment. To manage all the options on my machine, I really like using SDKMAN!. It updates the active Java installation (JAVA_HOME and PATH) for you and lets me switch versions and vendors with a command or two. The commands to install the GraalVM JDK with SDKMAN! are listed below.
#List available Java vendors and versions in SDKMAN!
% sdk list java
#Install one for GraalVM (my current version)
% sdk install java 20.3.0.r11-grl
#Switch Java versions
% sdk use java 20.3.0.r11-grl
#(optional) Set it as the default JDK for your system
% sdk default java 20.3.0.r11-grl
#Verify Java version on your system (and results for my environment)
% java -version
openjdk version "11.0.9" 2020-10-20
OpenJDK Runtime Environment GraalVM CE 20.3.0 (build 11.0.9+10-jvmci-20.3-b06)
OpenJDK 64-Bit Server VM GraalVM CE 20.3.0 (build 11.0.9+10-jvmci-20.3-b06, mixed mode, sharing)
Note: when you install a version of Java, SDKMAN! may prompt you to set it as the default. If it doesn’t, or you choose to set it as the default later, the sdk default command above does that.
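If you also want to confirm from Java code which JVM is running, a minimal sketch like the one below reads the standard java.vm.name system property. The class name JvmCheck is hypothetical, and the "GraalVM" substring check is an assumption based on the java -version output above, where the VM name line reads GraalVM CE 20.3.0.

```java
// Hypothetical helper: reports which JVM is running this code.
final class JvmCheck {
    private JvmCheck() {}

    // Assumption: GraalVM builds include "GraalVM" in java.vm.name,
    // as in the java -version output shown above.
    static boolean isGraalVm(String vmName) {
        return vmName != null && vmName.contains("GraalVM");
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        System.out.println(vmName + " (Java " + System.getProperty("java.version") + ")");
        System.out.println(isGraalVm(vmName) ? "Running on GraalVM" : "Not running on GraalVM");
    }
}
```

Compile and run it with the same java you selected through SDKMAN! to check that the switch took effect.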
Ok, those are the base requirements to install: GraalVM and Neo4j. Running the other languages takes a bit more setup. Though you can use standard language environments, I’ve opted for the GraalVM-provided languages, as I assume those have less setup overhead. To install each of the GraalVM-supported languages, we can use the GraalVM Updater (gu) tool. The commands for installing each language with gu are shown below.
#See what’s there already
gu list
#Python
gu install python
#JavaScript (included)
#R
gu install r
#Ruby
gu install ruby
Note: gu is included in the base install of GraalVM. If you run the gu list command before installing any other languages, you may notice that a couple of things are already listed. That’s because they are built into the general GraalVM install.
For R, the documentation lists a couple of other dependencies that are needed. My Mac already had these installed, but depending on your operating system and version, you might want to verify them.
Ruby also needs a few extra dependencies. Most of these were already installed on my Mac, but you can verify them for your operating system and version. After those are in place, the first command in the code block below runs a post-install script that makes the Ruby openssl C extension work with your system libssl.
I also had some issues with the recommendation to use a Ruby manager: it changed my path so that I couldn’t execute Ruby. I ended up uninstalling my Ruby manager and remapping TruffleRuby. The last two commands below should help you check whether your environment looks similar to mine. Note that TruffleRuby is on my path through SDKMAN!.
#After installing deps, make the Ruby openssl C extension work with your system libssl
<path to your GraalVM JDK>/languages/ruby/lib/truffle/post_install_hook.sh
% truffleruby -v
truffleruby 20.3.0, like ruby 2.6.6, GraalVM CE Native [x86_64-darwin]
% which truffleruby
/Users/jenniferreif/.sdkman/candidates/java/current/bin/truffleruby
You can check that all the desired languages are installed by running the gu list command again to see all the languages you now have. Let’s start Neo4j with the command bin/neo4j start, and finally, we are ready to get our project up and running!
I have used Maven for this project, but you can use Gradle or something else, if you prefer. The dependencies are pretty straightforward, as we just need to include Neo4j and the GraalVM SDK in the pom.xml file. We also have a couple of interesting additions in the build section of the pom, so let’s look at those.
<build>
<plugins>
<plugin>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<phase>prepare-package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.2</version>
<configuration>
<createDependencyReducedPom>false</createDependencyReducedPom>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
If you read the previous blog post, you may remember the first plugin (maven-dependency-plugin). It copies the dependencies during the prepare-package phase of the build and drops them into the target/lib folder of our project.
The second plugin (maven-shade-plugin) is not one we had in the last example. During the package phase, it generates a single jar file containing our custom code along with the dependencies specified with <scope>compile</scope>. Before we get to packaging, though, let’s look at the rest of the code!
For the polyglot procedures, the project provides code that lets you (and others) write procedures and functions in other languages and run them through the provided procedure (polyglot.run). We could even expand or adjust this to run entire scripts in other languages, as shown in this article.
The src/main/java folder houses a polyglotProcedures folder containing two Java classes. One is PolyglotUtil, which defines the polyglot.run procedure, and the other is TypeConverter (borrowed from David Allen’s similar project). We will start with PolyglotUtil, then walk through TypeConverter.
public class PolyglotUtil {
@Context
public GraphDatabaseService db;
@Context
public Log log;
…
}
The PolyglotUtil class starts by using the @Context annotation to have Neo4j inject the graph database service (db) and a logger (log). Next, we are ready to define any procedures and functions we want to be able to call.
@Procedure(value = "polyglot.run")
@Description("polyglot.run(language, code) — Executes the code given. Throws things otherwise.")
public Stream<Output> execute(@Name("language") String language, @Name("code") String code) throws IOException {
    try (var context = org.graalvm.polyglot.Context.newBuilder().allowAllAccess(true).build()) {
        var bindings = context.getPolyglotBindings();
        bindings.putMember("db", db);
        Value v = context.eval(language, code);
        log.info("Check value equals " + v);
        Object result = convert(v);
        //Map these to a generic output as a type hack around the uncertainty of what comes back
        //Neo4j procs require a stream of concrete types
        return Stream.of(new Output(result));
    } catch (Exception exc) {
        exc.printStackTrace();
        throw exc;
    }
}
Here, I have set up a single procedure called polyglot.run() that takes two parameters: the language of the code we want to run, and the actual code we want to execute. First, we label it as a procedure (@Procedure annotation) and document it (@Description annotation). The return type must be a Stream of a concrete record type (in this case Output), because Neo4j procedures hand their results to Cypher as a stream of records with known fields. We’ll cover more on this in a minute when we get to the next code block.
Inside the procedure, all of our code sits in a try-with-resources block that builds a GraalVM polyglot Context (and closes it when we are done), with a catch for any errors. Within the try, we get the polyglot bindings from the context and add our Neo4j database service to them under the name db, which makes the database available to the guest-language code. The next line defines a variable v of type Value, a GraalVM type that represents values from any of the supported languages so we can translate between their data types. We set v to the result of evaluating the code passed into the procedure in the given language. The log statement that follows simply writes the value to the log so we can check that it matches our expectations.
Next, we convert the value to a plain Java Object called result. The convert() method lives in the TypeConverter class, which we’ll discuss shortly. The last line in the try block then wraps the result in an Output and returns it as a stream, as explained in the comment; the Output class is defined in the code below. Finally, the catch block prints the stack trace of any exception and rethrows it.
public class Output {
public Object result;
public Output(Object thingy) {
result = thingy;
}
}
Cypher expects procedures to return a stream of concrete types, so we need to define a concrete type that holds the generic object coming back from the GraalVM code execution. Because that return value could be many things (a Java number, a Python dict, a JavaScript object, and so on), we need to accept all of them and still hand Cypher a type it can expect.
Cypher will actually point this out with a helpful error message, which you can see if you comment out the Output class and change the procedure to return a Stream of result instead of Output. When you execute it, Cypher should show an error saying that it expects results of a specific type, and it even recommends an Output class to solve the problem (exactly what we used here).
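To make the wrapper idea concrete without a running database, here is a plain-Java sketch of the same pattern: values of different Java types (standing in for already-converted guest-language results) are each wrapped in one concrete Output type and returned as a Stream<Output>. OutputWrapperSketch and wrapAll are hypothetical names, and the sketch does not touch the Neo4j procedure API itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java sketch of the Output wrapper pattern; no Neo4j or GraalVM required.
final class OutputWrapperSketch {
    // Mirrors the Output class: a single public field that can hold any value.
    static final class Output {
        public Object result;

        Output(Object thingy) {
            result = thingy;
        }
    }

    private OutputWrapperSketch() {}

    // Hypothetical helper: wraps each value, whatever its type, in the same concrete type.
    static Stream<Output> wrapAll(List<?> values) {
        return values.stream().map(Output::new);
    }

    public static void main(String[] args) {
        // Stand-ins for converted results: a String, a Double, and a Map.
        List<Object> mixed = Arrays.asList("hello", 42.0, Map.of("answer", 42.0));
        List<Object> unwrapped = wrapAll(mixed).map(o -> o.result).collect(Collectors.toList());
        System.out.println(unwrapped);
    }
}
```

The caller always sees one element type (Output), while the field inside can still carry a String, a number, or a Map.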
Ok, now to our TypeConverter class! This class takes our GraalVM Value and converts it into a data type that Neo4j can understand.
public class TypeConverter {
    public static final List<String> stringList = Arrays.asList("class", "constructor", "caller", "prototype", "__proto__");
    public static Object convert(Value v) {
        if (v == null || v.isNull()) return null;
        if (v.isProxyObject()) {
            System.err.println("Warning: proxy objects are not yet supported from guest languages for neo4j serialization");
            return null;
        }
        if (v.isHostObject()) {
            return v.asHostObject();
        }
        Set<String> memberKeys = v.getMemberKeys();
        if (!memberKeys.isEmpty()) {
            Map<String,Object> result = new HashMap<>();
            for (String key : memberKeys) {
                if (!stringList.contains(key) &&
                        !v.getMember(key).canExecute()) {
                    System.out.println("Recursing on " + key);
                    result.put(key, convert(v.getMember(key)));
                }
            }
            return result;
        }
        if (v.isBoolean()) return v.asBoolean();
        if (v.isNumber()) return v.asDouble();
        if (v.isString()) return v.asString();
        System.err.println("Unsupported guest language values cannot be mapped, and will be returned as null");
        return null;
    }
}
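In the above code, a series of if statements checks whether the GraalVM Value is null, a proxy object, a host (Java) object, an object with members (converted to a Map, skipping executable members and reserved keys such as constructor and prototype), a boolean, a number (converted to a double), or a string, and converts it to the matching Java type. Anything else is returned as null.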
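The recursive member walk is the trickiest part, so here is a stdlib-only sketch of the same logic that runs without GraalVM. It is an analogy, not the GraalVM API: a guest object is modeled as a Map<String, Object>, an executable member as a java.util.function.Function, and MapConverterSketch is a hypothetical name. Like TypeConverter, it skips reserved keys and executable members, recurses into nested objects, turns every number into a double, and maps unsupported values to null.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for TypeConverter: guest objects are modeled as maps,
// and executable members as java.util.function.Function values.
final class MapConverterSketch {
    // Same reserved keys that TypeConverter skips.
    static final List<String> RESERVED =
            Arrays.asList("class", "constructor", "caller", "prototype", "__proto__");

    private MapConverterSketch() {}

    static Object convert(Object v) {
        if (v == null) return null;
        if (v instanceof Map) {
            Map<String, Object> result = new HashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                String key = String.valueOf(e.getKey());
                // Skip reserved keys and "executable" members; recurse on everything else.
                if (!RESERVED.contains(key) && !(e.getValue() instanceof Function)) {
                    result.put(key, convert(e.getValue()));
                }
            }
            return result;
        }
        if (v instanceof Boolean || v instanceof String) return v;
        // Mirror TypeConverter: every number becomes a double.
        if (v instanceof Number) return ((Number) v).doubleValue();
        // Unsupported values are mapped to null, as in TypeConverter.
        return null;
    }
}
```

Note that integers come back as doubles (3 becomes 3.0), which matches the asDouble() call in TypeConverter.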
Now it’s time to deploy our code and test it out!
In your preferred IDE or on the command line, build your project (or a clone of this one) with the mvn clean package command. This packages everything up and drops a .jar file of our project into the target directory. Copy the created .jar into the plugins folder of your Neo4j installation. You will need to find where you installed Neo4j, and the plugins folder should be inside it.
If Neo4j was already running, we need to restart it (bin/neo4j restart). Otherwise, we can start the database with bin/neo4j start. It should start after a few seconds, but I like to verify by opening another command line window, going to the Neo4j directory, running cd logs, and then running tail -f neo4j.log to check for errors. The window where I run the start command doesn’t always show a clear error or shutdown message when things go wrong, so watching the log file makes problems more visible.
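If you would rather scan the log once than follow it with tail -f, a small Java sketch like this lists the lines that look like errors. It assumes each log line carries its level as a standalone ERROR token (for example, 2021-02-11 16:00:01.000+0000 ERROR ...); the LogScan class and the default logs/neo4j.log path are hypothetical, so pass the path to your own neo4j.log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// One-shot alternative to tail -f: print log lines that look like errors.
// Assumption: the log level appears as a standalone " ERROR " token.
final class LogScan {
    private LogScan() {}

    static List<String> errorLines(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains(" ERROR "))
                .collect(Collectors.toList());
    }

    static List<String> errorLines(Path logFile) throws IOException {
        return errorLines(Files.readAllLines(logFile));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass <neo4j-home>/logs/neo4j.log as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "logs/neo4j.log");
        errorLines(log).forEach(System.out::println);
    }
}
```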
Now we can access Neo4j Browser by opening a web browser and going to localhost:7474. If you are not familiar with Neo4j Browser, the input bar at the top is where we type and run Cypher queries and procedures. In that bar, execute the procedure with CALL polyglot.run(arg1, arg2). You can pass a variety of languages and code as the arguments, and I have included a few examples below to get you started.
//returns the string “hello” in a result pane
CALL polyglot.run('js', '"hello"')
//prints “hello” to neo4j.log output
CALL polyglot.run('js', 'print("hello")')
//executes the math and returns the result
CALL polyglot.run('python', 'import math; totalEntities = 3000; callsNeeded = int(math.ceil(totalEntities / 100)); callsNeeded')
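For reference, the Python example computes how many calls of 100 entities are needed to cover 3,000 entities, which is 30. A hypothetical Java equivalent is below; it uses integer ceiling division, which matches int(math.ceil(total / 100)) for non-negative totals and a positive page size (ignoring int overflow) without going through floating point.

```java
// Hypothetical Java equivalent of the Python snippet's arithmetic.
final class CallsNeeded {
    private CallsNeeded() {}

    // Integer ceiling division: assumes totalEntities >= 0 and pageSize > 0.
    static int callsNeeded(int totalEntities, int pageSize) {
        return (totalEntities + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(callsNeeded(3000, 100));         // 30
        System.out.println((int) Math.ceil(3000 / 100.0));  // 30, mirroring math.ceil
    }
}
```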
The first parameter accepts the ID of any installed language implemented with GraalVM’s Truffle framework, such as js, python, R, ruby, and llvm.
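When the code you pass in contains quotes of its own, the nested quoting inside the CALL is easy to get wrong. As a minimal sketch, assuming you generate these calls from Java, the helper below escapes backslashes and single quotes so the code fits inside a single-quoted Cypher string literal. PolyglotCallBuilder is a hypothetical name, not part of the project.

```java
// Hypothetical helper for building polyglot.run calls with safely quoted arguments.
final class PolyglotCallBuilder {
    private PolyglotCallBuilder() {}

    // Escape backslashes first, then single quotes, so the added
    // backslashes are not themselves doubled.
    static String cypherString(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    static String polyglotRun(String language, String code) {
        return "CALL polyglot.run(" + cypherString(language) + ", " + cypherString(code) + ")";
    }
}
```

For example, polyglotRun("js", "print(\"it's alive\")") produces CALL polyglot.run('js', 'print("it\'s alive")'), which Cypher reads back as the original JavaScript.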
You can use this project as-is to run a variety of custom procedures, functions, or code snippets in the Cypher call. Or, you can use this as a template to build your own procedures and functions that accept script files or write additional code in other languages. Feel free to try it out and let us know what you need or would like to see in this project!
As with the previous GraalVM project, we are looking for feedback to better understand what is needed or used in this project area. We’d be happy to hear from you either via GitHub (starring the project or creating issues/feature requests) or via the Neo4j Community Site (getting help or letting us know what you like or dislike). Happy coding!
Language Buffet — Using Neo4j with GraalVM, part 2 was originally published in Neo4j Developer Blog on Medium.