Neo4j

marvin-hansen · ‎11-24-2019

Hi,

I am building an MVP on top of the GRAND Stack and as long as we deal with neo4J / GraphQL, things were relatively easy and we already manage > 100 relations between several dozens of nodes. The central data schema gradually approaches 1K LoC and, so far, it runs surprisingly stable and hassle-free in development and testing.

There are just two REST API calls, both POST, and to integrate them, I did the following:

Replicated the JSON spec's as GraphQL Schema
Wrote a RESTData Source with the required authentication headers
Wrote custom resolver
Added custom resolver to as parameter to makeAugmentedSchema
Added RestDataSource to Apollo Server

However, when I run the query, I get a proper JSON / REST response logged to the console but the GraphQL return type remains null(!) means for some strange reason there is no matching between the REST return type and the GraphQL type. In Playground, a non-nullable error is thrown, which is kinda expected when trying to return something from a null entity. However, I just cannot figure out how exactly to match the REST return to the GraphQL type.

I made a striped down sandbox for reproducing the issue. Any help or advice of how to integrate these two REST endpoints is most appreciated because I am totally stuck:-(

Also, is there a better way to do REST integration?

Thank you

https://codesandbox.io/s/neo4j-graphql-js-custom-resolvers-example-xe1yf?fontsize=14&hidenavigation=...

type or paste code here

marvin-hansen · ‎11-25-2019

[SOLVED ]

Essentially, the REST return string needed to be parsed back as JSON.

return JSON.parse(policy);

That said, integrating REST with GRANDStack needs as a lot of improvements.

MuddyBootsCode · ‎12-03-2019

I hope that this doesn't come across wrong or anything but the whole point of the GRANDstack is that you don't use REST at all. Now if you wanted to make a RRANDstack I'm sure there would be a lot of support for that. For some small projects I've played with we've found quite a bit of success just using the neo4j-javascript driver to deal with the few things that we're REST oriented. But the main focus is the synergy between GraphQL and using a native graph DB.

marvin-hansen · ‎12-03-2019

With all due respect Sir,

I am using internally 100% GraphQL & GraphDB. The underlying issue of the post was entirely related to integrating an external legacy REST service from a business partner. Because we do not want to deal with REST or SOAP in our internal projects let alone our partner developers, it is my job to integrate all external services and expose them through one single unified business graph API.

After having developed a fair codebase with the GRANDstack, I am increasingly underwhelmed by its lack of flexibility beyond CRUD operations. I don't believe it will survive the test of time.

MuddyBootsCode · ‎12-03-2019

That’s an interesting point. I don’t work for Neo4j or anything and I’ve also done quite a bit with the GRANDstack. I’m curious what all do you find lacking? I’m wondering what limitations others have found in it as well as some I’ve run across. This might be a good thread to share some of that info in. Maybe get a broader dialogue started?

marvin-hansen · ‎12-03-2019

Grandstack lacks:

Proper documentation for system integration. My next avenue is SOAP exposure as GraphQL...
Proper debugging
Customizable generator

Beyond that:

Interfaces & type inheritance is completely broken and results in massive code duplication
Seeding data breaks with very few seedfiles, I just re-wrote it to use async/await
Appolo Schema Federation support, well, broken for now

The last point inflicts some significant pain as my master schema keeps growing and service modularity will eventually become a hard requirment but I guess, by then, the GrandStack has been retired.

All that would almost be excusable if and only if there would be no JavaScript. I wasted so much time debugging internals of the stack just to find a missing configuration in Babel. JS is just way too fragile for anything other than browser scripting.

For operations, the GRANDStack in its current form is a nightmare, at best. However, I believe the idea in itself is truly transformative with proper execution in a proper statically checked programming language.

The idea of a schema first development system with a statically typed language that has strong and stable tooling would make the entire idea a complete no-brainer.

I can only think of Go-lang at this point because Swift, while being a good and modern language, lacks the server-side tooling required for serious operations. Luckily, excellent GraphQL support exists for Go, so it all comes down to build a Cypher Query generator to get the best of both worlds, that is, schema first development with a statically typed language and the flexibility of statically generate all query / mutation scripts for neo4j with the option of customization.

A GoGraphQL stack with Go and gqlgen for neo4j would certainly elevate the entire idea from a "nice but PoC at most" project to a dead serious contender for serious mission-critical projects.

https://gqlgen.com/feature-comparison/

MuddyBootsCode · ‎12-04-2019

Thanks for sharing your thoughts. I'll have to look more into Go. I haven't had the same issues with JavasScript that you have but my use cases are pretty simple. If/when I start to experience some growing pains I'll definitely explore this route.

marvin-hansen · ‎12-04-2019

Thanks,

Just a quick question, have you dealt with extracting fields from one graphQL query and use these fields as input for another query before returning the result to the client?
By any chance?

This one really bugs me because I simply cannot figure out whether I am too stupid or whether that task is just unreasonable cumbersome in the GRANDStack.

Details:

imkleats · ‎12-05-2019

First of all, thanks for your thoughts @marvin-hansen! You've said just about everything I have been thinking for a little while now, especially with respect to the customization of the generators and Go as an apt ecosystem for developing that.

It might be a little hackish to implement, but with regard to your problem and the limited extent you'd have to do it, I might suggest that you can use the custom cypher directive to call the REST API you need through APOC library functions:

WITH "https://random.site/api/resource/$id" as url
CALL apoc.load.jsonParams(url, {`Method`: 'Post', `Accept`: 'application/json', `Other-Headers`: $context.variables}, <payload>) yield value AS res
CALL apoc.create.vNode(['ResourceType'], res) YIELD node AS resNode
RETURN resNode

If there are any relationships between your REST data and the data housed in Neo4j, this approach would allow you to mock up synthetic relationships as well. The concept also works for direct database-to-database connections or even other GraphQL endpoints.

Maybe this helps, maybe not, but good luck in any case!

marvin-hansen · ‎12-06-2019

Thank you @imkleats,

that APOC function is pretty damn cool. IMHO, Apoc is one of the three top reasons to use neo4j.

That said, I already done with the REST integration at this point. I have two more SOAP legacy services to integrate in my core system, so let's see how that goes. GraphQL in itself remains invaluable for heterogeneous online system integration. It just needs proper middleware.

Speaking of that middleware issue, I had and still have to battle certain issue:

Weaving functions and resolver together
Enforcing strict order of function calling
Maintaining a modular & re-usable code base b/c mine is growing way faster than I wish it would

Addressing these issues, the guys from prisma labs came up with some solid solution I want to share.

https://github.com/prisma-labs/graphql-middleware

The main driver of those core issues come from the following applied best practices:

Implement (public) business logic as resolvers and expose as GraphQL API
Implement workflows across the API as resolvers and expose the workfflow entry as GraphQL API
Integrate all external (non-graph-QL) webservices as GraphQL and abstract them away as workflows interwoven with the remaining API

Obviously, on paper, the GRAND stack seems like a good idea because you get all the CRUD operations generated while still manually overriding custom queries & mutations and link them to custom resolvers.

On paper.

Three weeks in, I sincerely regret my decision. I just wish neo4j would actually build their own projects with the GrandStack to get an idea how painful it really is.

When you don' eat your own dog-food, how can you expect anyone else to do so?

That said, once this project is over, the first and foremost top priority follow up project
will be a proper Go based GraphQL software infrastructure that replaces the GrandStack entirely with something we can rely on in operations.

Lesson learned.

imkleats · ‎12-06-2019

I'm not sure it's fair to criticize the neo4j team quite so harshly :-p, if only because it diminishes the effort their folks, like William Lyon, have put in to build the starter projects/demos and as an evangelist for the stack (all while continuing to support additional features/baseline functionality). That being said, it's hard to argue that it wouldn't be nice to have some more documentation/examples for advanced use cases. At some point, the community needs to take the reins, but I think the foundation might need to be a little stronger before that can happen.

This might be an incorrect impression, but it looks like a lot of the architecture behind the package might have been inspired by Join-Monster's approach to SQL transpilation. One of the issues with this is that an AST for an SQL query is quite a bit different in structure from the GraphQL query AST, which requires more bookkeeping in a recursive depth-first traversal. This bookkeeping has been done through arguments in the recursive function calls.

When you stop to think about the Cypher AST, it is practically in the exact same shape as the GraphQL query AST already. Moreover, because of this similarity, when you traverse the GraphQL query AST with visit (or visitWithTypeInfo really), you have access to all information you need to construct the Cypher AST node the moment you enter the node (i.e the visit function had already abstracted out the bookkeeping) and can simply use whatever construct floats your boat to await the results of nested children nodes (i.e. Promise, rxjs Observable, Channels in Go).

I think the Neo4j team is aware of some of the limitations around extensibility and customization that the current architecture causes. Whether they pursue a visitor pattern like I've described above to address it, I do not know, but I'd be happy to work with you on your Go port. I was actually thinking of doing it independently on my own anyway. Let me know.

marvin-hansen · ‎12-10-2019

Thank you @imkleats,

Yes, you are right, the neo4j team does the best they can with that approach, and I openly admit, that most of my recent frustration really comes form Javascript. I used to program in Scala for some time and because it's compiled & statically checked with a pretty smart compiler, you get deployment-ready code from the get-go. For a number of reasons, the JVM is not an option anymore even though I dealy miss writting Scala.

Moving forward, I was thinking about your proposed visitor pattern and had the following considerations:

IMHO, transpilation is the root of all evil. If you cannot fix bugs in the compiler, debugging is doomed. Take it or leave it, but the most pressing pain in JS I had to deal with recently ended up with the transpiler. Source to source re-write is just so fundamentally wrong. I know I am extremely opinionated here, and everyone is free to disagree. However, in Scala, all re-writes were handled in the Compiler by macros for the very specific reason that only that way they could ensure proper AST traversal and emitting proper Bytecode.
In Go, static file generation during the pre-compiler stage nicely integrates through language support to generate code. If you need Go-Bindings to generated files, you just generated & compile the matching bindings...

https://blog.golang.org/generate

Isomorphism simplifies integration. I did the same observation about the similarities between the GraphQL and GraphQL AST.

Honestly, I don't think the neo4j team can do any meaningful re-write of the GRANDstack in Go let alone implementing the visitor pattern anytime soon because well, they, have a job to do. Also, asking for that would be too much of a stretch.

Looking beyond all that, I had recently a fairly radical question:

Why do we need middleware?

We have data, that are graphs, and we have API's, that are graphs.

Expressing one business domain after the other as a sub-graph essentially leaves us with

Data
Structure
Relations

To make any business domain work, we need, in its purest essence:

Data
Structure
Relations
Processes

With neo4j, we get 1 - 3, but cannot do proper processes across domains. Likewise, with GraphQL, we get 1 - 3 out of the box, but would need to model processes either on the client or on, well, a middleware sitting between the graph and the client.

So what's a process?

A process is a series of steps, with a beginning, a series of steps of which some require sequential execution, other steps can be executed in parallel, and an end.

In its purest essence, a process is a direct acyclic graph, a DAG.

Why, then, don't we store the process in neo4j and expose the process as GraphQL endpoint.

Why, then, we don't write a very thin Go-layer that loads a process through GraphQL from neo4j, executes it according to its structure, and returns the process result again through GraphQL?

Why, then, don't we model Data & relations as GraphQL schema, and then, model work-flows as graphs to connect the different data from the graph?

Why then, don't we just call the process GraphQL with a handful of ID's, let it run the process, and execute it for us?

Why don't we use graph isomorphism in its purest and strictest form to bring a very strong and sound foundation to build stable and reliable software for the 21 Century at the speed of thought?

Why?

My humble questions to you are the following:

Do you think it is possible to write a minimal proof of concept in Go that traverses a GraphQL AST, applies the visitor pattern to construct the matching Cypher Query and executes that query?
Do you think a structurally similar approach can be done for loading a process graph from the DB, for which you actually need step 1 to construct the query from the GraphQL schema before executing that query to get a process graph from the DB?
How would you architect a very thin unified data and process layer to build robust systems that concentrate all effort on data & process design while leaving query and execution to the layer?
What would you say are the first critical steps to sustain successful implementation?
and what would you need to get the first critical steps started?

imkleats · ‎12-10-2019

I'm glad my thoughts sparked something for you. To be completely honest, I'm probably out of my depths in speaking to some of these things. With no formal CS background or work history (undergrad studies in philosophy & economics and graduate studies in economics), I just do my best to solve interesting logic problems by thinking in graphs. I don't know enough to know all the reasons why people smarter or more knowledgeable than myself haven't taken these routes that you've touched on above and that I've been seeing as possible too.

As for your questions:

Yes. I've been thinking about this particular problem for a while now and feel pretty confident that this is feasible. If you think of each (sub)component of the cypher query as its own struct that implements the Stringer interface, the visitor could spin up a go routine on AST Node entry to populate its fields and open one or more channels for child nodes to send dependent fields back on.
This seems very straightforward assuming #1 is implemented elegantly. If you haven't checked out Dgraph at all, it uses a GraphQL-like language for directly querying the database (called GraphQL+/-). One thing it allows is one or more blocks in the query to define variables. I'm thinking this is a useful way to provide a connection between #1 and #2.
This is where my lack of formal training might limit me and apologies if I'm misunderstanding the question. My initial thought is that the process graph nodes/relationships could be abstractions for gRPCs that communicate with other microservices (however, I don't recall seeing much if any gRPC support built into Neo4j Core or APOC).
I have no idea. No freaking clue.
Time? Lol, I've got my day job (public policy, not software engineering), a wife, and three kids, so it's about all I can manage to find an hour or so each evening and a few more hours on the weekend to do this fun stuff.

marvin-hansen · ‎12-10-2019

@imkleats

Thank you,
good to know that even the Philosopher has a hard time answering why so many aren't doing things some seeing as possible. I should have mentioned that I do have a certain formal training in CS, but for the most part, my day work requires me to solve interesting but quite hard problems. That said, none of that discussed here is part of formal CS training, and, please bear in mind, Stephe Jobs never got a CS degree either.

Innovation only needs a truly sparkling idea so precious that it's worth building.

And here it is:

Dgraph already comes with native GraphQL support, means, upload your GraphQL Schema and query it with their GraphQL+/- query language.

https://graphql.dgraph.io/

Addressing your two key points, about architecture and time

Time:
There are just three ingredients needed:
A) GraphQL schema that describes a process (graph) deployed to DGraphDB.
B) Process execution micro-service
C) Apollo server with reasonable generic custom resolver to map processes to GraphQL endpoints.
In worst case, the resolvers can be written manually b/c that really isn't that hard and allows fine-grained customization.

Architecture:

Process execution service: A simple Go microservice that does the following:

A query to retrieve a process, say by ID
Takes the process, and calls the functions defined in the process in the defined order.
The functions, then again, can be simple go microservices with gRPC.

It is essential to register the process execution service with an (go) Apollo service to access the process from the exposed GraphQL API. This can be either with a generic resolver, or, as mentioned before with manual written resolvers in case that works. I think, it should be feasible to write a generic resolver that takes a function name as parameter with n arguments, but I am not 100% sure about returning a variable return type. I have never looked so deep into customer resolvers before.

With that, we're essentially done.

Simple?

In case a feature-complete MVP can be accomplished, I am willing to replace neo4J & GRANDStack in my GKE cluster and do the first real deployment.

I can certainly contribute a GraphQL schema that models business processes as much graph-like as possible with GraphQL and I can certainly contribute a DGraphDB server, an Apollo server on my GKE, as well as CI/CD with auto-deployment to GKE. I just build the entire pipeline three weeks ago so I can piggyback this project on the existing infrastructure. With some fiddling, I can write a query generator, if it helps.

However, I have a hard time writing a microservice in Go b/c I am short on time to learn another programming language that is not used in my daywork.

Do you think you can contribute some go-code within your time constraints?

imkleats · ‎12-10-2019

The microservices don't have to be written in Golang to leverage the power of gRPC; you can write gRPC servers and clients in any number of languages that use a protobuf defined through IDL. I'd gladly contribute to something if it'd be helpful, though.

What you've laid out might be the simplest approach for getting an MVP up quickly, but once you look under the hood in Dgraph, I think you could come to the conclusion that you don't need to include Apollo at all. This is a doozy, so apologies for the lack of brevity, but I hope it's ultimately clear what I mean.

Dgraph stores its data as RDF triples and partitions the data across Alpha Nodes (their method of horizontal scalability) based on predicate, not object. As Dgraph traverses the full query, each successive subgraph is queried via gRPC from the appropriate Alpha Node. I've been toying with the notion of modifying the subgraph routing algorithm so that, when the query parser comes across a predicate that is flagged on the schema as representing some external resource (i.e. not data stored in the database), the subgraph gRPC is passed on to some user-defined network address that behaves exactly as it would expect an Alpha Node to for Dgraph's purposes (in terms of a conforming gRPC response) but which is actually some other microservice.

In any case, it's clear that this design pattern isn't necessarily tied to any single system. Just now, we've worked through how it could be implemented through Neo4j (with something akin to neo4j-graphql-js & the appropriate tooling) or through Dgraph, maybe even any other native graph database.

marvin-hansen · ‎12-12-2019

@imkleats I need to think about this in more depth and will come back to you in a few days.

Correct me if I am wrong, essentially you are saying this:

Send a query to a graph DB
As the query gets traversed, each successive subgraph is queried via gRPC (data parallelism)
When the starting predicate of a sub-graph is labeled as "external" (i.e. service), the subgraph gRPC is passed on to an external URI, say microservice, and returns whatever is specified there.

In essence, you unify data & computation in one single query because

Data schema is in the graph, modeled as GraphQL
Service interfaces is in the graph, again modeled as GraphQL
A query goes either against data, a service interface routed to a service or an entire process exposed as service and routed to that service endpoint

Is that what you trying to convey?

Also, do you have some sample code of a modified subgraph routing algorithm or could sketch some?

As I said, I need to think more deeply about this will come back in a few days.

imkleats · ‎12-12-2019

I think you've summarized it correctly. I've gotten close to tracing out some entry points that might accomplish it, but it's a pretty intimidating codebase. It could take a while to change and test, and who knows what the project designers opinion of those modifications would be. That's why I'd say getting an MVP up to prove the concept is more important than immediately going down this rabbit hole.

marvin-hansen · ‎12-12-2019

Honestly,

I think its brilliant. Would do a fork of the project, keep digging into the entry points,
hack a proof of concept, expand, and then ask the project designers for help on preparing a proper Pull Request that aligns with their contributing guidelines.

The original question: Why do we need middle-ware?

With something like that, we don't, and the value of a unified data & process graph is beyond obvious.

If you don't mind, please setup a GH repo with a fork so we can collaborate
on modding the actual code base there.

I think this is a complete game changer in terms of developer productivity b/c you eradicate so much boilerplate, overhead and unnecessary frameworks and instead replace it with a unified data & procss graph.

Again, I need to drill deeper into this one as everything comes down to execution quality and getting the details right from the get go.

marvin-hansen · ‎12-17-2019

@imkleats an important update, I got very recently a new project that requires me to build an entire block-chain infrastructure to enable the implementation of new business requirements, means I have to stop exploring this avenue because my work with the GrandStack is going to conclude before the end of the year. That said, keep that idea rolling b/c there is a lot to it!

Neo4j

Has anyone integrated a REST API with GRANDStack? Struggling to fetch from legacy REST API