
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Spark UDF calling neo4j

Hi,

My app has data for 5 million users, which I load into Neo4j as a graph.

I also process the data on Spark, and for each user I need to query Neo4j.

I did this with a Spark UDF that makes an HTTP call to the Neo4j server for every row.

It takes too long, and I get connection errors.

What is a better way to run 5M queries against Neo4j?

2 Replies

The best way to do 5 million queries is to not do 5 million queries to Neo4j. :)

A better approach is to use something like the Neo4j Spark Connector: use one Cypher query to pull all of the data you need from Neo4j into a single DataFrame, and then use standard Spark SQL to join that DataFrame to the data you already have.

That's one big query pulling 5 million results, which you can then further partition and join in Spark.
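A minimal PySpark sketch of that approach, assuming the Neo4j Spark Connector jar is on the Spark classpath; the `User` label, property names, join key, and connection details are hypothetical placeholders:

```python
# One Cypher query that pulls every user's data in a single pass,
# instead of 5M per-user round trips. (Label and properties are assumed.)
USER_QUERY = "MATCH (u:User) RETURN u.id AS user_id, u.score AS score"


def enrich_with_neo4j(spark, app_df):
    """Join the app's DataFrame against one bulk read from Neo4j."""
    neo4j_df = (
        spark.read.format("org.neo4j.spark.DataSource")
        .option("url", "neo4j://localhost:7687")  # assumed connection details
        .option("authentication.basic.username", "neo4j")
        .option("authentication.basic.password", "password")
        .option("query", USER_QUERY)
        .load()
    )
    # A standard Spark join replaces the per-row UDF HTTP calls.
    return app_df.join(neo4j_df, on="user_id", how="left")
```

The join runs inside Spark's own execution engine, so Neo4j sees a single query rather than one connection per row.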

Thanks for your response.

Now I need to figure out how to craft a single query that answers 5M questions without running out of memory 🙂. I'll open a new topic if needed.
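For the memory concern above, one option worth noting: the Neo4j Spark Connector can split a large read into parallel partitions so no single task materializes all 5M rows at once. A sketch, with assumed label, partition count, and connection details:

```python
# Hypothetical read configuration: the connector's `partitions` option
# splits the read across tasks (it paginates with SKIP/LIMIT internally).
PARTITIONED_READ_OPTIONS = {
    "url": "neo4j://localhost:7687",  # assumed connection details
    "labels": "User",                 # node-label read mode (label is assumed)
    "partitions": "20",               # number of parallel read tasks
}


def read_users_partitioned(spark):
    """Read all User nodes from Neo4j as a partitioned DataFrame."""
    reader = spark.read.format("org.neo4j.spark.DataSource")
    for key, value in PARTITIONED_READ_OPTIONS.items():
        reader = reader.option(key, value)
    return reader.load()
```

Each partition becomes its own bounded query against Neo4j, which keeps per-task memory small on both the Spark and Neo4j sides.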