04-16-2020 02:19 AM
Hi,
In my app I have data for 5 million users, which I load into Neo4j as a graph.
I also processed the data in Spark, and for each user I need to query Neo4j.
I did this with a Spark UDF that makes an HTTP call to the Neo4j server for each user.
It takes too long and I get connection errors.
What is a better way to run 5M queries against Neo4j?
04-16-2020 03:28 PM
The best way to do 5 million queries is to not do 5 million queries against Neo4j. :)
The better approach would be to use something like the Neo4j Spark Connector: use one Cypher query to pull all of the data you need from Neo4j into a single DataFrame, and then use standard Spark SQL to join that DataFrame to the data you already have.
That's one big query pulling 5 million results, which you can then further partition and join in Spark.
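For example, a minimal sketch (assuming the Neo4j Connector for Apache Spark DataSource API; the `User` label, `userId`/`score` properties, parquet path, and credentials below are made up, and option names can differ between connector versions):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("neo4j-bulk-read")
  .getOrCreate()

// Hypothetical stand-in for the 5M users you already processed in Spark.
val usersDf = spark.read.parquet("/path/to/processed_users") // assumed to contain a userId column

// One bulk read from Neo4j instead of 5M point queries.
// The format name and option keys follow the Neo4j Connector for Apache Spark;
// adjust the query to your own data model and connector version.
val neo4jDf = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("authentication.basic.username", "neo4j")
  .option("authentication.basic.password", "<password>")
  .option("query", "MATCH (u:User) RETURN u.id AS userId, u.score AS score")
  .load()

// Join the Neo4j result to your existing DataFrame and continue in plain Spark SQL,
// instead of issuing one HTTP call per user.
val enriched = usersDf.join(neo4jDf, Seq("userId"), "left")
enriched.show(20)
```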
04-20-2020 02:28 AM
Thanks for your response.
Now I need to figure out how to craft the one query that answers 5M questions without running out of memory 🙂. I'll open a new topic if needed.
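One rough way to keep any single read small (just a sketch, assuming the same connector DataSource API and the made-up `User`/`userId` model from the reply above; newer connector versions may also offer built-in partitioning options worth checking) is to page the Cypher with ORDER BY/SKIP/LIMIT and union the slices in Spark:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("neo4j-chunked-read").getOrCreate()

// Hypothetical helper: read one ordered slice of users per call.
// SKIP/LIMIT paging is only stable when the query has an ORDER BY.
def readUserSlice(skip: Long, limit: Long): DataFrame =
  spark.read
    .format("org.neo4j.spark.DataSource")
    .option("url", "bolt://localhost:7687")
    .option("authentication.basic.username", "neo4j")
    .option("authentication.basic.password", "<password>")
    .option("query",
      s"MATCH (u:User) RETURN u.id AS userId, u.score AS score " +
      s"ORDER BY u.id SKIP $skip LIMIT $limit")
    .load()

val sliceSize  = 500000L   // tune to what your cluster can comfortably hold per read
val totalUsers = 5000000L

// Union the lazy slice reads into one logical DataFrame; no single Cypher
// query has to return all 5M rows at once.
val allUsers = (0L until totalUsers by sliceSize)
  .map(skip => readUserSlice(skip, sliceSize))
  .reduce(_ union _)
```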