

Spark Connector only returns empty DataFrames

Hi,

I need help troubleshooting a rather weird error. I have set up a (single-instance) Azure VM running Neo4j, following the official documentation, to feed data to an Azure Databricks cluster running Spark. I connected to the Neo4j VM via HTTP on port 7474 to populate it with some data. For the Databricks cluster, I installed the connector and followed its documentation, basically just setting the connection address and login credentials as Spark parameters.
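
For reference, the parameters I mean are roughly the following, set in the cluster's Spark config (address and credentials are placeholders; the keys are the old connector's bolt settings):

spark.neo4j.bolt.url bolt://<vm-public-ip>:7687
spark.neo4j.bolt.user neo4j
spark.neo4j.bolt.password <password>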

When I run a sample query via the Spark connector on the Databricks cluster, the connection is established successfully - however, the query only returns empty results:

%scala
import org.neo4j.spark._
val neo = Neo4j(sc)
// => neo: org.neo4j.spark.Neo4j = org.neo4j.spark.Neo4j@7c444d23
%scala
val rdd = neo.cypher("MATCH (n:Person) RETURN id(n) as id ").loadRowRdd
rdd.count
// => rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = Neo4jRDD partitions Partitions(1,9223372036854775807,9223372036854775807,None) MATCH (n:Person) RETURN id(n) as id  using Map()
// => res1: Long = 0

The same happens for .loadDataFrame, .loadGraphFrame, etc.:

// => java.lang.RuntimeException: Cannot infer schema-types from empty result, please use loadDataFrame(schema: (String,String)*)
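
As the error message suggests, the schema can also be passed explicitly; a minimal sketch for this query (assuming "long" is the right type string for id(n)) would be:

%scala
// Explicit-schema variant of loadDataFrame, as suggested by the error message.
// The column name and type here are my assumption for this particular query.
val df = neo.cypher("MATCH (n:Person) RETURN id(n) as id").loadDataFrame("id" -> "long")
df.count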

I can confirm that the query should in fact not return an empty DataFrame: connecting to the remote VM from my local Neo4j Desktop and running the same query there returns the expected rows.

Where is my mistake here? Thanks in advance!

(Logs and specs, see below)

1 ACCEPTED SOLUTION

You are using the old driver, which works differently and supports a different set of Spark versions.

Please consider having a look at the new Neo4j Connector for Apache Spark - it's easier to use, more modern, and under active development: https://neo4j.com/developer/spark/
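
For reference, reading from Neo4j with the new connector looks roughly like this (a minimal sketch; URL, credentials, and label are placeholders, and the option names are those documented for the DataSource API):

%scala
// Minimal sketch using the new Neo4j Connector for Apache Spark.
// Replace the url, credentials, and label with your own values.
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("url", "bolt://<vm-public-ip>:7687")
  .option("authentication.basic.username", "neo4j")
  .option("authentication.basic.password", "<password>")
  .option("labels", "Person")
  .load()
df.show()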

