01-04-2021 01:10 PM
Hi,
I'm trying to read nodes from my local Neo4j DB for practice purposes using pyspark and the Neo4j connector. I've already downloaded the latest version of neo4j-connector-apache-spark (the Scala 2.12 build) and integrated it into pyspark as explained in the README of the repo [GitHub - neo4j-contrib/neo4j-spark-connector: Neo4j Connector for Apache Spark, which provides bi-di...].
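For context, I create the session and attach the jar roughly like this (the jar file name is just where I saved it locally, and the app name is arbitrary):

from pyspark.sql import SparkSession

# build a session with the connector jar on the driver/executor classpath
spark = (SparkSession.builder
         .appName("neo4j-test")
         .config("spark.jars", "neo4j-connector-apache-spark_2.12-4.0.0.jar")
         .getOrCreate())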
However, when I try to perform a read using:
spark.read.format("org.neo4j.spark.DataSource") \
    .option("url", "bolt://localhost:7687") \
    .option("authentication.basic.username", "neo4j") \
    .option("authentication.basic.password", "psw") \
    .option("labels", "Person") \
    .load() \
    .show()
I get the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o38.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
I think it could be related to the format string "org.neo4j.spark.DataSource", but I don't know how to fix it.
Thanks for your attention,
Justin
01-05-2021 01:08 PM
My first thought: have you double-checked the Spark version (what you are using versus what the connector expects)?
Capability varies by Spark version, and major updates have breaking changes.
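For example, from a pyspark session you can print both versions in play. The Scala check goes through pyspark's internal _jvm handle, so treat it as a quick hack rather than a supported API:

# Spark version of the running session
print(spark.version)
# Scala version on the JVM side, via the internal py4j gateway
print(spark.sparkContext._jvm.scala.util.Properties.versionString())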
01-05-2021 02:25 PM
First of all, thanks for your help Joel.
As you suggested, I checked the Spark version I'm using against the one the connector expects:
since I'm using pyspark 3.0.1, which runs on Scala 2.12, I use neo4j-connector-apache-spark_2.12-4.0.0.jar, as indicated on GitHub
I've even tried to install pyspark 2.4.0, which runs on Scala 2.11, in order to try the other connector (neo4j-connector-apache-spark_2.11-4.0.0.jar)
Be that as it may, in both cases I'm still getting the same error, which I report in full:
Traceback (most recent call last):
File "c:/Users/arman/Desktop/prova/sparkneo4jconn.py", line 14, in <module>
spark.read.format("org.neo4j.spark.DataSource") \
File "C:\Users\arman\Desktop\prova\venv\lib\site-packages\pyspark\sql\readwriter.py", line 184, in load
return self._df(self._jreader.load())
File "C:\Users\arman\Desktop\prova\venv\lib\site-packages\py4j\java_gateway.py", line 1304, in __call__
return_value = get_return_value(
File "C:\Users\arman\Desktop\prova\venv\lib\site-packages\pyspark\sql\utils.py", line 128, in deco
return f(*a, **kw)
File "C:\Users\arman\Desktop\prova\venv\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o38.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:151)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:825)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:723)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:646)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:604)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:576)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$3(DataSource.scala:653)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:653)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:733)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:248)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:221)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.v2.ReadSupport
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 27 more
01-05-2021 02:37 PM
Sorry to hear that. I can't think of anything else; I'm laser-focused on
"java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport"
which seems to strongly suggest an issue with versions and/or pathing. On occasion any given error is a red herring (with COBOL every error is a red herring, but I digress).
I'd pursue this first, to rule it out.
I've been down in the version/pathing abyss (kid-friendly word) with Spark, and it can be a nightmare.
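One quick way to rule out pathing is to ask the running session which jars it actually has. The second call below goes through the internal JVM handle, so consider it a diagnostic hack, not a public API:

# what the session was configured with; may be empty if the jar arrived some other way
print(spark.sparkContext.getConf().get("spark.jars", "<not set>"))
# jars the driver actually registered
print(spark.sparkContext._jsc.sc().listJars())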
01-06-2021 02:23 AM
Likely I have made a mistake in adding the connector to pyspark.
Or am I missing something such as drivers? I suspect I should add JDBC drivers, but I don't know how to do that.
Could you please suggest a guide or tutorial on how to set up pyspark properly to run the Neo4j connector?
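So far, the only other method I've found in the README is passing the jar at launch instead of via spark.jars, which as far as I understand would be:

pyspark --jars neo4j-connector-apache-spark_2.12-4.0.0.jar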
Thanks again
02-02-2021 01:19 AM
@j.armanini did you solve this? Is it still an issue? If so, please confirm which versions of Spark, Scala, and the connector you're using.
Let me know and I'll try to help you.
(btw I'm on the team of the Spark connector)
02-05-2021 03:49 AM
@conker84 Actually I opened the same issue on GitHub a month ago and you already helped me.
Not sure whether you remember it, but it was because I was using pyspark 3.0.1, which wasn't supported yet at that moment (the v2 ReadSupport class the connector was compiled against was removed in Spark 3.0, hence the NoClassDefFoundError). Thanks
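For anyone who lands here with the same error: at the time, the 4.0.0 connector targeted Spark 2.4, so the matching combination is something like the following (the exact patch version is just an example):

# install a Spark 2.4.x runtime and launch with the matching Scala 2.11 build
pip install pyspark==2.4.7
pyspark --jars neo4j-connector-apache-spark_2.11-4.0.0.jar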
02-09-2021 03:13 AM
Oh I see. By the way, we're working on Spark 3.0 support and we hope to get it ready soon.
02-12-2021 07:38 AM
@j.armanini we just merged the PR for Spark 3.0 support. If you want, you can download a preview version from here:
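Once you have the preview jar, launching should work the same way as before (the file name below is a placeholder; use whatever the preview artifact is actually called):

pyspark --jars <preview-connector-for-spark-3>.jar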