09-10-2020 08:01 AM
Hello Neo4j Community,
My goal is to connect MongoDB, Apache Spark, and Neo4j using Java. So far I have the connection between the first two working, but the connection between Spark and Neo4j does not work yet. The only documentation I have found is written in Scala (https://neo4j.com/developer/apache-spark/).
I know how to load a single JSON document and transform it into a graph, thanks to a previous question in this forum, but the connectivity between Spark and Neo4j is still missing.
My code so far:
package spark;

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import org.bson.Document;

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.rdd.api.java.JavaMongoRDD;

import org.neo4j.spark.Neo4j;

public class Spark {

    public static void createRdd() {
        // Local Spark session reading from the MongoDB collection.
        SparkSession spark = SparkSession.builder()
                .master("local")
                .appName("MongoSparkConnector")
                .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/TheFoodPlanner.join")
                .getOrCreate();

        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // This part works: the collection is loaded as an RDD of BSON documents.
        JavaMongoRDD<Document> rddRecipes = MongoSpark.load(jsc);

        // These two lines do not compile (see below).
        SparkContext sc = new SparkContext();
        Connector n = Neo4j(sc);

        jsc.close();
    }
}
where the line Connector n = Neo4j(sc); fails to compile, probably because the import org.neo4j.spark.Neo4j cannot be resolved either.
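For reference, the Scala guide shows something like val neo = Neo4j(sc). My best guess at the Java translation, assuming the neo4j-contrib neo4j-spark-connector artifact (rather than neo4j-kernel) were on the classpath, is the following; the cypher(...) and loadRowRdd() calls are copied from the Scala examples, and the explicit empty parameter map is my guess at how Scala's default argument has to be supplied from Java:

import org.apache.spark.SparkContext;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Row;
import org.neo4j.spark.Neo4j;

public class Neo4jRead {
    public static void readNodes(SparkContext sc) {
        // Neo4j is a Scala case class, so from Java it is constructed with new.
        // Connection details come from the spark.neo4j.bolt.url/user/password config options.
        Neo4j neo = new Neo4j(sc);

        // Run a Cypher query and pull the results back as an RDD of Rows.
        RDD<Row> rows = neo
                .cypher("MATCH (n:Recipe) RETURN n.name AS name",
                        new scala.collection.immutable.HashMap<String, Object>())
                .loadRowRdd();

        System.out.println("Rows fetched from Neo4j: " + rows.count());
    }
}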
Thank you very much for the time you spent reading.
PS: My dependencies are the following:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-xml_2.11.0-M4</artifactId>
    <version>1.0-RC1</version>
</dependency>
<dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-kernel</artifactId>
    <version>4.1.1</version>
</dependency>
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>3.12.7</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.2</version>
</dependency>
<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
</dependency>
<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>2.4.2</version>
</dependency>
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongodb-driver-core</artifactId>
    <version>4.1.0</version>
</dependency>
<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20180130</version>
</dependency>
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.2.2</version>
</dependency>
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>bson</artifactId>
    <version>3.4.1</version>
</dependency>
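Reading the Scala guide again, its examples seem to rely on the neo4j-contrib connector rather than neo4j-kernel (which, as far as I can tell, is the embedded database engine, not a Spark connector). I assume the missing dependency would look roughly like this, with the exact version to be taken from the connector's README:

<!-- Guessed coordinates for the Scala-2.11 build of the old Neo4j Spark connector;
     it may also require adding the connector's repository to the pom. -->
<dependency>
    <groupId>neo4j-contrib</groupId>
    <artifactId>neo4j-spark-connector</artifactId>
    <version>2.4.5-M1</version>
</dependency>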
09-28-2020 04:41 AM
We are working on an updated Spark connector that uses the DataSource APIs. A pre-release will be available September 30th, 2020, and it will support PySpark. If this is something you're interested in trying out, let me know.
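To give you an idea of the shape of the API, reads with the new connector go through Spark's standard DataFrame reader. A minimal Java sketch, assuming the pre-release option names (url, authentication.basic.*, labels) and a local Neo4j instance:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class Neo4jDataSourceRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local")
                .appName("Neo4jDataSource")
                .getOrCreate();

        // Read all :Recipe nodes into a DataFrame via the DataSource API.
        Dataset<Row> recipes = spark.read()
                .format("org.neo4j.spark.DataSource")
                .option("url", "bolt://localhost:7687")
                .option("authentication.basic.username", "neo4j")
                .option("authentication.basic.password", "password")
                .option("labels", "Recipe")
                .load();

        recipes.show();
        spark.stop();
    }
}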
11-12-2020 04:29 AM
Following up on this thread: the new work, which includes polyglot support and Python, can be found here:
All the sessions of the conference are now available online
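For completeness, writing a DataFrame back as nodes follows the same pattern; a sketch under the same assumptions as the read example above (column names become node properties):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class Neo4jDataSourceWrite {
    static void writeRecipes(Dataset<Row> recipes) {
        // Append each row as a :Recipe node.
        recipes.write()
                .format("org.neo4j.spark.DataSource")
                .mode(SaveMode.Append)
                .option("url", "bolt://localhost:7687")
                .option("authentication.basic.username", "neo4j")
                .option("authentication.basic.password", "password")
                .option("labels", ":Recipe")
                .save();
    }
}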