
Connect Spark with Neo4j to transform JSONs into Graphs

Hello Neo4j Community,

My goal is to connect MongoDB, Apache Spark, and Neo4j using Java. So far I have connected the first two, but the connection between Spark and Neo4j hasn't worked. The only documentation I have found is written in Scala (https://neo4j.com/developer/apache-spark/).

I know how I could load a single JSON and transform it into a graph, thanks to a previous question on these forums, but so far the connectivity between Spark & Neo4j is missing.

My code so far:

package spark;

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import org.bson.Document;

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.rdd.api.java.JavaMongoRDD;

import org.neo4j.spark.Neo4j;

public class Spark {

    public static void createRdd() {

        // Spark session configured to read from the local MongoDB collection.
        SparkSession spark = SparkSession.builder()
                .master("local")
                .appName("MongoSparkConnector")
                .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/TheFoodPlanner.join")
                .getOrCreate();

        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Load the MongoDB collection as an RDD of BSON documents.
        JavaMongoRDD<Document> rddRecipes = MongoSpark.load(jsc);

        // Note: this constructs a second context although one already exists.
        SparkContext sc = new SparkContext();

        Connector n = Neo4j(sc); // this is the line that fails

        jsc.close();
    }
}

where the line Connector n = Neo4j(sc) fails to compile, probably because the import org.neo4j.spark.Neo4j doesn't resolve either.
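For what it's worth, here is a minimal sketch of what the instantiation might look like in Java, assuming the old neo4j-contrib Spark connector (the one the Scala docs describe) is on the classpath. In that connector, org.neo4j.spark.Neo4j is a Scala class whose constructor takes the existing SparkContext, so the Scala shorthand Neo4j(sc) becomes a plain new in Java, and there is no separate Connector type. Continuing from the SparkSession above (with org.apache.spark.SparkContext imported):

```java
// Reuse the context the SparkSession already owns; constructing a second
// SparkContext fails because only one may be active per JVM.
SparkContext sc = spark.sparkContext();

// Scala's Neo4j(sc) is sugar for the companion object's apply(); from Java
// the class is instantiated with `new`, and there is no Connector type.
Neo4j neo4j = new Neo4j(sc);
```

The old connector takes its connection settings from the Spark config (keys along the lines of spark.neo4j.bolt.url; verify the exact names in the connector's README), so those would be added to the SparkSession.builder() chain rather than passed to the Neo4j object.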

Thank you very much for the time spent reading.

PS: My dependencies are the following:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
</dependency>

<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
</dependency>

<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-xml_2.11.0-M4</artifactId>
    <version>1.0-RC1</version>
</dependency>

<dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-kernel</artifactId>
    <version>4.1.1</version>
</dependency>

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>3.12.7</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.2</version>
</dependency>

<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.17</version>
</dependency>

<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>2.4.2</version>
</dependency>

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongodb-driver-core</artifactId>
    <version>4.1.0</version>
</dependency>

<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20180130</version>
</dependency>

<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.2.2</version>
</dependency>

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>bson</artifactId>
    <version>3.4.1</version>
</dependency>
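As an aside, none of these artifacts appears to contain the org.neo4j.spark package: neo4j-kernel is the embedded database kernel, not the Spark connector. Assuming the old connector is what is wanted, it is published as neo4j-contrib:neo4j-spark-connector through the Spark Packages repository; the coordinates below are an assumption to be verified against the connector's README. The mixed Spark versions (spark-core 2.1.0 alongside spark-sql 2.4.2) are also worth aligning.

```xml
<!-- Hypothetical coordinates for the old Spark connector; verify the exact
     version matching Spark 2.x / Scala 2.11 in the connector's README. -->
<repositories>
    <repository>
        <id>SparkPackagesRepo</id>
        <url>https://repos.spark-packages.org</url>
    </repository>
</repositories>

<dependency>
    <groupId>neo4j-contrib</groupId>
    <artifactId>neo4j-spark-connector</artifactId>
    <version>2.4.5-M1</version>
</dependency>
```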
2 REPLIES

We are working on an updated Spark connector that uses the DataSource APIs. A pre-release will be available on September 30th, 2020, and it will support PySpark. If this is something you're interested in trying out, let me know.
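To give a sense of what the DataSource-API surface looks like in Java, here is a minimal sketch; the format string and option names are assumptions based on the new connector's announced API and should be checked against its documentation once the pre-release is out (the input file recipes.json is likewise hypothetical):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class WriteToNeo4j {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local")
                .appName("Neo4jDataSourceExample")
                .getOrCreate();

        // Any DataFrame works as a source; here one read from a JSON file.
        Dataset<Row> df = spark.read().json("recipes.json"); // hypothetical input

        // With the DataSource API there is no Neo4j(sc) object to construct:
        // the connector is addressed through format() and option() calls.
        df.write()
                .format("org.neo4j.spark.DataSource")
                .mode(SaveMode.Append)
                .option("url", "bolt://localhost:7687")
                .option("authentication.basic.username", "neo4j")
                .option("authentication.basic.password", "password") // placeholder credentials
                .option("labels", ":Recipe") // one node per row, labeled :Recipe
                .save();

        spark.stop();
    }
}
```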

Following up on this thread: the new work, which includes polyglot support and Python, can be found here: