02-07-2021 01:06 PM
(already asked on SO)
I understand that the previous (deprecated) Neo4j Spark Connector allowed generating Spark GraphX graphs and GraphFrames via the corresponding methods of the org.neo4j.spark.Neo4j class. With the Neo4j class gone, the only examples I found using the new approach generate DataFrames, based on something like:
spark.read.format("org.neo4j.spark.DataSource")
.option("url", "bolt://localhost:7687")
.option("query", "...")
.load()
How do I get GraphX Graph instances directly using the current Neo4j Connector for Apache Spark? Or would I need to combine separate DataFrames of edges and nodes?
02-09-2021 03:10 AM
Hi @mcsoini, you can easily transform your DataFrame into an RDD by invoking df.rdd.
i.e. given a graph like this: (Person)-[:KNOWS]->(Person)
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Read the :Person nodes and build the vertex RDD.
// The connector exposes the internal Neo4j node id in the "<id>" column.
val persons: RDD[(VertexId, (String, String))] = spark.read.format("org.neo4j.spark.DataSource")
.option("url", "bolt://localhost:7687")
.option("labels", ":Person")
.load()
.rdd
.map(row => (row.getAs[Long]("<id>"), (row.getAs[String]("name"), row.getAs[String]("surname"))))
// Read the KNOWS relationships and build the edge RDD.
// Note the GraphX edge type is Edge[ED]; here the edge attribute is the
// relationship type, a String.
val knows: RDD[Edge[String]] = spark.read.format("org.neo4j.spark.DataSource")
.option("url", "bolt://localhost:7687")
.option("relationship.nodes.map", "false")
.option("relationship", "KNOWS")
.option("relationship.source.labels", "Person")
.option("relationship.target.labels", "Person")
.load()
.rdd
.map(row => Edge(row.getAs[Long]("<source.id>"), row.getAs[Long]("<target.id>"), row.getAs[String]("<rel.type>")))
// and then
val graph: Graph[(String, String), String] = Graph(persons, knows)
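Once the Graph is assembled, the standard GraphX operations apply. A minimal sketch (assuming the graph built above from persons and knows; spark is an active SparkSession):

```scala
// Rank people by how widely they are known, using GraphX's built-in
// PageRank; tolerance 0.0001 controls convergence.
val ranks = graph.pageRank(0.0001).vertices

// Join the ranks back onto the (name, surname) vertex attributes.
val ranked = graph.vertices.join(ranks)
  .map { case (_, ((name, surname), rank)) => (name, surname, rank) }

ranked.collect().sortBy(-_._3).foreach(println)
```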