Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
06-16-2020 03:34 PM
Greetings,
Apologies if info around this topic exists. I'm using Neo4j Community 3.5.1 w/ apoc-3.5.0.3-all.
Goal
Here is what I am trying to achieve... I want to run a cypher query that outputs a JSON file that is in the format that is consumable/readable by D3.js to display a Sankey Diagram. This site displays the D3 source files. Here is the JSON formatted structure I am hoping apoc can output that D3 requires...
{
"nodes":[
{"node":0,"name":"node0"},
{"node":1,"name":"node1"},
{"node":2,"name":"node2"},
{"node":3,"name":"node3"},
{"node":4,"name":"node4"}
],
"links":[
{"source":0,"target":2,"value":2},
{"source":1,"target":2,"value":2},
{"source":1,"target":3,"value":2},
{"source":0,"target":4,"value":2},
{"source":2,"target":3,"value":2},
{"source":2,"target":4,"value":2},
{"source":3,"target":4,"value":4}
]}
This is what the properly formatted json file in D3 Sankey will look like...
What I'm trying
I've got a working cypher query that returns Source Target Value, but I don't know how to add to the query so that it first groups all of the "nodes" (and numbers the nodes starting at 0) and then groups all of the "links" as seen above.
With "MATCH (bry:Brewery)-[r:BREWS]->(b:Beer)<-[r2:IS_BARREL_AGED]-(ba:BarrelAged)
With bry as brewery, size(collect(b.name)) as number
Where number > 1
Match (brewery)-[:BREWS]->(b2:Beer)<-[:IS_BARREL_AGED]-(ba2:BarrelAged)
With brewery, ba2, count(b2) as cnt
Return brewery.name as source, ba2.name as target, cnt as value Order by target ASC
UNION All
MATCH (bry:Brewery)-[r:BREWS]->(b:Beer)<-[r2:IS_BARREL_AGED]-(ba:BarrelAged)
With bry as brewery, size(collect(b.name)) as number
Where number > 1
Match (brewery)-[:BREWS]->(b2:Beer)<-[:IS_BARREL_AGED]-(ba2:BarrelAged)
Optional Match (brewery)-[:BREWS]->(b2)-[:IS_A]-(bt:BeerType)
With brewery, ba2, count(b2) as cnt, bt
Return ba2.name as source, bt.name as target, cnt as value Order by source ASC"
as query
CALL apoc.export.json.query(query, "/filelocation/barrelsankey.json")
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
Does anyone have a suggestion as to what I need to do to my query to make its JSON output the same structure required of D3 Sankey?
06-26-2020 02:04 PM
Hi Keith,
I'm a SANKEY fan, so I had to take a quick look at this. I'd like to see the answer too!
My first thought was that there are more than a couple challenges here, and more than a few ways to solve this too. I'd personally sketch out a strategy, break it into pieces then solve each one at at time, finally string it together for a working solution. That still might be best, as I started to pick apart my first thought (virtual nodes/relationships sent to APOC JSON export), I then saw the APOC JSON output
{"nodes(p)":[{"type":"node","id":"0","labels":["....
It looks quite verbose, graph specific and would require post processing in javascript before use, so then I took a look at just generating the JSON directly.
Note, since I don't have your DB I tried to do something generic with one my own DBs that could be run as a starter / demonstration exercise. Just replace the match query clause to find the nodes and relationships you want to send to D3. (and remove the LIMIT 2 of course...)
match p=(a)-[]->(b)
with p limit 2
unwind nodes(p) as n unwind relationships(p) as r
with collect( distinct {node: ID( n), name: n.name}) as nl,
collect( distinct {source: ID(startnode(r)), target: ID(endnode(r)), value: r.score}) as rl
RETURN {nodes: nl, links: rl}
Here is the output
{
"nodes": [
{
"node": 0,
"name": "A"
},
{
"node": 526658,
"name": "B"
},
{
"node": 526657,
"name": "C"
}
],
"links": [
{
"source": 0,
"value": null,
"target": 526658
},
{
"source": 0,
"value": null,
"target": 526657
}
]
}
10-05-2020 01:59 PM
Thank you so much Joel for your time and energy with this!
One issue remains and I am unable to solve it or find information about it... most of the D3 Sankey demos found on the web based on Mike Bostock's version...
...are using a zero-based index for organizing each unique json object and its source to target linkage.
They are looking for the json objects to start with 0 (zero) and iterate up from there for each unique object.
Example:
{
"nodes":[
{"node":0,"name":"node0"},
{"node":1,"name":"node1"},
{"node":2,"name":"node2"},
{"node":3,"name":"node3"},
{"node":4,"name":"node4"}
],
"links":[
{"source":0,"target":2,"value":2},
{"source":1,"target":2,"value":2},
{"source":1,"target":3,"value":2},
{"source":0,"target":4,"value":2},
{"source":2,"target":3,"value":2},
{"source":2,"target":4,"value":2},
{"source":3,"target":4,"value":4}
]}
While it's true my nodes have both an ID and a Uid property, without a json file organized like this, the Sankey rendering fails. There are versions of Bostock's Sankey out there that are based on a unique name, but I was really hoping to base it on the numbering version. If that's not possible, no worries.
...
Are you using D3 for your Sankeys? If so, how have you solved this unique identifier issue?
10-06-2020 12:12 PM
I've not created SANKEYs from Neo4j, so I was eager to see you do it...
At the moment I'm thinking
Aside from #3, what is required is a cross reference table for the node IDs, applied to both nodes and rel start/end, somewhere in the process... In the past I've oft built tables like this on the fly while iterating over the input list of numbers, look for orig_id entry in table if found use that new_id, if not there add a new entry to the table and use it, repeat until done. If run on the unique node list, it should then be fully populated when used to translate the rel IDs...
orig_id new_id
123 0
432 1
31 2
7 3
9991 4
....
Note: It feels like there ought to be another simple tricky way to do this in cypher, maybe using an obscure apoc function, but I didn't find it, yet....
11-10-2020 11:28 PM
Hi
Iam not sure if what I've done is relevant to this discussion, but I tried something similar and it worked
I am using 3d-force-graph which i think is also based on d3. I found this method on this community to render the graph. Its based largely on the Query as well as some processing in java script where I just create a map of records (all source and target pairs)
The graph is of process dependencies
MATCH (p:Level1Process)<-[r1]-(pl:Level2Process)<-[r2]-(t2:Level3Process)<-[r3]-(t3:Level4Process)<-[r4]-(a:Level5Process)<-[r5]-(a2:Level6Process)
RETURN {id: id(p),label:head(labels(p)),caption : p.title,color: "#fb9a99"} as source1 ,
{id: id(pl),label:head(labels(pl)),caption : pl.title,color: "#1f78b4"} as target1 ,
{id: id(pl),label:head(labels(pl)),caption : pl.title,color: "#1f78b4"} as source2 ,
{id: id(t2),label:head(labels(t2)),caption : t2.title,color: "#33a02c"} as target2,
{id: id(t2),label:head(labels(t2)),caption : t2.title,color: "#33a02c"} as source3,
{id: id(t3),label:head(labels(t3)),caption : t3.title,color: "#b2df8a"} as target3,
{id: id(t3),label:head(labels(t3)),caption : t3.title,color: "#b2df8a"} as source4,
{id: id(a),label:head(labels(a)),caption : a.title,color: "#f29f55"} as target4,
{id: id(a),label:head(labels(a)),caption : a.title,color: "#f29f55"} as source5,
{id: id(a2),label:head(labels(a2)),caption : a.title,color: "#69f5f5"} as target5
LIMIT 1000
The problem with this approach is that if I do an OPTIONAL MATCH some of the results are null , and that's where it fails
06-14-2021 06:26 PM
Thanks Mangesh, I'm hoping there's a way to do this without post-processing with javascript
06-14-2021 06:41 PM
Anyone able to share a cypher solution for the scenario seen above of setting up a JSON file of [nodes] & [links] where the [nodes] include zero-based numbering and the zero-based numbering is utilized for source to target in the [links]?
All this to set up a JSON file to be used to create a D3 Sankey Diagram?
11-12-2022 01:59 AM
D3 Javascript solution:
Even though it's been a while, big thanks to Joel for the query solution and a big thanks to Malcolm Maclean whose 4th edition book for D3 with the code solution is free (or rather pick your price) on leanpub: https://leanpub.com/d3-t-and-t-v4
The source code has 3 Sankey examples of which, the sankey-formatted-names.html solves the zero-based index problem.
All the sessions of the conference are now available online