Neo4j

WardCunningham · ‎11-11-2018

I am importing four json files using a process similar to that described by Michael Hunger.

Three of the files load successfully but one is giving me trouble because I can't get the cypher logic right. I feel like I need to use a case statement but that doesn't seem to be useful for choosing between different merge strategies. The file describes dataflows in a computer system. The json is an array of objects where each object describes a flow of one of three types: a database connection, a restful api, and a async-queue producer/consumer. The particular file is available online:

http://context.asia.wiki.org/plugin/json/dataflow-diagram

My Cypher code tries to use CASE to choose which merges to apply.

unwind {json} as obj
case obj.type
when 'store' then
  create (d:DB {store: obj.store})
  merge (f:PROGRAM {name: obj.from})
  merge (f)-[:STORE]->(d)
when ...

This fails with the message:

Invalid input 's': expected 'l/L' (line 2, column 3 (offset: 23))
"case obj.type"
   ^

Now having studied this grammar:
https://s3.amazonaws.com/artifacts.opencypher.org/M12/railroad/CaseExpression.html
I realize case was never intended to choose between merge queries. I'm thinking now I have to create incomplete nodes with only the type field and then run a series of match-merge sequences, one for each type in my json. Or is there a better way?

I should mention that I'm running these Cypher scripts as part of a Docker build.

github.com

WardCunningham/context-db/blob/master/Dockerfile

FROM neo4j:3.1.5

RUN apk --no-cache upgrade && \
    apk --no-cache add curl jq

WORKDIR /
COPY . /

# Configure neo4j for read/write
COPY neo4j.write.conf /var/lib/neo4j/conf/neo4j.conf

WORKDIR /var/lib/neo4j
ENV PATH /var/lib/neo4j/bin:$PATH
RUN neo4j start && \
    sleep 15 && \
    cat /var/lib/neo4j/logs/neo4j.log && \
    echo ================================ && \
    sh /load_json.sh organization-chart && \
    sh /load_json.sh source-code-control && \
    sh /load_json.sh dataflow-diagram && \

This file has been truncated. show original

WardCunningham · ‎11-11-2018

I think I have acceptable logic now but it appears to be surprisingly inefficient running 100x slower than my other cypher for similarly sized json file. Here is my approach (be sure to scroll to see every line):

create (s:SOURCE {
  name:'dataflow-diagram',
  title:'Dataflow Diagram'})
with s

unwind {json} as obj
  create (c:CASE)
  set c=obj
with s

match (c:CASE {type:'store'})
merge (d:DB {name:c.store})
merge (g:PROGRAM {name:c.from})
merge (g)-[:QUERIES {source:s.name}]->(d)
with s

match (c:CASE {type:'rest'})
merge (f:PROGRAM {name:c.from})
merge (t:PROGRAM {name:c.to})
merge (f)-[:CALLS {source:s.name}]->(t)
with s

match (c:CASE {type:'queue'})
create (q:QUEUE {name:c.queue})
merge (r:PROGRAM {name:c.read})
merge (r)-[:READS {source:s.name}]->(q)
merge (w:PROGRAM {name:c.write})
merge (w)-[:WRITES {source:s.name}]->(q)

I've included the source attribution that I omitted in my previous code as a simplification. Now I include it so that I am directly cutting and pasting my work. This is the terminal output I get while loading my data one file at a time.

organization-chart
real	0m 1.46s
user	0m 0.00s
sys	0m 0.00s

source-code-control
real	0m 1.19s
user	0m 0.00s
sys	0m 0.00s

dataflow-diagram
Cannot merge node using null property value for name
real	1m 49.67s
user	0m 0.00s
sys	0m 0.00s

service-traffic-reports
real	0m 0.50s
user	0m 0.00s
sys	0m 0.00s

I can't explain my "null property value" error either. I've inspected my data with jq and haven't found anything amiss. I could be struggling too long. Time to set this project aside.

WardCunningham · ‎11-13-2018

William_Lyon · ‎11-13-2018

Hi Ward -

There is a bit of a hack you can use for this sort of conditional logic that involves creating dummy lists using CASE and then iterating over them. Something like this:

create (s:SOURCE {
  name:'dataflow-diagram',
  title:'Dataflow Diagram'})
with s

UNWIND {json} AS obj
WITH *,
  CASE WHEN obj.type == 'store' THEN [1] ELSE [] END AS stores,
  CASE WHEN obj.type == 'rest' THEN [1] ELSE [] END AS rests,
  CASE WHEN obj.type == 'queue' THEN [1] ELSE [] END AS queues


FOREACH (_ IN stores |
  MERGE (d:DB {name: obj.store})
  MERGE (g:PROGRAM {name: obj.from})
  MERGE (g)-[:QUERIES {source: s.name}]->(d)
)

FOREACH (_ IN rests |
  MERGE (f:PROGRAM {name: obj.from})
  MERGE (t:PROGRAM {name: obj.to})
  MERGE (f)-[:CALLS {source: s.name}]->(t)
)

FOREACH (_ IN queues |
  CREATE (q:QUEUE {name: obj.queue})
  MERGE (r:PROGRAM {name: obj.read})
  MERGE (r)-[:READS {source: s.name}]->(q)
  MERGE (w:PROGRAM {name: obj.write})
  MERGE (w)-[:WRITES {source: s.name}]->(q)
)

Also, do you have uniqueness constraints created for any of the properties you are merging on?

WardCunningham · ‎11-13-2018

Thanks for the tip William. The decoding section that I finally used distinguished queue reads from queue writes. This was my source of null values since only one or the other was present. There was also a fourth type, calls to external sites.

  CASE WHEN obj.type='store'
    THEN [1] ELSE [] END AS stores,
  CASE WHEN obj.type='rest'
    THEN [1] ELSE [] END AS rests,
  CASE WHEN obj.type='queue'
    and obj.read is not null
    THEN [1] ELSE [] END AS reads,
  CASE WHEN obj.type='queue'
    and obj.write is not null
    THEN [1] ELSE [] END AS writes,
  CASE WHEN obj.type='site'
    THEN [1] ELSE [] END AS sites

This version ran in a half second which was consistent with the other files. I wonder what my version was doing with all of those seconds?

dataflow-diagram
real	0m 0.59s
user	0m 0.00s
sys	    0m 0.00s

Neo4j

Import from JSON where CASE statement would help