Neo4j

laxmimerit · ‎04-30-2021

Hi,
I am evaluating Neo4j 4.2 with data that has millions of nodes and relationships, but write performance is quite slow. My sample queries are given below:

create (session:Session {session_id: 'session_id1'})
;

match (s:Session) where s.session_id='session_id1' with s
create (e1:Event {insert_id: "insert_id1"}) set e1:SeenPage
create (s)-[:CONTAINS]->(e1)
create (s)-[:FIRST_EVENT]->(e1)
merge (pp1:Properties {value: "sample-url-1"}) set pp1:Page merge (e1)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e1)-[:RELATED_TO]->(pp2)
;


match (e1:SeenPage) where e1.insert_id='insert_id1' with e1
create (e2:Event {insert_id: "insert_id2"}) set e2:Show merge (e1)-[:NEXT]->(e2) with e2
match (s:Session) where s.session_id='session_id1' with s, e2
create (s)-[:CONTAINS]->(e2)
merge (pp1:Properties {value: "occasions"}) set pp1:Category merge (e2)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sample-url-2"}) set pp2:Page merge (e2)-[:RELATED_TO]->(pp2)
merge (pp3:Properties {value: "pl"}) set pp3:Type merge (e2)-[:RELATED_TO]->(pp3)
merge (pp4:Properties {value: "child category"}) set pp4:Sub_Category merge (e2)-[:RELATED_TO]->(pp4)
;


match (e2:Show) where e2.insert_id='insert_id2' with e2
create (e3:Event {insert_id: "insert_id3"}) set e3:SeenPage merge (e2)-[:NEXT]->(e3) with e3
match (s:Session) where s.session_id='session_id1' with s, e3
create (s)-[:CONTAINS]->(e3)
merge (pp1:Properties {value: "/p-page-0"}) set pp1:Page merge (e3)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e3)-[:RELATED_TO]->(pp2)
;


match (e3:SeenPage) where e3.insert_id='insert_id3' with e3
create (e4:Event {insert_id: "insert_id4"}) set e4:Show merge (e3)-[:NEXT]->(e4) with e4
match (s:Session) where s.session_id='session_id1' with s, e4
create (s)-[:CONTAINS]->(e4)
merge (pp1:Properties {value: "rect1"}) set pp1:Category merge (e4)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "/p-page-1"}) set pp2:Page merge (e4)-[:RELATED_TO]->(pp2)
merge (pp3:Properties {value: "pl"}) set pp3:Type merge (e4)-[:RELATED_TO]->(pp3)
merge (pp4:Properties {value: "him"}) set pp4:Sub_Category merge (e4)-[:RELATED_TO]->(pp4)
;


match (e4:Show) where e4.insert_id='insert_id4' with e4
create (e5:Event {insert_id: "insert_id5"}) set e5:SeenPage merge (e4)-[:NEXT]->(e5) with e5
match (s:Session) where s.session_id='session_id1' with s, e5
create (s)-[:CONTAINS]->(e5)
merge (pp1:Properties {value: "/p-page-2"}) set pp1:Page merge (e5)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e5)-[:RELATED_TO]->(pp2)
;

This data represents a user's journey on a website. The user starts a session and browses pages. Each action the user takes is recorded as an event, and each event has its own unique id. Consecutive events are connected with a :NEXT relationship, and the session is connected to each of its events with :CONTAINS. Events are not unique, which is why I had to use create rather than merge. Properties of events are unique; they are created as nodes and attached to the event with a :RELATED_TO relationship.

It's like this

#session contains events 
#events are connected with :next

Session-[:CONTAINS]->(Event1)-[:NEXT]->(Event2)<-[:CONTAINS]-Session

A session can contain hundreds of events. The current write speed is quite slow: it takes 4 hours to write the data for 10k sessions, with each session containing 10 events on average. I am writing the data event by event using a Python Bolt connector.

Any help would be really appreciated.

#neo4j #optimization #cypher #python

Cobra · ‎05-03-2021

Hello @laxmimerit

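Did you create unique constraints before loading your data? A unique constraint is backed by an index, so match and merge lookups on that property no longer have to scan every node with the label, and the data will load faster.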
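
As a rough sketch of what that could look like for the queries in the question, the snippet below builds the schema statements in Python. The label and property pairs are read off the posted queries; the constraint and index names, and the helpers `schema_statements` and `apply_schema`, are made up for illustration. It uses the `ON ... ASSERT` constraint syntax of Neo4j 4.2 (newer versions use `FOR ... REQUIRE`). Since events are created rather than merged, a plain index on `:Event(insert_id)` is assumed here instead of a constraint. Note that such an index would only help if the lookups match on the `:Event` label; the posted queries match on `:SeenPage` and `:Show`.

```python
# (label, property) pairs taken from the MERGE/MATCH lookups in the question.
UNIQUE_KEYS = [("Session", "session_id"), ("Properties", "value")]
# Event nodes are CREATEd, not MERGEd, so a plain index is assumed for them.
INDEXED_KEYS = [("Event", "insert_id")]


def schema_statements():
    """Build Neo4j 4.x schema statements for the lookup keys above."""
    statements = []
    for label, prop in UNIQUE_KEYS:
        # Hypothetical constraint name: <label>_<property>_unique
        statements.append(
            f"CREATE CONSTRAINT {label.lower()}_{prop}_unique "
            f"ON (n:{label}) ASSERT n.{prop} IS UNIQUE"
        )
    for label, prop in INDEXED_KEYS:
        # Hypothetical index name: <label>_<property>_index
        statements.append(
            f"CREATE INDEX {label.lower()}_{prop}_index "
            f"FOR (n:{label}) ON (n.{prop})"
        )
    return statements


def apply_schema(session):
    """Run each statement once; `session` is an open neo4j driver session.

    A statement fails if an equivalent constraint or index already exists,
    so skip the ones that are already in place.
    """
    for statement in schema_statements():
        session.run(statement)
```

Running `SHOW CONSTRAINTS` and `SHOW INDEXES` in the browser afterwards should confirm which ones exist.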

Regards,
Cobra

laxmimerit · ‎05-03-2021

Hi, thanks for the reply. Yes, I created [UNIQUE CONSTRAINTS] for the nodes that I merge. A few node types have to be created with CREATE, so I did not add constraints for those, but all the others have them.

Cobra · ‎05-03-2021

Verify that every node type with a unique property has a unique constraint. You could also try to reduce the number of merge clauses: you can merge a whole path at once and set the labels afterwards.
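
Another way to cut down the number of statements, going a step beyond the suggestion above, is to stop writing event by event and send many sessions per transaction with `UNWIND`. The sketch below is only one possible shape for this, not a tested loader: the input layout and the helpers `build_batches` and `write_batches` are hypothetical, the label whitelists are read off the queries in the question, and it uses MERGE for the session node where the question uses CREATE. It assumes events are listed in the order they occurred and that `:Session(session_id)` and `:Event(insert_id)` are indexed.

```python
from collections import defaultdict

# Cypher cannot take labels as parameters, so labels are interpolated into the
# query text. Only names from these whitelists (taken from the post) are allowed.
EVENT_LABELS = {"SeenPage", "Show"}
PROPERTY_LABELS = {"Page", "Type", "Category", "Sub_Category"}


def build_batches(sessions):
    """Return a list of (query, params) pairs covering many sessions at once.

    `sessions` is a list of dicts in this (hypothetical) layout:
    {"session_id": "session_id1",
     "events": [{"insert_id": "insert_id1", "label": "SeenPage",
                 "props": [("Page", "sample-url-1"), ("Type", "sp")]}]}
    """
    session_rows, first_rows, next_rows = [], [], []
    event_rows = defaultdict(list)  # event label -> rows
    prop_rows = defaultdict(list)   # property label -> rows

    for sess in sessions:
        sid, events = sess["session_id"], sess["events"]
        session_rows.append({"session_id": sid})
        if events:
            first_rows.append({"session_id": sid, "insert_id": events[0]["insert_id"]})
        # Consecutive events become (prev, next) pairs for the :NEXT chain.
        for prev, cur in zip(events, events[1:]):
            next_rows.append({"prev": prev["insert_id"], "next": cur["insert_id"]})
        for ev in events:
            if ev["label"] not in EVENT_LABELS:
                raise ValueError(f"unexpected event label: {ev['label']}")
            event_rows[ev["label"]].append({"session_id": sid, "insert_id": ev["insert_id"]})
            for prop_label, value in ev["props"]:
                if prop_label not in PROPERTY_LABELS:
                    raise ValueError(f"unexpected property label: {prop_label}")
                prop_rows[prop_label].append({"insert_id": ev["insert_id"], "value": value})

    batches = [("UNWIND $rows AS row MERGE (:Session {session_id: row.session_id})",
                session_rows)]
    for label, rows in event_rows.items():
        batches.append((
            "UNWIND $rows AS row "
            "MATCH (s:Session {session_id: row.session_id}) "
            f"CREATE (s)-[:CONTAINS]->(:Event:{label} {{insert_id: row.insert_id}})",
            rows))
    batches.append((
        "UNWIND $rows AS row "
        "MATCH (s:Session {session_id: row.session_id}) "
        "MATCH (e:Event {insert_id: row.insert_id}) "
        "CREATE (s)-[:FIRST_EVENT]->(e)",
        first_rows))
    batches.append((
        "UNWIND $rows AS row "
        "MATCH (a:Event {insert_id: row.prev}) "
        "MATCH (b:Event {insert_id: row.next}) "
        "MERGE (a)-[:NEXT]->(b)",
        next_rows))
    for label, rows in prop_rows.items():
        batches.append((
            "UNWIND $rows AS row "
            "MATCH (e:Event {insert_id: row.insert_id}) "
            "MERGE (p:Properties {value: row.value}) "
            f"SET p:{label} "
            "MERGE (e)-[:RELATED_TO]->(p)",
            rows))
    # Skip statements that would have nothing to write.
    return [(query, {"rows": rows}) for query, rows in batches if rows]


def write_batches(tx, sessions):
    """Run every batch inside one transaction; `tx` is a neo4j driver transaction."""
    for query, params in build_batches(sessions):
        tx.run(query, params)
```

With the 4.x Python driver this could be called as `session.write_transaction(write_batches, chunk)`, where `chunk` is a slice of a few hundred to a few thousand sessions. The number of statements then depends on the number of distinct labels rather than on the number of events. A suitable chunk size is something to measure rather than assume.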
