Neo4j

DanielGittx · 06-05-2020

I'm new to graph databases and I'm evaluating whether Neo4j would fit my use case.

I have two CSV files:

Persons file (phone number, name columns)
Calls file (callerNumber, recipientNumber, callDate columns)

I anticipate more than 50 million nodes and more than 20 billion relationships.

I have been able to create nodes from the Persons file and relationships from the Calls file using neo4j-admin import.

The challenge comes when deleting relationships for a given callDate so that I can add newer ones. It's painfully slow on large datasets:
MATCH ()-[r:CALLS {callDate: 20200101}]->() DELETE r;

I found out I can't index relationship properties.

Is there a way to optimize this Cypher query? Alternatively, how could I remodel my CSVs?

Cobra · 06-06-2020

Hello @DanielGittx,

Yes, it's possible. This request should work:

CALL apoc.periodic.iterate('MATCH ()-[r:CALLS {callDate: 20200101}]->() RETURN r', 'DELETE r', {batchSize: 1000, iterateList: true})

It deletes the matching relationships in batches of 1,000.

Regards,
Cobra

DanielGittx · 06-06-2020

Hi @Cobra,

Thanks a lot. The APOC call you shared does work (I just refactored the syntax a bit), but it's still slow for the roughly 10 billion relationships I'm working with (6 months of data).

I came across db.index.fulltext.createRelationshipIndex as a way of indexing a relationship property.
The index is currently populating; hopefully the query will speed up once it's done.

Cobra · 06-06-2020

Nice, happy to hear this 🙂

The APOC procedure and the index should really speed up your query.

Regards,
Cobra

DanielGittx · 06-08-2020

Just an update: the indexing process is very slow.

For context:

Database size: 1.2 TB
Server config:
Heap: 230g
Page cache: 1.182t

Neo4j versions:
Neo4j Browser version: 4.0.3
Neo4j Server version: 3.5.15

It has taken 3 hours just to reach 12% (index populating).

CALL db.index.fulltext.createRelationshipIndex("callDateRelationship",["CALLS"],["CALL_DATE"], { analyzer: "url_or_email", eventually_consistent: "true" })

Why is this, and is there a way to speed it up?
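A caveat worth noting: in Neo4j 3.5 the query planner does not use full-text indexes for ordinary MATCH predicates, so even once population finishes, a pattern like MATCH ()-[r:CALLS {callDate: ...}]->() would still scan every relationship. The full-text index has to be queried explicitly through db.index.fulltext.queryRelationships. The sketch below is a minimal, untested example that assumes the callDateRelationship index created above and that CALL_DATE values are stored as strings (full-text indexes only index string properties); the date value is illustrative.

```cypher
// Sketch only. Assumes the "callDateRelationship" full-text index on
// :CALLS(CALL_DATE) created above, with CALL_DATE stored as a string
// such as "20200101". The outer statement finds the relationships through
// the index instead of scanning the whole relationship store; the WHERE
// re-checks for an exact value because full-text matching is token-based.
// The inner statement deletes each batch in its own transaction.
CALL apoc.periodic.iterate(
  'CALL db.index.fulltext.queryRelationships("callDateRelationship", "CALL_DATE:20200101")
   YIELD relationship
   WHERE relationship.CALL_DATE = "20200101"
   RETURN relationship',
  'DELETE relationship',
  {batchSize: 1000, iterateList: true}
);
```

Because the index was created with eventually_consistent set to "true", relationships written very recently may not be visible to the index query yet.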

Cobra · 06-08-2020

Hello @DanielGittx,

Yes, because it has to index your entire database. That's why it's better to create the index when you create the database.

Regards,
Cobra

DanielGittx · 06-08-2020

Agreed; however, I initially did a bulk import (neo4j-admin import).
Will neo4j-admin import preserve indexes if I create them in advance and then run the bulk import?

Cobra · 06-08-2020

If I'm not mistaken, the index is built during the import.

Regards,
Cobra

DanielGittx · 06-08-2020

I don't think so, especially for relationship indexes.

I marked one of your messages as the solution because I tested it on a subset of the graph and it worked (it was fast), and also because I'm working on a different issue now.

Cobra · 06-08-2020

I don't know much more about this topic, but I think you're right. According to the docs:

"Full-text indexes are powered by the Apache Lucene indexing and search library"

so it must be precomputed.

Regards,
Cobra

alex_rivilis · 06-14-2020

Daniel,
I would suggest changing your data model so that the day of the call is a node. It would look like this:
(:Person)-[:CALLED_ON]->(:DayOfCall)<-[:RECEIVED_CALL]-(:Person)
Then you can index the day-of-call nodes on a date property, and the DELETE will run much faster. Note that you will still need to use apoc.periodic.iterate().
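A minimal sketch of that suggestion in Neo4j 3.5 syntax, assuming :DayOfCall nodes carry an integer date property in the same yyyymmdd form as callDate (the property name and format are assumptions). The point of the model is that the delete starts from one indexed node and walks only that day's relationships, rather than scanning the whole relationship store.

```cypher
// Sketch under assumptions: :DayOfCall nodes with an integer `date`
// property (yyyymmdd), and both :CALLED_ON and :RECEIVED_CALL pointing
// from :Person to :DayOfCall, as in the model above.
CREATE INDEX ON :DayOfCall(date);

// Delete one day's relationships in batches, starting from the indexed day node.
CALL apoc.periodic.iterate(
  'MATCH (:DayOfCall {date: 20200101})<-[r:CALLED_ON|RECEIVED_CALL]-(:Person) RETURN r',
  'DELETE r',
  {batchSize: 1000, iterateList: true}
);

// Once it has no relationships left, remove the day node itself.
MATCH (d:DayOfCall {date: 20200101})
WHERE NOT (d)--()
DELETE d;
```

Run the statements separately (for example in cypher-shell) and let the index come online before the delete. A plain DETACH DELETE on the day node would remove all of that day's relationships in a single transaction, which is exactly what the batching avoids.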


Deleting older relationships