How to reduce the storage required for key names

I have a network with 6.5 million data points which I want to import from a CSV. All data points will have the exact same structure, like you would expect from an SQL DB with 6.5 million rows.

I have limited storage and I am not a fan of redundancy.

As I understand it, Neo4j is schemaless, meaning that for 6.5 million rows the key names of the JSON key-value pairs would be repeated for every single data point.
To give an estimate: with 10 keys of 10 characters each, and assuming Neo4j stores the characters as UTF-8, that is roughly 100 bytes per data point (I guess even more due to the surrounding structure).
With 6.5 million data points this is at the very least 650 MB, and I guess multiple GB, of redundant key names saved all over.
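
The arithmetic behind that lower bound can be checked with a short sketch. It takes the numbers from the estimate above and assumes, as the question does, that every data point stores its own copy of each key name and that the names are plain ASCII (one byte per character in UTF-8). The function name is made up for illustration, and the result says nothing about how Neo4j actually lays out its store files.

```python
# Back-of-envelope estimate of the bytes spent on repeated key names.
# Assumption (from the question, not verified against Neo4j internals):
# every data point stores its own copy of each key name.
NUM_DATAPOINTS = 6_500_000
KEYS_PER_DATAPOINT = 10
CHARS_PER_KEY = 10
BYTES_PER_CHAR = 1  # ASCII characters take one byte each in UTF-8


def key_name_overhead_bytes(datapoints, keys, chars_per_key, bytes_per_char=1):
    """Return the total bytes used if every key name is stored per data point."""
    return datapoints * keys * chars_per_key * bytes_per_char


overhead = key_name_overhead_bytes(
    NUM_DATAPOINTS, KEYS_PER_DATAPOINT, CHARS_PER_KEY, BYTES_PER_CHAR
)
print(f"{overhead / 1_000_000:.0f} MB")  # 650 MB
```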

We can reduce that load by shortening the key names ("name" becomes "n", "someOtherVeryLongKey" becomes "sovlk" or "s"), or by just using numbers. This reduces the data load, but also readability.
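
A minimal sketch of the key-shortening idea, assuming the renaming happens in a preprocessing step before the CSV is imported. The alias table, the helper names, and the sample row are all hypothetical. Keeping the mapping in one place means the short keys can be translated back when reading, which recovers some readability at the cost of an extra lookup layer to maintain.

```python
import csv
import io

# Hypothetical alias table: readable key name -> short key name to store.
KEY_ALIASES = {"name": "n", "someOtherVeryLongKey": "s"}
REVERSE_ALIASES = {short: long for long, short in KEY_ALIASES.items()}


def shorten_keys(row):
    """Replace readable keys with their short aliases; unknown keys are kept."""
    return {KEY_ALIASES.get(key, key): value for key, value in row.items()}


def restore_keys(row):
    """Translate short aliases back to the readable key names."""
    return {REVERSE_ALIASES.get(key, key): value for key, value in row.items()}


# Made-up sample data standing in for the real CSV file.
sample_csv = "name,someOtherVeryLongKey\nAlice,42\n"
rows = [shorten_keys(row) for row in csv.DictReader(io.StringIO(sample_csv))]

print(rows)                   # [{'n': 'Alice', 's': '42'}]
print(restore_keys(rows[0]))  # {'name': 'Alice', 'someOtherVeryLongKey': '42'}
```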

I am not happy with either of those two solutions. Either I have a DB that is hard to maintain, or I have a lot of redundancy.

Is this just how it is, or is there an option I didn't see?

2 REPLIES

Hi @riggedCoinflip,

Neo4j consists of nodes, relationships, and properties. Basically, you can save space through your model. If you model everything inside one node with 6.5 million properties, it will eat up your resources, because properties are the most expensive part of the graph in terms of storage. If you model the data as nodes instead, you will probably need less storage, as nodes are not as expensive, but on the other hand you have to connect them somehow based on your data. Basically it is a tradeoff between easy but more expensive (properties) and a bit more difficult but less expensive (nodes and rels). More info about the modelling can be found here

I don't really understand how this reduces the storage needed.
Option A: node with all properties - relationship - other node with all properties
Option B: node with no properties - relationship - "data node" - with all properties
and node with no properties - relationship - other node with no properties

In both cases I have N nodes with all properties. In Case B, I also have N nodes without properties.
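
To make the comparison concrete, here is a small counting sketch. It assumes one extra relationship per data point in Option B (the link from the empty node to its "data node"), which is how I read the layout above. The function names are made up, and it only counts objects, not bytes.

```python
def count_option_a(n):
    """Option A: every node carries all of its properties itself."""
    return {"nodes_with_properties": n, "nodes_without_properties": 0, "extra_relationships": 0}


def count_option_b(n):
    """Option B: an empty node per data point, linked to a separate data node."""
    return {"nodes_with_properties": n, "nodes_without_properties": n, "extra_relationships": n}


n = 6_500_000
a, b = count_option_a(n), count_option_b(n)

# The number of property-carrying nodes is the same in both options.
print(a["nodes_with_properties"] == b["nodes_with_properties"])  # True
# Option B adds N empty nodes plus N relationships on top.
print(b["nodes_without_properties"], b["extra_relationships"])   # 6500000 6500000
```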

GDS doesn't use properties anyway (unless specified), so for my pathfinding there shouldn't be a difference.