10-02-2019 09:03 AM
Hello Sir/Madam,
I have been reading various papers on graph indexing, but I am extremely confused.
Any suggestions on this topic would be extremely valuable.
10-02-2019 11:34 AM
Hi @chim3yy
Welcome to the community
We are glad to know that you are interested in Neo4j.
Indexing is a rule of thumb for performance in Neo4j. Whenever you run a query on the graph database, indexes are the first thing that determines the total db hits of your query. When you have a large amount of data, you can see the difference that indexes make in performance and time.
For more info, you can refer to the blog below:
When Neo4j creates an index, it creates a redundant copy of the data in the database. Therefore using an index will result in more disk space being utilized, plus slower writes to the disk.
Therefore, you need to weigh up these factors when deciding which data/properties to index.
Generally, it's a good idea to create an index when you know there's going to be a lot of data on certain nodes. Also, if you find queries are taking too long to return, adding an index may help.
Hope this helps with your query.
Please let me know if you require further details.
Cheers.
10-02-2019 08:05 PM
Thank you so much for your response. This definitely answered why indexing is important for Neo4j. If I'm not wrong, using an index for query processing allows fast retrieval of the features you have indexed. However, it can lead to higher memory usage along with heavier writes. But I'm still confused about why graph-based indexing is required for feature selection. Your suggestions are really valuable; thank you for your time.
10-04-2019 04:51 AM
Regarding the comment:
When Neo4j creates an index, it creates a redundant copy of the data in the database. Therefore using an index will result in more disk space being utilized, plus slower writes to the disk.
Therefore, you need to weigh up these factors when deciding which data/properties to index.
As a point of clarification, 'creates a redundant copy of data' should be 'creates a redundant copy of data for the property indexed'.
For example, if you have 100 million :Person nodes, each with 20 properties, and you then create an index on :Person(age), we do not create a duplicate copy of those 100 million :Person nodes with their 20 properties. Rather, we simply create a redundant copy of the age property for those 100 million :Person nodes.
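To make the distinction concrete, here is a toy Python model (not Neo4j's actual storage format): a node store that holds every property, and a hypothetical `PropertyIndex` for :Person(age) whose entries are only (age value, node id) pairs pointing back into the node store. The 1,000 nodes and their properties are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set[str]
    properties: dict[str, object]


@dataclass
class PropertyIndex:
    """Hypothetical index on one label/property pair."""
    label: str
    key: str
    # Each entry holds only the indexed value and the node id (a pointer),
    # never a copy of the node's other properties.
    entries: list[tuple[object, int]] = field(default_factory=list)

    def add(self, node: Node) -> None:
        if self.label in node.labels and self.key in node.properties:
            self.entries.append((node.properties[self.key], node.node_id))


# Node store: every node keeps all of its properties here.
nodes = {
    i: Node(i, {"Person"}, {"name": f"p{i}", "age": 18 + i % 50, "city": "X"})
    for i in range(1000)
}

age_index = PropertyIndex("Person", "age")
for n in nodes.values():
    age_index.add(n)

print(len(age_index.entries))  # 1000: one entry per :Person node with an age
print(age_index.entries[0])    # (18, 0): the age value plus the node id
```

The node's `name` and `city` never appear in the index; only the indexed value and the id do.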
Also, with regard to 'using an index will result in more disk space being utilized, plus slower writes to the disk': this is true, but it is true of most, if not all, RDBMSs. Indexes are not exactly free. They are free to create, yes, but they do impact load/write performance, simply because as you update the data you also need to update the associated indexes.
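A minimal Python sketch of that write overhead, assuming a hypothetical `IndexedStore` class with one indexed key: every write to the indexed property also has to maintain the sorted index, while writes to other properties do not. The `write_ops` counter is a made-up stand-in for work done, not a Neo4j metric.

```python
import bisect


class IndexedStore:
    """Hypothetical store with a single indexed property key."""

    def __init__(self, indexed_key):
        self.indexed_key = indexed_key
        self.nodes = {}       # node_id -> properties
        self.index = []       # sorted list of (value, node_id)
        self.write_ops = 0

    def set_property(self, node_id, key, value):
        props = self.nodes.setdefault(node_id, {})
        if key == self.indexed_key and key in props:
            # Remove the stale index entry first (O(n) here; fine for a toy).
            self.index.remove((props[key], node_id))
            self.write_ops += 1
        props[key] = value
        self.write_ops += 1  # write to the node store
        if key == self.indexed_key:
            bisect.insort(self.index, (value, node_id))
            self.write_ops += 1  # extra write to keep the index up to date


store = IndexedStore("age")
store.set_property(1, "name", "Ann")  # 1 op: node store only
store.set_property(1, "age", 30)      # 2 ops: node store + index insert
store.set_property(1, "age", 31)      # 3 ops: index delete + node store + index insert
print(store.write_ops)  # 6
print(store.index)      # [(31, 1)]
```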
As to indexes and why they are important, if one runs
match (n:Person) where n.age>20 and n.age<30 return n;
without an index on the age property, we would need to iterate over all 100 million :Person nodes and check each one to see if it satisfies the WHERE clause. However, with an index on :Person(age), since the index holds the age values, the query would be much faster.
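A toy Python comparison of the two plans for `20 < age < 30`, scaled down to 100,000 randomly generated nodes. The "index" is just a sorted list of (age, node id) pairs searched with `bisect`; this is an illustration of a range seek versus a full scan, not how Neo4j's planner or index structures are implemented. The counts are only loosely analogous to db hits.

```python
import bisect
import random

# Toy data: node id -> age, scaled down from 100 million to 100,000 nodes.
random.seed(0)
N = 100_000
ages = {node_id: random.randint(0, 99) for node_id in range(N)}


def full_scan(lo, hi):
    """No index: visit every node and test lo < age < hi."""
    checked = 0
    hits = []
    for node_id, age in ages.items():
        checked += 1
        if lo < age < hi:
            hits.append(node_id)
    return hits, checked


# Toy index on age: (age, node_id) pairs kept sorted by age.
age_index = sorted((age, node_id) for node_id, age in ages.items())


def index_range_seek(lo, hi):
    """With the index: binary-search to the range, then read only matches."""
    # First entry with age > lo (every (lo, id) pair sorts before (lo, inf)).
    start = bisect.bisect_right(age_index, (lo, float("inf")))
    # First entry with age >= hi (every (hi, id) pair sorts after (hi, -inf)).
    end = bisect.bisect_left(age_index, (hi, float("-inf")))
    return [node_id for _, node_id in age_index[start:end]], end - start


scan_hits, checked = full_scan(20, 30)
seek_hits, examined = index_range_seek(20, 30)
print(f"full scan checked {checked} nodes; index seek read {examined} entries")
```

Both plans return the same nodes; the scan touches all N nodes, while the seek reads only the matching entries (plus a logarithmic number of comparisons to find the range).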
10-16-2019 03:31 PM
As a clarification, I don't believe the nodes or their data are duplicated... I believe it's the graph id that is stored, which is essentially a pointer to the node in question. The property value for the indexed property is stored in the index, however.
So we're keeping the minimal amount of data needed to serve the index, and not duplicating nodes or node data. Additionally, since we have the indexed property value in the index, we can use it in some optimization scenarios, such as Neo4j 3.5's index-backed ORDER BY operations, when a hint is provided in the query about the property's type.
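A toy Python illustration of why keeping the value in the index helps with ordering, assuming the index is a sorted list of (age, node id) pairs over a made-up five-node store. Reading the index from the start of the range yields rows already ordered by age, with the age taken from the index itself, so no sort step and no node-store lookups are needed. This only sketches the idea; it says nothing about how Neo4j decides when to use an index-backed ORDER BY.

```python
import bisect

# Hypothetical node store: node id -> full property map.
node_store = {
    1: {"name": "Ann", "age": 23},
    2: {"name": "Bo", "age": 35},
    4: {"name": "Cy", "age": 17},
    7: {"name": "Di", "age": 23},
    9: {"name": "Ed", "age": 41},
}

# The index keeps (age, node_id) pairs sorted by age.
age_index = sorted((props["age"], nid) for nid, props in node_store.items())


def ages_ordered_via_index(min_age):
    """Return (node_id, age) for age >= min_age, ascending by age."""
    start = bisect.bisect_left(age_index, (min_age, float("-inf")))
    # No sort step and no node_store lookups: the index is already ordered
    # and already holds the age value.
    return [(nid, age) for age, nid in age_index[start:]]


def ages_ordered_without_index(min_age):
    """Same result by scanning every node, then sorting explicitly."""
    rows = [(nid, p["age"]) for nid, p in node_store.items() if p["age"] >= min_age]
    return sorted(rows, key=lambda r: (r[1], r[0]))


print(ages_ordered_via_index(20))  # [(1, 23), (7, 23), (2, 35), (9, 41)]
```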
10-18-2019 01:46 AM
@dana.canzano @andrew.bowman Thank you so much for your time and response. I believe most of my doubts are clarified.