08-23-2020 04:44 AM
I have built an entire application on your platform, only to find the following info on someone's site.
I have started experiencing some of these issues, especially the out-of-memory exceptions.
What is your defense? Have I wasted my time on software that's not scalable, or am I doing something wrong?
08-23-2020 06:45 AM
Let me try to explain what I have seen in my experience.
This is such a generic statement that there is not much to say here. If there is a specific issue about clustering, then it can be answered. It's like saying, "I have something in my mind and your system does not do it that way." I have worked with environments that involve large clusters with large amounts of data. Any complex environment with configurations like that requires careful tuning and monitoring.
If you are looking for sharded data similar to Elasticsearch, then you are better off with a document database. Since each document is independent and not related to the others, distributing the data with eventual consistency is good enough for that use case. But what you gain there in data distribution and write performance, you lose when you want to traverse connected data.
Let me give you a simple example. A customer was using Elasticsearch to store a manufacturing hierarchy. Most of the reads from that system were fine and working well. Then came a scenario where they needed to pick a part and find its hierarchy up to 3 levels. The data was not large, a few hundred thousand records. It was taking a lot of time and not meeting their SLAs. They even involved Elastic engineers to see how it could be solved, and they couldn't do it. The problem was using the wrong tool for the job. When they loaded the data into Neo4j, it was far faster because of the way the data is stored. It is about using the right tool to solve a given problem.
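To make that concrete, here is a minimal sketch of such a traversal in Cypher. The Part label, CHILD_OF relationship, and partId property are hypothetical stand-ins for whatever the real model used:

```
// Walk up to 3 levels from a given part and return each ancestor
// with its distance from the starting part.
MATCH path = (p:Part {partId: $partId})-[:CHILD_OF*1..3]->(ancestor:Part)
RETURN ancestor.partId AS ancestorId, length(path) AS level
ORDER BY level;
```

Because relationships are stored as direct pointers, this hop-by-hop expansion does not need an index lookup or a join at each level.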
Again, I think you are comparing document databases with graph databases here. Graph databases optimize traversals. They cannot be as fast as document databases for writes, but they can be lightning fast for traversals, which document databases cannot match in any form or fashion. So look at your use cases and see what works best for you. Also, the data in Neo4j is pre-joined, so due to ACID compliance the locking can be a bit heavier than in the RDBMS world. Models can be adjusted for read/write optimization.
Clustering provides availability and redundancy. There is a cost involved in synchronizing transactions. If the servers are close from a networking perspective, the cost is very small. If you have distributed the servers geographically and the networking between them is really bad, then it can have a negative effect. I have worked with customers who are able to process 5,000 messages per second (creating 4 nodes and 6 relationships for each message) in a 3-node cluster.
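For a rough idea of what such an ingest write can look like, here is a sketch. The labels, relationship types, and the $messages parameter are all hypothetical; the point is batching many messages into one transaction with UNWIND:

```
// One batch of messages per transaction; each message creates
// 4 nodes and 6 relationships, matching the example above.
UNWIND $messages AS msg
CREATE (e:Event {id: msg.id, ts: msg.ts})
CREATE (s:Source {name: msg.source})
CREATE (d:Device {serial: msg.device})
CREATE (l:Location {code: msg.location})
CREATE (e)-[:FROM]->(s),
       (e)-[:ON]->(d),
       (e)-[:AT]->(l),
       (s)-[:OWNS]->(d),
       (d)-[:INSTALLED_AT]->(l),
       (s)-[:OPERATES_IN]->(l);
```

In a real system you would likely MERGE the shared nodes (sources, devices) instead of creating duplicates, but the batching pattern is the same.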
Why do you want to replicate with a clean node? Take the latest backup, start the new node from that backup, and let it catch up on the latest transactions. If you are looking to rebuild the whole cluster, that approach is far faster. I have worked with a customer who had a db of around 1.5 TB. With that db we got a 3-node cluster up and running in 30 minutes.
I guess you are referring to the community version. It is what it is.
As with any system, you need to tune the memory for your use case and be aware of how your model and queries can affect it. I have built a patient claims db with 100 million nodes and 1 billion relationships. We built the model and wrote queries such that generating a Sankey chart of which procedures are performed in the 90 days after a condition is identified, across all patients, takes around 3-4 seconds and uses less than 10 MB of heap.
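As an illustration only, a query over a model like that could look something like the sketch below. The labels, relationship types, and properties are hypothetical, and the date properties are assumed to be Cypher Date values:

```
// Procedures performed within 90 days after a given condition,
// aggregated across all patients.
MATCH (p:Patient)-[:HAS_EVENT]->(ce:Event)-[:OF]->(c:Condition {code: $conditionCode})
MATCH (p)-[:HAS_EVENT]->(pe:Event)-[:OF]->(proc:Procedure)
WHERE pe.date > ce.date AND pe.date <= ce.date + duration({days: 90})
RETURN proc.code AS procedure, count(DISTINCT p) AS patients
ORDER BY patients DESC;
```

The aggregation only keeps a counter per procedure code rather than materializing every matched path, which is why the heap footprint can stay small.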
The most common reason GC causes issues is badly written queries.
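A common illustration (with hypothetical Customer/Order labels) is below: the first query builds a cartesian product and collects whole node records on the heap, while the second traverses the actual relationship and returns only the aggregate:

```
// Heap-hungry: scans every Customer x Order pair before filtering,
// and collect(o) holds full node records in memory per customer.
MATCH (c:Customer), (o:Order)
WHERE o.customerId = c.id
RETURN c.id AS customer, collect(o) AS orders;

// Friendlier: follow the relationship and return only what you need.
MATCH (c:Customer)-[:PLACED]->(o:Order)
RETURN c.id AS customer, count(o) AS orderCount;
```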
Most of the databases I have worked with are > 50 million nodes and > 500 million relationships, and all of them are in production. If that is a small dataset in your case, then I guess you need a specialized implementation.
I have never heard of anything like that.
08-24-2020 10:30 PM
I've got a strong feeling this is an older complaint, possibly several years old and out of date. I can address some of these; some of our other users have addressed other parts as well.
bottleneck with large volume of writes due to slave/master topology
Our older HA clustering used slave/master topology. That has been deprecated for some time, and was removed with our Neo4j 4.0 major release. We've been using causal clustering for years now, which is based on the Raft protocol and doesn't fit the description they provided.
issues with bulk loading, indexing (e.g. range, sort, etc), slow upsert
Our indexing has improved over the years. We migrated away from Lucene indexing (which was slow for insertion, possibly the reason for the older complaint) to a native indexing structure that is faster for index writes. We have also added composite indexes and fulltext indexes for more complex string matching.
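For reference, here is what those look like in Neo4j 4.x syntax; the Person label and properties are just placeholders:

```
// Native composite index over two properties.
CREATE INDEX person_name_city FOR (p:Person) ON (p.name, p.city);

// Fulltext index, queried with Lucene syntax (e.g. fuzzy matching).
CALL db.index.fulltext.createNodeIndex('personNames', ['Person'], ['name']);

CALL db.index.fulltext.queryNodes('personNames', 'smith~')
YIELD node, score
RETURN node.name AS name, score
ORDER BY score DESC;
```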
confusing license terms for production use
We've addressed this over the past few years; it is simplified now.
heap allocation and GC cause out-of-memory errors
Heap pressure and GC pauses are usually due to huge transactions. Neo4j is an ACID database, and as such, transactions must be committed atomically. This means any pending changes in a transaction must be held in memory at the same time and applied all at once, so it's possible to craft queries (by accident or by design) that eat up all your heap space. You may need to break down larger transactions, or batch changes if there is too much to apply in a single transaction. Managing that and understanding how this works is mostly the user's responsibility, but recent versions have added capabilities to restrict how much heap memory a single query is allowed to use, to keep things under control. We'll continue to add capabilities to manage this. Note that a non-ACID database may not have this issue to deal with, but as a consequence you may not get atomicity in transactions, and you may have other issues precisely because it is non-ACID. Choose the right tool for the job and your needs.
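As one example of batching, if you have the APOC plugin installed, apoc.periodic.iterate commits in batches so the whole change set never has to sit in the heap at once (the Order label and migrated property here are hypothetical):

```
// Stream the matched nodes and apply the update 10,000 rows per
// transaction, so each batch commits and frees its heap before the next.
CALL apoc.periodic.iterate(
  'MATCH (o:Order) WHERE o.migrated IS NULL RETURN o',
  'SET o.migrated = true',
  {batchSize: 10000, parallel: false});
```

On the restriction side, recent 4.x releases added settings such as dbms.memory.transaction.max_size to cap how much heap a single transaction may consume.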
practically only useful for small dataset reads that require visualization
We're used by some of the top companies from around the world, so clearly that statement is not true.
If you have issues with out of memory exceptions, reach out to us. If you have an enterprise license with us, you probably have an enterprise support contract, so leverage that. If not, ask on the forums. We're happy to assist.
Neo4j is a powerful tool, but as with all powerful tools, you need to know how to use it well, and there is learning involved. We're happy to lend a hand.