I don't normally blog about new product releases. Although I have done so in the past, I somewhat feel that those interested in what a new release of a product has to offer are probably better off reading the release notes provided by the vendor. However, this time I am going to make an exception. I'm not trying to replicate the information contained within the release notes, but in this post I cover what is not really mentioned in the release notes.
With VMware vRealize Operations Manager 6.1, VMware has made a significant change to the underlying database engine by replacing the Gemfire distributed database system with a Cassandra distributed database system, which is also the database system used within Log Insight. One thing that I've acquired (amongst many things) during my time on our current customer project, where we are working on a multi-million dollar global vRealize Suite deployment, is a rather strong distaste for Gemfire. In a recent blog post, vRealize Log Insight Performance Woes – Understanding Cassandra, I mention that vRealize Operations 6.0.2 is built on Gemfire and Log Insight 2.5 is built on Cassandra, and how Log Insight's rather simple implementation of Cassandra Snitches can lead to serious performance issues if your large Log Insight environment is deployed incorrectly. Despite this, and in a very short period of time, I've come to love Cassandra's architecture.
Until a few days ago, I was unaware that VMware is replacing Gemfire in vROPS with Cassandra. Just last week, I mentioned to my colleagues that it would be great if vROPS was to adopt Cassandra rather than Gemfire. And about a week later, I've got a vROPS 6.1 instance running in my lab, using a Cassandra database. Great!
So why do I dislike Gemfire so much? It's got its place, and it is actually very scalable and quite well understood and used in the industry. However, I just don't think it is efficient enough to handle vROPS in the large deployments that we are working on right now, especially in the way it's been implemented in vROPS. It's memory hungry, and in VMware's implementation of it with vROPS 6, it's just not fault tolerant enough. For smaller vROPS deployments it's probably sufficient, but with vROPS 6.0.2/3 you're limited to 8-node clusters, which just isn't enough. Just two days ago, I sent a tweet where I mentioned that with 8 nodes configured with 48GB of RAM each, we are still seeing the guest OS swapping to disk. Remember, Gemfire is an in-memory distributed database, and by definition is going to be heavy on memory. Currently, our deployment has about 55,000 objects. I believe the official limit for objects in 6.0.2 is 75,000. Our planned object count for the current deployment is 120,000 objects, and VMware therefore had to sign a specific support statement to keep the product in support even with that many objects. The problem is, we're not even close to 75,000 objects and we are already seeing issues with scaling of the cluster.
This is where I hope to see vROPS with a Cassandra database make a difference. I don't know if it will, as it's early days and I've not used it in production under stress, but moving away from the Gemfire model is a step in the right direction, as in my opinion, few things scale like Cassandra :)
Despite this, you'd probably be surprised that the vROPS 6.1 maximums are not actually that much higher than vROPS 6.0. You can now scale a cluster to 16 nodes, supporting up to 120,000 objects, rather than vROPS 6.0's 8 nodes at 75,000. However, I believe VMware is aiming to increase those limits with every subsequent release following 6.1.
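To put those maximums side by side, here is a quick back-of-the-envelope sketch in Python. It only uses the node and object counts quoted above; the "objects per node" figure is my own naive average (total objects divided by node count), not an official VMware sizing number, and the function and variable names are just made up for illustration.

```python
# Back-of-the-envelope comparison of the cluster maximums quoted in this post.
# The per-node average is a naive calculation (objects / nodes), NOT an
# official VMware sizing figure. All names here are illustrative only.

MAXIMUMS = {
    "vROPS 6.0": {"nodes": 8, "objects": 75_000},
    "vROPS 6.1": {"nodes": 16, "objects": 120_000},
}


def objects_per_node(nodes: int, objects: int) -> float:
    """Average number of objects each node carries at the stated maximum."""
    return objects / nodes


def summarize(maximums: dict) -> dict:
    """Map each version to its naive objects-per-node average."""
    return {
        version: objects_per_node(**limits)
        for version, limits in maximums.items()
    }


if __name__ == "__main__":
    for version, per_node in summarize(MAXIMUMS).items():
        print(f"{version}: {per_node:,.0f} objects per node on average")
```

On these numbers the node count doubles while the object maximum grows by a factor of 1.6, so the naive per-node average actually goes down (9,375 versus 7,500). Treat that as a rough illustration only, since real capacity depends on node sizing and workload.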
You'll still see traces of Gemfire components on the vROPS 6.1 nodes, but the distributed database component now sits on Cassandra.
In addition to the database change, VMware has also now (to an extent) bundled Hyperic into vROPS. Hyperic 5.8.4 is a powerful solution. However, it doesn't scale well at all! In our current deployment, we're having to deploy 6x "Large" size Hyperic instances to meet expected demand. Now, in vROPS 6.1, there is a new Adapter solution called Endpoint Operations Management. This is essentially the new Hyperic adapter. However, instead of having Hyperic deployed as a separate set of servers (app and DB), you now install the EOPS agents and point them directly at your vROPS servers, and the "Hyperic" data will appear within vROPS. I call it "Hyperic" in this post, but it's actually called End Point Operations.