Isn’t Linear Scaling Wasteful?
In conversations with my customers about hyperconvergence, some of the sharper folks ask an important question about the model: “If I buy a node with both compute and storage every time, isn’t that wasteful when I only need compute OR storage?” The assumption is that workloads will not grow in a linear fashion. So if the infrastructure scales linearly, one or more resources may end up overprovisioned.
There are various ways hyperconvergence vendors deal with this. They may allow you to add compute nodes without any storage. Or they may offer different node configurations to allow a large addition of RAM with only a small addition of storage. Although these things can be helpful in certain situations, linear scaling is usually desirable. It may seem wasteful on the surface, but scaling out in consistent chunks will prove to be beneficial for several reasons.
IO Performance
It is quite possible that you will need to add compute resources before you need more storage capacity. The same is unlikely to be true for storage performance. Presumably the reason for adding compute resources is to create more VMs, and those VMs will consume IOPS. That extra IO demand justifies the spindles that arrive with each new node.
In the same way, adding cluster nodes adds storage controllers. This increases the performance potential of the pooled disks. A common situation over the past few years has been that an array is scaled (by adding just disks) to increase performance. This continues until the array controller(s) can no longer keep up. Eventually this leads to a forklift upgrade, where the old controllers are removed (often disruptively) and replaced with newer, larger ones.
In the case of hyperconvergence, each time you add a node, you incrementally add capacity to serve IO. This moves the potential bottleneck away from the storage controllers. The limitation becomes the scalability limits of the platform itself. Often this is the maximum cluster size for the hypervisor.
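To make the controller bottleneck concrete, here is a minimal back-of-the-envelope sketch. The per-disk IOPS, the controller ceilings, and the 12 disks per node are made-up illustrative numbers, not figures from any vendor, and real platforms are affected by caching, replication traffic, and the network as well.

```python
def array_iops(disks, iops_per_disk, controller_limit):
    """Scale-up array: disks add IOPS until the fixed controllers saturate."""
    return min(disks * iops_per_disk, controller_limit)


def hci_iops(nodes, disks_per_node, iops_per_disk, node_controller_limit):
    """Hyperconverged cluster: every node brings its own disks and controller."""
    per_node = min(disks_per_node * iops_per_disk, node_controller_limit)
    return nodes * per_node


if __name__ == "__main__":
    # Hypothetical values: 150 IOPS per spindle, a 10,000 IOPS array
    # controller ceiling, and a 5,000 IOPS ceiling per node controller.
    for disks in (24, 48, 96, 192):
        nodes = disks // 12  # assume 12 disks per hyperconverged node
        print(
            f"{disks:>3} disks | array: {array_iops(disks, 150, 10_000):>6} IOPS"
            f" | HCI ({nodes} nodes): {hci_iops(nodes, 12, 150, 5_000):>6} IOPS"
        )
```

In this toy model the array stops improving once its controllers saturate, while the cluster total keeps growing with every node, up to whatever the platform's maximum cluster size allows.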
Failure Domain
Even if you decide that you don’t need more storage capacity or performance, increasing the number of nodes holding disks is always a good idea from an availability standpoint. Each new node is another bucket in which to store data. With every node you add, you reduce the amount of data that lives in any single failure domain.
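The arithmetic behind that is simple enough to sketch. The function below assumes data is spread evenly across nodes and that each node is its own failure domain; both are simplifications, and real placement depends on the platform.

```python
def share_per_node(nodes):
    """Fraction of the cluster's data held by one node, assuming even distribution."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return 1 / nodes


if __name__ == "__main__":
    # Each added node shrinks the slice of data exposed to a single node failure.
    for n in (3, 4, 5, 8, 16):
        print(f"{n:>2} nodes: {share_per_node(n):.1%} of data per failure domain")
```

Going from 4 to 5 nodes, for example, takes the share held by any one node from 25% to 20%, which also means less data to rebuild when a node is lost.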
With certain platforms, you can choose the number of node failures to sustain. You could add a node for its compute resources and, at the same time, raise the setting for storage failures to sustain, for example taking the cluster from n+1 to n+2 from a storage standpoint. This would keep your cluster right-sized while substantially increasing resilience.
So Is Linear Scaling Wasteful?
No. Although one may not need more storage capacity, there are other valuable considerations when adding nodes. Two major ones are predictable performance and maintained availability.
When each node has a consistent amount of compute and storage, we’re able to estimate the number of VMs it can host and at what performance level with a fair degree of accuracy. That performance remains predictable as the environment continues to scale because each “building block” contains all the resources needed.
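As a rough illustration of that kind of estimate, the sketch below sizes a cluster from one node "building block" and one average VM profile. The node specification, the VM profile, and the choice to hold one node in reserve for failover are all hypothetical assumptions for the example, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    vcpus: int    # schedulable vCPUs after any overcommit ratio
    ram_gb: int
    iops: int


@dataclass(frozen=True)
class VmProfile:
    vcpus: int
    ram_gb: int
    iops: int


def vms_per_node(node, vm):
    """VMs one node can host, limited by whichever resource runs out first."""
    return min(node.vcpus // vm.vcpus, node.ram_gb // vm.ram_gb, node.iops // vm.iops)


def cluster_vms(nodes, node, vm, spare_nodes=1):
    """Cluster-wide estimate, holding spare_nodes back for failover (n+1 by default)."""
    return max(nodes - spare_nodes, 0) * vms_per_node(node, vm)


if __name__ == "__main__":
    block = Node(vcpus=128, ram_gb=512, iops=20_000)   # hypothetical node
    average_vm = VmProfile(vcpus=2, ram_gb=8, iops=200)  # hypothetical VM
    for n in (4, 5, 6):
        print(f"{n} nodes: about {cluster_vms(n, block, average_vm)} VMs")
```

Because every node is identical, the estimate grows by the same amount with each node added, which is what makes capacity planning predictable.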
Also, when a full node is added, availability is maintained or even increased by further distributing the workload. At the same time, this reduces the size of the failure domain. Each node added makes an environment more tolerant to failure!
In the end, what may seem more expensive up front will pay dividends in lower operational expense and higher uptime over the life of a hyperconverged cluster. Invest now, and continue to reap the benefits!