Balancing Resource Demand and Capacity with Predictive Analytics – Predictive DRS

Predictive DRS Introduction and Demonstration

Predictive Distributed Resource Scheduler (pDRS) is an evolution of the vSphere DRS service that intelligently ensures balanced and efficient capacity utilization. This new capability is provided by the combination of vSphere 6.5 and vRealize Operations 6.4.

UPDATE: I have clarified that pDRS requires vROps 6.4 or later and works ONLY with vCenter 6.5 or later. This is one more compelling reason to move to vCenter 6.5.

Also, with the latest update to vROps (6.5), there is no longer a 4,000-VM limit per cluster. I don’t think this was ever a realistic limitation, as I’ve never seen a cluster with 4,000 VMs.

To understand the benefit of pDRS, you need to understand the fundamentals of DRS and vRealize Operations (vROps). I’ve published a separate article on vROps, so I’ll just provide a brief overview of DRS here. Feel free to skip the next three paragraphs if you’re familiar with DRS.

DRS is a core component of vSphere that utilizes advanced algorithms to dynamically balance workloads across available physical CPU and memory capacity. The unit of physical capacity is defined by placing resources (physical hosts) into a managed group (vSphere Cluster).

Every five minutes, vCenter evaluates the current utilization of VMs in the cluster. Based on a configurable threshold, it determines whether enough benefit will be realized by moving (vMotioning) VMs to other hosts.

DRS also considers factors other than CPU and memory utilization, such as how long a vMotion operation will take or how network utilization would affect performance (Network-Aware DRS is another new feature of DRS in vSphere 6.5). vCenter can make these changes automatically or recommend them for administrative action. Many online resources cover DRS if you’d like to learn more.

The key point about DRS in this context is that vCenter makes decisions based on current conditions only. Because of this, a cluster must become imbalanced (utilization must become strained) before action is taken. This entails a short period of potentially degraded performance while the rebalancing operations execute on the already strained resources.
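To make the reactive behavior concrete, here is a minimal sketch of a "current conditions only" balance check. It is a simplified stand-in, not VMware’s DRS algorithm: it assumes imbalance is measured as the population standard deviation of normalized host loads and compared with a target value. The function names, host names, and numbers are all hypothetical.

```python
from statistics import pstdev

def host_load(vm_demands_mhz, capacity_mhz):
    """Normalized load of one host: total VM demand divided by host capacity."""
    return sum(vm_demands_mhz) / capacity_mhz

def needs_rebalance(hosts, target_stdev):
    """Reactive check (hypothetical): True when the spread of host loads exceeds the target.

    `hosts` maps a host name to (current VM demands in MHz, host capacity in MHz).
    Only the current sample is examined, so nothing triggers until the
    imbalance (and any contention) already exists.
    """
    loads = [host_load(demands, capacity) for demands, capacity in hosts.values()]
    return pstdev(loads) > target_stdev

# Example: one host is already strained, the other is nearly idle.
cluster = {
    "esx01": ([9000, 7000], 20000),  # 80% loaded
    "esx02": ([2000], 20000),        # 10% loaded
}
print(needs_rebalance(cluster, target_stdev=0.2))
```

Because the check only sees the present, a nightly spike on `esx01` would be detected only after it has already begun.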

If we could look into the past over a long enough period, with enough metric observations and appropriate algorithms, we would find patterns of predictable resource utilization in a subset of our workloads. And, as you know, that’s exactly what vROps does. So, if we can pinpoint workloads with predictable utilization patterns (think batch jobs, recurring campaigns, etc.), we should be able to proactively balance workload placement before the resource demand begins.
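As an illustration of how a daily pattern turns into a forecast, the sketch below uses a naive seasonal average: the prediction for a future time slot is the mean of the values seen at the same time of day on previous days. This is a deliberately simple technique chosen for illustration; vROps uses its own Dynamic Threshold analytics, not this code. The 5-minute sample interval and the function name are assumptions.

```python
from statistics import mean

SAMPLES_PER_DAY = 288  # 24 h of 5-minute samples (assumed collection interval)

def seasonal_forecast(history, offset, days=7):
    """Hypothetical naive seasonal forecast.

    Predicts the sample `offset` steps after `history` ends (offset 0 is the
    very next sample) by averaging the values observed at the same time of
    day on up to `days` previous days.
    """
    target = len(history) + offset
    same_time_values = [
        history[target - d * SAMPLES_PER_DAY]
        for d in range(1, days + 1)
        # Guard both ends: negative indices would silently wrap in Python.
        if 0 <= target - d * SAMPLES_PER_DAY < len(history)
    ]
    if not same_time_values:
        raise ValueError("not enough history for a daily pattern")
    return mean(same_time_values)

# Example: a nightly batch job drives demand from 500 MHz to 4000 MHz
# between 02:00 and 03:00 every day.
day = [500] * SAMPLES_PER_DAY
day[24:36] = [4000] * 12          # samples 24-35 cover 02:00-02:55
history = day * 3                 # three days ending at midnight
print(seasonal_forecast(history, offset=24))  # forecast for 02:00 tomorrow
```

At midnight, this forecast already "knows" the 02:00 spike is coming, which is exactly the information reactive DRS lacks.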

This is where Predictive DRS comes in. pDRS is a connection between vROps and vCenter, with vROps providing trend data that vCenter incorporates into its DRS algorithm. Analytics on the vROps side determine when a trend is reliable enough to be useful for vCenter’s DRS decisions. Only the patterns that vROps deems useful are shared with vCenter. I will follow up with a future post on the factors that influence this decision if tuning options become available.
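The actual criteria vROps uses to decide that a trend is reliable are not described here, so the following is purely hypothetical. It shows one plausible shape for such a filter: a pattern counts as reliable only when the values seen at the same time on previous days vary little relative to their mean (a coefficient-of-variation test), and only reliable patterns are passed on. The threshold, function names, and VM names are all invented for illustration.

```python
from statistics import mean, pstdev

def is_reliable(same_time_values, max_cv=0.2):
    """Hypothetical reliability test based on the coefficient of variation.

    `same_time_values` are demand samples (MHz) taken at the same time of day
    on previous days. Too few samples or a zero mean are treated as unreliable.
    """
    if len(same_time_values) < 2:
        return False
    avg = mean(same_time_values)
    if avg == 0:
        return False
    return pstdev(same_time_values) / avg <= max_cv

def patterns_to_share(candidates, max_cv=0.2):
    """Keep only the VMs whose pattern passes the (hypothetical) filter."""
    return {vm: vals for vm, vals in candidates.items() if is_reliable(vals, max_cv)}

# Example: a steady nightly batch job versus an erratic workload.
candidates = {
    "batch01": [4000, 3900, 4100],  # consistent every night
    "noisy01": [300, 3500, 900],    # no dependable pattern
}
print(sorted(patterns_to_share(candidates)))
```

Filtering like this trades coverage for trust: an erratic VM simply falls back to ordinary reactive DRS.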

vROps uses Dynamic Threshold trends for pDRS; therefore, vROps must have an established Dynamic Threshold baseline for an object before vCenter receives pDRS analytics for it. In vROps, this baseline is available 14 days after the first collection of metrics for an object. vROps sends updates to vCenter daily at 6:00 am, reflecting any changes from the previous 24 hours of analysis. So, if you add a new VM to your vCenter, expect it to take 14 to 15 days before it’s evaluated for pDRS balancing.
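The 14-to-15-day figure follows from simple date arithmetic: add the 14-day baseline period to the first collection time, then wait for the next daily 6:00 am push. The sketch below assumes that the first push at or after the moment the baseline is ready carries the new object’s data, and that all times are in the same time zone; the function name is hypothetical.

```python
from datetime import datetime, timedelta

BASELINE_DAYS = 14  # Dynamic Threshold baseline period
PUSH_HOUR = 6       # daily vROps-to-vCenter update at 6:00 am

def first_pdrs_push(first_collection: datetime) -> datetime:
    """Hypothetical helper: earliest daily push that can carry pDRS data for a new object.

    Assumes the first 6:00 am push at or after baseline completion includes it.
    """
    baseline_ready = first_collection + timedelta(days=BASELINE_DAYS)
    push = baseline_ready.replace(hour=PUSH_HOUR, minute=0, second=0, microsecond=0)
    if push < baseline_ready:
        push += timedelta(days=1)  # that day's 6:00 am push already happened
    return push

# Example: a VM first collected mid-afternoon waits until the push two weeks
# and one day later, about 14.6 days in total.
added = datetime(2017, 3, 1, 14, 30)
print(first_pdrs_push(added))
```

A VM first collected just before 6:00 am waits barely more than 14 days; one collected just after waits almost 15, which is where the range comes from.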

Once vCenter has the trend data from vROps, it incorporates that data into its DRS cluster analytics (pDRS is enabled on a per-cluster basis). vCenter first reconciles immediate resource demands before adjusting based on future predictions. Predictive DRS also uses the data for Distributed Power Management (DPM) decisions. By default, pDRS looks ahead 60 minutes.
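One way to read "reconcile immediate demands first" is that a VM’s current demand is always honored and a prediction can only raise the figure DRS plans with, never lower it. The sketch below encodes that interpretation by taking the larger of current and predicted demand per VM; it is an assumption for illustration, not documented VMware behavior. VMs with no prediction (because vROps did not share a pattern for them) simply keep their current demand. All names and values are hypothetical.

```python
def planning_demand(current, predicted):
    """Hypothetical per-VM demand (MHz) that DRS would plan with.

    Assumption: current demand is a floor; a prediction can only raise it.
    VMs with no prediction from vROps fall back to current demand.
    """
    return {vm: max(now, predicted.get(vm, now)) for vm, now in current.items()}

# Example: batch01 is quiet now but predicted to spike; db01 is busy now
# and predicted to calm down, so its current demand still wins.
current = {"web01": 1200, "batch01": 300, "db01": 2500}
predicted = {"batch01": 4000, "db01": 1800}  # only trusted patterns arrive
print(planning_demand(current, predicted))
```

Under this reading, pDRS can make room for batch01 before its spike without ever starving db01 on the strength of a forecast.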

pDRS treats the predicted state 60 minutes ahead as the current state. You will find blogs stating that this window can be changed in the DRS Advanced Options by defining the ProactiveDrsLookaheadIntervalSecs override. I’m unclear on which configuration settings are internal to VMware and which are cleared to be shared with customers. I can say with certainty that the pDRS service has a number of configuration parameters that are pre-tuned for optimal performance. If you choose to change the ProactiveDrsLookaheadIntervalSecs parameter, keep in mind that doing so may actually degrade pDRS performance in your environment.
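To see what "the predicted state 60 minutes ahead" means in terms of DRS runs, the sketch below converts a look-ahead in seconds (the unit implied by the name ProactiveDrsLookaheadIntervalSecs; presumably 3600 corresponds to the 60-minute default) into a number of five-minute DRS intervals and picks that sample from a forecast. The forecast layout (one value per interval, index 0 = the interval starting now) and the function name are assumptions.

```python
DRS_INTERVAL_SECS = 5 * 60        # DRS evaluates the cluster every five minutes
DEFAULT_LOOKAHEAD_SECS = 60 * 60  # 60-minute default look-ahead

def predicted_state(forecast, lookahead_secs=DEFAULT_LOOKAHEAD_SECS):
    """Hypothetical helper: the forecast sample pDRS would treat as 'current'.

    `forecast` holds one predicted demand per DRS interval, with index 0 being
    the interval that starts now. Look-aheads that are not a multiple of the
    interval are truncated to whole intervals.
    """
    steps = lookahead_secs // DRS_INTERVAL_SECS
    if steps >= len(forecast):
        raise ValueError("forecast does not reach the look-ahead window")
    return forecast[steps]

# Example: demand is flat for the next hour, then a batch job starts.
forecast = [500] * 12 + [4000] * 12
print(predicted_state(forecast))        # 60 minutes ahead: the spike
print(predicted_state(forecast, 1800))  # 30 minutes ahead: still quiet
```

The example also shows why changing the window matters: with a 30-minute look-ahead, this spike would not yet influence placement.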

That’s pDRS in a nutshell, and it’s a simple two-click configuration. Among its benefits, DRS maximizes capacity utilization and reduces power consumption in the data center. pDRS improves on DRS through the use of latent resources. The result is further reductions in CAPEX and OPEX, along with higher customer satisfaction (CSAT).