Genx6: VMware HA

With Genx6 (VMware vCenter 6.0), vSphere HA now includes Virtual Machine Component Protection (VMCP). VMCP protects virtual machines against storage failure conditions such as All Paths Down (APD) and Permanent Device Loss (PDL), which are applicable to block storage (FC, iSCSI, FCoE) and file storage (NFS).

A PDL condition occurs when a device is unintentionally removed, when its unique ID changes, or when the device experiences an unrecoverable hardware error.

When the storage array determines that the device is permanently unavailable, it sends SCSI sense codes to the ESXi host. The sense codes allow your host to recognize that the device has failed and register the state of the device as PDL. The sense codes must be received on all paths to the device for the device to be considered permanently lost.

In case of PDL, the VMs are terminated immediately and restarted on a healthy host.

HAimgs1

If the sense codes are not received on all paths to the device, the condition is treated as All Paths Down (APD).
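The distinction between PDL and APD can be sketched as a small classification function. This is only an illustrative model, not ESXi code: the path status strings (`"ok"`, `"pdl_sense"`, `"no_response"`) and the function name are hypothetical, chosen to mirror the rule that a device counts as permanently lost only when the sense codes arrive on every path.

```python
from enum import Enum


class DeviceState(Enum):
    ACCESSIBLE = "accessible"
    APD = "all paths down"
    PDL = "permanent device loss"


def classify_device(path_statuses):
    """Classify a device from the status of each of its paths.

    path_statuses is a list of hypothetical status strings:
      "ok"          - the path still reaches the device
      "pdl_sense"   - the array returned a PDL sense code on this path
      "no_response" - nothing comes back on this path
    """
    # At least one working path: the device is still usable.
    if any(status == "ok" for status in path_statuses):
        return DeviceState.ACCESSIBLE
    # PDL requires the sense code on ALL paths (and at least one path).
    if path_statuses and all(s == "pdl_sense" for s in path_statuses):
        return DeviceState.PDL
    # No working path, but not every path confirmed a permanent loss.
    return DeviceState.APD


if __name__ == "__main__":
    print(classify_device(["pdl_sense", "pdl_sense"]))
    print(classify_device(["pdl_sense", "no_response"]))
    print(classify_device(["ok", "no_response"]))
```

In this model a single silent path is enough to turn a would-be PDL into an APD, which is why the host keeps retrying instead of declaring the device lost.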

An APD (All Paths Down) state can be caused by, for example, a failed switch or a disconnected storage cable.

By default, the APD timeout is set to 140 seconds, which is typically longer than most devices need to recover from a connection loss. If the device becomes available within this time, the host and its virtual machines continue to run without experiencing any problems.

HAimgs2

If the device does not recover and the timeout ends, the host stops retrying and terminates any non-virtual machine I/O.

After the 140 seconds, HA starts counting for 3 minutes (default). After that, HA can take one of the configured actions: issue events, power off and restart VMs (conservative), or power off and restart VMs (aggressive).

HAimgs3 HAimgs4

Aggressive / conservative refers to the nature of the response configured for APD. The aggressive response simply restarts the VM in case of APD, without checking whether another host has the resources to run it. The conservative response first checks whether any host has sufficient resources to power on the VM; if one does, the VM is restarted, otherwise no action is taken.
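The difference between the responses can be written down as a short decision function. This is a sketch under stated assumptions, not vSphere HA's actual placement logic: the `Host` type, the policy strings, and the use of free memory as the only resource are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_memory_mb: int  # assumption: memory is the only resource modelled


def should_power_off_and_restart(policy, vm_memory_mb, healthy_hosts):
    """Return True if the VM should be powered off and restarted elsewhere.

    policy is one of the hypothetical strings
    "issue_events", "conservative" or "aggressive".
    """
    if policy == "issue_events":
        # Only an event is raised; the VM is left alone.
        return False
    if policy == "aggressive":
        # Restart without checking whether any host has room for the VM.
        return True
    if policy == "conservative":
        # Restart only if some healthy host can actually power the VM on.
        return any(h.free_memory_mb >= vm_memory_mb for h in healthy_hosts)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    hosts = [Host("esxi-02", free_memory_mb=2048)]
    for policy in ("issue_events", "conservative", "aggressive"):
        print(policy, should_power_off_and_restart(policy, 4096, hosts))
```

With a 4096 MB VM and only 2048 MB free on the remaining host, the conservative policy leaves the VM in place while the aggressive policy powers it off anyway.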

In contrast with the permanent device loss (PDL) state, the host treats the APD state as transient and expects the device to be available again.

For example, consider two ESXi hosts connected through a storage array controller, which is in turn connected to FC storage and iSCSI storage.

HAimgs5

In case of All Paths Down (APD) on the FC storage, the VM waits for the 140-second APD timeout and then for a further 3 minutes, after which it is restarted on the next available host. So by default it takes about 320 seconds (140 + 180) for the APD HA action to complete.
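The 320-second figure is just the sum of the two default timers. The sketch below reproduces that arithmetic and shows which phase an APD event is in at a given moment; the constant and function names are illustrative, not VMware setting names.

```python
APD_TIMEOUT_S = 140        # default APD timeout from the text
HA_DELAY_S = 3 * 60        # default 3-minute HA delay after the timeout


def seconds_until_ha_action(apd_timeout_s=APD_TIMEOUT_S, ha_delay_s=HA_DELAY_S):
    """Total time from the start of APD until HA acts on the VM."""
    return apd_timeout_s + ha_delay_s


def apd_phase(elapsed_s, apd_timeout_s=APD_TIMEOUT_S, ha_delay_s=HA_DELAY_S):
    """Describe where an ongoing APD event is on the timeline."""
    if elapsed_s < apd_timeout_s:
        return "host retrying I/O (APD timeout running)"
    if elapsed_s < apd_timeout_s + ha_delay_s:
        return "APD timeout expired (HA delay running)"
    return "HA response triggered"


if __name__ == "__main__":
    print(seconds_until_ha_action())  # 140 + 180 = 320
    for t in (60, 200, 320):
        print(t, apd_phase(t))
```

If either timer is changed from its default, passing the new values to `seconds_until_ha_action` gives the corresponding total.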

HAimgs6

In case of PDL, by contrast, the VM is terminated instantly and restarted on the next available host.

HAimgs7

Source:Pubs
