With VMware vCenter 6.0, vSphere HA now includes Virtual Machine Component Protection (VMCP), which provides protection from storage failure conditions such as All Paths Down (APD) and Permanent Device Loss (PDL). These conditions apply to both block storage (FC, iSCSI, FCoE) and file storage (NFS).
A PDL condition occurs when a device is unintentionally removed, when its unique ID changes, or when the device experiences an unrecoverable hardware error.
When the storage array determines that the device is permanently unavailable, it sends SCSI sense codes to the ESXi host. The sense codes allow your host to recognize that the device has failed and register the state of the device as PDL. The sense codes must be received on all paths to the device for the device to be considered permanently lost.
In case of PDL, VMs are terminated immediately and restarted on a healthy host.
If the device becomes inaccessible but the sense codes are not received on all paths to the device, the condition is treated as All Paths Down (APD).
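The PDL versus APD distinction above can be sketched in a few lines of Python. This is only an illustrative model, not ESXi code: the PathStatus class, its fields and the classify_device function are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class DeviceState(Enum):
    OK = "ok"
    APD = "all-paths-down"
    PDL = "permanent-device-loss"


@dataclass
class PathStatus:
    # Hypothetical model of one path from the host to a storage device.
    io_working: bool           # True if I/O on this path still succeeds
    pdl_sense_received: bool   # True if the array returned a PDL sense code on this path


def classify_device(paths: Sequence[PathStatus]) -> DeviceState:
    """Classify a device from the state of all its paths (simplified)."""
    if not paths:
        raise ValueError("at least one path is required")
    # PDL requires the sense codes to be received on every path.
    if all(p.pdl_sense_received for p in paths):
        return DeviceState.PDL
    # If at least one path still carries I/O, the device is reachable.
    if any(p.io_working for p in paths):
        return DeviceState.OK
    # Device unreachable, but not confirmed lost on all paths: treat as APD.
    return DeviceState.APD
```

For example, two dead paths where only one returned a sense code would be classified as APD, not PDL.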
The reasons for an APD (All Paths Down) state can be, for example, a failed switch or a disconnected storage cable.
By default, the APD timeout is set to 140 seconds, which is typically longer than most devices need to recover from a connection loss. If the device becomes available within this time, the host and its virtual machine continue to run without experiencing any problems.
If the device does not recover before the timeout ends, the host stops retrying and terminates any non-virtual-machine I/O.
After the 140-second APD timeout expires, HA waits a further 3 minutes (the default delay). After that, HA takes the configured action, which can be one of: Issue events, Power off and restart VMs (conservative), or Power off and restart VMs (aggressive).
Aggressive and conservative refer to the nature of the response configured for APD. The aggressive response powers off and restarts the VM without checking whether another host has the resources to run it. The conservative response first checks whether any host has sufficient resources to power on the VM; if so, it restarts the VM, otherwise no action is taken.
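The difference between the responses can be written as a small decision function. This is a sketch of the behaviour described above, not the actual HA implementation: the ApdResponse enum, the should_terminate_vm function and the restart_capacity_confirmed flag are hypothetical names used only for illustration.

```python
from enum import Enum


class ApdResponse(Enum):
    ISSUE_EVENTS = "Issue events"
    CONSERVATIVE = "Power off and restart VMs (conservative)"
    AGGRESSIVE = "Power off and restart VMs (aggressive)"


def should_terminate_vm(response: ApdResponse, restart_capacity_confirmed: bool) -> bool:
    """Return True if the VM would be powered off for a restart elsewhere.

    restart_capacity_confirmed is an assumed input: True when another host
    is known to have enough resources to power on the VM.
    """
    if response is ApdResponse.AGGRESSIVE:
        # Aggressive: terminate without checking capacity on other hosts.
        return True
    if response is ApdResponse.CONSERVATIVE:
        # Conservative: terminate only if a restart elsewhere is known to be possible.
        return restart_capacity_confirmed
    # Issue events: only report the condition, leave the VM running.
    return False
```

With no confirmed capacity on another host, the conservative response leaves the VM alone while the aggressive response still powers it off.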
In contrast with the permanent device loss (PDL) state, the host treats the APD state as transient and expects the device to be available again.
Example: two ESXi hosts are connected through a storage array controller, and the storage array controller is connected to FC storage and iSCSI storage.
In case of All Paths Down (APD) on the FC storage, HA waits for the 140-second APD timeout and then a further 3 minutes before the VM is restarted on the next available host. So by default it takes about 320 seconds (140 + 180) for the APD HA action to take place.
Whereas in case of PDL, the VM is terminated immediately and restarted on the next available host.
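The default timings in this example can be checked with a short calculation. This is a minimal sketch using the default values quoted above; the constant and function names are made up for the example and are not vSphere setting names.

```python
APD_TIMEOUT_SECONDS = 140        # default APD timeout quoted above
VMCP_APD_DELAY_SECONDS = 3 * 60  # default 3-minute HA delay after the timeout


def seconds_until_ha_action(condition: str,
                            apd_timeout: int = APD_TIMEOUT_SECONDS,
                            vmcp_delay: int = VMCP_APD_DELAY_SECONDS) -> int:
    """Seconds from the start of the storage failure until HA acts on the VM."""
    condition = condition.upper()
    if condition == "PDL":
        # PDL: the VM is terminated immediately.
        return 0
    if condition == "APD":
        # APD: wait for the APD timeout, then the additional HA delay.
        return apd_timeout + vmcp_delay
    raise ValueError(f"unknown condition: {condition!r}")


print(seconds_until_ha_action("APD"))  # 320
print(seconds_until_ha_action("PDL"))  # 0
```

If either default is changed in your environment, passing the new values as arguments gives the corresponding total.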