
Hot Add NVMe Device Caused PSOD on ESXi

by Stine Elise Larsen · Read in about 2 min (380 words)

Guest Post

Info

This is a guest post by Stine Elise Larsen, Senior Datacenter Consultant for Proact.

You can find her on Twitter and LinkedIn. In a work environment she is normally overly cautious, perhaps even to the point where she tries to make herself virtually invisible. In the real world she plays arcade dance games and attends rock concerts.

For a list of all of Stine’s posts, see Guest Authors.

I recently had a case of “go with your gut” when we added some new NVMe disks to an existing VMware vSAN solution at a customer site.

Normally I’m very cautious and will put hosts into maintenance mode, no matter how small the hardware change, but against my better judgement I decided this time to hot add the disks (which, of course, is supported). However, I fumbled: I inserted a disk, quickly removed it, and then inserted it again, and ended up with the dreaded Purple Screen of Death (PSOD) on the host.
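For reference, here is a minimal pyVmomi sketch of the cautious route: putting a vSAN host into maintenance mode before touching the hardware. The vCenter address, credentials, and host name are illustrative placeholders, not anything from the environment described here.

    import ssl

    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    # Lab-only shortcut: skip certificate verification.
    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="********", sslContext=ctx)
    content = si.RetrieveContent()

    # Look up the host by name (placeholder name).
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == "esxi01.example.com")

    # vSAN-aware maintenance mode: "ensureObjectAccessibility" keeps vSAN
    # objects reachable without a full data evacuation.
    spec = vim.host.MaintenanceSpec(
        vsanMode=vim.vsan.host.DecommissionMode(
            objectAction="ensureObjectAccessibility"))
    WaitForTask(host.EnterMaintenanceMode_Task(timeout=0, maintenanceSpec=spec))

    Disconnect(si)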

Naturally, this freaked me out and I was eager to figure out what the problem was. Searching through VMware’s KBs didn’t give me any clues, but a quick Google search took me to the ESXi 7.0 Update 2c release notes:

“PR 2708326: If an NVMe device is hot added and hot removed in a short interval, the ESXi host might fail with a purple diagnostic screen. If an NVMe device is hot added and hot removed in a short interval, the NVMe driver might fail to initialize the NVMe controller due to a command timeout. As a result, the driver might access memory that is already freed in a cleanup process. In the backtrace, you see a message such as WARNING: NVMEDEV: NVMEInitializeController:4045: Failed to get controller identify data, status: Timeout.

Eventually, the ESXi host might fail with a purple diagnostic screen with an error similar to #PF Exception … in world …:vmkdevmgr. This issue is resolved in this release.”
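If you want to check whether the hosts in a cluster are already on a build that carries the fix, a small pyVmomi sketch like the one below can list each host’s version and build. The build number for 7.0 Update 2c is an assumption on my part; verify it against the release notes before acting on the output.

    import ssl

    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    FIXED_BUILD = 18426014  # assumed build number for ESXi 7.0 U2c -- verify!

    ctx = ssl._create_unverified_context()  # lab only
    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local",
                      pwd="********", sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for h in view.view:
        info = h.summary.config.product  # vim.AboutInfo: fullName, version, build
        status = "has the fix" if int(info.build) >= FIXED_BUILD else "needs patching"
        print(f"{h.name}: {info.fullName} -> {status}")

    Disconnect(si)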

Luckily, there were no more errors after hot adding the disks and rebooting the host, so the next step is of course some patching.
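One way to verify is to scan a copy of the host’s vmkernel.log for the warning signature quoted in the release note. A minimal sketch, assuming the log has been copied off the host (on ESXi the live log is /var/log/vmkernel.log):

    # Search an offline copy of vmkernel.log for the NVMe init warning.
    SIGNATURE = "NVMEInitializeController"

    with open("vmkernel.log", encoding="utf-8", errors="replace") as log:
        hits = [line.rstrip() for line in log if SIGNATURE in line]

    for line in hits:
        print(line)
    print(f"{len(hits)} matching line(s) found")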

I did not experience the same issue on any of the other hosts in that cluster, probably due to steadier hands or less caffeine in my bloodstream.


This is a post in the Guest Post series.


Post last updated on December 22, 2022: Fix Guest post link

About the author


Christian Mohn works as a Chief Technologist SDDC for Proact in Norway.

See his About page for more details, or find him on Twitter.
