Guest Post #
This is a guest post by Espen Ødegaard, Senior Systems Consultant for Proact.
Workaround as of 01 June 2021
As VMware has not yet released a fix for the SD card and USB drive issues, I’m still experiencing the problems described in ESXi 7.0 U2a Potentially Killing USB and SD drives on hosts running from USB or SD card installs. As the previous workaround (copying VMware Tools to RAMDISK with the ToolsRamdisk option) only worked for 8 days in my case, I needed something more “permanent” to get the ESXi hosts more “stable” (e.g. hosts being able to enter maintenance mode, move VMs around, run snapshots/backups, run CLI commands, etc.).
See ESXi 7.0 SD Card/USB Drive Issue Temporary Workaround for details.
After upgrading my 4-node vSAN cluster (homelab) to ESXi 7.0 U2a (build 17867351), I detected that ESXi had issues talking to the USB device where ESXi was installed. I found a related KB from VMware outlining issues with the new VMFS-L, which became my baseline for troubleshooting: VMFS-L Locker partition corruption on SD cards in ESXi 7.0 (83376).
In short, the KB says that the VMFS-L partition may have become corrupt, and that a re-install is needed. It also states that there is no resolution for the SD card corruption as of the time the KB article was published.
The workaround mentioned there, moving the scratch partition, is not applicable in my case, as I’ve already verified that my scratch partition is running from RAMDISK.
Verify scratch mountpoint
[root@esx-13:~] vmkfstools -Ph /scratch/
visorfs-1.00 (Raw Major Version: 0) file system spanning 1 partitions.
File system label (if any):
Mode: private
Capacity 3.9 GB, 3.1 GB available, file block size 4 KB, max supported file size 0 bytes
Disk Block Size: 4096/4096/0
UUID: 00000000-00000000-0000-000000000000
Partitions spanned (on "notDCS"):
memory
Is Native Snapshot Capable: NO
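The same check can be scripted. This is a minimal sketch, assuming that a RAM-backed mount always reports a `visorfs` file system spanning `memory`, as in the output above. The function name `is_ramdisk` is hypothetical and not part of ESXi.

```shell
#!/bin/sh
# Hypothetical helper: decide from `vmkfstools -Ph` output (read on stdin)
# whether a mountpoint is RAM-backed.
# Assumption: a ramdisk reports a "visorfs" file system on its first line
# and lists "memory" as the spanned partition, as in the output above.
is_ramdisk() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q '^visorfs' || return 1
    printf '%s\n' "$out" | grep -q 'memory' || return 1
    return 0
}

# Usage on a host (not executed here):
#   vmkfstools -Ph /scratch/ | is_ramdisk && echo "scratch is on RAMDISK"
```

The function only inspects text, so it can be tried against saved output before being run on a host.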
List content of the VMFS-L partition (LOCKER)
I also ran a quick find command (from another working host) to list all contents of the mounted VMFS-L partition. Notice that the vmtoolsRepo packages are located here.
[root@esx-11:~] find /vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/.fbb.sf
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/.fdc.sf
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/.pbc.sf
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/.sbc.sf
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/.vh.sf
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/.pb2.sf
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/.sdd.sf
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/.jbc.sf
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/vibs
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/vibs/tools-light--2910230392612735297.xml
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/vibs/tools-light--2910230392612735297.xml.sig
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/vibs/tools-light--2910230392612735297.xml.orig
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/bulletins
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/profiles
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/baseimages
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/addons
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/solutions
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/manifests
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/reservedComponents
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/var/db/locker/reservedVibs
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/floppies
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/floppies/pvscsi-Windows2008.flp
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/floppies/pvscsi-Windows8.flp
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/floppies/pvscsi-WindowsVista.flp
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/isoimages_manifest.txt
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/isoimages_manifest.txt.sig
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/linux.iso
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/linux.iso.sig
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/linux_avr_manifest.txt
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/linux.iso.sha
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/linux_avr_manifest.txt.sig
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/windows.iso
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/windows.iso.sha
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/windows.iso.sig
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/windows_avr_manifest.txt
/vmfs/volumes/LOCKER-6092ba2b-1fdb3f52-337c-000c292b45b0/packages/vmtoolsRepo/vmtools/windows_avr_manifest.txt.sig
Getting a host with issues into maintenance mode — physically remove the USB device first #
Getting the ESXi hosts with USB issues into maintenance mode was also a little tricky. Used to doing things “remote”, I wanted to try evacuating the VMs the usual way (just enter maintenance mode, and DRS will handle the rest), but this was a no-go. While entering maintenance mode, the vMotion jobs for the VMs would start (job status), but nothing actually happened. All VMs “started” the Migrating/vMotion job (status 9% or 12% in vCenter), but checking the host with esxtop, under network, I found that no traffic was occurring on the vMotion interface, which is usually at full pipe when a vMotion occurs.
Re-checking the logs, the issues with USB repeated, again and again. I thought I’d try to physically remove the USB device from the host, as this would trigger a “proper” All Paths Down (APD) on the USB device.
So I physically removed the USB device, waited 2-3 minutes, and boom - the vMotion process finished at once. Digging into the logs (again, /var/log/vmkernel.log has the answers), I could verify the APD event.
2021-05-15T14:00:03.326Z cpu7:1048720)StorageApdHandler: 606: APD timeout event for 0x43040c4c34d0 [mpx.vmhba32:C0:T0:L0]
2021-05-15T14:00:03.326Z cpu7:1048720)StorageApdHandlerEv: 126: Device or filesystem with identifier [mpx.vmhba32:C0:T0:L0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast faile$
2021-05-15T14:00:03.326Z cpu3:1048731)ScsiDeviceIO: 4277: Cmd(0x4578c1283080) 0x1a, CmdSN 0x93be0 from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x5 D:0x0 P:0x0
2021-05-15T14:00:03.326Z cpu3:1048731)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "mpx.vmhba32:C0:T0:L0" is blocked. Not starting I/O from device.
2021-05-15T14:00:03.326Z cpu3:1055182)LVM: 6817: Forcing APD unregistration of devID 6092ba2b-13467d16-8d9c-000c292b45b0 in state 1.
2021-05-15T14:00:03.326Z cpu3:1055182)LVM: 6192: Could not open device mpx.vmhba32:C0:T0:L0:7, vol [6092ba2a-e004ead6-09c5-000c292b45b0, 6092ba2a-e004ead6-09c5-000c292b45b0, 1]: No connection
2021-05-15T14:00:03.326Z cpu3:1055182)Vol3: 2129: Could not open device 'mpx.vmhba32:C0:T0:L0:7' for volume open: Not found
2021-05-15T14:00:03.326Z cpu3:1055182)Vol3: 4339: Failed to get object 28 type 1 uuid 6092ba2b-1fdb3f52-337c-000c292b45b0 FD 0 gen 0 :Not found
2021-05-15T14:00:03.326Z cpu3:1055182)WARNING: Fil3: 1534: Failed to reserve volume f533 28 1 6092ba2b 1fdb3f52 c00337c b0452b29 0 0 0 0 0 0 0
2021-05-15T14:00:03.326Z cpu3:1055182)Vol3: 4339: Failed to get object 28 type 2 uuid 6092ba2b-1fdb3f52-337c-000c292b45b0 FD 4 gen 1 :Not found
2021-05-15T14:00:03.326Z cpu4:2205969)VFAT: 5144: Failed to get object 36 type 2 uuid 4365f3f4-494e65bd-7b92-e7c78fac244e cnum 0 dindex fffffffecdate 0 ctime 0 MS 0 :No connection
2021-05-15T14:00:03.326Z cpu3:1051988)LVM: 6817: Forcing APD unregistration of devID 6092ba2b-13467d16-8d9c-000c292b45b0 in state 1.
2021-05-15T14:00:03.326Z cpu3:1051988)LVM: 6817: Forcing APD unregistration of devID 6092ba2b-13467d16-8d9c-000c292b45b0 in state 1.
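To pull the affected device out of a long log, something like the following could be used. This is a sketch only, assuming the `APD timeout event` lines keep the format shown above, with the device identifier in square brackets. The function name `apd_timeout_devices` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list devices that reached the APD timeout state.
# Reads vmkernel log text on stdin and prints each unique device identifier.
# Assumption: the identifier is the bracketed token on lines containing
# "APD timeout event", as in the log excerpt above.
apd_timeout_devices() {
    grep 'APD timeout event' \
        | sed -n 's/.*\[\([^]]*\)\].*/\1/p' \
        | sort -u
}

# Usage on a host (not executed here):
#   apd_timeout_devices < /var/log/vmkernel.log
```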
So I got both hosts into maintenance mode and rebooted them. Everything was working again.
New findings, from an old value #
Continuing my research, I stumbled upon a new thread in the Dell Communities (VMware 7.0 U2 losing contact with SD card), where VMware Support sent a workaround from an older KB related to moving vmtoolsRepo to RAMDISK: High frequency of read operations on VMware Tools image may cause SD card corruption (2149257).
In ESXi 6.0 Update 3 and later, changes were made to reduce the number of read operations being sent to the SD card. An advanced parameter was introduced that allows you to migrate your VMware tools image to ramdisk on boot. This way, the information is read only once from the SD card per boot cycle.
Note: Even though KB2149257 currently only targets ESXi 6.0 and 6.5 (it doesn’t mention ESXi 7.0 at all, as of the time of writing), I’m guessing the same workaround may now apply to ESXi 7.0 U1+, especially if the old “throttle” (the fix in 6.0 U3) has now been removed while the new VMFS-L continues to be improved.
Applying the workaround — adding option ToolsRamdisk #
As mentioned in KB2149257, I added the ToolsRamdisk option on all hosts running ESXi 7.0 U2a (build 17867351):
- Creating the option first
esxcfg-advcfg -A ToolsRamdisk --add-desc "Use VMware Tools repository from /tools ramdisk" --add-default "0" --add-type 'int' --add-min "0" --add-max "1"
- Setting the value to 1
esxcli system settings advanced set -o /UserVars/ToolsRamdisk -i 1
- Verifying the value is set
esxcli system settings advanced list -o /UserVars/ToolsRamdisk
- Reboot the host (as setting applies at boot)
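The steps above can be wrapped in a small script when several hosts need the same change. This is a minimal sketch that uses only the commands listed above. The `DRY_RUN` variable and the function names are hypothetical, and the script deliberately leaves the reboot to you.

```shell
#!/bin/sh
# Sketch: apply the ToolsRamdisk workaround with the commands listed above.
# DRY_RUN is a hypothetical switch: it defaults to 1, so the commands are
# only printed. Set DRY_RUN=0 on an ESXi host to actually run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of running it (argument quoting is not preserved)
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

apply_tools_ramdisk() {
    # 1. Create the advanced option
    run esxcfg-advcfg -A ToolsRamdisk \
        --add-desc "Use VMware Tools repository from /tools ramdisk" \
        --add-default "0" --add-type 'int' --add-min "0" --add-max "1"
    # 2. Set the value to 1
    run esxcli system settings advanced set -o /UserVars/ToolsRamdisk -i 1
    # 3. Show the option so the new value can be verified
    run esxcli system settings advanced list -o /UserVars/ToolsRamdisk
    # 4. The reboot is intentionally left as a manual step
}

# Usage: call apply_tools_ramdisk, review the printed commands, then
# re-run with DRY_RUN=0 on the host.
```

Defaulting to a dry run keeps the sketch safe to try anywhere, since nothing is changed until you opt in.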
Verify new tools mountpoint running from RAMDISK #
After a reboot, I found the newly created mountpoint located under /tools. Checking the location with vmkfstools -Ph, we can see that it’s mounted in a RAMDISK.
Checking mountpoint with ls
[root@esx-11:~] ls -hal /tools/
total 16
drwxrwxrwt 1 root root 512 May 18 14:56 .
drwxr-xr-x 1 root root 512 May 18 18:18 ..
drwxr-xr-x 1 root root 512 May 18 14:56 floppies
drwxr-xr-x 1 root root 512 May 18 14:56 vmtools
Getting mountpoint location with vmkfstools
[root@esx-11:~] vmkfstools -Ph /tools/
visorfs-1.00 (Raw Major Version: 0) file system spanning 1 partitions.
File system label (if any):
Mode: private
Capacity 4.2 GB, 3.2 GB available, file block size 4 KB, max supported file size 0 bytes
Disk Block Size: 4096/4096/0
UUID: 00000000-00000000-0000-000000000000
Partitions spanned (on "notDCS"):
memory
Is Native Snapshot Capable: NO
Checking vmkernel.log for boot events containing the word “tools”
# Check vmkernel.log for tools-related hits
[root@esx-11:~] cat /var/log/vmkernel.log|grep -i tools
2021-05-18T14:55:44.765Z cpu7:1048823)SchedVsi: 2098: Group: host/vim/vimuser/vmtoolsd(1725): min=46 max=46 minLimit=46, units: mb
2021-05-18T14:56:02.361Z cpu2:1048852)Activating Jumpstart plugin vmtoolsRepo.
2021-05-18T14:56:02.399Z cpu3:1049894)VisorFSRam: 871: tools with (0,286,0,256,1777)
2021-05-18T14:56:02.399Z cpu3:1049894)FSS: 8565: Mounting fs visorfs (430547881820) with -o 0,286,0,256,0,01777,tools on file descriptor 43054e9b9230
2021-05-18T14:56:15.302Z cpu3:1048852)Jumpstart plugin vmtoolsRepo activated.
2021-05-18T14:56:21.821Z cpu6:1050194)Starting service vmtoolsd
2021-05-18T14:56:21.830Z cpu6:1050194)Activating Jumpstart plugin vmtoolsd.
2021-05-18T14:56:21.852Z cpu4:1050194)Jumpstart plugin vmtoolsd activated.
Listing content of /tools with find
[root@esx-11:~] find /tools/
/tools/
/tools/floppies
/tools/floppies/pvscsi-WindowsVista.flp
/tools/floppies/pvscsi-Windows2008.flp
/tools/floppies/pvscsi-Windows8.flp
/tools/vmtools
/tools/vmtools/windows.iso.sig
/tools/vmtools/linux.iso.sha
/tools/vmtools/linux_avr_manifest.txt.sig
/tools/vmtools/isoimages_manifest.txt.sig
/tools/vmtools/linux.iso
/tools/vmtools/linux_avr_manifest.txt
/tools/vmtools/isoimages_manifest.txt
/tools/vmtools/windows.iso
/tools/vmtools/windows_avr_manifest.txt.sig
/tools/vmtools/windows_avr_manifest.txt
/tools/vmtools/windows.iso.sha
/tools/vmtools/linux.iso.sig
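To confirm that the ramdisk copy holds the same files as the vmtoolsRepo directory on the LOCKER volume, the two listings can be compared. This is a rough sketch with a hypothetical helper name, `same_file_list`. It compares file names only, not contents, and assumes the LOCKER volume is still readable.

```shell
#!/bin/sh
# Hypothetical helper: succeed when two directories contain the same
# relative file and directory names. Contents are NOT compared.
# Returns 0 if the listings match, 1 if they differ, 2 if a directory
# cannot be entered.
same_file_list() {
    a=$(cd "$1" && find . | sort) || return 2
    b=$(cd "$2" && find . | sort) || return 2
    [ "$a" = "$b" ]
}

# Usage on a host (not executed here), where the first argument is the
# packages/vmtoolsRepo directory on the LOCKER volume:
#   same_file_list "$LOCKER_REPO" /tools && echo "listings match"
```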
So yeah, there you have it. Perhaps using the standard profile on USB was a bad idea (it includes VMware Tools, unlike the “no-tools” profile). Usually I use the “no-tools” profile for USB installs, but I recently switched my USB devices to SanDisk Ultra Fit SDCZ430-032G-G46 devices, which I thought were way better and more stable.
Bonus: Tips on proactively detecting issues on existing USB and SD card installs #
Tip: The following may apply if there are issues with the USB or SD card in your environment
- Running the command df -h from the CLI will get stuck, or fail, for the LOCKER mount (VMFS-L partition)
- Checking the host’s log file /var/log/vmkernel.log, you’ll notice entries similar to this
2021-05-15T13:48:27.674Z cpu6:1048743)ScsiDeviceIO: 4315: Cmd(0x4578c12ad880) 0x1a, cmdId.initiator=0x45389cb1a6f8 CmdSN 0x93a68 from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x5 D:0x0 P:0x0 Cancelled from path layer. Cmd count Active:1
I suggest setting up a vRLI alert on an exact match of Cancelled from path layer. Cmd count Active, which I have only found on faulty hosts so far. I’ve actually set up a webhook alert, so if any USB issues arise, I immediately get notified in my Slack channel and can react early.
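Without vRLI, both symptoms can be checked from the host shell. This is a minimal sketch with hypothetical helper names: `count_path_cancels` counts log lines containing the string above, and `probe_hangs` reports when a command such as df -h does not finish within a given number of seconds. It assumes the host shell provides grep, sleep, and kill, and the exit code 124 is simply a convention chosen here.

```shell
#!/bin/sh
# Hypothetical helper: count log lines carrying the marker string that,
# per the post, has so far only shown up on faulty hosts.
count_path_cancels() {
    # grep -c prints the count; "|| true" hides its exit status 1 on zero hits
    grep -cF 'Cancelled from path layer. Cmd count Active' "$1" || true
}

# Hypothetical helper: run a command in the background and poll it once
# per second. Returns the command's exit status if it finishes within the
# limit, or 124 if it is still running after $1 seconds.
probe_hangs() {
    secs=$1
    shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    i=0
    while [ "$i" -lt "$secs" ]; do
        if ! kill -0 "$pid" 2>/dev/null; then
            rc=0
            wait "$pid" || rc=$?
            return "$rc"
        fi
        sleep 1
        i=$((i + 1))
    done
    # Still running: try to stop it, but do not wait on it, since a process
    # blocked on a dead device may never exit
    kill "$pid" 2>/dev/null || true
    return 124
}

# Usage on a host (not executed here):
#   count_path_cancels /var/log/vmkernel.log
#   probe_hangs 10 df -h || echo "df -h failed or did not finish in 10s"
```

Polling rather than blocking on the command is deliberate, so the check itself cannot get stuck behind the same unresponsive device.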
In summary #
- Installing ESXi using the no-tools image (e.g. ESXi-70U2a-17867351-no-tools) is probably better suited for USB/SD card installs, and may not require the option/workaround provided above.
- The user setting /UserVars/ToolsRamdisk outlined in KB2149257 loads vmtools to RAMDISK at boot (mounted under /tools), possibly preventing USB drives and SD cards from burning out (well, time will tell).
- A funeral may be needed for my USB devices.