Upgrading to ESXi 7.0 build 18426014 U2c. ESXi stuck in Not responding from vCenter

Background

Well, this is not directly related to the new ESXi 7 U2c build, BUT if you’re one of the lucky ones (like me), you’ve been experiencing major issues with SD/USB boot devices since ESXi 7 U1/U2, so the vCenter <> ESXi communication (at least for configuration, tokens, etc.) has not been working for a while. I mean, the last time vCenter talked successfully with my host was back in May 2021, when the USB device dropped. That’s like four (4!) months ago.

Today, VMware finally released ESXi 7 U2c, which includes an updated vmkusb module. Hopefully this will fix the previously experienced SD/USB device issues, but before that - let’s talk patching!

Patch plan - quick & dirty

Since neither LCM/VUM nor esxcli is currently able to patch ESXi while the USB device is “broken” (due to the SD/USB issues), I planned a quick reboot of the ESXi hosts to make the USB boot device accessible again. Patching should then work (in theory), as the host can once again update the images (VIBs) on the USB device.

What happened

First, verify that the USB boot device is not working before proceeding with the reboot:

[root@esx-13:~] df -h
Error when running esxcli, return status was: 1
Errors:
Cannot open volume:

[root@esx-13:~] partedUtil getptbl /dev/disks/mpx.vmhba32\:C0\:T0\:L0
Unable to get device /dev/disks/mpx.vmhba32:C0:T0:L0

Comment: So yeah, this server needs a reboot before it can talk to the boot device again, which is USB (in my case listed as vmhba32).

So I rebooted the host. Checking via Intel AMT/KVM, the host booted up again, still on ESXi 7 U2a - build 17867351 (with the bug). I waited a little, but it never re-connected to vCenter. vCenter was still showing the host as “Not responding”, even though the host was actually back up and running.

So I quickly jumped into an SSH session on the host and first verified that the bootbank & VMFS-L/OSDATA volumes were available, and they were (hence, it should be possible to patch):

[root@esx-13:~] df -h
Filesystem  Size  Used Available Use% Mounted on
VMFS-L    20.8G  1.6G    19.1G  8% /vmfs/volumes/LOCKER-6092997b-cd2ada42-23c0-000c292b45b0
vfat        4.0G 202.7M      3.8G  5% /vmfs/volumes/BOOTBANK1
vfat        4.0G 208.3M      3.8G  5% /vmfs/volumes/BOOTBANK2
vsan        5.5T  1.8T      3.6T  34% /vmfs/volumes/mgmt-01-vsan
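The same check can be scripted instead of eyeballed. This is a minimal sketch, assuming the bootbank volumes are listed under the names BOOTBANK1 and BOOTBANK2 as they are on this host; the function name bootbanks_mounted is made up for illustration.

```shell
# Hypothetical helper: reads "df -h" output on stdin and succeeds only
# if both bootbank volumes are listed as mounted.
# Assumption: the volumes are named BOOTBANK1 and BOOTBANK2, as in the
# output above; adjust the names if your host labels them differently.
bootbanks_mounted() {
    out=$(cat)
    for vol in BOOTBANK1 BOOTBANK2; do
        # Look for the mount path of each bootbank in the df output.
        printf '%s\n' "$out" | grep -q "/vmfs/volumes/${vol}" || return 1
    done
    return 0
}

# Usage on the host (not executed here):
#   df -h 2>&1 | bootbanks_mounted && echo "bootbanks available"
```

When df fails with “Cannot open volume”, as it did before the reboot, no bootbank line is present and the function returns non-zero.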

While manually reviewing vmkernel.log on the ESXi, I suddenly got this little alert - which originates from vRLI, sent to me via webhook, into my Slack-channel for monitoring.

So I jumped into vRLI, to verify, and yeah, the hostd process on the newly rebooted ESXi host throws a “vRealize Log Insight Error”

For a little more detail, I manually checked /var/log/hostd.log on the ESXi host, and I saw this:

2021-08-24T15:11:00.907Z info hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Activation [N5Vmomi10ActivationE:0x00000010f9616270] : Invoke done [login] on [vim.SessionManager:ha-sessionmgr]
2021-08-24T15:11:00.907Z verbose hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Arg userName:
--> "vpxuser"
2021-08-24T15:11:00.908Z verbose hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Arg password:
--> (not shown)
-->
2021-08-24T15:11:00.908Z verbose hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Arg locale:
--> ""
2021-08-24T15:11:00.908Z info hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Throw vim.fault.InvalidLogin
2021-08-24T15:11:00.908Z info hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Result:
--> (vim.fault.InvalidLogin) {
-->    msg = "",
--> }
2021-08-24T15:11:07.909Z error hostd[1052023] [Originator@6876 sub=Default opID=HostSync-host-4384-351d8304-de-1bb3] [module:pam_lsass]pam_do_authenticate: error [login:vpxuser][error code:2]
2021-08-24T15:11:07.909Z error hostd[1052023] [Originator@6876 sub=Default opID=HostSync-host-4384-351d8304-de-1bb3] [module:pam_lsass]pam_sm_authenticate: failed [error code:2]
2021-08-24T15:11:07.909Z warning hostd[1052023] [Originator@6876 sub=Default opID=HostSync-host-4384-351d8304-de-1bb3] Rejected password for user vpxuser from 127.0.0.1
2021-08-24T15:11:07.910Z info hostd[1052023] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=HostSync-host-4384-351d8304-de-1bb3] Event 148 : Cannot login vpxuser@127.0.0.1
2021-08-24T15:11:07.911Z info hostd[1051983] [Originator@6876 sub=Vimsvc.TaskManager opID=7ccb1bb5 user=vpxuser] Task Created : haTask--vim.event.EventHistoryCollector.readNext-150
2021-08-24T15:11:07.911Z info hostd[1052127] [Originator@6876 sub=Vimsvc.TaskManager opID=7ccb1bb5 user=vpxuser] Task Completed : haTask--vim.event.EventHistoryCollector.readNext-150 Status success
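The tell-tale line in this excerpt is the “Rejected password for user vpxuser” warning. A quick way to see how often it occurs is to count it. This is a minimal sketch: count_vpxuser_rejects is a made-up helper name, and the default path is simply the hostd.log location mentioned above.

```shell
# Hypothetical helper: counts the "Rejected password" warnings that hostd
# logged for the vpxuser account in the given log file.
count_vpxuser_rejects() {
    logfile=${1:-/var/log/hostd.log}
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'Rejected password for user vpxuser' "$logfile"
}

# Usage on the host (not executed here):
#   count_vpxuser_rejects /var/log/hostd.log
```

A count that keeps growing after the reboot suggests vCenter is still presenting credentials that the host no longer accepts.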

For fun, I tried restarting the hostd & vpxa services, but I got the same issue. Basically, the ESXi host was not able to re-connect to vCenter automatically (like it usually does).
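For reference, restarting the two agents from the ESXi shell usually goes through their init scripts. The exact commands used here are not shown, so treat this as a sketch under that assumption; restart_mgmt_agents and the DRY_RUN switch are made up for illustration. As noted, the restart did not resolve the problem in this case.

```shell
# Sketch: restart the two management agents on the host.
# Assumption: the agents are managed by /etc/init.d/hostd and
# /etc/init.d/vpxa, the usual init scripts on ESXi.
# With DRY_RUN=1 the commands are only printed, not executed.
restart_mgmt_agents() {
    for svc in hostd vpxa; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "/etc/init.d/$svc restart"
        else
            "/etc/init.d/$svc" restart
        fi
    done
}

# Usage on the host (not executed here):
#   restart_mgmt_agents
```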

Quickfix

Well, it’s an easy one this time - just re-connect (duuh!).

Doing a new “Connect” from vCenter gives you an error on re-connecting, then a new login prompt. Re-enter the credentials for the host, and boom - back online in vCenter.

You may now patch your babies to the latest ESXi 7.0 build 18426014 U2c, hopefully fixing the SD/USB issues permanently.
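If you prefer esxcli over LCM/VUM for the patching itself, the command could be wrapped as shown here. This is only a sketch: the image profile name and the depot URL are assumptions on my part and should be checked against the release notes, and patch_host and DRY_RUN are made-up names. It also assumes the host is already in maintenance mode and can reach the depot.

```shell
# Sketch: build (and optionally run) the esxcli command that updates the
# host to a given image profile from a depot.
# ASSUMPTIONS: the profile name and depot URL are illustrative defaults;
# verify both before use. Override them via the PROFILE/DEPOT variables.
PROFILE="${PROFILE:-ESXi-7.0U2c-18426014-standard}"
DEPOT="${DEPOT:-https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml}"

patch_host() {
    cmd="esxcli software profile update -p $PROFILE -d $DEPOT"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only show what would be run.
        echo "$cmd"
    else
        $cmd
    fi
}

# Usage on the host (not executed here):
#   patch_host              # prints the command
#   DRY_RUN=0 patch_host    # actually runs it
```

Printing the command by default is deliberate, so nothing gets changed on the host until you have checked the profile and depot.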

🪴 espenodegaard
