Upgrading to ESXi 7.0 build 18426014 U2c. ESXi stuck in Not responding from vCenter

by Espen Ødegaard

Guest Post #

This is a guest post by Espen Ødegaard, Senior Systems Consultant for Proact.

You can find him on Twitter and LinkedIn. Espen is usually found in vmkernel.log, esxtop, sexigraf or vSAN Observer. Or eating, he eats a lot.

Well, this is not directly related to the new ESXi 7 U2c build, BUT if you’re one of the lucky ones (like me), you’ve been experiencing big issues with SD/USB boot devices since ESXi 7 U1/U2, and hence the vCenter <> ESXi communication (at least the configurations, tokens, etc.) has not been working for a while. I mean, the last time vCenter talked successfully with my host was back in May 2021, when the USB device dropped. That’s like four (4!) months ago.

Today, VMware finally released ESXi 7 U2c, which includes an updated vmkusb module. Hopefully this will fix the previously experienced SD/USB device issues, but before that - let’s talk patching!

Patch plan - quick & dirty #

Neither LCM/VUM nor esxcli is currently able to patch ESXi, since the USB boot device is in a “broken” state (due to the SD/USB issues). So I planned a quick reboot of the ESXi hosts to first make the USB boot device accessible again, and then proceed with patching. In theory that should work, as the host is then able to update the images (VIBs) on the USB device.

What happened #

First, verify that the USB/boot device is not working, before proceeding with the reboot:

[root@esx-13:~] df -h
Error when running esxcli, return status was: 1
Errors:
Cannot open volume:
[root@esx-13:~] partedUtil getptbl /dev/disks/mpx.vmhba32\:C0\:T0\:L0
Unable to get device /dev/disks/mpx.vmhba32:C0:T0:L0

Comment: So yeah, this server needs a reboot first, before it is able to talk to the boot device, which is USB (in my case listed as vmhba32).

So I rebooted the host. Checking Intel AMT/KVM, the host booted up again, still on ESXi 7 U2a - build 17867351 (with the bug). I waited a little, but it never re-connected to VMware vCenter. vCenter was still showing the host as “Not responding”, even though the host was actually back up and running.

ESXi Host Not Responding

So I quickly jumped into an SSH session on the host, and first verified that the bootbanks & VMFS-L/OSDATA were available, and they were (hence, I should be able to patch):

[root@esx-13:~] df -h
Filesystem  Size  Used Available Use% Mounted on
VMFS-L    20.8G  1.6G    19.1G  8% /vmfs/volumes/LOCKER-6092997b-cd2ada42-23c0-000c292b45b0
vfat        4.0G 202.7M      3.8G  5% /vmfs/volumes/BOOTBANK1
vfat        4.0G 208.3M      3.8G  5% /vmfs/volumes/BOOTBANK2
vsan        5.5T  1.8T      3.6T  34% /vmfs/volumes/mgmt-01-vsan
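
If you have more than one host to check, the same sanity check can be scripted. This is just a minimal sketch, assuming your `df -h` output looks like mine above (two `BOOTBANK` volumes and one `VMFS-L` volume); the function name `boot_volumes_ok` is made up for this example.

```shell
#!/bin/sh
# Sketch only: reads `df -h` output on stdin and succeeds when both
# bootbanks and a VMFS-L volume are listed, i.e. the boot device is
# readable again. The volume names are taken from the output above.
boot_volumes_ok() {
    awk '
        /\/vmfs\/volumes\/BOOTBANK1$/ { b1 = 1 }
        /\/vmfs\/volumes\/BOOTBANK2$/ { b2 = 1 }
        $1 == "VMFS-L"               { osdata = 1 }
        END { exit !(b1 && b2 && osdata) }
    '
}
```

Usage on the host would be something like `df -h | boot_volumes_ok && echo "boot device looks fine"`. When `df` only returns the esxcli error shown earlier, the function returns non-zero.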

While manually reviewing vmkernel.log on the ESXi host, I suddenly got this little alert. It originates from vRLI and is sent via webhook to my Slack channel for monitoring.

Slack alert

So I jumped into vRLI, to verify, and yeah, the hostd process on the newly rebooted ESXi host throws a “vRealize Log Insight Error”

ESXi Host Not Responding

For a little more detail, I manually checked /var/log/hostd.log on the ESXi host, and I saw this:

2021-08-24T15:11:00.907Z info hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Activation [N5Vmomi10ActivationE:0x00000010f9616270] : Invoke done [login] on [vim.SessionManager:ha-sessionmgr]
2021-08-24T15:11:00.907Z verbose hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Arg userName:
--> "vpxuser"
2021-08-24T15:11:00.908Z verbose hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Arg password:
--> (not shown)
-->
2021-08-24T15:11:00.908Z verbose hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Arg locale:
--> ""
2021-08-24T15:11:00.908Z info hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Throw vim.fault.InvalidLogin
2021-08-24T15:11:00.908Z info hostd[1052466] [Originator@6876 sub=Solo.Vmomi] Result:
--> (vim.fault.InvalidLogin) {
-->    msg = "",
--> }
2021-08-24T15:11:07.909Z error hostd[1052023] [Originator@6876 sub=Default opID=HostSync-host-4384-351d8304-de-1bb3] [module:pam_lsass]pam_do_authenticate: error [login:vpxuser][error code:2]
2021-08-24T15:11:07.909Z error hostd[1052023] [Originator@6876 sub=Default opID=HostSync-host-4384-351d8304-de-1bb3] [module:pam_lsass]pam_sm_authenticate: failed [error code:2]
2021-08-24T15:11:07.909Z warning hostd[1052023] [Originator@6876 sub=Default opID=HostSync-host-4384-351d8304-de-1bb3] Rejected password for user vpxuser from 127.0.0.1
2021-08-24T15:11:07.910Z info hostd[1052023] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=HostSync-host-4384-351d8304-de-1bb3] Event 148 : Cannot login vpxuser@127.0.0.1
2021-08-24T15:11:07.911Z info hostd[1051983] [Originator@6876 sub=Vimsvc.TaskManager opID=7ccb1bb5 user=vpxuser] Task Created : haTask--vim.event.EventHistoryCollector.readNext-150
2021-08-24T15:11:07.911Z info hostd[1052127] [Originator@6876 sub=Vimsvc.TaskManager opID=7ccb1bb5 user=vpxuser] Task Completed : haTask--vim.event.EventHistoryCollector.readNext-150 Status success
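
The interesting lines in that excerpt are the `Rejected password for user vpxuser` ones. To pull just those out of a busy hostd.log, a small helper like the one below should do. It is a sketch only: the match string is copied from my log above, and the function name `rejected_vpxuser_logins` is made up for this example.

```shell
#!/bin/sh
# Sketch only: print the timestamp of every rejected vpxuser login
# found in a hostd log. Defaults to /var/log/hostd.log when no file
# is given. The match string is taken from the log excerpt above.
rejected_vpxuser_logins() {
    log="${1:-/var/log/hostd.log}"
    awk '/Rejected password for user vpxuser/ { print $1 }' "$log"
}
```

Running `rejected_vpxuser_logins` on the host prints one timestamp per rejected login, which makes it easy to see whether the rejections started right after the reboot.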

For fun, I tried restarting the hostd & vpxa services, but I got the same issue. Basically, the ESXi host was not able to automatically re-connect to vCenter (like it usually does).
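
For reference, restarting those two services from the ESXi shell is usually done through their init scripts. A minimal sketch, assuming the standard `/etc/init.d/hostd` and `/etc/init.d/vpxa` scripts; the `INIT_DIR` variable and the function name are only there for this example, so the function can be tried off-host.

```shell
#!/bin/sh
# Sketch only: restart the ESXi management agents mentioned above.
# INIT_DIR is an assumption for illustration; on ESXi the init
# scripts normally live in /etc/init.d.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

restart_mgmt_agents() {
    for svc in hostd vpxa; do
        # Stop at the first service that fails to restart.
        "$INIT_DIR/$svc" restart || return 1
    done
}
```

In my case this changed nothing, since the problem was the rejected vpxuser login and not a hung service.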

Quickfix #

Well, it’s an easy one this time - just re-connect (duuh!).

Reconnect host in vCenter

Doing a new “Connect” from vCenter gives you an error on re-connecting, then a new login prompt. Re-enter the credentials for the host, and boom - back online in vCenter.

You may now patch your babies to the latest ESXi 7.0 build 18426014 U2c, hopefully fixing the SD/USB issues permanently.
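
If you go the esxcli route for the patching itself, a minimal sketch could look like the one below. Treat the depot path and image profile name as assumptions: they depend on where you uploaded the offline bundle and which profile you want, so check them with `esxcli software sources profile list -d <depot>` first. The `run` and `patch_host` helpers and the `DRY_RUN` switch are made up for this example.

```shell
#!/bin/sh
# Sketch only: patch an ESXi host from an offline depot with esxcli.
# DEPOT and PROFILE are assumptions; adjust them to your own datastore
# and to the profile names listed in your depot.
DEPOT="${DEPOT:-/vmfs/volumes/datastore1/VMware-ESXi-7.0U2c-18426014-depot.zip}"
PROFILE="${PROFILE:-ESXi-7.0U2c-18426014-standard}"

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Evacuate or power off the VMs on the host before calling this.
patch_host() {
    run esxcli system maintenanceMode set --enable true &&
    run esxcli software profile update -d "$DEPOT" -p "$PROFILE"
}
```

Try it with `DRY_RUN=1 patch_host` first to see the exact commands, then run it for real and reboot the host once the update reports success.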



Post last updated on September 9, 2021: Fix some frontmatter issues

