2021-01-10

Non-responding Azure VM

It might happen to your virtual machine in Azure too. It gets stuck and stops responding on every enabled service, such as SSH and HTTP, and even to pings; in Microsoft's words, the "VM was not responding to any means of communication".

Symptoms: The VM does not answer on its public IP, and it does not answer on the private network either, even from a machine in the same VNET and subnet. The tricky part is that the portal still shows the machine as up and running, and stopping and restarting it there appear to work.

I opened an M$ support case because this was not the first occurrence of the same behavior and I wanted to know the cause. We went through the classic scenario: restart, deallocate, and redeploy via the portal. The extra step was restarting the VM from the serial console, but the serial console did not come up even after a reboot.
Short answer: one of the disks was incorrectly mounted. One of the logs contained the crucial information: "Reached target Emergency Mode", followed by
Failed to mount /var/lib/docker. See 'systemctl status var-lib-docker.mount' for details. Dependency failed for Local File Systems.
That failed mount prevents the VM from booting, and it needs to be fixed as described below. We had to build a rescue VM: we created a new VM from the OS disk of the impacted VM, and it worked.
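
If you can get the boot log as plain text, a few lines of Python can pick out which mount points failed. This is only a sketch: the function name is my own, and it assumes the systemd message has the same shape as the one quoted above.

```python
import re

# Matches systemd messages such as "Failed to mount /var/lib/docker."
# The mount point is everything up to the sentence-ending period.
FAILED_MOUNT = re.compile(r"Failed to mount (\S+?)\.(?:\s|$)")


def failed_mounts(log_text):
    """Return the mount points reported as failed, in order of appearance."""
    return FAILED_MOUNT.findall(log_text)
```

Each mount point it returns is a candidate for the /etc/fstab entry that has to be fixed.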

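Solution: in /etc/fstab, add the nofail option to the entry for the failing mount point (the option is written nofail, without a leading dash, in the options column). Save and exit. Detach the drive and do an OS disk swap for the machine. The reboot should then be OK and the machine should be online.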
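
The fstab change can also be scripted. The following is a minimal sketch, not the exact procedure we used with support: add_nofail is a hypothetical helper, it assumes the usual six-column fstab layout, and it rewrites the changed line with tab separators.

```python
def add_nofail(fstab_text, mount_point):
    """Return fstab_text with 'nofail' added to the options of mount_point's entry."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave comments, blank or short lines, and other mount points untouched.
        if line.lstrip().startswith("#") or len(fields) < 4 or fields[1] != mount_point:
            out.append(line)
            continue
        options = fields[3].split(",")  # fourth column holds the mount options
        if "nofail" not in options:
            options.append("nofail")
        fields[3] = ",".join(options)
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"
```

For an entry like "UUID=abcd /var/lib/docker ext4 defaults 0 2" (a made-up example), the options column becomes "defaults,nofail". Running the helper a second time changes nothing.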

How to rescue the VM
  • Take a full snapshot of the OS disk.
    • In Disks, create a new disk using the snapshot as the source.
    • Check the size and type of the original disk, and use the same size and type for the new disk.
  • Attach the disk to an existing Red Hat VM, swap the disk, and mount it.
These few hints might help you get out of trouble.
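
We did these steps in the portal, but the snapshot, copy, and attach steps can probably be done with the Azure CLI as well. The sketch below only builds the az command lines. It assumes az is installed and logged in, and the snapshot and disk names are hypothetical placeholders. The OS disk swap and the mount inside the rescue VM are left out.

```python
import shlex


def rescue_commands(resource_group, os_disk, rescue_vm, sku, size_gb):
    """Build az commands: snapshot the OS disk, copy it to a new disk, attach the copy."""
    snapshot = "broken-os-snap"  # hypothetical snapshot name
    copy = "broken-os-copy"      # hypothetical name for the new disk
    return [
        ["az", "snapshot", "create", "--resource-group", resource_group,
         "--name", snapshot, "--source", os_disk],
        # Same size and type (SKU) as the original disk.
        ["az", "disk", "create", "--resource-group", resource_group,
         "--name", copy, "--source", snapshot,
         "--sku", sku, "--size-gb", str(size_gb)],
        ["az", "vm", "disk", "attach", "--resource-group", resource_group,
         "--vm-name", rescue_vm, "--name", copy],
    ]


if __name__ == "__main__":
    # All argument values here are placeholders for illustration.
    for cmd in rescue_commands("my-rg", "broken-vm-osdisk", "rescue-vm", "Premium_LRS", 128):
        print(shlex.join(cmd))
```

Printing the commands first lets you check them before running anything against the subscription.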
