How to rescue instances?
OpenStack offers a rescue mode to recover VMs. It boots the VM from a different image while keeping the original disk attached. This is useful when the virtual machine fails to boot due to a kernel panic or a full disk, or when you have simply lost access to the private key. Once the VM is booted from the other image, you can mount the original disk, edit its files, and fix the problem.
Symptoms
Kernel Panic
Check your instance Console Log (web UI: Instances > <your instance> > Log)
[ 1.041853] Loading compiled-in X.509 certificates
[ 1.043433] Loaded X.509 cert 'CentOS Linux kpatch signing key:ea0413152cde1d98ebdca3fe6f0230904c9ef717'
[ 1.046556] Loaded X.509 cert 'CentOS Linux Driver update signing key:7f421ee0ab69461574bb358861dbe77762a4201b'
[ 1.050310] Loaded X.509 cert 'CentOS Linux kernel signing key:d4115f110055db56c8d605ab752173cfb1ac54d8'
[ 1.053448] registered taskstats version 1
[ 1.055861] Key type trusted registered
[ 1.057771] Key type encrypted registered
[ 1.059249] IMA: No TPM chip found, activating TPM-bypass! (rc=-19)
[ 1.061680] Magic number: 14:548:18
[ 1.063246] ep_81: hash matches
[ 1.064844] rtc_cmos 00:00: setting system clock to 2018-08-23 08:02:54 UTC (1535011374)
[ 1.067954] md: Waiting for all devices to be available before autodetect
[ 1.069982] md: If you don't use raid, use raid=noautodetect
[ 1.072041] md: Autodetecting RAID arrays.
[ 1.073689] md: autorun ...
[ 1.074976] md: ... autorun DONE.
[ 1.076358] List of all partitions:
[ 1.077771] No filesystem could mount root, tried:
[ 1.079600] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 1.082286] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.10.0-862.11.6.el7.x86_64 #1
[ 1.085033] Hardware name: Fedora Project OpenStack Nova, BIOS 0.5.1 01/01/2011
[ 1.087639] Call Trace:
[ 1.088800] [<ffffffff871135d4>] dump_stack+0x19/0x1b
[ 1.090453] [<ffffffff8710d11f>] panic+0xe8/0x21f
[ 1.091982] [<ffffffff8776c761>] mount_block_root+0x291/0x2a0
[ 1.093704] [<ffffffff8776c7c3>] mount_root+0x53/0x56
[ 1.095394] [<ffffffff8776c902>] prepare_namespace+0x13c/0x174
[ 1.097281] [<ffffffff8776c3df>] kernel_init_freeable+0x1f8/0x21f
[ 1.099244] [<ffffffff8776bb1f>] ? initcall_blacklist+0xb0/0xb0
[ 1.101131] [<ffffffff87101bc0>] ? rest_init+0x80/0x80
[ 1.102813] [<ffffffff87101bce>] kernel_init+0xe/0xf0
[ 1.104497] [<ffffffff871255f7>] ret_from_fork_nospec_begin+0x21/0x21
[ 1.106367] [<ffffffff87101bc0>] ? rest_init+0x80/0x80
[ 1.107997] Kernel Offset: 0x5a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
The log says that the instance couldn't boot because the kernel can't find the root filesystem: "Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)". The fix is to boot a previous, working kernel. Since you can't boot the server, you have to apply the fix to the volume (the boot files) from another running system.
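The same check can be done from the command line by fetching the console log with `openstack console log show` and searching it for the panic message. The following is a minimal sketch, assuming the OpenStack command line tools are already configured; the helper name `find_kernel_panic` is hypothetical.

```shell
# find_kernel_panic: hypothetical helper. Reads a console log on stdin and
# prints every kernel panic line it finds.
# Exit status: 0 if at least one panic line was found, 1 otherwise.
find_kernel_panic() {
    grep -F 'Kernel panic - not syncing'
}

# Usage (replace <your instance> with the instance name or ID):
#   openstack console log show <your instance> | find_kernel_panic
```

An empty result only means that this particular message is absent; other boot failures produce different log lines.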
Access denied
The problem can be as simple as:
$ ssh cloud-user@<floating-ip>
cloud-user@<floating-ip>: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
How to fix the issue: nova rescue
Note that there are always several ways to fix a problem; this FAQ shows one of them. Also, although you are allowed to edit the GRUB boot parameters, root access in single-user mode is disabled by default for security reasons. The procedure to perform a rescue is as follows:
-
You need to have the OpenStack command line tools installed and to be logged in. See Configure your terminal environment for OpenStack for reference.
-
Get the server's ID and store it in an environment variable called INSTANCE_UUID:
$ openstack server list
+--------------------------------------+-----------+--------+----------------------------+-------+----------------+
| ID                                   | Name      | Status | Networks                   | Image | Flavor         |
+--------------------------------------+-----------+--------+----------------------------+-------+----------------+
| 55555566-ffff-4a52-5735-356251902325 | comp1     | ACTIVE | net=192.168.211.211        |       | standard.small |
+--------------------------------------+-----------+--------+----------------------------+-------+----------------+
-
Get the image ID. You can store the ID in an environment variable called IMAGE_UUID. You should use the same image as your instance (the IDs may differ from the example below):
$ openstack image list
+--------------------------------------+----------------------+--------+
| ID                                   | Name                 | Status |
+--------------------------------------+----------------------+--------+
| 56b70226-0c52-48c6-973f-3f726b5e7dc0 | CentOS-7             | active |
| 2d20266d-43f7-499e-b6e6-090b09416b16 | CentOS-7-Cuda        | active |
| c80adfec-05a8-4c42-8922-4bccdf90df40 | CentOS-8-Stream      | active |
| 2ca237c5-bd0a-4469-ae9f-20878dd288a9 | Fedora Cloud Base 31 | active |
| ee19819d-17d5-4f71-ac38-e024d046eb6a | Ubuntu-18.04         | active |
| 668d235f-e6e4-421d-964c-0016f9560206 | Ubuntu-20.04         | active |
| aea0bf58-85fb-4f9c-b2ea-ffa6c7a07c02 | Ubuntu-22.04         | active |
| 3a9aad67-0f9c-4493-b574-17fe28d40afc | cirros               | active |
+--------------------------------------+----------------------+--------+
-
Shut down the instance:
openstack server stop $INSTANCE_UUID
-
Check that the VM is stopped:
openstack server list
The Status should be SHUTOFF.
-
You are now ready to launch the rescue of the instance:
openstack server rescue --image $IMAGE_UUID $INSTANCE_UUID
-
Make sure that the instance is in rescue mode with:
openstack server list
The Status should be RESCUE.
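The stop, check, rescue, and check sequence above can also be scripted. The following is a minimal sketch, not an official tool: the function names `wait_for_status` and `rescue_instance` are hypothetical, and the polling defaults (30 attempts, 10 seconds apart) are arbitrary assumptions. It relies on `openstack server show ... -f value -c status` to read the status of a single server.

```shell
# wait_for_status: hypothetical helper. Polls a server until it reaches the
# wanted status or the attempts run out.
# Arguments: server ID, wanted status, max attempts (default 30),
#            seconds between attempts (default 10).
wait_for_status() {
    local server="$1"
    local wanted="$2"
    local attempts="${3:-30}"
    local delay="${4:-10}"
    local status=""
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        status=$(openstack server show "$server" -f value -c status)
        if [ "$status" = "$wanted" ]; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Timed out waiting for $server to become $wanted (last: $status)" >&2
    return 1
}

# rescue_instance: hypothetical helper. Stops the server, waits for SHUTOFF,
# starts the rescue with the given image, then waits for RESCUE.
rescue_instance() {
    local server="$1"
    local image="$2"
    openstack server stop "$server" || return 1
    wait_for_status "$server" SHUTOFF || return 1
    openstack server rescue --image "$image" "$server" || return 1
    wait_for_status "$server" RESCUE
}

# Usage:
#   rescue_instance "$INSTANCE_UUID" "$IMAGE_UUID"
```

The sketch stops at the first failing command, so an instance that is already shut off makes `openstack server stop` fail and ends the run; in that case run the rescue command directly.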
Connecting
Using ssh
The rescue image will get the same SSH keys as configured in the VM you are rescuing, so you should be able to SSH into the instance using the same user and IP address as usual.
ssh <default-user>@<floating-ip>
You will get this warning: WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! It is triggered by the host keys: they are stored on the VM's disk, and they change because you are booting from a different disk. Fix it by removing the line with your instance's IP address from the file ~/.ssh/known_hosts. Alternatively, run the following command:
ssh-keygen -f "$HOME/.ssh/known_hosts" -R "$INSTANCE_IP"
Using Pouta's web console (with Cirros)
In some cases, such as when you have lost the private SSH key, you will need to use Pouta's web console. For this to work, you need to select the Cirros image in step 3 above.
Log in to Pouta's web interface: https://pouta.csc.fi. Look for your instance and click Console.
The username and password should be printed in the console text, above the login.
Cirros
The Cirros image is a small Linux distribution image with limited software support and security updates. It should be used only for rescue operations when normal SSH access is not possible.
Mount the disk
-
Check what volumes you have. If you don't have any other volumes attached, it should look something like this:
$ lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda    253:0    0  10G  0 disk
└─vda1 253:1    0  10G  0 part
vdb    253:16   0  80G  0 disk
└─vdb1 253:17   0  80G  0 part /
-
Now mount vdb1 on /tmp/mnt:
$ sudo mkdir -p /tmp/mnt
$ sudo mount /dev/vdb1 /tmp/mnt/
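Before editing anything, it may be worth confirming that the mounted partition really is the instance's root filesystem and not, for example, a data volume. The following is a minimal sketch; the helper name `looks_like_rootfs` is hypothetical, and the check (presence of `etc` and `boot` directories) is a heuristic assumption, not a guarantee.

```shell
# looks_like_rootfs: hypothetical helper. Succeeds if the given directory
# contains the top-level directories expected on a Linux root filesystem.
looks_like_rootfs() {
    local dir="$1"
    [ -d "$dir/etc" ] && [ -d "$dir/boot" ]
}

# Usage:
#   looks_like_rootfs /tmp/mnt || echo "/tmp/mnt does not look like a root filesystem" >&2
```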
Change bootloader (Grub)
-
Back up the GRUB configuration:
$ sudo cp /tmp/mnt/boot/grub2/grub.cfg /tmp/mnt/root/grub.cfg.bak-$(date +"%F")
-
Open /tmp/mnt/boot/grub2/grub.cfg with your favorite text editor. Remove the first menuentry section.
NOTE: This might not be the correct solution for your specific problem. The first menuentry is normally your latest and default kernel.
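Before removing an entry, it can help to list the entries the file contains and confirm that an older kernel is available to fall back on. The following is a minimal sketch; the helper name `list_menuentries` is hypothetical, and it assumes that entry titles are wrapped in single quotes and that `menuentry` starts at the beginning of the line, so entries nested inside a submenu are not listed.

```shell
# list_menuentries: hypothetical helper. Prints the top-level menuentry
# titles of a grub.cfg file, numbered from 0 in file order.
# Assumes titles are single-quoted: menuentry 'Title' --class ... {
list_menuentries() {
    awk -F"'" '/^menuentry /{ printf "%d: %s\n", n++, $2 }' "$1"
}

# Usage (reading the file may require root privileges):
#   list_menuentries /tmp/mnt/boot/grub2/grub.cfg
```

If only one entry is listed, removing it would leave nothing to boot, and another fix is needed.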
Use chroot to change the / folder
If your instance has issues due to broken packages or drivers, you can switch into your original root filesystem and fix the problems using the following commands:
$ sudo mv /tmp/mnt/etc/resolv.conf{,.bak}
$ sudo cp /etc/resolv.conf /tmp/mnt/etc/resolv.conf
$ sudo chroot /tmp/mnt
The chroot command has now changed your root folder / to /tmp/mnt/ (your VM's disk partition). You can now make any fix or change, such as uninstalling or reinstalling a package.
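The commands above leave the rescue system's resolv.conf on the instance's disk, with the original saved as resolv.conf.bak. After leaving the chroot with `exit`, you may want to put the original back. The following is a minimal sketch of that step; the helper name `restore_resolv_conf` is hypothetical, and restoring the file is a suggestion, not part of the procedure described here.

```shell
# restore_resolv_conf: hypothetical helper. Moves <root>/etc/resolv.conf.bak
# back over <root>/etc/resolv.conf if the backup exists.
# The -L test also catches a backup that is a (possibly dangling) symlink.
restore_resolv_conf() {
    local root="$1"
    if [ -e "$root/etc/resolv.conf.bak" ] || [ -L "$root/etc/resolv.conf.bak" ]; then
        mv -f "$root/etc/resolv.conf.bak" "$root/etc/resolv.conf"
    else
        echo "No resolv.conf.bak found in $root/etc" >&2
        return 1
    fi
}

# Usage, from a root shell in the rescue system (outside the chroot):
#   restore_resolv_conf /tmp/mnt
```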
Get out of rescue
-
Log out from the instance and unrescue it:
openstack server unrescue $INSTANCE_UUID
-
It is a good idea to verify that a restart works after the fix (for example, a kernel reinstallation):
ssh <default-user>@<floating-ip> reboot
Wait for the instance to boot, then SSH into it again:
ssh <default-user>@<floating-ip>
It should work as before the incident happened.
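Waiting for the instance to come back can be automated by retrying the SSH connection until it succeeds. The following is a minimal sketch; the function name `wait_for_ssh` and the retry defaults (30 attempts, 10 seconds apart) are assumptions. It uses the standard OpenSSH options `BatchMode` (never prompt) and `ConnectTimeout`.

```shell
# wait_for_ssh: hypothetical helper. Retries a no-op SSH command until it
# succeeds or the attempts run out.
# Arguments: user@host, max attempts (default 30),
#            seconds between attempts (default 10).
wait_for_ssh() {
    local target="$1"
    local attempts="${2:-30}"
    local delay="${3:-10}"
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Usage:
#   wait_for_ssh <default-user>@<floating-ip> && echo "instance is reachable"
```

If the rescue system's host key was saved in ~/.ssh/known_hosts while you were connected in rescue mode, the connection may keep failing with the host key warning until that entry is removed again.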