Tuesday, March 3, 2015

LibVirt Fencing - RHEL Virtual Machine High Availability


To configure fencing for virtual machines running on a RHEL 6 host with libvirt, you can configure a fencing device of the "Fence virt (Multicast Mode)" type.

fence_virt and fence_xvm are I/O fencing agents that can be used with virtual machines.

Libvirt fencing in multicast mode works by sending a fencing request, signed with a shared secret key, to a multicast group. The hypervisor running the virtual machine must run a daemon (fence_virtd) that listens for and handles these requests.
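
If you ever want to see this traffic for yourself, a quick way (just a sketch, using the default multicast address 225.0.0.12, port 1229 and the bridge interface br0 that appear later in this post) is to watch for fencing requests on the hypervisor:

tcpdump -n -i br0 host 225.0.0.12 and port 1229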

On your host machine or hypervisor:

1) Install the fence-virtd, fence-virtd-libvirt and fence-virtd-multicast packages.

yum -y install fence-virtd{,-libvirt,-multicast}
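
If you want to double-check that all three packages landed:

rpm -q fence-virtd fence-virtd-libvirt fence-virtd-multicast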

2) Create a shared secret key at /etc/cluster/fence_xvm.key; you will need to create the /etc/cluster directory first.

mkdir -p /etc/cluster

dd if=/dev/urandom  of=/etc/cluster/fence_xvm.key bs=1k  count=4
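
The key is a shared secret, so I would also (my own habit, not something fence_virtd requires) keep it readable by root only and confirm it was written:

chmod 600 /etc/cluster/fence_xvm.key
ls -l /etc/cluster/fence_xvm.key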


3) Configure the fence_virtd daemon

fence_virtd -c

Module search path [/usr/lib64/fence-virt]:

Available backends:
    libvirt 0.1
Available listeners:
    serial 0.4
    multicast 1.1

Listener modules are responsible for accepting requests
from fencing clients.

Listener module [multicast]:

The multicast listener module is designed for use environments
where the guests and hosts may communicate over a network using
multicast.

The multicast address is the address that a client will use to
send fencing requests to fence_virtd.

Multicast IP Address [225.0.0.12]:

Using ipv4 as family.

Multicast IP Port [1229]:

Setting a preferred interface causes fence_virtd to listen only
on that interface.  Normally, it listens on the default network
interface.  In environments where the virtual machines are
using the host machine as a gateway, this *must* be set
(typically to virbr0).
Set to 'none' for no interface.

Interface [br0]:

The key file is the shared key information which is used to
authenticate fencing requests.  The contents of this file must
be distributed to each physical host and virtual machine within
a cluster.

Key File [/etc/cluster/fence_xvm.key]:

Backend modules are responsible for routing requests to
the appropriate hypervisor or management layer.

Backend module [libvirt]:

The libvirt backend module is designed for single desktops or
servers.  Do not use in environments where virtual machines
may be migrated between hosts.

Libvirt URI [qemu:///system]:

Configuration complete.

=== Begin Configuration ===
fence_virtd {
    listener = "multicast";
    backend = "libvirt";
    module_path = "/usr/lib64/fence-virt";
}

listeners {
    multicast {
        key_file = "/etc/cluster/fence_xvm.key";
        address = "225.0.0.12";
        family = "ipv4";
        port = "1229";
        interface = "br0";
    }

}

backends {
    libvirt {
        uri = "qemu:///system";
    }

}

=== End Configuration ===
Replace /etc/fence_virt.conf with the above [y/N]? y

Please note that in my setup I use br0 as the network interface; yours may be different.
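
If you are not sure which bridge your guests sit on, bridge-utils can list the bridges and their member interfaces (assuming your guests are attached to a Linux bridge rather than, say, macvtap):

brctl show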

4) Enable and start the fence_virtd service on your hypervisor

chkconfig fence_virtd on ; service fence_virtd start
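
A quick check that the daemon is running and will come back after a reboot:

service fence_virtd status
chkconfig --list fence_virtd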

5) Distribute /etc/cluster/fence_xvm.key to the cluster nodes, placing it in the /etc/cluster/ directory on each node.
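
For example, assuming the guests are reachable over SSH as node1 and node2 (adjust the names to your environment), something along these lines will do:

ssh root@node1 mkdir -p /etc/cluster
scp /etc/cluster/fence_xvm.key root@node1:/etc/cluster/
ssh root@node2 mkdir -p /etc/cluster
scp /etc/cluster/fence_xvm.key root@node2:/etc/cluster/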

In my setup I have two nodes:

virsh list --all

  Id    Name                           State
 ----------------------------------------------------
 1     node1                          running
 2     node2                          running
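
fence_xvm identifies a guest by its libvirt domain name or UUID, so if you ever need the UUIDs (they also show up in the fence_xvm -o list output below), virsh can print them:

virsh domuuid node1
virsh domuuid node2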





On the cluster nodes, check that the fencing device works. For example, on node1, list the virtual machines known to fence_virtd with:

fence_xvm -o list

node1                f238c4a1-d6ce-e920-a5af-70fbc62b3203 on
node2                608b396f-becb-5e54-081a-692301aee064 on
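
fence_xvm picks up /etc/cluster/fence_xvm.key and the default multicast address on its own; if you changed either during fence_virtd -c, you can point to them explicitly (a sketch based on the fence_xvm options, using the values from my setup):

fence_xvm -o list -k /etc/cluster/fence_xvm.key -a 225.0.0.12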


On node1, try fencing the other node:

fence_node node2

If your configuration works, the second node will be rebooted by the host; that is, the surviving node asks the hypervisor to reboot the other node.
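
You can also drive the agent directly, bypassing the cluster stack, which is handy for debugging; here -H takes the libvirt domain name as shown by fence_xvm -o list:

fence_xvm -o status -H node2
fence_xvm -o reboot -H node2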


Since this tutorial is about libvirt fencing rather than a step-by-step guide to building a cluster on RHEL, I will simply include the configuration (/etc/cluster/cluster.conf) for my cluster below:

<?xml version="1.0"?>
<cluster config_version="39" name="mycluster">
    <clusternodes>
        <clusternode name="exis01.ex.net.my" nodeid="1">
            <fence>
                <method name="Method">
                    <device domain="exis01.ex.net.my" name="fencexvm"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="exis02.ex.net.my" nodeid="2">
            <fence>
                <method name="Method">
                    <device domain="exis02.ex.net.my" name="fencexvm"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman expected_votes="1" transport="udpu" two_node="1"/>
    <fencedevices>
        <fencedevice agent="fence_xvm" name="fencexvm"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="anynode" nofailback="1" ordered="1">
                <failoverdomainnode name="exis01.ex.net.my" priority="1"/>
                <failoverdomainnode name="exis02.ex.net.my" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.16.16.32" sleeptime="3"/>
        </resources>
        <service domain="anynode" name="myclusterha" recovery="relocate">
            <ip ref="172.16.16.32"/>
        </service>
    </rm>
    <logging>
        <logging_daemon debug="on" name="rgmanager"/>
    </logging>
</cluster>
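
Two notes if you adapt this for your own cluster. First, the domain= attribute on each <device> line must match the libvirt domain name (or UUID) of that guest as the hypervisor sees it. Second, after editing cluster.conf the usual RHEL 6 routine (a sketch, assuming the cman/rgmanager stack used above) is to bump config_version, validate and push the new configuration, then check the cluster status:

ccs_config_validate
cman_tool version -r
clustat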