Adding support for manually configured fence_kdump

Fri Mar 14 07:17:23 UTC 2014

----- Original Message -----
> From: "Vivek Goyal" <vgoyal at redhat.com>
> To: "Martin Perina" <mperina at redhat.com>
> Cc: kexec at lists.fedoraproject.org
> Sent: Thursday, March 13, 2014 10:25:30 PM
> Subject: Re: Adding support for manually configured fence_kdump
> 
> On Thu, Mar 13, 2014 at 05:01:27PM -0400, Martin Perina wrote:
> 
> [..]
> > Each host is independent when running its own virtual machines, but engine
> > manages how and when are virtual machines started/stopped/migrated among
> > hosts.
> 
> I am assuming there is one engine across whole cluster. So who all needs
> to get notification that a host in cluster is saving dump. Just the engine
> or other hosts too?
> 
> Thanks
> Vivek
> 

We currently have a generic mechanism for hard fencing, which works this way:

1) If host1 becomes Non Responsive, the engine chooses host2 (the fencing
   proxy) from the same cluster, on which the configured fencing agent will
   be executed
2) The engine then communicates with VDSM on host2 to:
     a) Get host1 status using fencing device
     b) If status is DOWN, go to e)
     c) If status is UP, shut down host1
     d) Check host1 status until it is DOWN
     e) Startup host1
     f) Check host1 status until it's UP

The process is more complicated, but that's the general idea.
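
To make steps a) to f) concrete, here is a minimal Python sketch of that flow.
It is not the engine's or VDSM's actual code: the function name hard_fence,
the three callbacks, the "UP"/"DOWN" status strings and the polling limits are
all hypothetical names chosen for illustration. Any status other than "DOWN"
is treated as UP, which is an assumption.

```python
import time
from typing import Callable


def hard_fence(get_status: Callable[[], str],
               shutdown: Callable[[], None],
               startup: Callable[[], None],
               poll_interval: float = 1.0,
               max_polls: int = 60) -> None:
    """Restart a host through its fencing device (hypothetical callbacks)."""

    def wait_for(expected: str) -> None:
        # Poll the fencing device until the host reports the expected status.
        for _ in range(max_polls):
            if get_status() == expected:
                return
            time.sleep(poll_interval)
        raise TimeoutError(f"host did not reach status {expected}")

    # a) get the status, b) skip straight to e) if the host is already DOWN
    if get_status() != "DOWN":
        shutdown()          # c) host is (assumed) UP, shut it down
        wait_for("DOWN")    # d) check status until it is DOWN
    startup()               # e) start the host up
    wait_for("UP")          # f) check status until it is UP
```

The callbacks stand in for whatever the fencing proxy does through the
configured fencing agent; the retry limit is only there so the sketch cannot
loop forever.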

With fence_kdump we have several options:
  1) Use the same mechanism as for hard fencing
       - PROBLEM: we would need to update each host with the list of all other
         hosts in the cluster, because any of them could become the fencing
         proxy
       - PROBLEM: sending messages to a huge number of hosts in large clusters
                  may not be efficient
       - PROBLEM: updating all hosts and regenerating their kdump initramfs
                  each time a host is added to or removed from the cluster
                  may not be efficient

  2) Modify fence_kdump to be able to send notifications to a layer 2
     broadcast address
       - PROBLEM: we would need a patch for fence_kdump which may or may
                  not be accepted
       - PROBLEM: not sure if all of our customers could use layer 2
                  broadcast among the cluster's hosts

  3) Execute fence_kdump from the engine host
       - PROBLEM: we would need the engine host to be accessible for UDP
                  messaging on port 7410
       - PROBLEM: we would need to write our own fence_kdump message
                  listener for the case where multiple fence_kdump
                  notifications need to be handled at the same time
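
To give an idea of what the listener in option 3 might involve, here is a
rough Python sketch that listens on UDP port 7410 and records which known
hosts have sent a datagram within a time window, so several hosts can be
tracked at once. The names open_listener, collect_senders and known_hosts are
hypothetical. The sketch only looks at the sender address and does not
validate the payload; a real listener would also have to check that the
datagram is a genuine fence_kdump message.

```python
import socket
import time

FENCE_KDUMP_PORT = 7410  # port mentioned above


def open_listener(host: str = "0.0.0.0",
                  port: int = FENCE_KDUMP_PORT) -> socket.socket:
    """Open a UDP socket bound to the given address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect_senders(sock: socket.socket,
                    duration: float,
                    known_hosts: set) -> set:
    """Return the known host addresses that sent a datagram within duration seconds."""
    seen = set()
    deadline = time.monotonic() + duration
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        sock.settimeout(remaining)
        try:
            # Payload is ignored here; a real listener would validate it.
            _payload, (addr, _port) = sock.recvfrom(4096)
        except socket.timeout:
            break
        if addr in known_hosts:
            seen.add(addr)
    return seen
```

The engine could then treat every address returned by collect_senders as a
host that is currently saving a dump, instead of handling one notification
at a time.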

We are currently discussing which option would be best, but for all options
we need the possibility to configure fence_kdump without a Pacemaker cluster
config.

Martin