Hello Adam Litke, Allon Mureinik,
I'd like you to do a code review. Please visit
https://gerrit.ovirt.org/46484
to review the following change.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
safelease: Unbreak safelease on systemd
spmprotect.sh depends on Vdsm pid file for fencing Vdsm when the SPM lease cannot be renewed. However, on systemd, we do not create Vdsm pid file. When spmprotect tries to fence Vdsm it fails and reboots the host, killing all running vms. This issue effects only v1 storage domains, typically old systems using dc compatibility version 3.0.
Now we pass vdsm pid to spmprotect.sh helper script via command line argument, restoring safelease operation.
Adding another argument to a script with 9 arguments is ugly, but I don't want to make risky changes to this delicate and critical code.
Change-Id: I230b6909781269531eab3d71b516b28ab22de856 Bug-Url: https://bugzilla.redhat.com/1222564 Signed-off-by: Nir Soffer nsoffer@redhat.com Reviewed-on: https://gerrit.ovirt.org/45963 Continuous-Integration: Jenkins CI Reviewed-by: Allon Mureinik amureini@redhat.com Reviewed-by: Adam Litke alitke@redhat.com --- M vdsm/storage/clusterlock.py M vdsm/storage/protect/spmprotect.sh 2 files changed, 11 insertions(+), 8 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/84/46484/1
diff --git a/vdsm/storage/clusterlock.py b/vdsm/storage/clusterlock.py index 24a5d81..c9dab35 100644 --- a/vdsm/storage/clusterlock.py +++ b/vdsm/storage/clusterlock.py @@ -132,7 +132,8 @@ acquireLockCommand = subprocess.list2cmdline([ lockUtil, "start", self._sdUUID, str(hostID), str(self._lockRenewalIntervalSec), str(self._leasesPath), - str(leaseTimeMs), str(ioOpTimeoutMs), str(self._leaseFailRetry) + str(leaseTimeMs), str(ioOpTimeoutMs), + str(self._leaseFailRetry), str(os.getpid()) ])
cmd = [constants.EXT_SU, misc.IOUSER, '-s', constants.EXT_SH, '-c', diff --git a/vdsm/storage/protect/spmprotect.sh b/vdsm/storage/protect/spmprotect.sh index f569a91..fc0e390 100755 --- a/vdsm/storage/protect/spmprotect.sh +++ b/vdsm/storage/protect/spmprotect.sh @@ -31,8 +31,6 @@ CHECKVDSM=${CHECKVDSM:-"/usr/bin/pgrep vdsm"} REBOOTCMD=${REBOOTCMD:-"sudo /sbin/reboot -f"} RENEWDIR="/var/run/vdsm/spmprotect/$$" -VDSM_PIDFILE="/var/run/vdsm/vdsmd.pid" -VDSM_PID=`/bin/cat "$VDSM_PIDFILE"`
function usage() { if [ -n "$1" ]; then @@ -41,7 +39,7 @@ trap EXIT echo "usage: $0 COMMAND PARAMETERS" echo "Commands:" - echo " start { sdUUID hostId renewal_interval_sec lease_path[:offset] lease_time_ms io_op_timeout_ms fail_retries }" + echo " start { sdUUID hostId renewal_interval_sec lease_path[:offset] lease_time_ms io_op_timeout_ms fail_retries vdsm_pid [debug] }" echo "Parameters:" echo " sdUUID - domain uuid" echo " hostId - host id in pool" @@ -51,6 +49,8 @@ echo " lease_time_ms - time limit within which lease must be renewed (at least 2*renewal_interval_sec)" echo " io_op_timeout_ms - I/O operation timeout" echo " fail_retries - Maximal number of attempts to retry to renew the lease before fencing (<= lease_time_ms/renewal_interval_sec)" + echo " vdsm_pid - Vdsm process ID" + echo " debug - enable debug mode (optional)" exit 1 }
@@ -119,12 +119,14 @@ LEASE_TIME_MS="$4" IO_OP_TIMEOUT_MS="$5" LAST_RENEWAL="$6" + VDSM_PID="$7"
# Make sure params are integers [ "$RENEWAL_INTERVAL" -eq "$RENEWAL_INTERVAL" 2>/dev/null ] || usage "error - Renewal interval not an integer" [ "$LEASE_TIME_MS" -eq "$LEASE_TIME_MS" 2>/dev/null ] || usage "error - Lease time not an integer" [ "$LEASE_TIME_MS" -ge $((RENEWAL_INTERVAL*2)) ] || usage "error - Lease time too small" [ "$IO_OP_TIMEOUT_MS" -eq "$IO_OP_TIMEOUT_MS" 2>/dev/null ] || usage "error - IO op timeout not an integer" + [ "$VDSM_PID" -eq "$VDSM_PID" 2>/dev/null ] || usage "error - Vdsm PID is not a process ID" }
function renew() { @@ -195,12 +197,12 @@
####################################################### Main ###################################################
-if [ "$#" -lt 8 ]; then +if [ "$#" -lt 9 ]; then usage "error - wrong number of arguments" fi
-validate_args $3 $4 $5 $6 $7 $8 -DEBUG="$9" +validate_args $3 $4 $5 $6 $7 $8 $9 +DEBUG="${10}" dbg="" if [ "$DEBUG" -eq "$DEBUG" 2>/dev/null ]; then dbg="-d" @@ -221,7 +223,7 @@ trap release USR1
exec 0>&- && exec 1>&- && exec 2>&- # Close stdin, stdout and stderr - $SETSID $0 renew $sdUUID $ID $RENEWAL_INTERVAL $LEASE_FILE $LEASE_TIME_MS $IO_OP_TIMEOUT_MS $LAST_RENEWAL $DEBUG >> $LOGFILE 2>&1 & + $SETSID $0 renew $sdUUID $ID $RENEWAL_INTERVAL $LEASE_FILE $LEASE_TIME_MS $IO_OP_TIMEOUT_MS $LAST_RENEWAL $VDSM_PID $DEBUG >> $LOGFILE 2>&1 & trap EXIT exit 0 ;;
automation@ovirt.org has posted comments on this change.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
Patch Set 1: Verified-1
* Update tracker::#1222564::OK * Check Bug-Url::OK * Check Public Bug::#1222564::OK, public bug * Check Product::#1222564::OK, Correct product Red Hat Enterprise Virtualization Manager * Check TR::#1222564::ERROR, wrong target release for stable branch, 3.6.0 should match ^3.[54321].* * Check merged to previous::WARN, Still open on branches ovirt-3.6
Nir Soffer has posted comments on this change.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
Patch Set 1:
Needs 3.5 bug
Allon Mureinik has posted comments on this change.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
Patch Set 1:
(1 comment)
+1 on the code, please change the bug url
https://gerrit.ovirt.org/#/c/46484/1//COMMIT_MSG Commit Message:
Line 18: Adding another argument to a script with 9 arguments is ugly, but I Line 19: don't want to make risky changes to this delicate and critical code. Line 20: Line 21: Change-Id: I230b6909781269531eab3d71b516b28ab22de856 Line 22: Bug-Url: https://bugzilla.redhat.com/1222564 https://bugzilla.redhat.com/show_bug.cgi?id=1265177 Line 23: Signed-off-by: Nir Soffer nsoffer@redhat.com Line 24: Reviewed-on: https://gerrit.ovirt.org/45963 Line 25: Continuous-Integration: Jenkins CI Line 26: Reviewed-by: Allon Mureinik amureini@redhat.com
automation@ovirt.org has posted comments on this change.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
Patch Set 2: -Verified
* Update tracker::#1265177::OK * Check Bug-Url::OK * Check Public Bug::#1265177::OK, public bug * Check Product::#1265177::OK, Correct product Red Hat Enterprise Virtualization Manager * Check TR::#1265177::OK, correct target release 3.5.6 * Check merged to previous::OK, change not open on any previous branch
Allon Mureinik has posted comments on this change.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
Patch Set 2: Code-Review+1
Francesco Romani has posted comments on this change.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
Patch Set 2: Continuous-Integration+1
run "make check" locally
Yaniv Bronhaim has posted comments on this change.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
Patch Set 2: Code-Review+2
Nir Soffer has posted comments on this change.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
Patch Set 2: Verified+1
Tested together with https://gerrit.ovirt.org/46485 as described in https://bugzilla.redhat.com/show_bug.cgi?id=1222564#c11
Allon Mureinik has posted comments on this change.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
Patch Set 2:
Can we move forward with this please?
Yaniv Bronhaim has submitted this change and it was merged.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
safelease: Unbreak safelease on systemd
spmprotect.sh depends on Vdsm pid file for fencing Vdsm when the SPM lease cannot be renewed. However, on systemd, we do not create Vdsm pid file. When spmprotect tries to fence Vdsm it fails and reboots the host, killing all running vms. This issue effects only v1 storage domains, typically old systems using dc compatibility version 3.0.
Now we pass vdsm pid to spmprotect.sh helper script via command line argument, restoring safelease operation.
Adding another argument to a script with 9 arguments is ugly, but I don't want to make risky changes to this delicate and critical code.
Change-Id: I230b6909781269531eab3d71b516b28ab22de856 Bug-Url: https://bugzilla.redhat.com/1265177 Signed-off-by: Nir Soffer nsoffer@redhat.com Reviewed-on: https://gerrit.ovirt.org/45963 Continuous-Integration: Jenkins CI Reviewed-by: Allon Mureinik amureini@redhat.com Reviewed-by: Adam Litke alitke@redhat.com Reviewed-on: https://gerrit.ovirt.org/46484 Continuous-Integration: Francesco Romani fromani@redhat.com Reviewed-by: Yaniv Bronhaim ybronhei@redhat.com --- M vdsm/storage/clusterlock.py M vdsm/storage/protect/spmprotect.sh 2 files changed, 11 insertions(+), 8 deletions(-)
Approvals: Nir Soffer: Verified Yaniv Bronhaim: Looks good to me, approved Allon Mureinik: Looks good to me, but someone else must approve Francesco Romani: Passed CI tests
automation@ovirt.org has posted comments on this change.
Change subject: safelease: Unbreak safelease on systemd ......................................................................
Patch Set 3:
* Update tracker::#1265177::OK * Set MODIFIED::bug 1265177::::#1265177::::IGNORE, not oVirt prod but Red Hat Enterprise Virtualization Manager
vdsm-patches@lists.fedorahosted.org