[PATCH] sanlock: process commands arriving during poll() promptly
by Jonathan Davies
If a command is issued to the sanlock daemon soon after the previous command
from the same client has completed, it might not be processed for up to 1
second. This scenario is commonplace during sequential operations issued through
lvmlockd, making them feel unusually sluggish.
The delay occurs when a command is issued by a client soon after that client has
been marked as 'resumed' and while the sanlock daemon is in the main loop
executing poll().
This is because the fds that poll() monitors are the fds of the non-suspended
clients. Since the duration of poll() is up to STANDARD_CHECK_INTERVAL
milliseconds, i.e. 1 second, any client that resumes during that period and
issues another command will not be picked up during that invocation of poll().
Instead we need to wait for that invocation of poll() to return and be called
again on the next loop iteration.
This problem was observed using lvmlockd between successive invocations of lvs:
the poll() is entered before client_resume for the first lvs's 'unlock' command,
so when the next lvs command's 'acquire' command arrives, it must wait for the
poll() to complete and restart, so takes longer than necessary.
This is illustrated in the following sequence of events caused by two
consecutive invocations of lvs, which we pick up as the first lvs command is
nearing completion:
1. lvmlockd sends "unlock" command to sanlock daemon socket;
2. sanlock daemon dispatches this as "cmd_release" to a worker thread and
calls client_suspend;
3. sanlock daemon invokes poll() (not listening to the suspended lvmlockd
client);
4. sanlock worker thread finishes handling the "cmd_release" command, returns
the response on the socket, and calls client_resume;
5. the second lvs command is issued;
6. lvmlockd issues an "acquire" command on the same connection (but the daemon
isn't listening yet);
7. sanlock daemon's poll() returns after timing out after 1000 ms;
8. sanlock daemon's main loop executes poll() again, this time listening to
the lvmlockd client;
9. poll() returns immediately and receives the "acquire" command.
This patch makes client_resume interrupt the currently-executing poll() by
poking an internal eventfd on which poll() is listening in addition to the
non-suspended clients. This causes the current poll() to return and immediately
restart, this time listening on the resumed client's fd, ready to receive a new
command from the client.
Some performance measurements follow, demonstrating how this patch makes the
second command more responsive.
Before:
% time lvs >/dev/null; time lvs >/dev/null
real 0m0.051s
user 0m0.008s
sys 0m0.008s
real 0m0.880s
user 0m0.000s
sys 0m0.012s
After:
% time lvs >/dev/null; time lvs >/dev/null
real 0m0.039s
user 0m0.004s
sys 0m0.012s
real 0m0.036s
user 0m0.000s
sys 0m0.016s
Signed-off-by: Jonathan Davies <jonathan.davies(a)citrix.com>
---
src/main.c | 27 ++++++++++++++++++++++++---
src/sanlock_internal.h | 1 +
2 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/src/main.c b/src/main.c
index 7038e4e..a7a9016 100644
--- a/src/main.c
+++ b/src/main.c
@@ -35,6 +35,7 @@
#include <sys/utsname.h>
#include <sys/resource.h>
#include <uuid/uuid.h>
+#include <sys/eventfd.h>
#define EXTERN
#include "sanlock_internal.h"
@@ -190,8 +191,10 @@ static int client_alloc(void)
{
int i;
+ /* pollfd is one element longer as we use an additional element for the
+ * eventfd notification mechanism */
client = malloc(CLIENT_NALLOC * sizeof(struct client));
- pollfd = malloc(CLIENT_NALLOC * sizeof(struct pollfd));
+ pollfd = malloc((CLIENT_NALLOC+1) * sizeof(struct pollfd));
if (!client || !pollfd) {
log_error("can't alloc for client or pollfd array");
@@ -360,6 +363,9 @@ void client_resume(int ci)
/* make poll() watch this connection */
pollfd[ci].fd = cl->fd;
pollfd[ci].events = POLLIN;
+
+ /* interrupt any poll() that might already be running */
+ eventfd_write(efd, 1);
}
out:
pthread_mutex_unlock(&cl->mutex);
@@ -737,19 +743,29 @@ static int main_loop(void)
int i, rv, empty, check_all;
char *check_buf = NULL;
int check_buf_len = 0;
+ uint64_t ebuf;
gettimeofday(&last_check, NULL);
poll_timeout = STANDARD_CHECK_INTERVAL;
check_interval = STANDARD_CHECK_INTERVAL;
while (1) {
- rv = poll(pollfd, client_maxi + 1, poll_timeout);
+ /* as well as the clients, check the eventfd */
+ pollfd[client_maxi+1].fd = efd;
+ pollfd[client_maxi+1].events = POLLIN;
+
+ rv = poll(pollfd, client_maxi + 2, poll_timeout);
if (rv == -1 && errno == EINTR)
continue;
if (rv < 0) {
/* not sure */
}
- for (i = 0; i <= client_maxi; i++) {
+ for (i = 0; i <= client_maxi + 1; i++) {
+ if (pollfd[i].fd == efd && pollfd[i].revents & POLLIN) {
+ /* a client_resume completed */
+ eventfd_read(efd, &ebuf);
+ continue;
+ }
if (client[i].fd < 0)
continue;
if (pollfd[i].revents & POLLIN) {
@@ -1676,6 +1692,11 @@ static int do_daemon(void)
if (rv < 0)
goto out_threads;
+ /* initialize global eventfd for client_resume notification */
+ if ((efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK)) == -1)
+ log_error("couldn't create eventfd");
+ goto out_threads;
+
main_loop();
close_token_manager();
diff --git a/src/sanlock_internal.h b/src/sanlock_internal.h
index 9df82f3..0855eec 100644
--- a/src/sanlock_internal.h
+++ b/src/sanlock_internal.h
@@ -376,6 +376,7 @@ EXTERN int helper_kill_fd;
EXTERN int helper_status_fd;
EXTERN uint64_t helper_last_status;
EXTERN uint32_t helper_full_count;
+EXTERN int efd;
EXTERN struct list_head spaces;
EXTERN struct list_head spaces_rem;
--
1.9.1
8 years, 4 months
src/main.c src/sanlock_internal.h
by David Teigland
src/main.c | 28 +++++++++++++++++++++++++---
src/sanlock_internal.h | 1 +
2 files changed, 26 insertions(+), 3 deletions(-)
New commits:
commit 8e948fba24d4501bcf6c615e3c04e17499cf9b96
Author: Jonathan Davies <jonathan.davies(a)citrix.com>
Date: Fri Oct 30 16:16:08 2015 +0000
sanlock: process commands arriving during poll() promptly
If a command is issued to the sanlock daemon soon after the previous command
from the same client has completed, it might not be processed for up to 1
second. This scenario is commonplace during sequential operations issued through
lvmlockd, making them feel unusually sluggish.
The delay occurs when a command is issued by a client soon after that client has
been marked as 'resumed' and while the sanlock daemon is in the main loop
executing poll().
This is because the fds that poll() monitors are the fds of the non-suspended
clients. Since the duration of poll() is up to STANDARD_CHECK_INTERVAL
milliseconds, i.e. 1 second, any client that resumes during that period and
issues another command will not be picked up during that invocation of poll().
Instead we need to wait for that invocation of poll() to return and be called
again on the next loop iteration.
This problem was observed using lvmlockd between successive invocations of lvs:
the poll() is entered before client_resume for the first lvs's 'unlock' command,
so when the next lvs command's 'acquire' command arrives, it must wait for the
poll() to complete and restart, so takes longer than necessary.
This is illustrated in the following sequence of events caused by two
consecutive invocations of lvs, which we pick up as the first lvs command is
nearing completion:
1. lvmlockd sends "unlock" command to sanlock daemon socket;
2. sanlock daemon dispatches this as "cmd_release" to a worker thread and
calls client_suspend;
3. sanlock daemon invokes poll() (not listening to the suspended lvmlockd
client);
4. sanlock worker thread finishes handling the "cmd_release" command, returns
the response on the socket, and calls client_resume;
5. the second lvs command is issued;
6. lvmlockd issues an "acquire" command on the same connection (but the daemon
isn't listening yet);
7. sanlock daemon's poll() returns after timing out after 1000 ms;
8. sanlock daemon's main loop executes poll() again, this time listening to
the lvmlockd client;
9. poll() returns immediately and receives the "acquire" command.
This patch makes client_resume interrupt the currently-executing poll() by
poking an internal eventfd on which poll() is listening in addition to the
non-suspended clients. This causes the current poll() to return and immediately
restart, this time listening on the resumed client's fd, ready to receive a new
command from the client.
Some performance measurements follow, demonstrating how this patch makes the
second command more responsive.
Before:
% time lvs >/dev/null; time lvs >/dev/null
real 0m0.051s
user 0m0.008s
sys 0m0.008s
real 0m0.880s
user 0m0.000s
sys 0m0.012s
After:
% time lvs >/dev/null; time lvs >/dev/null
real 0m0.039s
user 0m0.004s
sys 0m0.012s
real 0m0.036s
user 0m0.000s
sys 0m0.016s
Signed-off-by: Jonathan Davies <jonathan.davies(a)citrix.com>
diff --git a/src/main.c b/src/main.c
index 7038e4e..c1d7e93 100644
--- a/src/main.c
+++ b/src/main.c
@@ -35,6 +35,7 @@
#include <sys/utsname.h>
#include <sys/resource.h>
#include <uuid/uuid.h>
+#include <sys/eventfd.h>
#define EXTERN
#include "sanlock_internal.h"
@@ -190,8 +191,10 @@ static int client_alloc(void)
{
int i;
+ /* pollfd is one element longer as we use an additional element for the
+ * eventfd notification mechanism */
client = malloc(CLIENT_NALLOC * sizeof(struct client));
- pollfd = malloc(CLIENT_NALLOC * sizeof(struct pollfd));
+ pollfd = malloc((CLIENT_NALLOC+1) * sizeof(struct pollfd));
if (!client || !pollfd) {
log_error("can't alloc for client or pollfd array");
@@ -360,6 +363,9 @@ void client_resume(int ci)
/* make poll() watch this connection */
pollfd[ci].fd = cl->fd;
pollfd[ci].events = POLLIN;
+
+ /* interrupt any poll() that might already be running */
+ eventfd_write(efd, 1);
}
out:
pthread_mutex_unlock(&cl->mutex);
@@ -737,19 +743,29 @@ static int main_loop(void)
int i, rv, empty, check_all;
char *check_buf = NULL;
int check_buf_len = 0;
+ uint64_t ebuf;
gettimeofday(&last_check, NULL);
poll_timeout = STANDARD_CHECK_INTERVAL;
check_interval = STANDARD_CHECK_INTERVAL;
while (1) {
- rv = poll(pollfd, client_maxi + 1, poll_timeout);
+ /* as well as the clients, check the eventfd */
+ pollfd[client_maxi+1].fd = efd;
+ pollfd[client_maxi+1].events = POLLIN;
+
+ rv = poll(pollfd, client_maxi + 2, poll_timeout);
if (rv == -1 && errno == EINTR)
continue;
if (rv < 0) {
/* not sure */
}
- for (i = 0; i <= client_maxi; i++) {
+ for (i = 0; i <= client_maxi + 1; i++) {
+ if (pollfd[i].fd == efd && pollfd[i].revents & POLLIN) {
+ /* a client_resume completed */
+ eventfd_read(efd, &ebuf);
+ continue;
+ }
if (client[i].fd < 0)
continue;
if (pollfd[i].revents & POLLIN) {
@@ -1676,6 +1692,12 @@ static int do_daemon(void)
if (rv < 0)
goto out_threads;
+ /* initialize global eventfd for client_resume notification */
+ if ((efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK)) == -1) {
+ log_error("couldn't create eventfd");
+ goto out_threads;
+ }
+
main_loop();
close_token_manager();
diff --git a/src/sanlock_internal.h b/src/sanlock_internal.h
index 9df82f3..0855eec 100644
--- a/src/sanlock_internal.h
+++ b/src/sanlock_internal.h
@@ -376,6 +376,7 @@ EXTERN int helper_kill_fd;
EXTERN int helper_status_fd;
EXTERN uint64_t helper_last_status;
EXTERN uint32_t helper_full_count;
+EXTERN int efd;
EXTERN struct list_head spaces;
EXTERN struct list_head spaces_rem;
8 years, 4 months
src/helper.c
by David Teigland
src/helper.c | 5 +++++
1 file changed, 5 insertions(+)
New commits:
commit c77c09506d1a09b9585879efb29460286321fdc0
Author: Nir Soffer <nsoffer(a)redhat.com>
Date: Thu Oct 15 10:41:17 2015 -0500
sanlock: clear helper process supplementary groups
The helper process inherits the sanlock main process groups, but it does
not need them. Clear the helper groups after fork.
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
diff --git a/src/helper.c b/src/helper.c
index 27898d5..0022092 100644
--- a/src/helper.c
+++ b/src/helper.c
@@ -25,6 +25,7 @@
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/prctl.h>
+#include <grp.h>
#include "sanlock.h"
#include "monotime.h"
@@ -159,6 +160,10 @@ int run_helper(int in_fd, int out_fd, int log_stderr)
sprintf(name, "%s", "sanlock-helper");
prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
+ rv = setgroups(0, NULL);
+ if (rv < 0)
+ log_debug("error clearing helper groups errno %i", errno);
+
memset(&pollfd, 0, sizeof(pollfd));
pollfd.fd = in_fd;
pollfd.events = POLLIN;
8 years, 5 months
[PATCH] Clear helper process supplementary groups
by Nir Soffer
The helper process inherits the sanlock main process groups, but it does
not need them. Clear the helper groups after fork.
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
src/helper.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/src/helper.c b/src/helper.c
index 27898d5..fffe00c 100644
--- a/src/helper.c
+++ b/src/helper.c
@@ -25,6 +25,7 @@
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/prctl.h>
+#include <grp.h>
#include "sanlock.h"
#include "monotime.h"
@@ -159,6 +160,11 @@ int run_helper(int in_fd, int out_fd, int log_stderr)
sprintf(name, "%s", "sanlock-helper");
prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
+ rv = setgroups(0, NULL);
+ if (rv < 0) {
+ log_debug("error clearing helper groups errno %i", errno);
+ }
+
memset(&pollfd, 0, sizeof(pollfd));
pollfd.fd = in_fd;
pollfd.events = POLLIN;
--
2.4.3
8 years, 5 months
src/cmd.c src/paxos_lease.c src/paxos_lease.h src/resource.c src/sanlock_resource.h src/sanlock_rv.h
by David Teigland
src/cmd.c | 6 ++++--
src/paxos_lease.c | 9 +++++++++
src/paxos_lease.h | 1 +
src/resource.c | 10 ++++++++--
src/sanlock_resource.h | 6 ++++++
src/sanlock_rv.h | 1 +
6 files changed, 29 insertions(+), 4 deletions(-)
New commits:
commit badc556ff4f0c60b7e82af25fe5d0f362cc41ccb
Author: David Teigland <teigland(a)redhat.com>
Date: Wed Oct 14 12:02:33 2015 -0500
sanlock: add acquire flag to avoid waiting
When a paxos lease is requested that was held by
a failed, but not yet expired host, the acquire
routine waits for the lease to expire before
returning. Add a flag to avoid waiting for the
expiration for programs that want to just retry
instead of block in sanlock_acquire().
diff --git a/src/cmd.c b/src/cmd.c
index 93387c0..64f7f00 100644
--- a/src/cmd.c
+++ b/src/cmd.c
@@ -126,6 +126,7 @@ static const char *acquire_error_str(int error)
case SANLK_ACQUIRE_IDLIVE:
case SANLK_ACQUIRE_OWNED:
case SANLK_ACQUIRE_OTHER:
+ case SANLK_ACQUIRE_OWNED_RETRY:
return "lease owned by other host";
case SANLK_ACQUIRE_SHRETRY:
@@ -181,8 +182,8 @@ static void cmd_acquire(struct task *task, struct cmd_args *ca)
new_tokens_count = ca->header.data;
- log_debug("cmd_acquire %d,%d,%d ci_in %d fd %d count %d",
- cl_ci, cl_fd, cl_pid, ca->ci_in, fd, new_tokens_count);
+ log_debug("cmd_acquire %d,%d,%d ci_in %d fd %d count %d flags %x",
+ cl_ci, cl_fd, cl_pid, ca->ci_in, fd, new_tokens_count, ca->header.cmd_flags);
if (new_tokens_count > SANLK_MAX_RESOURCES) {
log_error("cmd_acquire %d,%d,%d new %d max %d",
@@ -396,6 +397,7 @@ static void cmd_acquire(struct task *task, struct cmd_args *ca)
case SANLK_ACQUIRE_IDLIVE:
case SANLK_ACQUIRE_OWNED:
case SANLK_ACQUIRE_OTHER:
+ case SANLK_ACQUIRE_OWNED_RETRY:
lvl = com.quiet_fail ? LOG_DEBUG : LOG_ERR;
break;
default:
diff --git a/src/paxos_lease.c b/src/paxos_lease.c
index df60ee6..eae2fd9 100644
--- a/src/paxos_lease.c
+++ b/src/paxos_lease.c
@@ -1661,6 +1661,15 @@ int paxos_lease_acquire(struct task *task,
goto run;
}
+ if (flags & PAXOS_ACQUIRE_OWNER_NOWAIT) {
+ log_token(token, "paxos_acquire owner %llu %llu %llu no wait",
+ (unsigned long long)cur_leader.owner_id,
+ (unsigned long long)cur_leader.owner_generation,
+ (unsigned long long)cur_leader.timestamp);
+ error = SANLK_ACQUIRE_OWNED_RETRY;
+ goto out;
+ }
+
skip_live_check:
/* TODO: test with sleep(2) here */
sleep(1);
diff --git a/src/paxos_lease.h b/src/paxos_lease.h
index de5e095..d66a210 100644
--- a/src/paxos_lease.h
+++ b/src/paxos_lease.h
@@ -12,6 +12,7 @@
#define PAXOS_ACQUIRE_FORCE 0x00000001
#define PAXOS_ACQUIRE_QUIET_FAIL 0x00000002
#define PAXOS_ACQUIRE_SHARED 0x00000004
+#define PAXOS_ACQUIRE_OWNER_NOWAIT 0x00000008
uint32_t leader_checksum(struct leader_record *lr);
diff --git a/src/resource.c b/src/resource.c
index 15d88a9..b378f62 100644
--- a/src/resource.c
+++ b/src/resource.c
@@ -569,7 +569,7 @@ int res_get_lvb(struct sanlk_resource *res, char **lvb_out, int *lvblen)
static int acquire_disk(struct task *task, struct token *token,
uint64_t acquire_lver, int new_num_hosts,
- struct leader_record *leader)
+ int owner_nowait, struct leader_record *leader)
{
struct leader_record leader_tmp;
int rv;
@@ -581,6 +581,9 @@ static int acquire_disk(struct task *task, struct token *token,
if (token->acquire_flags & SANLK_RES_SHARED)
flags |= PAXOS_ACQUIRE_SHARED;
+ if (owner_nowait)
+ flags |= PAXOS_ACQUIRE_OWNER_NOWAIT;
+
memset(&leader_tmp, 0, sizeof(leader_tmp));
rv = paxos_lease_acquire(task, token, flags, &leader_tmp, acquire_lver,
@@ -1320,6 +1323,7 @@ int acquire_token(struct task *task, struct token *token, uint32_t cmd_flags,
int live_count = 0;
int allow_orphan = 0;
int only_orphan = 0;
+ int owner_nowait = 0;
int rv;
if (token->acquire_flags & SANLK_RES_LVER)
@@ -1331,6 +1335,8 @@ int acquire_token(struct task *task, struct token *token, uint32_t cmd_flags,
allow_orphan = 1;
if (cmd_flags & SANLK_ACQUIRE_ORPHAN_ONLY)
only_orphan = 1;
+ if (cmd_flags & SANLK_ACQUIRE_OWNER_NOWAIT)
+ owner_nowait = 1;
pthread_mutex_lock(&resource_mutex);
@@ -1493,7 +1499,7 @@ int acquire_token(struct task *task, struct token *token, uint32_t cmd_flags,
retry:
memset(&leader, 0, sizeof(struct leader_record));
- rv = acquire_disk(task, token, acquire_lver, new_num_hosts, &leader);
+ rv = acquire_disk(task, token, acquire_lver, new_num_hosts, owner_nowait, &leader);
if (rv == SANLK_ACQUIRE_IDLIVE || rv == SANLK_ACQUIRE_OWNED || rv == SANLK_ACQUIRE_OTHER) {
/*
diff --git a/src/sanlock_resource.h b/src/sanlock_resource.h
index 624a7d0..d412482 100644
--- a/src/sanlock_resource.h
+++ b/src/sanlock_resource.h
@@ -43,11 +43,17 @@
* If the lock already exists as an orphan,
* then acquire it. Otherwise, do not acquire
* a lock at all and return -ENOENT.
+ *
+ * SANLK_ACQUIRE_OWNER_NOWAIT
+ * If the lock cannot be granted immediately
+ * because the owner's lease needs to time out, do
+ * not wait, but return -SANLK_ACQUIRE_OWNED_RETRY.
*/
#define SANLK_ACQUIRE_LVB 0x00000001
#define SANLK_ACQUIRE_ORPHAN 0x00000002
#define SANLK_ACQUIRE_ORPHAN_ONLY 0x00000004
+#define SANLK_ACQUIRE_OWNER_NOWAIT 0x00000008
/*
* release flags
diff --git a/src/sanlock_rv.h b/src/sanlock_rv.h
index 686603a..bafef2b 100644
--- a/src/sanlock_rv.h
+++ b/src/sanlock_rv.h
@@ -47,6 +47,7 @@
#define SANLK_ACQUIRE_OWNED -244
#define SANLK_ACQUIRE_OTHER -245
#define SANLK_ACQUIRE_SHRETRY -246
+#define SANLK_ACQUIRE_OWNED_RETRY -247
#define SANLK_RELEASE_LVER -250
#define SANLK_RELEASE_OWNER -251
8 years, 5 months
python/setup.py
by David Teigland
python/setup.py | 1 +
1 file changed, 1 insertion(+)
New commits:
commit a1929080a6ce51879139eb8d05a425ccd3d37082
Author: David Teigland <teigland(a)redhat.com>
Date: Wed Oct 14 13:21:04 2015 -0500
python: add compile flags
diff --git a/python/setup.py b/python/setup.py
index efc7eff..91311a8 100644
--- a/python/setup.py
+++ b/python/setup.py
@@ -11,6 +11,7 @@ sanlock = Extension(name='sanlock',
sources=['sanlock.c'],
include_dirs=['../src'],
library_dirs=['../src'],
+ extra_link_args=['-fPIE', '-Wl,-z,relro,-z,now'],
libraries=sanlocklib)
version = None
8 years, 5 months
fence_sanlock/Makefile
by David Teigland
fence_sanlock/Makefile | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
New commits:
commit 05fb1a81a3218c9247dd96d7ea261c67f84700cf
Author: David Teigland <teigland(a)redhat.com>
Date: Wed Oct 14 12:57:52 2015 -0500
fence_sanlock: add flags for build
diff --git a/fence_sanlock/Makefile b/fence_sanlock/Makefile
index 6b460fe..6e2aa9b 100644
--- a/fence_sanlock/Makefile
+++ b/fence_sanlock/Makefile
@@ -32,13 +32,15 @@ CFLAGS += -D_GNU_SOURCE -g \
VER=$(shell cat ../VERSION)
CFLAGS += -DVERSION=\"$(VER)\" -I../src -I../wdmd
+CFLAGS += -fPIE -DPIE
-LDFLAGS = -lrt -laio -lblkid -lsanlock -lwdmd
+LDFLAGS = -Wl,-z,now -Wl,-z,relro -pie
+LDADD = -lrt -laio -lblkid -lsanlock -lwdmd
all: $(TARGET1) $(TARGET2)
$(TARGET1): $(SOURCE1)
- $(CC) $(CFLAGS) $(LDFLAGS) $(SOURCE1) -o $@ -L. -L../src -L../wdmd
+ $(CC) $(CFLAGS) $(LDFLAGS) $(SOURCE1) $(LDADD) -o $@ -L. -L../src -L../wdmd
$(TARGET2): $(SOURCE2)
cat $(SOURCE2) | sed \
8 years, 5 months
src/main.c
by David Teigland
src/main.c | 70 ++++++++++++-------------------------------------------------
1 file changed, 14 insertions(+), 56 deletions(-)
New commits:
commit fe871876d0c04b4abb67e7f1990bdd96e7556aa5
Author: Nir Soffer <nsoffer(a)redhat.com>
Date: Wed Oct 14 12:51:33 2015 -0500
sanlock: setup supplementary groups before daemonizing
Sanlock daemon used to setup the supplementary groups after daemonizing.
This may be slow on overloaded machines creating a window where sanlock
supplementary groups are empty, confusing vdsm-tool.
This patch set the supplementary groups before daemonizing, so programs
managing sanlock would wait until it the daemon is configured properly,
eliminating the race during startup. The child process inherits the
supplementary groups set by the parent.
The complex code for setting groups was replaced with a call to
initgroups().
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
diff --git a/src/main.c b/src/main.c
index 958b57b..7038e4e 100644
--- a/src/main.c
+++ b/src/main.c
@@ -1431,65 +1431,23 @@ static void setup_limits(void)
static void setup_groups(void)
{
- int rv, i, j, h;
- int pngroups, sngroups, ngroups_max;
- gid_t *pgroup, *sgroup;
+ int rv;
if (!com.uname || !com.gname)
return;
- ngroups_max = sysconf(_SC_NGROUPS_MAX);
- if (ngroups_max < 0) {
- log_error("cannot get the max number of groups %i", errno);
- return;
- }
-
- pgroup = malloc(ngroups_max * 2 * sizeof(gid_t));
- if (!pgroup) {
- log_error("cannot malloc the group list %i", errno);
- exit(EXIT_FAILURE);
- }
-
- pngroups = getgroups(ngroups_max, pgroup);
- if (pngroups < 0) {
- log_error("cannot get the process groups %i", errno);
- goto out;
- }
-
- sgroup = pgroup + ngroups_max;
- sngroups = ngroups_max;
-
- rv = getgrouplist(com.uname, com.gid, sgroup, &sngroups);
+ rv = initgroups(com.uname, com.gid);
if (rv < 0) {
- log_error("cannot get the user %s groups %i", com.uname, errno);
- goto out;
+ log_error("error initializing groups errno %i", errno);
}
+}
- for (i = 0, j = pngroups; i < sngroups; i++) {
- if (j >= ngroups_max) {
- log_error("too many groups for the user %s", com.uname);
- break;
- }
-
- /* check if the groups is already present in the list */
- for (h = 0; h < j; h++) {
- if (pgroup[h] == sgroup[i]) {
- goto skip_gid;
- }
- }
-
- pgroup[j] = sgroup[i];
- j++;
-
- skip_gid:
- ; /* skipping the gid because it's already present */
- }
+static void setup_uid_gid(void)
+{
+ int rv;
- rv = setgroups(j, pgroup);
- if (rv < 0) {
- log_error("cannot set the user %s groups %i", com.uname, errno);
- goto out;
- }
+ if (!com.uname || !com.gname)
+ return;
rv = setgid(com.gid);
if (rv < 0) {
@@ -1509,9 +1467,6 @@ static void setup_groups(void)
if (rv < 0) {
log_error("cannot set dumpable process errno %i", errno);
}
-
- out:
- free(pgroup);
}
static void setup_signals(void)
@@ -1660,9 +1615,12 @@ static int do_daemon(void)
{
int fd, rv;
- /* TODO: copy comprehensive daemonization method from libvirtd */
+
+ /* This can take a while so do it before forking. */
+ setup_groups();
if (!com.debug) {
+ /* TODO: copy comprehensive daemonization method from libvirtd */
if (daemon(0, 0) < 0) {
log_tool("cannot fork daemon\n");
exit(EXIT_FAILURE);
@@ -1699,7 +1657,7 @@ static int do_daemon(void)
setup_host_name();
- setup_groups();
+ setup_uid_gid();
log_level(0, 0, NULL, LOG_WARNING, "sanlock daemon started %s host %s",
VERSION, our_host_name_global);
8 years, 5 months
[PATCH] Setup supplementary groups before daemonizing
by Nir Soffer
This is a better and simpler version of "Collect supplementary groups before
daemonizing" I sent earlier today.
Sanlock daemon used to setup the supplementary groups after daemonizing.
This may be slow on overloaded machines creating a window where sanlock
supplementary groups are empty, confusing vdsm-tool.
This patch set the supplementary groups before daemonizing, so programs
managing sanlock would wait until it the daemon is configured properly,
eliminating the race during startup. The child process inherits the
supplementary groups set by the parent.
The complex code for setting groups was replaced with a call to
initgroups().
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
Please find the patch on github (git send-email trouble);
https://github.com/nirs/sanlock/commit/db6f765a3f8ee4c22c818be90a765ac8be...
Regards,
Nir
8 years, 5 months
[PATCH] Collect supplementary groups before daemonizing
by Nir Soffer
Sanlock daemon used to collect and set the supplementary groups after
daemonizing. Collecting the groups may be slow on overloaded machines
creating a window where sanlock supplementary groups are not empty. This
was confusing vdsm-tool while it tried to configure sanlock.
This patch collect the supplementary groups before daemonizing, so
programs managing sanlock would wait until it collected the necessary
information for the daemon, minimizing the window where the daemon is
running is not configured properly yet.
We ignore now the current process groups since we are interested only in
the effective user supplementary groups. This change also drop the
complex code needed to eliminate duplicates between current process
groups and user supplementary groups.
This change also fix the log when getgrouplist() fails; according to the
manual, this means the number of groups is bigger than the requested
size.
git send-email had trouble sending the patch to the list; please check
the patch on github:
https://github.com/nirs/sanlock/commit/4c314e5b6e176e95ef60033184089c29d9...
Regards,
Nir
8 years, 5 months