October 2015 - sanlock-devel - Fedora Mailing-Lists

[PATCH] sanlock: process commands arriving during poll() promptly

by Jonathan Davies

If a command is issued to the sanlock daemon soon after the previous command from the same client has completed, it might not be processed for up to 1 second. This scenario is commonplace during sequential operations issued through lvmlockd, making them feel unusually sluggish. The delay occurs when a command is issued by a client soon after that client has been marked as 'resumed' and while the sanlock daemon is in the main loop executing poll(). This is because the fds that poll() monitors are the fds of the non-suspended clients. Since the duration of poll() is up to STANDARD_CHECK_INTERVAL milliseconds, i.e. 1 second, any client that resumes during that period and issues another command will not be picked up during that invocation of poll(). Instead we need to wait for that invocation of poll() to return and be called again on the next loop iteration. This problem was observed using lvmlockd between successive invocations of lvs: the poll() is entered before client_resume for the first lvs's 'unlock' command, so when the next lvs command's 'acquire' command arrives, it must wait for the poll() to complete and restart, so takes longer than necessary. This is illustrated in the following sequence of events caused by two consecutive invocations of lvs, which we pick up as the first lvs command is nearing completion: 1. lvmlockd sends "unlock" command to sanlock daemon socket; 2. sanlock daemon dispatches this as "cmd_release" to a worker thread and calls client_suspend; 3. sanlock daemon invokes poll() (not listening to the suspended lvmlockd client); 4. sanlock worker thread finishes handling the "cmd_release" command, returns the response on the socket, and calls client_resume; 5. the second lvs command is issued; 6. lvmlockd issues an "acquire" command on the same connection (but the daemon isn't listening yet); 7. sanlock daemon's poll() returns after timing out after 1000 ms; 8. sanlock daemon's main loop executes poll() again, this time listening to the lvmlockd client; 9. poll() returns immediately and receives the "acquire" command. This patch makes client_resume interrupt the currently-executing poll() by poking an internal eventfd on which poll() is listening in addition to the non-suspended clients. This causes the current poll() to return and immediately restart, this time listening on the resumed client's fd, ready to receive a new command from the client. Some performance measurements follow, demonstrating how this patch makes the second command more responsive. Before: % time lvs >/dev/null; time lvs >/dev/null real 0m0.051s user 0m0.008s sys 0m0.008s real 0m0.880s user 0m0.000s sys 0m0.012s After: % time lvs >/dev/null; time lvs >/dev/null real 0m0.039s user 0m0.004s sys 0m0.012s real 0m0.036s user 0m0.000s sys 0m0.016s Signed-off-by: Jonathan Davies <jonathan.davies(a)citrix.com> --- src/main.c | 27 ++++++++++++++++++++++++--- src/sanlock_internal.h | 1 + 2 files changed, 25 insertions(+), 3 deletions(-) diff --git a/src/main.c b/src/main.c index 7038e4e..a7a9016 100644 --- a/src/main.c +++ b/src/main.c @@ -35,6 +35,7 @@ #include <sys/utsname.h> #include <sys/resource.h> #include <uuid/uuid.h> +#include <sys/eventfd.h> #define EXTERN #include "sanlock_internal.h" @@ -190,8 +191,10 @@ static int client_alloc(void) { int i; + /* pollfd is one element longer as we use an additional element for the + * eventfd notification mechanism */ client = malloc(CLIENT_NALLOC * sizeof(struct client)); - pollfd = malloc(CLIENT_NALLOC * sizeof(struct pollfd)); + pollfd = malloc((CLIENT_NALLOC+1) * sizeof(struct pollfd)); if (!client || !pollfd) { log_error("can't alloc for client or pollfd array"); @@ -360,6 +363,9 @@ void client_resume(int ci) /* make poll() watch this connection */ pollfd[ci].fd = cl->fd; pollfd[ci].events = POLLIN; + + /* interrupt any poll() that might already be running */ + eventfd_write(efd, 1); } out: pthread_mutex_unlock(&cl->mutex); @@ -737,19 +743,29 @@ static int main_loop(void) int i, rv, empty, check_all; char *check_buf = NULL; int check_buf_len = 0; + uint64_t ebuf; gettimeofday(&last_check, NULL); poll_timeout = STANDARD_CHECK_INTERVAL; check_interval = STANDARD_CHECK_INTERVAL; while (1) { - rv = poll(pollfd, client_maxi + 1, poll_timeout); + /* as well as the clients, check the eventfd */ + pollfd[client_maxi+1].fd = efd; + pollfd[client_maxi+1].events = POLLIN; + + rv = poll(pollfd, client_maxi + 2, poll_timeout); if (rv == -1 && errno == EINTR) continue; if (rv < 0) { /* not sure */ } - for (i = 0; i <= client_maxi; i++) { + for (i = 0; i <= client_maxi + 1; i++) { + if (pollfd[i].fd == efd && pollfd[i].revents & POLLIN) { + /* a client_resume completed */ + eventfd_read(efd, &ebuf); + continue; + } if (client[i].fd < 0) continue; if (pollfd[i].revents & POLLIN) { @@ -1676,6 +1692,11 @@ static int do_daemon(void) if (rv < 0) goto out_threads; + /* initialize global eventfd for client_resume notification */ + if ((efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK)) == -1) + log_error("couldn't create eventfd"); + goto out_threads; + main_loop(); close_token_manager(); diff --git a/src/sanlock_internal.h b/src/sanlock_internal.h index 9df82f3..0855eec 100644 --- a/src/sanlock_internal.h +++ b/src/sanlock_internal.h @@ -376,6 +376,7 @@ EXTERN int helper_kill_fd; EXTERN int helper_status_fd; EXTERN uint64_t helper_last_status; EXTERN uint32_t helper_full_count; +EXTERN int efd; EXTERN struct list_head spaces; EXTERN struct list_head spaces_rem; -- 1.9.1

8 years, 6 months

2
1
0 / 0

src/main.c src/sanlock_internal.h

by David Teigland

src/main.c | 28 +++++++++++++++++++++++++--- src/sanlock_internal.h | 1 + 2 files changed, 26 insertions(+), 3 deletions(-) New commits: commit 8e948fba24d4501bcf6c615e3c04e17499cf9b96 Author: Jonathan Davies <jonathan.davies(a)citrix.com> Date: Fri Oct 30 16:16:08 2015 +0000 sanlock: process commands arriving during poll() promptly If a command is issued to the sanlock daemon soon after the previous command from the same client has completed, it might not be processed for up to 1 second. This scenario is commonplace during sequential operations issued through lvmlockd, making them feel unusually sluggish. The delay occurs when a command is issued by a client soon after that client has been marked as 'resumed' and while the sanlock daemon is in the main loop executing poll(). This is because the fds that poll() monitors are the fds of the non-suspended clients. Since the duration of poll() is up to STANDARD_CHECK_INTERVAL milliseconds, i.e. 1 second, any client that resumes during that period and issues another command will not be picked up during that invocation of poll(). Instead we need to wait for that invocation of poll() to return and be called again on the next loop iteration. This problem was observed using lvmlockd between successive invocations of lvs: the poll() is entered before client_resume for the first lvs's 'unlock' command, so when the next lvs command's 'acquire' command arrives, it must wait for the poll() to complete and restart, so takes longer than necessary. This is illustrated in the following sequence of events caused by two consecutive invocations of lvs, which we pick up as the first lvs command is nearing completion: 1. lvmlockd sends "unlock" command to sanlock daemon socket; 2. sanlock daemon dispatches this as "cmd_release" to a worker thread and calls client_suspend; 3. sanlock daemon invokes poll() (not listening to the suspended lvmlockd client); 4. sanlock worker thread finishes handling the "cmd_release" command, returns the response on the socket, and calls client_resume; 5. the second lvs command is issued; 6. lvmlockd issues an "acquire" command on the same connection (but the daemon isn't listening yet); 7. sanlock daemon's poll() returns after timing out after 1000 ms; 8. sanlock daemon's main loop executes poll() again, this time listening to the lvmlockd client; 9. poll() returns immediately and receives the "acquire" command. This patch makes client_resume interrupt the currently-executing poll() by poking an internal eventfd on which poll() is listening in addition to the non-suspended clients. This causes the current poll() to return and immediately restart, this time listening on the resumed client's fd, ready to receive a new command from the client. Some performance measurements follow, demonstrating how this patch makes the second command more responsive. Before: % time lvs >/dev/null; time lvs >/dev/null real 0m0.051s user 0m0.008s sys 0m0.008s real 0m0.880s user 0m0.000s sys 0m0.012s After: % time lvs >/dev/null; time lvs >/dev/null real 0m0.039s user 0m0.004s sys 0m0.012s real 0m0.036s user 0m0.000s sys 0m0.016s Signed-off-by: Jonathan Davies <jonathan.davies(a)citrix.com> diff --git a/src/main.c b/src/main.c index 7038e4e..c1d7e93 100644 --- a/src/main.c +++ b/src/main.c @@ -35,6 +35,7 @@ #include <sys/utsname.h> #include <sys/resource.h> #include <uuid/uuid.h> +#include <sys/eventfd.h> #define EXTERN #include "sanlock_internal.h" @@ -190,8 +191,10 @@ static int client_alloc(void) { int i; + /* pollfd is one element longer as we use an additional element for the + * eventfd notification mechanism */ client = malloc(CLIENT_NALLOC * sizeof(struct client)); - pollfd = malloc(CLIENT_NALLOC * sizeof(struct pollfd)); + pollfd = malloc((CLIENT_NALLOC+1) * sizeof(struct pollfd)); if (!client || !pollfd) { log_error("can't alloc for client or pollfd array"); @@ -360,6 +363,9 @@ void client_resume(int ci) /* make poll() watch this connection */ pollfd[ci].fd = cl->fd; pollfd[ci].events = POLLIN; + + /* interrupt any poll() that might already be running */ + eventfd_write(efd, 1); } out: pthread_mutex_unlock(&cl->mutex); @@ -737,19 +743,29 @@ static int main_loop(void) int i, rv, empty, check_all; char *check_buf = NULL; int check_buf_len = 0; + uint64_t ebuf; gettimeofday(&last_check, NULL); poll_timeout = STANDARD_CHECK_INTERVAL; check_interval = STANDARD_CHECK_INTERVAL; while (1) { - rv = poll(pollfd, client_maxi + 1, poll_timeout); + /* as well as the clients, check the eventfd */ + pollfd[client_maxi+1].fd = efd; + pollfd[client_maxi+1].events = POLLIN; + + rv = poll(pollfd, client_maxi + 2, poll_timeout); if (rv == -1 && errno == EINTR) continue; if (rv < 0) { /* not sure */ } - for (i = 0; i <= client_maxi; i++) { + for (i = 0; i <= client_maxi + 1; i++) { + if (pollfd[i].fd == efd && pollfd[i].revents & POLLIN) { + /* a client_resume completed */ + eventfd_read(efd, &ebuf); + continue; + } if (client[i].fd < 0) continue; if (pollfd[i].revents & POLLIN) { @@ -1676,6 +1692,12 @@ static int do_daemon(void) if (rv < 0) goto out_threads; + /* initialize global eventfd for client_resume notification */ + if ((efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK)) == -1) { + log_error("couldn't create eventfd"); + goto out_threads; + } + main_loop(); close_token_manager(); diff --git a/src/sanlock_internal.h b/src/sanlock_internal.h index 9df82f3..0855eec 100644 --- a/src/sanlock_internal.h +++ b/src/sanlock_internal.h @@ -376,6 +376,7 @@ EXTERN int helper_kill_fd; EXTERN int helper_status_fd; EXTERN uint64_t helper_last_status; EXTERN uint32_t helper_full_count; +EXTERN int efd; EXTERN struct list_head spaces; EXTERN struct list_head spaces_rem;

8 years, 6 months

1
0
0 / 0

src/helper.c

by David Teigland

src/helper.c | 5 +++++ 1 file changed, 5 insertions(+) New commits: commit c77c09506d1a09b9585879efb29460286321fdc0 Author: Nir Soffer <nsoffer(a)redhat.com> Date: Thu Oct 15 10:41:17 2015 -0500 sanlock: clear helper process supplementary groups The helper process inherits the sanlock main process groups, but it does not need them. Clear the helper groups after fork. Signed-off-by: Nir Soffer <nsoffer(a)redhat.com> diff --git a/src/helper.c b/src/helper.c index 27898d5..0022092 100644 --- a/src/helper.c +++ b/src/helper.c @@ -25,6 +25,7 @@ #include <sys/types.h> #include <sys/wait.h> #include <sys/prctl.h> +#include <grp.h> #include "sanlock.h" #include "monotime.h" @@ -159,6 +160,10 @@ int run_helper(int in_fd, int out_fd, int log_stderr) sprintf(name, "%s", "sanlock-helper"); prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0); + rv = setgroups(0, NULL); + if (rv < 0) + log_debug("error clearing helper groups errno %i", errno); + memset(&pollfd, 0, sizeof(pollfd)); pollfd.fd = in_fd; pollfd.events = POLLIN;

8 years, 6 months

1
0
0 / 0

[PATCH] Clear helper process supplementary groups

by Nir Soffer

The helper process inherits the sanlock main process groups, but it does not need them. Clear the helper groups after fork. Signed-off-by: Nir Soffer <nsoffer(a)redhat.com> --- src/helper.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/src/helper.c b/src/helper.c index 27898d5..fffe00c 100644 --- a/src/helper.c +++ b/src/helper.c @@ -25,6 +25,7 @@ #include <sys/types.h> #include <sys/wait.h> #include <sys/prctl.h> +#include <grp.h> #include "sanlock.h" #include "monotime.h" @@ -159,6 +160,11 @@ int run_helper(int in_fd, int out_fd, int log_stderr) sprintf(name, "%s", "sanlock-helper"); prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0); + rv = setgroups(0, NULL); + if (rv < 0) { + log_debug("error clearing helper groups errno %i", errno); + } + memset(&pollfd, 0, sizeof(pollfd)); pollfd.fd = in_fd; pollfd.events = POLLIN; -- 2.4.3

8 years, 6 months

1
0
0 / 0

src/cmd.c src/paxos_lease.c src/paxos_lease.h src/resource.c src/sanlock_resource.h src/sanlock_rv.h

by David Teigland

src/cmd.c | 6 ++++-- src/paxos_lease.c | 9 +++++++++ src/paxos_lease.h | 1 + src/resource.c | 10 ++++++++-- src/sanlock_resource.h | 6 ++++++ src/sanlock_rv.h | 1 + 6 files changed, 29 insertions(+), 4 deletions(-) New commits: commit badc556ff4f0c60b7e82af25fe5d0f362cc41ccb Author: David Teigland <teigland(a)redhat.com> Date: Wed Oct 14 12:02:33 2015 -0500 sanlock: add acquire flag to avoid waiting When a paxos lease is requested that was held by a failed, but not yet expired host, the acquire routine waits for the lease to expire before returning. Add a flag to avoid waiting for the expiration for programs that want to just retry instead of block in sanlock_acquire(). diff --git a/src/cmd.c b/src/cmd.c index 93387c0..64f7f00 100644 --- a/src/cmd.c +++ b/src/cmd.c @@ -126,6 +126,7 @@ static const char *acquire_error_str(int error) case SANLK_ACQUIRE_IDLIVE: case SANLK_ACQUIRE_OWNED: case SANLK_ACQUIRE_OTHER: + case SANLK_ACQUIRE_OWNED_RETRY: return "lease owned by other host"; case SANLK_ACQUIRE_SHRETRY: @@ -181,8 +182,8 @@ static void cmd_acquire(struct task *task, struct cmd_args *ca) new_tokens_count = ca->header.data; - log_debug("cmd_acquire %d,%d,%d ci_in %d fd %d count %d", - cl_ci, cl_fd, cl_pid, ca->ci_in, fd, new_tokens_count); + log_debug("cmd_acquire %d,%d,%d ci_in %d fd %d count %d flags %x", + cl_ci, cl_fd, cl_pid, ca->ci_in, fd, new_tokens_count, ca->header.cmd_flags); if (new_tokens_count > SANLK_MAX_RESOURCES) { log_error("cmd_acquire %d,%d,%d new %d max %d", @@ -396,6 +397,7 @@ static void cmd_acquire(struct task *task, struct cmd_args *ca) case SANLK_ACQUIRE_IDLIVE: case SANLK_ACQUIRE_OWNED: case SANLK_ACQUIRE_OTHER: + case SANLK_ACQUIRE_OWNED_RETRY: lvl = com.quiet_fail ? LOG_DEBUG : LOG_ERR; break; default: diff --git a/src/paxos_lease.c b/src/paxos_lease.c index df60ee6..eae2fd9 100644 --- a/src/paxos_lease.c +++ b/src/paxos_lease.c @@ -1661,6 +1661,15 @@ int paxos_lease_acquire(struct task *task, goto run; } + if (flags & PAXOS_ACQUIRE_OWNER_NOWAIT) { + log_token(token, "paxos_acquire owner %llu %llu %llu no wait", + (unsigned long long)cur_leader.owner_id, + (unsigned long long)cur_leader.owner_generation, + (unsigned long long)cur_leader.timestamp); + error = SANLK_ACQUIRE_OWNED_RETRY; + goto out; + } + skip_live_check: /* TODO: test with sleep(2) here */ sleep(1); diff --git a/src/paxos_lease.h b/src/paxos_lease.h index de5e095..d66a210 100644 --- a/src/paxos_lease.h +++ b/src/paxos_lease.h @@ -12,6 +12,7 @@ #define PAXOS_ACQUIRE_FORCE 0x00000001 #define PAXOS_ACQUIRE_QUIET_FAIL 0x00000002 #define PAXOS_ACQUIRE_SHARED 0x00000004 +#define PAXOS_ACQUIRE_OWNER_NOWAIT 0x00000008 uint32_t leader_checksum(struct leader_record *lr); diff --git a/src/resource.c b/src/resource.c index 15d88a9..b378f62 100644 --- a/src/resource.c +++ b/src/resource.c @@ -569,7 +569,7 @@ int res_get_lvb(struct sanlk_resource *res, char **lvb_out, int *lvblen) static int acquire_disk(struct task *task, struct token *token, uint64_t acquire_lver, int new_num_hosts, - struct leader_record *leader) + int owner_nowait, struct leader_record *leader) { struct leader_record leader_tmp; int rv; @@ -581,6 +581,9 @@ static int acquire_disk(struct task *task, struct token *token, if (token->acquire_flags & SANLK_RES_SHARED) flags |= PAXOS_ACQUIRE_SHARED; + if (owner_nowait) + flags |= PAXOS_ACQUIRE_OWNER_NOWAIT; + memset(&leader_tmp, 0, sizeof(leader_tmp)); rv = paxos_lease_acquire(task, token, flags, &leader_tmp, acquire_lver, @@ -1320,6 +1323,7 @@ int acquire_token(struct task *task, struct token *token, uint32_t cmd_flags, int live_count = 0; int allow_orphan = 0; int only_orphan = 0; + int owner_nowait = 0; int rv; if (token->acquire_flags & SANLK_RES_LVER) @@ -1331,6 +1335,8 @@ int acquire_token(struct task *task, struct token *token, uint32_t cmd_flags, allow_orphan = 1; if (cmd_flags & SANLK_ACQUIRE_ORPHAN_ONLY) only_orphan = 1; + if (cmd_flags & SANLK_ACQUIRE_OWNER_NOWAIT) + owner_nowait = 1; pthread_mutex_lock(&resource_mutex); @@ -1493,7 +1499,7 @@ int acquire_token(struct task *task, struct token *token, uint32_t cmd_flags, retry: memset(&leader, 0, sizeof(struct leader_record)); - rv = acquire_disk(task, token, acquire_lver, new_num_hosts, &leader); + rv = acquire_disk(task, token, acquire_lver, new_num_hosts, owner_nowait, &leader); if (rv == SANLK_ACQUIRE_IDLIVE || rv == SANLK_ACQUIRE_OWNED || rv == SANLK_ACQUIRE_OTHER) { /* diff --git a/src/sanlock_resource.h b/src/sanlock_resource.h index 624a7d0..d412482 100644 --- a/src/sanlock_resource.h +++ b/src/sanlock_resource.h @@ -43,11 +43,17 @@ * If the lock already exists as an orphan, * then acquire it. Otherwise, do not acquire * a lock at all and return -ENOENT. + * + * SANLK_ACQUIRE_OWNER_NOWAIT + * If the lock cannot be granted immediately + * because the owner's lease needs to time out, do + * not wait, but return -SANLK_ACQUIRE_OWNED_RETRY. */ #define SANLK_ACQUIRE_LVB 0x00000001 #define SANLK_ACQUIRE_ORPHAN 0x00000002 #define SANLK_ACQUIRE_ORPHAN_ONLY 0x00000004 +#define SANLK_ACQUIRE_OWNER_NOWAIT 0x00000008 /* * release flags diff --git a/src/sanlock_rv.h b/src/sanlock_rv.h index 686603a..bafef2b 100644 --- a/src/sanlock_rv.h +++ b/src/sanlock_rv.h @@ -47,6 +47,7 @@ #define SANLK_ACQUIRE_OWNED -244 #define SANLK_ACQUIRE_OTHER -245 #define SANLK_ACQUIRE_SHRETRY -246 +#define SANLK_ACQUIRE_OWNED_RETRY -247 #define SANLK_RELEASE_LVER -250 #define SANLK_RELEASE_OWNER -251

8 years, 6 months

1
0
0 / 0

python/setup.py

by David Teigland

python/setup.py | 1 + 1 file changed, 1 insertion(+) New commits: commit a1929080a6ce51879139eb8d05a425ccd3d37082 Author: David Teigland <teigland(a)redhat.com> Date: Wed Oct 14 13:21:04 2015 -0500 python: add compile flags diff --git a/python/setup.py b/python/setup.py index efc7eff..91311a8 100644 --- a/python/setup.py +++ b/python/setup.py @@ -11,6 +11,7 @@ sanlock = Extension(name='sanlock', sources=['sanlock.c'], include_dirs=['../src'], library_dirs=['../src'], + extra_link_args=['-fPIE', '-Wl,-z,relro,-z,now'], libraries=sanlocklib) version = None

8 years, 6 months

1
0
0 / 0

fence_sanlock/Makefile

by David Teigland

fence_sanlock/Makefile | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) New commits: commit 05fb1a81a3218c9247dd96d7ea261c67f84700cf Author: David Teigland <teigland(a)redhat.com> Date: Wed Oct 14 12:57:52 2015 -0500 fence_sanlock: add flags for build diff --git a/fence_sanlock/Makefile b/fence_sanlock/Makefile index 6b460fe..6e2aa9b 100644 --- a/fence_sanlock/Makefile +++ b/fence_sanlock/Makefile @@ -32,13 +32,15 @@ CFLAGS += -D_GNU_SOURCE -g \ VER=$(shell cat ../VERSION) CFLAGS += -DVERSION=\"$(VER)\" -I../src -I../wdmd +CFLAGS += -fPIE -DPIE -LDFLAGS = -lrt -laio -lblkid -lsanlock -lwdmd +LDFLAGS = -Wl,-z,now -Wl,-z,relro -pie +LDADD = -lrt -laio -lblkid -lsanlock -lwdmd all: $(TARGET1) $(TARGET2) $(TARGET1): $(SOURCE1) - $(CC) $(CFLAGS) $(LDFLAGS) $(SOURCE1) -o $@ -L. -L../src -L../wdmd + $(CC) $(CFLAGS) $(LDFLAGS) $(SOURCE1) $(LDADD) -o $@ -L. -L../src -L../wdmd $(TARGET2): $(SOURCE2) cat $(SOURCE2) | sed \

8 years, 6 months

1
0
0 / 0

src/main.c

by David Teigland

src/main.c | 70 ++++++++++++------------------------------------------------- 1 file changed, 14 insertions(+), 56 deletions(-) New commits: commit fe871876d0c04b4abb67e7f1990bdd96e7556aa5 Author: Nir Soffer <nsoffer(a)redhat.com> Date: Wed Oct 14 12:51:33 2015 -0500 sanlock: setup supplementary groups before daemonizing Sanlock daemon used to setup the supplementary groups after daemonizing. This may be slow on overloaded machines creating a window where sanlock supplementary groups are empty, confusing vdsm-tool. This patch set the supplementary groups before daemonizing, so programs managing sanlock would wait until it the daemon is configured properly, eliminating the race during startup. The child process inherits the supplementary groups set by the parent. The complex code for setting groups was replaced with a call to initgroups(). Signed-off-by: Nir Soffer <nsoffer(a)redhat.com> diff --git a/src/main.c b/src/main.c index 958b57b..7038e4e 100644 --- a/src/main.c +++ b/src/main.c @@ -1431,65 +1431,23 @@ static void setup_limits(void) static void setup_groups(void) { - int rv, i, j, h; - int pngroups, sngroups, ngroups_max; - gid_t *pgroup, *sgroup; + int rv; if (!com.uname || !com.gname) return; - ngroups_max = sysconf(_SC_NGROUPS_MAX); - if (ngroups_max < 0) { - log_error("cannot get the max number of groups %i", errno); - return; - } - - pgroup = malloc(ngroups_max * 2 * sizeof(gid_t)); - if (!pgroup) { - log_error("cannot malloc the group list %i", errno); - exit(EXIT_FAILURE); - } - - pngroups = getgroups(ngroups_max, pgroup); - if (pngroups < 0) { - log_error("cannot get the process groups %i", errno); - goto out; - } - - sgroup = pgroup + ngroups_max; - sngroups = ngroups_max; - - rv = getgrouplist(com.uname, com.gid, sgroup, &sngroups); + rv = initgroups(com.uname, com.gid); if (rv < 0) { - log_error("cannot get the user %s groups %i", com.uname, errno); - goto out; + log_error("error initializing groups errno %i", errno); } +} - for (i = 0, j = pngroups; i < sngroups; i++) { - if (j >= ngroups_max) { - log_error("too many groups for the user %s", com.uname); - break; - } - - /* check if the groups is already present in the list */ - for (h = 0; h < j; h++) { - if (pgroup[h] == sgroup[i]) { - goto skip_gid; - } - } - - pgroup[j] = sgroup[i]; - j++; - - skip_gid: - ; /* skipping the gid because it's already present */ - } +static void setup_uid_gid(void) +{ + int rv; - rv = setgroups(j, pgroup); - if (rv < 0) { - log_error("cannot set the user %s groups %i", com.uname, errno); - goto out; - } + if (!com.uname || !com.gname) + return; rv = setgid(com.gid); if (rv < 0) { @@ -1509,9 +1467,6 @@ static void setup_groups(void) if (rv < 0) { log_error("cannot set dumpable process errno %i", errno); } - - out: - free(pgroup); } static void setup_signals(void) @@ -1660,9 +1615,12 @@ static int do_daemon(void) { int fd, rv; - /* TODO: copy comprehensive daemonization method from libvirtd */ + + /* This can take a while so do it before forking. */ + setup_groups(); if (!com.debug) { + /* TODO: copy comprehensive daemonization method from libvirtd */ if (daemon(0, 0) < 0) { log_tool("cannot fork daemon\n"); exit(EXIT_FAILURE); @@ -1699,7 +1657,7 @@ static int do_daemon(void) setup_host_name(); - setup_groups(); + setup_uid_gid(); log_level(0, 0, NULL, LOG_WARNING, "sanlock daemon started %s host %s", VERSION, our_host_name_global);

8 years, 6 months

1
0
0 / 0

[PATCH] Setup supplementary groups before daemonizing

by Nir Soffer

This is a better and simpler version of "Collect supplementary groups before daemonizing" I sent earlier today. Sanlock daemon used to setup the supplementary groups after daemonizing. This may be slow on overloaded machines creating a window where sanlock supplementary groups are empty, confusing vdsm-tool. This patch set the supplementary groups before daemonizing, so programs managing sanlock would wait until it the daemon is configured properly, eliminating the race during startup. The child process inherits the supplementary groups set by the parent. The complex code for setting groups was replaced with a call to initgroups(). Signed-off-by: Nir Soffer <nsoffer(a)redhat.com> Please find the patch on github (git send-email trouble); https://github.com/nirs/sanlock/commit/db6f765a3f8ee4c22c818be90a765ac8be... Regards, Nir

8 years, 6 months

1
0
0 / 0

[PATCH] Collect supplementary groups before daemonizing

by Nir Soffer

Sanlock daemon used to collect and set the supplementary groups after daemonizing. Collecting the groups may be slow on overloaded machines creating a window where sanlock supplementary groups are not empty. This was confusing vdsm-tool while it tried to configure sanlock. This patch collect the supplementary groups before daemonizing, so programs managing sanlock would wait until it collected the necessary information for the daemon, minimizing the window where the daemon is running is not configured properly yet. We ignore now the current process groups since we are interested only in the effective user supplementary groups. This change also drop the complex code needed to eliminate duplicates between current process groups and user supplementary groups. This change also fix the log when getgrouplist() fails; according to the manual, this means the number of groups is bigger than the requested size. git send-email had trouble sending the patch to the list; please check the patch on github: https://github.com/nirs/sanlock/commit/4c314e5b6e176e95ef60033184089c29d9... Regards, Nir

8 years, 6 months

1
0
0 / 0

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

sanlock-devel October 2015