[irqbalance: 1/2] Make irqbalance scan for new irqs when it detects new irqs (bz832815)
Petr Holasek
pholasek at fedoraproject.org
Thu Aug 23 13:31:25 UTC 2012
commit 914ea2898002ce9760a95d023ecd4bfa02527af3
Author: Petr Holasek <pholasek at redhat.com>
Date: Thu Aug 23 15:27:51 2012 +0200
Make irqbalance scan for new irqs when it detects new irqs (bz832815)
- Fixes SIGFPE crash for some banning configuration (bz849792)
- Fixes affinity_hint values processing (bz832815)
- Adds banirq and banscript options (bz837049)
- imake isn't needed for building any more (bz844359)
- Fixes clogging of syslog (bz837646)
- Added IRQBALANCE_ARGS variable for passing arguments via systemd (bz837048)
- Fixes --hintpolicy=subset behavior (bz844381)
0001-Add-sample-irqbalance-environment-file.patch | 74 +++++++
0002-introduce-banirq-option.patch | 172 +++++++++++++++
...ANCE_BANNED_CPUS-is-set-proc-stat-is-not-.patch | 44 ++++
...ance-scan-for-new-irqs-when-it-detects-ne.patch | 91 ++++++++
0005-Add-banscript-option.patch | 218 ++++++++++++++++++++
...cpu-powersave-code-disabled-when-power_th.patch | 41 ++++
...ity-hint-also-if-the-current-policy-is-su.patch | 103 +++++++++
...ed-check-for-avoidance-of-division-by-zer.patch | 31 +++
irqbalance-scan-for-new-irqs.patch | 90 ++++++++
irqbalance.spec | 48 +++--
10 files changed, 896 insertions(+), 16 deletions(-)
---
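Note on the banscript option added by 0005-Add-banscript-option.patch: the
patch runs a user-supplied script once per discovered irq, and a zero exit
means "balance as usual" while a non-zero exit means "ban this irq". The
following is a minimal sketch of that exit-code contract, not irqbalance code.
The helper name script_requests_ban and the fixed-size command buffer are
assumptions made for illustration, and the WIFEXITED check is an addition here
(the patch inspects WEXITSTATUS directly).

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper, not part of irqbalance: returns 1 when the
 * script asks for the irq to be banned (non-zero exit), 0 otherwise.
 */
static int script_requests_ban(const char *script, const char *devpath, int irq)
{
	char cmd[4096];
	int len, rc;

	/* Same argument order as the patch: script, sysfs path, irq number. */
	len = snprintf(cmd, sizeof(cmd), "%s %s %d", script, devpath, irq);
	if (len < 0 || (size_t)len >= sizeof(cmd))
		return 0;	/* command did not fit: leave the irq balanced */

	rc = system(cmd);
	if (rc == -1)
		return 0;	/* system() itself failed: leave the irq balanced */

	/* Assumption in this sketch: only a normal, non-zero exit bans the irq. */
	return WIFEXITED(rc) && WEXITSTATUS(rc) != 0;
}
```

Under this contract, a script that always exits 0 leaves every irq under
irqbalance's control, and one that always exits non-zero bans every irq it is
asked about.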
diff --git a/0001-Add-sample-irqbalance-environment-file.patch b/0001-Add-sample-irqbalance-environment-file.patch
new file mode 100644
index 0000000..ec6f25e
--- /dev/null
+++ b/0001-Add-sample-irqbalance-environment-file.patch
@@ -0,0 +1,74 @@
+From 626dded557de1e7b90cb847df9e900d40be5af1a Mon Sep 17 00:00:00 2001
+From: Neil Horman <nhorman at tuxdriver.com>
+Date: Wed, 14 Dec 2011 07:09:07 -0500
+Subject: [PATCH 1/8] Add sample irqbalance environment file
+
+It was pointed out that the example systemd unit file pointed to a corresponding
+environment file that had no sample. Fix that up, and modify the unit file to
+pass available options via environment variables rather than command line options
+since that looks a little cleaner.
+
+Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
+
+add irqbalance args variable to env file
+
+Allow users to pass general arguments to irqbalance through systemd
+
+Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
+---
+ misc/irqbalance.env | 26 ++++++++++++++++++++++++++
+ misc/irqbalance.service | 5 ++---
+ 2 files changed, 28 insertions(+), 3 deletions(-)
+ create mode 100644 misc/irqbalance.env
+
+diff --git a/misc/irqbalance.env b/misc/irqbalance.env
+new file mode 100644
+index 0000000..bd87e3d
+--- /dev/null
++++ b/misc/irqbalance.env
+@@ -0,0 +1,26 @@
++# irqbalance is a daemon process that distributes interrupts across
++# CPUS on SMP systems. The default is to rebalance once every 10
++# seconds. This is the environment file that is specified to systemd via the
++# EnvironmentFile key in the service unit file (or via whatever method the init
++# system you're using has.
++#
++# ONESHOT=yes
++# after starting, wait for a minute, then look at the interrupt
++# load and balance it once; after balancing exit and do not change
++# it again.
++#IRQBALANCE_ONESHOT=
++
++#
++# IRQBALANCE_BANNED_CPUS
++# 64 bit bitmask which allows you to indicate which cpu's should
++# be skipped when reblancing irqs. Cpu numbers which have their
++# corresponding bits set to one in this mask will not have any
++# irq's assigned to them on rebalance
++#
++#IRQBALANCE_BANNED_CPUS=
++
++#
++# IRQBALANCE_ARGS
++# append any args here to the irqbalance daemon as documented in the man page
++#
++#IRQBALANCE_ARGS=
+diff --git a/misc/irqbalance.service b/misc/irqbalance.service
+index f349616..aae2b03 100644
+--- a/misc/irqbalance.service
++++ b/misc/irqbalance.service
+@@ -3,9 +3,8 @@ Description=irqbalance daemon
+ After=syslog.target
+
+ [Service]
+-EnvironmentFile=/etc/sysconfig/irqbalance
+-Type=forking
+-ExecStart=/usr/sbin/irqbalance $ONESHOT
++EnvironmentFile=/path/to/irqbalance.env
++ExecStart=/usr/sbin/irqbalance $IRQBALANCE_ARGS
+
+ [Install]
+ WantedBy=multi-user.target
+--
+1.7.11.4
+
diff --git a/0002-introduce-banirq-option.patch b/0002-introduce-banirq-option.patch
new file mode 100644
index 0000000..137de84
--- /dev/null
+++ b/0002-introduce-banirq-option.patch
@@ -0,0 +1,172 @@
+From 4da232bbf763e535ec2512087aa9ac8a96fba3d9 Mon Sep 17 00:00:00 2001
+From: Neil Horman <nhorman at tuxdriver.com>
+Date: Fri, 17 Feb 2012 14:27:11 -0500
+Subject: [PATCH 2/8] introduce banirq option
+
+Fixing bug http://code.google.com/p/irqbalance/issues/detail?id=25
+
+It was pointed out that during the rewrite of irqbalance I inadvertently removed
+the support for the IRQBALANCE_BANNED_IRQS environment variable. While going to
+return it to the build, it occurred to me that, given the availability of msi[x]
+irqs, a single system can literally have thousands of interrupt sources, making
+the environment variable a non-scalable solution. Instead I'm adding a new
+option, banirq, which takes its place. It lets you build a list of irqs that
+you want irqbalance to leave alone.
+
+Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
+---
+ classify.c | 32 ++++++++++++++++++++++++++++++++
+ irqbalance.1 | 11 +++++++----
+ irqbalance.c | 15 ++++++++++++---
+ irqbalance.h | 1 +
+ 4 files changed, 52 insertions(+), 7 deletions(-)
+
+diff --git a/classify.c b/classify.c
+index 124dab0..d59da7f 100644
+--- a/classify.c
++++ b/classify.c
+@@ -52,6 +52,7 @@ static short class_codes[MAX_CLASS] = {
+ };
+
+ static GList *interrupts_db;
++static GList *banned_irqs;
+
+ #define SYSDEV_DIR "/sys/bus/pci/devices"
+
+@@ -63,6 +64,30 @@ static gint compare_ints(gconstpointer a, gconstpointer b)
+ return ai->irq - bi->irq;
+ }
+
++void add_banned_irq(int irq)
++{
++ struct irq_info find, *new;
++ GList *entry;
++
++ find.irq = irq;
++ entry = g_list_find_custom(banned_irqs, &find, compare_ints);
++ if (entry)
++ return;
++
++ new = calloc(sizeof(struct irq_info), 1);
++ if (!new) {
++ if (debug_mode)
++ printf("No memory to ban irq %d\n", irq);
++ return;
++ }
++
++ new->irq = irq;
++
++ banned_irqs = g_list_append(banned_irqs, new);
++ return;
++}
++
++
+ /*
+ * Inserts an irq_info struct into the intterupts_db list
+ * devpath points to the device directory in sysfs for the
+@@ -90,6 +115,13 @@ static struct irq_info *add_one_irq_to_db(const char *devpath, int irq)
+ return NULL;
+ }
+
++ entry = g_list_find_custom(banned_irqs, &find, compare_ints);
++ if (entry) {
++ if (debug_mode)
++ printf("SKIPPING BANNED IRQ %d\n", irq);
++ return NULL;
++ }
++
+ new = calloc(sizeof(struct irq_info), 1);
+ if (!new)
+ return NULL;
+diff --git a/irqbalance.1 b/irqbalance.1
+index 55fc15f..978c7c1 100644
+--- a/irqbalance.1
++++ b/irqbalance.1
+@@ -62,6 +62,13 @@ average cpu softirq workload, and no cpus are more than 1 standard deviation
+ above (and have more than 1 irq assigned to them), attempt to place 1 cpu in
+ powersave mode. In powersave mode, a cpu will not have any irqs balanced to it,
+ in an effort to prevent that cpu from waking up without need.
++
++.TP
++.B --banirq=<irqnum>
++Add the specified irq list to the set of banned irqs. irqbalance will not affect
++the affinity of any irqs on the banned list, allowing them to be specified
++manually. This option is addative and can be specified multiple times
++
+ .SH "ENVIRONMENT VARIABLES"
+ .TP
+ .B IRQBALANCE_ONESHOT
+@@ -75,10 +82,6 @@ Same as --debug
+ .B IRQBALANCE_BANNED_CPUS
+ Provides a mask of cpus which irqbalance should ignore and never assign interrupts to
+
+-.TP
+-.B IRQBALANCE_BANNED_INTERRUPTS
+-A list of space delimited IRQ numbers that irqbalance should not touch
+-
+ .SH "Homepage"
+ http://code.google.com/p/irqbalance
+
+diff --git a/irqbalance.c b/irqbalance.c
+index 99c5db7..c613e2b 100644
+--- a/irqbalance.c
++++ b/irqbalance.c
+@@ -72,7 +72,7 @@ struct option lopts[] = {
+ static void usage(void)
+ {
+ printf("irqbalance [--oneshot | -o] [--debug | -d] [--hintpolicy= | -h [exact|subset|ignore]]\n");
+- printf(" [--powerthresh= | -p <off> | <n>]\n");
++ printf(" [--powerthresh= | -p <off> | <n>] [--banirq= | -i <n>]\n");
+ }
+
+ static void parse_command_line(int argc, char **argv)
+@@ -81,7 +81,7 @@ static void parse_command_line(int argc, char **argv)
+ int longind;
+
+ while ((opt = getopt_long(argc, argv,
+- "odh:p:",
++ "odh:p:b:",
+ lopts, &longind)) != -1) {
+
+ switch(opt) {
+@@ -103,6 +103,14 @@ static void parse_command_line(int argc, char **argv)
+ exit(1);
+ }
+ break;
++ case 'i':
++ val = strtoull(optarg, NULL, 10);
++ if (val == ULONG_MAX) {
++ usage();
++ exit(1);
++ }
++ add_banned_irq((int)val);
++ break;
+ case 'p':
+ if (!strncmp(optarg, "off", strlen(optarg)))
+ power_thresh = ULONG_MAX;
+@@ -179,8 +187,9 @@ int main(int argc, char** argv)
+ #ifdef HAVE_GETOPT_LONG
+ parse_command_line(argc, argv);
+ #else
+- if (argc>1 && strstr(argv[1],"--debug"))
++ if (argc>1 && strstr(argv[1],"--debug")) {
+ debug_mode=1;
++ }
+ if (argc>1 && strstr(argv[1],"--oneshot"))
+ one_shot_mode=1;
+ #endif
+diff --git a/irqbalance.h b/irqbalance.h
+index 4e85325..956aa8c 100644
+--- a/irqbalance.h
++++ b/irqbalance.h
+@@ -103,6 +103,7 @@ extern int get_cpu_count(void);
+ */
+ extern void rebuild_irq_db(void);
+ extern void free_irq_db(void);
++extern void add_banned_irq(int irq);
+ extern void for_each_irq(GList *list, void (*cb)(struct irq_info *info, void *data), void *data);
+ extern struct irq_info *get_irq_info(int irq);
+ extern void migrate_irq(GList **from, GList **to, struct irq_info *info);
+--
+1.7.11.4
+
diff --git a/0003-When-IRQBALANCE_BANNED_CPUS-is-set-proc-stat-is-not-.patch b/0003-When-IRQBALANCE_BANNED_CPUS-is-set-proc-stat-is-not-.patch
new file mode 100644
index 0000000..3eac789
--- /dev/null
+++ b/0003-When-IRQBALANCE_BANNED_CPUS-is-set-proc-stat-is-not-.patch
@@ -0,0 +1,44 @@
+From 718561bc79c095909f0c9d3fb2f0c1c163478b1e Mon Sep 17 00:00:00 2001
+From: Petr Holasek <pholasek at redhat.com>
+Date: Mon, 20 Feb 2012 16:59:05 +0100
+Subject: [PATCH 3/8] When IRQBALANCE_BANNED_CPUS is set, /proc/stat is not
+ parsed properly.
+
+parse_proc_stat() counts all the cpus in /proc/stat, but compares that number to
+the value in get_cpu_count(), which returns the number of cpus actively being
+balanced. Since that value doesn't include banned cpus, it's incorrect. Since
+we don't want to measure the load on banned cpus anyway, just skip those lines
+so cpucount doesn't increment and the count remains equal.
+
+Signed-off-by: Petr Holasek <pholasek at redhat.com>
+Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
+---
+ procinterrupts.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/procinterrupts.c b/procinterrupts.c
+index 4d3b07b..c032caf 100644
+--- a/procinterrupts.c
++++ b/procinterrupts.c
+@@ -32,6 +32,8 @@
+
+ #define LINESIZE 4096
+
++extern cpumask_t banned_cpus;
++
+ static int proc_int_has_msi = 0;
+ static int msi_found_in_sysfs = 0;
+
+@@ -217,6 +219,9 @@ void parse_proc_stat(void)
+
+ cpunr = strtoul(&line[3], NULL, 10);
+
++ if (cpu_isset(cpunr, banned_cpus))
++ continue;
++
+ rc = sscanf(line, "%*s %*d %*d %*d %*d %*d %d %d", &irq_load, &softirq_load);
+ if (rc < 2)
+ break;
+--
+1.7.11.4
+
diff --git a/0004-Make-irqbalance-scan-for-new-irqs-when-it-detects-ne.patch b/0004-Make-irqbalance-scan-for-new-irqs-when-it-detects-ne.patch
new file mode 100644
index 0000000..045892e
--- /dev/null
+++ b/0004-Make-irqbalance-scan-for-new-irqs-when-it-detects-ne.patch
@@ -0,0 +1,91 @@
+From 0edc531b0a2ebb41eb5cf49168e2897640cba0ec Mon Sep 17 00:00:00 2001
+From: Neil Horman <nhorman at tuxdriver.com>
+Date: Mon, 2 Jul 2012 13:27:14 -0400
+Subject: [PATCH 4/8] Make irqbalance scan for new irqs when it detects new
+ irqs
+
+Like cpu hotplug, irqbalance needs to rebuild its topo map and irq db when it
+detects new irqs in the system. This patch adds that ability
+
+Resolves: http://code.google.com/p/irqbalance/issues/detail?id=32
+
+Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
+---
+ irqbalance.c | 6 +++---
+ irqbalance.h | 2 +-
+ procinterrupts.c | 14 ++++++++++++--
+ 3 files changed, 16 insertions(+), 6 deletions(-)
+
+diff --git a/irqbalance.c b/irqbalance.c
+index c613e2b..5d40321 100644
+--- a/irqbalance.c
++++ b/irqbalance.c
+@@ -40,7 +40,7 @@ volatile int keep_going = 1;
+ int one_shot_mode;
+ int debug_mode;
+ int numa_avail;
+-int need_cpu_rescan;
++int need_rescan;
+ extern cpumask_t banned_cpus;
+ enum hp_e hint_policy = HINT_POLICY_SUBSET;
+ unsigned long power_thresh = ULONG_MAX;
+@@ -256,8 +256,8 @@ int main(int argc, char** argv)
+ parse_proc_stat();
+
+ /* cope with cpu hotplug -- detected during /proc/interrupts parsing */
+- if (need_cpu_rescan) {
+- need_cpu_rescan = 0;
++ if (need_rescan) {
++ need_rescan = 0;
+ /* if there's a hotplug event we better turn off power mode for a bit until things settle */
+ power_mode = 0;
+ if (debug_mode)
+diff --git a/irqbalance.h b/irqbalance.h
+index 956aa8c..043bfe6 100644
+--- a/irqbalance.h
++++ b/irqbalance.h
+@@ -64,7 +64,7 @@ enum hp_e {
+ extern int debug_mode;
+ extern int one_shot_mode;
+ extern int power_mode;
+-extern int need_cpu_rescan;
++extern int need_rescan;
+ extern enum hp_e hint_policy;
+ extern unsigned long long cycle_count;
+ extern unsigned long power_thresh;
+diff --git a/procinterrupts.c b/procinterrupts.c
+index c032caf..4559b16 100644
+--- a/procinterrupts.c
++++ b/procinterrupts.c
+@@ -82,8 +82,18 @@ void parse_proc_interrupts(void)
+ c++;
+ number = strtoul(line, NULL, 10);
+ info = get_irq_info(number);
+- if (!info)
++ if (!info) {
++ /*
++ * If this is our 0th pass through this routine
++ * this is an irq that wasn't reported in sysfs
++ * and we should just add it. If we've been running
++ * a while then this irq just appeared and its time
++ * to rescan our irqs
++ */
++ if (cycle_count)
++ need_rescan = 1;
+ info = add_misc_irq(number);
++ }
+
+ count = 0;
+ cpunr = 0;
+@@ -99,7 +109,7 @@ void parse_proc_interrupts(void)
+ cpunr++;
+ }
+ if (cpunr != core_count)
+- need_cpu_rescan = 1;
++ need_rescan = 1;
+
+ info->last_irq_count = info->irq_count;
+ info->irq_count = count;
+--
+1.7.11.4
+
diff --git a/0005-Add-banscript-option.patch b/0005-Add-banscript-option.patch
new file mode 100644
index 0000000..bf02d1e
--- /dev/null
+++ b/0005-Add-banscript-option.patch
@@ -0,0 +1,218 @@
+From b18eb8f6b28cc9b0816be0fb8fe3468c9f64f345 Mon Sep 17 00:00:00 2001
+From: Neil Horman <nhorman at tuxdriver.com>
+Date: Thu, 5 Jul 2012 14:54:35 -0400
+Subject: [PATCH 5/8] Add banscript option
+
+It's been requested in several different ways that irqbalance have a more robust
+mechanism for setting balancing policy at run time. While I don't feel it's
+appropriate to have irqbalance be able to implement arbitrary balance policy
+(having a flexible mechanism to define which irqs should be placed where can
+become exceedingly complex), I do think we need some mechanism that easily
+allows users to dynamically exclude irqs from the irqbalance policy at run time.
+The banscript option does exactly this. It allows the user to point irqbalance
+toward an executable file that is run once for each irq discovered, passing the
+sysfs path of the device and an irq vector as arguments. A zero exit code tells
+irqbalance to manage the irq as it normally would, while a non-zero exit tells
+irqbalance to ignore the interrupt entirely. This provides administrators a code
+point with which to exclude irqs dynamically based on any programmatic
+information available, and to manage those irqs independently, either via
+another irqbalance-like program, or via static affinity setting.
+
+Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
+
+Resolves: http://code.google.com/p/irqbalance/issues/detail?id=33
+---
+ classify.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
+ irqbalance.1 | 11 +++++++++++
+ irqbalance.c | 25 +++++++++++++++++++++----
+ irqbalance.h | 1 +
+ 4 files changed, 79 insertions(+), 4 deletions(-)
+
+diff --git a/classify.c b/classify.c
+index d59da7f..750d946 100644
+--- a/classify.c
++++ b/classify.c
+@@ -207,6 +207,43 @@ out:
+ return new;
+ }
+
++static int check_for_irq_ban(char *path, int irq)
++{
++ char *cmd;
++ int rc;
++
++ if (!banscript)
++ return 0;
++
++ cmd = alloca(strlen(path)+strlen(banscript)+32);
++ if (!cmd)
++ return 0;
++
++ sprintf(cmd, "%s %s %d",banscript, path, irq);
++ rc = system(cmd);
++
++ /*
++ * The system command itself failed
++ */
++ if (rc == -1) {
++ if (debug_mode)
++ printf("%s failed, please check the --banscript option\n", cmd);
++ else
++ syslog(LOG_INFO, "%s failed, please check the --banscript option\n", cmd);
++ return 0;
++ }
++
++ if (WEXITSTATUS(rc)) {
++ if (debug_mode)
++ printf("irq %d is baned by %s\n", irq, banscript);
++ else
++ syslog(LOG_INFO, "irq %d is baned by %s\n", irq, banscript);
++ return 1;
++ }
++ return 0;
++
++}
++
+ /*
+ * Figures out which interrupt(s) relate to the device we're looking at in dirname
+ */
+@@ -231,6 +268,10 @@ static void build_one_dev_entry(const char *dirname)
+ irqnum = strtol(entry->d_name, NULL, 10);
+ if (irqnum) {
+ sprintf(path, "%s/%s", SYSDEV_DIR, dirname);
++ if (check_for_irq_ban(path, irqnum)) {
++ add_banned_irq(irqnum);
++ continue;
++ }
+ new = add_one_irq_to_db(path, irqnum);
+ if (!new)
+ continue;
+@@ -253,6 +294,11 @@ static void build_one_dev_entry(const char *dirname)
+ */
+ if (irqnum) {
+ sprintf(path, "%s/%s", SYSDEV_DIR, dirname);
++ if (check_for_irq_ban(path, irqnum)) {
++ add_banned_irq(irqnum);
++ goto done;
++ }
++
+ new = add_one_irq_to_db(path, irqnum);
+ if (!new)
+ goto done;
+diff --git a/irqbalance.1 b/irqbalance.1
+index 978c7c1..63b0e26 100644
+--- a/irqbalance.1
++++ b/irqbalance.1
+@@ -69,6 +69,17 @@ Add the specified irq list to the set of banned irqs. irqbalance will not affect
+ the affinity of any irqs on the banned list, allowing them to be specified
+ manually. This option is addative and can be specified multiple times
+
++.TP
++.B --banscript=<script>
++Execute the specified script for each irq that is discovered, passing the sysfs
++path to the associated device as the first argument, and the irq vector as the
++second. An exit value of 0 tells irqbalance that this interrupt should balanced
++and managed as a normal irq, while a non-zero exit code indicates this irq
++should be ignored by irqbalance completely (see --banirq above). Use of this
++script provides users the ability to dynamically select which irqs get exluded
++from balancing, and provides an opportunity for manual affinity setting in one
++single code point.
++
+ .SH "ENVIRONMENT VARIABLES"
+ .TP
+ .B IRQBALANCE_ONESHOT
+diff --git a/irqbalance.c b/irqbalance.c
+index 5d40321..0184f0f 100644
+--- a/irqbalance.c
++++ b/irqbalance.c
+@@ -1,5 +1,6 @@
+ /*
+ * Copyright (C) 2006, Intel Corporation
++ * Copyright (C) 2012, Neil Horman <nhorman at tuxdriver.com>
+ *
+ * This file is part of irqbalance
+ *
+@@ -45,6 +46,7 @@ extern cpumask_t banned_cpus;
+ enum hp_e hint_policy = HINT_POLICY_SUBSET;
+ unsigned long power_thresh = ULONG_MAX;
+ unsigned long long cycle_count = 0;
++char *banscript = NULL;
+
+ void sleep_approx(int seconds)
+ {
+@@ -66,6 +68,8 @@ struct option lopts[] = {
+ {"debug", 0, NULL, 'd'},
+ {"hintpolicy", 1, NULL, 'h'},
+ {"powerthresh", 1, NULL, 'p'},
++ {"banirq", 1 , NULL, 'i'},
++ {"banscript", 1, NULL, 'b'},
+ {0, 0, 0, 0}
+ };
+
+@@ -79,9 +83,10 @@ static void parse_command_line(int argc, char **argv)
+ {
+ int opt;
+ int longind;
++ unsigned long val;
+
+ while ((opt = getopt_long(argc, argv,
+- "odh:p:b:",
++ "odh:i:p:b:",
+ lopts, &longind)) != -1) {
+
+ switch(opt) {
+@@ -193,6 +198,12 @@ int main(int argc, char** argv)
+ if (argc>1 && strstr(argv[1],"--oneshot"))
+ one_shot_mode=1;
+ #endif
++
++ /*
++ * Open the syslog connection
++ */
++ openlog(argv[0], 0, LOG_DAEMON);
++
+ if (getenv("IRQBALANCE_BANNED_CPUS")) {
+ cpumask_parse_user(getenv("IRQBALANCE_BANNED_CPUS"), strlen(getenv("IRQBALANCE_BANNED_CPUS")), banned_cpus);
+ }
+@@ -221,8 +232,16 @@ int main(int argc, char** argv)
+
+
+ /* On single core UP systems irqbalance obviously has no work to do */
+- if (core_count<2)
++ if (core_count<2) {
++ char *msg = "Balancing is ineffective on systems with a "
++ "single cache domain. Shutting down\n";
++
++ if (debug_mode)
++ printf("%s", msg);
++ else
++ syslog(LOG_INFO, "%s", msg);
+ exit(EXIT_SUCCESS);
++ }
+ /* On dual core/hyperthreading shared cache systems just do a one shot setup */
+ if (cache_domain_count==1)
+ one_shot_mode = 1;
+@@ -231,8 +250,6 @@ int main(int argc, char** argv)
+ if (daemon(0,0))
+ exit(EXIT_FAILURE);
+
+- openlog(argv[0], 0, LOG_DAEMON);
+-
+ #ifdef HAVE_LIBCAP_NG
+ // Drop capabilities
+ capng_clear(CAPNG_SELECT_BOTH);
+diff --git a/irqbalance.h b/irqbalance.h
+index 043bfe6..425e0dd 100644
+--- a/irqbalance.h
++++ b/irqbalance.h
+@@ -68,6 +68,7 @@ extern int need_rescan;
+ extern enum hp_e hint_policy;
+ extern unsigned long long cycle_count;
+ extern unsigned long power_thresh;
++extern char *banscript;
+
+ /*
+ * Numa node access routines
+--
+1.7.11.4
+
diff --git a/0006-irqbalance-cpu-powersave-code-disabled-when-power_th.patch b/0006-irqbalance-cpu-powersave-code-disabled-when-power_th.patch
new file mode 100644
index 0000000..8e00a63
--- /dev/null
+++ b/0006-irqbalance-cpu-powersave-code-disabled-when-power_th.patch
@@ -0,0 +1,41 @@
+From ab5ee2928b75f12a2340afe6778a106886509b4c Mon Sep 17 00:00:00 2001
+From: Petr Holasek <pholasek at redhat.com>
+Date: Thu, 12 Jul 2012 14:54:16 +0200
+Subject: [PATCH 6/8] irqbalance: cpu powersave code disabled when
+ power_thresh is not set
+
+When the user doesn't set the power_thresh argument, no cpu can enter powersave
+mode. This patch should stop syslog from being clogged with a pointless message
+about re-enabling all cpus for irq balancing.
+
+Signed-off-by: Petr Holasek <pholasek at redhat.com>
+Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
+---
+ irqlist.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/irqlist.c b/irqlist.c
+index c29ee84..e03aa7b 100644
+--- a/irqlist.c
++++ b/irqlist.c
+@@ -112,7 +112,7 @@ static void migrate_overloaded_irqs(struct topo_obj *obj, void *data)
+ if (obj->load <= info->avg_load) {
+ if ((obj->load + info->std_deviation) <= info->avg_load) {
+ info->num_under++;
+- if (!info->powersave)
++ if (power_thresh != ULONG_MAX && !info->powersave)
+ if (!obj->powersave_mode)
+ info->powersave = obj;
+ } else
+@@ -172,7 +172,7 @@ void update_migration_status(void)
+ {
+ struct load_balance_info info;
+ find_overloaded_objs(cpus, info);
+- if (cycle_count > 5) {
++ if (power_thresh != ULONG_MAX && cycle_count > 5) {
+ if (!info.num_over && (info.num_under >= power_thresh) && info.powersave) {
+ syslog(LOG_INFO, "cpu %d entering powersave mode\n", info.powersave->number);
+ info.powersave->powersave_mode = 1;
+--
+1.7.11.4
+
diff --git a/0007-apply-affinity-hint-also-if-the-current-policy-is-su.patch b/0007-apply-affinity-hint-also-if-the-current-policy-is-su.patch
new file mode 100644
index 0000000..66618c3
--- /dev/null
+++ b/0007-apply-affinity-hint-also-if-the-current-policy-is-su.patch
@@ -0,0 +1,103 @@
+From 7475c3e26d14bb210eb3524396adef77021e696f Mon Sep 17 00:00:00 2001
+From: Paolo Bonzini <pbonzini at redhat.com>
+Date: Tue, 7 Aug 2012 02:54:34 -0400
+Subject: [PATCH 7/8] apply affinity hint also if the current policy is subset
+
+--hintpolicy=subset chooses an object that has a non-empty intersection
+with the affinity hint, but it never restricts the object's CPU mask
+with the hint itself. As a result, there is no guarantee that the
+object's CPU mask is a subset of the hint.
+
+This is visible for interrupts whose balancing policy is not BALANCE_CORE.
+For example, if there is only one cache domain and the interrupt's policy
+is BALANCE_CACHE, the chosen object will correspond to "all CPUs" and
+the affinity hint will be effectively ignored.
+
+Signed-off-by: Paolo Bonzini <pbonzini at redhat.com>
+Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
+---
+ activate.c | 45 ++++++++++++++++++++++++++++++++++++++-------
+ 1 file changed, 38 insertions(+), 7 deletions(-)
+
+diff --git a/activate.c b/activate.c
+index 292c44a..97d84a8 100644
+--- a/activate.c
++++ b/activate.c
+@@ -1,5 +1,6 @@
+ /*
+ * Copyright (C) 2006, Intel Corporation
++ * Copyright (C) 2012, Neil Horman <nhorman at tuxdriver.com>
+ *
+ * This file is part of irqbalance
+ *
+@@ -31,17 +32,53 @@
+
+ #include "irqbalance.h"
+
++static int check_affinity(struct irq_info *info, cpumask_t applied_mask)
++{
++ cpumask_t current_mask;
++ char buf[PATH_MAX];
++ char *line = NULL;
++ size_t size = 0;
++ FILE *file;
++
++ sprintf(buf, "/proc/irq/%i/smp_affinity", info->irq);
++ file = fopen(buf, "r");
++ if (!file)
++ return 1;
++ if (getline(&line, &size, file)==0) {
++ free(line);
++ fclose(file);
++ return 1;
++ }
++ cpumask_parse_user(line, strlen(line), current_mask);
++ fclose(file);
++ free(line);
++
++ return cpus_equal(applied_mask, current_mask);
++}
+
+ static void activate_mapping(struct irq_info *info, void *data __attribute__((unused)))
+ {
+ char buf[PATH_MAX];
+ FILE *file;
+ cpumask_t applied_mask;
++ int valid_mask = 0;
++
++ if ((hint_policy == HINT_POLICY_EXACT) &&
++ (!cpus_empty(info->affinity_hint))) {
++ applied_mask = info->affinity_hint;
++ valid_mask = 1;
++ } else if (info->assigned_obj) {
++ applied_mask = info->assigned_obj->mask;
++ valid_mask = 1;
++ if ((hint_policy == HINT_POLICY_SUBSET) &&
++ (!cpus_empty(info->affinity_hint)))
++ cpus_and(applied_mask, applied_mask, info->affinity_hint);
++ }
+
+ /*
+ * only activate mappings for irqs that have moved
+ */
+- if (!info->moved)
++ if (!info->moved && (!valid_mask || check_affinity(info, applied_mask)))
+ return;
+
+ if (!info->assigned_obj)
+@@ -53,12 +90,6 @@ static void activate_mapping(struct irq_info *info, void *data __attribute__((un
+ if (!file)
+ return;
+
+- if ((hint_policy == HINT_POLICY_EXACT) &&
+- (!cpus_empty(info->affinity_hint)))
+- applied_mask = info->affinity_hint;
+- else
+- applied_mask = info->assigned_obj->mask;
+-
+ cpumask_scnprintf(buf, PATH_MAX, applied_mask);
+ fprintf(file, "%s", buf);
+ fclose(file);
+--
+1.7.11.4
+
diff --git a/0008-irqlist-added-check-for-avoidance-of-division-by-zer.patch b/0008-irqlist-added-check-for-avoidance-of-division-by-zer.patch
new file mode 100644
index 0000000..aab34ea
--- /dev/null
+++ b/0008-irqlist-added-check-for-avoidance-of-division-by-zer.patch
@@ -0,0 +1,31 @@
+From 8285d9a1cac9cf74130ae71df0ddb4ed14122544 Mon Sep 17 00:00:00 2001
+From: Petr Holasek <pholasek at redhat.com>
+Date: Tue, 21 Aug 2012 14:45:57 +0200
+Subject: [PATCH 8/8] irqlist: added check for avoidance of division by zero
+
+When counting load_sources, it's occasionally possible to have one of our object
+lists be empty (if you exclude all the cpus from balancing, for instance). In
+these cases load_sources can be zero, and that will cause a SIGFPE. Avoid that
+by making sure that load_sources is always at least 1.
+
+Signed-off-by: Petr Holasek <pholasek at redhat.com>
+Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
+---
+ irqlist.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/irqlist.c b/irqlist.c
+index e03aa7b..c0e0d2b 100644
+--- a/irqlist.c
++++ b/irqlist.c
+@@ -160,6 +160,7 @@ static void clear_powersave_mode(struct topo_obj *obj, void *data __attribute__(
+ int ___load_sources;\
+ memset(&(info), 0, sizeof(struct load_balance_info));\
+ for_each_object((name), gather_load_stats, &(info));\
++ (info).load_sources = ((info).load_sources == 0) ? 1 : ((info).load_sources);\
+ (info).avg_load = (info).total_load / (info).load_sources;\
+ for_each_object((name), compute_deviations, &(info));\
+ ___load_sources = ((info).load_sources == 1) ? 1 : ((info).load_sources - 1);\
+--
+1.7.11.4
+
diff --git a/irqbalance-scan-for-new-irqs.patch b/irqbalance-scan-for-new-irqs.patch
new file mode 100644
index 0000000..43b0b30
--- /dev/null
+++ b/irqbalance-scan-for-new-irqs.patch
@@ -0,0 +1,90 @@
+From 1523a7830cb2670cb531a5fdea86885eaa648eaf Mon Sep 17 00:00:00 2001
+From: Neil Horman <nhorman at tuxdriver.com>
+Date: Mon, 2 Jul 2012 13:27:14 -0400
+Subject: [PATCH] Make irqbalance scan for new irqs when it detects new irqs
+
+Like cpu hotplug, irqbalance needs to rebuild its topo map and irq db when it
+detects new irqs in the system. This patch adds that ability
+
+Resolves: http://code.google.com/p/irqbalance/issues/detail?id=32
+
+Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
+---
+ irqbalance.c | 6 +++---
+ irqbalance.h | 2 +-
+ procinterrupts.c | 14 ++++++++++++--
+ 3 files changed, 16 insertions(+), 6 deletions(-)
+
+diff --git a/irqbalance.c b/irqbalance.c
+index 1fcc367..7ef72af 100644
+--- a/irqbalance.c
++++ b/irqbalance.c
+@@ -46,7 +46,7 @@ int one_shot_mode;
+ int debug_mode;
+ int foreground_mode;
+ int numa_avail;
+-int need_cpu_rescan;
++int need_rescan;
+ extern cpumask_t banned_cpus;
+ enum hp_e hint_policy = HINT_POLICY_SUBSET;
+ unsigned long power_thresh = ULONG_MAX;
+@@ -301,8 +301,8 @@ int main(int argc, char** argv)
+ parse_proc_stat();
+
+ /* cope with cpu hotplug -- detected during /proc/interrupts parsing */
+- if (need_cpu_rescan) {
+- need_cpu_rescan = 0;
++ if (need_rescan) {
++ need_rescan = 0;
+ /* if there's a hotplug event we better turn off power mode for a bit until things settle */
+ power_mode = 0;
+ if (debug_mode)
+diff --git a/irqbalance.h b/irqbalance.h
+index b9b1f06..8ec7c23 100644
+--- a/irqbalance.h
++++ b/irqbalance.h
+@@ -64,7 +64,7 @@ enum hp_e {
+ extern int debug_mode;
+ extern int one_shot_mode;
+ extern int power_mode;
+-extern int need_cpu_rescan;
++extern int need_rescan;
+ extern enum hp_e hint_policy;
+ extern unsigned long long cycle_count;
+ extern unsigned long power_thresh;
+diff --git a/procinterrupts.c b/procinterrupts.c
+index 8ffe30c..f1d6745 100644
+--- a/procinterrupts.c
++++ b/procinterrupts.c
+@@ -83,8 +83,18 @@ void parse_proc_interrupts(void)
+ c++;
+ number = strtoul(line, NULL, 10);
+ info = get_irq_info(number);
+- if (!info)
++ if (!info) {
++ /*
++ * If this is our 0th pass through this routine
++ * this is an irq that wasn't reported in sysfs
++ * and we should just add it. If we've been running
++ * a while then this irq just appeared and its time
++ * to rescan our irqs
++ */
++ if (cycle_count)
++ need_rescan = 1;
+ info = add_misc_irq(number);
++ }
+
+ count = 0;
+ cpunr = 0;
+@@ -100,7 +110,7 @@ void parse_proc_interrupts(void)
+ cpunr++;
+ }
+ if (cpunr != core_count)
+- need_cpu_rescan = 1;
++ need_rescan = 1;
+
+ info->last_irq_count = info->irq_count;
+ info->irq_count = count;
+--
+1.7.11.4
+
diff --git a/irqbalance.spec b/irqbalance.spec
index c714995..329f7d6 100644
--- a/irqbalance.spec
+++ b/irqbalance.spec
@@ -1,6 +1,6 @@
Name: irqbalance
Version: 1.0.3
-Release: 4%{?dist}
+Release: 5%{?dist}
Epoch: 2
Summary: IRQ balancing daemon
@@ -11,7 +11,7 @@ Source0: http://irqbalance.googlecode.com/files/irqbalance-%{version}.tar
Source1: irqbalance.sysconfig
BuildRequires: autoconf automake libtool libcap-ng
-BuildRequires: glib2-devel pkgconfig imake libcap-ng-devel
+BuildRequires: glib2-devel pkgconfig libcap-ng-devel
%ifnarch %{arm}
BuildRequires: numactl-devel numactl-libs
Requires: numactl-libs
@@ -23,12 +23,29 @@ Requires(preun):systemd-units
ExclusiveArch: %{ix86} x86_64 ia64 ppc ppc64 %{arm}
+Patch1: 0001-Add-sample-irqbalance-environment-file.patch
+Patch2: 0002-introduce-banirq-option.patch
+Patch3: 0003-When-IRQBALANCE_BANNED_CPUS-is-set-proc-stat-is-not-.patch
+Patch4: 0004-Make-irqbalance-scan-for-new-irqs-when-it-detects-ne.patch
+Patch5: 0005-Add-banscript-option.patch
+Patch6: 0006-irqbalance-cpu-powersave-code-disabled-when-power_th.patch
+Patch7: 0007-apply-affinity-hint-also-if-the-current-policy-is-su.patch
+Patch8: 0008-irqlist-added-check-for-avoidance-of-division-by-zer.patch
+
%description
irqbalance is a daemon that evenly distributes IRQ load across
multiple CPUs for enhanced performance.
%prep
%setup -q
+%patch1 -p1
+%patch2 -p1
+%patch3 -p1
+%patch4 -p1
+%patch5 -p1
+%patch6 -p1
+%patch7 -p1
+%patch8 -p1
%build
%{configure}
@@ -51,24 +68,13 @@ install -p -m 0644 ./irqbalance.1 %{buildroot}%{_mandir}/man1/
%config(noreplace) %{_sysconfdir}/sysconfig/irqbalance
%post
-if [ $1 -eq 1 ]; then
- # Initial installation
- /bin/systemctl enable irqbalance.service >/dev/null 2>&1 || :
-fi
+%systemd_post irqbalance.service
%preun
-if [ $1 -eq 0 ] ; then
- # Package removal, not upgrade
- /bin/systemctl disable irqbalance.service >/dev/null 2>&1 || :
- /bin/systemctl stop irqbalance.service > /dev/null 2>&1 || :
-fi
+%systemd_preun irqbalance.service
%postun
-/bin/systemctl daemon-reload >/dev/null 2>&1 || :
-if [ $1 -ge 1 ] ; then
- # Package upgrade, not uninstall
- /bin/systemctl try-restart irqbalance.service >/dev/null 2>&1 || :
-fi
+%systemd_postun_with_restart irqbalance.service
%triggerun -- irqbalance < 2:0.56-3
if /sbin/chkconfig --level 3 irqbalance ; then
@@ -77,6 +83,16 @@ fi
/sbin/chkconfig --del irqbalance >/dev/null 2>&1 || :
%changelog
+* Wed Aug 22 2012 Petr Holasek <pholasek at redhat.com> - 2:1.0.3-5
+- Make irqbalance scan for new irqs when it detects new irqs (bz832815)
+- Fixes SIGFPE crash for some banning configuration (bz849792)
+- Fixes affinity_hint values processing (bz832815)
+- Adds banirq and banscript options (bz837049)
+- imake isn't needed for building any more (bz844359)
+- Fixes clogging of syslog (bz837646)
+- Added IRQBALANCE_ARGS variable for passing arguments via systemd (bz837048)
+- Fixes --hintpolicy=subset behavior (bz844381)
+
* Sun Apr 15 2012 Petr Holasek <pholasek at redhat.com> - 2:1.0.3-4
- Updated libnuma dependencies
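
Note on the SIGFPE fix (bz849792): the one-line change in
0008-irqlist-added-check-for-avoidance-of-division-by-zer.patch clamps
load_sources to at least 1 before it is used as a divisor. A minimal sketch of
that guard follows. struct load_stats and average_load are hypothetical
stand-ins for the load_balance_info fields and the macro in irqlist.c, not the
actual code.

```c
/*
 * Hypothetical stand-in for the two load_balance_info fields
 * involved in the division.
 */
struct load_stats {
	unsigned long long total_load;
	unsigned int load_sources;
};

/*
 * Clamp the divisor to 1 when no load sources were counted (for
 * example when every cpu is banned), then compute the average.
 */
static unsigned long long average_load(struct load_stats *s)
{
	if (s->load_sources == 0)
		s->load_sources = 1;
	return s->total_load / s->load_sources;
}
```

With the clamp in place an empty object list yields an average equal to the
total load (normally 0 in that case) instead of trapping on an integer
division by zero.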