fedora 14 kernel performance with ip forwarding workload
by Jesse Brandeburg
The other day I was running the stock fedora kernel on my ip
forwarding setup, to see what the performance was, and the performance
wasn't very good.
system is S5520HC dual socket 2.93GHz Xeon 5570 (Nehalem) with 3 quad
port 82580 adapters (12 ports). Traffic is bidirectional 64 byte
packets being forwarded and received on each port, basically port to
port routing. I am only using 12 flows currently.
The driver is igb, and I am using an affinity script that lines up
each pair of ports that are forwarding traffic into optimal
configurations for cache locality. I am also disabling
remote_node_defrag_ratio to stop cross node traffic.
With the fedora default kernel from F14 it appears that
CONFIG_NETFILTER=y means that I cannot unload all of netfilter even if
I stop iptables service.
perf showed netfilter being prominent, and removing it gives me much
higher throughput. Is there a reason CONFIG_NETFILTER=y ? Isn't it a
good thing to be able to disable netfilter if you want to?
Jesse
8 years, 3 months
Re: [kernel] Disable debugging options.
by Thorsten Leemhuis
On 18.06.2012 15:09, Josh Boyer wrote:
> commit 3b37beedf48825354c42e25f7001677320958d38
> Author: Josh Boyer <jwboyer(a)redhat.com>
> Date: Mon Jun 18 09:09:58 2012 -0400
>
> Disable debugging options.
>
> config-generic | 8 ++--
> config-nodebug | 112 ++++++++++++++++++++++++++--------------------------
> config-x86-generic | 2 +-
> kernel.spec | 9 +++-
> 4 files changed, 67 insertions(+), 64 deletions(-)
> ---
> diff --git a/config-generic b/config-generic
> index 6b1c651..08c28ba 100644
> --- a/config-generic
> +++ b/config-generic
> @@ -1446,13 +1446,13 @@ CONFIG_B43_SDIO=y
> CONFIG_B43_BCMA=y
> # CONFIG_B43_BCMA_EXTRA is not set
> CONFIG_B43_BCMA_PIO=y
> -CONFIG_B43_DEBUG=y
> +# CONFIG_B43_DEBUG is not set
> [...]
Just out of curiosity: Why is enabling and disabling the debug options
done via "make {no,}debug" in the git checkout and not by an conditional
within the spec file? That afaics makes it harder to switch the debug
options off if you only have the SRPM at hand.
Or am I missing something obvious here?
CU
knurd
10 years, 10 months
CONFIG_USB_UAS
by Gerd Hoffmann
Hi,
Any specific reason why CONFIG_USB_UAS is not enabled?
If not, can it be enabled please (rawhide & f17)?
thanks,
Gerd
11 years, 1 month
NX cannot be enabled: non-PAE kernel
by Michael Zintakis
I have compiled a PAE kernel and installed it on my system (it *is* a
PAE kernel, I am sure of it :-) ), but when I boot up I see the
following message: "Notice: NX (Execute Disable) protection cannot be
enabled: non-PAE kernel!"
I am certain I am using the right kernel and cannot explain why this
error occurs. Could it be that the BIOS may have the NX-bit disabled (if
so, how do I find out?) or is it something I am definitely doing wrong?
11 years, 2 months
Fintek f71882fg ACPI conflict
by Michael Zintakis
Hello,
I hope this is the right place to place my query!
I am trying to take control and initialise my fan censors and
"sensors-detect" is telling me that the chip on the board is "Fintek
f71869a at 0x290, revision 32". However, when I run sensors (the
service) this driver is not loaded (I can't see any of the fan sensors
either) and dmesg is showing me the following:
[ 2312.804747] f71882fg: Found f71869a chip at 0x290, revision 32
[ 2312.804902] ACPI Warning: 0x00000290-0x00000297 SystemIO conflicts
with Region \_TZ_.IP__ 1 (20120111/utaddress-251)
[ 2312.804919] ACPI Warning: 0x00000290-0x00000297 SystemIO conflicts
with Region \_SB_.PCI0.LPCB.SIO1.RNTR 2 (20120111/utaddress-251)
[ 2312.804934] ACPI: If an ACPI driver is available for this device, you
should use it instead of the native driver
Could anybody tell me what the above means please? Is there a conflict
between ACPI and the Fintek kernel module sensors (the service) is
trying to load? Is there a separate "ACPI driver" I should be using
instead? If so, how can I activate it so that I could get to read my fan
sensors?
When I run "sensors" (the program) I see very limited information, like
this:
acpitz-virtual-0
Adapter: Virtual device
temp1: +48.0°C (crit = +127.0°C)
temp2: +26.8°C (crit = +127.0°C)
temp3: +26.8°C (crit = +100.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +51.0°C (crit = +100.0°C)
Core 1: +51.0°C (crit = +100.0°C)
This is not OK because I don't have any RPMs of any of the fans I have
on this board (I have 3 fans running). I am using Fedora Core 17 with
kernel version 3.3.4.
Many thanks!
MZ
11 years, 2 months
[PATCH PULL F16 F17] task_work_add: generic process-context callbacks
by Anton Arapov
FYI. This patch is upstream since linux-3.5. Backported in order to
bring SystemTap functionality back after the switch to linux-3.4 that
doesn't have utrace. :)
The split-out series is available in the git repository at:
git://fedorapeople.org/home/fedora/aarapov/public_git/kernel-uprobes.git
Oleg Nesterov (1):
task_work_add: generic process-context callbacks
Signed-off-by: Anton Arapov <anton(a)redhat.com>
---
include/linux/sched.h | 2 ++
include/linux/task_work.h | 33 ++++++++++++++++++
include/linux/tracehook.h | 11 ++++++
kernel/Makefile | 2 +-
kernel/exit.c | 5 ++-
kernel/fork.c | 1 +
kernel/task_work.c | 84 +++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 136 insertions(+), 2 deletions(-)
create mode 100644 include/linux/task_work.h
create mode 100644 kernel/task_work.c
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6869c60..e011a11 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1445,6 +1445,8 @@ struct task_struct {
int (*notifier)(void *priv);
void *notifier_data;
sigset_t *notifier_mask;
+ struct hlist_head task_works;
+
struct audit_context *audit_context;
#ifdef CONFIG_AUDITSYSCALL
uid_t loginuid;
diff --git a/include/linux/task_work.h b/include/linux/task_work.h
new file mode 100644
index 0000000..294d5d5
--- /dev/null
+++ b/include/linux/task_work.h
@@ -0,0 +1,33 @@
+#ifndef _LINUX_TASK_WORK_H
+#define _LINUX_TASK_WORK_H
+
+#include <linux/list.h>
+#include <linux/sched.h>
+
+struct task_work;
+typedef void (*task_work_func_t)(struct task_work *);
+
+struct task_work {
+ struct hlist_node hlist;
+ task_work_func_t func;
+ void *data;
+};
+
+static inline void
+init_task_work(struct task_work *twork, task_work_func_t func, void *data)
+{
+ twork->func = func;
+ twork->data = data;
+}
+
+int task_work_add(struct task_struct *task, struct task_work *twork, bool);
+struct task_work *task_work_cancel(struct task_struct *, task_work_func_t);
+void task_work_run(void);
+
+static inline void exit_task_work(struct task_struct *task)
+{
+ if (unlikely(!hlist_empty(&task->task_works)))
+ task_work_run();
+}
+
+#endif /* _LINUX_TASK_WORK_H */
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 51bd91d..48c597d 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -49,6 +49,7 @@
#include <linux/sched.h>
#include <linux/ptrace.h>
#include <linux/security.h>
+#include <linux/task_work.h>
struct linux_binprm;
/*
@@ -165,8 +166,10 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info,
*/
static inline void set_notify_resume(struct task_struct *task)
{
+#ifdef TIF_NOTIFY_RESUME
if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_RESUME))
kick_process(task);
+#endif
}
/**
@@ -184,6 +187,14 @@ static inline void set_notify_resume(struct task_struct *task)
*/
static inline void tracehook_notify_resume(struct pt_regs *regs)
{
+ /*
+ * The caller just cleared TIF_NOTIFY_RESUME. This barrier
+ * pairs with task_work_add()->set_notify_resume() after
+ * hlist_add_head(task->task_works);
+ */
+ smp_mb__after_clear_bit();
+ if (unlikely(!hlist_empty(¤t->task_works)))
+ task_work_run();
}
#endif /* TIF_NOTIFY_RESUME */
diff --git a/kernel/Makefile b/kernel/Makefile
index cb41b95..2479528 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -5,7 +5,7 @@
obj-y = fork.o exec_domain.o panic.o printk.o \
cpu.o exit.o itimer.o time.o softirq.o resource.o \
sysctl.o sysctl_binary.o capability.o ptrace.o timer.o user.o \
- signal.o sys.o kmod.o workqueue.o pid.o \
+ signal.o sys.o kmod.o workqueue.o pid.o task_work.o \
rcupdate.o extable.o params.o posix-timers.o \
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..b82c38e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -946,11 +946,14 @@ void do_exit(long code)
exit_signals(tsk); /* sets PF_EXITING */
/*
* tsk->flags are checked in the futex code to protect against
- * an exiting task cleaning up the robust pi futexes.
+ * an exiting task cleaning up the robust pi futexes, and in
+ * task_work_add() to avoid the race with exit_task_work().
*/
smp_mb();
raw_spin_unlock_wait(&tsk->pi_lock);
+ exit_task_work(tsk);
+
exit_irq_thread();
if (unlikely(in_atomic()))
diff --git a/kernel/fork.c b/kernel/fork.c
index 5b87e9f..76a961d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1391,6 +1391,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
*/
p->group_leader = p;
INIT_LIST_HEAD(&p->thread_group);
+ INIT_HLIST_HEAD(&p->task_works);
/* Now that the task is set up, run cgroup callbacks if
* necessary. We need to run them before the task is visible
diff --git a/kernel/task_work.c b/kernel/task_work.c
new file mode 100644
index 0000000..82d1c79
--- /dev/null
+++ b/kernel/task_work.c
@@ -0,0 +1,84 @@
+#include <linux/spinlock.h>
+#include <linux/task_work.h>
+#include <linux/tracehook.h>
+
+int
+task_work_add(struct task_struct *task, struct task_work *twork, bool notify)
+{
+ unsigned long flags;
+ int err = -ESRCH;
+
+#ifndef TIF_NOTIFY_RESUME
+ if (notify)
+ return -ENOTSUPP;
+#endif
+ /*
+ * We must not insert the new work if the task has already passed
+ * exit_task_work(). We rely on do_exit()->raw_spin_unlock_wait()
+ * and check PF_EXITING under pi_lock.
+ */
+ raw_spin_lock_irqsave(&task->pi_lock, flags);
+ if (likely(!(task->flags & PF_EXITING))) {
+ hlist_add_head(&twork->hlist, &task->task_works);
+ err = 0;
+ }
+ raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+
+ /* test_and_set_bit() implies mb(), see tracehook_notify_resume(). */
+ if (likely(!err) && notify)
+ set_notify_resume(task);
+ return err;
+}
+
+struct task_work *
+task_work_cancel(struct task_struct *task, task_work_func_t func)
+{
+ unsigned long flags;
+ struct task_work *twork;
+ struct hlist_node *pos;
+
+ raw_spin_lock_irqsave(&task->pi_lock, flags);
+ hlist_for_each_entry(twork, pos, &task->task_works, hlist) {
+ if (twork->func == func) {
+ hlist_del(&twork->hlist);
+ goto found;
+ }
+ }
+ twork = NULL;
+ found:
+ raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+
+ return twork;
+}
+
+void task_work_run(void)
+{
+ struct task_struct *task = current;
+ struct hlist_head task_works;
+ struct hlist_node *pos;
+
+ raw_spin_lock_irq(&task->pi_lock);
+ hlist_move_list(&task->task_works, &task_works);
+ raw_spin_unlock_irq(&task->pi_lock);
+
+ if (unlikely(hlist_empty(&task_works)))
+ return;
+ /*
+ * We use hlist to save the space in task_struct, but we want fifo.
+ * Find the last entry, the list should be short, then process them
+ * in reverse order.
+ */
+ for (pos = task_works.first; pos->next; pos = pos->next)
+ ;
+
+ for (;;) {
+ struct hlist_node **pprev = pos->pprev;
+ struct task_work *twork = container_of(pos, struct task_work,
+ hlist);
+ twork->func(twork);
+
+ if (pprev == &task_works.first)
+ break;
+ pos = container_of(pprev, struct hlist_node, next);
+ }
+}
11 years, 2 months
[PATCH F18] team: update to latest net-next
by Jiri Pirko
Update team driver to latest net-next.
Split patches available here:
http://people.redhat.com/jpirko/f18_team_update/
David S. Miller (1):
team: Revert previous two changes.
Jiri Pirko (25):
team: make team_mode struct const
team: for nomode use dummy struct team_mode
team: add mode priv to port
team: lb: push hash counting into separate function
team: allow read/write-only options
team: introduce array options
team: comments: s/net\/drivers\/team/drivers\/net\/team/
team: push array_index and port into separate structure
team: allow async option changes
team: fix error path in team_nl_fill_options_get()
team: fix error path in team_nl_fill_port_list_get()
team: allow to specify one option instance to be send to userspace
team: pass NULL to __team_option_inst_add() instead of 0
team: add port_[enabled/disabled] mode callbacks
team: lb: introduce infrastructure for userspace driven tx
loadbalancing
team: implement multipart netlink messages for options transfers
team: ensure correct order of netlink messages delivery
team: allow to send multiple set events in one message
team: use rcu_dereference_bh() in tx path
team: use rcu_access_pointer to access RCU pointer by writer
team: use RCU_INIT_POINTER for NULL assignment of RCU pointer
team: do RCU update path fixups
team: fix team_adjust_ops with regard to enabled ports
team: do not allow to map disabled ports
team: remove unused rcu_head field from team_port struct
drivers/net/team/team.c | 577 ++++++++++++++++++-----------
drivers/net/team/team_mode_activebackup.c | 14 +-
drivers/net/team/team_mode_loadbalance.c | 543 +++++++++++++++++++++++++--
drivers/net/team/team_mode_roundrobin.c | 4 +-
include/linux/if_team.h | 25 +-
5 files changed, 919 insertions(+), 244 deletions(-)
Signed-off-by: Jiri Pirko <jpirko(a)redhat.com>
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index c61ae35..5350eea 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1,5 +1,5 @@
/*
- * net/drivers/team/team.c - Network team device driver
+ * drivers/net/team/team.c - Network team device driver
* Copyright (c) 2011 Jiri Pirko <jpirko(a)redhat.com>
*
* This program is free software; you can redistribute it and/or modify
@@ -82,14 +82,16 @@ static void team_refresh_port_linkup(struct team_port *port)
port->state.linkup;
}
+
/*******************
* Options handling
*******************/
struct team_option_inst { /* One for each option instance */
struct list_head list;
+ struct list_head tmp_list;
struct team_option *option;
- struct team_port *port; /* != NULL if per-port */
+ struct team_option_inst_info info;
bool changed;
bool removed;
};
@@ -106,22 +108,6 @@ static struct team_option *__team_find_option(struct team *team,
return NULL;
}
-static int __team_option_inst_add(struct team *team, struct team_option *option,
- struct team_port *port)
-{
- struct team_option_inst *opt_inst;
-
- opt_inst = kmalloc(sizeof(*opt_inst), GFP_KERNEL);
- if (!opt_inst)
- return -ENOMEM;
- opt_inst->option = option;
- opt_inst->port = port;
- opt_inst->changed = true;
- opt_inst->removed = false;
- list_add_tail(&opt_inst->list, &team->option_inst_list);
- return 0;
-}
-
static void __team_option_inst_del(struct team_option_inst *opt_inst)
{
list_del(&opt_inst->list);
@@ -139,14 +125,49 @@ static void __team_option_inst_del_option(struct team *team,
}
}
+static int __team_option_inst_add(struct team *team, struct team_option *option,
+ struct team_port *port)
+{
+ struct team_option_inst *opt_inst;
+ unsigned int array_size;
+ unsigned int i;
+ int err;
+
+ array_size = option->array_size;
+ if (!array_size)
+ array_size = 1; /* No array but still need one instance */
+
+ for (i = 0; i < array_size; i++) {
+ opt_inst = kmalloc(sizeof(*opt_inst), GFP_KERNEL);
+ if (!opt_inst)
+ return -ENOMEM;
+ opt_inst->option = option;
+ opt_inst->info.port = port;
+ opt_inst->info.array_index = i;
+ opt_inst->changed = true;
+ opt_inst->removed = false;
+ list_add_tail(&opt_inst->list, &team->option_inst_list);
+ if (option->init) {
+ err = option->init(team, &opt_inst->info);
+ if (err)
+ return err;
+ }
+
+ }
+ return 0;
+}
+
static int __team_option_inst_add_option(struct team *team,
struct team_option *option)
{
struct team_port *port;
int err;
- if (!option->per_port)
- return __team_option_inst_add(team, option, 0);
+ if (!option->per_port) {
+ err = __team_option_inst_add(team, option, NULL);
+ if (err)
+ goto inst_del_option;
+ }
list_for_each_entry(port, &team->port_list, list) {
err = __team_option_inst_add(team, option, port);
@@ -180,7 +201,7 @@ static void __team_option_inst_del_port(struct team *team,
list_for_each_entry_safe(opt_inst, tmp, &team->option_inst_list, list) {
if (opt_inst->option->per_port &&
- opt_inst->port == port)
+ opt_inst->info.port == port)
__team_option_inst_del(opt_inst);
}
}
@@ -211,7 +232,7 @@ static void __team_option_inst_mark_removed_port(struct team *team,
struct team_option_inst *opt_inst;
list_for_each_entry(opt_inst, &team->option_inst_list, list) {
- if (opt_inst->port == port) {
+ if (opt_inst->info.port == port) {
opt_inst->changed = true;
opt_inst->removed = true;
}
@@ -324,28 +345,12 @@ void team_options_unregister(struct team *team,
}
EXPORT_SYMBOL(team_options_unregister);
-static int team_option_port_add(struct team *team, struct team_port *port)
-{
- int err;
-
- err = __team_option_inst_add_port(team, port);
- if (err)
- return err;
- __team_options_change_check(team);
- return 0;
-}
-
-static void team_option_port_del(struct team *team, struct team_port *port)
-{
- __team_option_inst_mark_removed_port(team, port);
- __team_options_change_check(team);
- __team_option_inst_del_port(team, port);
-}
-
static int team_option_get(struct team *team,
struct team_option_inst *opt_inst,
struct team_gsetter_ctx *ctx)
{
+ if (!opt_inst->option->getter)
+ return -EOPNOTSUPP;
return opt_inst->option->getter(team, ctx);
}
@@ -353,16 +358,26 @@ static int team_option_set(struct team *team,
struct team_option_inst *opt_inst,
struct team_gsetter_ctx *ctx)
{
- int err;
+ if (!opt_inst->option->setter)
+ return -EOPNOTSUPP;
+ return opt_inst->option->setter(team, ctx);
+}
- err = opt_inst->option->setter(team, ctx);
- if (err)
- return err;
+void team_option_inst_set_change(struct team_option_inst_info *opt_inst_info)
+{
+ struct team_option_inst *opt_inst;
+ opt_inst = container_of(opt_inst_info, struct team_option_inst, info);
opt_inst->changed = true;
+}
+EXPORT_SYMBOL(team_option_inst_set_change);
+
+void team_options_change_check(struct team *team)
+{
__team_options_change_check(team);
- return err;
}
+EXPORT_SYMBOL(team_options_change_check);
+
/****************
* Mode handling
@@ -371,13 +386,18 @@ static int team_option_set(struct team *team,
static LIST_HEAD(mode_list);
static DEFINE_SPINLOCK(mode_list_lock);
-static struct team_mode *__find_mode(const char *kind)
+struct team_mode_item {
+ struct list_head list;
+ const struct team_mode *mode;
+};
+
+static struct team_mode_item *__find_mode(const char *kind)
{
- struct team_mode *mode;
+ struct team_mode_item *mitem;
- list_for_each_entry(mode, &mode_list, list) {
- if (strcmp(mode->kind, kind) == 0)
- return mode;
+ list_for_each_entry(mitem, &mode_list, list) {
+ if (strcmp(mitem->mode->kind, kind) == 0)
+ return mitem;
}
return NULL;
}
@@ -392,49 +412,65 @@ static bool is_good_mode_name(const char *name)
return true;
}
-int team_mode_register(struct team_mode *mode)
+int team_mode_register(const struct team_mode *mode)
{
int err = 0;
+ struct team_mode_item *mitem;
if (!is_good_mode_name(mode->kind) ||
mode->priv_size > TEAM_MODE_PRIV_SIZE)
return -EINVAL;
+
+ mitem = kmalloc(sizeof(*mitem), GFP_KERNEL);
+ if (!mitem)
+ return -ENOMEM;
+
spin_lock(&mode_list_lock);
if (__find_mode(mode->kind)) {
err = -EEXIST;
+ kfree(mitem);
goto unlock;
}
- list_add_tail(&mode->list, &mode_list);
+ mitem->mode = mode;
+ list_add_tail(&mitem->list, &mode_list);
unlock:
spin_unlock(&mode_list_lock);
return err;
}
EXPORT_SYMBOL(team_mode_register);
-int team_mode_unregister(struct team_mode *mode)
+void team_mode_unregister(const struct team_mode *mode)
{
+ struct team_mode_item *mitem;
+
spin_lock(&mode_list_lock);
- list_del_init(&mode->list);
+ mitem = __find_mode(mode->kind);
+ if (mitem) {
+ list_del_init(&mitem->list);
+ kfree(mitem);
+ }
spin_unlock(&mode_list_lock);
- return 0;
}
EXPORT_SYMBOL(team_mode_unregister);
-static struct team_mode *team_mode_get(const char *kind)
+static const struct team_mode *team_mode_get(const char *kind)
{
- struct team_mode *mode;
+ struct team_mode_item *mitem;
+ const struct team_mode *mode = NULL;
spin_lock(&mode_list_lock);
- mode = __find_mode(kind);
- if (!mode) {
+ mitem = __find_mode(kind);
+ if (!mitem) {
spin_unlock(&mode_list_lock);
request_module("team-mode-%s", kind);
spin_lock(&mode_list_lock);
- mode = __find_mode(kind);
+ mitem = __find_mode(kind);
}
- if (mode)
+ if (mitem) {
+ mode = mitem->mode;
if (!try_module_get(mode->owner))
mode = NULL;
+ }
spin_unlock(&mode_list_lock);
return mode;
@@ -458,26 +494,45 @@ rx_handler_result_t team_dummy_receive(struct team *team,
return RX_HANDLER_ANOTHER;
}
-static void team_adjust_ops(struct team *team)
+static const struct team_mode __team_no_mode = {
+ .kind = "*NOMODE*",
+};
+
+static bool team_is_mode_set(struct team *team)
+{
+ return team->mode != &__team_no_mode;
+}
+
+static void team_set_no_mode(struct team *team)
+{
+ team->mode = &__team_no_mode;
+}
+
+static void __team_adjust_ops(struct team *team, int en_port_count)
{
/*
* To avoid checks in rx/tx skb paths, ensure here that non-null and
* correct ops are always set.
*/
- if (list_empty(&team->port_list) ||
- !team->mode || !team->mode->ops->transmit)
+ if (!en_port_count || !team_is_mode_set(team) ||
+ !team->mode->ops->transmit)
team->ops.transmit = team_dummy_transmit;
else
team->ops.transmit = team->mode->ops->transmit;
- if (list_empty(&team->port_list) ||
- !team->mode || !team->mode->ops->receive)
+ if (!en_port_count || !team_is_mode_set(team) ||
+ !team->mode->ops->receive)
team->ops.receive = team_dummy_receive;
else
team->ops.receive = team->mode->ops->receive;
}
+static void team_adjust_ops(struct team *team)
+{
+ __team_adjust_ops(team, team->en_port_count);
+}
+
/*
* We can benefit from the fact that it's ensured no port is present
* at the time of mode change. Therefore no packets are in fly so there's no
@@ -487,7 +542,7 @@ static int __team_change_mode(struct team *team,
const struct team_mode *new_mode)
{
/* Check if mode was previously set and do cleanup if so */
- if (team->mode) {
+ if (team_is_mode_set(team)) {
void (*exit_op)(struct team *team) = team->ops.exit;
/* Clear ops area so no callback is called any longer */
@@ -497,7 +552,7 @@ static int __team_change_mode(struct team *team,
if (exit_op)
exit_op(team);
team_mode_put(team->mode);
- team->mode = NULL;
+ team_set_no_mode(team);
/* zero private data area */
memset(&team->mode_priv, 0,
sizeof(struct team) - offsetof(struct team, mode_priv));
@@ -523,7 +578,7 @@ static int __team_change_mode(struct team *team,
static int team_change_mode(struct team *team, const char *kind)
{
- struct team_mode *new_mode;
+ const struct team_mode *new_mode;
struct net_device *dev = team->dev;
int err;
@@ -532,7 +587,7 @@ static int team_change_mode(struct team *team, const char *kind)
return -EBUSY;
}
- if (team->mode && strcmp(team->mode->kind, kind) == 0) {
+ if (team_is_mode_set(team) && strcmp(team->mode->kind, kind) == 0) {
netdev_err(dev, "Unable to change to the same mode the team is in\n");
return -EINVAL;
}
@@ -559,8 +614,6 @@ static int team_change_mode(struct team *team, const char *kind)
* Rx path frame handler
************************/
-static bool team_port_enabled(struct team_port *port);
-
/* note: already called with rcu_read_lock */
static rx_handler_result_t team_handle_frame(struct sk_buff **pskb)
{
@@ -618,10 +671,11 @@ static bool team_port_find(const struct team *team,
return false;
}
-static bool team_port_enabled(struct team_port *port)
+bool team_port_enabled(struct team_port *port)
{
return port->index != -1;
}
+EXPORT_SYMBOL(team_port_enabled);
/*
* Enable/disable port by adding to enabled port hashlist and setting
@@ -637,6 +691,9 @@ static void team_port_enable(struct team *team,
port->index = team->en_port_count++;
hlist_add_head_rcu(&port->hlist,
team_port_index_hash(team, port->index));
+ team_adjust_ops(team);
+ if (team->ops.port_enabled)
+ team->ops.port_enabled(team, port);
}
static void __reconstruct_port_hlist(struct team *team, int rm_index)
@@ -656,14 +713,20 @@ static void __reconstruct_port_hlist(struct team *team, int rm_index)
static void team_port_disable(struct team *team,
struct team_port *port)
{
- int rm_index = port->index;
-
if (!team_port_enabled(port))
return;
+ if (team->ops.port_disabled)
+ team->ops.port_disabled(team, port);
hlist_del_rcu(&port->hlist);
- __reconstruct_port_hlist(team, rm_index);
- team->en_port_count--;
+ __reconstruct_port_hlist(team, port->index);
port->index = -1;
+ __team_adjust_ops(team, team->en_port_count - 1);
+ /*
+ * Wait until readers see adjusted ops. This ensures that
+ * readers never see team->en_port_count == 0
+ */
+ synchronize_rcu();
+ team->en_port_count--;
}
#define TEAM_VLAN_FEATURES (NETIF_F_ALL_CSUM | NETIF_F_SG | \
@@ -758,7 +821,8 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
return -EBUSY;
}
- port = kzalloc(sizeof(struct team_port), GFP_KERNEL);
+ port = kzalloc(sizeof(struct team_port) + team->mode->port_priv_size,
+ GFP_KERNEL);
if (!port)
return -ENOMEM;
@@ -809,7 +873,7 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
goto err_handler_register;
}
- err = team_option_port_add(team, port);
+ err = __team_option_inst_add_port(team, port);
if (err) {
netdev_err(dev, "Device %s failed to add per-port options\n",
portname);
@@ -819,9 +883,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
port->index = -1;
team_port_enable(team, port);
list_add_tail_rcu(&port->list, &team->port_list);
- team_adjust_ops(team);
__team_compute_features(team);
__team_port_change_check(port, !!netif_carrier_ok(port_dev));
+ __team_options_change_check(team);
netdev_info(dev, "Port device %s added\n", portname);
@@ -865,12 +929,13 @@ static int team_port_del(struct team *team, struct net_device *port_dev)
return -ENOENT;
}
+ __team_option_inst_mark_removed_port(team, port);
+ __team_options_change_check(team);
+ __team_option_inst_del_port(team, port);
port->removed = true;
__team_port_change_check(port, false);
team_port_disable(team, port);
list_del_rcu(&port->list);
- team_adjust_ops(team);
- team_option_port_del(team, port);
netdev_rx_handler_unregister(port_dev);
netdev_set_master(port_dev, NULL);
vlan_vids_del_by_dev(port_dev, dev);
@@ -891,11 +956,9 @@ static int team_port_del(struct team *team, struct net_device *port_dev)
* Net device ops
*****************/
-static const char team_no_mode_kind[] = "*NOMODE*";
-
static int team_mode_option_get(struct team *team, struct team_gsetter_ctx *ctx)
{
- ctx->data.str_val = team->mode ? team->mode->kind : team_no_mode_kind;
+ ctx->data.str_val = team->mode->kind;
return 0;
}
@@ -907,39 +970,47 @@ static int team_mode_option_set(struct team *team, struct team_gsetter_ctx *ctx)
static int team_port_en_option_get(struct team *team,
struct team_gsetter_ctx *ctx)
{
- ctx->data.bool_val = team_port_enabled(ctx->port);
+ struct team_port *port = ctx->info->port;
+
+ ctx->data.bool_val = team_port_enabled(port);
return 0;
}
static int team_port_en_option_set(struct team *team,
struct team_gsetter_ctx *ctx)
{
+ struct team_port *port = ctx->info->port;
+
if (ctx->data.bool_val)
- team_port_enable(team, ctx->port);
+ team_port_enable(team, port);
else
- team_port_disable(team, ctx->port);
+ team_port_disable(team, port);
return 0;
}
static int team_user_linkup_option_get(struct team *team,
struct team_gsetter_ctx *ctx)
{
- ctx->data.bool_val = ctx->port->user.linkup;
+ struct team_port *port = ctx->info->port;
+
+ ctx->data.bool_val = port->user.linkup;
return 0;
}
static int team_user_linkup_option_set(struct team *team,
struct team_gsetter_ctx *ctx)
{
- ctx->port->user.linkup = ctx->data.bool_val;
- team_refresh_port_linkup(ctx->port);
+ struct team_port *port = ctx->info->port;
+
+ port->user.linkup = ctx->data.bool_val;
+ team_refresh_port_linkup(port);
return 0;
}
static int team_user_linkup_en_option_get(struct team *team,
struct team_gsetter_ctx *ctx)
{
- struct team_port *port = ctx->port;
+ struct team_port *port = ctx->info->port;
ctx->data.bool_val = port->user.linkup_enabled;
return 0;
@@ -948,10 +1019,10 @@ static int team_user_linkup_en_option_get(struct team *team,
static int team_user_linkup_en_option_set(struct team *team,
struct team_gsetter_ctx *ctx)
{
- struct team_port *port = ctx->port;
+ struct team_port *port = ctx->info->port;
port->user.linkup_enabled = ctx->data.bool_val;
- team_refresh_port_linkup(ctx->port);
+ team_refresh_port_linkup(port);
return 0;
}
@@ -993,6 +1064,7 @@ static int team_init(struct net_device *dev)
team->dev = dev;
mutex_init(&team->lock);
+ team_set_no_mode(team);
team->pcpu_stats = alloc_percpu(struct team_pcpu_stats);
if (!team->pcpu_stats)
@@ -1482,16 +1554,128 @@ err_fill:
return err;
}
-static int team_nl_fill_options_get(struct sk_buff *skb,
- u32 pid, u32 seq, int flags,
- struct team *team, bool fillall)
+typedef int team_nl_send_func_t(struct sk_buff *skb,
+ struct team *team, u32 pid);
+
+static int team_nl_send_unicast(struct sk_buff *skb, struct team *team, u32 pid)
+{
+ return genlmsg_unicast(dev_net(team->dev), skb, pid);
+}
+
+static int team_nl_fill_one_option_get(struct sk_buff *skb, struct team *team,
+ struct team_option_inst *opt_inst)
+{
+ struct nlattr *option_item;
+ struct team_option *option = opt_inst->option;
+ struct team_option_inst_info *opt_inst_info = &opt_inst->info;
+ struct team_gsetter_ctx ctx;
+ int err;
+
+ ctx.info = opt_inst_info;
+ err = team_option_get(team, opt_inst, &ctx);
+ if (err)
+ return err;
+
+ option_item = nla_nest_start(skb, TEAM_ATTR_ITEM_OPTION);
+ if (!option_item)
+ return -EMSGSIZE;
+
+ if (nla_put_string(skb, TEAM_ATTR_OPTION_NAME, option->name))
+ goto nest_cancel;
+ if (opt_inst_info->port &&
+ nla_put_u32(skb, TEAM_ATTR_OPTION_PORT_IFINDEX,
+ opt_inst_info->port->dev->ifindex))
+ goto nest_cancel;
+ if (opt_inst->option->array_size &&
+ nla_put_u32(skb, TEAM_ATTR_OPTION_ARRAY_INDEX,
+ opt_inst_info->array_index))
+ goto nest_cancel;
+
+ switch (option->type) {
+ case TEAM_OPTION_TYPE_U32:
+ if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_U32))
+ goto nest_cancel;
+ if (nla_put_u32(skb, TEAM_ATTR_OPTION_DATA, ctx.data.u32_val))
+ goto nest_cancel;
+ break;
+ case TEAM_OPTION_TYPE_STRING:
+ if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_STRING))
+ goto nest_cancel;
+ if (nla_put_string(skb, TEAM_ATTR_OPTION_DATA,
+ ctx.data.str_val))
+ goto nest_cancel;
+ break;
+ case TEAM_OPTION_TYPE_BINARY:
+ if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_BINARY))
+ goto nest_cancel;
+ if (nla_put(skb, TEAM_ATTR_OPTION_DATA, ctx.data.bin_val.len,
+ ctx.data.bin_val.ptr))
+ goto nest_cancel;
+ break;
+ case TEAM_OPTION_TYPE_BOOL:
+ if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_FLAG))
+ goto nest_cancel;
+ if (ctx.data.bool_val &&
+ nla_put_flag(skb, TEAM_ATTR_OPTION_DATA))
+ goto nest_cancel;
+ break;
+ default:
+ BUG();
+ }
+ if (opt_inst->removed && nla_put_flag(skb, TEAM_ATTR_OPTION_REMOVED))
+ goto nest_cancel;
+ if (opt_inst->changed) {
+ if (nla_put_flag(skb, TEAM_ATTR_OPTION_CHANGED))
+ goto nest_cancel;
+ opt_inst->changed = false;
+ }
+ nla_nest_end(skb, option_item);
+ return 0;
+
+nest_cancel:
+ nla_nest_cancel(skb, option_item);
+ return -EMSGSIZE;
+}
+
+static int __send_and_alloc_skb(struct sk_buff **pskb,
+ struct team *team, u32 pid,
+ team_nl_send_func_t *send_func)
+{
+ int err;
+
+ if (*pskb) {
+ err = send_func(*pskb, team, pid);
+ if (err)
+ return err;
+ }
+ *pskb = genlmsg_new(NLMSG_DEFAULT_SIZE - GENL_HDRLEN, GFP_KERNEL);
+ if (!*pskb)
+ return -ENOMEM;
+ return 0;
+}
+
+static int team_nl_send_options_get(struct team *team, u32 pid, u32 seq,
+ int flags, team_nl_send_func_t *send_func,
+ struct list_head *sel_opt_inst_list)
{
struct nlattr *option_list;
+ struct nlmsghdr *nlh;
void *hdr;
struct team_option_inst *opt_inst;
int err;
+ struct sk_buff *skb = NULL;
+ bool incomplete;
+ int i;
- hdr = genlmsg_put(skb, pid, seq, &team_nl_family, flags,
+ opt_inst = list_first_entry(sel_opt_inst_list,
+ struct team_option_inst, tmp_list);
+
+start_again:
+ err = __send_and_alloc_skb(&skb, team, pid, send_func);
+ if (err)
+ return err;
+
+ hdr = genlmsg_put(skb, pid, seq, &team_nl_family, flags | NLM_F_MULTI,
TEAM_CMD_OPTIONS_GET);
if (IS_ERR(hdr))
return PTR_ERR(hdr);
@@ -1500,122 +1684,80 @@ static int team_nl_fill_options_get(struct sk_buff *skb,
goto nla_put_failure;
option_list = nla_nest_start(skb, TEAM_ATTR_LIST_OPTION);
if (!option_list)
- return -EMSGSIZE;
-
- list_for_each_entry(opt_inst, &team->option_inst_list, list) {
- struct nlattr *option_item;
- struct team_option *option = opt_inst->option;
- struct team_gsetter_ctx ctx;
+ goto nla_put_failure;
- /* Include only changed options if fill all mode is not on */
- if (!fillall && !opt_inst->changed)
- continue;
- option_item = nla_nest_start(skb, TEAM_ATTR_ITEM_OPTION);
- if (!option_item)
- goto nla_put_failure;
- if (nla_put_string(skb, TEAM_ATTR_OPTION_NAME, option->name))
- goto nla_put_failure;
- if (opt_inst->changed) {
- if (nla_put_flag(skb, TEAM_ATTR_OPTION_CHANGED))
- goto nla_put_failure;
- opt_inst->changed = false;
- }
- if (opt_inst->removed &&
- nla_put_flag(skb, TEAM_ATTR_OPTION_REMOVED))
- goto nla_put_failure;
- if (opt_inst->port &&
- nla_put_u32(skb, TEAM_ATTR_OPTION_PORT_IFINDEX,
- opt_inst->port->dev->ifindex))
- goto nla_put_failure;
- ctx.port = opt_inst->port;
- switch (option->type) {
- case TEAM_OPTION_TYPE_U32:
- if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_U32))
- goto nla_put_failure;
- err = team_option_get(team, opt_inst, &ctx);
- if (err)
- goto errout;
- if (nla_put_u32(skb, TEAM_ATTR_OPTION_DATA,
- ctx.data.u32_val))
- goto nla_put_failure;
- break;
- case TEAM_OPTION_TYPE_STRING:
- if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_STRING))
- goto nla_put_failure;
- err = team_option_get(team, opt_inst, &ctx);
- if (err)
- goto errout;
- if (nla_put_string(skb, TEAM_ATTR_OPTION_DATA,
- ctx.data.str_val))
- goto nla_put_failure;
- break;
- case TEAM_OPTION_TYPE_BINARY:
- if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_BINARY))
- goto nla_put_failure;
- err = team_option_get(team, opt_inst, &ctx);
- if (err)
- goto errout;
- if (nla_put(skb, TEAM_ATTR_OPTION_DATA,
- ctx.data.bin_val.len, ctx.data.bin_val.ptr))
- goto nla_put_failure;
- break;
- case TEAM_OPTION_TYPE_BOOL:
- if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_FLAG))
- goto nla_put_failure;
- err = team_option_get(team, opt_inst, &ctx);
- if (err)
- goto errout;
- if (ctx.data.bool_val &&
- nla_put_flag(skb, TEAM_ATTR_OPTION_DATA))
- goto nla_put_failure;
- break;
- default:
- BUG();
+ i = 0;
+ incomplete = false;
+ list_for_each_entry_from(opt_inst, sel_opt_inst_list, tmp_list) {
+ err = team_nl_fill_one_option_get(skb, team, opt_inst);
+ if (err) {
+ if (err == -EMSGSIZE) {
+ if (!i)
+ goto errout;
+ incomplete = true;
+ break;
+ }
+ goto errout;
}
- nla_nest_end(skb, option_item);
+ i++;
}
nla_nest_end(skb, option_list);
- return genlmsg_end(skb, hdr);
+ genlmsg_end(skb, hdr);
+ if (incomplete)
+ goto start_again;
+
+send_done:
+ nlh = nlmsg_put(skb, pid, seq, NLMSG_DONE, 0, flags | NLM_F_MULTI);
+ if (!nlh) {
+ err = __send_and_alloc_skb(&skb, team, pid, send_func);
+ if (err)
+ goto errout;
+ goto send_done;
+ }
+
+ return send_func(skb, team, pid);
nla_put_failure:
err = -EMSGSIZE;
errout:
genlmsg_cancel(skb, hdr);
+ nlmsg_free(skb);
return err;
}
-static int team_nl_fill_options_get_all(struct sk_buff *skb,
- struct genl_info *info, int flags,
- struct team *team)
-{
- return team_nl_fill_options_get(skb, info->snd_pid,
- info->snd_seq, NLM_F_ACK,
- team, true);
-}
-
static int team_nl_cmd_options_get(struct sk_buff *skb, struct genl_info *info)
{
struct team *team;
+ struct team_option_inst *opt_inst;
int err;
+ LIST_HEAD(sel_opt_inst_list);
team = team_nl_team_get(info);
if (!team)
return -EINVAL;
- err = team_nl_send_generic(info, team, team_nl_fill_options_get_all);
+ list_for_each_entry(opt_inst, &team->option_inst_list, list)
+ list_add_tail(&opt_inst->tmp_list, &sel_opt_inst_list);
+ err = team_nl_send_options_get(team, info->snd_pid, info->snd_seq,
+ NLM_F_ACK, team_nl_send_unicast,
+ &sel_opt_inst_list);
team_nl_team_put(team);
return err;
}
+static int team_nl_send_event_options_get(struct team *team,
+ struct list_head *sel_opt_inst_list);
+
static int team_nl_cmd_options_set(struct sk_buff *skb, struct genl_info *info)
{
struct team *team;
int err = 0;
int i;
struct nlattr *nl_option;
+ LIST_HEAD(opt_inst_list);
team = team_nl_team_get(info);
if (!team)
@@ -1629,10 +1771,12 @@ static int team_nl_cmd_options_set(struct sk_buff *skb, struct genl_info *info)
nla_for_each_nested(nl_option, info->attrs[TEAM_ATTR_LIST_OPTION], i) {
struct nlattr *opt_attrs[TEAM_ATTR_OPTION_MAX + 1];
- struct nlattr *attr_port_ifindex;
+ struct nlattr *attr;
struct nlattr *attr_data;
enum team_option_type opt_type;
int opt_port_ifindex = 0; /* != 0 for per-port options */
+ u32 opt_array_index = 0;
+ bool opt_is_array = false;
struct team_option_inst *opt_inst;
char *opt_name;
bool opt_found = false;
@@ -1674,23 +1818,33 @@ static int team_nl_cmd_options_set(struct sk_buff *skb, struct genl_info *info)
}
opt_name = nla_data(opt_attrs[TEAM_ATTR_OPTION_NAME]);
- attr_port_ifindex = opt_attrs[TEAM_ATTR_OPTION_PORT_IFINDEX];
- if (attr_port_ifindex)
- opt_port_ifindex = nla_get_u32(attr_port_ifindex);
+ attr = opt_attrs[TEAM_ATTR_OPTION_PORT_IFINDEX];
+ if (attr)
+ opt_port_ifindex = nla_get_u32(attr);
+
+ attr = opt_attrs[TEAM_ATTR_OPTION_ARRAY_INDEX];
+ if (attr) {
+ opt_is_array = true;
+ opt_array_index = nla_get_u32(attr);
+ }
list_for_each_entry(opt_inst, &team->option_inst_list, list) {
struct team_option *option = opt_inst->option;
struct team_gsetter_ctx ctx;
+ struct team_option_inst_info *opt_inst_info;
int tmp_ifindex;
- tmp_ifindex = opt_inst->port ?
- opt_inst->port->dev->ifindex : 0;
+ opt_inst_info = &opt_inst->info;
+ tmp_ifindex = opt_inst_info->port ?
+ opt_inst_info->port->dev->ifindex : 0;
if (option->type != opt_type ||
strcmp(option->name, opt_name) ||
- tmp_ifindex != opt_port_ifindex)
+ tmp_ifindex != opt_port_ifindex ||
+ (option->array_size && !opt_is_array) ||
+ opt_inst_info->array_index != opt_array_index)
continue;
opt_found = true;
- ctx.port = opt_inst->port;
+ ctx.info = opt_inst_info;
switch (opt_type) {
case TEAM_OPTION_TYPE_U32:
ctx.data.u32_val = nla_get_u32(attr_data);
@@ -1715,6 +1869,8 @@ static int team_nl_cmd_options_set(struct sk_buff *skb, struct genl_info *info)
err = team_option_set(team, opt_inst, &ctx);
if (err)
goto team_put;
+ opt_inst->changed = true;
+ list_add(&opt_inst->tmp_list, &opt_inst_list);
}
if (!opt_found) {
err = -ENOENT;
@@ -1722,6 +1878,8 @@ static int team_nl_cmd_options_set(struct sk_buff *skb, struct genl_info *info)
}
}
+ err = team_nl_send_event_options_get(team, &opt_inst_list);
+
team_put:
team_nl_team_put(team);
@@ -1746,7 +1904,7 @@ static int team_nl_fill_port_list_get(struct sk_buff *skb,
goto nla_put_failure;
port_list = nla_nest_start(skb, TEAM_ATTR_LIST_PORT);
if (!port_list)
- return -EMSGSIZE;
+ goto nla_put_failure;
list_for_each_entry(port, &team->port_list, list) {
struct nlattr *port_item;
@@ -1838,27 +1996,18 @@ static struct genl_multicast_group team_change_event_mcgrp = {
.name = TEAM_GENL_CHANGE_EVENT_MC_GRP_NAME,
};
-static int team_nl_send_event_options_get(struct team *team)
+static int team_nl_send_multicast(struct sk_buff *skb,
+ struct team *team, u32 pid)
{
- struct sk_buff *skb;
- int err;
- struct net *net = dev_net(team->dev);
-
- skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
- if (!skb)
- return -ENOMEM;
-
- err = team_nl_fill_options_get(skb, 0, 0, 0, team, false);
- if (err < 0)
- goto err_fill;
-
- err = genlmsg_multicast_netns(net, skb, 0, team_change_event_mcgrp.id,
- GFP_KERNEL);
- return err;
+ return genlmsg_multicast_netns(dev_net(team->dev), skb, 0,
+ team_change_event_mcgrp.id, GFP_KERNEL);
+}
-err_fill:
- nlmsg_free(skb);
- return err;
+static int team_nl_send_event_options_get(struct team *team,
+ struct list_head *sel_opt_inst_list)
+{
+ return team_nl_send_options_get(team, 0, 0, 0, team_nl_send_multicast,
+ sel_opt_inst_list);
}
static int team_nl_send_event_port_list_get(struct team *team)
@@ -1918,10 +2067,17 @@ static void team_nl_fini(void)
static void __team_options_change_check(struct team *team)
{
int err;
+ struct team_option_inst *opt_inst;
+ LIST_HEAD(sel_opt_inst_list);
- err = team_nl_send_event_options_get(team);
+ list_for_each_entry(opt_inst, &team->option_inst_list, list) {
+ if (opt_inst->changed)
+ list_add_tail(&opt_inst->tmp_list, &sel_opt_inst_list);
+ }
+ err = team_nl_send_event_options_get(team, &sel_opt_inst_list);
if (err)
- netdev_warn(team->dev, "Failed to send options change via netlink\n");
+ netdev_warn(team->dev, "Failed to send options change via netlink (err %d)\n",
+ err);
}
/* rtnl lock is held */
@@ -1965,6 +2121,7 @@ static void team_port_change_check(struct team_port *port, bool linkup)
mutex_unlock(&team->lock);
}
+
/************************************
* Net device notifier event handler
************************************/
diff --git a/drivers/net/team/team_mode_activebackup.c b/drivers/net/team/team_mode_activebackup.c
index fd6bd03..253b8a5 100644
--- a/drivers/net/team/team_mode_activebackup.c
+++ b/drivers/net/team/team_mode_activebackup.c
@@ -1,5 +1,5 @@
/*
- * net/drivers/team/team_mode_activebackup.c - Active-backup mode for team
+ * drivers/net/team/team_mode_activebackup.c - Active-backup mode for team
* Copyright (c) 2011 Jiri Pirko <jpirko(a)redhat.com>
*
* This program is free software; you can redistribute it and/or modify
@@ -40,7 +40,7 @@ static bool ab_transmit(struct team *team, struct sk_buff *skb)
{
struct team_port *active_port;
- active_port = rcu_dereference(ab_priv(team)->active_port);
+ active_port = rcu_dereference_bh(ab_priv(team)->active_port);
if (unlikely(!active_port))
goto drop;
skb->dev = active_port->dev;
@@ -61,8 +61,12 @@ static void ab_port_leave(struct team *team, struct team_port *port)
static int ab_active_port_get(struct team *team, struct team_gsetter_ctx *ctx)
{
- if (ab_priv(team)->active_port)
- ctx->data.u32_val = ab_priv(team)->active_port->dev->ifindex;
+ struct team_port *active_port;
+
+ active_port = rcu_dereference_protected(ab_priv(team)->active_port,
+ lockdep_is_held(&team->lock));
+ if (active_port)
+ ctx->data.u32_val = active_port->dev->ifindex;
else
ctx->data.u32_val = 0;
return 0;
@@ -108,7 +112,7 @@ static const struct team_mode_ops ab_mode_ops = {
.port_leave = ab_port_leave,
};
-static struct team_mode ab_mode = {
+static const struct team_mode ab_mode = {
.kind = "activebackup",
.owner = THIS_MODULE,
.priv_size = sizeof(struct ab_priv),
diff --git a/drivers/net/team/team_mode_loadbalance.c b/drivers/net/team/team_mode_loadbalance.c
index 86e8183..51a4b19 100644
--- a/drivers/net/team/team_mode_loadbalance.c
+++ b/drivers/net/team/team_mode_loadbalance.c
@@ -17,34 +17,210 @@
#include <linux/filter.h>
#include <linux/if_team.h>
+struct lb_priv;
+
+typedef struct team_port *lb_select_tx_port_func_t(struct team *,
+ struct lb_priv *,
+ struct sk_buff *,
+ unsigned char);
+
+#define LB_TX_HASHTABLE_SIZE 256 /* hash is a char */
+
+struct lb_stats {
+ u64 tx_bytes;
+};
+
+struct lb_pcpu_stats {
+ struct lb_stats hash_stats[LB_TX_HASHTABLE_SIZE];
+ struct u64_stats_sync syncp;
+};
+
+struct lb_stats_info {
+ struct lb_stats stats;
+ struct lb_stats last_stats;
+ struct team_option_inst_info *opt_inst_info;
+};
+
+struct lb_port_mapping {
+ struct team_port __rcu *port;
+ struct team_option_inst_info *opt_inst_info;
+};
+
+struct lb_priv_ex {
+ struct team *team;
+ struct lb_port_mapping tx_hash_to_port_mapping[LB_TX_HASHTABLE_SIZE];
+ struct sock_fprog *orig_fprog;
+ struct {
+ unsigned int refresh_interval; /* in tenths of second */
+ struct delayed_work refresh_dw;
+ struct lb_stats_info info[LB_TX_HASHTABLE_SIZE];
+ } stats;
+};
+
struct lb_priv {
struct sk_filter __rcu *fp;
- struct sock_fprog *orig_fprog;
+ lb_select_tx_port_func_t __rcu *select_tx_port_func;
+ struct lb_pcpu_stats __percpu *pcpu_stats;
+ struct lb_priv_ex *ex; /* priv extension */
};
-static struct lb_priv *lb_priv(struct team *team)
+static struct lb_priv *get_lb_priv(struct team *team)
{
return (struct lb_priv *) &team->mode_priv;
}
-static bool lb_transmit(struct team *team, struct sk_buff *skb)
+struct lb_port_priv {
+ struct lb_stats __percpu *pcpu_stats;
+ struct lb_stats_info stats_info;
+};
+
+static struct lb_port_priv *get_lb_port_priv(struct team_port *port)
+{
+ return (struct lb_port_priv *) &port->mode_priv;
+}
+
+#define LB_HTPM_PORT_BY_HASH(lp_priv, hash) \
+ (lb_priv)->ex->tx_hash_to_port_mapping[hash].port
+
+#define LB_HTPM_OPT_INST_INFO_BY_HASH(lp_priv, hash) \
+ (lb_priv)->ex->tx_hash_to_port_mapping[hash].opt_inst_info
+
+static void lb_tx_hash_to_port_mapping_null_port(struct team *team,
+ struct team_port *port)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ bool changed = false;
+ int i;
+
+ for (i = 0; i < LB_TX_HASHTABLE_SIZE; i++) {
+ struct lb_port_mapping *pm;
+
+ pm = &lb_priv->ex->tx_hash_to_port_mapping[i];
+ if (rcu_access_pointer(pm->port) == port) {
+ RCU_INIT_POINTER(pm->port, NULL);
+ team_option_inst_set_change(pm->opt_inst_info);
+ changed = true;
+ }
+ }
+ if (changed)
+ team_options_change_check(team);
+}
+
+/* Basic tx selection based solely by hash */
+static struct team_port *lb_hash_select_tx_port(struct team *team,
+ struct lb_priv *lb_priv,
+ struct sk_buff *skb,
+ unsigned char hash)
{
- struct sk_filter *fp;
- struct team_port *port;
- unsigned int hash;
int port_index;
- fp = rcu_dereference(lb_priv(team)->fp);
- if (unlikely(!fp))
- goto drop;
- hash = SK_RUN_FILTER(fp, skb);
port_index = hash % team->en_port_count;
- port = team_get_port_by_index_rcu(team, port_index);
+ return team_get_port_by_index_rcu(team, port_index);
+}
+
+/* Hash to port mapping select tx port */
+static struct team_port *lb_htpm_select_tx_port(struct team *team,
+ struct lb_priv *lb_priv,
+ struct sk_buff *skb,
+ unsigned char hash)
+{
+ return rcu_dereference_bh(LB_HTPM_PORT_BY_HASH(lb_priv, hash));
+}
+
+struct lb_select_tx_port {
+ char *name;
+ lb_select_tx_port_func_t *func;
+};
+
+static const struct lb_select_tx_port lb_select_tx_port_list[] = {
+ {
+ .name = "hash",
+ .func = lb_hash_select_tx_port,
+ },
+ {
+ .name = "hash_to_port_mapping",
+ .func = lb_htpm_select_tx_port,
+ },
+};
+#define LB_SELECT_TX_PORT_LIST_COUNT ARRAY_SIZE(lb_select_tx_port_list)
+
+static char *lb_select_tx_port_get_name(lb_select_tx_port_func_t *func)
+{
+ int i;
+
+ for (i = 0; i < LB_SELECT_TX_PORT_LIST_COUNT; i++) {
+ const struct lb_select_tx_port *item;
+
+ item = &lb_select_tx_port_list[i];
+ if (item->func == func)
+ return item->name;
+ }
+ return NULL;
+}
+
+static lb_select_tx_port_func_t *lb_select_tx_port_get_func(const char *name)
+{
+ int i;
+
+ for (i = 0; i < LB_SELECT_TX_PORT_LIST_COUNT; i++) {
+ const struct lb_select_tx_port *item;
+
+ item = &lb_select_tx_port_list[i];
+ if (!strcmp(item->name, name))
+ return item->func;
+ }
+ return NULL;
+}
+
+static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv,
+ struct sk_buff *skb)
+{
+ struct sk_filter *fp;
+ uint32_t lhash;
+ unsigned char *c;
+
+ fp = rcu_dereference_bh(lb_priv->fp);
+ if (unlikely(!fp))
+ return 0;
+ lhash = SK_RUN_FILTER(fp, skb);
+ c = (char *) &lhash;
+ return c[0] ^ c[1] ^ c[2] ^ c[3];
+}
+
+static void lb_update_tx_stats(unsigned int tx_bytes, struct lb_priv *lb_priv,
+ struct lb_port_priv *lb_port_priv,
+ unsigned char hash)
+{
+ struct lb_pcpu_stats *pcpu_stats;
+ struct lb_stats *port_stats;
+ struct lb_stats *hash_stats;
+
+ pcpu_stats = this_cpu_ptr(lb_priv->pcpu_stats);
+ port_stats = this_cpu_ptr(lb_port_priv->pcpu_stats);
+ hash_stats = &pcpu_stats->hash_stats[hash];
+ u64_stats_update_begin(&pcpu_stats->syncp);
+ port_stats->tx_bytes += tx_bytes;
+ hash_stats->tx_bytes += tx_bytes;
+ u64_stats_update_end(&pcpu_stats->syncp);
+}
+
+static bool lb_transmit(struct team *team, struct sk_buff *skb)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ lb_select_tx_port_func_t *select_tx_port_func;
+ struct team_port *port;
+ unsigned char hash;
+ unsigned int tx_bytes = skb->len;
+
+ hash = lb_get_skb_hash(lb_priv, skb);
+ select_tx_port_func = rcu_dereference_bh(lb_priv->select_tx_port_func);
+ port = select_tx_port_func(team, lb_priv, skb, hash);
if (unlikely(!port))
goto drop;
skb->dev = port->dev;
if (dev_queue_xmit(skb))
return false;
+ lb_update_tx_stats(tx_bytes, lb_priv, get_lb_port_priv(port), hash);
return true;
drop:
@@ -54,14 +230,16 @@ drop:
static int lb_bpf_func_get(struct team *team, struct team_gsetter_ctx *ctx)
{
- if (!lb_priv(team)->orig_fprog) {
+ struct lb_priv *lb_priv = get_lb_priv(team);
+
+ if (!lb_priv->ex->orig_fprog) {
ctx->data.bin_val.len = 0;
ctx->data.bin_val.ptr = NULL;
return 0;
}
- ctx->data.bin_val.len = lb_priv(team)->orig_fprog->len *
+ ctx->data.bin_val.len = lb_priv->ex->orig_fprog->len *
sizeof(struct sock_filter);
- ctx->data.bin_val.ptr = lb_priv(team)->orig_fprog->filter;
+ ctx->data.bin_val.ptr = lb_priv->ex->orig_fprog->filter;
return 0;
}
@@ -94,7 +272,9 @@ static void __fprog_destroy(struct sock_fprog *fprog)
static int lb_bpf_func_set(struct team *team, struct team_gsetter_ctx *ctx)
{
+ struct lb_priv *lb_priv = get_lb_priv(team);
struct sk_filter *fp = NULL;
+ struct sk_filter *orig_fp;
struct sock_fprog *fprog = NULL;
int err;
@@ -110,14 +290,238 @@ static int lb_bpf_func_set(struct team *team, struct team_gsetter_ctx *ctx)
}
}
- if (lb_priv(team)->orig_fprog) {
+ if (lb_priv->ex->orig_fprog) {
/* Clear old filter data */
- __fprog_destroy(lb_priv(team)->orig_fprog);
- sk_unattached_filter_destroy(lb_priv(team)->fp);
+ __fprog_destroy(lb_priv->ex->orig_fprog);
+ orig_fp = rcu_dereference_protected(lb_priv->fp,
+ lockdep_is_held(&team->lock));
+ sk_unattached_filter_destroy(orig_fp);
}
- rcu_assign_pointer(lb_priv(team)->fp, fp);
- lb_priv(team)->orig_fprog = fprog;
+ rcu_assign_pointer(lb_priv->fp, fp);
+ lb_priv->ex->orig_fprog = fprog;
+ return 0;
+}
+
+static int lb_tx_method_get(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ lb_select_tx_port_func_t *func;
+ char *name;
+
+ func = rcu_dereference_protected(lb_priv->select_tx_port_func,
+ lockdep_is_held(&team->lock));
+ name = lb_select_tx_port_get_name(func);
+ BUG_ON(!name);
+ ctx->data.str_val = name;
+ return 0;
+}
+
+static int lb_tx_method_set(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ lb_select_tx_port_func_t *func;
+
+ func = lb_select_tx_port_get_func(ctx->data.str_val);
+ if (!func)
+ return -EINVAL;
+ rcu_assign_pointer(lb_priv->select_tx_port_func, func);
+ return 0;
+}
+
+static int lb_tx_hash_to_port_mapping_init(struct team *team,
+ struct team_option_inst_info *info)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ unsigned char hash = info->array_index;
+
+ LB_HTPM_OPT_INST_INFO_BY_HASH(lb_priv, hash) = info;
+ return 0;
+}
+
+static int lb_tx_hash_to_port_mapping_get(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ struct team_port *port;
+ unsigned char hash = ctx->info->array_index;
+
+ port = LB_HTPM_PORT_BY_HASH(lb_priv, hash);
+ ctx->data.u32_val = port ? port->dev->ifindex : 0;
+ return 0;
+}
+
+static int lb_tx_hash_to_port_mapping_set(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ struct team_port *port;
+ unsigned char hash = ctx->info->array_index;
+
+ list_for_each_entry(port, &team->port_list, list) {
+ if (ctx->data.u32_val == port->dev->ifindex &&
+ team_port_enabled(port)) {
+ rcu_assign_pointer(LB_HTPM_PORT_BY_HASH(lb_priv, hash),
+ port);
+ return 0;
+ }
+ }
+ return -ENODEV;
+}
+
+static int lb_hash_stats_init(struct team *team,
+ struct team_option_inst_info *info)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ unsigned char hash = info->array_index;
+
+ lb_priv->ex->stats.info[hash].opt_inst_info = info;
+ return 0;
+}
+
+static int lb_hash_stats_get(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ unsigned char hash = ctx->info->array_index;
+
+ ctx->data.bin_val.ptr = &lb_priv->ex->stats.info[hash].stats;
+ ctx->data.bin_val.len = sizeof(struct lb_stats);
+ return 0;
+}
+
+static int lb_port_stats_init(struct team *team,
+ struct team_option_inst_info *info)
+{
+ struct team_port *port = info->port;
+ struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
+
+ lb_port_priv->stats_info.opt_inst_info = info;
+ return 0;
+}
+
+static int lb_port_stats_get(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ struct team_port *port = ctx->info->port;
+ struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
+
+ ctx->data.bin_val.ptr = &lb_port_priv->stats_info.stats;
+ ctx->data.bin_val.len = sizeof(struct lb_stats);
+ return 0;
+}
+
+static void __lb_stats_info_refresh_prepare(struct lb_stats_info *s_info)
+{
+ memcpy(&s_info->last_stats, &s_info->stats, sizeof(struct lb_stats));
+ memset(&s_info->stats, 0, sizeof(struct lb_stats));
+}
+
+static bool __lb_stats_info_refresh_check(struct lb_stats_info *s_info,
+ struct team *team)
+{
+ if (memcmp(&s_info->last_stats, &s_info->stats,
+ sizeof(struct lb_stats))) {
+ team_option_inst_set_change(s_info->opt_inst_info);
+ return true;
+ }
+ return false;
+}
+
+static void __lb_one_cpu_stats_add(struct lb_stats *acc_stats,
+ struct lb_stats *cpu_stats,
+ struct u64_stats_sync *syncp)
+{
+ unsigned int start;
+ struct lb_stats tmp;
+
+ do {
+ start = u64_stats_fetch_begin_bh(syncp);
+ tmp.tx_bytes = cpu_stats->tx_bytes;
+ } while (u64_stats_fetch_retry_bh(syncp, start));
+ acc_stats->tx_bytes += tmp.tx_bytes;
+}
+
+static void lb_stats_refresh(struct work_struct *work)
+{
+ struct team *team;
+ struct lb_priv *lb_priv;
+ struct lb_priv_ex *lb_priv_ex;
+ struct lb_pcpu_stats *pcpu_stats;
+ struct lb_stats *stats;
+ struct lb_stats_info *s_info;
+ struct team_port *port;
+ bool changed = false;
+ int i;
+ int j;
+
+ lb_priv_ex = container_of(work, struct lb_priv_ex,
+ stats.refresh_dw.work);
+
+ team = lb_priv_ex->team;
+ lb_priv = get_lb_priv(team);
+
+ if (!mutex_trylock(&team->lock)) {
+ schedule_delayed_work(&lb_priv_ex->stats.refresh_dw, 0);
+ return;
+ }
+
+ for (j = 0; j < LB_TX_HASHTABLE_SIZE; j++) {
+ s_info = &lb_priv->ex->stats.info[j];
+ __lb_stats_info_refresh_prepare(s_info);
+ for_each_possible_cpu(i) {
+ pcpu_stats = per_cpu_ptr(lb_priv->pcpu_stats, i);
+ stats = &pcpu_stats->hash_stats[j];
+ __lb_one_cpu_stats_add(&s_info->stats, stats,
+ &pcpu_stats->syncp);
+ }
+ changed |= __lb_stats_info_refresh_check(s_info, team);
+ }
+
+ list_for_each_entry(port, &team->port_list, list) {
+ struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
+
+ s_info = &lb_port_priv->stats_info;
+ __lb_stats_info_refresh_prepare(s_info);
+ for_each_possible_cpu(i) {
+ pcpu_stats = per_cpu_ptr(lb_priv->pcpu_stats, i);
+ stats = per_cpu_ptr(lb_port_priv->pcpu_stats, i);
+ __lb_one_cpu_stats_add(&s_info->stats, stats,
+ &pcpu_stats->syncp);
+ }
+ changed |= __lb_stats_info_refresh_check(s_info, team);
+ }
+
+ if (changed)
+ team_options_change_check(team);
+
+ schedule_delayed_work(&lb_priv_ex->stats.refresh_dw,
+ (lb_priv_ex->stats.refresh_interval * HZ) / 10);
+
+ mutex_unlock(&team->lock);
+}
+
+static int lb_stats_refresh_interval_get(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+
+ ctx->data.u32_val = lb_priv->ex->stats.refresh_interval;
+ return 0;
+}
+
+static int lb_stats_refresh_interval_set(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ unsigned int interval;
+
+ interval = ctx->data.u32_val;
+ if (lb_priv->ex->stats.refresh_interval == interval)
+ return 0;
+ lb_priv->ex->stats.refresh_interval = interval;
+ if (interval)
+ schedule_delayed_work(&lb_priv->ex->stats.refresh_dw, 0);
+ else
+ cancel_delayed_work(&lb_priv->ex->stats.refresh_dw);
return 0;
}
@@ -128,30 +532,125 @@ static const struct team_option lb_options[] = {
.getter = lb_bpf_func_get,
.setter = lb_bpf_func_set,
},
+ {
+ .name = "lb_tx_method",
+ .type = TEAM_OPTION_TYPE_STRING,
+ .getter = lb_tx_method_get,
+ .setter = lb_tx_method_set,
+ },
+ {
+ .name = "lb_tx_hash_to_port_mapping",
+ .array_size = LB_TX_HASHTABLE_SIZE,
+ .type = TEAM_OPTION_TYPE_U32,
+ .init = lb_tx_hash_to_port_mapping_init,
+ .getter = lb_tx_hash_to_port_mapping_get,
+ .setter = lb_tx_hash_to_port_mapping_set,
+ },
+ {
+ .name = "lb_hash_stats",
+ .array_size = LB_TX_HASHTABLE_SIZE,
+ .type = TEAM_OPTION_TYPE_BINARY,
+ .init = lb_hash_stats_init,
+ .getter = lb_hash_stats_get,
+ },
+ {
+ .name = "lb_port_stats",
+ .per_port = true,
+ .type = TEAM_OPTION_TYPE_BINARY,
+ .init = lb_port_stats_init,
+ .getter = lb_port_stats_get,
+ },
+ {
+ .name = "lb_stats_refresh_interval",
+ .type = TEAM_OPTION_TYPE_U32,
+ .getter = lb_stats_refresh_interval_get,
+ .setter = lb_stats_refresh_interval_set,
+ },
};
static int lb_init(struct team *team)
{
- return team_options_register(team, lb_options,
- ARRAY_SIZE(lb_options));
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ lb_select_tx_port_func_t *func;
+ int err;
+
+ /* set default tx port selector */
+ func = lb_select_tx_port_get_func("hash");
+ BUG_ON(!func);
+ rcu_assign_pointer(lb_priv->select_tx_port_func, func);
+
+ lb_priv->ex = kzalloc(sizeof(*lb_priv->ex), GFP_KERNEL);
+ if (!lb_priv->ex)
+ return -ENOMEM;
+ lb_priv->ex->team = team;
+
+ lb_priv->pcpu_stats = alloc_percpu(struct lb_pcpu_stats);
+ if (!lb_priv->pcpu_stats) {
+ err = -ENOMEM;
+ goto err_alloc_pcpu_stats;
+ }
+
+ INIT_DELAYED_WORK(&lb_priv->ex->stats.refresh_dw, lb_stats_refresh);
+
+ err = team_options_register(team, lb_options, ARRAY_SIZE(lb_options));
+ if (err)
+ goto err_options_register;
+ return 0;
+
+err_options_register:
+ free_percpu(lb_priv->pcpu_stats);
+err_alloc_pcpu_stats:
+ kfree(lb_priv->ex);
+ return err;
}
static void lb_exit(struct team *team)
{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+
team_options_unregister(team, lb_options,
ARRAY_SIZE(lb_options));
+ cancel_delayed_work_sync(&lb_priv->ex->stats.refresh_dw);
+ free_percpu(lb_priv->pcpu_stats);
+ kfree(lb_priv->ex);
+}
+
+static int lb_port_enter(struct team *team, struct team_port *port)
+{
+ struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
+
+ lb_port_priv->pcpu_stats = alloc_percpu(struct lb_stats);
+ if (!lb_port_priv->pcpu_stats)
+ return -ENOMEM;
+ return 0;
+}
+
+static void lb_port_leave(struct team *team, struct team_port *port)
+{
+ struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
+
+ free_percpu(lb_port_priv->pcpu_stats);
+}
+
+static void lb_port_disabled(struct team *team, struct team_port *port)
+{
+ lb_tx_hash_to_port_mapping_null_port(team, port);
}
static const struct team_mode_ops lb_mode_ops = {
.init = lb_init,
.exit = lb_exit,
+ .port_enter = lb_port_enter,
+ .port_leave = lb_port_leave,
+ .port_disabled = lb_port_disabled,
.transmit = lb_transmit,
};
-static struct team_mode lb_mode = {
+static const struct team_mode lb_mode = {
.kind = "loadbalance",
.owner = THIS_MODULE,
.priv_size = sizeof(struct lb_priv),
+ .port_priv_size = sizeof(struct lb_port_priv),
.ops = &lb_mode_ops,
};
diff --git a/drivers/net/team/team_mode_roundrobin.c b/drivers/net/team/team_mode_roundrobin.c
index 6abfbdc..52dd0ec 100644
--- a/drivers/net/team/team_mode_roundrobin.c
+++ b/drivers/net/team/team_mode_roundrobin.c
@@ -1,5 +1,5 @@
/*
- * net/drivers/team/team_mode_roundrobin.c - Round-robin mode for team
+ * drivers/net/team/team_mode_roundrobin.c - Round-robin mode for team
* Copyright (c) 2011 Jiri Pirko <jpirko(a)redhat.com>
*
* This program is free software; you can redistribute it and/or modify
@@ -81,7 +81,7 @@ static const struct team_mode_ops rr_mode_ops = {
.port_change_mac = rr_port_change_mac,
};
-static struct team_mode rr_mode = {
+static const struct team_mode rr_mode = {
.kind = "roundrobin",
.owner = THIS_MODULE,
.priv_size = sizeof(struct rr_priv),
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index 8185f57..99efd60 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -60,9 +60,11 @@ struct team_port {
unsigned int mtu;
} orig;
- struct rcu_head rcu;
+ long mode_priv[0];
};
+extern bool team_port_enabled(struct team_port *port);
+
struct team_mode_ops {
int (*init)(struct team *team);
void (*exit)(struct team *team);
@@ -73,6 +75,8 @@ struct team_mode_ops {
int (*port_enter)(struct team *team, struct team_port *port);
void (*port_leave)(struct team *team, struct team_port *port);
void (*port_change_mac)(struct team *team, struct team_port *port);
+ void (*port_enabled)(struct team *team, struct team_port *port);
+ void (*port_disabled)(struct team *team, struct team_port *port);
};
enum team_option_type {
@@ -82,6 +86,11 @@ enum team_option_type {
TEAM_OPTION_TYPE_BOOL,
};
+struct team_option_inst_info {
+ u32 array_index;
+ struct team_port *port; /* != NULL if per-port */
+};
+
struct team_gsetter_ctx {
union {
u32 u32_val;
@@ -92,23 +101,28 @@ struct team_gsetter_ctx {
} bin_val;
bool bool_val;
} data;
- struct team_port *port;
+ struct team_option_inst_info *info;
};
struct team_option {
struct list_head list;
const char *name;
bool per_port;
+ unsigned int array_size; /* != 0 means the option is array */
enum team_option_type type;
+ int (*init)(struct team *team, struct team_option_inst_info *info);
int (*getter)(struct team *team, struct team_gsetter_ctx *ctx);
int (*setter)(struct team *team, struct team_gsetter_ctx *ctx);
};
+extern void team_option_inst_set_change(struct team_option_inst_info *opt_inst_info);
+extern void team_options_change_check(struct team *team);
+
struct team_mode {
- struct list_head list;
const char *kind;
struct module *owner;
size_t priv_size;
+ size_t port_priv_size;
const struct team_mode_ops *ops;
};
@@ -178,8 +192,8 @@ extern int team_options_register(struct team *team,
extern void team_options_unregister(struct team *team,
const struct team_option *option,
size_t option_count);
-extern int team_mode_register(struct team_mode *mode);
-extern int team_mode_unregister(struct team_mode *mode);
+extern int team_mode_register(const struct team_mode *mode);
+extern void team_mode_unregister(const struct team_mode *mode);
#endif /* __KERNEL__ */
@@ -241,6 +255,7 @@ enum {
TEAM_ATTR_OPTION_DATA, /* dynamic */
TEAM_ATTR_OPTION_REMOVED, /* flag */
TEAM_ATTR_OPTION_PORT_IFINDEX, /* u32 */ /* for per-port options */
+ TEAM_ATTR_OPTION_ARRAY_INDEX, /* u32 */ /* for array options */
__TEAM_ATTR_OPTION_MAX,
TEAM_ATTR_OPTION_MAX = __TEAM_ATTR_OPTION_MAX - 1,
11 years, 2 months
[PATCH F17] team: update to latest net-next
by Jiri Pirko
Update team driver to latest net-next.
Split patches available here:
http://people.redhat.com/jpirko/f17_team_update/
Modification of the kernel config is needed:
+CONFIG_NET_TEAM_MODE_LOADBALANCE=m
David S. Miller (2):
team: Stop using NLA_PUT*().
team: Revert previous two changes.
Jiri Pirko (37):
filter: Allow to create sk-unattached filters
filter: add XOR operation
team: add binary option type
team: add loadbalance mode
team: add support for per-port options
team: add bool option type
team: add user_linkup and user_linkup_enabled per-port option
team: ab: walk through port list non-rcu
team: add missed "statics"
team: lb: let userspace care about port macs
team: allow to enable/disable ports
team: add per-port option for enabling/disabling ports
team: make team_mode struct const
team: for nomode use dummy struct team_mode
team: add mode priv to port
team: lb: push hash counting into separate function
team: allow read/write-only options
team: introduce array options
team: comments: s/net\/drivers\/team/drivers\/net\/team/
team: push array_index and port into separate structure
team: allow async option changes
team: fix error path in team_nl_fill_options_get()
team: fix error path in team_nl_fill_port_list_get()
team: allow to specify one option instance to be send to userspace
team: pass NULL to __team_option_inst_add() instead of 0
team: add port_[enabled/disabled] mode callbacks
team: lb: introduce infrastructure for userspace driven tx
loadbalancing
team: implement multipart netlink messages for options transfers
team: ensure correct order of netlink messages delivery
team: allow to send multiple set events in one message
team: use rcu_dereference_bh() in tx path
team: use rcu_access_pointer to access RCU pointer by writer
team: use RCU_INIT_POINTER for NULL assignment of RCU pointer
team: do RCU update path fixups
team: fix team_adjust_ops with regard to enabled ports
team: do not allow to map disabled ports
team: remove unused rcu_head field from team_port struct
drivers/net/team/Kconfig | 11 +
drivers/net/team/Makefile | 1 +
drivers/net/team/team.c | 862 ++++++++++++++++++++++-------
drivers/net/team/team_mode_activebackup.c | 30 +-
drivers/net/team/team_mode_loadbalance.c | 673 ++++++++++++++++++++++
drivers/net/team/team_mode_roundrobin.c | 6 +-
include/linux/filter.h | 7 +-
include/linux/if_team.h | 90 ++-
net/core/filter.c | 70 ++-
9 files changed, 1514 insertions(+), 236 deletions(-)
create mode 100644 drivers/net/team/team_mode_loadbalance.c
Signed-off-by: Jiri Pirko <jpirko(a)redhat.com>
diff --git a/drivers/net/team/Kconfig b/drivers/net/team/Kconfig
index 248a144..89024d5 100644
--- a/drivers/net/team/Kconfig
+++ b/drivers/net/team/Kconfig
@@ -40,4 +40,15 @@ config NET_TEAM_MODE_ACTIVEBACKUP
To compile this team mode as a module, choose M here: the module
will be called team_mode_activebackup.
+config NET_TEAM_MODE_LOADBALANCE
+ tristate "Load-balance mode support"
+ depends on NET_TEAM
+ ---help---
+ This mode provides load balancing functionality. Tx port selection
+ is done using BPF function set up from userspace (bpf_hash_func
+ option)
+
+ To compile this team mode as a module, choose M here: the module
+ will be called team_mode_loadbalance.
+
endif # NET_TEAM
diff --git a/drivers/net/team/Makefile b/drivers/net/team/Makefile
index 85f2028..fb9f4c1 100644
--- a/drivers/net/team/Makefile
+++ b/drivers/net/team/Makefile
@@ -5,3 +5,4 @@
obj-$(CONFIG_NET_TEAM) += team.o
obj-$(CONFIG_NET_TEAM_MODE_ROUNDROBIN) += team_mode_roundrobin.o
obj-$(CONFIG_NET_TEAM_MODE_ACTIVEBACKUP) += team_mode_activebackup.o
+obj-$(CONFIG_NET_TEAM_MODE_LOADBALANCE) += team_mode_loadbalance.o
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 8f81805..5350eea 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1,5 +1,5 @@
/*
- * net/drivers/team/team.c - Network team device driver
+ * drivers/net/team/team.c - Network team device driver
* Copyright (c) 2011 Jiri Pirko <jpirko(a)redhat.com>
*
* This program is free software; you can redistribute it and/or modify
@@ -65,7 +65,7 @@ static int __set_port_mac(struct net_device *port_dev,
return dev_set_mac_address(port_dev, &addr);
}
-int team_port_set_orig_mac(struct team_port *port)
+static int team_port_set_orig_mac(struct team_port *port)
{
return __set_port_mac(port->dev, port->orig.dev_addr);
}
@@ -76,12 +76,28 @@ int team_port_set_team_mac(struct team_port *port)
}
EXPORT_SYMBOL(team_port_set_team_mac);
+static void team_refresh_port_linkup(struct team_port *port)
+{
+ port->linkup = port->user.linkup_enabled ? port->user.linkup :
+ port->state.linkup;
+}
+
/*******************
* Options handling
*******************/
-struct team_option *__team_find_option(struct team *team, const char *opt_name)
+struct team_option_inst { /* One for each option instance */
+ struct list_head list;
+ struct list_head tmp_list;
+ struct team_option *option;
+ struct team_option_inst_info info;
+ bool changed;
+ bool removed;
+};
+
+static struct team_option *__team_find_option(struct team *team,
+ const char *opt_name)
{
struct team_option *option;
@@ -92,9 +108,140 @@ struct team_option *__team_find_option(struct team *team, const char *opt_name)
return NULL;
}
-int __team_options_register(struct team *team,
- const struct team_option *option,
- size_t option_count)
+static void __team_option_inst_del(struct team_option_inst *opt_inst)
+{
+ list_del(&opt_inst->list);
+ kfree(opt_inst);
+}
+
+static void __team_option_inst_del_option(struct team *team,
+ struct team_option *option)
+{
+ struct team_option_inst *opt_inst, *tmp;
+
+ list_for_each_entry_safe(opt_inst, tmp, &team->option_inst_list, list) {
+ if (opt_inst->option == option)
+ __team_option_inst_del(opt_inst);
+ }
+}
+
+static int __team_option_inst_add(struct team *team, struct team_option *option,
+ struct team_port *port)
+{
+ struct team_option_inst *opt_inst;
+ unsigned int array_size;
+ unsigned int i;
+ int err;
+
+ array_size = option->array_size;
+ if (!array_size)
+ array_size = 1; /* No array but still need one instance */
+
+ for (i = 0; i < array_size; i++) {
+ opt_inst = kmalloc(sizeof(*opt_inst), GFP_KERNEL);
+ if (!opt_inst)
+ return -ENOMEM;
+ opt_inst->option = option;
+ opt_inst->info.port = port;
+ opt_inst->info.array_index = i;
+ opt_inst->changed = true;
+ opt_inst->removed = false;
+ list_add_tail(&opt_inst->list, &team->option_inst_list);
+ if (option->init) {
+ err = option->init(team, &opt_inst->info);
+ if (err)
+ return err;
+ }
+
+ }
+ return 0;
+}
+
+static int __team_option_inst_add_option(struct team *team,
+ struct team_option *option)
+{
+ struct team_port *port;
+ int err;
+
+ if (!option->per_port) {
+ err = __team_option_inst_add(team, option, NULL);
+ if (err)
+ goto inst_del_option;
+ }
+
+ list_for_each_entry(port, &team->port_list, list) {
+ err = __team_option_inst_add(team, option, port);
+ if (err)
+ goto inst_del_option;
+ }
+ return 0;
+
+inst_del_option:
+ __team_option_inst_del_option(team, option);
+ return err;
+}
+
+static void __team_option_inst_mark_removed_option(struct team *team,
+ struct team_option *option)
+{
+ struct team_option_inst *opt_inst;
+
+ list_for_each_entry(opt_inst, &team->option_inst_list, list) {
+ if (opt_inst->option == option) {
+ opt_inst->changed = true;
+ opt_inst->removed = true;
+ }
+ }
+}
+
+static void __team_option_inst_del_port(struct team *team,
+ struct team_port *port)
+{
+ struct team_option_inst *opt_inst, *tmp;
+
+ list_for_each_entry_safe(opt_inst, tmp, &team->option_inst_list, list) {
+ if (opt_inst->option->per_port &&
+ opt_inst->info.port == port)
+ __team_option_inst_del(opt_inst);
+ }
+}
+
+static int __team_option_inst_add_port(struct team *team,
+ struct team_port *port)
+{
+ struct team_option *option;
+ int err;
+
+ list_for_each_entry(option, &team->option_list, list) {
+ if (!option->per_port)
+ continue;
+ err = __team_option_inst_add(team, option, port);
+ if (err)
+ goto inst_del_port;
+ }
+ return 0;
+
+inst_del_port:
+ __team_option_inst_del_port(team, port);
+ return err;
+}
+
+static void __team_option_inst_mark_removed_port(struct team *team,
+ struct team_port *port)
+{
+ struct team_option_inst *opt_inst;
+
+ list_for_each_entry(opt_inst, &team->option_inst_list, list) {
+ if (opt_inst->info.port == port) {
+ opt_inst->changed = true;
+ opt_inst->removed = true;
+ }
+ }
+}
+
+static int __team_options_register(struct team *team,
+ const struct team_option *option,
+ size_t option_count)
{
int i;
struct team_option **dst_opts;
@@ -107,26 +254,32 @@ int __team_options_register(struct team *team,
for (i = 0; i < option_count; i++, option++) {
if (__team_find_option(team, option->name)) {
err = -EEXIST;
- goto rollback;
+ goto alloc_rollback;
}
dst_opts[i] = kmemdup(option, sizeof(*option), GFP_KERNEL);
if (!dst_opts[i]) {
err = -ENOMEM;
- goto rollback;
+ goto alloc_rollback;
}
}
for (i = 0; i < option_count; i++) {
- dst_opts[i]->changed = true;
- dst_opts[i]->removed = false;
+ err = __team_option_inst_add_option(team, dst_opts[i]);
+ if (err)
+ goto inst_rollback;
list_add_tail(&dst_opts[i]->list, &team->option_list);
}
kfree(dst_opts);
return 0;
-rollback:
- for (i = 0; i < option_count; i++)
+inst_rollback:
+ for (i--; i >= 0; i--)
+ __team_option_inst_del_option(team, dst_opts[i]);
+
+ i = option_count - 1;
+alloc_rollback:
+ for (i--; i >= 0; i--)
kfree(dst_opts[i]);
kfree(dst_opts);
@@ -143,10 +296,8 @@ static void __team_options_mark_removed(struct team *team,
struct team_option *del_opt;
del_opt = __team_find_option(team, option->name);
- if (del_opt) {
- del_opt->changed = true;
- del_opt->removed = true;
- }
+ if (del_opt)
+ __team_option_inst_mark_removed_option(team, del_opt);
}
}
@@ -161,6 +312,7 @@ static void __team_options_unregister(struct team *team,
del_opt = __team_find_option(team, option->name);
if (del_opt) {
+ __team_option_inst_del_option(team, del_opt);
list_del(&del_opt->list);
kfree(del_opt);
}
@@ -193,25 +345,39 @@ void team_options_unregister(struct team *team,
}
EXPORT_SYMBOL(team_options_unregister);
-static int team_option_get(struct team *team, struct team_option *option,
- void *arg)
+static int team_option_get(struct team *team,
+ struct team_option_inst *opt_inst,
+ struct team_gsetter_ctx *ctx)
{
- return option->getter(team, arg);
+ if (!opt_inst->option->getter)
+ return -EOPNOTSUPP;
+ return opt_inst->option->getter(team, ctx);
}
-static int team_option_set(struct team *team, struct team_option *option,
- void *arg)
+static int team_option_set(struct team *team,
+ struct team_option_inst *opt_inst,
+ struct team_gsetter_ctx *ctx)
{
- int err;
+ if (!opt_inst->option->setter)
+ return -EOPNOTSUPP;
+ return opt_inst->option->setter(team, ctx);
+}
- err = option->setter(team, arg);
- if (err)
- return err;
+void team_option_inst_set_change(struct team_option_inst_info *opt_inst_info)
+{
+ struct team_option_inst *opt_inst;
+
+ opt_inst = container_of(opt_inst_info, struct team_option_inst, info);
+ opt_inst->changed = true;
+}
+EXPORT_SYMBOL(team_option_inst_set_change);
- option->changed = true;
+void team_options_change_check(struct team *team)
+{
__team_options_change_check(team);
- return err;
}
+EXPORT_SYMBOL(team_options_change_check);
+
/****************
* Mode handling
@@ -220,13 +386,18 @@ static int team_option_set(struct team *team, struct team_option *option,
static LIST_HEAD(mode_list);
static DEFINE_SPINLOCK(mode_list_lock);
-static struct team_mode *__find_mode(const char *kind)
+struct team_mode_item {
+ struct list_head list;
+ const struct team_mode *mode;
+};
+
+static struct team_mode_item *__find_mode(const char *kind)
{
- struct team_mode *mode;
+ struct team_mode_item *mitem;
- list_for_each_entry(mode, &mode_list, list) {
- if (strcmp(mode->kind, kind) == 0)
- return mode;
+ list_for_each_entry(mitem, &mode_list, list) {
+ if (strcmp(mitem->mode->kind, kind) == 0)
+ return mitem;
}
return NULL;
}
@@ -241,49 +412,65 @@ static bool is_good_mode_name(const char *name)
return true;
}
-int team_mode_register(struct team_mode *mode)
+int team_mode_register(const struct team_mode *mode)
{
int err = 0;
+ struct team_mode_item *mitem;
if (!is_good_mode_name(mode->kind) ||
mode->priv_size > TEAM_MODE_PRIV_SIZE)
return -EINVAL;
+
+ mitem = kmalloc(sizeof(*mitem), GFP_KERNEL);
+ if (!mitem)
+ return -ENOMEM;
+
spin_lock(&mode_list_lock);
if (__find_mode(mode->kind)) {
err = -EEXIST;
+ kfree(mitem);
goto unlock;
}
- list_add_tail(&mode->list, &mode_list);
+ mitem->mode = mode;
+ list_add_tail(&mitem->list, &mode_list);
unlock:
spin_unlock(&mode_list_lock);
return err;
}
EXPORT_SYMBOL(team_mode_register);
-int team_mode_unregister(struct team_mode *mode)
+void team_mode_unregister(const struct team_mode *mode)
{
+ struct team_mode_item *mitem;
+
spin_lock(&mode_list_lock);
- list_del_init(&mode->list);
+ mitem = __find_mode(mode->kind);
+ if (mitem) {
+ list_del_init(&mitem->list);
+ kfree(mitem);
+ }
spin_unlock(&mode_list_lock);
- return 0;
}
EXPORT_SYMBOL(team_mode_unregister);
-static struct team_mode *team_mode_get(const char *kind)
+static const struct team_mode *team_mode_get(const char *kind)
{
- struct team_mode *mode;
+ struct team_mode_item *mitem;
+ const struct team_mode *mode = NULL;
spin_lock(&mode_list_lock);
- mode = __find_mode(kind);
- if (!mode) {
+ mitem = __find_mode(kind);
+ if (!mitem) {
spin_unlock(&mode_list_lock);
request_module("team-mode-%s", kind);
spin_lock(&mode_list_lock);
- mode = __find_mode(kind);
+ mitem = __find_mode(kind);
}
- if (mode)
+ if (mitem) {
+ mode = mitem->mode;
if (!try_module_get(mode->owner))
mode = NULL;
+ }
spin_unlock(&mode_list_lock);
return mode;
@@ -307,26 +494,45 @@ rx_handler_result_t team_dummy_receive(struct team *team,
return RX_HANDLER_ANOTHER;
}
-static void team_adjust_ops(struct team *team)
+static const struct team_mode __team_no_mode = {
+ .kind = "*NOMODE*",
+};
+
+static bool team_is_mode_set(struct team *team)
+{
+ return team->mode != &__team_no_mode;
+}
+
+static void team_set_no_mode(struct team *team)
+{
+ team->mode = &__team_no_mode;
+}
+
+static void __team_adjust_ops(struct team *team, int en_port_count)
{
/*
* To avoid checks in rx/tx skb paths, ensure here that non-null and
* correct ops are always set.
*/
- if (list_empty(&team->port_list) ||
- !team->mode || !team->mode->ops->transmit)
+ if (!en_port_count || !team_is_mode_set(team) ||
+ !team->mode->ops->transmit)
team->ops.transmit = team_dummy_transmit;
else
team->ops.transmit = team->mode->ops->transmit;
- if (list_empty(&team->port_list) ||
- !team->mode || !team->mode->ops->receive)
+ if (!en_port_count || !team_is_mode_set(team) ||
+ !team->mode->ops->receive)
team->ops.receive = team_dummy_receive;
else
team->ops.receive = team->mode->ops->receive;
}
+static void team_adjust_ops(struct team *team)
+{
+ __team_adjust_ops(team, team->en_port_count);
+}
+
/*
* We can benefit from the fact that it's ensured no port is present
* at the time of mode change. Therefore no packets are in fly so there's no
@@ -336,7 +542,7 @@ static int __team_change_mode(struct team *team,
const struct team_mode *new_mode)
{
/* Check if mode was previously set and do cleanup if so */
- if (team->mode) {
+ if (team_is_mode_set(team)) {
void (*exit_op)(struct team *team) = team->ops.exit;
/* Clear ops area so no callback is called any longer */
@@ -346,7 +552,7 @@ static int __team_change_mode(struct team *team,
if (exit_op)
exit_op(team);
team_mode_put(team->mode);
- team->mode = NULL;
+ team_set_no_mode(team);
/* zero private data area */
memset(&team->mode_priv, 0,
sizeof(struct team) - offsetof(struct team, mode_priv));
@@ -372,7 +578,7 @@ static int __team_change_mode(struct team *team,
static int team_change_mode(struct team *team, const char *kind)
{
- struct team_mode *new_mode;
+ const struct team_mode *new_mode;
struct net_device *dev = team->dev;
int err;
@@ -381,7 +587,7 @@ static int team_change_mode(struct team *team, const char *kind)
return -EBUSY;
}
- if (team->mode && strcmp(team->mode->kind, kind) == 0) {
+ if (team_is_mode_set(team) && strcmp(team->mode->kind, kind) == 0) {
netdev_err(dev, "Unable to change to the same mode the team is in\n");
return -EINVAL;
}
@@ -424,8 +630,12 @@ static rx_handler_result_t team_handle_frame(struct sk_buff **pskb)
port = team_port_get_rcu(skb->dev);
team = port->team;
-
- res = team->ops.receive(team, port, skb);
+ if (!team_port_enabled(port)) {
+ /* allow exact match delivery for disabled ports */
+ res = RX_HANDLER_EXACT;
+ } else {
+ res = team->ops.receive(team, port, skb);
+ }
if (res == RX_HANDLER_ANOTHER) {
struct team_pcpu_stats *pcpu_stats;
@@ -461,17 +671,29 @@ static bool team_port_find(const struct team *team,
return false;
}
+bool team_port_enabled(struct team_port *port)
+{
+ return port->index != -1;
+}
+EXPORT_SYMBOL(team_port_enabled);
+
/*
- * Add/delete port to the team port list. Write guarded by rtnl_lock.
- * Takes care of correct port->index setup (might be racy).
+ * Enable/disable port by adding to enabled port hashlist and setting
+ * port->index (Might be racy so reader could see incorrect ifindex when
+ * processing a flying packet, but that is not a problem). Write guarded
+ * by team->lock.
*/
-static void team_port_list_add_port(struct team *team,
- struct team_port *port)
+static void team_port_enable(struct team *team,
+ struct team_port *port)
{
- port->index = team->port_count++;
+ if (team_port_enabled(port))
+ return;
+ port->index = team->en_port_count++;
hlist_add_head_rcu(&port->hlist,
team_port_index_hash(team, port->index));
- list_add_tail_rcu(&port->list, &team->port_list);
+ team_adjust_ops(team);
+ if (team->ops.port_enabled)
+ team->ops.port_enabled(team, port);
}
static void __reconstruct_port_hlist(struct team *team, int rm_index)
@@ -479,7 +701,7 @@ static void __reconstruct_port_hlist(struct team *team, int rm_index)
int i;
struct team_port *port;
- for (i = rm_index + 1; i < team->port_count; i++) {
+ for (i = rm_index + 1; i < team->en_port_count; i++) {
port = team_get_port_by_index(team, i);
hlist_del_rcu(&port->hlist);
port->index--;
@@ -488,15 +710,23 @@ static void __reconstruct_port_hlist(struct team *team, int rm_index)
}
}
-static void team_port_list_del_port(struct team *team,
- struct team_port *port)
+static void team_port_disable(struct team *team,
+ struct team_port *port)
{
- int rm_index = port->index;
-
+ if (!team_port_enabled(port))
+ return;
+ if (team->ops.port_disabled)
+ team->ops.port_disabled(team, port);
hlist_del_rcu(&port->hlist);
- list_del_rcu(&port->list);
- __reconstruct_port_hlist(team, rm_index);
- team->port_count--;
+ __reconstruct_port_hlist(team, port->index);
+ port->index = -1;
+ __team_adjust_ops(team, team->en_port_count - 1);
+ /*
+ * Wait until readers see adjusted ops. This ensures that
+ * readers never see team->en_port_count == 0
+ */
+ synchronize_rcu();
+ team->en_port_count--;
}
#define TEAM_VLAN_FEATURES (NETIF_F_ALL_CSUM | NETIF_F_SG | \
@@ -591,7 +821,8 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
return -EBUSY;
}
- port = kzalloc(sizeof(struct team_port), GFP_KERNEL);
+ port = kzalloc(sizeof(struct team_port) + team->mode->port_priv_size,
+ GFP_KERNEL);
if (!port)
return -ENOMEM;
@@ -642,15 +873,27 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
goto err_handler_register;
}
- team_port_list_add_port(team, port);
- team_adjust_ops(team);
+ err = __team_option_inst_add_port(team, port);
+ if (err) {
+ netdev_err(dev, "Device %s failed to add per-port options\n",
+ portname);
+ goto err_option_port_add;
+ }
+
+ port->index = -1;
+ team_port_enable(team, port);
+ list_add_tail_rcu(&port->list, &team->port_list);
__team_compute_features(team);
__team_port_change_check(port, !!netif_carrier_ok(port_dev));
+ __team_options_change_check(team);
netdev_info(dev, "Port device %s added\n", portname);
return 0;
+err_option_port_add:
+ netdev_rx_handler_unregister(port_dev);
+
err_handler_register:
netdev_set_master(port_dev, NULL);
@@ -686,10 +929,13 @@ static int team_port_del(struct team *team, struct net_device *port_dev)
return -ENOENT;
}
+ __team_option_inst_mark_removed_port(team, port);
+ __team_options_change_check(team);
+ __team_option_inst_del_port(team, port);
port->removed = true;
__team_port_change_check(port, false);
- team_port_list_del_port(team, port);
- team_adjust_ops(team);
+ team_port_disable(team, port);
+ list_del_rcu(&port->list);
netdev_rx_handler_unregister(port_dev);
netdev_set_master(port_dev, NULL);
vlan_vids_del_by_dev(port_dev, dev);
@@ -710,21 +956,74 @@ static int team_port_del(struct team *team, struct net_device *port_dev)
* Net device ops
*****************/
-static const char team_no_mode_kind[] = "*NOMODE*";
+static int team_mode_option_get(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ ctx->data.str_val = team->mode->kind;
+ return 0;
+}
+
+static int team_mode_option_set(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ return team_change_mode(team, ctx->data.str_val);
+}
+
+static int team_port_en_option_get(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct team_port *port = ctx->info->port;
+
+ ctx->data.bool_val = team_port_enabled(port);
+ return 0;
+}
+
+static int team_port_en_option_set(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct team_port *port = ctx->info->port;
+
+ if (ctx->data.bool_val)
+ team_port_enable(team, port);
+ else
+ team_port_disable(team, port);
+ return 0;
+}
+
+static int team_user_linkup_option_get(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct team_port *port = ctx->info->port;
+
+ ctx->data.bool_val = port->user.linkup;
+ return 0;
+}
-static int team_mode_option_get(struct team *team, void *arg)
+static int team_user_linkup_option_set(struct team *team,
+ struct team_gsetter_ctx *ctx)
{
- const char **str = arg;
+ struct team_port *port = ctx->info->port;
- *str = team->mode ? team->mode->kind : team_no_mode_kind;
+ port->user.linkup = ctx->data.bool_val;
+ team_refresh_port_linkup(port);
return 0;
}
-static int team_mode_option_set(struct team *team, void *arg)
+static int team_user_linkup_en_option_get(struct team *team,
+ struct team_gsetter_ctx *ctx)
{
- const char **str = arg;
+ struct team_port *port = ctx->info->port;
- return team_change_mode(team, *str);
+ ctx->data.bool_val = port->user.linkup_enabled;
+ return 0;
+}
+
+static int team_user_linkup_en_option_set(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct team_port *port = ctx->info->port;
+
+ port->user.linkup_enabled = ctx->data.bool_val;
+ team_refresh_port_linkup(port);
+ return 0;
}
static const struct team_option team_options[] = {
@@ -734,6 +1033,27 @@ static const struct team_option team_options[] = {
.getter = team_mode_option_get,
.setter = team_mode_option_set,
},
+ {
+ .name = "enabled",
+ .type = TEAM_OPTION_TYPE_BOOL,
+ .per_port = true,
+ .getter = team_port_en_option_get,
+ .setter = team_port_en_option_set,
+ },
+ {
+ .name = "user_linkup",
+ .type = TEAM_OPTION_TYPE_BOOL,
+ .per_port = true,
+ .getter = team_user_linkup_option_get,
+ .setter = team_user_linkup_option_set,
+ },
+ {
+ .name = "user_linkup_enabled",
+ .type = TEAM_OPTION_TYPE_BOOL,
+ .per_port = true,
+ .getter = team_user_linkup_en_option_get,
+ .setter = team_user_linkup_en_option_set,
+ },
};
static int team_init(struct net_device *dev)
@@ -744,18 +1064,20 @@ static int team_init(struct net_device *dev)
team->dev = dev;
mutex_init(&team->lock);
+ team_set_no_mode(team);
team->pcpu_stats = alloc_percpu(struct team_pcpu_stats);
if (!team->pcpu_stats)
return -ENOMEM;
for (i = 0; i < TEAM_PORT_HASHENTRIES; i++)
- INIT_HLIST_HEAD(&team->port_hlist[i]);
+ INIT_HLIST_HEAD(&team->en_port_hlist[i]);
INIT_LIST_HEAD(&team->port_list);
team_adjust_ops(team);
INIT_LIST_HEAD(&team->option_list);
+ INIT_LIST_HEAD(&team->option_inst_list);
err = team_options_register(team, team_options, ARRAY_SIZE(team_options));
if (err)
goto err_options_register;
@@ -1145,10 +1467,7 @@ team_nl_option_policy[TEAM_ATTR_OPTION_MAX + 1] = {
},
[TEAM_ATTR_OPTION_CHANGED] = { .type = NLA_FLAG },
[TEAM_ATTR_OPTION_TYPE] = { .type = NLA_U8 },
- [TEAM_ATTR_OPTION_DATA] = {
- .type = NLA_BINARY,
- .len = TEAM_STRING_MAX_LEN,
- },
+ [TEAM_ATTR_OPTION_DATA] = { .type = NLA_BINARY },
};
static int team_nl_cmd_noop(struct sk_buff *skb, struct genl_info *info)
@@ -1235,98 +1554,210 @@ err_fill:
return err;
}
-static int team_nl_fill_options_get(struct sk_buff *skb,
- u32 pid, u32 seq, int flags,
- struct team *team, bool fillall)
+typedef int team_nl_send_func_t(struct sk_buff *skb,
+ struct team *team, u32 pid);
+
+static int team_nl_send_unicast(struct sk_buff *skb, struct team *team, u32 pid)
+{
+ return genlmsg_unicast(dev_net(team->dev), skb, pid);
+}
+
+static int team_nl_fill_one_option_get(struct sk_buff *skb, struct team *team,
+ struct team_option_inst *opt_inst)
+{
+ struct nlattr *option_item;
+ struct team_option *option = opt_inst->option;
+ struct team_option_inst_info *opt_inst_info = &opt_inst->info;
+ struct team_gsetter_ctx ctx;
+ int err;
+
+ ctx.info = opt_inst_info;
+ err = team_option_get(team, opt_inst, &ctx);
+ if (err)
+ return err;
+
+ option_item = nla_nest_start(skb, TEAM_ATTR_ITEM_OPTION);
+ if (!option_item)
+ return -EMSGSIZE;
+
+ if (nla_put_string(skb, TEAM_ATTR_OPTION_NAME, option->name))
+ goto nest_cancel;
+ if (opt_inst_info->port &&
+ nla_put_u32(skb, TEAM_ATTR_OPTION_PORT_IFINDEX,
+ opt_inst_info->port->dev->ifindex))
+ goto nest_cancel;
+ if (opt_inst->option->array_size &&
+ nla_put_u32(skb, TEAM_ATTR_OPTION_ARRAY_INDEX,
+ opt_inst_info->array_index))
+ goto nest_cancel;
+
+ switch (option->type) {
+ case TEAM_OPTION_TYPE_U32:
+ if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_U32))
+ goto nest_cancel;
+ if (nla_put_u32(skb, TEAM_ATTR_OPTION_DATA, ctx.data.u32_val))
+ goto nest_cancel;
+ break;
+ case TEAM_OPTION_TYPE_STRING:
+ if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_STRING))
+ goto nest_cancel;
+ if (nla_put_string(skb, TEAM_ATTR_OPTION_DATA,
+ ctx.data.str_val))
+ goto nest_cancel;
+ break;
+ case TEAM_OPTION_TYPE_BINARY:
+ if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_BINARY))
+ goto nest_cancel;
+ if (nla_put(skb, TEAM_ATTR_OPTION_DATA, ctx.data.bin_val.len,
+ ctx.data.bin_val.ptr))
+ goto nest_cancel;
+ break;
+ case TEAM_OPTION_TYPE_BOOL:
+ if (nla_put_u8(skb, TEAM_ATTR_OPTION_TYPE, NLA_FLAG))
+ goto nest_cancel;
+ if (ctx.data.bool_val &&
+ nla_put_flag(skb, TEAM_ATTR_OPTION_DATA))
+ goto nest_cancel;
+ break;
+ default:
+ BUG();
+ }
+ if (opt_inst->removed && nla_put_flag(skb, TEAM_ATTR_OPTION_REMOVED))
+ goto nest_cancel;
+ if (opt_inst->changed) {
+ if (nla_put_flag(skb, TEAM_ATTR_OPTION_CHANGED))
+ goto nest_cancel;
+ opt_inst->changed = false;
+ }
+ nla_nest_end(skb, option_item);
+ return 0;
+
+nest_cancel:
+ nla_nest_cancel(skb, option_item);
+ return -EMSGSIZE;
+}
+
+static int __send_and_alloc_skb(struct sk_buff **pskb,
+ struct team *team, u32 pid,
+ team_nl_send_func_t *send_func)
+{
+ int err;
+
+ if (*pskb) {
+ err = send_func(*pskb, team, pid);
+ if (err)
+ return err;
+ }
+ *pskb = genlmsg_new(NLMSG_DEFAULT_SIZE - GENL_HDRLEN, GFP_KERNEL);
+ if (!*pskb)
+ return -ENOMEM;
+ return 0;
+}
+
+static int team_nl_send_options_get(struct team *team, u32 pid, u32 seq,
+ int flags, team_nl_send_func_t *send_func,
+ struct list_head *sel_opt_inst_list)
{
struct nlattr *option_list;
+ struct nlmsghdr *nlh;
void *hdr;
- struct team_option *option;
+ struct team_option_inst *opt_inst;
+ int err;
+ struct sk_buff *skb = NULL;
+ bool incomplete;
+ int i;
- hdr = genlmsg_put(skb, pid, seq, &team_nl_family, flags,
+ opt_inst = list_first_entry(sel_opt_inst_list,
+ struct team_option_inst, tmp_list);
+
+start_again:
+ err = __send_and_alloc_skb(&skb, team, pid, send_func);
+ if (err)
+ return err;
+
+ hdr = genlmsg_put(skb, pid, seq, &team_nl_family, flags | NLM_F_MULTI,
TEAM_CMD_OPTIONS_GET);
if (IS_ERR(hdr))
return PTR_ERR(hdr);
- NLA_PUT_U32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex);
+ if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
+ goto nla_put_failure;
option_list = nla_nest_start(skb, TEAM_ATTR_LIST_OPTION);
if (!option_list)
- return -EMSGSIZE;
+ goto nla_put_failure;
- list_for_each_entry(option, &team->option_list, list) {
- struct nlattr *option_item;
- long arg;
-
- /* Include only changed options if fill all mode is not on */
- if (!fillall && !option->changed)
- continue;
- option_item = nla_nest_start(skb, TEAM_ATTR_ITEM_OPTION);
- if (!option_item)
- goto nla_put_failure;
- NLA_PUT_STRING(skb, TEAM_ATTR_OPTION_NAME, option->name);
- if (option->changed) {
- NLA_PUT_FLAG(skb, TEAM_ATTR_OPTION_CHANGED);
- option->changed = false;
- }
- if (option->removed)
- NLA_PUT_FLAG(skb, TEAM_ATTR_OPTION_REMOVED);
- switch (option->type) {
- case TEAM_OPTION_TYPE_U32:
- NLA_PUT_U8(skb, TEAM_ATTR_OPTION_TYPE, NLA_U32);
- team_option_get(team, option, &arg);
- NLA_PUT_U32(skb, TEAM_ATTR_OPTION_DATA, arg);
- break;
- case TEAM_OPTION_TYPE_STRING:
- NLA_PUT_U8(skb, TEAM_ATTR_OPTION_TYPE, NLA_STRING);
- team_option_get(team, option, &arg);
- NLA_PUT_STRING(skb, TEAM_ATTR_OPTION_DATA,
- (char *) arg);
- break;
- default:
- BUG();
+ i = 0;
+ incomplete = false;
+ list_for_each_entry_from(opt_inst, sel_opt_inst_list, tmp_list) {
+ err = team_nl_fill_one_option_get(skb, team, opt_inst);
+ if (err) {
+ if (err == -EMSGSIZE) {
+ if (!i)
+ goto errout;
+ incomplete = true;
+ break;
+ }
+ goto errout;
}
- nla_nest_end(skb, option_item);
+ i++;
}
nla_nest_end(skb, option_list);
- return genlmsg_end(skb, hdr);
+ genlmsg_end(skb, hdr);
+ if (incomplete)
+ goto start_again;
+
+send_done:
+ nlh = nlmsg_put(skb, pid, seq, NLMSG_DONE, 0, flags | NLM_F_MULTI);
+ if (!nlh) {
+ err = __send_and_alloc_skb(&skb, team, pid, send_func);
+ if (err)
+ goto errout;
+ goto send_done;
+ }
+
+ return send_func(skb, team, pid);
nla_put_failure:
+ err = -EMSGSIZE;
+errout:
genlmsg_cancel(skb, hdr);
- return -EMSGSIZE;
-}
-
-static int team_nl_fill_options_get_all(struct sk_buff *skb,
- struct genl_info *info, int flags,
- struct team *team)
-{
- return team_nl_fill_options_get(skb, info->snd_pid,
- info->snd_seq, NLM_F_ACK,
- team, true);
+ nlmsg_free(skb);
+ return err;
}
static int team_nl_cmd_options_get(struct sk_buff *skb, struct genl_info *info)
{
struct team *team;
+ struct team_option_inst *opt_inst;
int err;
+ LIST_HEAD(sel_opt_inst_list);
team = team_nl_team_get(info);
if (!team)
return -EINVAL;
- err = team_nl_send_generic(info, team, team_nl_fill_options_get_all);
+ list_for_each_entry(opt_inst, &team->option_inst_list, list)
+ list_add_tail(&opt_inst->tmp_list, &sel_opt_inst_list);
+ err = team_nl_send_options_get(team, info->snd_pid, info->snd_seq,
+ NLM_F_ACK, team_nl_send_unicast,
+ &sel_opt_inst_list);
team_nl_team_put(team);
return err;
}
+static int team_nl_send_event_options_get(struct team *team,
+ struct list_head *sel_opt_inst_list);
+
static int team_nl_cmd_options_set(struct sk_buff *skb, struct genl_info *info)
{
struct team *team;
int err = 0;
int i;
struct nlattr *nl_option;
+ LIST_HEAD(opt_inst_list);
team = team_nl_team_get(info);
if (!team)
@@ -1339,9 +1770,14 @@ static int team_nl_cmd_options_set(struct sk_buff *skb, struct genl_info *info)
}
nla_for_each_nested(nl_option, info->attrs[TEAM_ATTR_LIST_OPTION], i) {
- struct nlattr *mode_attrs[TEAM_ATTR_OPTION_MAX + 1];
+ struct nlattr *opt_attrs[TEAM_ATTR_OPTION_MAX + 1];
+ struct nlattr *attr;
+ struct nlattr *attr_data;
enum team_option_type opt_type;
- struct team_option *option;
+ int opt_port_ifindex = 0; /* != 0 for per-port options */
+ u32 opt_array_index = 0;
+ bool opt_is_array = false;
+ struct team_option_inst *opt_inst;
char *opt_name;
bool opt_found = false;
@@ -1349,50 +1785,92 @@ static int team_nl_cmd_options_set(struct sk_buff *skb, struct genl_info *info)
err = -EINVAL;
goto team_put;
}
- err = nla_parse_nested(mode_attrs, TEAM_ATTR_OPTION_MAX,
+ err = nla_parse_nested(opt_attrs, TEAM_ATTR_OPTION_MAX,
nl_option, team_nl_option_policy);
if (err)
goto team_put;
- if (!mode_attrs[TEAM_ATTR_OPTION_NAME] ||
- !mode_attrs[TEAM_ATTR_OPTION_TYPE] ||
- !mode_attrs[TEAM_ATTR_OPTION_DATA]) {
+ if (!opt_attrs[TEAM_ATTR_OPTION_NAME] ||
+ !opt_attrs[TEAM_ATTR_OPTION_TYPE]) {
err = -EINVAL;
goto team_put;
}
- switch (nla_get_u8(mode_attrs[TEAM_ATTR_OPTION_TYPE])) {
+ switch (nla_get_u8(opt_attrs[TEAM_ATTR_OPTION_TYPE])) {
case NLA_U32:
opt_type = TEAM_OPTION_TYPE_U32;
break;
case NLA_STRING:
opt_type = TEAM_OPTION_TYPE_STRING;
break;
+ case NLA_BINARY:
+ opt_type = TEAM_OPTION_TYPE_BINARY;
+ break;
+ case NLA_FLAG:
+ opt_type = TEAM_OPTION_TYPE_BOOL;
+ break;
default:
goto team_put;
}
- opt_name = nla_data(mode_attrs[TEAM_ATTR_OPTION_NAME]);
- list_for_each_entry(option, &team->option_list, list) {
- long arg;
- struct nlattr *opt_data_attr;
+ attr_data = opt_attrs[TEAM_ATTR_OPTION_DATA];
+ if (opt_type != TEAM_OPTION_TYPE_BOOL && !attr_data) {
+ err = -EINVAL;
+ goto team_put;
+ }
+
+ opt_name = nla_data(opt_attrs[TEAM_ATTR_OPTION_NAME]);
+ attr = opt_attrs[TEAM_ATTR_OPTION_PORT_IFINDEX];
+ if (attr)
+ opt_port_ifindex = nla_get_u32(attr);
+
+ attr = opt_attrs[TEAM_ATTR_OPTION_ARRAY_INDEX];
+ if (attr) {
+ opt_is_array = true;
+ opt_array_index = nla_get_u32(attr);
+ }
+ list_for_each_entry(opt_inst, &team->option_inst_list, list) {
+ struct team_option *option = opt_inst->option;
+ struct team_gsetter_ctx ctx;
+ struct team_option_inst_info *opt_inst_info;
+ int tmp_ifindex;
+
+ opt_inst_info = &opt_inst->info;
+ tmp_ifindex = opt_inst_info->port ?
+ opt_inst_info->port->dev->ifindex : 0;
if (option->type != opt_type ||
- strcmp(option->name, opt_name))
+ strcmp(option->name, opt_name) ||
+ tmp_ifindex != opt_port_ifindex ||
+ (option->array_size && !opt_is_array) ||
+ opt_inst_info->array_index != opt_array_index)
continue;
opt_found = true;
- opt_data_attr = mode_attrs[TEAM_ATTR_OPTION_DATA];
+ ctx.info = opt_inst_info;
switch (opt_type) {
case TEAM_OPTION_TYPE_U32:
- arg = nla_get_u32(opt_data_attr);
+ ctx.data.u32_val = nla_get_u32(attr_data);
break;
case TEAM_OPTION_TYPE_STRING:
- arg = (long) nla_data(opt_data_attr);
+ if (nla_len(attr_data) > TEAM_STRING_MAX_LEN) {
+ err = -EINVAL;
+ goto team_put;
+ }
+ ctx.data.str_val = nla_data(attr_data);
+ break;
+ case TEAM_OPTION_TYPE_BINARY:
+ ctx.data.bin_val.len = nla_len(attr_data);
+ ctx.data.bin_val.ptr = nla_data(attr_data);
+ break;
+ case TEAM_OPTION_TYPE_BOOL:
+ ctx.data.bool_val = attr_data ? true : false;
break;
default:
BUG();
}
- err = team_option_set(team, option, &arg);
+ err = team_option_set(team, opt_inst, &ctx);
if (err)
goto team_put;
+ opt_inst->changed = true;
+ list_add(&opt_inst->tmp_list, &opt_inst_list);
}
if (!opt_found) {
err = -ENOENT;
@@ -1400,6 +1878,8 @@ static int team_nl_cmd_options_set(struct sk_buff *skb, struct genl_info *info)
}
}
+ err = team_nl_send_event_options_get(team, &opt_inst_list);
+
team_put:
team_nl_team_put(team);
@@ -1420,10 +1900,11 @@ static int team_nl_fill_port_list_get(struct sk_buff *skb,
if (IS_ERR(hdr))
return PTR_ERR(hdr);
- NLA_PUT_U32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex);
+ if (nla_put_u32(skb, TEAM_ATTR_TEAM_IFINDEX, team->dev->ifindex))
+ goto nla_put_failure;
port_list = nla_nest_start(skb, TEAM_ATTR_LIST_PORT);
if (!port_list)
- return -EMSGSIZE;
+ goto nla_put_failure;
list_for_each_entry(port, &team->port_list, list) {
struct nlattr *port_item;
@@ -1434,17 +1915,20 @@ static int team_nl_fill_port_list_get(struct sk_buff *skb,
port_item = nla_nest_start(skb, TEAM_ATTR_ITEM_PORT);
if (!port_item)
goto nla_put_failure;
- NLA_PUT_U32(skb, TEAM_ATTR_PORT_IFINDEX, port->dev->ifindex);
+ if (nla_put_u32(skb, TEAM_ATTR_PORT_IFINDEX, port->dev->ifindex))
+ goto nla_put_failure;
if (port->changed) {
- NLA_PUT_FLAG(skb, TEAM_ATTR_PORT_CHANGED);
+ if (nla_put_flag(skb, TEAM_ATTR_PORT_CHANGED))
+ goto nla_put_failure;
port->changed = false;
}
- if (port->removed)
- NLA_PUT_FLAG(skb, TEAM_ATTR_PORT_REMOVED);
- if (port->linkup)
- NLA_PUT_FLAG(skb, TEAM_ATTR_PORT_LINKUP);
- NLA_PUT_U32(skb, TEAM_ATTR_PORT_SPEED, port->speed);
- NLA_PUT_U8(skb, TEAM_ATTR_PORT_DUPLEX, port->duplex);
+ if ((port->removed &&
+ nla_put_flag(skb, TEAM_ATTR_PORT_REMOVED)) ||
+ (port->state.linkup &&
+ nla_put_flag(skb, TEAM_ATTR_PORT_LINKUP)) ||
+ nla_put_u32(skb, TEAM_ATTR_PORT_SPEED, port->state.speed) ||
+ nla_put_u8(skb, TEAM_ATTR_PORT_DUPLEX, port->state.duplex))
+ goto nla_put_failure;
nla_nest_end(skb, port_item);
}
@@ -1512,27 +1996,18 @@ static struct genl_multicast_group team_change_event_mcgrp = {
.name = TEAM_GENL_CHANGE_EVENT_MC_GRP_NAME,
};
-static int team_nl_send_event_options_get(struct team *team)
+static int team_nl_send_multicast(struct sk_buff *skb,
+ struct team *team, u32 pid)
{
- struct sk_buff *skb;
- int err;
- struct net *net = dev_net(team->dev);
-
- skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
- if (!skb)
- return -ENOMEM;
-
- err = team_nl_fill_options_get(skb, 0, 0, 0, team, false);
- if (err < 0)
- goto err_fill;
-
- err = genlmsg_multicast_netns(net, skb, 0, team_change_event_mcgrp.id,
- GFP_KERNEL);
- return err;
+ return genlmsg_multicast_netns(dev_net(team->dev), skb, 0,
+ team_change_event_mcgrp.id, GFP_KERNEL);
+}
-err_fill:
- nlmsg_free(skb);
- return err;
+static int team_nl_send_event_options_get(struct team *team,
+ struct list_head *sel_opt_inst_list)
+{
+ return team_nl_send_options_get(team, 0, 0, 0, team_nl_send_multicast,
+ sel_opt_inst_list);
}
static int team_nl_send_event_port_list_get(struct team *team)
@@ -1592,10 +2067,17 @@ static void team_nl_fini(void)
static void __team_options_change_check(struct team *team)
{
int err;
+ struct team_option_inst *opt_inst;
+ LIST_HEAD(sel_opt_inst_list);
- err = team_nl_send_event_options_get(team);
+ list_for_each_entry(opt_inst, &team->option_inst_list, list) {
+ if (opt_inst->changed)
+ list_add_tail(&opt_inst->tmp_list, &sel_opt_inst_list);
+ }
+ err = team_nl_send_event_options_get(team, &sel_opt_inst_list);
if (err)
- netdev_warn(team->dev, "Failed to send options change via netlink\n");
+ netdev_warn(team->dev, "Failed to send options change via netlink (err %d)\n",
+ err);
}
/* rtnl lock is held */
@@ -1603,23 +2085,24 @@ static void __team_port_change_check(struct team_port *port, bool linkup)
{
int err;
- if (!port->removed && port->linkup == linkup)
+ if (!port->removed && port->state.linkup == linkup)
return;
port->changed = true;
- port->linkup = linkup;
+ port->state.linkup = linkup;
+ team_refresh_port_linkup(port);
if (linkup) {
struct ethtool_cmd ecmd;
err = __ethtool_get_settings(port->dev, &ecmd);
if (!err) {
- port->speed = ethtool_cmd_speed(&ecmd);
- port->duplex = ecmd.duplex;
+ port->state.speed = ethtool_cmd_speed(&ecmd);
+ port->state.duplex = ecmd.duplex;
goto send_event;
}
}
- port->speed = 0;
- port->duplex = 0;
+ port->state.speed = 0;
+ port->state.duplex = 0;
send_event:
err = team_nl_send_event_port_list_get(port->team);
@@ -1638,6 +2121,7 @@ static void team_port_change_check(struct team_port *port, bool linkup)
mutex_unlock(&team->lock);
}
+
/************************************
* Net device notifier event handler
************************************/
diff --git a/drivers/net/team/team_mode_activebackup.c b/drivers/net/team/team_mode_activebackup.c
index f4d960e..253b8a5 100644
--- a/drivers/net/team/team_mode_activebackup.c
+++ b/drivers/net/team/team_mode_activebackup.c
@@ -1,5 +1,5 @@
/*
- * net/drivers/team/team_mode_activebackup.c - Active-backup mode for team
+ * drivers/net/team/team_mode_activebackup.c - Active-backup mode for team
* Copyright (c) 2011 Jiri Pirko <jpirko(a)redhat.com>
*
* This program is free software; you can redistribute it and/or modify
@@ -40,7 +40,7 @@ static bool ab_transmit(struct team *team, struct sk_buff *skb)
{
struct team_port *active_port;
- active_port = rcu_dereference(ab_priv(team)->active_port);
+ active_port = rcu_dereference_bh(ab_priv(team)->active_port);
if (unlikely(!active_port))
goto drop;
skb->dev = active_port->dev;
@@ -59,23 +59,25 @@ static void ab_port_leave(struct team *team, struct team_port *port)
RCU_INIT_POINTER(ab_priv(team)->active_port, NULL);
}
-static int ab_active_port_get(struct team *team, void *arg)
+static int ab_active_port_get(struct team *team, struct team_gsetter_ctx *ctx)
{
- u32 *ifindex = arg;
+ struct team_port *active_port;
- *ifindex = 0;
- if (ab_priv(team)->active_port)
- *ifindex = ab_priv(team)->active_port->dev->ifindex;
+ active_port = rcu_dereference_protected(ab_priv(team)->active_port,
+ lockdep_is_held(&team->lock));
+ if (active_port)
+ ctx->data.u32_val = active_port->dev->ifindex;
+ else
+ ctx->data.u32_val = 0;
return 0;
}
-static int ab_active_port_set(struct team *team, void *arg)
+static int ab_active_port_set(struct team *team, struct team_gsetter_ctx *ctx)
{
- u32 *ifindex = arg;
struct team_port *port;
- list_for_each_entry_rcu(port, &team->port_list, list) {
- if (port->dev->ifindex == *ifindex) {
+ list_for_each_entry(port, &team->port_list, list) {
+ if (port->dev->ifindex == ctx->data.u32_val) {
rcu_assign_pointer(ab_priv(team)->active_port, port);
return 0;
}
@@ -92,12 +94,12 @@ static const struct team_option ab_options[] = {
},
};
-int ab_init(struct team *team)
+static int ab_init(struct team *team)
{
return team_options_register(team, ab_options, ARRAY_SIZE(ab_options));
}
-void ab_exit(struct team *team)
+static void ab_exit(struct team *team)
{
team_options_unregister(team, ab_options, ARRAY_SIZE(ab_options));
}
@@ -110,7 +112,7 @@ static const struct team_mode_ops ab_mode_ops = {
.port_leave = ab_port_leave,
};
-static struct team_mode ab_mode = {
+static const struct team_mode ab_mode = {
.kind = "activebackup",
.owner = THIS_MODULE,
.priv_size = sizeof(struct ab_priv),
diff --git a/drivers/net/team/team_mode_loadbalance.c b/drivers/net/team/team_mode_loadbalance.c
new file mode 100644
index 0000000..51a4b19
--- /dev/null
+++ b/drivers/net/team/team_mode_loadbalance.c
@@ -0,0 +1,673 @@
+/*
+ * drivers/net/team/team_mode_loadbalance.c - Load-balancing mode for team
+ * Copyright (c) 2012 Jiri Pirko <jpirko(a)redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/netdevice.h>
+#include <linux/filter.h>
+#include <linux/if_team.h>
+
+struct lb_priv;
+
+typedef struct team_port *lb_select_tx_port_func_t(struct team *,
+ struct lb_priv *,
+ struct sk_buff *,
+ unsigned char);
+
+#define LB_TX_HASHTABLE_SIZE 256 /* hash is a char */
+
+struct lb_stats {
+ u64 tx_bytes;
+};
+
+struct lb_pcpu_stats {
+ struct lb_stats hash_stats[LB_TX_HASHTABLE_SIZE];
+ struct u64_stats_sync syncp;
+};
+
+struct lb_stats_info {
+ struct lb_stats stats;
+ struct lb_stats last_stats;
+ struct team_option_inst_info *opt_inst_info;
+};
+
+struct lb_port_mapping {
+ struct team_port __rcu *port;
+ struct team_option_inst_info *opt_inst_info;
+};
+
+struct lb_priv_ex {
+ struct team *team;
+ struct lb_port_mapping tx_hash_to_port_mapping[LB_TX_HASHTABLE_SIZE];
+ struct sock_fprog *orig_fprog;
+ struct {
+ unsigned int refresh_interval; /* in tenths of second */
+ struct delayed_work refresh_dw;
+ struct lb_stats_info info[LB_TX_HASHTABLE_SIZE];
+ } stats;
+};
+
+struct lb_priv {
+ struct sk_filter __rcu *fp;
+ lb_select_tx_port_func_t __rcu *select_tx_port_func;
+ struct lb_pcpu_stats __percpu *pcpu_stats;
+ struct lb_priv_ex *ex; /* priv extension */
+};
+
+static struct lb_priv *get_lb_priv(struct team *team)
+{
+ return (struct lb_priv *) &team->mode_priv;
+}
+
+struct lb_port_priv {
+ struct lb_stats __percpu *pcpu_stats;
+ struct lb_stats_info stats_info;
+};
+
+static struct lb_port_priv *get_lb_port_priv(struct team_port *port)
+{
+ return (struct lb_port_priv *) &port->mode_priv;
+}
+
+#define LB_HTPM_PORT_BY_HASH(lp_priv, hash) \
+ (lb_priv)->ex->tx_hash_to_port_mapping[hash].port
+
+#define LB_HTPM_OPT_INST_INFO_BY_HASH(lp_priv, hash) \
+ (lb_priv)->ex->tx_hash_to_port_mapping[hash].opt_inst_info
+
+static void lb_tx_hash_to_port_mapping_null_port(struct team *team,
+ struct team_port *port)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ bool changed = false;
+ int i;
+
+ for (i = 0; i < LB_TX_HASHTABLE_SIZE; i++) {
+ struct lb_port_mapping *pm;
+
+ pm = &lb_priv->ex->tx_hash_to_port_mapping[i];
+ if (rcu_access_pointer(pm->port) == port) {
+ RCU_INIT_POINTER(pm->port, NULL);
+ team_option_inst_set_change(pm->opt_inst_info);
+ changed = true;
+ }
+ }
+ if (changed)
+ team_options_change_check(team);
+}
+
+/* Basic tx selection based solely by hash */
+static struct team_port *lb_hash_select_tx_port(struct team *team,
+ struct lb_priv *lb_priv,
+ struct sk_buff *skb,
+ unsigned char hash)
+{
+ int port_index;
+
+ port_index = hash % team->en_port_count;
+ return team_get_port_by_index_rcu(team, port_index);
+}
+
+/* Hash to port mapping select tx port */
+static struct team_port *lb_htpm_select_tx_port(struct team *team,
+ struct lb_priv *lb_priv,
+ struct sk_buff *skb,
+ unsigned char hash)
+{
+ return rcu_dereference_bh(LB_HTPM_PORT_BY_HASH(lb_priv, hash));
+}
+
+struct lb_select_tx_port {
+ char *name;
+ lb_select_tx_port_func_t *func;
+};
+
+static const struct lb_select_tx_port lb_select_tx_port_list[] = {
+ {
+ .name = "hash",
+ .func = lb_hash_select_tx_port,
+ },
+ {
+ .name = "hash_to_port_mapping",
+ .func = lb_htpm_select_tx_port,
+ },
+};
+#define LB_SELECT_TX_PORT_LIST_COUNT ARRAY_SIZE(lb_select_tx_port_list)
+
+static char *lb_select_tx_port_get_name(lb_select_tx_port_func_t *func)
+{
+ int i;
+
+ for (i = 0; i < LB_SELECT_TX_PORT_LIST_COUNT; i++) {
+ const struct lb_select_tx_port *item;
+
+ item = &lb_select_tx_port_list[i];
+ if (item->func == func)
+ return item->name;
+ }
+ return NULL;
+}
+
+static lb_select_tx_port_func_t *lb_select_tx_port_get_func(const char *name)
+{
+ int i;
+
+ for (i = 0; i < LB_SELECT_TX_PORT_LIST_COUNT; i++) {
+ const struct lb_select_tx_port *item;
+
+ item = &lb_select_tx_port_list[i];
+ if (!strcmp(item->name, name))
+ return item->func;
+ }
+ return NULL;
+}
+
+static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv,
+ struct sk_buff *skb)
+{
+ struct sk_filter *fp;
+ uint32_t lhash;
+ unsigned char *c;
+
+ fp = rcu_dereference_bh(lb_priv->fp);
+ if (unlikely(!fp))
+ return 0;
+ lhash = SK_RUN_FILTER(fp, skb);
+ c = (char *) &lhash;
+ return c[0] ^ c[1] ^ c[2] ^ c[3];
+}
+
+static void lb_update_tx_stats(unsigned int tx_bytes, struct lb_priv *lb_priv,
+ struct lb_port_priv *lb_port_priv,
+ unsigned char hash)
+{
+ struct lb_pcpu_stats *pcpu_stats;
+ struct lb_stats *port_stats;
+ struct lb_stats *hash_stats;
+
+ pcpu_stats = this_cpu_ptr(lb_priv->pcpu_stats);
+ port_stats = this_cpu_ptr(lb_port_priv->pcpu_stats);
+ hash_stats = &pcpu_stats->hash_stats[hash];
+ u64_stats_update_begin(&pcpu_stats->syncp);
+ port_stats->tx_bytes += tx_bytes;
+ hash_stats->tx_bytes += tx_bytes;
+ u64_stats_update_end(&pcpu_stats->syncp);
+}
+
+static bool lb_transmit(struct team *team, struct sk_buff *skb)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ lb_select_tx_port_func_t *select_tx_port_func;
+ struct team_port *port;
+ unsigned char hash;
+ unsigned int tx_bytes = skb->len;
+
+ hash = lb_get_skb_hash(lb_priv, skb);
+ select_tx_port_func = rcu_dereference_bh(lb_priv->select_tx_port_func);
+ port = select_tx_port_func(team, lb_priv, skb, hash);
+ if (unlikely(!port))
+ goto drop;
+ skb->dev = port->dev;
+ if (dev_queue_xmit(skb))
+ return false;
+ lb_update_tx_stats(tx_bytes, lb_priv, get_lb_port_priv(port), hash);
+ return true;
+
+drop:
+ dev_kfree_skb_any(skb);
+ return false;
+}
+
+static int lb_bpf_func_get(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+
+ if (!lb_priv->ex->orig_fprog) {
+ ctx->data.bin_val.len = 0;
+ ctx->data.bin_val.ptr = NULL;
+ return 0;
+ }
+ ctx->data.bin_val.len = lb_priv->ex->orig_fprog->len *
+ sizeof(struct sock_filter);
+ ctx->data.bin_val.ptr = lb_priv->ex->orig_fprog->filter;
+ return 0;
+}
+
+static int __fprog_create(struct sock_fprog **pfprog, u32 data_len,
+ const void *data)
+{
+ struct sock_fprog *fprog;
+ struct sock_filter *filter = (struct sock_filter *) data;
+
+ if (data_len % sizeof(struct sock_filter))
+ return -EINVAL;
+ fprog = kmalloc(sizeof(struct sock_fprog), GFP_KERNEL);
+ if (!fprog)
+ return -ENOMEM;
+ fprog->filter = kmemdup(filter, data_len, GFP_KERNEL);
+ if (!fprog->filter) {
+ kfree(fprog);
+ return -ENOMEM;
+ }
+ fprog->len = data_len / sizeof(struct sock_filter);
+ *pfprog = fprog;
+ return 0;
+}
+
+static void __fprog_destroy(struct sock_fprog *fprog)
+{
+ kfree(fprog->filter);
+ kfree(fprog);
+}
+
+static int lb_bpf_func_set(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ struct sk_filter *fp = NULL;
+ struct sk_filter *orig_fp;
+ struct sock_fprog *fprog = NULL;
+ int err;
+
+ if (ctx->data.bin_val.len) {
+ err = __fprog_create(&fprog, ctx->data.bin_val.len,
+ ctx->data.bin_val.ptr);
+ if (err)
+ return err;
+ err = sk_unattached_filter_create(&fp, fprog);
+ if (err) {
+ __fprog_destroy(fprog);
+ return err;
+ }
+ }
+
+ if (lb_priv->ex->orig_fprog) {
+ /* Clear old filter data */
+ __fprog_destroy(lb_priv->ex->orig_fprog);
+ orig_fp = rcu_dereference_protected(lb_priv->fp,
+ lockdep_is_held(&team->lock));
+ sk_unattached_filter_destroy(orig_fp);
+ }
+
+ rcu_assign_pointer(lb_priv->fp, fp);
+ lb_priv->ex->orig_fprog = fprog;
+ return 0;
+}
+
+static int lb_tx_method_get(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ lb_select_tx_port_func_t *func;
+ char *name;
+
+ func = rcu_dereference_protected(lb_priv->select_tx_port_func,
+ lockdep_is_held(&team->lock));
+ name = lb_select_tx_port_get_name(func);
+ BUG_ON(!name);
+ ctx->data.str_val = name;
+ return 0;
+}
+
+static int lb_tx_method_set(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ lb_select_tx_port_func_t *func;
+
+ func = lb_select_tx_port_get_func(ctx->data.str_val);
+ if (!func)
+ return -EINVAL;
+ rcu_assign_pointer(lb_priv->select_tx_port_func, func);
+ return 0;
+}
+
+static int lb_tx_hash_to_port_mapping_init(struct team *team,
+ struct team_option_inst_info *info)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ unsigned char hash = info->array_index;
+
+ LB_HTPM_OPT_INST_INFO_BY_HASH(lb_priv, hash) = info;
+ return 0;
+}
+
+static int lb_tx_hash_to_port_mapping_get(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ struct team_port *port;
+ unsigned char hash = ctx->info->array_index;
+
+ port = LB_HTPM_PORT_BY_HASH(lb_priv, hash);
+ ctx->data.u32_val = port ? port->dev->ifindex : 0;
+ return 0;
+}
+
+static int lb_tx_hash_to_port_mapping_set(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ struct team_port *port;
+ unsigned char hash = ctx->info->array_index;
+
+ list_for_each_entry(port, &team->port_list, list) {
+ if (ctx->data.u32_val == port->dev->ifindex &&
+ team_port_enabled(port)) {
+ rcu_assign_pointer(LB_HTPM_PORT_BY_HASH(lb_priv, hash),
+ port);
+ return 0;
+ }
+ }
+ return -ENODEV;
+}
+
+static int lb_hash_stats_init(struct team *team,
+ struct team_option_inst_info *info)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ unsigned char hash = info->array_index;
+
+ lb_priv->ex->stats.info[hash].opt_inst_info = info;
+ return 0;
+}
+
+static int lb_hash_stats_get(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ unsigned char hash = ctx->info->array_index;
+
+ ctx->data.bin_val.ptr = &lb_priv->ex->stats.info[hash].stats;
+ ctx->data.bin_val.len = sizeof(struct lb_stats);
+ return 0;
+}
+
+static int lb_port_stats_init(struct team *team,
+ struct team_option_inst_info *info)
+{
+ struct team_port *port = info->port;
+ struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
+
+ lb_port_priv->stats_info.opt_inst_info = info;
+ return 0;
+}
+
+static int lb_port_stats_get(struct team *team, struct team_gsetter_ctx *ctx)
+{
+ struct team_port *port = ctx->info->port;
+ struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
+
+ ctx->data.bin_val.ptr = &lb_port_priv->stats_info.stats;
+ ctx->data.bin_val.len = sizeof(struct lb_stats);
+ return 0;
+}
+
+static void __lb_stats_info_refresh_prepare(struct lb_stats_info *s_info)
+{
+ memcpy(&s_info->last_stats, &s_info->stats, sizeof(struct lb_stats));
+ memset(&s_info->stats, 0, sizeof(struct lb_stats));
+}
+
+static bool __lb_stats_info_refresh_check(struct lb_stats_info *s_info,
+ struct team *team)
+{
+ if (memcmp(&s_info->last_stats, &s_info->stats,
+ sizeof(struct lb_stats))) {
+ team_option_inst_set_change(s_info->opt_inst_info);
+ return true;
+ }
+ return false;
+}
+
+static void __lb_one_cpu_stats_add(struct lb_stats *acc_stats,
+ struct lb_stats *cpu_stats,
+ struct u64_stats_sync *syncp)
+{
+ unsigned int start;
+ struct lb_stats tmp;
+
+ do {
+ start = u64_stats_fetch_begin_bh(syncp);
+ tmp.tx_bytes = cpu_stats->tx_bytes;
+ } while (u64_stats_fetch_retry_bh(syncp, start));
+ acc_stats->tx_bytes += tmp.tx_bytes;
+}
+
+static void lb_stats_refresh(struct work_struct *work)
+{
+ struct team *team;
+ struct lb_priv *lb_priv;
+ struct lb_priv_ex *lb_priv_ex;
+ struct lb_pcpu_stats *pcpu_stats;
+ struct lb_stats *stats;
+ struct lb_stats_info *s_info;
+ struct team_port *port;
+ bool changed = false;
+ int i;
+ int j;
+
+ lb_priv_ex = container_of(work, struct lb_priv_ex,
+ stats.refresh_dw.work);
+
+ team = lb_priv_ex->team;
+ lb_priv = get_lb_priv(team);
+
+ if (!mutex_trylock(&team->lock)) {
+ schedule_delayed_work(&lb_priv_ex->stats.refresh_dw, 0);
+ return;
+ }
+
+ for (j = 0; j < LB_TX_HASHTABLE_SIZE; j++) {
+ s_info = &lb_priv->ex->stats.info[j];
+ __lb_stats_info_refresh_prepare(s_info);
+ for_each_possible_cpu(i) {
+ pcpu_stats = per_cpu_ptr(lb_priv->pcpu_stats, i);
+ stats = &pcpu_stats->hash_stats[j];
+ __lb_one_cpu_stats_add(&s_info->stats, stats,
+ &pcpu_stats->syncp);
+ }
+ changed |= __lb_stats_info_refresh_check(s_info, team);
+ }
+
+ list_for_each_entry(port, &team->port_list, list) {
+ struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
+
+ s_info = &lb_port_priv->stats_info;
+ __lb_stats_info_refresh_prepare(s_info);
+ for_each_possible_cpu(i) {
+ pcpu_stats = per_cpu_ptr(lb_priv->pcpu_stats, i);
+ stats = per_cpu_ptr(lb_port_priv->pcpu_stats, i);
+ __lb_one_cpu_stats_add(&s_info->stats, stats,
+ &pcpu_stats->syncp);
+ }
+ changed |= __lb_stats_info_refresh_check(s_info, team);
+ }
+
+ if (changed)
+ team_options_change_check(team);
+
+ schedule_delayed_work(&lb_priv_ex->stats.refresh_dw,
+ (lb_priv_ex->stats.refresh_interval * HZ) / 10);
+
+ mutex_unlock(&team->lock);
+}
+
+static int lb_stats_refresh_interval_get(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+
+ ctx->data.u32_val = lb_priv->ex->stats.refresh_interval;
+ return 0;
+}
+
+static int lb_stats_refresh_interval_set(struct team *team,
+ struct team_gsetter_ctx *ctx)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ unsigned int interval;
+
+ interval = ctx->data.u32_val;
+ if (lb_priv->ex->stats.refresh_interval == interval)
+ return 0;
+ lb_priv->ex->stats.refresh_interval = interval;
+ if (interval)
+ schedule_delayed_work(&lb_priv->ex->stats.refresh_dw, 0);
+ else
+ cancel_delayed_work(&lb_priv->ex->stats.refresh_dw);
+ return 0;
+}
+
+static const struct team_option lb_options[] = {
+ {
+ .name = "bpf_hash_func",
+ .type = TEAM_OPTION_TYPE_BINARY,
+ .getter = lb_bpf_func_get,
+ .setter = lb_bpf_func_set,
+ },
+ {
+ .name = "lb_tx_method",
+ .type = TEAM_OPTION_TYPE_STRING,
+ .getter = lb_tx_method_get,
+ .setter = lb_tx_method_set,
+ },
+ {
+ .name = "lb_tx_hash_to_port_mapping",
+ .array_size = LB_TX_HASHTABLE_SIZE,
+ .type = TEAM_OPTION_TYPE_U32,
+ .init = lb_tx_hash_to_port_mapping_init,
+ .getter = lb_tx_hash_to_port_mapping_get,
+ .setter = lb_tx_hash_to_port_mapping_set,
+ },
+ {
+ .name = "lb_hash_stats",
+ .array_size = LB_TX_HASHTABLE_SIZE,
+ .type = TEAM_OPTION_TYPE_BINARY,
+ .init = lb_hash_stats_init,
+ .getter = lb_hash_stats_get,
+ },
+ {
+ .name = "lb_port_stats",
+ .per_port = true,
+ .type = TEAM_OPTION_TYPE_BINARY,
+ .init = lb_port_stats_init,
+ .getter = lb_port_stats_get,
+ },
+ {
+ .name = "lb_stats_refresh_interval",
+ .type = TEAM_OPTION_TYPE_U32,
+ .getter = lb_stats_refresh_interval_get,
+ .setter = lb_stats_refresh_interval_set,
+ },
+};
+
+static int lb_init(struct team *team)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+ lb_select_tx_port_func_t *func;
+ int err;
+
+ /* set default tx port selector */
+ func = lb_select_tx_port_get_func("hash");
+ BUG_ON(!func);
+ rcu_assign_pointer(lb_priv->select_tx_port_func, func);
+
+ lb_priv->ex = kzalloc(sizeof(*lb_priv->ex), GFP_KERNEL);
+ if (!lb_priv->ex)
+ return -ENOMEM;
+ lb_priv->ex->team = team;
+
+ lb_priv->pcpu_stats = alloc_percpu(struct lb_pcpu_stats);
+ if (!lb_priv->pcpu_stats) {
+ err = -ENOMEM;
+ goto err_alloc_pcpu_stats;
+ }
+
+ INIT_DELAYED_WORK(&lb_priv->ex->stats.refresh_dw, lb_stats_refresh);
+
+ err = team_options_register(team, lb_options, ARRAY_SIZE(lb_options));
+ if (err)
+ goto err_options_register;
+ return 0;
+
+err_options_register:
+ free_percpu(lb_priv->pcpu_stats);
+err_alloc_pcpu_stats:
+ kfree(lb_priv->ex);
+ return err;
+}
+
+static void lb_exit(struct team *team)
+{
+ struct lb_priv *lb_priv = get_lb_priv(team);
+
+ team_options_unregister(team, lb_options,
+ ARRAY_SIZE(lb_options));
+ cancel_delayed_work_sync(&lb_priv->ex->stats.refresh_dw);
+ free_percpu(lb_priv->pcpu_stats);
+ kfree(lb_priv->ex);
+}
+
+static int lb_port_enter(struct team *team, struct team_port *port)
+{
+ struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
+
+ lb_port_priv->pcpu_stats = alloc_percpu(struct lb_stats);
+ if (!lb_port_priv->pcpu_stats)
+ return -ENOMEM;
+ return 0;
+}
+
+static void lb_port_leave(struct team *team, struct team_port *port)
+{
+ struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
+
+ free_percpu(lb_port_priv->pcpu_stats);
+}
+
+static void lb_port_disabled(struct team *team, struct team_port *port)
+{
+ lb_tx_hash_to_port_mapping_null_port(team, port);
+}
+
+static const struct team_mode_ops lb_mode_ops = {
+ .init = lb_init,
+ .exit = lb_exit,
+ .port_enter = lb_port_enter,
+ .port_leave = lb_port_leave,
+ .port_disabled = lb_port_disabled,
+ .transmit = lb_transmit,
+};
+
+static const struct team_mode lb_mode = {
+ .kind = "loadbalance",
+ .owner = THIS_MODULE,
+ .priv_size = sizeof(struct lb_priv),
+ .port_priv_size = sizeof(struct lb_port_priv),
+ .ops = &lb_mode_ops,
+};
+
+static int __init lb_init_module(void)
+{
+ return team_mode_register(&lb_mode);
+}
+
+static void __exit lb_cleanup_module(void)
+{
+ team_mode_unregister(&lb_mode);
+}
+
+module_init(lb_init_module);
+module_exit(lb_cleanup_module);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Jiri Pirko <jpirko(a)redhat.com>");
+MODULE_DESCRIPTION("Load-balancing mode for team");
+MODULE_ALIAS("team-mode-loadbalance");
diff --git a/drivers/net/team/team_mode_roundrobin.c b/drivers/net/team/team_mode_roundrobin.c
index a0e8f80..52dd0ec 100644
--- a/drivers/net/team/team_mode_roundrobin.c
+++ b/drivers/net/team/team_mode_roundrobin.c
@@ -1,5 +1,5 @@
/*
- * net/drivers/team/team_mode_roundrobin.c - Round-robin mode for team
+ * drivers/net/team/team_mode_roundrobin.c - Round-robin mode for team
* Copyright (c) 2011 Jiri Pirko <jpirko(a)redhat.com>
*
* This program is free software; you can redistribute it and/or modify
@@ -50,7 +50,7 @@ static bool rr_transmit(struct team *team, struct sk_buff *skb)
struct team_port *port;
int port_index;
- port_index = rr_priv(team)->sent_packets++ % team->port_count;
+ port_index = rr_priv(team)->sent_packets++ % team->en_port_count;
port = team_get_port_by_index_rcu(team, port_index);
port = __get_first_port_up(team, port);
if (unlikely(!port))
@@ -81,7 +81,7 @@ static const struct team_mode_ops rr_mode_ops = {
.port_change_mac = rr_port_change_mac,
};
-static struct team_mode rr_mode = {
+static const struct team_mode rr_mode = {
.kind = "roundrobin",
.owner = THIS_MODULE,
.priv_size = sizeof(struct rr_priv),
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 8eeb205..7209099 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -126,7 +126,8 @@ struct sock_fprog { /* Required for SO_ATTACH_FILTER. */
#define SKF_AD_HATYPE 28
#define SKF_AD_RXHASH 32
#define SKF_AD_CPU 36
-#define SKF_AD_MAX 40
+#define SKF_AD_ALU_XOR_X 40
+#define SKF_AD_MAX 44
#define SKF_NET_OFF (-0x100000)
#define SKF_LL_OFF (-0x200000)
@@ -153,6 +154,9 @@ static inline unsigned int sk_filter_len(const struct sk_filter *fp)
extern int sk_filter(struct sock *sk, struct sk_buff *skb);
extern unsigned int sk_run_filter(const struct sk_buff *skb,
const struct sock_filter *filter);
+extern int sk_unattached_filter_create(struct sk_filter **pfp,
+ struct sock_fprog *fprog);
+extern void sk_unattached_filter_destroy(struct sk_filter *fp);
extern int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
extern int sk_detach_filter(struct sock *sk);
extern int sk_chk_filter(struct sock_filter *filter, unsigned int flen);
@@ -228,6 +232,7 @@ enum {
BPF_S_ANC_HATYPE,
BPF_S_ANC_RXHASH,
BPF_S_ANC_CPU,
+ BPF_S_ANC_ALU_XOR_X,
};
#endif /* __KERNEL__ */
diff --git a/include/linux/if_team.h b/include/linux/if_team.h
index 58404b0..99efd60 100644
--- a/include/linux/if_team.h
+++ b/include/linux/if_team.h
@@ -28,10 +28,28 @@ struct team;
struct team_port {
struct net_device *dev;
- struct hlist_node hlist; /* node in hash list */
+ struct hlist_node hlist; /* node in enabled ports hash list */
struct list_head list; /* node in ordinary list */
struct team *team;
- int index;
+ int index; /* index of enabled port. If disabled, it's set to -1 */
+
+ bool linkup; /* either state.linkup or user.linkup */
+
+ struct {
+ bool linkup;
+ u32 speed;
+ u8 duplex;
+ } state;
+
+ /* Values set by userspace */
+ struct {
+ bool linkup;
+ bool linkup_enabled;
+ } user;
+
+ /* Custom gennetlink interface related flags */
+ bool changed;
+ bool removed;
/*
* A place for storing original values of the device before it
@@ -42,17 +60,11 @@ struct team_port {
unsigned int mtu;
} orig;
- bool linkup;
- u32 speed;
- u8 duplex;
-
- /* Custom gennetlink interface related flags */
- bool changed;
- bool removed;
-
- struct rcu_head rcu;
+ long mode_priv[0];
};
+extern bool team_port_enabled(struct team_port *port);
+
struct team_mode_ops {
int (*init)(struct team *team);
void (*exit)(struct team *team);
@@ -63,30 +75,54 @@ struct team_mode_ops {
int (*port_enter)(struct team *team, struct team_port *port);
void (*port_leave)(struct team *team, struct team_port *port);
void (*port_change_mac)(struct team *team, struct team_port *port);
+ void (*port_enabled)(struct team *team, struct team_port *port);
+ void (*port_disabled)(struct team *team, struct team_port *port);
};
enum team_option_type {
TEAM_OPTION_TYPE_U32,
TEAM_OPTION_TYPE_STRING,
+ TEAM_OPTION_TYPE_BINARY,
+ TEAM_OPTION_TYPE_BOOL,
+};
+
+struct team_option_inst_info {
+ u32 array_index;
+ struct team_port *port; /* != NULL if per-port */
+};
+
+struct team_gsetter_ctx {
+ union {
+ u32 u32_val;
+ const char *str_val;
+ struct {
+ const void *ptr;
+ u32 len;
+ } bin_val;
+ bool bool_val;
+ } data;
+ struct team_option_inst_info *info;
};
struct team_option {
struct list_head list;
const char *name;
+ bool per_port;
+ unsigned int array_size; /* != 0 means the option is array */
enum team_option_type type;
- int (*getter)(struct team *team, void *arg);
- int (*setter)(struct team *team, void *arg);
-
- /* Custom gennetlink interface related flags */
- bool changed;
- bool removed;
+ int (*init)(struct team *team, struct team_option_inst_info *info);
+ int (*getter)(struct team *team, struct team_gsetter_ctx *ctx);
+ int (*setter)(struct team *team, struct team_gsetter_ctx *ctx);
};
+extern void team_option_inst_set_change(struct team_option_inst_info *opt_inst_info);
+extern void team_options_change_check(struct team *team);
+
struct team_mode {
- struct list_head list;
const char *kind;
struct module *owner;
size_t priv_size;
+ size_t port_priv_size;
const struct team_mode_ops *ops;
};
@@ -103,13 +139,15 @@ struct team {
struct mutex lock; /* used for overall locking, e.g. port lists write */
/*
- * port lists with port count
+ * List of enabled ports and their count
*/
- int port_count;
- struct hlist_head port_hlist[TEAM_PORT_HASHENTRIES];
- struct list_head port_list;
+ int en_port_count;
+ struct hlist_head en_port_hlist[TEAM_PORT_HASHENTRIES];
+
+ struct list_head port_list; /* list of all ports */
struct list_head option_list;
+ struct list_head option_inst_list; /* list of option instances */
const struct team_mode *mode;
struct team_mode_ops ops;
@@ -119,7 +157,7 @@ struct team {
static inline struct hlist_head *team_port_index_hash(struct team *team,
int port_index)
{
- return &team->port_hlist[port_index & (TEAM_PORT_HASHENTRIES - 1)];
+ return &team->en_port_hlist[port_index & (TEAM_PORT_HASHENTRIES - 1)];
}
static inline struct team_port *team_get_port_by_index(struct team *team,
@@ -154,8 +192,8 @@ extern int team_options_register(struct team *team,
extern void team_options_unregister(struct team *team,
const struct team_option *option,
size_t option_count);
-extern int team_mode_register(struct team_mode *mode);
-extern int team_mode_unregister(struct team_mode *mode);
+extern int team_mode_register(const struct team_mode *mode);
+extern void team_mode_unregister(const struct team_mode *mode);
#endif /* __KERNEL__ */
@@ -216,6 +254,8 @@ enum {
TEAM_ATTR_OPTION_TYPE, /* u8 */
TEAM_ATTR_OPTION_DATA, /* dynamic */
TEAM_ATTR_OPTION_REMOVED, /* flag */
+ TEAM_ATTR_OPTION_PORT_IFINDEX, /* u32 */ /* for per-port options */
+ TEAM_ATTR_OPTION_ARRAY_INDEX, /* u32 */ /* for array options */
__TEAM_ATTR_OPTION_MAX,
TEAM_ATTR_OPTION_MAX = __TEAM_ATTR_OPTION_MAX - 1,
diff --git a/net/core/filter.c b/net/core/filter.c
index 6f755cc..95d05a6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -317,6 +317,9 @@ load_b:
case BPF_S_ANC_CPU:
A = raw_smp_processor_id();
continue;
+ case BPF_S_ANC_ALU_XOR_X:
+ A ^= X;
+ continue;
case BPF_S_ANC_NLATTR: {
struct nlattr *nla;
@@ -561,6 +564,7 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
ANCILLARY(HATYPE);
ANCILLARY(RXHASH);
ANCILLARY(CPU);
+ ANCILLARY(ALU_XOR_X);
}
}
ftest->code = code;
@@ -589,6 +593,67 @@ void sk_filter_release_rcu(struct rcu_head *rcu)
}
EXPORT_SYMBOL(sk_filter_release_rcu);
+static int __sk_prepare_filter(struct sk_filter *fp)
+{
+ int err;
+
+ fp->bpf_func = sk_run_filter;
+
+ err = sk_chk_filter(fp->insns, fp->len);
+ if (err)
+ return err;
+
+ bpf_jit_compile(fp);
+ return 0;
+}
+
+/**
+ * sk_unattached_filter_create - create an unattached filter
+ * @fprog: the filter program
+ * @sk: the socket to use
+ *
+ * Create a filter independent ofr any socket. We first run some
+ * sanity checks on it to make sure it does not explode on us later.
+ * If an error occurs or there is insufficient memory for the filter
+ * a negative errno code is returned. On success the return is zero.
+ */
+int sk_unattached_filter_create(struct sk_filter **pfp,
+ struct sock_fprog *fprog)
+{
+ struct sk_filter *fp;
+ unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
+ int err;
+
+ /* Make sure new filter is there and in the right amounts. */
+ if (fprog->filter == NULL)
+ return -EINVAL;
+
+ fp = kmalloc(fsize + sizeof(*fp), GFP_KERNEL);
+ if (!fp)
+ return -ENOMEM;
+ memcpy(fp->insns, fprog->filter, fsize);
+
+ atomic_set(&fp->refcnt, 1);
+ fp->len = fprog->len;
+
+ err = __sk_prepare_filter(fp);
+ if (err)
+ goto free_mem;
+
+ *pfp = fp;
+ return 0;
+free_mem:
+ kfree(fp);
+ return err;
+}
+EXPORT_SYMBOL_GPL(sk_unattached_filter_create);
+
+void sk_unattached_filter_destroy(struct sk_filter *fp)
+{
+ sk_filter_release(fp);
+}
+EXPORT_SYMBOL_GPL(sk_unattached_filter_destroy);
+
/**
* sk_attach_filter - attach a socket filter
* @fprog: the filter program
@@ -619,16 +684,13 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
atomic_set(&fp->refcnt, 1);
fp->len = fprog->len;
- fp->bpf_func = sk_run_filter;
- err = sk_chk_filter(fp->insns, fp->len);
+ err = __sk_prepare_filter(fp);
if (err) {
sk_filter_uncharge(sk, fp);
return err;
}
- bpf_jit_compile(fp);
-
old_fp = rcu_dereference_protected(sk->sk_filter,
sock_owned_by_user(sk));
rcu_assign_pointer(sk->sk_filter, fp);
11 years, 2 months
Fedora Kernel Meeting Minutes Jun 22, 2012
by Josh Boyer
(Note, slightly edited due to lack of #info use during the meeting)
==============================
#fedora-meeting: Fedora Kernel
==============================
Meeting started by jwb at 18:03:03 UTC. The full logs are available at
http://meetbot.fedoraproject.org/fedora-meeting/2012-06-22/fedora-kernel....
.
Meeting summary
---------------
* init (jwb, 18:03:31)
* F15 (jwb, 18:04:25)
* 2.6.43.8 (3.3.8) final kernel update pushed before EOL
* F16/F17 (jwb, 18:06:07)
* Currently at 3.4.3. A few issues in the 3.4 kernel being worked through
* Rawhide (jwb, 18:15:15)
* 3.5-rc3
* Need more rawhide kernel testing. Should work fine with F16/F17 userspace
* Kernel QA (jwb, 18:26:09)
* kernel test repo can be found at
git.fedorahosted.org/git/kernel-tests.git . contributions welcome
(jwb, 18:28:28)
* Kernel updates and bodhi (jwb, 18:29:24)
* Plan to disable auto promotion/pushing to allow more flexibility between
release updates
* Strongly prefer negative karma points be linked to a reported bug
* Negative karma for a bug we didn't say we fixed in a particular update
is not particularly helpful. It holds up fixes for others.
* Open Floor (jwb, 18:37:55)
Meeting ended at 18:41:09 UTC.
Action Items
------------
Action Items, by person
-----------------------
* **UNASSIGNED**
* (none)
People Present (lines said)
---------------------------
* jwb (69)
* davej (28)
* brunowolff (8)
* zodbot (3)
* jsmith (1)
Generated by `MeetBot`_ 0.1.4
.. _`MeetBot`: http://wiki.debian.org/MeetBot
11 years, 3 months
[PATCH PULL F16 F17] Various Uprobes fixes backport from linux-tip
by Anton Arapov
The split-out series is available in the git repository at:
git://fedorapeople.org/home/fedora/aarapov/public_git/kernel-uprobes.git tags/f17_exported
Ananth N Mavinakayanahalli (1):
uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()
Josh Stone (1):
uprobes: add exports necessary for uprobes use by modules
Oleg Nesterov (21):
uprobes: Optimize is_swbp_at_addr() for current->mm
uprobes: Change read_opcode() to use FOLL_FORCE
uprobes: Introduce find_active_uprobe() helper
uprobes: Teach find_active_uprobe() to provide the "is_swbp" info
uprobes: Change register_for_each_vma() to take mm->mmap_sem for writing
uprobes: Teach handle_swbp() to rely on "is_swbp" rather than uprobes_srcu
uprobes: Kill uprobes_srcu/uprobe_srcu_id
uprobes: Valid_vma() should reject VM_HUGETLB
uprobes: __copy_insn() should ensure a_ops->readpage != NULL
uprobes: Write_opcode()->__replace_page() can race with try_to_unmap()
uprobes: Install_breakpoint() should fail if is_swbp_insn() == T
uprobes: Rework register_for_each_vma() to make it O(n)
uprobes: Change build_map_info() to try kmalloc(GFP_NOWAIT) first
uprobes: Copy_insn() shouldn't depend on mm/vma/vaddr
uprobes: Copy_insn() should not return -ENOMEM if __copy_insn() fails
uprobes: No need to re-check vma_address() in write_opcode()
uprobes: Simplify the usage of uprobe->pending_list
uprobes: Don't use loff_t for the valid virtual address
uprobes: __copy_insn() needs "loff_t offset"
uprobes: Remove the unnecessary initialization in add_utask()
uprobes: Move BUG_ON(UPROBE_SWBP_INSN_SIZE) from write_opcode() to install_breakpoint()
Peter Zijlstra (1):
uprobes: Document uprobe_register() vs uprobe_mmap() race
Signed-off-by: Anton Arapov <anton(a)redhat.com>
---
arch/x86/include/asm/uprobes.h | 2 +-
arch/x86/kernel/ptrace.c | 6 +
arch/x86/kernel/uprobes.c | 3 +-
include/linux/sched.h | 1 -
kernel/events/uprobes.c | 464 ++++++++++++++++++++--------------------
5 files changed, 240 insertions(+), 236 deletions(-)
diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 1e9bed1..f3971bb 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -48,7 +48,7 @@ struct arch_uprobe_task {
#endif
};
-extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm);
+extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
extern int arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs);
extern int arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs);
extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index cf11783..4609190 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1415,6 +1415,12 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)
#endif
}
+/*
+ * This is declared in linux/regset.h and defined in machine-dependent
+ * code. We put the export here to ensure no machine forgets it.
+ */
+EXPORT_SYMBOL_GPL(task_user_regset_view);
+
static void fill_sigtrap_info(struct task_struct *tsk,
struct pt_regs *regs,
int error_code, int si_code,
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index dc4e910..36fd420 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -409,9 +409,10 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
* arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
* @mm: the probed address space.
* @arch_uprobe: the probepoint information.
+ * @addr: virtual address at which to install the probepoint
* Return 0 on success or a -ve number on error.
*/
-int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm)
+int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
{
int ret;
struct insn insn;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index cff94cd..6869c60 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1619,7 +1619,6 @@ struct task_struct {
#endif
#ifdef CONFIG_UPROBES
struct uprobe_task *utask;
- int uprobe_srcu_id;
#endif
};
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 985be4d..d9e5ba5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -34,17 +34,34 @@
#include <linux/kdebug.h> /* notifier mechanism */
#include <linux/uprobes.h>
+#include <linux/export.h>
#define UINSNS_PER_PAGE (PAGE_SIZE/UPROBE_XOL_SLOT_BYTES)
#define MAX_UPROBE_XOL_SLOTS UINSNS_PER_PAGE
-static struct srcu_struct uprobes_srcu;
static struct rb_root uprobes_tree = RB_ROOT;
static DEFINE_SPINLOCK(uprobes_treelock); /* serialize rbtree access */
#define UPROBES_HASH_SZ 13
+/*
+ * We need separate register/unregister and mmap/munmap lock hashes because
+ * of mmap_sem nesting.
+ *
+ * uprobe_register() needs to install probes on (potentially) all processes
+ * and thus needs to acquire multiple mmap_sems (consequtively, not
+ * concurrently), whereas uprobe_mmap() is called while holding mmap_sem
+ * for the particular process doing the mmap.
+ *
+ * uprobe_register()->register_for_each_vma() needs to drop/acquire mmap_sem
+ * because of lock order against i_mmap_mutex. This means there's a hole in
+ * the register vma iteration where a mmap() can happen.
+ *
+ * Thus uprobe_register() can race with uprobe_mmap() and we can try and
+ * install a probe where one is already installed.
+ */
+
/* serialize (un)register */
static struct mutex uprobes_mutex[UPROBES_HASH_SZ];
@@ -61,17 +78,6 @@ static struct mutex uprobes_mmap_mutex[UPROBES_HASH_SZ];
*/
static atomic_t uprobe_events = ATOMIC_INIT(0);
-/*
- * Maintain a temporary per vma info that can be used to search if a vma
- * has already been handled. This structure is introduced since extending
- * vm_area_struct wasnt recommended.
- */
-struct vma_info {
- struct list_head probe_list;
- struct mm_struct *mm;
- loff_t vaddr;
-};
-
struct uprobe {
struct rb_node rb_node; /* node in the rb tree */
atomic_t ref;
@@ -100,7 +106,8 @@ static bool valid_vma(struct vm_area_struct *vma, bool is_register)
if (!is_register)
return true;
- if ((vma->vm_flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)) == (VM_READ|VM_EXEC))
+ if ((vma->vm_flags & (VM_HUGETLB|VM_READ|VM_WRITE|VM_EXEC|VM_SHARED))
+ == (VM_READ|VM_EXEC))
return true;
return false;
@@ -129,33 +136,17 @@ static loff_t vma_address(struct vm_area_struct *vma, loff_t offset)
static int __replace_page(struct vm_area_struct *vma, struct page *page, struct page *kpage)
{
struct mm_struct *mm = vma->vm_mm;
- pgd_t *pgd;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *ptep;
- spinlock_t *ptl;
unsigned long addr;
- int err = -EFAULT;
+ spinlock_t *ptl;
+ pte_t *ptep;
addr = page_address_in_vma(page, vma);
if (addr == -EFAULT)
- goto out;
-
- pgd = pgd_offset(mm, addr);
- if (!pgd_present(*pgd))
- goto out;
-
- pud = pud_offset(pgd, addr);
- if (!pud_present(*pud))
- goto out;
-
- pmd = pmd_offset(pud, addr);
- if (!pmd_present(*pmd))
- goto out;
+ return -EFAULT;
- ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
+ ptep = page_check_address(page, mm, addr, &ptl, 0);
if (!ptep)
- goto out;
+ return -EAGAIN;
get_page(kpage);
page_add_new_anon_rmap(kpage, vma, addr);
@@ -174,10 +165,8 @@ static int __replace_page(struct vm_area_struct *vma, struct page *page, struct
try_to_free_swap(page);
put_page(page);
pte_unmap_unlock(ptep, ptl);
- err = 0;
-out:
- return err;
+ return 0;
}
/**
@@ -222,9 +211,8 @@ static int write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
void *vaddr_old, *vaddr_new;
struct vm_area_struct *vma;
struct uprobe *uprobe;
- loff_t addr;
int ret;
-
+retry:
/* Read the page with vaddr into memory */
ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &old_page, &vma);
if (ret <= 0)
@@ -246,10 +234,6 @@ static int write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (mapping != vma->vm_file->f_mapping)
goto put_out;
- addr = vma_address(vma, uprobe->offset);
- if (vaddr != (unsigned long)addr)
- goto put_out;
-
ret = -ENOMEM;
new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
if (!new_page)
@@ -267,11 +251,7 @@ static int write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
vaddr_new = kmap_atomic(new_page);
memcpy(vaddr_new, vaddr_old, PAGE_SIZE);
-
- /* poke the new insn in, ASSUMES we don't cross page boundary */
- vaddr &= ~PAGE_MASK;
- BUG_ON(vaddr + UPROBE_SWBP_INSN_SIZE > PAGE_SIZE);
- memcpy(vaddr_new + vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
+ memcpy(vaddr_new + (vaddr & ~PAGE_MASK), &opcode, UPROBE_SWBP_INSN_SIZE);
kunmap_atomic(vaddr_new);
kunmap_atomic(vaddr_old);
@@ -291,6 +271,8 @@ unlock_out:
put_out:
put_page(old_page);
+ if (unlikely(ret == -EAGAIN))
+ goto retry;
return ret;
}
@@ -312,7 +294,7 @@ static int read_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_
void *vaddr_new;
int ret;
- ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &page, NULL);
+ ret = get_user_pages(NULL, mm, vaddr, 1, 0, 1, &page, NULL);
if (ret <= 0)
return ret;
@@ -333,10 +315,20 @@ static int is_swbp_at_addr(struct mm_struct *mm, unsigned long vaddr)
uprobe_opcode_t opcode;
int result;
+ if (current->mm == mm) {
+ pagefault_disable();
+ result = __copy_from_user_inatomic(&opcode, (void __user*)vaddr,
+ sizeof(opcode));
+ pagefault_enable();
+
+ if (likely(result == 0))
+ goto out;
+ }
+
result = read_opcode(mm, vaddr, &opcode);
if (result)
return result;
-
+out:
if (is_swbp_insn(&opcode))
return 1;
@@ -355,7 +347,9 @@ static int is_swbp_at_addr(struct mm_struct *mm, unsigned long vaddr)
int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
{
int result;
-
+ /*
+ * See the comment near uprobes_hash().
+ */
result = is_swbp_at_addr(mm, vaddr);
if (result == 1)
return -EEXIST;
@@ -520,7 +514,6 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
uprobe->inode = igrab(inode);
uprobe->offset = offset;
init_rwsem(&uprobe->consumer_rwsem);
- INIT_LIST_HEAD(&uprobe->pending_list);
/* add to uprobes_tree, sorted on inode:offset */
cur_uprobe = insert_uprobe(uprobe);
@@ -588,20 +581,22 @@ static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc)
}
static int
-__copy_insn(struct address_space *mapping, struct vm_area_struct *vma, char *insn,
- unsigned long nbytes, unsigned long offset)
+__copy_insn(struct address_space *mapping, struct file *filp, char *insn,
+ unsigned long nbytes, loff_t offset)
{
- struct file *filp = vma->vm_file;
struct page *page;
void *vaddr;
- unsigned long off1;
- unsigned long idx;
+ unsigned long off;
+ pgoff_t idx;
if (!filp)
return -EINVAL;
- idx = (unsigned long)(offset >> PAGE_CACHE_SHIFT);
- off1 = offset &= ~PAGE_MASK;
+ if (!mapping->a_ops->readpage)
+ return -EIO;
+
+ idx = offset >> PAGE_CACHE_SHIFT;
+ off = offset & ~PAGE_MASK;
/*
* Ensure that the page that has the original instruction is
@@ -612,22 +607,20 @@ __copy_insn(struct address_space *mapping, struct vm_area_struct *vma, char *ins
return PTR_ERR(page);
vaddr = kmap_atomic(page);
- memcpy(insn, vaddr + off1, nbytes);
+ memcpy(insn, vaddr + off, nbytes);
kunmap_atomic(vaddr);
page_cache_release(page);
return 0;
}
-static int
-copy_insn(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long addr)
+static int copy_insn(struct uprobe *uprobe, struct file *filp)
{
struct address_space *mapping;
unsigned long nbytes;
int bytes;
- addr &= ~PAGE_MASK;
- nbytes = PAGE_SIZE - addr;
+ nbytes = PAGE_SIZE - (uprobe->offset & ~PAGE_MASK);
mapping = uprobe->inode->i_mapping;
/* Instruction at end of binary; copy only available bytes */
@@ -638,13 +631,13 @@ copy_insn(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long addr)
/* Instruction at the page-boundary; copy bytes in second page */
if (nbytes < bytes) {
- if (__copy_insn(mapping, vma, uprobe->arch.insn + nbytes,
- bytes - nbytes, uprobe->offset + nbytes))
- return -ENOMEM;
-
+ int err = __copy_insn(mapping, filp, uprobe->arch.insn + nbytes,
+ bytes - nbytes, uprobe->offset + nbytes);
+ if (err)
+ return err;
bytes = nbytes;
}
- return __copy_insn(mapping, vma, uprobe->arch.insn, bytes, uprobe->offset);
+ return __copy_insn(mapping, filp, uprobe->arch.insn, bytes, uprobe->offset);
}
/*
@@ -672,9 +665,8 @@ copy_insn(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long addr)
*/
static int
install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
- struct vm_area_struct *vma, loff_t vaddr)
+ struct vm_area_struct *vma, unsigned long vaddr)
{
- unsigned long addr;
int ret;
/*
@@ -687,20 +679,22 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
if (!uprobe->consumers)
return -EEXIST;
- addr = (unsigned long)vaddr;
-
if (!(uprobe->flags & UPROBE_COPY_INSN)) {
- ret = copy_insn(uprobe, vma, addr);
+ ret = copy_insn(uprobe, vma->vm_file);
if (ret)
return ret;
if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn))
- return -EEXIST;
+ return -ENOTSUPP;
- ret = arch_uprobe_analyze_insn(&uprobe->arch, mm);
+ ret = arch_uprobe_analyze_insn(&uprobe->arch, mm, vaddr);
if (ret)
return ret;
+ /* write_opcode() assumes we don't cross page boundary */
+ BUG_ON((uprobe->offset & ~PAGE_MASK) +
+ UPROBE_SWBP_INSN_SIZE > PAGE_SIZE);
+
uprobe->flags |= UPROBE_COPY_INSN;
}
@@ -713,7 +707,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
* Hence increment before and decrement on failure.
*/
atomic_inc(&mm->uprobes_state.count);
- ret = set_swbp(&uprobe->arch, mm, addr);
+ ret = set_swbp(&uprobe->arch, mm, vaddr);
if (ret)
atomic_dec(&mm->uprobes_state.count);
@@ -721,27 +715,21 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
}
static void
-remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, loff_t vaddr)
+remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
{
- if (!set_orig_insn(&uprobe->arch, mm, (unsigned long)vaddr, true))
+ if (!set_orig_insn(&uprobe->arch, mm, vaddr, true))
atomic_dec(&mm->uprobes_state.count);
}
/*
- * There could be threads that have hit the breakpoint and are entering the
- * notifier code and trying to acquire the uprobes_treelock. The thread
- * calling delete_uprobe() that is removing the uprobe from the rb_tree can
- * race with these threads and might acquire the uprobes_treelock compared
- * to some of the breakpoint hit threads. In such a case, the breakpoint
- * hit threads will not find the uprobe. The current unregistering thread
- * waits till all other threads have hit a breakpoint, to acquire the
- * uprobes_treelock before the uprobe is removed from the rbtree.
+ * There could be threads that have already hit the breakpoint. They
+ * will recheck the current insn and restart if find_uprobe() fails.
+ * See find_active_uprobe().
*/
static void delete_uprobe(struct uprobe *uprobe)
{
unsigned long flags;
- synchronize_srcu(&uprobes_srcu);
spin_lock_irqsave(&uprobes_treelock, flags);
rb_erase(&uprobe->rb_node, &uprobes_tree);
spin_unlock_irqrestore(&uprobes_treelock, flags);
@@ -750,139 +738,135 @@ static void delete_uprobe(struct uprobe *uprobe)
atomic_dec(&uprobe_events);
}
-static struct vma_info *
-__find_next_vma_info(struct address_space *mapping, struct list_head *head,
- struct vma_info *vi, loff_t offset, bool is_register)
+struct map_info {
+ struct map_info *next;
+ struct mm_struct *mm;
+ unsigned long vaddr;
+};
+
+static inline struct map_info *free_map_info(struct map_info *info)
+{
+ struct map_info *next = info->next;
+ kfree(info);
+ return next;
+}
+
+static struct map_info *
+build_map_info(struct address_space *mapping, loff_t offset, bool is_register)
{
+ unsigned long pgoff = offset >> PAGE_SHIFT;
struct prio_tree_iter iter;
struct vm_area_struct *vma;
- struct vma_info *tmpvi;
- unsigned long pgoff;
- int existing_vma;
- loff_t vaddr;
-
- pgoff = offset >> PAGE_SHIFT;
+ struct map_info *curr = NULL;
+ struct map_info *prev = NULL;
+ struct map_info *info;
+ int more = 0;
+ again:
+ mutex_lock(&mapping->i_mmap_mutex);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
if (!valid_vma(vma, is_register))
continue;
- existing_vma = 0;
- vaddr = vma_address(vma, offset);
-
- list_for_each_entry(tmpvi, head, probe_list) {
- if (tmpvi->mm == vma->vm_mm && tmpvi->vaddr == vaddr) {
- existing_vma = 1;
- break;
- }
+ if (!prev && !more) {
+ /*
+ * Needs GFP_NOWAIT to avoid i_mmap_mutex recursion through
+ * reclaim. This is optimistic, no harm done if it fails.
+ */
+ prev = kmalloc(sizeof(struct map_info),
+ GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
+ if (prev)
+ prev->next = NULL;
}
-
- /*
- * Another vma needs a probe to be installed. However skip
- * installing the probe if the vma is about to be unlinked.
- */
- if (!existing_vma && atomic_inc_not_zero(&vma->vm_mm->mm_users)) {
- vi->mm = vma->vm_mm;
- vi->vaddr = vaddr;
- list_add(&vi->probe_list, head);
-
- return vi;
+ if (!prev) {
+ more++;
+ continue;
}
- }
- return NULL;
-}
-
-/*
- * Iterate in the rmap prio tree and find a vma where a probe has not
- * yet been inserted.
- */
-static struct vma_info *
-find_next_vma_info(struct address_space *mapping, struct list_head *head,
- loff_t offset, bool is_register)
-{
- struct vma_info *vi, *retvi;
+ if (!atomic_inc_not_zero(&vma->vm_mm->mm_users))
+ continue;
- vi = kzalloc(sizeof(struct vma_info), GFP_KERNEL);
- if (!vi)
- return ERR_PTR(-ENOMEM);
+ info = prev;
+ prev = prev->next;
+ info->next = curr;
+ curr = info;
- mutex_lock(&mapping->i_mmap_mutex);
- retvi = __find_next_vma_info(mapping, head, vi, offset, is_register);
+ info->mm = vma->vm_mm;
+ info->vaddr = vma_address(vma, offset);
+ }
mutex_unlock(&mapping->i_mmap_mutex);
- if (!retvi)
- kfree(vi);
+ if (!more)
+ goto out;
+
+ prev = curr;
+ while (curr) {
+ mmput(curr->mm);
+ curr = curr->next;
+ }
- return retvi;
+ do {
+ info = kmalloc(sizeof(struct map_info), GFP_KERNEL);
+ if (!info) {
+ curr = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+ info->next = prev;
+ prev = info;
+ } while (--more);
+
+ goto again;
+ out:
+ while (prev)
+ prev = free_map_info(prev);
+ return curr;
}
static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
{
- struct list_head try_list;
- struct vm_area_struct *vma;
- struct address_space *mapping;
- struct vma_info *vi, *tmpvi;
- struct mm_struct *mm;
- loff_t vaddr;
- int ret;
+ struct map_info *info;
+ int err = 0;
- mapping = uprobe->inode->i_mapping;
- INIT_LIST_HEAD(&try_list);
+ info = build_map_info(uprobe->inode->i_mapping,
+ uprobe->offset, is_register);
+ if (IS_ERR(info))
+ return PTR_ERR(info);
- ret = 0;
+ while (info) {
+ struct mm_struct *mm = info->mm;
+ struct vm_area_struct *vma;
- for (;;) {
- vi = find_next_vma_info(mapping, &try_list, uprobe->offset, is_register);
- if (!vi)
- break;
+ if (err)
+ goto free;
- if (IS_ERR(vi)) {
- ret = PTR_ERR(vi);
- break;
- }
+ down_write(&mm->mmap_sem);
+ vma = find_vma(mm, (unsigned long)info->vaddr);
+ if (!vma || !valid_vma(vma, is_register))
+ goto unlock;
- mm = vi->mm;
- down_read(&mm->mmap_sem);
- vma = find_vma(mm, (unsigned long)vi->vaddr);
- if (!vma || !valid_vma(vma, is_register)) {
- list_del(&vi->probe_list);
- kfree(vi);
- up_read(&mm->mmap_sem);
- mmput(mm);
- continue;
- }
- vaddr = vma_address(vma, uprobe->offset);
if (vma->vm_file->f_mapping->host != uprobe->inode ||
- vaddr != vi->vaddr) {
- list_del(&vi->probe_list);
- kfree(vi);
- up_read(&mm->mmap_sem);
- mmput(mm);
- continue;
- }
-
- if (is_register)
- ret = install_breakpoint(uprobe, mm, vma, vi->vaddr);
- else
- remove_breakpoint(uprobe, mm, vi->vaddr);
+ vma_address(vma, uprobe->offset) != info->vaddr)
+ goto unlock;
- up_read(&mm->mmap_sem);
- mmput(mm);
if (is_register) {
- if (ret && ret == -EEXIST)
- ret = 0;
- if (ret)
- break;
+ err = install_breakpoint(uprobe, mm, vma, info->vaddr);
+ /*
+ * We can race against uprobe_mmap(), see the
+ * comment near uprobe_hash().
+ */
+ if (err == -EEXIST)
+ err = 0;
+ } else {
+ remove_breakpoint(uprobe, mm, info->vaddr);
}
+ unlock:
+ up_write(&mm->mmap_sem);
+ free:
+ mmput(mm);
+ info = free_map_info(info);
}
- list_for_each_entry_safe(vi, tmpvi, &try_list, probe_list) {
- list_del(&vi->probe_list);
- kfree(vi);
- }
-
- return ret;
+ return err;
}
static int __uprobe_register(struct uprobe *uprobe)
@@ -945,6 +929,7 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
return ret;
}
+EXPORT_SYMBOL_GPL(uprobe_register);
/*
* uprobe_unregister - unregister a already registered probe.
@@ -976,6 +961,7 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
if (uprobe)
put_uprobe(uprobe);
}
+EXPORT_SYMBOL_GPL(uprobe_unregister);
/*
* Of all the nodes that correspond to the given inode, return the node
@@ -1048,7 +1034,7 @@ static void build_probe_list(struct inode *inode, struct list_head *head)
int uprobe_mmap(struct vm_area_struct *vma)
{
struct list_head tmp_list;
- struct uprobe *uprobe, *u;
+ struct uprobe *uprobe;
struct inode *inode;
int ret, count;
@@ -1066,12 +1052,9 @@ int uprobe_mmap(struct vm_area_struct *vma)
ret = 0;
count = 0;
- list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
- loff_t vaddr;
-
- list_del(&uprobe->pending_list);
+ list_for_each_entry(uprobe, &tmp_list, pending_list) {
if (!ret) {
- vaddr = vma_address(vma, uprobe->offset);
+ loff_t vaddr = vma_address(vma, uprobe->offset);
if (vaddr < vma->vm_start || vaddr >= vma->vm_end) {
put_uprobe(uprobe);
@@ -1079,8 +1062,10 @@ int uprobe_mmap(struct vm_area_struct *vma)
}
ret = install_breakpoint(uprobe, vma->vm_mm, vma, vaddr);
-
- /* Ignore double add: */
+ /*
+ * We can race against uprobe_register(), see the
+ * comment near uprobe_hash().
+ */
if (ret == -EEXIST) {
ret = 0;
@@ -1115,7 +1100,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end)
{
struct list_head tmp_list;
- struct uprobe *uprobe, *u;
+ struct uprobe *uprobe;
struct inode *inode;
if (!atomic_read(&uprobe_events) || !valid_vma(vma, false))
@@ -1132,11 +1117,8 @@ void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned lon
mutex_lock(uprobes_mmap_hash(inode));
build_probe_list(inode, &tmp_list);
- list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
- loff_t vaddr;
-
- list_del(&uprobe->pending_list);
- vaddr = vma_address(vma, uprobe->offset);
+ list_for_each_entry(uprobe, &tmp_list, pending_list) {
+ loff_t vaddr = vma_address(vma, uprobe->offset);
if (vaddr >= start && vaddr < end) {
/*
@@ -1378,9 +1360,6 @@ void uprobe_free_utask(struct task_struct *t)
{
struct uprobe_task *utask = t->utask;
- if (t->uprobe_srcu_id != -1)
- srcu_read_unlock_raw(&uprobes_srcu, t->uprobe_srcu_id);
-
if (!utask)
return;
@@ -1398,7 +1377,6 @@ void uprobe_free_utask(struct task_struct *t)
void uprobe_copy_process(struct task_struct *t)
{
t->utask = NULL;
- t->uprobe_srcu_id = -1;
}
/*
@@ -1417,7 +1395,6 @@ static struct uprobe_task *add_utask(void)
if (unlikely(!utask))
return NULL;
- utask->active_uprobe = NULL;
current->utask = utask;
return utask;
}
@@ -1479,41 +1456,64 @@ static bool can_skip_sstep(struct uprobe *uprobe, struct pt_regs *regs)
return false;
}
+static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
+{
+ struct mm_struct *mm = current->mm;
+ struct uprobe *uprobe = NULL;
+ struct vm_area_struct *vma;
+
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, bp_vaddr);
+ if (vma && vma->vm_start <= bp_vaddr) {
+ if (valid_vma(vma, false)) {
+ struct inode *inode;
+ loff_t offset;
+
+ inode = vma->vm_file->f_mapping->host;
+ offset = bp_vaddr - vma->vm_start;
+ offset += (vma->vm_pgoff << PAGE_SHIFT);
+ uprobe = find_uprobe(inode, offset);
+ }
+
+ if (!uprobe)
+ *is_swbp = is_swbp_at_addr(mm, bp_vaddr);
+ } else {
+ *is_swbp = -EFAULT;
+ }
+ up_read(&mm->mmap_sem);
+
+ return uprobe;
+}
+
/*
* Run handler and ask thread to singlestep.
* Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
*/
static void handle_swbp(struct pt_regs *regs)
{
- struct vm_area_struct *vma;
struct uprobe_task *utask;
struct uprobe *uprobe;
- struct mm_struct *mm;
unsigned long bp_vaddr;
+ int uninitialized_var(is_swbp);
- uprobe = NULL;
bp_vaddr = uprobe_get_swbp_addr(regs);
- mm = current->mm;
- down_read(&mm->mmap_sem);
- vma = find_vma(mm, bp_vaddr);
-
- if (vma && vma->vm_start <= bp_vaddr && valid_vma(vma, false)) {
- struct inode *inode;
- loff_t offset;
-
- inode = vma->vm_file->f_mapping->host;
- offset = bp_vaddr - vma->vm_start;
- offset += (vma->vm_pgoff << PAGE_SHIFT);
- uprobe = find_uprobe(inode, offset);
- }
-
- srcu_read_unlock_raw(&uprobes_srcu, current->uprobe_srcu_id);
- current->uprobe_srcu_id = -1;
- up_read(&mm->mmap_sem);
+ uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
if (!uprobe) {
- /* No matching uprobe; signal SIGTRAP. */
- send_sig(SIGTRAP, current, 0);
+ if (is_swbp > 0) {
+ /* No matching uprobe; signal SIGTRAP. */
+ send_sig(SIGTRAP, current, 0);
+ } else {
+ /*
+ * Either we raced with uprobe_unregister() or we can't
+ * access this memory. The latter is only possible if
+ * another thread plays with our ->mm. In both cases
+ * we can simply restart. If this vma was unmapped we
+ * can pretend this insn was not executed yet and get
+ * the (correct) SIGSEGV after restart.
+ */
+ instruction_pointer_set(regs, bp_vaddr);
+ }
return;
}
@@ -1620,7 +1620,6 @@ int uprobe_pre_sstep_notifier(struct pt_regs *regs)
utask->state = UTASK_BP_HIT;
set_thread_flag(TIF_UPROBE);
- current->uprobe_srcu_id = srcu_read_lock_raw(&uprobes_srcu);
return 1;
}
@@ -1655,7 +1654,6 @@ static int __init init_uprobes(void)
mutex_init(&uprobes_mutex[i]);
mutex_init(&uprobes_mmap_mutex[i]);
}
- init_srcu_struct(&uprobes_srcu);
return register_die_notifier(&uprobe_exception_nb);
}
11 years, 3 months