Hi, I'd like to get your thoughts on an issue we are running into.
Initially we run the lacp runner in teamd. It sends carrier changes to the kernel, and the kernel sets "user_carrier_enabled", which blocks its normal carrier checks.
We then change the teamd runner from lacp to loadbalance. When the lacp runner exits, carrier is set to down in the kernel. The lacp and loadbalance runners both use the same kernel mode, "loadbalance", so starting the loadbalance runner does not change the kernel mode. The kernel still has "user_carrier_enabled" set, and the team interface stays down due to NO_CARRIER.
It seems the only way to clear the kernel's "user_carrier_enabled" is to change the mode. Should there be a new option that teamd can use to tell the kernel to clear "user_carrier_enabled" when the lacp runner terminates? Or is there any other suggestion?
Thanks.
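The behaviour described in this message can be sketched as a small toy model. This is only an illustration built from the description above, not the kernel's team driver code: the class `ToyTeamDevice`, its method names, and the arbitrary initial mode are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTeamDevice:
    """Toy stand-in for a kernel team device; not the real driver logic."""
    mode: str = "roundrobin"          # arbitrary starting mode (assumption)
    user_carrier_enabled: bool = False
    carrier: bool = False
    ports_up: list = field(default_factory=list)

    def carrier_check(self):
        # Normal carrier check: skipped while userspace owns the carrier.
        if self.user_carrier_enabled:
            return
        self.carrier = any(self.ports_up)

    def set_user_carrier(self, up):
        # Userspace (e.g. the lacp runner) reports carrier; the flag latches.
        self.user_carrier_enabled = True
        self.carrier = up

    def change_mode(self, new_mode):
        # As described in the thread, only a real mode change clears the flag.
        if new_mode == self.mode:
            return
        self.mode = new_mode
        self.user_carrier_enabled = False
        self.carrier_check()


dev = ToyTeamDevice()
dev.ports_up = [True, True]
dev.change_mode("loadbalance")   # lacp runner uses kernel mode "loadbalance"
dev.set_user_carrier(False)      # lacp runner exits, carrier is set down
dev.change_mode("loadbalance")   # loadbalance runner: same mode, nothing changes
dev.carrier_check()
stuck = dev.carrier              # carrier stays down although ports are up

dev.change_mode("roundrobin")    # detour through a different kernel mode
dev.change_mode("loadbalance")
recovered = dev.carrier          # carrier follows the ports again
```

In this model the interface only recovers after the detour through a different mode, which matches the workaround reported later in the thread.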
Tue, Mar 28, 2017 at 06:51:09PM CEST, gwilkie@brocade.com wrote:
> Hi, like to get your thoughts on this issue we are running into.
>
> Initially, running the lacp runner in teamd. This sends carrier changes to the kernel. Kernel sets "user_carrier_enabled" which blocks its normal carrier checks.
>
> Then change teamd runner from lacp to loadbalance. When lacp runner exits,
Hmm. What exactly are you doing, could you send a list of commands? I suspect some oddities :)
> carrier is set to down in kernel. lacp and loadbalance runners both use same kernel mode "loadbalance", so starting loadbalance runner does not change kernel mode. Kernel still has "user_carrier_enabled" and team interface stays down due to NO_CARRIER.
Hmm. I think that the correct solution is to teach the runners not to depend on kernel defaults. So in this case the loadbalance runner would set user_carrier_enabled to false during init (not during takeover).
> It seems the only way to clear kernel "user_carrier_enabled" is to change the mode. Should there be a new option that teamd can use to tell kernel to clear "user_carrier_enabled" when lacp runner terminates? Or any other suggestion?
>
> Thanks.
_______________________________________________
libteam mailing list -- libteam@lists.fedorahosted.org
To unsubscribe send an email to libteam-leave@lists.fedorahosted.org
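The suggestion in this reply (a runner resets the flag during init, but not during takeover) can be sketched as follows. This is a toy model only: `ToyTeam`, `clear_user_carrier` and `loadbalance_init` are hypothetical names, and the thread itself asks whether an option to clear the flag should exist, so nothing here reflects an existing teamd or kernel interface.

```python
class ToyTeam:
    """Toy model: a latched user-carrier flag plus port link states."""

    def __init__(self, ports_up):
        self.ports_up = list(ports_up)
        self.user_carrier_enabled = False
        self.carrier = False

    def carrier_check(self):
        # Normal check, skipped while userspace owns the carrier.
        if not self.user_carrier_enabled:
            self.carrier = any(self.ports_up)

    def set_user_carrier(self, up):
        self.user_carrier_enabled = True
        self.carrier = up

    def clear_user_carrier(self):
        # Hypothetical operation: hand carrier control back to the kernel.
        self.user_carrier_enabled = False
        self.carrier_check()


def loadbalance_init(team, takeover=False):
    # Do not rely on defaults: reset explicitly on a fresh init,
    # but leave the existing state alone on takeover.
    if not takeover:
        team.clear_user_carrier()


team = ToyTeam(ports_up=[True, True])
team.set_user_carrier(False)           # state left behind by the lacp runner
loadbalance_init(team, takeover=True)
after_takeover = team.carrier          # unchanged: still down
loadbalance_init(team)
after_init = team.carrier              # reset: follows the ports again
```

The point of the sketch is the asymmetry: a fresh init establishes the state the runner needs, while a takeover deliberately inherits whatever is already there.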
On 03/30/2017 04:18 PM, Jiri Pirko wrote:
> Hmm. What exactly are you doing, could you send a list of commands? I suspect some oddities :)
1. start off in LACP mode - NO-CARRIER expected as no LACP on other side.

root@debian9:~# teamd --team-dev team0 --daemon --no-quit-destroy --take-over --config='{"runner":{"name":"lacp"}}'
This program is not intended to be run as root.
root@debian9:~# ip link set dev ens9 master team0
root@debian9:~# ip link set dev ens10 master team0
root@debian9:~# ip link set dev team0 up
root@debian9:~# ip link show team0
8: team0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:8b:12:a5 brd ff:ff:ff:ff:ff:ff

2. change to balanced mode - still NO-CARRIER

root@debian9:~# ip link set dev team0 down
root@debian9:~# ip link set dev ens9 nomaster
root@debian9:~# ip link set dev ens10 nomaster
root@debian9:~# teamd --team-dev team0 --kill
root@debian9:~# teamd --team-dev team0 --daemon --no-quit-destroy --take-over --config='{"runner":{"name":"loadbalance"}}'
This program is not intended to be run as root.
root@debian9:~# ip link set dev ens9 master team0
root@debian9:~# ip link set dev ens10 master team0
root@debian9:~# ip link set dev team0 up
root@debian9:~# ip link show team0
8: team0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:8b:12:a5 brd ff:ff:ff:ff:ff:ff

3. have to switch to different kernel mode first to clear NO-CARRIER

root@debian9:~# ip link set dev team0 down
root@debian9:~# ip link set dev ens9 nomaster
root@debian9:~# ip link set dev ens10 nomaster
root@debian9:~# teamd --team-dev team0 --kill
root@debian9:~# teamd --team-dev team0 --daemon --no-quit-destroy --take-over --config='{"runner":{"name":"roundrobin"}}'
This program is not intended to be run as root.
root@debian9:~# teamd --team-dev team0 --kill
root@debian9:~# teamd --team-dev team0 --daemon --no-quit-destroy --take-over --config='{"runner":{"name":"loadbalance"}}'
This program is not intended to be run as root.
root@debian9:~# ip link set dev ens9 master team0
root@debian9:~# ip link set dev ens10 master team0
root@debian9:~# ip link set dev team0 up
root@debian9:~# ip link show team0
8: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:8b:12:a5 brd ff:ff:ff:ff:ff:ff

This is with debian9, teamd version 1.26-1+b1, kernel 4.9.13-1
> Hmm. I think that the correct solution is to teach the runners not to depend on kernel defaults. So in this case the loadbalance runner would set user_carrier_enabled to false during init (not during takeover).
ok, thanks.
Fri, Mar 31, 2017 at 01:41:08PM CEST, gwilkie@brocade.com wrote:
> - change to balanced mode - still NO-CARRIER
>
> root@debian9:~# ip link set dev team0 down
> root@debian9:~# ip link set dev ens9 nomaster
> root@debian9:~# ip link set dev ens10 nomaster
> root@debian9:~# teamd --team-dev team0 --kill
> root@debian9:~# teamd --team-dev team0 --daemon --no-quit-destroy --take-over
I think that you might be missing the reason for the existence of takeover. It is supposed to "take over" the kernel team instance in case, for example, teamd segfaults or something. The intention was to run a new teamd instance taking over the kernel instance with the exact same config.
Why are you doing this?
On 03/31/2017 04:01 PM, Jiri Pirko wrote:
> I think that you might be missing the reason for the existence of takeover. It is supposed to "take over" the kernel team instance in case, for example, teamd segfaults or something. The intention was to run a new teamd instance taking over the kernel instance with the exact same config.
>
> Why are you doing this?
We want to change the teamd config without losing the interface config (addresses, VLANs, etc.), so we want to avoid deleting the team device when updating the team config.
Cheers.
Fri, Mar 31, 2017 at 06:01:46PM CEST, gwilkie@Brocade.com wrote:
> Want to change the teamd config without losing the interface config (addresses, vlans etc) - want to avoid deleting the team device when updating team config.
Is it like a regular runtime thing for you? I would expect that you just do the configuration once and you use the same one forever.
On 04/03/2017 02:14 PM, Jiri Pirko wrote:
> Is it like a regular runtime thing for you? I would expect that you just do the configuration once and you use the same one forever.
I don't think I could say it would be a regular thing done in production - more like something you do during setup or troubleshooting.
Do you think this will get addressed or is it something we will have to live with?
Thx.
Mon, Apr 03, 2017 at 07:07:31PM CEST, gwilkie@brocade.com wrote:
> I don't think I could say it would be a regular thing done in production - more like something you do during setup or troubleshooting.
I don't understand why it is a problem to set up the iface again in this case.
> Do you think this will get addressed or is it something we will have to live with?
Currently I don't really see a reason for changing the behaviour. If anything, I would add a check in takeover for the team driver instance options, so that takeover would simply fail if you tried it with a different setup.
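The check mentioned here could look roughly like the sketch below. It is only an illustration of the idea of refusing a takeover when the requested setup differs from the running instance; `check_takeover`, `TakeoverMismatch` and the option dictionaries are hypothetical and do not correspond to existing teamd code.

```python
class TakeoverMismatch(Exception):
    """Raised when the requested setup differs from the running instance."""


def check_takeover(kernel_options, requested_options):
    # Collect every option whose requested value differs from the kernel's.
    diffs = {
        name: (kernel_options.get(name), value)
        for name, value in requested_options.items()
        if kernel_options.get(name) != value
    }
    if diffs:
        raise TakeoverMismatch(f"refusing takeover, options differ: {diffs}")


# Options of the running instance (illustrative values only).
running = {"mode": "loadbalance", "user_carrier_enabled": True}

# Same setup: the takeover check passes silently.
check_takeover(running, {"mode": "loadbalance", "user_carrier_enabled": True})

# Different setup: the takeover is refused instead of half-working.
try:
    check_takeover(running, {"mode": "loadbalance", "user_carrier_enabled": False})
    refused = False
except TakeoverMismatch:
    refused = True
```

Failing early in this way would turn the stuck NO-CARRIER state from the reproduction steps into an explicit error at startup.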