On 21 February 2014 03:42, Flavio Leitner <fbl@redhat.com> wrote:
>
> On Thu, Feb 20, 2014 at 09:41:57PM -0300, Flavio Leitner wrote:
> > On Thu, Feb 20, 2014 at 09:27:17PM -0300, Flavio Leitner wrote:
> > > On Thu, Feb 20, 2014 at 09:11:00PM -0300, Flavio Leitner wrote:
> > > > On Thu, Feb 20, 2014 at 04:23:21PM +0100, Jonas Johansson wrote:
> > > > > Hi,
> > > > >
> > > > > I have configured teamd with the lacp runner. When a port has state
> > > > > disabled, all LACPDU frames are received in lacpdu_recv(), but when LACP
> > > > > has reached an agreement (state "current") the port will become enabled and
> > > > > LACPDU frames are no longer received by the port. After the LACP timeout
> > > > > the state will be changed to "expired" and the port will be disabled and
> > > > > LACPDUs can be received again.
> > > > >
> > > > > In the kernel driver, team.c, the team_handle_frame() will return
> > > > > RX_HANDLER_EXACT when a port is disabled and RX_HANDLER_ANOTHER for an
> > > > > enabled LACP port. This means that an enabled port will divert all
> > > > > traffic to the team device, which teamd (using lacp) isn't listening to.
> > > > >
> > > > > I made a kernel patch which seems to work. Thoughts?
> > > >
> > > > Interesting, I can't make it work regardless of the patch.
> > > > My switch reports only one port as part of the trunk.
> > >
> > > The port that is failing shows the LACPDU with states
> > > Active, Aggregation only.
> >
> >         Actor Information TLV (0x01), length 20
> >           System 00:10:18:38:0d:dc (oui Unknown), System Priority 65535, Key 0, Port 3, Port Priority 255
> >           State Flags [Activity, Aggregation, Synchronization, Collecting, Distributing]
> >         Partner Information TLV (0x02), length 20
> >           System 30:46:9a:10:b9:1a (oui Unknown), System Priority 32768, Key 14, Port 5, Port Priority 128
> >           State Flags [Activity, Aggregation]
>
> Jonas, I am running upstream kernel and upstream teamd. It is stable
> with both ports in 'current' state.  If I attach tcpdump on them, I can
> see the LACPDUs. If I attach gdb to teamd, I can see them on all ports.
>
> In __netif_receive_skb_core(), the original receiving device is
> saved in orig_dev, then a device handler is called which works as you
> described. So, if the device is disabled, it's an exact match and the
> code works as you said.  However, when the port is enabled, it returns
> RX_HANDLER_ANOTHER with skb->dev updated to be the master device.
> Therefore, another round takes place and there will be no rx_handler
> this time.  So, it goes down to:
>
> 3621         /* deliver only exact match when indicated */
> 3622         null_or_dev = deliver_exact ? skb->dev : NULL;
> 3623
> 3624         type = skb->protocol;
> 3625         list_for_each_entry_rcu(ptype,
> 3626                         &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
> 3627                 if (ptype->type == type &&
> 3628                     (ptype->dev == null_or_dev || ptype->dev == skb->dev ||
> 3629                      ptype->dev == orig_dev)) {
> 3630                         if (pt_prev)
> 3631                                 ret = deliver_skb(skb, pt_prev, orig_dev);
> 3632                         pt_prev = ptype;
> 3633                 }
> 3634         }
>
> Notice the comparison of ptype->dev with orig_dev on line 3629. This is
> where the LACPDU packet is delivered to the team sockets.
>
> But I can't explain why my second port doesn't move to DISTRIBUTING.
> The interesting thing is that unplugging the working port fixes the
> second one.
>
> fbl

Thanks a lot for the help.
I've been running on a kernel with some patches, and it seems like the orig_dev in my setup isn't set properly. I need to fix this, and then everything will probably work as expected.
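
To illustrate why a wrong orig_dev would make the LACPDUs disappear on an enabled port, here is a minimal userspace sketch of the device test on lines 3628-3629 quoted above. It is not kernel code: struct sketch_dev, struct sketch_ptype and sketch_dev_matches() are hypothetical stand-ins, the protocol type check from line 3627 is left out, and it assumes the teamd LACP socket is bound to the port device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device and struct packet_type. */
struct sketch_dev {
        const char *name;
};

struct sketch_ptype {
        const struct sketch_dev *dev;   /* NULL means "any device" */
};

/*
 * Mirrors the device comparison on the quoted lines 3628-3629.
 * skb_dev is skb->dev after the rx_handler has run; orig_dev is the
 * device the frame originally arrived on.
 */
bool sketch_dev_matches(const struct sketch_ptype *pt,
                        const struct sketch_dev *skb_dev,
                        const struct sketch_dev *orig_dev,
                        bool deliver_exact)
{
        const struct sketch_dev *null_or_dev = deliver_exact ? skb_dev : NULL;

        return pt->dev == null_or_dev || pt->dev == skb_dev ||
               pt->dev == orig_dev;
}

/* Walks through the three cases discussed in this thread. */
void sketch_self_check(void)
{
        struct sketch_dev eth0 = { "eth0" }, team0 = { "team0" };
        struct sketch_ptype lacp_sock = { &eth0 };  /* assumed: bound to the port */

        /* Disabled port: RX_HANDLER_EXACT, skb->dev is still the port. */
        assert(sketch_dev_matches(&lacp_sock, &eth0, &eth0, true));

        /* Enabled port: skb->dev is the team device, orig_dev is the port. */
        assert(sketch_dev_matches(&lacp_sock, &team0, &eth0, false));

        /* Enabled port, but orig_dev wrongly points at the team device. */
        assert(!sketch_dev_matches(&lacp_sock, &team0, &team0, false));
}
```

In the last case only the orig_dev comparison could have matched the port-bound socket, so if orig_dev is not the port the frame never reaches it. That would be consistent with LACPDUs showing up only while the port is disabled.
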
Thanks again.

/Jonas