On Wed, Jul 16, 2014 at 12:41:39PM +0200, Vitezslav Samel wrote:
On Wed, Jul 16, 2014 at 09:32:51PM +1200, Nikola Pajkovsky wrote:
> Vitezslav Samel <vitezslav(a)samel.cz> writes:
>
> > On Sun, Jul 13, 2014 at 06:02:19PM +1200, Nikola Pajkovsky wrote:
> >> Vitezslav Samel <vitezslav(a)samel.cz> writes:
> >>
> >> > Vitezslav Samel (3):
> >> > introduce packet capturing abstraction
> >> > capt.c: add capturing using recvmmsg()
> >> > capt.c: add capturing using mmap()ed PACKET_RX_RING memory
> >>
> >> Why are we now using recvmsg(), recvmmsg() and mmap()? Which one is faster?
> >> Please elaborate in detail, because your commit messages don't explain it.
> >
> > It's all about speed:
> >
> > - in the recvmsg() case there are 2 syscalls per packet (poll(), then
> >   recvmsg() returns one packet);
> > - in the recvmmsg() case there are 2 syscalls per batch (poll(), then
> >   recvmmsg() can return several packets if they are available);
> > - in the mmap()ed case a syscall (poll()) is made only when no packet
> >   is waiting.
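To make the recvmmsg() point concrete, here is a minimal sketch of one poll() followed by one recvmmsg() that drains whatever is queued. It assumes Linux with glibc 2.12+. receive_batch(), BATCH and SNAPLEN are illustrative names, not code from the series. It works on any datagram socket, so it can be tried on a socketpair() without root.

```c
#define _GNU_SOURCE           /* recvmmsg() is a GNU extension in glibc */
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH    8            /* hypothetical batch size */
#define SNAPLEN  2048         /* hypothetical per-packet buffer size */

/* Wait with one poll(), then drain up to BATCH datagrams with one
 * recvmmsg(): two syscalls per batch instead of two per packet.
 * Returns the number of messages received, or -1 on error/timeout. */
static int receive_batch(int fd, char bufs[BATCH][SNAPLEN],
                         unsigned int lens[BATCH], int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    struct mmsghdr msgs[BATCH];
    struct iovec iovs[BATCH];

    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len = SNAPLEN;
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: return what is queued now instead of blocking
     * until all BATCH slots are filled. */
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;   /* bytes received per message */
    return n;
}
```

With an AF_PACKET socket the loop is the same; only the socket setup differs.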
> >
> > On my workloads, switching from recvmsg() to the mmap-style receiver
> > lowers the number of dropped packets 100x (from tens of thousands to
> > hundreds), and this is still with a single thread.
> >
> > The packet capturing abstraction was chosen to make the packet
> > receiving techniques modular: in our case recvmsg(), recvmmsg() and
> > mmap(). recvmsg() is (almost) always available, recvmmsg() is
> > available only in linux-2.6.34+ and glibc-2.12+, and the mmap-style
> > receiver can be turned off in the kernel. The capturing interface
> > tries the mmap-style receiver first, then recvmmsg() (if configured
> > in), with recvmsg() as the slowest fallback.
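The selection order described above could be sketched like this. The struct, the stub functions and capt_select() are hypothetical stand-ins for the real capt.c interface. Here the mmap stub is made to fail so that the fallback is visible.

```c
#include <stddef.h>

/* Hypothetical capture-method descriptor; not the actual capt.c API. */
struct capt_method {
    const char *name;
    int (*init)(void);                        /* 0 if usable on this system */
    int (*get_packet)(char *buf, size_t len);
};

/* Stubs standing in for real setup code; here mmap "fails". */
static int init_mmap(void)     { return -1; }  /* e.g. PACKET_RX_RING unsupported */
static int init_recvmmsg(void) { return 0; }
static int init_recvmsg(void)  { return 0; }
static int get_mmap(char *b, size_t l)     { (void)b; (void)l; return 0; }
static int get_recvmmsg(char *b, size_t l) { (void)b; (void)l; return 0; }
static int get_recvmsg(char *b, size_t l)  { (void)b; (void)l; return 0; }

/* Ordered from fastest to slowest; recvmsg() is the last resort. */
static const struct capt_method methods[] = {
    { "mmap",     init_mmap,     get_mmap },
    { "recvmmsg", init_recvmmsg, get_recvmmsg },
    { "recvmsg",  init_recvmsg,  get_recvmsg },
};

/* Probing happens once, here; the hot path only calls through the
 * returned pointer and never retries another method. */
static const struct capt_method *capt_select(void)
{
    for (size_t i = 0; i < sizeof(methods) / sizeof(methods[0]); i++)
        if (methods[i].init() == 0)
            return &methods[i];
    return NULL;
}
```

The receive loop then calls m->get_packet() directly, so the probing cost is paid once at startup.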
>
> To be honest, it took me a while to figure out what the patches are all
> about. Now I have a better picture: you want to implement *zero
> copy* for RX.
>
> linux/Documentation/networking/packet_mmap.txt
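For reference, the RX side described in packet_mmap.txt boils down to roughly the following TPACKET_V1 sketch. The ring geometry (2048-byte frames in 4096-byte blocks, so 4 KiB pages are assumed), the constant names and both functions are my own illustration, not the code from the series. The __atomic builtins assume GCC or Clang.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

#define FRAME_SIZE  2048
#define BLOCK_SIZE  4096      /* must be a multiple of the page size */
#define BLOCK_NR    64
#define FRAME_NR    (BLOCK_NR * (BLOCK_SIZE / FRAME_SIZE))

/* Attach an RX ring to an AF_PACKET socket and map it into our address
 * space (needs CAP_NET_RAW). Returns the ring base or MAP_FAILED. */
void *rx_ring_setup(int fd)
{
    struct tpacket_req req = {
        .tp_block_size = BLOCK_SIZE,
        .tp_block_nr = BLOCK_NR,
        .tp_frame_size = FRAME_SIZE,
        .tp_frame_nr = FRAME_NR,
    };

    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)) < 0)
        return MAP_FAILED;
    return mmap(NULL, (size_t)BLOCK_SIZE * BLOCK_NR,
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Consume every frame owned by user space, starting at *idx, and hand
 * each one back to the kernel. No syscall is made while packets keep
 * arriving; the caller poll()s only when this returns 0. */
unsigned int rx_ring_drain(void *ring, unsigned int *idx)
{
    unsigned int n = 0;

    for (;;) {
        struct tpacket_hdr *hdr =
            (struct tpacket_hdr *)((char *)ring + (size_t)*idx * FRAME_SIZE);

        /* Acquire: frame contents are valid once we see TP_STATUS_USER. */
        if (!(__atomic_load_n(&hdr->tp_status, __ATOMIC_ACQUIRE) & TP_STATUS_USER))
            break;                             /* ring empty */

        /* Packet bytes start at (char *)hdr + hdr->tp_mac and are
         * hdr->tp_snaplen long; process them here. */
        n++;

        /* Release: give the frame back to the kernel. */
        __atomic_store_n(&hdr->tp_status, TP_STATUS_KERNEL, __ATOMIC_RELEASE);
        *idx = (*idx + 1) % FRAME_NR;
    }
    return n;
}
```

rx_ring_drain() only touches shared memory, which is why a syscall is needed only when the ring is empty.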
>
> Since commit 889b8f964f2f ("packet: Kill CONFIG_PACKET_MMAP.") removed
> CONFIG_PACKET_MMAP and enabled the struct packet_ring_buffer rx_ring
> and tx_ring by default, recvmmsg() has become much less interesting. I'm not
Didn't know about that.
> saying that we should not implement it, but I would rather go with mmap
> as the default and recvmsg() as the fallback. I haven't checked whether
> RHEL6 has CONFIG_PACKET_MMAP enabled, but it would not surprise me if it does.
>
> Since we are doing this to speed things up, we should avoid *trying*
> (like in recvmsg()) and then falling back and continuing, as in
>
> [PATCH 2/3] capt.c: add capturing using recvmmsg()
>
> and instead just *do* it. All this trying will become a huge bottleneck
> and a waste of time.
Trying is done only once, at packet-capture initialization; after that,
only the initialized capturing function is called, with no trying in the
fast path.
> I'm still reading your code over and over and over. It's making more and
> more sense ;).
>
> One way of implementing it is via weak functions, where the weak
> functions are the recv*() ones and the strong ones are mmap()ed/whatever
> (chosen at build time).
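For comparison, a weak-symbol variant could look roughly like this. It assumes GCC or Clang on an ELF platform, and all names are hypothetical. A weak default is replaced by a strong definition at link time. A weak declaration without any definition resolves to NULL. Either way the decision is frozen when the binary is linked.

```c
#include <stddef.h>

/* Weak declaration with no definition here: if no object file linked
 * into the program defines it, its address is NULL instead of causing
 * a link error. */
extern int capt_get_packet_mmap(char *buf, size_t len) __attribute__((weak));

/* Weak default: a strong capt_get_packet() defined in another object
 * file (e.g. a hypothetical capt-mmap.c) would replace it at link time. */
__attribute__((weak)) int capt_get_packet(char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 0;   /* stand-in for a recvmsg()-based receiver */
}

/* Reports which back end was linked in; nothing here can change at run
 * time, e.g. when the kernel lacks PACKET_RX_RING. */
const char *capt_backend(void)
{
    return capt_get_packet_mmap ? "mmap" : "recvmsg";
}
```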
I don't think weak functions are the best solution for us. The choice
is made at compile time, but I want iptraf-ng to be versatile and to
make the choice at run time.
> Another option is to build all of them (recvmsg(), recvmmsg(), mmap(),
> ...) and set one as the default, which can be overridden via a
> command-line option like --recv recvmsg/recvmmsg/mmap/... (or
> enabled/disabled at build time via [NO_]MMAP=YesPlease).
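If we do add such an override, a sketch with getopt_long() might look like this. The --recv option and the enum names are hypothetical, as proposed here, not an existing iptraf-ng flag. RECV_AUTO would keep the probe-at-startup behaviour.

```c
#include <getopt.h>
#include <string.h>

enum recv_method { RECV_AUTO, RECV_MMAP, RECV_RECVMMSG, RECV_RECVMSG, RECV_INVALID };

/* Map a --recv argument to a method; unknown names are rejected. */
static enum recv_method recv_method_from_name(const char *name)
{
    static const struct { const char *name; enum recv_method m; } map[] = {
        { "mmap",     RECV_MMAP },
        { "recvmmsg", RECV_RECVMMSG },
        { "recvmsg",  RECV_RECVMSG },
    };

    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
        if (strcmp(name, map[i].name) == 0)
            return map[i].m;
    return RECV_INVALID;
}

/* Returns RECV_AUTO when --recv is absent, so the caller probes as usual. */
static enum recv_method parse_recv_option(int argc, char *argv[])
{
    static const struct option opts[] = {
        { "recv", required_argument, NULL, 'r' },
        { NULL, 0, NULL, 0 },
    };
    enum recv_method m = RECV_AUTO;
    int c;

    while ((c = getopt_long(argc, argv, "", opts, NULL)) != -1) {
        if (c == 'r')
            m = recv_method_from_name(optarg);
        else
            return RECV_INVALID;     /* unknown option */
    }
    return m;
}
```

Both `--recv mmap` and `--recv=mmap` are accepted by getopt_long().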
Could be. The default one would be mmap, and it could be overridden to
something else. But ...
> Or do it like the Linux kernel: have module_init/module_exit and a
> pkt_ops struct holding function pointers, like you have, plus some
> config file with
>
> CONFIG_MMAP=y # for enabled
> # CONFIG_MMAP not set
>
> and build it according to that config.
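A kernel-style compile-time selection could be sketched as follows. It assumes the config lines are turned into a generated header, where CONFIG_MMAP=y becomes `#define CONFIG_MMAP 1` and an unset option defines nothing. The macro names and the struct pkt_ops layout are hypothetical.

```c
#include <stddef.h>

/* Hypothetical generated config header: mmap not set, recvmmsg enabled. */
/* #define CONFIG_MMAP 1 */
#define CONFIG_RECVMMSG 1

struct pkt_ops {
    const char *name;
    int (*init)(void);   /* 0 on success */
};

static int recvmsg_init(void) { return 0; }
#ifdef CONFIG_RECVMMSG
static int recvmmsg_init(void) { return 0; }
#endif
#ifdef CONFIG_MMAP
static int mmap_init(void) { return 0; }
#endif

/* Only configured back ends end up in the table, best first. */
static const struct pkt_ops pkt_ops_table[] = {
#ifdef CONFIG_MMAP
    { "mmap", mmap_init },
#endif
#ifdef CONFIG_RECVMMSG
    { "recvmmsg", recvmmsg_init },
#endif
    { "recvmsg", recvmsg_init },   /* always built as the fallback */
};

#define PKT_OPS_COUNT (sizeof(pkt_ops_table) / sizeof(pkt_ops_table[0]))
```

Note that this fixes which back ends exist at build time but still allows choosing among them at run time.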
>
> I have been wondering how modules work in the Linux kernel and how the
> heck it can call static functions from modules via
> module_init/module_exit.
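As far as I understand it, module_init() does not make the function non-static. For a loadable module the macro emits a global alias (init_module) to it. For built-in code it stores a pointer to it in an initcall section. Either way, static hides only the name, not the address. Below is a user-space analogue, not the kernel's mechanism, using the GCC/Clang constructor attribute. The names are hypothetical, and in a real multi-file split capt_register() would be declared in a shared header.

```c
#include <stddef.h>

struct capt_backend {
    const char *name;
    int (*init)(void);
    struct capt_backend *next;
};

static struct capt_backend *backends;   /* registry head */

static void capt_register(struct capt_backend *b)
{
    b->next = backends;
    backends = b;
}

/* --- would live in its own file, e.g. capt-mmap.c --- */
static int mmap_init(void) { return 0; }   /* static: name is file-local */
static struct capt_backend mmap_backend = { "mmap", mmap_init, NULL };

/* Runs before main(); publishes a pointer to the static mmap_init. */
__attribute__((constructor))
static void mmap_backend_register(void)
{
    capt_register(&mmap_backend);
}

static size_t capt_backend_count(void)
{
    size_t n = 0;

    for (const struct capt_backend *b = backends; b; b = b->next)
        n++;
    return n;
}
```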
>
> So what do you think?
... I think my approach is best: at initialization, try the best method,
then try the others if the best one isn't available/is buggy/..., with
recvmsg() as the sane fallback/default. Then, in the hot path, just use
whatever was initialized.
What could be added is an override from the command line. Should I add it?
Ping?
Is there anything I should do to have this series included?
Cheers,
Vita