[RFC][PATCH] Add --split support for dump on filesystem

Vivek Goyal vgoyal at redhat.com
Thu Mar 27 13:18:38 UTC 2014


On Thu, Mar 27, 2014 at 08:04:55AM +0100, HATAYAMA Daisuke wrote:
> From: Vivek Goyal <vgoyal at redhat.com>
> Subject: Re: [RFC][PATCH] Add --split support for dump on filesystem
> Date: Wed, 26 Mar 2014 14:05:07 -0400
> 
> > On Tue, Mar 25, 2014 at 08:08:48PM +0900, HATAYAMA, Daisuke wrote:
> >> Hello,
> >> 
> >> This is an RFC patch intended to first review basic design of --split option support.
> >> 
> >> This version automatically appends the --split option if more than one CPU is available in the kdump 2nd kernel. I guess someone probably doesn't like the situation where multiple vmcores are generated implicitly without any explicit user operation. So, I'd like comments on this design first.
> > 
> > Hi Hatayama,
> > 
> > Can you give some more details about how the --split feature of makedumpfile
> > works? I have never used it. Why should I split the file into multiple
> > files? And how do I get back the original single file?
> > 
> 
> crash utility supports vmcores split by makedumpfile --split. The
> syntax is:
> 
> $ crash vmlinux vmcore-0 vmcore-1 ... vmcore-{N-1}

What's the advantage of that? I would rather have a way to take all
these fragments, come up with a single file, and pass that file to
crash.

Anyway, saving multiple files and managing those files is not very
convenient.
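For what it's worth, makedumpfile already seems to have a --reassemble
mode for exactly this (the file names below are illustrative, not from
any real setup):

```shell
# Merge split dumps back into a single file with makedumpfile's
# --reassemble mode; the last argument is the output file:
makedumpfile --reassemble vmcore-0 vmcore-1 vmcore-2 vmcore-full
```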

> 
> > Also, I don't think we should be adding --split automatically. I want to
> > stick to the user-specified core collector and options and not add things
> > silently.
> > 
> > If users want to take advantage of parallelism, they need to modify
> > nr_cpus and the core_collector line as well, and we should document
> > that properly.
> > 
> 
> The problem is that we currently don't have a way to specify the degree
> of parallelism in core_collector, since we specify it via --split as the
> number of vmcore arguments.

We can do two things.

- We can check the number of CPUs available in the second kernel in
  makedumpfile, and makedumpfile can fork off threads accordingly.

- Or we can create a new command line argument that specifies how
  many threads to fork off for compression. A user who is modifying
  nr_cpus can also modify this command line parameter.

I think we can in fact have both. The first will be the default behavior,
which can be overridden with a command line option.

> 
> How about this? We do parallel processing if
> 
> - in core_collector makedumpfile is specified with --split option, and
> - nr_cpus is larger than 1.

Instead of checking nr_cpus, just look into /sys (or somewhere else), see
how many processors are online, and fork off that many threads
accordingly.

> 
> i.e., if --split is specified explicitly, we assume the user intends to
> do parallel processing.
> 
> I'll post documentation once the design is settled.

--split implies that we are saving to multiple files. And it might also
imply that we do parallel processing.

So instead of relying on --split, we could probably create a new command
line parameter. We want to do parallel processing, but when it comes to
saving the vmcore we still want to write to a single vmcore file. Parallel
processing can help with faster filtering and faster compression of pages.

> 
> > Also, can't we take advantage of parallelism for compression and, while
> > writing compressed data write it to a single file. That way no special
> > configuration will be required and makedumpfile should be able to fork
> > as many threads as number of cpus, do the compression and write the
> > output to a single file.
> > 
> 
> First, at least, the current makedumpfile cannot do it. To do it, we
> need to use pthreads; strictly speaking they are not required, but
> doing it with fork() would be harmful.
> 
> Historically, the reason makedumpfile chose --split was to avoid
> growing the initramfs by including libc.so. (But this is no longer a
> problem, since we now often include commands that link against
> libc.so, such as scp, in the initramfs.)

Yep, now libc is part of the initramfs, so we should be able to use pthreads.

> 
> Also, splitting the dump into multiple vmcores has another merit: it
> makes it possible to parallelize even the I/O, across multiple disks.
> This is necessary when we really need a full dump.

I can understand the need for --split in some cases. But it will be
useful only in select corner cases.

If we enable writing to a single file with multiple threads doing the
filtering and compression, that is going to be more useful, I think.

> 
> So, doing it is possible. It's easier to do with pthreads. I assume the
> logic is that multiple threads write compressed data into the same
> buffer, and the thread that detects the buffer is full flushes the
> buffer.

This sounds reasonable.

> But makedumpfile doesn't have this feature now; we would need to
> implement it from scratch.

I agree. This looks like a new feature. It would be great to have it,
though, on large memory machines.

Thanks
Vivek

