On Wed, Apr 09, 2014 at 03:46:22PM +0900, HATAYAMA Daisuke wrote:
Hi Hatayama,
So, I'll post a patch to support --split as a means of supporting parallelism. Do you agree with this direction?
--split is already an existing parameter which very clearly means splitting a file into multiple files. I don't know how --split is implemented, but I am assuming it creates as many threads as there are split files, and these threads do the filtering, compression and IO.
Usage of --split requires that we *have* to specify multiple files as output files.
How about creating a new option, say --parallel-dump <nr-threads>? This option can take the number of threads to create for filtering and compression, and possibly for doing IO also (to a single file).
--parallel-dump will take only a single output file as argument, so the dump is always saved in a single file.
We can also make the <nr-threads> argument optional. If the user specifies a number of threads, that many threads will be launched. Otherwise makedumpfile can detect how many cpus are online and launch that many threads.
For example:
# makedumpfile --parallel-dump 2 /proc/vmcore /var/crash/saved-vmcore
In this case 2 threads will be launched which will coordinate internally on filtering and compression and possibly on doing IO too.
# makedumpfile --parallel-dump /proc/vmcore /var/crash/saved-vmcore
In this case makedumpfile will determine how many cpus are online and launch as many threads (one thread for each cpu). This is close to the "-j" option of the "make" utility. The difference is that "-j" without an argument places no limit on the number of jobs make runs at once.
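To make the optional <nr-threads> behaviour concrete, here is a rough shell sketch of the selection logic. It is only an illustration: resolve_nr_threads is a made-up helper name, --parallel-dump is still just a proposal, and the real logic would live inside makedumpfile itself.

```shell
# Hypothetical helper, not part of makedumpfile: pick the thread count
# for the proposed --parallel-dump option.
resolve_nr_threads() {
    requested="$1"
    if [ -n "$requested" ]; then
        # An explicit <nr-threads> always wins.
        echo "$requested"
        return
    fi
    # No argument given: one thread per online cpu.
    online=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || online=""
    case "$online" in
        # Fall back to a single thread if the count is missing or not numeric.
        ''|*[!0-9]*) online=1 ;;
    esac
    echo "$online"
}

resolve_nr_threads 2    # explicit request: prints 2
resolve_nr_threads      # no argument: prints the number of online cpus
```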
In the patch, --split is no longer automatically inserted. User should specify it in core_collector explicitly, and then kdump script detects it and appends multiple vmcore arguments accordingly. The number of the multiple arguments is the number of cpus running in the 2nd kernel, i.e., the number specified in nr_cpus.
That is, if in /etc/kdump.conf core_collector is specified as
core_collector makedumpfile --split -l -d 31
and the number of online cpus on the 2nd kernel is more than 2, say 3 here, then, the following command is executed:
$ makedumpfile --split -l -d 31 /proc/vmcore vmcore-0-incomplete vmcore-1-incomplete vmcore-2-incomplete
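As a rough sketch of how a script could generate that argument list, one output file per online cpu: build_split_args is a hypothetical name used only for illustration, not a function in the kdump scripts, and the file naming just follows the example above.

```shell
# Hypothetical helper: emit one vmcore-<n>-incomplete argument per cpu.
build_split_args() {
    nr_cpus="$1"
    args=""
    i=0
    while [ "$i" -lt "$nr_cpus" ]; do
        # Append with a separating space only after the first entry.
        args="${args:+$args }vmcore-$i-incomplete"
        i=$((i + 1))
    done
    echo "$args"
}

# With 3 cpus this prints the command line shown above.
echo "makedumpfile --split -l -d 31 /proc/vmcore $(build_split_args 3)"
```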
I want to avoid modifying core_collector internally by script. I want to honor core_collector as specified in /etc/kdump.conf file. That way user knows exactly what core collector will be used and user can configure it accordingly.
So if user wants dump to be saved into multiple files, they need to explicitly edit /etc/kdump.conf and specify --split as well as name of split files.
Right now we don't seem to have a way to specify destination vmcore file name. If need be, we can possibly create a new option to specify file names and then --split option should work.
If makedumpfile --split is specified but the number of online cpus on the 2nd kernel is 1, then --split is removed, as makedumpfile fails while warning that more vmcore arguments are needed.
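The single-cpu fallback described here would amount to something like the following sketch. adjust_core_collector is a hypothetical name, and the word splitting is deliberately naive (it assumes the core_collector string has no quoted arguments or glob characters).

```shell
# Hypothetical helper: drop --split from a core_collector string when
# only one cpu is online; otherwise return the string unchanged.
adjust_core_collector() {
    collector="$1"
    nr_cpus="$2"
    if [ "$nr_cpus" -le 1 ]; then
        out=""
        # Unquoted on purpose: split the collector string into words.
        for word in $collector; do
            [ "$word" = "--split" ] && continue
            out="${out:+$out }$word"
        done
        echo "$out"
    else
        echo "$collector"
    fi
}

adjust_core_collector "makedumpfile --split -l -d 31" 1   # prints: makedumpfile -l -d 31
adjust_core_collector "makedumpfile --split -l -d 31" 3   # prints the string unchanged
```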
Again, I don't want scripts to play with user specified core_collector. I want to use it as specified by user.
So if user wants --split dump, they need to configure it that way.
Once parallel dump is implemented, I think we can possibly change the default core collector to include --parallel-dump, and that way it will automatically launch multiple threads if more than 1 cpu is online in the second kernel.
core_collector makedumpfile -l --message-level 1 -d 31 --parallel-dump
Thanks
Vivek