RawhideKernelNodebug - Boot

poma pomidorabelisima at gmail.com
Thu Feb 6 06:56:09 UTC 2014


On 06.02.2014 02:04, Tang Chen wrote:
> On 02/05/2014 07:51 PM, poma wrote:
>> On 05.02.2014 06:46, poma wrote:
>>> On 31.01.2014 11:47, poma wrote:
>>>
>>>> This is what I thought,
>>>>
>>>> A video mode boot loader parameter, i.e. "vga=", expressed in decimal
>>>> notation, e.g. 773, is broken: the boot gets stuck. The same mode
>>>> expressed in hexadecimal notation (starting with "0x"), e.g. 0x305, is
>>>> no problemos at all.
>>>> More precisely, what is interesting in this story is how erratically
>>>> the problem occurs,
>>>> e.g.
>>>> - Rawhide's 3.14.0-0.rc0.git15.1.fc21.x86_64 - a decimal notation
>>>> *sometimes* works
>>>> - RawhideKernelNodebug's 3.14.0-0.rc0.git15.2.fc21.x86_64 - a decimal
>>>> notation *never* works
>>>> - Rawhide's 3.14.0-0.rc0.git17.1.fc21.x86_64 - a decimal notation
>>>> *sometimes* works
>>>
>>> To complement and finish.
>>>
>>> "The kernel hang/crash/stuck/whatnot during boot" is resolved with these
>>> two patches:
>>> - [PATCH 1/2] numa, mem-hotplug: Initialize numa_kernel_nodes in
>>> numa_clear_kernel_node_hotplug().
>>>    http://marc.info/?l=linux-kernel&m=139089978307936&q=raw
>>> - [PATCH 2/2] numa, mem-hotplug: Fix array index overflow when
>>> synchronizing nid to memblock.reserved.
>>>    http://marc.info/?l=linux-kernel&m=139089984707944&q=raw
>>>
>>> Tested via:
>>> - kernel-3.14.0-0.rc1.git0.2.fc21.x86_64.rpm
>>>    http://koji.fedoraproject.org/koji/buildinfo?buildID=496017
>>>    "Add NUMA oops patches"
>>>    http://pkgs.fedoraproject.org/cgit/kernel.git/plain/tang-numa-1.patch
>>>    http://pkgs.fedoraproject.org/cgit/kernel.git/plain/tang-numa-2.patch
>>> - my nodebug version,
>>>    kernel-3.14.0-0.rc1.git0.4.fc21.x86_64.rpm
>>>
>>>    Both are PASSED.
>>>
>>> Affected Fedora's kernels:
>>> - kernel-3.14.0-0.rc1.git0.1.fc21
>>> - kernel-3.14.0-0.rc0.git19.1.fc21
>>> - kernel-3.14.0-0.rc0.git18.1.fc21
>>> - kernel-3.14.0-0.rc0.git17.1.fc21
>>>    http://koji.fedoraproject.org/koji/packageinfo?packageID=8
>>>
>>> Badly affected was this card:
>>> Chipset: G98 (NV98)
>>> Family : NV50
>>> NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] [10de:06e4] (rev a1)
>>> Nouveau
>>>
>>> While this one worked without problemos: !?
>>> Chipset: MCP79/MCP7A (NVAC)
>>> Family : NV50
>>> NVIDIA Corporation ION VGA [10de:087d] (rev b1)
>>> Nouveau
>>>
>>> A few more notes.
>>> After an affected kernel hung at the very beginning of booting, due to
>>> the VESA mode being applied as described above, it was necessary to
>>> power off the machine; otherwise even the correct kernels would
>>> become affected.
>>> Just resetting the machine wasn't enough.
>>> In the end, even the hexadecimal notation became broken. :)
>>>
>>>
>>> Ahoy folks!
>>> poma
>>>
>>>
>>> More about this can be found here:
>>> - [PATCH] numa, mem-hotplug: Fix stack overflow in numa when seting
>>> kernel nodes to unhotpluggable.
>>>    http://thread.gmane.org/gmane.linux.kernel/1634270
>>> - [PATCH 0/2] numa, mem-hotplug: Fix array out of boundary in numa
>>> initialization.
>>>    http://thread.gmane.org/gmane.linux.kernel/1636585
>>>
>>>
>>> The TOG:
>>>
>>>    +---------------------------------------------------------+
>>>    |  color  |     800x600   |    1024x768   |    1280x1024  |
>>>    |    :    |---------------+---------------+---------------|
>>>    |  depth  |   hex  : dec  |   hex  : dec  |   hex  : dec  |
>>>    |---------+--------:------+--------:------+--------:------|
>>>    |   256   |  0x303 : 771  |  0x305 : 773  |  0x307 : 775  |
>>>    |   64k   |  0x314 : 788  |  0x317 : 791  |  0x31A : 794  |
>>>    |   16M   |  0x315 : 789  |  0x318 : 792  |  0x31B : 795  |
>>>    +---------------------------------------------------------+
>>>
>>
>> I think this is also interesting to note, referring to Dave's original
>> report,
>> - hang on early boot, numa init code stack overflow?
>>    http://thread.gmane.org/gmane.linux.kernel/1633909
> 
> Hi Poma,

Ahoy!

> I'm sorry I didn't quite follow the original email. The above "stack
> overflow" problem has been fixed by the following patches, I think.
> 
> - [PATCH 1/2] numa, mem-hotplug: Initialize numa_kernel_nodes in
> numa_clear_kernel_node_hotplug().
>    http://marc.info/?l=linux-kernel&m=139089978307936&q=raw
> - [PATCH 2/2] numa, mem-hotplug: Fix array index overflow when
> synchronizing nid to memblock.reserved.
>    http://marc.info/?l=linux-kernel&m=139089984707944&q=raw

In my case it's true.

> Actually it was not a stack overflow, but an array index out of bounds.

OK.

>> If 'earlyprintk=vga[,keep]' is applied, the affected kernels boot fine;
>> the problem is gone with the wind. :)
> 
> I'm not quite familiar with this boot option. Are you saying that with this
> boot option, the "stack overflow"-like problem goes away? Although it was
> not a stack overflow.

Exactly.
Without your two patches,
i.e.
/boot/extlinux/extlinux.conf
…
  append … == OK
…
  append … vga=773 == BROKEN
…
  append … vga=773 earlyprintk=vga == OK
…
  append … vga=773 earlyprintk=vga,keep == OK
…

With your two patches, everything is OK.
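
For reference, the decimal and hexadecimal notations name the same mode
(773 == 0x305), so the two "vga=" spellings should be equivalent on a
correct kernel. A minimal Python sketch to double-check that when comparing
"append" lines; the helper name vga_mode is hypothetical and is not part of
extlinux or the kernel:

```python
import re

def vga_mode(cmdline):
    """Return the numeric vga= mode from a kernel command line, or None."""
    m = re.search(r'(?:^|\s)vga=(\S+)', cmdline)
    if m is None:
        return None
    try:
        # Base 0 accepts both decimal ("773") and 0x-prefixed hex ("0x305").
        return int(m.group(1), 0)
    except ValueError:
        # Non-numeric values such as "ask" are not handled in this sketch.
        return None

for line in ("vga=773", "vga=0x305 earlyprintk=vga,keep", "quiet"):
    mode = vga_mode(line)
    print(line, "->", None if mode is None else hex(mode))
```

Both numeric spellings above resolve to the same integer, so any difference
in boot behaviour between them is not down to the mode number itself.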

https://www.kernel.org/doc/Documentation/kernel-parameters.txt
earlyprintk=	[X86, …]
		earlyprintk=vga
…
		earlyprintk is useful when the kernel crashes before
		the normal console is initialized. It is not enabled by
		default because it has some cosmetic problems.

		Append ",keep" to not disable it when the real console
		takes over.
…


The actual breakage was introduced by this patch,
i.e. a series of patches:
http://pkgs.fedoraproject.org/repo/pkgs/kernel/patch-3.13-git17.xz/5c23515f7271ef9eb03c242a9305e2b7/patch-3.13-git17.xz

>> http://www.youtube.com/embed/l-ceg763Voc?rel=0&border=0&autoplay=1
> 
> And sorry, what is this ?

Bugs Bunny! :)
What's up, doc?


poma


