Nvidia Signal 11 error

Gerry Doris gdoris at rogers.com
Tue Jan 11 00:00:18 UTC 2005


On Sat, 8 Jan 2005, Nifty Hat Mitch wrote:

> On Wed, Jan 05, 2005 at 07:03:37PM +0200, Chadley Wilson wrote:
> 
> > I get quite an odd error with my nvidia mx440se
> ...
> > all my 3d apps run for a while then suddenly terminate, no errors
> > reported, logs are empty.  I have run the apps from a terminal, in
> > Quake3 Arena and celestia I get this when the apps terminate.
> > Received signal 11, exiting...
> 
> Signal 11 is simply:
> 
> 	 SIGSEGV      11       Core    Invalid memory reference
> 
> If the application runs for a while and then terminates, a couple of
> things can be going on.
> 
> Hardware problem.
> 	 DRAM, VRAM, thermal runaway.  Make sure that fans and heat
> 	 sinks are clean and functional.  Modern chips (CMOS) generate
> 	 most of their heat when gates change state.  Computational or
> 	 logic busy applications will heat up parts to failure.  Make
> 	 sure BIOS settings are sane and avoid overclocking.  The
> 	 symptom is that things run for a while then terminate.
> 	 For an AGP board (2x, 4x, 8x), check what the BIOS permits and
> 	 what is selected when the driver loads.
> 
> Library collision.
> 	 nVidia 3D libraries and Mesa libraries occupy the same name
> 	 space.  It is possible for lots of things to work with
> 	 incorrect libraries involved.  One hint is that the nVidia
> 	 installer makes noise that the installation has been modified
> 	 if you reinstall the driver+libs.  In my limited experience
> 	 nVidia has a library structure that can execute in hardware
> 	 or in software.  Simple things without race conditions or
> 	 side effects will run just fine.  When things get busy the
> 	 mixed-up libraries trip up.  The symptom is that things run
> 	 for a while then terminate.
> 
> Memory leak.
>          Applications and drivers can fail to reuse memory correctly
>          and can continue to allocate additional memory resources.  As
>          memory is exhausted bad things can happen, i.e. it will run
>          then fall over.  The symptom is that things run for a while
>          then terminate.
> 
> Asynchronous bug.
> 	 Some events including interrupts happen at odd times.  In
> 	 some code there are race conditions between the validation of
> 	 a memory block and its use.  This can be the application or
> 	 the kernel (or both, sort of).  If I recall, Quake, Arena and
> 	 company do lots of texture mapping, and lots of texture map
> 	 data transfer, with asynchronous signaling (hardware and
> 	 software), increased heat of chips and more.  Some of these
> 	 might be more common on multi-processor systems.  The symptom
> 	 is that things run for a while then terminate.
> 
> Kernel bugs:
>          kernel-2.6.9-1.11_FC2.i686.rpm contains this comment. 
>            * Thu Dec 16 2004
>            - Better version of the PCI Posting fixes for agpgart.
>            - Add missing cache flush to the AGP code.
>          So try different kernel versions and tell us which one you
>          are using.  Always update and test the latest rpm.
> 
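
One way to check the memory leak theory above is to watch the
resident memory of the running application over time.  The following
is a minimal sketch, assuming a Linux /proc filesystem that exposes a
VmRSS line in /proc/<pid>/status.  The function names and the 10 MiB
growth threshold are illustrative assumptions, not taken from this
thread.

```python
import re
import time


def rss_kib(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def looks_like_leak(samples, min_growth_kib=10240):
    """True if RSS never shrinks across the samples and grows by at
    least min_growth_kib (an arbitrary threshold chosen for this sketch)."""
    if len(samples) < 2:
        return False
    never_shrinks = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_shrinks and samples[-1] - samples[0] >= min_growth_kib


def sample_rss(pid, count=5, interval=1.0):
    """Read VmRSS for the given pid `count` times, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        with open("/proc/%d/status" % pid) as handle:
            value = rss_kib(handle.read())
        if value is not None:
            samples.append(value)
        time.sleep(interval)
    return samples
```

Steady growth alone does not prove a leak, since a game may simply
be loading textures, but a figure that climbs until the crash would
point in that direction.
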
> Can you check for memory leaking?  Can you compile or run under a
> debugger?  Can you run with debugging libraries (symbols)?  Can you
> enable a core dump and report the stack trace?
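
For the core dump question above, here is a minimal sketch of a
wrapper that raises the core file size limit for a child process and
reports whether the child was killed by a signal.  The helper names
are illustrative assumptions.  Note that an application which
installs its own handler, as the "Received signal 11, exiting..."
message suggests, may exit normally and leave no core file, in which
case this reports an ordinary exit status.

```python
import resource
import signal
import subprocess
import sys


def allow_core_dumps():
    """Raise the soft core-size limit to the hard limit (like `ulimit -c`)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # A negative return code means the child was killed by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_and_report(argv):
    """Run argv with core dumps allowed and describe how it ended."""
    proc = subprocess.run(argv, preexec_fn=allow_core_dumps)
    return describe_exit(proc.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(run_and_report(sys.argv[1:]))
```

If a core file is produced, loading it into a debugger together
with the binary should give the stack trace asked about above.
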

I used to get these all the time until I stopped using the 3d screen 
savers.  Once I switched to a blank screen I never saw them again.

-- 
Gerry

"The lyfe so short, the craft so long to learne"  Chaucer