After updating my F25 installation (which included installing kernel-4.11.12-200) I can no longer start X (w/ startx). rpm -qa | fgrep kmod-nvidia-4.11.12 returns
kmod-nvidia-4.11.12-200.fc25.x86_64-375.66-3.fc25.x86_64
so the kernel module gets built and installed. But I see
Aug 2 21:41:11 pons kernel: nvidia: Unknown symbol mcount (err 0)
in /var/log/messages which I've not seen before. I removed the module with dnf and rebuilt it with akmods (which again produced the "unknown symbol" message), and X still won't boot. Is anyone else having this problem? Suggestions for a fix?
-Sherman
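For anyone triaging the same symptom, it can help to pull the rejected symbol names out of the log in one go. This is only a sketch: it assumes the kernel messages land in a syslog-style file such as /var/log/messages, and the helper name find_unknown_symbols is made up for illustration.

```shell
#!/bin/sh
# find_unknown_symbols LOGFILE
# Print each distinct symbol named in "Unknown symbol" kernel messages.
# (Hypothetical helper, not part of akmods or kmod.)
find_unknown_symbols() {
    grep -o 'Unknown symbol [A-Za-z0-9_]*' "$1" | awk '{ print $3 }' | sort -u
}

# Example (reading /var/log/messages usually needs root):
#   find_unknown_symbols /var/log/messages
```

If the same symbol shows up for every nvidia module, the build as a whole is suspect rather than one module.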
On Wed, Aug 2, 2017 at 11:59 PM, Sherman Grunewagen sugarwagon@gmx.com wrote:
<snip>
Are you using akmods?
This typically occurs due to an API change in the kernel, which happens from time to time, and it may take nvidia a little while to fix.
Try booting the previous kernel and see if that works.
On 08/03/2017 05:22 AM, Richard Shaw wrote:
Are you using akmods?
Yes. I followed the rpmfusion framework instructions when I installed F25 and all has worked till now.
This typically occurs due to an API change in the kernel, which happens from time to time, and it may take nvidia a little while to fix.
Should I expect everyone else using the nvidia / rpmfusion to be having the same problem? If so, I'm surprised to see no complaints other than mine. I searched the web but there was nothing recent with the "unknown symbol (err 0)" error.
Try booting the previous kernel and see if that works.
It does. I'm running on it now. If the API has changed, shouldn't I expect all later kernels (including F26) to cause the nvidia driver to fail to load? Also, I wonder why it built successfully. (Not questioning you so much as trying to understand.)
Thanks. -Sherman
On Thu, Aug 3, 2017 at 10:00 AM, Sherman Grunewagen sugarwagon@gmx.com wrote:
Should I expect everyone else using the nvidia / rpmfusion to be having the same problem? If so, I'm surprised to see no complaints other than mine. I searched the web but there was nothing recent with the "unknown symbol (err 0)" error.
Well, I mention that because that's the most typical reason the akmod build fails but it looks like yours succeeded. I doubt this is the reason but a google search showed the same error when a kernel module was compiled with a different version of GCC than the kernel itself but that seems unlikely...
Try booting the previous kernel and see if that works.
It does. I'm running on it now. If the API has changed, shouldn't I expect all later kernels (including F26) to cause the nvidia driver to fail to load? Also, I wonder why it built successfully. (Not questioning you so much as trying to understand.)
To check to see if it was a transient error, you could try removing the kmod-nvidia package from /var/cache/akmods/nvidia/ and rerunning akmods.
I was never able to figure out the problem but there was a period where the akmods run would complete but the installed package wasn't quite right and rebuilding the kmod fixed it...
Thanks, Richard
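A cautious way to do the cache cleanup described above is to list the matching files first and only then delete them. The sketch below assumes the cached rpm and log carry the kernel version in their file names (the thread suggests this but does not spell out the naming), and list_cached_kmods is a hypothetical helper name.

```shell
#!/bin/sh
# list_cached_kmods CACHE_DIR KERNEL_VERSION
# List cached build artifacts whose file name contains KERNEL_VERSION.
# (Hypothetical helper; the naming of cached files is an assumption.)
list_cached_kmods() {
    find "$1" -maxdepth 1 -type f -name "*$2*" | sort
}

# Typical use, as root, after reviewing the list:
#   list_cached_kmods /var/cache/akmods/nvidia "$(uname -r)"
#   list_cached_kmods /var/cache/akmods/nvidia "$(uname -r)" | xargs -r rm -v
#   akmods --force
```

Listing before removing keeps the artifacts for other kernels, which still work, out of harm's way.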
On 08/03/2017 08:33 AM, Richard Shaw wrote:
To check to see if it was a transient error, you could try removing the kmod-nvidia package from /var/cache/akmods/nvidia/ and rerunning akmods.
I was never able to figure out the problem but there was a period where the akmods run would complete but the installed package wasn't quite right and rebuilding the kmod fixed it...
Thanks for your continuing help, Richard!
Is what I did at the start (see OP) equivalent? I used dnf to remove kmod-nvidia and then rebuilt by calling akmods from the commandline. Anyway I'll try what you suggest and report back in a sec ... tick, tick.
Ok. Removing the akmod-nvidia with dnf _does_not_ remove the rpm and log in /var/cache/akmods/nvidia/. So I did the "dnf erase", removed the entries for the 4.11.12 kernel version in /var/cache/akmods/nvidia/, and rebuilt with "akmods". This appears at the bottom of the log after a "successful" build:
Installing:
 kmod-nvidia-4.11.12-200.fc25.x86_64   x86_64   2:375.66-3.fc25   @commandline   6.1 M
Transaction Summary
================================================================================
Install  1 Package
Total size: 6.1 M
Installed size: 18 M
Downloading Packages:
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Installing  : kmod-nvidia-4.11.12-200.fc25.x86_64-2:375.66-3.fc25.x86_6 1/1
depmod: WARNING: /lib/modules/4.11.12-200.fc25.x86_64/extra/nvidia/nvidia-modeset.ko needs unknown symbol mcount
depmod: WARNING: /lib/modules/4.11.12-200.fc25.x86_64/extra/nvidia/nvidia-drm.ko needs unknown symbol mcount
depmod: WARNING: /lib/modules/4.11.12-200.fc25.x86_64/extra/nvidia/nvidia-uvm.ko needs unknown symbol mcount
depmod: WARNING: /lib/modules/4.11.12-200.fc25.x86_64/extra/nvidia/nvidia.ko needs unknown symbol mcount
  Verifying   : kmod-nvidia-4.11.12-200.fc25.x86_64-2:375.66-3.fc25.x86_6 1/1
Installed:
  kmod-nvidia-4.11.12-200.fc25.x86_64.x86_64 2:375.66-3.fc25
Complete!
2017/08/03 09:13:21 akmods: Successful.
---------
And, of course, X won't boot. Looking for errors in the build log, I found this mystery:
2017/08/03 09:13:14 akmodsbuild: cc1: error: /usr/local/include: Permission denied
2017/08/03 09:13:14 akmodsbuild: cc1: error: /usr/local/include: Permission denied
2017/08/03 09:13:14 akmodsbuild: ./scripts/gcc-version.sh: line 31: printf: #: invalid number
2017/08/03 09:13:14 akmodsbuild: ./scripts/gcc-version.sh: line 31: printf: #: invalid number
2017/08/03 09:13:14 akmodsbuild: /bin/sh: line 0: [: too many arguments
Except for these lines, this log and those for the other kmod-nvidia builds for earlier kernels look virtually identical.
I can't fathom how Permission would be denied. I ran the akmods as root!
-Sherman
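Since the build ends with "Successful." even when the compiler complained, it may be worth scanning the build log for error lines rather than trusting the last line. A minimal sketch, assuming the log uses the "date time akmodsbuild: " prefix seen above; the helper name akmods_log_errors is hypothetical, and the log's location under /var/cache/akmods/nvidia/ is whatever your system actually wrote.

```shell
#!/bin/sh
# akmods_log_errors LOGFILE
# Strip the "YYYY/MM/DD HH:MM:SS akmodsbuild: " prefix and print each
# distinct line that mentions an error or a permission problem.
# (Hypothetical helper, not part of akmods.)
akmods_log_errors() {
    sed -E 's#^[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9:]{8} akmodsbuild: ##' "$1" |
        grep -i -E 'error|permission denied' | sort -u
}
```

Comparing its output for a working kernel's log and a failing one is a quick way to spot lines that appear only in the bad build.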
Ok, for the same nvidia module and kernel version for me...
Dependencies resolved.
================================================================================
 Package                               Arch     Version           Repository     Size
================================================================================
Installing:
 kmod-nvidia-4.11.12-200.fc25.x86_64   x86_64   2:375.66-3.fc25   @commandline   6.1 M
Transaction Summary
================================================================================
Install  1 Package
Total size: 6.1 M
Installed size: 18 M
Downloading Packages:
Running transaction check
Waiting for process with pid 13941 to finish.
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Installing  : kmod-nvidia-4.11.12-200.fc25.x86_64-2:375.66-3.fc25.x86_6 1/1
  Verifying   : kmod-nvidia-4.11.12-200.fc25.x86_64-2:375.66-3.fc25.x86_6 1/1
Installed:
  kmod-nvidia-4.11.12-200.fc25.x86_64.x86_64 2:375.66-3.fc25
No errors during install of the package or during the build. There must be something funky in your environment...
Not that anything useful/needed should be in /usr/local/include but what are the permissions on it?
Thanks, Richard
On 08/03/2017 10:48 AM, Richard Shaw wrote:
Ok, for the same nvidia module and kernel version for me...
<snip>
Is this from the akmods log or did you install the kmod-nvidia module manually with dnf from the rpmfusion repo (if that's even possible these days)?
No errors during install of the package or during the build. There must be something funky in your environment...
Not that anything useful/needed should be in /usr/local/include but what are the permissions on it?
[root@new_pons ~]# ls -l /usr/local | fgrep include
drwxr-x--- 3 root root 4096 Aug 3 09:15 include
The reason it has today's date on it was that I created and deleted a dummy file in it just to see if anything funky was going on.
I'm completely stumped. Again, thanks so much for helping with this.
-Sherman
On Thu, Aug 3, 2017 at 2:22 PM, Sherman Grunewagen sugarwagon@gmx.com wrote:
Is this from the akmods log or did you install the kmod-nvidia module manually with dnf from the rpmfusion repo (if that's even possible these days)?
Yes, they are still built but usually take a few days to push (it's manual IIRC) so it could prevent you from using a new kernel for a little while which is why I use akmods...
Not that anything useful/needed should be in /usr/local/include but what are the permissions on it?
[root@new_pons ~]# ls -l /usr/local | fgrep include
drwxr-x--- 3 root root 4096 Aug 3 09:15 include
Ok, there is a problem but I don't know if it's THE problem. Everyone should have read and execute permissions otherwise a normal user won't be able to access the directory (and you should never build software as root!)
I've included all the directories as you should check the others as well:
$ ll /usr/local
total 40
drwxr-xr-x.  2 root root 4096 Aug  1 17:45 bin
drwxr-xr-x.  2 root root 4096 Feb  3  2016 etc
drwxr-xr-x.  2 root root 4096 Feb  3  2016 games
drwxr-xr-x.  6 root root 4096 Feb 18 07:27 include
drwxr-xr-x.  7 root root 4096 Feb 18 07:27 lib
drwxr-xr-x.  2 root root 4096 Feb  3  2016 lib64
drwxr-xr-x.  2 root root 4096 Feb  3  2016 libexec
drwxr-xr-x.  2 root root 4096 Feb 18 07:27 sbin
drwxr-xr-x. 12 root root 4096 Aug  1 17:45 share
drwxr-xr-x.  2 root root 4096 Feb  3  2016 src
A "chmod 0755 /usr/local/include" should fix that particular error, but like I said, there's no telling if that's the underlying problem.
Thanks, Richard
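The check above (does "other" have read and execute?) can be scripted so it is easy to repeat for each directory under /usr/local. This is a sketch only: it relies on GNU stat's %A format, it looks at the named directory and not its parents, and the function name others_can_enter is made up for illustration.

```shell
#!/bin/sh
# others_can_enter DIR
# Succeed (exit 0) if the "other" permission class has both read and
# execute on DIR; exit 1 if not, 2 if DIR cannot be examined.
# (Hypothetical helper; requires GNU stat.)
others_can_enter() {
    mode=$(stat -c '%A' "$1") || return 2
    # %A prints e.g. "drwxr-x---"; characters 8-10 are the "other" bits.
    other=$(printf '%s' "$mode" | cut -c8-10)
    case "$other" in
        r?x|r?t) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   for d in /usr/local/*; do
#       others_can_enter "$d" || echo "not world-accessible: $d"
#   done
```

On the listing shown earlier, a directory with mode drwxr-x--- would be reported, while the drwxr-xr-x ones would pass.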
On 08/03/2017 01:35 PM, Richard Shaw wrote:
A "chmod 0755 /usr/local/include" should fix that particular error, but like I said, there's no telling if that's the underlying problem.
To be really complete, I'd suggest (as the root user):
# chmod 755 /usr/local/include
# find /usr/local/include -type d -exec chmod 755 {} \;
# find /usr/local/include -type f -exec chmod 644 {} \;
This would make /usr/local/include and all _directories_ under it mode "rwxr-xr-x". The last command would make all _files_ under /usr/local/include/* mode "rw-r--r--" (the files don't need execute privileges).
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital    ricks@alldigital.com -
- AIM/Skype: therps2        ICQ: 226437340           Yahoo: origrps2 -
-                                                                    -
-    UNIX is actually quite user friendly.  The problem is that it's -
-              just very picky of who its friends are!               -
----------------------------------------------------------------------
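If you would rather rehearse that on a scratch copy before touching /usr/local/include, the same two find passes can be wrapped in a small function and pointed at any directory. A sketch only; normalize_tree is a made-up name, and it changes modes but not ownership.

```shell
#!/bin/sh
# normalize_tree DIR
# Set DIR and every directory below it to 755, and every regular file
# below it to 644. (Hypothetical wrapper around the find commands above.)
normalize_tree() {
    find "$1" -type d -exec chmod 755 {} \;
    find "$1" -type f -exec chmod 644 {} \;
}

# Rehearsal on a throwaway copy:
#   cp -a /usr/local/include /tmp/include.test
#   normalize_tree /tmp/include.test
#   ls -lR /tmp/include.test
```

Running the directory pass first matters: a directory has to be searchable before find can reach the files inside it.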
On 08/03/2017 01:35 PM, Richard Shaw wrote:
On Thu, Aug 3, 2017 at 2:22 PM, Sherman Grunewagen sugarwagon@gmx.com wrote:
<snip>
A "chmod 0755 /usr/local/include" should fix that particular error, but like I said, there's no telling if that's the underlying problem.
That was the problem! Nice catch. I'm clueless as to how previous kmod-nvidia modules have built and run. And I don't understand why, when running akmods as root, the "make" process is unable to "see" inside /usr/local/include (nor why it needs to, but that's a different question).
But turning on the "other" permissions did the trick! I would not have guessed it.
-Sherman
P.S. I just checked and gcc and its kin were updated yesterday when this problem began. Perhaps there's some new permissions demotion going on in the compiler that wasn't there before.
Forgot the "SOLVED!" in the Subject line.
On 08/03/2017 06:42 PM, Sherman Grunewagen wrote:
<snip>
users mailing list -- users@lists.fedoraproject.org To unsubscribe send an email to users-leave@lists.fedoraproject.org
On Thu, Aug 3, 2017 at 8:42 PM, Sherman Grunewagen sugarwagon@gmx.com wrote:
That was the problem! Nice catch. I'm clueless as to how previous kmod-nvidia modules have built and run. And I don't understand why, when running akmods as root, the "make" process is unable to "see" inside /usr/local/include (nor why it needs to, but that's a different question).
But turning on the "other" permissions did the trick! I would not have guessed it.
Well, akmods runs as root, but the script that actually builds the package, akmodsbuild, runs as a normal user. You should never build software as root :)
Glad that was it!
Richard
On 08/04/2017 05:13 AM, Richard Shaw wrote:
Well, akmods runs as root, but the script that actually builds the package, akmodsbuild, runs as a normal user. You should never build software as root :)
Glad that was it!
Thanks again, Richard.
Your comment about akmodsbuild does make me scratch my head because it raises the question (again) of how the previous three builds worked. My /usr/local/include permissions have not changed in years. (When I install a new Fedora I always overwrite /usr/local with my existing one, keeping all permissions and ownerships the same.)
All I can suppose is that the update to the gcc suite now causes the compiler to take a look in /usr/local/include whereas before it didn't, or if it did and couldn't, it went on gracefully. If my guess is correct, this seems like a bug since there's nothing needed in /usr/local/include.
-Sherman
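One way to test the guess about the compiler is to ask gcc which directories it searches by default; a stock GCC typically lists /usr/local/include there, whether or not anything useful lives in it. The sketch below only parses gcc's verbose output. The helper name include_dirs is hypothetical, and it says nothing about what changed between gcc updates.

```shell
#!/bin/sh
# include_dirs
# Read `gcc -v` preprocessor output on stdin and print the directories
# listed between "#include <...> search starts here:" and
# "End of search list.", one per line. (Hypothetical helper.)
include_dirs() {
    sed -n '/^#include <[.][.][.]> search starts here:/,/^End of search list[.]/p' |
        sed -e '1d' -e '$d' -e 's/^ *//'
}

# Usage, run as the unprivileged user that does the build:
#   gcc -xc -E -v /dev/null 2>&1 | include_dirs
```

Running it as a normal user rather than root should show whether a directory on that list is unreadable for the account that actually does the compiling.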