I recently noticed that `gkrellm` was not showing the temperatures for my drives (SATA SSD in my case).
It turns out that the 6.10 kernel updates changed something to match the spec, which broke tools that had long depended on the previous non-spec output.
The kernel "regression" is shown here:
https://lore.kernel.org/all/0bf3f2f0-0fc6-4ba5-a420-c0874ef82d64@heusel.eu/
The symptom for hddtemp is that it reports "drive is sleeping" unless you pass the -w switch to "wake up the drive", and you need to pass it on every query. I gather the issue also shows up in hdparm and some other tools. They might back this out, but the proper fix belongs in the tools that depend on non-spec output. Personally, I am staying on the last 6.9 kernel and watching for either a new kernel that reverts this or a new version of hdparm or hddtemp that fixes it.
Posting note: Hope this does not double post. My first try was not from my fedora e-mail address.
Zero chance they back that out. It is not a regression in the kernel, a valid fix exposed bad code in user space.
The defect is in user space commands not checking what the kernel returned. The bug is not in the kernel. The solution will be to fix the user space programs and/or require the extra option to spin the disk up.
And having an hddtemp or smartctl command spin up a drive just to ask its temperature wastes power. The drive stays spun up for several minutes and uses around 5 W more while it is spun up. So one cannot argue with it returning a "disk is sleeping" result.
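A minimal sketch of querying a drive without waking it, assuming smartctl is installed and that its `-n standby` option (skip the check when the drive is in standby or sleep) behaves as documented. The function name and the injectable `run` parameter are hypothetical, added only for illustration.

```python
import subprocess


def smartctl_attrs_no_wakeup(device, run=subprocess.run):
    """Return `smartctl -A` text for device, or None if no data came back.

    -n standby asks smartctl to leave a drive alone when it is in
    standby or sleep, so the query itself should not spin it up.
    """
    cmd = ["smartctl", "-n", "standby", "-A", device]
    result = run(cmd, capture_output=True, text=True, check=False)
    # Assumption: a skipped drive prints no data section, so the
    # "START OF ... SECTION" banner is used as the "real data" signal.
    if "START OF" not in result.stdout:
        return None
    return result.stdout
```

A caller would treat `None` as "drive is asleep or unreadable" and simply try again later, which matches the "disk is sleeping" answer described above.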
On Fri, Aug 16, 2024 at 11:18 AM Doug H. fedoraproject.org@wombatz.com wrote:
-- Doug H.
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue
Once upon a time, Roger Heflin rogerheflin@gmail.com said:
Zero chance they back that out. It is not a regression in the kernel, a valid fix exposed bad code in user space.
I wouldn't say that - Linus is very adamant about not breaking existing user space code, which this change did. Since this is long-standing behavior, the fix is probably to revert the change to the existing interface and introduce a new interface to get the "correct" behavior.
This is BROKEN user space code (hdparm assumes what is being returned), and broken user space code is different.
He does not like to break correct user space code.
And this code is in what I would classify as a non-critical path: the change breaks some monitoring tools and utilities, but it does not affect critical functionality or change results that customers or clients would depend on outside of the monitoring path.
On Fri, Aug 16, 2024 at 12:37 PM Chris Adams linux@cmadams.net wrote:
-- Chris Adams linux@cmadams.net
On Fri, 2024-08-16 at 12:30 -0500, Roger Heflin wrote:
And having an hddtemp or smartctl command spin up a drive just to ask its temperature wastes power. The drive stays spun up for several minutes and uses around 5 W more while it is spun up. So one cannot argue with it returning a "disk is sleeping" result.
You also want to consider what you want the temperature for.
Surely a sleeping drive isn't one you need to worry about cooling? Unless it's packed into some RAID frame, in which case *that* ought to have continuous cooling running.
On 16 Aug 2024, at 17:17, Doug H. fedoraproject.org@wombatz.com wrote:
It turns out that the 6.10 kernel updates changed something to match the spec, which broke tools that had long depended on the previous non-spec output.
I can get the temp for my nvme drive with smartctl:
$ smartctl -A /dev/nvme0n1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.10.3-200.fc40.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        32 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
...
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               32 Celsius
Temperature Sensor 2:               41 Celsius
And also for my SATA SSD:
$ smartctl -A /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.10.3-200.fc40.x86_64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
...
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       34
...
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       348646
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       46275
243 NAND_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       513546
This is with kernel:
$ uname -r
6.10.3-200.fc40.x86_64
Barry
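For anyone who wants to script against the output above, here is a rough sketch of pulling the temperature out of `smartctl -A` text. It assumes the two layouts shown in this message (the NVMe "Temperature:" line and the ATA attribute 194 row); other drives may report temperature under a different attribute. The function names and the threshold parameter are hypothetical.

```python
import re


def parse_smartctl_temperature(text):
    """Return the temperature in Celsius from `smartctl -A` text, or None."""
    for line in text.splitlines():
        # NVMe health log line, e.g. "Temperature:    32 Celsius"
        match = re.match(r"\s*Temperature:\s+(\d+)\s+Celsius", line)
        if match:
            return int(match.group(1))
        # ATA attribute table row: ID 194, raw value is the tenth column.
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "194":
            raw = re.match(r"\d+", fields[9])
            if raw:
                return int(raw.group())
    return None


def too_hot(text, limit_c):
    """True when a temperature was found and it exceeds limit_c."""
    temp = parse_smartctl_temperature(text)
    return temp is not None and temp > limit_c
```

Feeding it the captured output of `smartctl -A /dev/sda` and a preferred limit would give a simple "too hot" check that does not depend on hddtemp.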
On Fri, Aug 16, 2024, at 1:22 PM, Barry Scott wrote:
I can get the temp for my nvme drive with smartctl:
Yes, that works for my SATA drives also. The problem for me is that I like using `gkrellm` to keep track of drive temp and to alert at my preferred setting of "too hot".
I can also just use `hddtemp -w` to show the temp, but again, it is `gkrellm` that I want to work and I don't see a way to get it to add that "-w" when it checks.
On 16 Aug 2024, at 21:48, Doug Herr fedoraproject.org@wombatz.com wrote:
I can also just use `hddtemp -w` to show the temp, but again, it is `gkrellm` that I want to work and I don't see a way to get it to add that "-w" when it checks.
Try contacting the author; there is an email address at the bottom of the project page: http://gkrellm.srcbox.net/
Barry
On Fri, Aug 16, 2024, at 11:57 PM, Barry wrote:
Try contacting the author; there is an email address at the bottom of the project page: http://gkrellm.srcbox.net/
I checked the source of gkrellm and they are simply doing a query to the hddtemp daemon. I was not able to get that to work with the -w switch. That switch seems to only work when calling hddtemp directly from the command line.
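To illustrate what such a daemon query looks like, here is a minimal sketch of a client. It assumes the hddtemp daemon is listening on its usual TCP port 7634 on localhost and replies with `|`-separated records of the form `|device|model|value|unit|`, with a non-numeric value such as `SLP` for a sleeping drive. The function names are hypothetical and this is not gkrellm's actual code.

```python
import socket


def parse_hddtemp_reply(reply, sep="|"):
    """Split a daemon reply into one dict per drive."""
    records = []
    # Records are back to back, so two separators mark a record boundary.
    for chunk in reply.strip().strip(sep).split(sep * 2):
        fields = chunk.split(sep)
        if len(fields) != 4:
            continue  # skip empty or malformed records
        device, model, value, unit = fields
        if value.isdigit():
            records.append({"device": device, "model": model,
                            "temp": int(value), "unit": unit, "status": "ok"})
        else:
            # Non-numeric value (for example "SLP"): no reading available.
            records.append({"device": device, "model": model,
                            "temp": None, "unit": None, "status": value})
    return records


def query_hddtemp(host="127.0.0.1", port=7634, timeout=2.0):
    """Connect to the daemon, read until it closes, and parse the reply."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_hddtemp_reply(b"".join(chunks).decode("ascii", errors="replace"))
```

Note that the client only reads whatever the daemon sends, so there is nowhere on the client side to pass something like `-w`; any wake-up behavior would have to come from how the daemon itself is run.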
On Fri, Aug 16, 2024, at 9:17 AM, Doug H. wrote:
I think the below is showing that this is being reverted:
https://lore.kernel.org/stable/20240813131900.1285842-2-cassel@kernel.org/T/
On Sat, 2024-08-17 at 07:45 -0700, Doug Herr wrote:
I think the below is showing that this is being reverted:
https://lore.kernel.org/stable/20240813131900.1285842-2-cassel@kernel.org/T/
Yup, it works again...
uname -a
Linux wombat.wombatz.com 6.10.5-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Aug 14 15:49:44 UTC 2024 x86_64 GNU/Linux
hddtemp
/dev/sda: CT1000MX500SSD1: 25°C
/dev/sdb: CT1000MX500SSD1: 26°C
That kernel is still in testing; I installed it via:
sudo dnf --enablerepo=updates-testing upgrade kernel