Just did a new installation of the basic Fedora 34 desktop (the one which provides a bunch of basic window managers) using a netinstall ISO.
The boot is interrupted for a long time (2 minutes) while a start job runs to wait for udev to complete initialization.
The timeout is three minutes and it almost reaches that point before continuing.
I have looked through the syslogs, but the only obvious problem is a core dump by alsa control, which I don't understand because nothing like this happened when the machine was running Fedora 33.
Need some advice on how to debug this.
On 6/11/21 8:16 PM, Frank McCormick wrote:
The boot is interrupted for a long time (2 minutes) while a start job runs to wait for udev to complete initialization.
The timeout is three minutes and it almost reaches that point before continuing.
Run systemd-analyze blame and take a look at the first items listed.
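For example, something like this should show the worst offenders (nothing here is specific to the problem, and the output will obviously differ per machine):

$ systemd-analyze blame | head          # ten slowest units from the last boot
$ systemd-analyze critical-chain        # the chain of units the default target actually waited on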
On Fri, 2021-06-11 at 20:26 -0600, Joe Zeff wrote:
On 6/11/21 8:16 PM, Frank McCormick wrote:
The boot is interrupted for a long time (2 minutes) while a start job runs to wait for udev to complete initialization.
The timeout is three minutes and it almost reaches that point before continuing.
Run systemd-analyze blame and take a look at the first items listed.
I don't want to hijack Frank's thread, but I have a similar situation, apparently caused by a very slow initialization of an external USB drive. However, the drive is not automatically mounted (it's controlled by an automount unit), so I don't see why the system needs to wait for it.
I've posted the output of "systemd-analyze plot" if anyone wants to take a look. The culprit is obvious at a glance. The only thing special about this device is that it's an external dock with two disks configured as a BTRFS RAID1.
See https://paste.centos.org/view/7f4216fa
poc
systemd-analyze blame pinpoints "systemd-udev-settle.service": more than a 2-minute wait!
Both problems need logs. It's quite a bit of overkill, but these boot parameters will help provide enough info.
systemd.log_level=debug udev.log-priority=debug rd.debug x-systemd.timeout=180
The debug options are resource intensive and slow down boot by a lot. The point of the timeout is to hopefully avoid the dracut shell, but it's better to get the shell than an indefinite hang.
journalctl -b -o short-monotonic --no-hostname > journal.log
Copy that out to a file sharing service, and post the URL.
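If you'd rather not type those at the GRUB prompt each boot, grubby should be able to add them and take them off again afterwards (a sketch; adjust as needed):

$ sudo grubby --update-kernel=ALL --args="systemd.log_level=debug udev.log-priority=debug rd.debug x-systemd.timeout=180"
# ...reboot, capture the journal, then remove them again:
$ sudo grubby --update-kernel=ALL --remove-args="systemd.log_level udev.log-priority rd.debug x-systemd.timeout"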
Chris Murphy
I found and fixed the problem. The key was in my systemd log.
[ 1.825245] udevadm[363]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in
Personally, I think this should be publicized more... it seems there are quite a few people with this problem.
On Sat, Jun 12, 2021 at 9:32 AM Frank McCormick beacon@videotron.ca wrote:
I found and fixed the problem. The key was in my systemd log.
[ 1.825245] udevadm[363]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in
Interesting; on my F34 system, the message mentions nm-run.service rather than nm-initrd.service. Neither nm-run nor nm-initrd exists as a service, but something is calling systemd-udev-settle, although it only takes 2 seconds to run and exit, so I never noticed this before.
--Greg
On 6/12/21 6:32 AM, Frank McCormick wrote:
systemd-analyze blame pinpoints "systemd-udev-settle.service": more than a 2-minute wait!
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
On Sat, 12 Jun 2021 10:44:19 -0600 Joe Zeff wrote:
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
It is listed as a "static" service on my system. Can those be disabled or masked?
On Sat, 2021-06-12 at 07:25 -0600, Chris Murphy wrote:
Both problems need logs. It's quite a bit of overkill, but these boot parameters will help provide enough info.
systemd.log_level=debug udev.log-priority=debug rd.debug x-systemd.timeout=180
The debug options are resource intensive and slow down boot by a lot. The point of the timeout is to hopefully avoid the dracut shell, but it's better to get the shell than an indefinite hang.
journalctl -b -o short-monotonic --no-hostname > journal.log
Copy that out to a file sharing service, and post the URL.
https://paste.centos.org/view/3932b04e
poc
On Sat, 2021-06-12 at 11:31 -0400, Frank McCormick wrote:
I found and fixed the problem. The key was in my systemd log.
[ 1.825245] udevadm[363]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in
Personally, I think this should be publicized more... it seems there are quite a few people with this problem.
Not my case. I don't have systemd-udev-settle.service. This is a fresh install of F34.
poc
On Sat, 2021-06-12 at 18:54 +0100, Patrick O'Callaghan wrote:
On Sat, 2021-06-12 at 11:31 -0400, Frank McCormick wrote:
I found and fixed the problem. The key was in my systemd log.
[ 1.825245] udevadm[363]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in
Personally, I think this should be publicized more... it seems there are quite a few people with this problem.
Not my case. I don't have systemd-udev-settle.service. This is a fresh install of F34.
I spoke too soon. A grep on the journal log (posted earlier) shows:
$ grep settle journal.log
[ 1.763070] udevadm[412]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in.
[ 32.655548] systemd[1]: systemd-udev-settle.service: Deactivated successfully.
[ 34.383171] udevadm[626]: systemd-udev-settle.service is deprecated. Please fix multipathd.service not to pull it in.
I have no idea what this means. What does NM have to do with anything and how do I "fix" it? I also don't have an /etc/multipath.conf file. Could that be the problem?
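I suppose one easy check is whether multipathd is even enabled here, and to turn it off if multipath isn't actually in use (guessing, of course):

$ systemctl is-enabled multipathd.service multipathd.socket
$ sudo systemctl disable --now multipathd.service multipathd.socket   # only if multipath isn't needed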
poc
On 2021-06-12 10:19 a.m., Tom Horsley wrote:
On Sat, 12 Jun 2021 10:44:19 -0600 Joe Zeff wrote:
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
It is listed as a "static" service on my system. Can those be disabled or masked?
It can't be disabled, but it can be masked.
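i.e. something like this (the second command is just to confirm the state):

$ sudo systemctl mask systemd-udev-settle.service
$ systemctl is-enabled systemd-udev-settle.service   # should now report "masked"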
On Sat, 2021-06-12 at 14:43 -0700, Samuel Sieb wrote:
On 2021-06-12 10:19 a.m., Tom Horsley wrote:
On Sat, 12 Jun 2021 10:44:19 -0600 Joe Zeff wrote:
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
It is listed as a "static" service on my system. Can those be disabled or masked?
It can't be disabled, but it can be masked.
I've masked it and rebooted. Makes no difference to boot time in my case.
$ systemctl status systemd-udev-settle.service
○ systemd-udev-settle.service
     Loaded: masked (Reason: Unit systemd-udev-settle.service is masked.)
     Active: inactive (dead) since Sun 2021-06-13 00:14:30 BST; 29min left
   Main PID: 413 (code=exited, status=0/SUCCESS)
        CPU: 0
Jun 13 00:13:59 Bree systemd[1]: Starting Wait for udev To Complete Device Initialization...
Jun 13 00:13:59 Bree udevadm[413]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in.
Jun 13 00:14:29 Bree systemd[1]: Finished Wait for udev To Complete Device Initialization.
Jun 13 00:14:30 Bree systemd[1]: systemd-udev-settle.service: Deactivated successfully.
Jun 13 00:14:30 Bree systemd[1]: Stopped Wait for udev To Complete Device Initialization.
poc
On 13/06/2021 06:46, Patrick O'Callaghan wrote:
On Sat, 2021-06-12 at 14:43 -0700, Samuel Sieb wrote:
On 2021-06-12 10:19 a.m., Tom Horsley wrote:
On Sat, 12 Jun 2021 10:44:19 -0600 Joe Zeff wrote:
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
It is listed as a "static" service on my system. Can those be disabled or masked?
It can't be disabled, but it can be masked.
I've masked it and rebooted. Makes no difference to boot time in my case.
$ systemctl status systemd-udev-settle.service
○ systemd-udev-settle.service
     Loaded: masked (Reason: Unit systemd-udev-settle.service is masked.)
     Active: inactive (dead) since Sun 2021-06-13 00:14:30 BST; 29min left
   Main PID: 413 (code=exited, status=0/SUCCESS)
        CPU: 0
Jun 13 00:13:59 Bree systemd[1]: Starting Wait for udev To Complete Device Initialization...
Jun 13 00:13:59 Bree udevadm[413]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in.
Jun 13 00:14:29 Bree systemd[1]: Finished Wait for udev To Complete Device Initialization.
Jun 13 00:14:30 Bree systemd[1]: systemd-udev-settle.service: Deactivated successfully.
Jun 13 00:14:30 Bree systemd[1]: Stopped Wait for udev To Complete Device Initialization.
But, does your plot show a difference?
On 2021-06-12 3:46 p.m., Patrick O'Callaghan wrote:
On Sat, 2021-06-12 at 14:43 -0700, Samuel Sieb wrote:
On 2021-06-12 10:19 a.m., Tom Horsley wrote:
On Sat, 12 Jun 2021 10:44:19 -0600 Joe Zeff wrote:
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
It is listed as a "static" service on my system. Can those be disabled or masked?
It can't be disabled, but it can be masked.
I've masked it and rebooted. Makes no difference to boot time in my case.
Jun 13 00:13:59 Bree systemd[1]: Starting Wait for udev To Complete Device Initialization...
Jun 13 00:13:59 Bree udevadm[413]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in.
Jun 13 00:14:29 Bree systemd[1]: Finished Wait for udev To Complete Device Initialization.
Jun 13 00:14:30 Bree systemd[1]: systemd-udev-settle.service: Deactivated successfully.
Jun 13 00:14:30 Bree systemd[1]: Stopped Wait for udev To Complete Device Initialization.
That is weird. It shouldn't get run if it's masked. Is the service file in /etc/systemd/system by any chance?
On 13/06/2021 06:57, Ed Greshko wrote:
But, does your plot show a difference?
Speaking of your plot.....
Don't you think the time between
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.2.device and dev-disk-by\x2dpath-pci\x2d0000:00:14.0\x2dusb\x2d0:3:1.0\x2dscsi\x2d0:0:0:1.device
worth looking into?
On 13/06/2021 07:06, Samuel Sieb wrote:
On 2021-06-12 3:46 p.m., Patrick O'Callaghan wrote:
On Sat, 2021-06-12 at 14:43 -0700, Samuel Sieb wrote:
On 2021-06-12 10:19 a.m., Tom Horsley wrote:
On Sat, 12 Jun 2021 10:44:19 -0600 Joe Zeff wrote:
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
It is listed as a "static" service on my system. Can those be disabled or masked?
It can't be disabled, but it can be masked.
I've masked it and rebooted. Makes no difference to boot time in my case.
Jun 13 00:13:59 Bree systemd[1]: Starting Wait for udev To Complete Device Initialization...
Jun 13 00:13:59 Bree udevadm[413]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in.
Jun 13 00:14:29 Bree systemd[1]: Finished Wait for udev To Complete Device Initialization.
Jun 13 00:14:30 Bree systemd[1]: systemd-udev-settle.service: Deactivated successfully.
Jun 13 00:14:30 Bree systemd[1]: Stopped Wait for udev To Complete Device Initialization.
That is weird. It shouldn't get run if it's masked. Is the service file in /etc/systemd/system by any chance?
Since it is being "pulled in" by other services I think you may expect to see this.
FWIW, I masked that service and it still shows up in systemd-analyze, but the times have changed from several seconds (I think it was 6ish) to 542ms.
On 2021-06-12 4:14 p.m., Ed Greshko wrote:
On 13/06/2021 07:06, Samuel Sieb wrote:
On 2021-06-12 3:46 p.m., Patrick O'Callaghan wrote:
On Sat, 2021-06-12 at 14:43 -0700, Samuel Sieb wrote:
On 2021-06-12 10:19 a.m., Tom Horsley wrote:
On Sat, 12 Jun 2021 10:44:19 -0600 Joe Zeff wrote:
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
It is listed as a "static" service on my system. Can those be disabled or masked?
It can't be disabled, but it can be masked.
I've masked it and rebooted. Makes no difference to boot time in my case.
Jun 13 00:13:59 Bree systemd[1]: Starting Wait for udev To Complete Device Initialization...
Jun 13 00:13:59 Bree udevadm[413]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in.
Jun 13 00:14:29 Bree systemd[1]: Finished Wait for udev To Complete Device Initialization.
Jun 13 00:14:30 Bree systemd[1]: systemd-udev-settle.service: Deactivated successfully.
Jun 13 00:14:30 Bree systemd[1]: Stopped Wait for udev To Complete Device Initialization.
That is weird. It shouldn't get run if it's masked. Is the service file in /etc/systemd/system by any chance?
Since it is being "pulled in" by other services I think you may expect to see this.
From the systemctl man page describing the "mask" command: Mask one or more units, as specified on the command line. This will link these unit files to /dev/null, making it impossible to start them. This is a stronger version of disable, since it prohibits all kinds of activation of the unit, including enablement and manual activation.
"impossible to start them" and "prohibits all kinds of activation of the unit"
And from the description of what the "masked" status means: Completely disabled, so that any start operation on it fails.
But as shown from the quoted log lines, "udevadm" is definitely getting called still.
On 13/06/2021 07:22, Samuel Sieb wrote:
On 2021-06-12 4:14 p.m., Ed Greshko wrote:
On 13/06/2021 07:06, Samuel Sieb wrote:
On 2021-06-12 3:46 p.m., Patrick O'Callaghan wrote:
On Sat, 2021-06-12 at 14:43 -0700, Samuel Sieb wrote:
On 2021-06-12 10:19 a.m., Tom Horsley wrote:
On Sat, 12 Jun 2021 10:44:19 -0600 Joe Zeff wrote:
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
It is listed as a "static" service on my system. Can those be disabled or masked?
It can't be disabled, but it can be masked.
I've masked it and rebooted. Makes no difference to boot time in my case.
Jun 13 00:13:59 Bree systemd[1]: Starting Wait for udev To Complete Device Initialization...
Jun 13 00:13:59 Bree udevadm[413]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in.
Jun 13 00:14:29 Bree systemd[1]: Finished Wait for udev To Complete Device Initialization.
Jun 13 00:14:30 Bree systemd[1]: systemd-udev-settle.service: Deactivated successfully.
Jun 13 00:14:30 Bree systemd[1]: Stopped Wait for udev To Complete Device Initialization.
That is weird. It shouldn't get run if it's masked. Is the service file in /etc/systemd/system by any chance?
Since it is being "pulled in" by other services I think you may expect to see this.
From the systemctl man page describing the "mask" command: Mask one or more units, as specified on the command line. This will link these unit files to /dev/null, making it impossible to start them. This is a stronger version of disable, since it prohibits all kinds of activation of the unit, including enablement and manual activation.
Yes, but I would think there is still a period of time associated with calling the service, discovering the link to /dev/null and then understanding the service has completed.
It isn't as if the service file no longer exists.
"impossible to start them" and "prohibits all kinds of activation of the unit"
And from the description of what the "masked" status means: Completely disabled, so that any start operation on it fails.
But as shown from the quoted log lines, "udevadm" is definitely getting called still.
On 6/12/21 4:29 PM, Ed Greshko wrote:
On 13/06/2021 07:22, Samuel Sieb wrote:
On 2021-06-12 4:14 p.m., Ed Greshko wrote:
On 13/06/2021 07:06, Samuel Sieb wrote:
On 2021-06-12 3:46 p.m., Patrick O'Callaghan wrote:
On Sat, 2021-06-12 at 14:43 -0700, Samuel Sieb wrote:
On 2021-06-12 10:19 a.m., Tom Horsley wrote:
On Sat, 12 Jun 2021 10:44:19 -0600 Joe Zeff wrote:
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
It is listed as a "static" service on my system. Can those be disabled or masked?
It can't be disabled, but it can be masked.
I've masked it and rebooted. Makes no difference to boot time in my case.
Jun 13 00:13:59 Bree systemd[1]: Starting Wait for udev To Complete Device Initialization...
Jun 13 00:13:59 Bree udevadm[413]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in.
From the systemctl man page describing the "mask" command: Mask one or more units, as specified on the command line. This will link these unit files to /dev/null, making it impossible to start them. This is a stronger version of disable, since it prohibits all kinds of activation of the unit, including enablement and manual activation.
Yes, but I would think there is still a period of time associated with calling the service, discovering the link to /dev/null and then understanding the service has completed.
It isn't as if the service file no longer exists.
"impossible to start them" and "prohibits all kinds of activation of the unit"
And from the description of what the "masked" status means: Completely disabled, so that any start operation on it fails.
But as shown from the quoted log lines, "udevadm" is definitely getting called still.
Did you miss this line? "udevadm settle" is still getting called according to that log, so the original service file is still getting run for some reason. From your earlier message, 542ms is still kind of long, but not unreasonable. Do you still get the message from udevadm in the log, though?
On 13/06/2021 07:36, Samuel Sieb wrote:
Yes, but I would think there is still a period of time associated with calling the service, discovering the link to /dev/null and then understanding the service has completed.
It isn't as if the service file no longer exists.
"impossible to start them" and "prohibits all kinds of activation of the unit"
And from the description of what the "masked" status means: Completely disabled, so that any start operation on it fails.
But as shown from the quoted log lines, "udevadm" is definitely getting called still.
Did you miss this line? "udevadm settle" is still getting called according to that log, so the original service file is still getting run for some reason. From your earlier message, 542ms is still kind of long, but not unreasonable. Do you still get the message from udevadm in the log, though?
No, I didn't. And, yes, it is my log too.
Isn't nm-initrd.service added to the initramfs?
So, that contains
Wants=systemd-udev-settle.service
After=systemd-udev-settle.service
After=dracut-cmdline.service
Before=network.target
So, it would still be referenced and an attempt made to start it, and then it takes a bit of time to determine that the service file to use is not the one in /lib/systemd/system but the one in /etc/systemd/system, read the "commands" from /dev/null, and then exit.
Since /usr/lib/dracut/modules.d/35network-manager/nm-initrd.service is the service file in question, wouldn't it make sense to request that changes be made to dracut-network, which provides it?
On 2021-06-12 4:59 p.m., Ed Greshko wrote:
On 13/06/2021 07:36, Samuel Sieb wrote:
Yes, but I would think there is still a period of time associated with calling the service, discovering the link to /dev/null and then understanding the service has completed.
It isn't as if the service file no longer exists.
"impossible to start them" and "prohibits all kinds of activation of the unit"
And from the description of what the "masked" status means: Completely disabled, so that any start operation on it fails.
But as shown from the quoted log lines, "udevadm" is definitely getting called still.
Did you miss this line? "udevadm settle" is still getting called according to that log, so the original service file is still getting run for some reason. From your earlier message, 542ms is still kind of long, but not unreasonable. Do you still get the message from udevadm in the log, though?
No, I didn't. And, yes, it is my log too.
Isn't nm-initrd.service added to the initramfs?
So that gave me the clue that led to the real problem. The service is included in the initrd and masking it on the system doesn't change that. Even re-creating the initrd with dracut after masking didn't change anything.
And also the nm-initrd.service file is in dracut, so you would have to modify it there (not the system one) to change this.
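If you want to confirm that it's really coming from inside the initramfs rather than from the installed system, lsinitrd should show it (just a check, not a fix):

$ lsinitrd /boot/initramfs-$(uname -r).img | grep -E 'udev-settle|nm-initrd'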
On 13/06/2021 08:41, Samuel Sieb wrote:
So that gave me the clue that led to the real problem. The service is included in the initrd and masking it on the system doesn't change that. Even re-creating the initrd with dracut after masking didn't change anything.
And also the nm-initrd.service file is in dracut, so you would have to modify it there (not the system one) to change this.
FWIW, I had a VM which needed a kernel update.
Prior to the update it had references to systemd-udev-settle.service in the logs.
I edited /usr/lib/dracut/modules.d/35network-manager/nm-initrd.service to have
DefaultDependencies=no
#Wants=systemd-udev-settle.service
#After=systemd-udev-settle.service
After=dracut-cmdline.service
And I installed the latest kernel.
Now....
[egreshko@f34x ~]$ systemctl status systemd-udev-settle.service
○ systemd-udev-settle.service
     Loaded: masked (Reason: Unit systemd-udev-settle.service is masked.)
     Active: inactive (dead)
And nothing in the logs about systemd-udev-settle
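If you don't want to wait for a kernel update, regenerating the initramfs for the running kernel should presumably pick up the edit as well (untested by me):

$ sudo dracut --force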
On Sun, 2021-06-13 at 07:09 +0800, Ed Greshko wrote:
On 13/06/2021 06:57, Ed Greshko wrote:
But, does your plot show a difference?
Speaking of your plot.....
Don't you think the time between
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.2.device and dev-disk-by\x2dpath-pci\x2d0000:00:14.0\x2dusb\x2d0:3:1.0\x2dscsi\x2d0:0:0:1.device
worth looking into?
Of course. That's precisely the issue I'm concerned about. I don't see what's causing it. My working hypothesis is that it's somehow related to the fact that the external dock supports two drives in a BTRFS RAID1 configuration and that the kernel is verifying them when it starts up, even though the drives are not being mounted (they have an automount unit but nothing in /etc/fstab).
Why it would delay the rest of the system startup while this is happening is something I don't understand. The delay is very visible (I get three dots on a blank screen while it's happening).
poc
On Sat, 2021-06-12 at 16:06 -0700, Samuel Sieb wrote:
On 2021-06-12 3:46 p.m., Patrick O'Callaghan wrote:
On Sat, 2021-06-12 at 14:43 -0700, Samuel Sieb wrote:
On 2021-06-12 10:19 a.m., Tom Horsley wrote:
On Sat, 12 Jun 2021 10:44:19 -0600 Joe Zeff wrote:
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
It is listed as a "static" service on my system. Can those be disabled or masked?
It can't be disabled, but it can be masked.
I've masked it and rebooted. Makes no difference to boot time in my case.
Jun 13 00:13:59 Bree systemd[1]: Starting Wait for udev To Complete Device Initialization...
Jun 13 00:13:59 Bree udevadm[413]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in.
Jun 13 00:14:29 Bree systemd[1]: Finished Wait for udev To Complete Device Initialization.
Jun 13 00:14:30 Bree systemd[1]: systemd-udev-settle.service: Deactivated successfully.
Jun 13 00:14:30 Bree systemd[1]: Stopped Wait for udev To Complete Device Initialization.
That is weird. It shouldn't get run if it's masked. Is the service file in /etc/systemd/system by any chance?
$ ls /etc/systemd/system/systemd-udev-settle.service
/etc/systemd/system/systemd-udev-settle.service
poc
On Sun, 2021-06-13 at 09:02 +0800, Ed Greshko wrote:
On 13/06/2021 08:41, Samuel Sieb wrote:
So that gave me the clue that led to the real problem. The service is included in the initrd and masking it on the system doesn't change that. Even re-creating the initrd with dracut after masking didn't change anything.
And also the nm-initrd.service file is in dracut, so you would have to modify it there (not the system one) to change this.
FWIW, I had a VM which needed a kernel update.
Prior to the update it had references to systemd-udev-settle.service in the logs.
I edited /usr/lib/dracut/modules.d/35network-manager/nm-initrd.service to have
DefaultDependencies=no
#Wants=systemd-udev-settle.service
#After=systemd-udev-settle.service
After=dracut-cmdline.service
And I installed the latest kernel.
Now....
[egreshko@f34x ~]$ systemctl status systemd-udev-settle.service
○ systemd-udev-settle.service
     Loaded: masked (Reason: Unit systemd-udev-settle.service is masked.)
     Active: inactive (dead)
And nothing in the logs about systemd-udev-settle
Interesting. Wouldn't that count as a bug in nm-initrd.service?
poc
On 13/06/2021 19:07, Patrick O'Callaghan wrote:
On Sun, 2021-06-13 at 09:02 +0800, Ed Greshko wrote:
On 13/06/2021 08:41, Samuel Sieb wrote:
So that gave me the clue that led to the real problem. The service is included in the initrd and masking it on the system doesn't change that. Even re-creating the initrd with dracut after masking didn't change anything.
And also the nm-initrd.service file is in dracut, so you would have to modify it there (not the system one) to change this.
FWIW, I had a VM which needed a kernel update.
Prior to the update it had references to systemd-udev-settle.service in the logs.
I edited /usr/lib/dracut/modules.d/35network-manager/nm-initrd.service to have
DefaultDependencies=no
#Wants=systemd-udev-settle.service
#After=systemd-udev-settle.service
After=dracut-cmdline.service
And I installed the latest kernel.
Now....
[egreshko@f34x ~]$ systemctl status systemd-udev-settle.service
○ systemd-udev-settle.service
     Loaded: masked (Reason: Unit systemd-udev-settle.service is masked.)
     Active: inactive (dead)
And nothing in the logs about systemd-udev-settle
Interesting. Wouldn't that count as a bug in nm-initrd.service?
Well, to me, it would mean a bugzilla should be written against the dracut-network package requesting the file be modified to eliminate the log entries.
On Sat, 2021-06-12 at 18:19 -0600, Joe Zeff wrote:
On 6/12/21 4:46 PM, Patrick O'Callaghan wrote:
I've masked it and rebooted. Makes no difference to boot time in my case.
Have you checked with systemd-analyze blame again to see what else might be delaying your boot?
I'm 99% certain it's being caused by my external USB dock starting up. See my reply to Ed. The dock is not mounted at boot, but has a BTRFS filesystem that (possibly) the kernel insists on checking before the rest of the startup can proceed. This is speculation at the moment.
systemd-analyze blame shows a long delay in a unit I created to automatically power down the dock if it's not mounted:
$ systemd-analyze blame|head
4min 18.016s dock-watch.service
     30.517s systemd-udev-settle.service
     15.273s logrotate.service
      6.274s NetworkManager-wait-online.service
      5.765s raid.mount
      5.452s plymouth-quit-wait.service
      5.038s akmods.service
      4.541s upower.service
      4.427s sssd.service
I've uploaded the dock-watch unit and the scripts it calls, together with the automount unit, to:
https://drive.google.com/drive/folders/1BT5w4u7TzBmWbhx97sWvfOErCIUylrik?usp...
poc
On Sun, 2021-06-13 at 19:19 +0800, Ed Greshko wrote:
On 13/06/2021 19:07, Patrick O'Callaghan wrote:
On Sun, 2021-06-13 at 09:02 +0800, Ed Greshko wrote:
On 13/06/2021 08:41, Samuel Sieb wrote:
So that gave me the clue that led to the real problem. The service is included in the initrd and masking it on the system doesn't change that. Even re-creating the initrd with dracut after masking didn't change anything.
And also the nm-initrd.service file is in dracut, so you would have to modify it there (not the system one) to change this.
FWIW, I had a VM which needed a kernel update.
Prior to the update it had references to systemd-udev-settle.service in the logs.
I edited /usr/lib/dracut/modules.d/35network-manager/nm-initrd.service to have
DefaultDependencies=no
#Wants=systemd-udev-settle.service
#After=systemd-udev-settle.service
After=dracut-cmdline.service
And I installed the latest kernel.
Now....
[egreshko@f34x ~]$ systemctl status systemd-udev-settle.service
○ systemd-udev-settle.service
     Loaded: masked (Reason: Unit systemd-udev-settle.service is masked.)
     Active: inactive (dead)
And nothing in the logs about systemd-udev-settle
Interesting. Wouldn't that count as a bug in nm-initrd.service?
Well, to me, it would mean a bugzilla should be written against the dracut-network package requesting the file be modified to eliminate the log entries.
You're probably right. Either way, it should probably be reported.
poc
On Sun, Jun 13, 2021 at 09:02:49AM +0800, Ed Greshko wrote:
On 13/06/2021 08:41, Samuel Sieb wrote:
So that gave me the clue that led to the real problem. The service is included in the initrd and masking it on the system doesn't change that. Even re-creating the initrd with dracut after masking didn't change anything.
And also the nm-initrd.service file is in dracut, so you would have to modify it there (not the system one) to change this.
FWIW, I had a VM which needed a kernel update.
Prior to the update it had references to systemd-udev-settle.service in the logs.
I edited /usr/lib/dracut/modules.d/35network-manager/nm-initrd.service to have
DefaultDependencies=no
#Wants=systemd-udev-settle.service
#After=systemd-udev-settle.service
After=dracut-cmdline.service
And I installed the latest kernel.
Now....
[egreshko@f34x ~]$ systemctl status systemd-udev-settle.service
○ systemd-udev-settle.service
     Loaded: masked (Reason: Unit systemd-udev-settle.service is masked.)
     Active: inactive (dead)
And nothing in the logs about systemd-udev-settle
Just a data point, all with the same kernel, 5.12.9-300.fc34:
My reboot times were over 2 minutes. Systemd-analyze blame showed systemd-udev-settle.service taking 2+ min. I then tried to disable the unit and mask it. Next I commented the lines noted above.
                            Appears in systemd-analyze
Unit state                  blame          critical-chain
unchanged                     Y                 Y
disabled&masked               Y                 N
disabled&masked +             N                 N
  nm-initrd unit edited
That repeats what others observed. But the reboot time change was dramatic. Here are the top lines of systemd-analyze critical-chain for the last two reboots above:
graphical.target @2min 14.541s
graphical.target @10.072s
On Sun, 2021-06-13 at 11:44 -0400, Jon LaBadie wrote:
On Sun, Jun 13, 2021 at 09:02:49AM +0800, Ed Greshko wrote:
On 13/06/2021 08:41, Samuel Sieb wrote:
So that gave me the clue that led to the real problem. The service is included in the initrd and masking it on the system doesn't change that. Even re-creating the initrd with dracut after masking didn't change anything.
And also the nm-initrd.service file is in dracut, so you would have to modify it there (not the system one) to change this.
FWIW, I had a VM which needed a kernel update.
Prior to the update it had references to systemd-udev-settle.service in the logs.
I edited /usr/lib/dracut/modules.d/35network-manager/nm-initrd.service to have
DefaultDependencies=no
#Wants=systemd-udev-settle.service
#After=systemd-udev-settle.service
After=dracut-cmdline.service
And I installed the latest kernel.
Now....
[egreshko@f34x ~]$ systemctl status systemd-udev-settle.service
○ systemd-udev-settle.service
     Loaded: masked (Reason: Unit systemd-udev-settle.service is masked.)
     Active: inactive (dead)
And nothing in the logs about systemd-udev-settle
Just a data point, all with the same kernel, 5.12.9-300.fc34:
My reboot times were over 2 minutes. Systemd-analyze blame showed systemd-udev-settle.service taking 2+ min. I then tried to disable the unit and mask it. Next I commented the lines noted above.
                            Appears in systemd-analyze
Unit state                  blame          critical-chain
unchanged                     Y                 Y
disabled&masked               Y                 N
disabled&masked +             N                 N
  nm-initrd unit edited
That repeats what others observed. But the reboot time change was dramatic. Here are the top lines of systemd-analyze critical-chain for the last two reboots above:
graphical.target @2min 14.541s
graphical.target @10.072s
I did the same and it has made a big difference, i.e. I no longer have the very long delay waiting for usb-settle.
I still don't know why my external dock is being mounted at boot, but I can live with it for now.
poc
On 6/13/21 11:44 AM, Go Canes wrote:
I still don't know why my external dock is being mounted at boot, but I can live with it for now.
poc
Maybe pvscan wants to check the drive(s) to see if it is a LVM volume? Not sure how to confirm/refute, or how to tell LVM to ignore the USB dock if that is the case.
This gives me a thought. Boot the machine with the external dock disconnected and make sure that the boot time's as short as expected. If so, something about the dock is the issue. Assuming that that's true, plug in the external drive after everything's up and running to see how long it takes to register its existence.
On 2021-06-13 4:03 a.m., Patrick O'Callaghan wrote:
On Sat, 2021-06-12 at 16:06 -0700, Samuel Sieb wrote:
On 2021-06-12 3:46 p.m., Patrick O'Callaghan wrote:
On Sat, 2021-06-12 at 14:43 -0700, Samuel Sieb wrote:
On 2021-06-12 10:19 a.m., Tom Horsley wrote:
On Sat, 12 Jun 2021 10:44:19 -0600 Joe Zeff wrote:
According to https://github.com/openzfs/zfs/issues/10891 it has been deprecated since last year. Disable and mask it and that should fix your issue.
It is listed as a "static" service on my system. Can those be disabled or masked?
It can't be disabled, but it can be masked.
I've masked it and rebooted. Makes no difference to boot time in my case.
Jun 13 00:13:59 Bree systemd[1]: Starting Wait for udev To Complete Device Initialization...
Jun 13 00:13:59 Bree udevadm[413]: systemd-udev-settle.service is deprecated. Please fix nm-initrd.service not to pull it in.
Jun 13 00:14:29 Bree systemd[1]: Finished Wait for udev To Complete Device Initialization.
Jun 13 00:14:30 Bree systemd[1]: systemd-udev-settle.service: Deactivated successfully.
Jun 13 00:14:30 Bree systemd[1]: Stopped Wait for udev To Complete Device Initialization.
That is weird. It shouldn't get run if it's masked. Is the service file in /etc/systemd/system by any chance?
$ ls /etc/systemd/system/systemd-udev-settle.service
/etc/systemd/system/systemd-udev-settle.service
I think if you use "ls -l", you'll find that it's a symlink to /dev/null. In the other thread we figured out that it's actually in the initrd as well, so masking it on the system only has a partial effect.
On 13/06/2021 23:57, Patrick O'Callaghan wrote:
I did the same and it has made a big difference, i.e. I no longer have the very long delay waiting for usb-settle.
I still don't know why my external dock is being mounted at boot, but I can live with it for now.
I've not thought about these issues you're having in quite some time. But, while servicing a cat at 3AM I started to wonder.
If you were to disable the dock-watch.service does the boot process complete quickly?
If it does, might there be a way to start the service after the boot process is completed?
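For example, a timer could start it a couple of minutes after boot instead of during it; roughly something like this (a hypothetical dock-watch.timer, untested by me):

[Unit]
Description=Start dock-watch a while after boot

[Timer]
OnBootSec=2min
Unit=dock-watch.service

[Install]
WantedBy=timers.target

and then enable the timer rather than the service itself.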
On Sat, Jun 12, 2021 at 11:20 AM Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Sat, 2021-06-12 at 07:25 -0600, Chris Murphy wrote:
Both problems need logs. It's quite a bit of overkill, but these boot parameters will help provide enough info.
systemd.log_level=debug udev.log-priority=debug rd.debug x-systemd.timeout=180
The debug options are resource intensive and slow down boot by a lot. The point of the timeout is to hopefully avoid the dracut shell, but it's better to get the shell than an indefinite hang.
journalctl -b -o short-monotonic --no-hostname > journal.log
Copy that out to a file sharing service, and post the URL.
Sorry, it expired before I had a chance to look at it.
On Sun, Jun 13, 2021 at 3:56 AM Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Sun, 2021-06-13 at 07:09 +0800, Ed Greshko wrote:
On 13/06/2021 06:57, Ed Greshko wrote:
But, does your plot show a difference?
Speaking of your plot.....
Don't you think the time between
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.2.device and dev-disk-by\x2dpath-pci\x2d0000:00:14.0\x2dusb\x2d0:3:1.0\x2dscsi\x2d0:0:0:1.device
worth looking into?
Of course. That's precisely the issue I'm concerned about. I don't see what's causing it. My working hypothesis is that it's somehow related to the fact that the external dock supports two drives in a BTRFS RAID1 configuration and that the kernel is verifying them when it starts up, even though the drives are not being mounted (they have an automount unit but nothing in /etc/fstab).
Why it would delay the rest of the system startup while this is happening is something I don't understand. The delay is very visible (I get three dots on a blank screen while it's happening).
Short version: Is this Btrfs raid1 listed at all in fstab? If so, add noauto,nofail to the mount options, see if that clears it up.
Long version: Dracut handles mdadm array assembly. Normal assembly (non-degraded) is done by dracut using the mdadm command, but if that fails, dracut starts a countdown loop, I think 300 seconds, before it tries a degraded assembly. None of this exists for btrfs raid at all in dracut. For one thing, btrfs raid assembly is combined with mount: the mount command pointed at any of the member devices results in the kernel finding all the member devices automagically. If one or more members are missing, mount fails. Since systemd only tries to mount one time, and because it's decently likely that mounting a multiple-device btrfs as /sysroot will fail as a result of one or more devices not yet being ready, there is a udev rule to wait for everyone to get ready:
/usr/lib/udev/rules.d/64-btrfs.rules
The gotcha is this simple rule waits indefinitely. This udev rule is there to make sure normal (non-degraded) boot doesn't incorrectly fail just because of a 1s delay with one of the devices showing up. But if a drive has actually failed, it results in a hang. Forever. You can add "x.systemd.timeout=300" boot parameter to approximate the rather long dracut wait for mdadm. And at a dracut shell, you can then just:
mount -o degraded /dev/sdXY /sysroot
exit
And away you go. Of course this is non-obvious. And it needs to work better. And it will, eventually.
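From that same shell, "btrfs device ready" is a quick way to ask whether the kernel has actually seen every member yet, which is essentially the check the udev rule keeps waiting on (device name here is just an example):

# exits 0 once all member devices of that filesystem have been registered
btrfs device ready /dev/sdXY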
So the next gotcha is if /sysroot is not Btrfs. In this case there's a bug in dracut that prevents this udev rule from being put into the initramfs. That means anything that does try to mount a non-root Btrfs during boot, either fstab or gpt discoverable partitions, might possibly fail if "not all devices are ready" at the time of the mount attempt.
https://github.com/dracutdevs/dracut/issues/947
This should be fixed in dracut 055, but if you already have 055 and have an initramfs built with it and this problem you're having is a new problem, maybe we've got a regression in 055 or something? I'm not sure yet...still kinda in the dark on what's going wrong.
Also, it is possible it's not related to this btrfs file system at all, but I'm throwing it out there just as something to be aware of.
On Sun, Jun 13, 2021 at 5:26 AM Patrick O'Callaghan pocallaghan@gmail.com wrote:
I'm 99% certain it's being caused by my external USB dock starting up. See my reply to Ed. The dock is not mounted at boot, but has a BTRFS filesystem that (possibly) the kernel insists on checking before the rest of the startup can proceed. This is speculation at the moment.
systemd-analyze blame shows a long delay in a unit I created to automatically power down the dock if it's not mounted:
$ systemd-analyze blame|head
4min 18.016s dock-watch.service
     30.517s systemd-udev-settle.service
     15.273s logrotate.service
      6.274s NetworkManager-wait-online.service
      5.765s raid.mount
      5.452s plymouth-quit-wait.service
      5.038s akmods.service
      4.541s upower.service
      4.427s sssd.service
I've uploaded the dock-watch unit and the scripts it calls, together with the automount unit, to:
https://drive.google.com/drive/folders/1BT5w4u7TzBmWbhx97sWvfOErCIUylrik?usp...
dock-wait script contains: RAID=/dev/sdd
This may not be reliable because /dev nodes frequently change between reboots. You're better off using /dev/disk/by-... any of them are better than node. You can use label, uuid, wwn, whatever.
I actually use a udev rule for idle spin down:
$ cat /etc/udev/rules.d/69-hdparm.rules
ACTION=="add", SUBSYSTEM=="block", \
KERNEL=="sd*[!0-9]", \
ENV{ID_SERIAL_SHORT}=="WDZ47F0A", \
RUN+="/usr/sbin/hdparm -B 100 -S 252 /dev/disk/by-id/wwn-0x5000c500a93cae8a"
$
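To adapt that, the short serial can be pulled straight from udev, and the rule can be exercised without a reboot (the paths and IDs here are examples):

$ udevadm info --query=property /dev/sdd | grep ID_SERIAL_SHORT
$ sudo udevadm control --reload
$ sudo udevadm trigger --action=add /sys/class/block/sdd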
Chris Murphy
-- Chris Murphy
On Sun, 2021-06-13 at 13:44 -0400, Go Canes wrote:
I still don't know why my external dock is being mounted at boot, but I can live with it for now.
poc
Maybe pvscan wants to check the drive(s) to see if it is a LVM volume? Not sure how to confirm/refute, or how to tell LVM to ignore the USB dock if that is the case.
I'm not using LVM for any of my filesystems, only BTRFS (other than /boot/efi of course).
poc
On Sun, 2021-06-13 at 12:54 -0600, Joe Zeff wrote:
On 6/13/21 11:44 AM, Go Canes wrote:
I still don't know why my external dock is being mounted at boot, but I can live with it for now.
poc
Maybe pvscan wants to check the drive(s) to see if it is a LVM volume? Not sure how to confirm/refute, or how to tell LVM to ignore the USB dock if that is the case.
This gives me a thought. Boot the machine with the external dock disconnected and make sure that the boot time's as short as expected. If so, something about the dock is the issue. Assuming that that's true, plug in the external drive after everything's up and running to see how long it takes to register its existence.
I was thinking of doing that when the usb-settle question came up. Having edited the initrd settings as Ed suggested this now seems to have improved the situation, touch wood. However I'll keep it in mind.
poc
On Sun, 2021-06-13 at 12:02 -0700, Samuel Sieb wrote:
$ ls /etc/systemd/system/systemd-udev-settle.service
/etc/systemd/system/systemd-udev-settle.service
I think if you use "ls -l", you'll find that it's a symlink to /dev/null. In the other thread we figured out that it's actually in the initrd as well, so masking it on the system only has a partial effect.
As you say, it's a symlink to /dev/null.
poc
On Sun, 2021-06-13 at 13:53 -0600, Chris Murphy wrote:
On Sat, Jun 12, 2021 at 11:20 AM Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Sat, 2021-06-12 at 07:25 -0600, Chris Murphy wrote:
Both problems need logs. It's quite a bit of overkill, but these boot parameters will help provide enough info.
systemd.log_level=debug udev.log-priority=debug rd.debug x-systemd.timeout=180
The debug options are resource intensive and slow down boot by a lot. The point of the timeout is to hopefully avoid the dracut shell, but it's better to get the shell than an indefinite hang.
journalctl -b -o short-monotonic --no-hostname > journal.log
Copy that out to a file sharing service, and post the URL.
Sorry, it expired before I had a chance to look at it.
I'll post another one if the situation recurs, but for now it seems to be resolved, touch wood.
poc
On Sun, 2021-06-13 at 14:21 -0600, Chris Murphy wrote:
On Sun, Jun 13, 2021 at 3:56 AM Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Sun, 2021-06-13 at 07:09 +0800, Ed Greshko wrote:
On 13/06/2021 06:57, Ed Greshko wrote:
But, does your plot show a difference?
Speaking of your plot.....
Don't you think the time between
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.2.device and dev-disk-by\x2dpath-pci\x2d0000:00:14.0\x2dusb\x2d0:3:1.0\x2dscsi\x2d0:0:0:1.device
worth looking into?
Of course. That's precisely the issue I'm concerned about. I don't see what's causing it. My working hypothesis is that it's somehow related to the fact that the external dock supports two drives in a BTRFS RAID1 configuration and that the kernel is verifying them when it starts up, even though the drives are not being mounted (they have an automount unit but nothing in /etc/fstab).
Why it would delay the rest of the system startup while this is happening is something I don't understand. The delay is very visible (I get three dots on a blank screen while it's happening).
Short version: Is this Btrfs raid1 listed at all in fstab? If so, add noauto,nofail to the mount options, see if that clears it up.
It's not in /etc/fstab in any form.
[...]
So the next gotcha is if /sysroot is not Btrfs. In this case there's a bug in dracut that prevents this udev rule from being put into the initramfs. That means anything that does try to mount a non-root Btrfs during boot, either fstab or gpt discoverable partitions, might possibly fail if "not all devices are ready" at the time of the mount attempt.
https://github.com/dracutdevs/dracut/issues/947
This should be fixed in dracut 055, but if you already have 055 and have an initramfs built with it and this problem you're having is a new problem, maybe we've got a regression in 055 or something? I'm not sure yet...still kinda in the dark on what's going wrong.
Also, it is possible it's not related to this btrfs file system at all, but I'm throwing it out there just as something to be aware of.
After the usb-settle change, I'm hoping it's not BTRFS after all, but I'll bear that in mind. I'm on dracut 054 currently.
poc
On Sun, 2021-06-13 at 14:38 -0600, Chris Murphy wrote:
On Sun, Jun 13, 2021 at 5:26 AM Patrick O'Callaghan pocallaghan@gmail.com wrote:
I'm 99% certain it's being caused by my external USB dock starting up. See my reply to Ed. The dock is not mounted at boot, but has a BTRFS filesystem that (possibly) the kernel insists on checking before the rest of the startup can proceed. This is speculation at the moment.
systemd-analyze blame shows a long delay in a unit I created to automatically power down the dock if it's not mounted:
$ systemd-analyze blame|head
4min 18.016s dock-watch.service
     30.517s systemd-udev-settle.service
     15.273s logrotate.service
      6.274s NetworkManager-wait-online.service
      5.765s raid.mount
      5.452s plymouth-quit-wait.service
      5.038s akmods.service
      4.541s upower.service
      4.427s sssd.service
I've uploaded the dock-watch unit and the scripts it calls, together with the automount unit, to:
https://drive.google.com/drive/folders/1BT5w4u7TzBmWbhx97sWvfOErCIUylrik?usp...
dock-wait script contains: RAID=/dev/sdd
This may not be reliable because /dev nodes frequently change between reboots. You're better off using /dev/disk/by-... any of them are better than node. You can use label, uuid, wwn, whatever.
I actually use a udev rule for idle spin down:
$ cat /etc/udev/rules.d/69-hdparm.rules
ACTION=="add", SUBSYSTEM=="block", \
KERNEL=="sd*[!0-9]", \
ENV{ID_SERIAL_SHORT}=="WDZ47F0A", \
RUN+="/usr/sbin/hdparm -B 100 -S 252 /dev/disk/by-id/wwn-0x5000c500a93cae8a"
$
Good point. I was already nervous about a hard-wired device name though it hasn't bitten me so far.
poc
On Sun, Jun 13, 2021 at 5:30 PM Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Sun, 2021-06-13 at 13:44 -0400, Go Canes wrote:
I still don't know why my external dock is being mounted at boot, but I can live with it for now.
poc
Maybe pvscan wants to check the drive(s) to see if it is a LVM volume? Not sure how to confirm/refute, or how to tell LVM to ignore the USB dock if that is the case.
I'm not using LVM for any of my filesystems, only BTRFS (other than /boot/efi of course).
poc
But LVM doesn't *know* you aren't using LVM until *after* it has scanned all of your disks to see if any of them are LVM PVs. Hence I was wondering if pvscan is trying to access the disks in the dock, eventually the disks spin up, pvscan sees that they are not LVM, and then the boot proceeds.
I could also be way off base here. Just trying to think of anything that in my limited experience would cause boot to (attempt to) access the disks in your USB dock.
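One way to check whether that's what's happening might be to look for pvscan/LVM activity against those disks in the journal, e.g. (just a guess at the mechanism):

$ journalctl -b | grep -iE 'pvscan|lvm' | grep -E 'sd[de]'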
On Sun, 2021-06-13 at 19:06 -0400, Go Canes wrote:
On Sun, Jun 13, 2021 at 5:30 PM Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Sun, 2021-06-13 at 13:44 -0400, Go Canes wrote:
I still don't know why my external dock is being mounted at boot, but I can live with it for now.
poc
Maybe pvscan wants to check the drive(s) to see if it is a LVM volume? Not sure how to confirm/refute, or how to tell LVM to ignore the USB dock if that is the case.
I'm not using LVM for any of my filesystems, only BTRFS (other than /boot/efi of course).
poc
But LVM doesn't *know* you aren't using LVM until *after* it has scanned all of your disks to see if any of them are LVM PVs. Hence I was wondering if pvscan is trying to access the disks in the dock, eventually the disks spin up, pvscan sees that they are not LVM, and then the boot proceeds.
I could also be way off base here. Just trying to think of anything that in my limited experience would cause boot to (attempt to) access the disks in your USB dock.
OK, however spinning up the disks doesn't take more than 5-10 seconds and I assume pvscan would be almost instantaneous, so I don't think that's the problem.
poc
On Mon, 2021-06-14 at 03:42 +0800, Ed Greshko wrote:
On 13/06/2021 23:57, Patrick O'Callaghan wrote:
I did the same and it has made a big difference, i.e. I no longer have the very long delay waiting for usb-settle.
I still don't know why my external dock is being mounted at boot, but I can live with it for now.
I've not thought about these issues you're having in quite some time. But, while servicing a cat at 3AM I started to wonder.
If you were to disable the dock-watch.service does the boot process complete quickly?
I tried 'systemctl disable dock-watch' but for some reason it's being re-enabled on boot, so that has no effect.
I physically disconnected the dock and it does make a big difference, so there's no doubt where the problem lies.
If it does, might there be a way to start the service after the boot process is completed?
If I can manage to actually disable dock-watch from starting automatically then presumably I could run it from crontab using an '@reboot' line.
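i.e. something along these lines (the path is hypothetical; it's whatever the script ends up being called):

@reboot sleep 120 && /usr/local/bin/dock-watch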
poc
On 14/06/2021 17:52, Patrick O'Callaghan wrote:
On Mon, 2021-06-14 at 03:42 +0800, Ed Greshko wrote:
On 13/06/2021 23:57, Patrick O'Callaghan wrote:
I did the same and it has made a big difference, i.e. I no longer have the very long delay waiting for usb-settle.
I still don't know why my external dock is being mounted at boot, but I can live with it for now.
I've not thought about these issues you're having in quite some time. But, while servicing a cat at 3AM I started to wonder.
If you were to disable the dock-watch.service does the boot process complete quickly?
I tried 'systemctl disable dock-watch' but for some reason it's being re-enabled on boot, so that has no effect.
Are you saying that after doing a "systemctl disable" it shows up as "enabled" after a reboot?
I wonder if it's getting started due to the Wants=dock-watch.service in raid.mount.
I physically disconnected the dock and it does make a big difference, so there's no doubt where the problem lies.
If it does, might there be a way to start the service after the boot process is completed?
If I can manage to actually disable dock-watch from starting automatically then presumably I could run it from crontab using an '@reboot' line.
For a "test" you may want to try masking the service.
On Sun, 2021-06-13 at 14:38 -0600, Chris Murphy wrote:
I actually use a udev rule for idle spin down:
$ cat /etc/udev/rules.d/69-hdparm.rules
ACTION=="add", SUBSYSTEM=="block", \
KERNEL=="sd*[!0-9]", \
ENV{ID_SERIAL_SHORT}=="WDZ47F0A", \
RUN+="/usr/sbin/hdparm -B 100 -S 252 /dev/disk/by-id/wwn-0x5000c500a93cae8a"
$
BTW, the reason I don't do this is that one of the drives doesn't accept the APM spin-down command:
$ sudo hdparm -B 100 -S 24 /dev/disk/by-id/wwn-0x50014ee058f952e1
/dev/disk/by-id/wwn-0x50014ee058f952e1:
 setting Advanced Power Management level to 0x64 (100)
 HDIO_DRIVE_CMD failed: Input/output error
 setting standby to 24 (2 minutes)
 APM_level = not supported
poc
On Mon, 2021-06-14 at 18:02 +0800, Ed Greshko wrote:
On 14/06/2021 17:52, Patrick O'Callaghan wrote:
On Mon, 2021-06-14 at 03:42 +0800, Ed Greshko wrote:
On 13/06/2021 23:57, Patrick O'Callaghan wrote:
I did the same and it has made a big difference, i.e. I no longer have the very long delay waiting for usb-settle.
I still don't know why my external dock is being mounted at boot, but I can live with it for now.
I've not thought about these issues you're having in quite some time. But, while servicing a cat at 3AM I started to wonder.
If you were to disable the dock-watch.service does the boot process complete quickly?
I tried 'systemctl disable dock-watch' but for some reason it's being re-enabled on boot, so that has no effect.
Are you saying that after doing a "systemctl disable" it shows up as "enabled" after a reboot?
I wonder if it's getting started due to the Wants=dock-watch.service in raid.mount.
I physically disconnected the dock and it does make a big difference, so there's no doubt where the problem lies.
If it does, might there be a way to start the service after the boot process is completed?
If I can manage to actually disable dock-watch from starting automatically then presumably I could run it from crontab using an '@reboot' line.
For a "test" you may want to try masking the service.
I commented out the Wants line in raid.mount and removed the dock-watch.service file (masking it gave an error). This effectively removed the boot delay.
Now I'm considering whether to junk the whole dock-watch idea and use Chris's suggestion of a udev rule to just set the drives' standby values directly.
poc
On Mon, 2021-06-14 at 11:10 +0100, Patrick O'Callaghan wrote:
On Sun, 2021-06-13 at 14:38 -0600, Chris Murphy wrote:
I actually use a udev rule for idle spin down:
$ cat /etc/udev/rules.d/69-hdparm.rules
ACTION=="add", SUBSYSTEM=="block", \
KERNEL=="sd*[!0-9]", \
ENV{ID_SERIAL_SHORT}=="WDZ47F0A", \
RUN+="/usr/sbin/hdparm -B 100 -S 252 /dev/disk/by-id/wwn-0x5000c500a93cae8a"
$
BTW, the reason I don't do this is that one of the drives doesn't accept the APM spin-down command:
$ sudo hdparm -B 100 -S 24 /dev/disk/by-id/wwn-0x50014ee058f952e1
/dev/disk/by-id/wwn-0x50014ee058f952e1:
 setting Advanced Power Management level to 0x64 (100)
 HDIO_DRIVE_CMD failed: Input/output error
 setting standby to 24 (2 minutes)
 APM_level = not supported
Turns out that the problem isn't with the -S but with the -B. I can just set the standby time directly on this drive, so I'm going to try your udev idea instead of my elaborate dock-watch workaround.
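i.e. just the -S part, which this drive does accept. If I'm reading the hdparm man page right, 242 in its odd encoding means a 60-minute standby timeout:

$ sudo hdparm -S 242 /dev/disk/by-id/wwn-0x50014ee058f952e1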
poc
On Mon, 2021-06-14 at 11:35 +0100, Patrick O'Callaghan wrote:
On Mon, 2021-06-14 at 11:10 +0100, Patrick O'Callaghan wrote:
On Sun, 2021-06-13 at 14:38 -0600, Chris Murphy wrote:
I actually use a udev rule for idle spin down:
$ cat /etc/udev/rules.d/69-hdparm.rules
ACTION=="add", SUBSYSTEM=="block", \
KERNEL=="sd*[!0-9]", \
ENV{ID_SERIAL_SHORT}=="WDZ47F0A", \
RUN+="/usr/sbin/hdparm -B 100 -S 252 /dev/disk/by-id/wwn-0x5000c500a93cae8a"
$
BTW, the reason I don't do this is that one of the drives doesn't accept the APM spin-down command:
$ sudo hdparm -B 100 -S 24 /dev/disk/by-id/wwn-0x50014ee058f952e1
/dev/disk/by-id/wwn-0x50014ee058f952e1:
 setting Advanced Power Management level to 0x64 (100)
 HDIO_DRIVE_CMD failed: Input/output error
 setting standby to 24 (2 minutes)
 APM_level = not supported
Turns out that the problem isn't with the -S but with the -B. I can just set the standby time directly on this drive, so I'm going to try your udev idea instead of my elaborate dock-watch workaround.
And we're back ...
The udev idea didn't work out, i.e. made no difference. I've now done a simpler experiment, simply turning the external dock off and monitoring the journal when I turn it on again:
Jun 16 12:25:56 Bree kernel: usb 4-3: new SuperSpeed Gen 1 USB device number 5 using xhci_hcd
Jun 16 12:25:56 Bree kernel: usb 4-3: New USB device found, idVendor=174c, idProduct=55aa, bcdDevice= 1.00
Jun 16 12:25:56 Bree kernel: usb 4-3: New USB device strings: Mfr=2, Product=3, SerialNumber=1
Jun 16 12:25:56 Bree kernel: usb 4-3: Product: ASM1156-PM
Jun 16 12:25:56 Bree kernel: usb 4-3: Manufacturer: ASMT
Jun 16 12:25:56 Bree kernel: usb 4-3: SerialNumber: 00000000000000000000
Jun 16 12:25:56 Bree kernel: scsi host6: uas
Jun 16 12:25:56 Bree mtp-probe[130155]: checking bus 4, device 5: "/sys/devices/pci0000:00/0000:00:14.0/usb4/4-3"
Jun 16 12:25:56 Bree mtp-probe[130155]: bus: 4, device: 5 was not an MTP device
Jun 16 12:25:57 Bree mtp-probe[130161]: checking bus 4, device 5: "/sys/devices/pci0000:00/0000:00:14.0/usb4/4-3"
Jun 16 12:25:57 Bree mtp-probe[130161]: bus: 4, device: 5 was not an MTP device
Jun 16 12:25:57 Bree kernel: scsi 6:0:0:0: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
Jun 16 12:25:57 Bree kernel: scsi 6:0:0:1: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
Jun 16 12:25:57 Bree kernel: sd 6:0:0:0: Attached scsi generic sg4 type 0
Jun 16 12:25:57 Bree kernel: sd 6:0:0:1: Attached scsi generic sg5 type 0
Jun 16 12:25:57 Bree kernel: sd 6:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jun 16 12:25:57 Bree kernel: sd 6:0:0:0: [sdd] 4096-byte physical blocks
Jun 16 12:25:57 Bree kernel: sd 6:0:0:0: [sdd] Write Protect is off
Jun 16 12:25:57 Bree kernel: sd 6:0:0:0: [sdd] Mode Sense: 43 00 00 00
Jun 16 12:25:57 Bree kernel: sd 6:0:0:1: [sde] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jun 16 12:25:57 Bree kernel: sd 6:0:0:1: [sde] 4096-byte physical blocks
Jun 16 12:25:57 Bree kernel: sd 6:0:0:1: [sde] Write Protect is off
Jun 16 12:25:57 Bree kernel: sd 6:0:0:1: [sde] Mode Sense: 43 00 00 00
Jun 16 12:25:57 Bree kernel: sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 16 12:25:57 Bree kernel: sd 6:0:0:0: [sdd] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
Jun 16 12:25:57 Bree kernel: sd 6:0:0:0: [sdd] Attached SCSI disk <--------------------------------------------------A
Jun 16 12:26:27 Bree kernel: sd 6:0:0:1: tag#12 uas_eh_abort_handler 0 uas-tag 1 inflight: IN
Jun 16 12:26:27 Bree kernel: sd 6:0:0:1: tag#12 CDB: Mode Sense(6) 1a 00 08 00 18 00
Jun 16 12:26:27 Bree kernel: scsi host6: uas_eh_device_reset_handler start
Jun 16 12:26:27 Bree kernel: usb 4-3: reset SuperSpeed Gen 1 USB device number 5 using xhci_hcd
Jun 16 12:26:27 Bree kernel: scsi host6: uas_eh_device_reset_handler success
Jun 16 12:26:27 Bree kernel: sd 6:0:0:1: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 16 12:26:27 Bree kernel: sd 6:0:0:1: [sde] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
Jun 16 12:26:27 Bree kernel: sd 6:0:0:1: [sde] Attached SCSI disk <--------------------------------------------------B
What's clear is that the 30-second delay is due to one of the component drives taking that long to come online (both drives are WD 1TB units but with different model numbers). IOW it has nothing to do with BTRFS, nor my dock-watch unit, nor automount.
So two questions:
1) Is there an obvious error with the drive itself?
2) Is there a way to prevent the system boot from freezing while it waits for the drive to come online?
Thanks for any insights.
poc
On 16/06/2021 19:35, Patrick O'Callaghan wrote:
So two questions:
- Is there an obvious error with the drive itself?
If you've not seen errors when you're using the drive, then probably not.
- Is there a way to prevent the system boot from freezing while it
waits for the drive to come online?
How about trying to turn it into a user unit/service? It would require a login, but it may not delay the boot process.
On Wed, 2021-06-16 at 20:18 +0800, Ed Greshko wrote:
On 16/06/2021 19:35, Patrick O'Callaghan wrote:
So two questions:
- Is there an obvious error with the drive itself?
If you've not seen errors when you're using the drive, then probably not.
I haven't seen any errors, so it's probably a feature rather than a bug with this drive.
- Is there a way to prevent the system boot from freezing while it
waits for the drive to come online?
How about trying to turn it into user unit/service? It would require a login. But may not delay that process.
There's no indication it has anything to do with my existing unit(s) such as dock-watch or raid.[auto]mount, neither of which are logged in the journal extract I posted, so I don't think converting any of that into a user unit or service would make any difference.
Just to be sure, my next step will be to completely disable those units and reboot again, but I'm 99% sure that is still going to show the 30-second delay.
poc
On Wed, 16 Jun 2021 14:46:42 +0100 Patrick O'Callaghan wrote:
I haven't seen any errors, so it's probably a feature rather than a bug with this drive.
I have two USB drives on my system, and one of them takes about 30 seconds to spin up. It is very annoying when I'm rebooting the system but it decides it has to spin up the unmounted drive before it can finish. I can understand the need to scan everything when it boots up, but can't imagine why it needs to scan everything on the way down.
On 16/06/2021 21:46, Patrick O'Callaghan wrote:
Just to be sure, my next step will be to completely disable those units and reboot again, but I'm 99% sure that is still going to show the 30- second delay.
Well, 30sec is annoying. But I thought that it was the 4 min that getting rid of would be the 1st goal.
On Wed, 2021-06-16 at 22:36 +0800, Ed Greshko wrote:
On 16/06/2021 21:46, Patrick O'Callaghan wrote:
Just to be sure, my next step will be to completely disable those units and reboot again, but I'm 99% sure that is still going to show the 30- second delay.
Well, 30sec is annoying. But I thought that it was the 4 min that getting rid of would be the 1st goal.
I think it was Frank who had the 4 minute delay. I don't think I said that mine was 4 minutes, just that there was a delay and I see a blank screen with three dots, but apologies if I gave that impression.
poc
On Wed, 2021-06-16 at 10:17 -0400, Tom Horsley wrote:
On Wed, 16 Jun 2021 14:46:42 +0100 Patrick O'Callaghan wrote:
I haven't seen any errors, so it's probably a feature rather than a bug with this drive.
I have two USB drives on my system, and one of them takes about 30 seconds to spin up. It is very annoying when I'm rebooting the system but it decides it has to spin up the unmounted drive before it can finish. I can understand the need to scan everything when it boots up, but can't imagine why it needs to scan everything on the way down.
I'm not aware of it scanning on the way down. My problem is that it scans a drive that it isn't going to need until something mounts it, and makes everything else wait till it finishes. There doesn't appear to be something analogous to the NFS "soft-mount" option.
poc
On 6/16/21 12:32 PM, Patrick O'Callaghan wrote:
I'm not aware of it scanning on the way down. My problem is that it scans a drive that it isn't going to need until something mounts it, and makes everything else wait till it finishes. There doesn't appear to be something analogous to the NFS "soft-mount" option.
Is the drive in question listed in /etc/fstab? If so, do the mount options include noauto, and if not, why not? If the drive isn't listed in fstab, consider putting it there, listed by UUID and including noauto and see if that helps.
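For example (the UUID and mount point below are placeholders, not values from this thread; 'blkid' will list the real UUIDs):

# /etc/fstab entry, mounted only when explicitly requested
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/raid  btrfs  noauto,nofail  0 0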
On 17/06/2021 02:27, Patrick O'Callaghan wrote:
On Wed, 2021-06-16 at 22:36 +0800, Ed Greshko wrote:
On 16/06/2021 21:46, Patrick O'Callaghan wrote:
Just to be sure, my next step will be to completely disable those units and reboot again, but I'm 99% sure that is still going to show the 30- second delay.
Well, 30sec is annoying. But I thought that it was the 4 min that getting rid of would be the 1st goal.
I think it was Frank who had the 4 minute delay. I don't think I said that mine was 4 minutes, just that there was a delay and I see a blank screen with three dots, but apologies if I gave that impression.
I no longer have it, but I think I got that from a systemd-analyze plot that was posted.
Oh, well. I've gotten to the point where 30s more of boot time doesn't bother me that much. What bothered me more was when I would go to a directory that had a subdirectory acting as an NFS mount point for a NAS whose drives were set to power-save. Even though I wasn't entering the subdirectory, I suppose a look-ahead was happening, so my terminal session would freeze until the drives came ready.
On 6/16/21 3:11 PM, Ed Greshko wrote:
On 17/06/2021 02:27, Patrick O'Callaghan wrote:
On Wed, 2021-06-16 at 22:36 +0800, Ed Greshko wrote:
On 16/06/2021 21:46, Patrick O'Callaghan wrote:
Just to be sure, my next step will be to completely disable those units and reboot again, but I'm 99% sure that is still going to show the 30- second delay.
Well, 30sec is annoying. But I thought that it was the 4 min that getting rid of would be the 1st goal.
I think it was Frank who had the 4 minute delay. I don't think I said that mine was 4 minutes, just that there was a delay and I see a blank screen with three dots, but apologies if I gave that impression.
No, my delay was about 2 minutes. But disabling and masking that udev.settle unit took care of that.
I no longer have it, but I think I got that from a systemd-analyse plot that was posted.
Oh, well. I've gotten to the point where 30s more of boot time doesn't bother me that much. What bothered me more was when I would go to a directory on which a subdirectory of it had an NFS mount point with a NAS whose drives were set to power-save. Even though I wasn't entering the subdirectory I suppose a look-ahead was happening so my terminal session freeze until the drives came ready.
On Wed, 2021-06-16 at 11:55 -0700, Samuel Sieb wrote:
On 2021-06-16 4:35 a.m., Patrick O'Callaghan wrote:
- Is there a way to prevent the system boot from freezing while it
waits for the drive to come online?
Did you try the dracut initrd modification and possibly also masking udev settle on the system as well?
Yes, both. No effect.
poc
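For anyone following along, masking the settle unit on the installed system is usually something along these lines (the dracut initrd change discussed earlier is not shown here):

# stop the unit and prevent anything from pulling it in again
sudo systemctl disable --now systemd-udev-settle.service
sudo systemctl mask systemd-udev-settle.service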
On Thu, 2021-06-17 at 03:11 +0800, Ed Greshko wrote:
On 17/06/2021 02:27, Patrick O'Callaghan wrote:
On Wed, 2021-06-16 at 22:36 +0800, Ed Greshko wrote:
On 16/06/2021 21:46, Patrick O'Callaghan wrote:
Just to be sure, my next step will be to completely disable those units and reboot again, but I'm 99% sure that is still going to show the 30- second delay.
Well, 30sec is annoying. But I thought that it was the 4 min that getting rid of would be the 1st goal.
I think it was Frank who had the 4 minute delay. I don't think I said that mine was 4 minutes, just that there was a delay and I see a blank screen with three dots, but apologies if I gave that impression.
I no longer have it, but I think I got that from a systemd-analyse plot that was posted.
Oh, well. I've gotten to the point where 30s more of boot time doesn't bother me that much.
I can live with it if there's no alternative. It just seems completely unnecessary so I'd like to stop it happening if possible.
What bothered me more was when I would go to a directory on which a subdirectory of it had an NFS mount point with a NAS whose drives were set to power-save. Even though I wasn't entering the subdirectory I suppose a look-ahead was happening so my terminal session freeze until the drives came ready.
Yes, I've seen that too in other cases.
poc
On Wed, 2021-06-16 at 15:57 -0400, Frank McCormick wrote:
I think it was Frank who had the 4 minute delay. I don't think I said that mine was 4 minutes, just that there was a delay and I see a blank screen with three dots, but apologies if I gave that impression.
No, my delay was about2 minutes. But disabling and masking that udev.settle unit took care of that.
I did that too and at first thought it made a difference (in fact it probably did make some difference), but some amount of delay remains.
poc
On Wed, 2021-06-16 at 13:03 -0600, Joe Zeff wrote:
On 6/16/21 12:32 PM, Patrick O'Callaghan wrote:
I'm not aware of it scanning on the way down. My problem is that it scans a drive that it isn't going to need until something mounts it, and makes everything else wait till it finishes. There doesn't appear to be something analogous to the NFS "soft-mount" option.
Is the drive in question listed in /etc/fstab? If so, do the mount options include noauto, and if not, why not? If the drive isn't listed in fstab, consider putting it there, listed by UUID and including noauto and see if that helps.
Did that, also disabled the raid.mount and raid.automount units, and rebooted.
It didn't help.
poc
On 16Jun2021 10:17, Tom Horsley horsley1953@gmail.com wrote:
On Wed, 16 Jun 2021 14:46:42 +0100 Patrick O'Callaghan wrote:
I haven't seen any errors, so it's probably a feature rather than a bug with this drive.
I have two USB drives on my system, and one of them takes about 30 seconds to spin up. It is very annoying when I'm rebooting the system but it decides it has to spin up the unmounted drive before it can finish. I can understand the need to scan everything when it boots up, but can't imagine why it needs to scan everything on the way down.
I've seen systems need to spin up a drive in order to umount it - to sync the fs. Even for an idle fs the kernel seems to want something, maybe formally closing a journal or something?
To test, do a sync, wait for the drive to spin down, then umount. See what happens during the umount.
Cheers, Cameron Simpson cs@cskk.id.au
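A rough sketch of that test, with the device and mount point as placeholders:

sync                        # flush any dirty data first
sudo hdparm -y /dev/sdX     # ask the drive to enter standby (spin down)
# wait for the platters to stop, then:
sudo umount /mnt/backup
journalctl -k -n 20         # see whether the umount forced a spin-up or reset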
On 2021-06-16 2:27 p.m., Patrick O'Callaghan wrote:
On Wed, 2021-06-16 at 11:55 -0700, Samuel Sieb wrote:
On 2021-06-16 4:35 a.m., Patrick O'Callaghan wrote:
- Is there a way to prevent the system boot from freezing while it
waits for the drive to come online?
Did you try the dracut initrd modification and possibly also masking udev settle on the system as well?
Yes, both. No effect.
Do you see the udevadm warning message in the logs? Can you see where the delay is happening in the logs?
On Wed, 2021-06-16 at 17:52 -0700, Samuel Sieb wrote:
On 2021-06-16 2:27 p.m., Patrick O'Callaghan wrote:
On Wed, 2021-06-16 at 11:55 -0700, Samuel Sieb wrote:
On 2021-06-16 4:35 a.m., Patrick O'Callaghan wrote:
- Is there a way to prevent the system boot from freezing while
it waits for the drive to come online?
Did you try the dracut initrd modification and possibly also masking udev settle on the system as well?
Yes, both. No effect.
Do you see the udevadm warning message in the logs? Can you see where the delay is happening in the logs?
I updated my system this morning and rebooted. I had made no other changes since my previous reboot, but this time there was no delay. I've preserved the log just in case and will reply again if the situation changes.
I uploaded the list of dnf changes to https://paste.centos.org/view/70eb7f29 if anyone wants to take a look.
poc
On Wed, 16 Jun 2021 at 10:47, Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Wed, 2021-06-16 at 20:18 +0800, Ed Greshko wrote:
On 16/06/2021 19:35, Patrick O'Callaghan wrote:
So two questions:
- Is there an obvious error with the drive itself?
If you've not seen errors when you're using the drive, then probably not.
I haven't seen any errors, so it's probably a feature rather than a bug with this drive.
Back in the days of 5 inch SCSI drives, some systems required a manual startup sequence to avoid overloading the UPS.
Spinning up disk platters makes a spike in the power demand. A "just right" sized power supply can be more efficient, but needs some management of demand spikes, including recovery time to manage heat. This is done using "staggered spinup" https://en.wikipedia.org/wiki/Spin-up and < https://sata-io.org/system/files/member-downloads/Implementing%20Staggered%2...
.
The Wikipedia article mentions Power-Up In Standby (PUIS) -- might be worth investigation.
https://www.seagate.com/files/docs/pdf/en-GB/whitepaper/tp608-powerchoice-te... describes a series of power saving modes.
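To check whether a drive advertises PUIS, something like the following should work (the device path is a placeholder); if a "Power-Up In Standby" line shows up in the feature list, the drive can be configured to stay spun down until explicitly started:

sudo hdparm -I /dev/sdX | grep -i standby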
- Is there a way to prevent the system boot from freezing while it
waits for the drive to come online?
How about trying to turn it into user unit/service? It would require a login. But may not delay that process.
There's no indication it has anything to do with my existing unit(s) such as dock-watch or raid.[auto]mount, neither of which are logged in the journal extract I posted, so I don't think converting any of that into a user unit or service would make any difference.
Just to be sure, my next step will be to completely disable those units and reboot again, but I'm 99% sure that is still going to show the 30- second delay.
The technical specifications of the drives should mention startup power
management. There may also be some power management in the dock. I've noticed that it is becoming more difficult to find detailed documentation of add-on hardware. At one time you could open the box, identify key IC's and get the spec. sheets, but now you may find a general-purpose CPU.
On Thu, 2021-06-17 at 07:20 -0300, George N. White III wrote:
The technical specifications of the drives should mention startup
power management. There may also be some power management in the dock. I've noticed that it is becoming more difficult to find detailed documentation of add-on hardware. At one time you could open the box, identify key IC's and get the spec. sheets, but now you may find a general-purpose CPU.
It's a cheap dock so probably not easy to find any technical documentation. One thing I might try is to swap the two drives around just to see if it's always the same one that causes the delay.
poc
On 17Jun2021 12:15, Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Thu, 2021-06-17 at 07:20 -0300, George N. White III wrote:
The technical specifications of the drives should mention startup
power management. There may also be some power management in the dock. I've noticed that it is becoming more difficult to find detailed documentation of add-on hardware. At one time you could open the box, identify key IC's and get the spec. sheets, but now you may find a general-purpose CPU.
It's a cheap dock so probably not easy to find any technical documentation. One thing I might try is to swap the two drives around just to see if it's always the same one that causes the delay.
On the subject of power as raised by George, I had trouble with too many USB bus powered drives on the home server. A powered USB 3.1 hub helps me out there. Doubtless it has its own limits.
Cheers, Cameron Simpson cs@cskk.id.au
On Fri, 2021-06-18 at 08:58 +1000, Cameron Simpson wrote:
On 17Jun2021 12:15, Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Thu, 2021-06-17 at 07:20 -0300, George N. White III wrote:
The technical specifications of the drives should mention startup
power management. There may also be some power management in the dock. I've noticed that it is becoming more difficult to find detailed documentation of add-on hardware. At one time you could open the box, identify key IC's and get the spec. sheets, but now you may find a general-purpose CPU.
It's a cheap dock so probably not easy to find any technical documentation. One thing I might try is to swap the two drives around just to see if it's always the same one that causes the delay.
On the subject of power as raised by George, I had trouble with too many USB bus powered drives on the home server. A powered USB 3.1 hub helps me out there. Doubtless it has its own limits.
The dock has its own power supply.
poc
On Fri, 18 Jun 2021 at 06:50, Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Fri, 2021-06-18 at 08:58 +1000, Cameron Simpson wrote:
On 17Jun2021 12:15, Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Thu, 2021-06-17 at 07:20 -0300, George N. White III wrote:
The technical specifications of the drives should mention startup
power management. There may also be some power management in the dock. I've noticed that it is becoming more difficult to find detailed documentation of add-on hardware. At one time you could open the box, identify key IC's and get the spec. sheets, but now you may find a general-purpose CPU.
It's a cheap dock so probably not easy to find any technical documentation. One thing I might try is to swap the two drives around just to see if it's always the same one that causes the delay.
On the subject of power as raised by George, I had trouble with too many USB bus powered drives on the home server. A powered USB 3.1 hub helps me out there. Doubtless it has its own limits.
The dock has its own power supply.
Is this a full-featured dock with USB and monitor outlets or a disk only dock?
It might be useful to compare the rating of the power supply with the peak load from the drive spec sheet. Older high RPM drives had quite high startup draw. Does the power supply have multiple voltages or just one voltage (the latter requires conversion in the hub)? If the power supply can't meet the startup requirements of two drives then it must be doing some power management. The simplest approach would be to provide a delay before starting one of the drives.
Amazon has a popular Sabrent disk-only dock -- one review notes:
Cons
- when using 2 drives and plugging or unplugging one drive BOTH go offline, at least temporarily. NOTE: Seems to be a common limitation to all these docks. I have yet to find one that does not behave this way. Must be the way the SATA bus controller is designed.
- when plugging in 2 drives, they mount sequentially, meaning you have to wait for one to mount before the other will
On Fri, 2021-06-18 at 09:02 -0300, George N. White III wrote:
On Fri, 18 Jun 2021 at 06:50, Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Fri, 2021-06-18 at 08:58 +1000, Cameron Simpson wrote:
On 17Jun2021 12:15, Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Thu, 2021-06-17 at 07:20 -0300, George N. White III wrote:
The technical specifications of the drives should mention startup
power management. There may also be some power management in the dock. I've noticed that it is becoming more difficult to find detailed documentation of add-on hardware. At one time you could open the box, identify key IC's and get the spec. sheets, but now you may find a general- purpose CPU.
It's a cheap dock so probably not easy to find any technical documentation. One thing I might try is to swap the two drives around just to see if it's always the same one that causes the delay.
On the subject of power as raised by George, I had trouble with too many USB bus powered drives on the home server. A powered USB 3.1 hub helps me out there. Doubtless it has its own limits.
The dock has its own power supply.
Is this a full-featured dock with USB and monitor outlets or a disk only dock?
It might be useful to compare the rating of the power supply with the peak load from the drive spec sheet. Older high RPM drives had quite high startup draw. Does the power supply have multiple voltages or just one voltage (the latter requires conversion in the hub)? If the power supply can't meet the startup requirements of two drives then it must be doing some power management. The simplest approach would be to provide a delay before starting one of the drives.
Interesting idea. Mine is this model:
https://www.amazon.co.uk/gp/product/B06XYJGDTH/ref=ppx_yo_dt_b_search_asin_t...
The power block says its output is 3A at 12V.
The drives are both WD model WD10EZEX (though the label on one says it has a 64MB cache and the other doesn't). Both labels say 5VDC, 0.68A and 12VDC, 0.55A. Looks like the dock's power should be enough.
Amazon has a popular Sabrent disk-only dock -- one review notes:
Cons
- when using 2 drives and plugging or unplugging one drive BOTH
go offline, at least temporarily. NOTE: Seems to be a common limitation to all these docks. I have yet to find one that does not behave this way. Must be the way the SATA bus controller is designed.
- when plugging in 2 drives, they mount sequentially, meaning you
have to wait for one to mount before the other will
My problem is that one drive comes up almost instantly and the other takes 30 seconds. In fact I can live with that. My real gripe is that the kernel makes me wait even though the drive is not being accessed. If it just wants to make the drive available, it should be able to wait asynchronously.
poc
On Fri, Jun 18, 2021, 11:10 AM Patrick O'Callaghan pocallaghan@gmail.com wrote:
My problem is that one drive comes up almost instantly and the other takes 30 seconds. In fact I can live with that. My real gripe is that the kernel makes me wait even though the drive is not being accessed. If it just wants to make the drive available, it should be able to wait asynchronously.
Keep the hardware config the same, but boot a Fedora Live image (from USB stick or whatever). Does it still hang during boot?
The kernel shouldn't wait unless the device has put some kind of busy state on the bus shared by sysroot.
I'm suspicious that something is trying to mount it, or otherwise access it, but I haven't seen the logs.
Chris Murphy
On Fri, 18 Jun 2021 at 14:10, Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Fri, 2021-06-18 at 09:02 -0300, George N. White III wrote:
On Fri, 18 Jun 2021 at 06:50, Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Fri, 2021-06-18 at 08:58 +1000, Cameron Simpson wrote:
On 17Jun2021 12:15, Patrick O'Callaghan pocallaghan@gmail.com wrote:
On Thu, 2021-06-17 at 07:20 -0300, George N. White III wrote:
The technical specifications of the drives should mention startup power management. There may also be some power management in the dock. I've noticed that it is becoming more difficult to find detailed documentation of add-on hardware. At one time you could open the box, identify key IC's and get the spec. sheets, but now you may find a general-purpose CPU.
It's a cheap dock so probably not easy to find any technical documentation. One thing I might try is to swap the two drives around just to see if it's always the same one that causes the delay.
On the subject of power as raised by George, I had trouble with too many USB bus powered drives on the home server. A powered USB 3.1 hub helps me out there. Doubtless it has its own limits.
The dock has its own power supply.
Is this a full-featured dock with USB and monitor outlets or a disk only dock?
It might be useful to compare the rating of the power supply with the peak load from the drive spec sheet. Older high RPM drives had quite high startup draw. Does the power supply have multiple voltages or just one voltage (the latter requires conversion in the hub)? If the power supply can't meet the startup requirements of two drives then it must be doing some power management. The simplest approach would be to provide a delay before starting one of the drives.
Interesting idea. Mine is this model:
https://www.amazon.co.uk/gp/product/B06XYJGDTH/ref=ppx_yo_dt_b_search_asin_t...
The power block says its output is 3A at 12V.
The drives are both WD model WD10EZEX, (though the label on one says it has a 64MB cache and the other doesn't). Both labels say 5VDC, 0.68A and 12VDC, 0.55A. Looks like the dock's power should be enough.
The spec sheet < https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/pu...
says 12V peak load is 2.5 A. It does take power to spin up to 7200 RPM, and I doubt your supply could survive without staggered startup.
Amazon has a popular Sabrent disk-only dock -- one review notes:
Cons
- when using 2 drives and plugging or unplugging one drive BOTH
go offline, at least temporarily. NOTE: Seems to be a common limitation to all these docks. I have yet to find one that does not behave this way. Must be the way the SATA bus controller is designed.
- when plugging in 2 drives, they mount sequentially, meaning you
have to wait for one to mount before the other will
My problem is that one drive comes up almost instantly and the other takes 30 seconds. In fact I can live with that. My real gripe is that the kernel makes me wait even though the drive is not being accessed. If it just wants to make the drive available, it should be able to wait asynchronously.
Agreed, but then you need a way to tell the kernel that it won't need anything from the external drives so it is OK to continue booting. Have you considered automounting the drives?
I make fstab mount options include:
nofail,noauto,x-systemd.automount
That way it is only mounted on demand, i.e. when the mount point is "touched".
It's also possible to use x-systemd.idle-timeout=300 which will unmount it after 5 minutes idle.
-- Chris Murphy
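Put together, an fstab entry using those options might look like this (the UUID and mount point are placeholders):

# mounted on demand, unmounted after 5 minutes idle, and boot does not fail if the dock is off
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/raid  btrfs  nofail,noauto,x-systemd.automount,x-systemd.idle-timeout=300  0 0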
On 19/06/2021 09:26, Chris Murphy wrote:
I make fstab mount options include:
nofail,noauto,x-systemd.automount
That way it is only mounted on demand, i.e. when the mount point is "touched".
It's also possible to use x-systemd.idle-timeout=300 which will unmount it after 5 minutes idle.
I'll admit that I've lost track of this thread. I think it is more about poc's issue than the OP's. And I don't know if poc has given up on the systemd mounts.
While I don't have a dock, I do use systemd's automount features on some NAS mounts as I don't want/need them to be mounted all the time. They have been mounted at boot time even though I thought they should not have been. I'm pretty sure poc's systemd units were close to mine. I created my systemd units as described in various tutorials. The mount unit does contain noauto in the options. The mounting at boot time didn't bother me since they take no time.
Today I decided to make a change to my units in /etc/systemd/system.
I removed the WantedBy=multi-user.target from the mount unit. I then did a daemon-reload. I noted that there were 2 broken symlinks associated with the mounts in /etc/systemd/system/multi-user.target.wants so I deleted them. I then rebooted.
It was then I realized I'd forgotten to check that I'd enabled the automount unit. I hadn't. So, I enabled it and rebooted.
The mount no longer happens at boot time and the status of the automount unit is Active: active (waiting). And everything works as expected. Mount upon accessing the mount point and unmounting happening after the timeout has expired.
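As a concrete illustration of that arrangement, the unit pair might look roughly like this; a sketch only, with the NFS server and export as placeholders and /aux matching the mount that shows up in Ed's journal later in the thread:

# /etc/systemd/system/aux.mount  (no [Install] section, so it is never enabled directly)
[Unit]
Description=nfs mount aux

[Mount]
What=nas.example.com:/export/aux
Where=/aux
Type=nfs

# /etc/systemd/system/aux.automount
[Unit]
Description=automount for /aux

[Automount]
Where=/aux
TimeoutIdleSec=300

[Install]
WantedBy=multi-user.target

Only the automount unit is enabled (sudo systemctl enable --now aux.automount); the mount unit is then pulled in on demand when /aux is touched.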
Patrick O'Callaghan:
Interesting idea. Mine is this model:
https://www.amazon.co.uk/gp/product/B06XYJGDTH/ref=ppx_yo_dt_b_search_asin_t...
The power block says its output is 3A at 12V.
The drives are both WD model WD10EZEX, (though the label on one says it has a 64MB cache and the other doesn't). Both labels say 5VDC, 0.68A and 12VDC, 0.55A. Looks like the dock's power should be enough.
George N. White III:
The spec sheet https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/internal-drives/wd-blue-hdd/product-brief-western-digital-wd-blue-pc-hdd.pdf says 12V peak load is 2.5 A. It does take power to spin up to 7500 RPM, and I doubt your supply could survive without staggered startup.
Or, the drives could do power management (if they were designed better), i.e. not try to start the motor at full pelt, but ramp it up gently. It's not just those little docks' power supplies that might badly hiccup under sudden load.
30 seconds does seem a bit of an extreme length to delay things, though. We're not talking about the old washing-machine sized hard drives with huge platters.
On 19/06/2021 11:58, Ed Greshko wrote:
The mount no longer happens at boot time and the status of the automount unit is Active: active (waiting). And everything works as expected. Mount upon accessing the mount point and unmounting happening after the timeout has expired.
Well, I spoke too soon. And I'm currently stumped.
The changes that I noted were tested on 2 different VM's. One running F33 the other F34. Both worked fine.
Same thing on my F34 bare metal system? Nope. The NFS mounts all get mounted.
Scratching head.
On Fri, 2021-06-18 at 19:36 -0300, George N. White III wrote:
The power block says its output is 3A at 12V.
The drives are both WD model WD10EZEX, (though the label on one says
it
has a 64MB cache and the other doesn't). Both labels say 5VDC, 0.68A and 12VDC, 0.55A. Looks like the dock's power should be enough.
The spec sheet < https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/pu...
says 12V peak load is 2.5 A. It does take power to spin up to 7500 RPM, and I doubt your supply could survive without staggered startup.
I see. That makes sense I guess, but 30 seconds is still a very long time to wait for the first drive to reach speed before spinning up the second one. Unfortunately the data sheet says nothing about start time.
Amazon has a popular Sabrent disk-only dock -- one review notes:
Cons
- when using 2 drives and plugging or unplugging one drive BOTH
go offline, at least temporarily. NOTE: Seems to be a common
limitation
to all these docks. I have yet to find one that does not behave
this
way. Must be the way the SATA bus controller is designed.
- when plugging in 2 drives, they mount sequentially, meaning you
have to wait for one to mount before the other will
My problem is that one drive comes up almost instantly and the other takes 30 seconds. In fact I can live with that. My real gripe is that the kernel makes me wait even though the drive is not being accessed. If it just wants to make the drive available, it should be able to
wait
asynchronously.
Agreed, but then you need a way to tell the kernel that it won't need anything from the external drives so it is OK to continue booting. Have you considered automounting the drives?
They are automounted, and have been ever since I started using them. The drives are used almost exclusively for backup during the night, which is why I want them spun down most of the time and started on demand.
poc
On Fri, 2021-06-18 at 19:26 -0600, Chris Murphy wrote:
I make fstab mount options include:
nofail,noauto,x-systemd.automount
That way it is only mounted on demand, i.e. when the mount point is "touched".
It's also possible to use x-systemd.idle-timeout=300 which will unmount it after 5 minutes idle.
I use a systemd <device>.mount because I need a "Wants" line to trigger my power-down script. There doesn't appear to be a way to do this via /etc/fstab.
Also, there appears to be no way to set "nofail" *without* using /etc/fstab.
poc
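For illustration, the kind of mount unit poc describes might look something like this; a sketch assuming the filesystem label RAID seen in the kernel logs and the dock-watch.service from poc's own scripts, with the mount point inferred from the raid.mount unit name mentioned earlier and the Wants= relationship being a guess based on his description:

# /etc/systemd/system/raid.mount (sketch)
[Unit]
Description=BTRFS RAID1 on the external dock
Wants=dock-watch.service

[Mount]
What=/dev/disk/by-label/RAID
Where=/raid
Type=btrfs

A matching raid.automount unit, as mentioned earlier in the thread, would then take care of mounting it on demand.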
On Sat, 2021-06-19 at 11:58 +0800, Ed Greshko wrote:
I removed the WantedBy=multi-user.target from the mount unit. I then did a daemon-reload. I noted that there were 2 broken syslinks associated with the mounts in /etc/systemd/system/multi-user.target.wants so I deleted them. I then rebooted.
AFAIK anything in the [Install] section is only processed statically by the 'enable' and 'disable' actions, i.e. not at runtime. That being the case, I don't see how the unit is ever going to be activated, but I admit that my understanding of all this is hazy at best.
poc
On Fri, 2021-06-18 at 12:15 -0600, Chris Murphy wrote:
On Fri, Jun 18, 2021, 11:10 AM Patrick O'Callaghan pocallaghan@gmail.com wrote:
My problem is that one drive comes up almost instantly and the other takes 30 seconds. In fact I can live with that. My real gripe is that the kernel makes me wait even though the drive is not being accessed. If it just wants to make the drive available, it should be able to wait asynchronously.
Keep the hardware config the same, but boot a Fedora Live image (from USB stick or whatever). Does it still hang during boot?
The kernel shouldn't wait unless the device has put some kind of busy state on the bus shared by sysroot.
I'm suspicious that something is trying to mount it, or otherwise access it, but I haven't seen the logs.
OK, I've done that and uploaded the results to: https://drive.google.com/drive/folders/1J3unPJIN-XNQwuwLZNsaPp7JqYH6M3YB?usp...
This is using Fedora-KDE-Live-x86_64-34-1.2.iso.
There are three files:
live-blame (output of systemd-analyze blame)
live-boot.svg (output of systemd-analyze plot)
live-journal (output of journal after logging into KDE)
Clearly the 30-second delay is not happening. There is a 14-second delay due to usb-settle, but it doesn't hold up the boot process. (There's another delay due to checking the MD5 checksum on the thumbdrive before the boot proper starts, which I interrupted.)
poc
On 19/06/2021 19:22, Patrick O'Callaghan wrote:
On Sat, 2021-06-19 at 11:58 +0800, Ed Greshko wrote:
I removed the WantedBy=multi-user.target from the mount unit. I then did a daemon-reload. I noted that there were 2 broken syslinks associated with the mounts in /etc/systemd/system/multi-user.target.wants so I deleted them. I then rebooted.
AFAIK anything in the [Install] section is only processed statically by the 'enable' and 'disable' actions, i.e. not at runtime. That being the case, I don't see how the unit is ever going to be activated, but I admit that my understanding of all this is hazy at best.
Well, all I know at the moment is the VM's work, the bare metal system doesn't.
The other weird thing I can't explain is that I see the following lines in the journal at boot time, before any user logs in.
Jun 20 06:42:16 meimei.greshko.com systemd[1]: aux.automount: Got automount request for /aux, triggered by 2344 (daemon-init)
Jun 20 06:42:16 meimei.greshko.com systemd[1]: Mounting nfs mount aux...
I think this will remain an unsolved mystery.
On 20/06/2021 06:58, Ed Greshko wrote:
On 19/06/2021 19:22, Patrick O'Callaghan wrote:
On Sat, 2021-06-19 at 11:58 +0800, Ed Greshko wrote:
I removed the WantedBy=multi-user.target from the mount unit. I then did a daemon-reload. I noted that there were 2 broken syslinks associated with the mounts in /etc/systemd/system/multi-user.target.wants so I deleted them. I then rebooted.
AFAIK anything in the [Install] section is only processed statically by the 'enable' and 'disable' actions, i.e. not at runtime. That being the case, I don't see how the unit is ever going to be activated, but I admit that my understanding of all this is hazy at best.
Well, all I know at the moment is the VM's work, the bare metal system doesn't.
The other weird thing I can't explain is that I see the follwing line in the journal at the boot time before any user logs in.
Jun 20 06:42:16 meimei.greshko.com systemd[1]: aux.automount: Got automount request for /aux, triggered by 2344 (daemon-init) Jun 20 06:42:16 meimei.greshko.com systemd[1]: Mounting nfs mount aux...
I think this will remain an unsolved mystery.
OK. My mystery has been "solved" by observation/deduction and a log entry.
Took my blinders off, and noticed the following in the journal at boot time.
Jun 20 06:42:16 meimei.greshko.com dnsmasq[2449]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Jun 20 06:42:16 meimei.greshko.com dnsmasq-dhcp[2449]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Jun 20 06:42:16 meimei.greshko.com systemd[1]: aux.automount: Got automount request for /aux, triggered by 2344 (daemon-init)
Jun 20 06:42:16 meimei.greshko.com systemd[1]: Mounting nfs mount aux...
But I failed to look a bit further into the log to see.
Jun 20 06:42:19 meimei.greshko.com libvirtd[2306]: hostname: meimei.greshko.com
Jun 20 06:42:19 meimei.greshko.com libvirtd[2306]: cannot open directory '/aux/linux-releases/bsd': No such file or directory
Jun 20 06:42:19 meimei.greshko.com libvirtd[2306]: internal error: Failed to autostart storage pool 'bsd': cannot open directory '/aux/linux-releases/bsd': No such file or directory
I then departed from my normal routine. When I log in after boot, all of the NFS mounts are active. So, this time I waited for them to auto-unmount. I then started virt-manager. And, what do you know, the NFS mounts all became active again.
So, I've concluded that my NFS mounting issues are all related to running virtualization stuff.
On Sun, 2021-06-20 at 08:50 +0800, Ed Greshko wrote:
So, I've concluded that my NFS mounting issues are all related to running virtualization stuff.
I do run libvirtd (and virt-manager), though most of the time I have no VMs active. I also don't have any NFS mounts (the VM I mostly use is Windows), so the scenario is probably different from yours.
poc
On 20/06/2021 17:32, Patrick O'Callaghan wrote:
On Sun, 2021-06-20 at 08:50 +0800, Ed Greshko wrote:
So, I've concluded that my NFS mounting issues are all related to running virtualization stuff.
I do run libvirtd (and virt-manager), though most of the time I have no VMs active. I also don't have any NFS mounts (the VM I mostly use is
Couple of things. The mounts would happen even if no VM's are active. I don't think it would matter the type of mount. My issue occurred based on the paths defined in the storage pools. A quick way to eliminate that as an issue would be to mask the libvirtd service. Disabling isn't enough. I did that to verify my conclusion. FWIW, I only just did that as I had to wait for a long job in a VM to complete.
So, at least in my case, it is good to know the cause even though I won't make the effort to adjust anything.
On Sun, 2021-06-20 at 17:53 +0800, Ed Greshko wrote:
On 20/06/2021 17:32, Patrick O'Callaghan wrote:
On Sun, 2021-06-20 at 08:50 +0800, Ed Greshko wrote:
So, I've concluded that my NFS mounting issues are all related to running virtualization stuff.
I do run libvirtd (and virt-manager), though most of the time I have no VMs active. I also don't have any NFS mounts (the VM I mostly use is
Couple of things. The mounts would happen even if no VM's are active. I don't think it would matter the type of mount. My issue occurred based on the paths defined in the storage pools. A quick way to eliminate that as an issue would be to mask the libvirtd service. Disabling doesn't "enough". I did that to verify my conclusion. FWIW, I only just did that as I had to wait for a long job in a VM to complete.
I tried masking libvirtd and rebooted. The first time, the system came up quickly with no delay. When I rebooted again, the delay was back, though I had made no change in the meantime.
So I don't think libvirtd is the culprit in my case.
poc
On 20/06/2021 21:51, Patrick O'Callaghan wrote:
So I don't think libvirtd is the culprit in my case.
As long as your issue allowed me to determine the cause of the issue I was having, that's all that matters. :-) :-)
On 19/06/2021 20:09, Patrick O'Callaghan wrote:
There are three files:
live-blame (output of systemd-analyze blame) live-boot.svg (output of systemd-analyze plot) live-journal (output of journal after logging into KDE)
Could you also upload the complete journal from start of boot until the login screen is shown for a boot with the delay, for comparison.
On Sun, 2021-06-20 at 23:08 +0800, Ed Greshko wrote:
On 19/06/2021 20:09, Patrick O'Callaghan wrote:
There are three files:
live-blame (output of systemd-analyze blame) live-boot.svg (output of systemd-analyze plot) live-journal (output of journal after logging into KDE)
Could you also upload the complete journal from start of boot until the login screen is shown for a boot with the delay, for comparison.
https://drive.google.com/drive/folders/1H4XNTtZRaEgPUlGif2w-CR4Fof5OIqE0?usp...
poc
On 21/06/2021 00:16, Patrick O'Callaghan wrote:
On Sun, 2021-06-20 at 23:08 +0800, Ed Greshko wrote:
On 19/06/2021 20:09, Patrick O'Callaghan wrote:
There are three files:
live-blame (output of systemd-analyze blame) live-boot.svg (output of systemd-analyze plot) live-journal (output of journal after logging into KDE)
Could you also upload the complete journal from start of boot until the login screen is shown for a boot with the delay, for comparison.
https://drive.google.com/drive/folders/1H4XNTtZRaEgPUlGif2w-CR4Fof5OIqE0?usp...
Access denied.
On Mon, 2021-06-21 at 02:41 +0800, Ed Greshko wrote:
On 21/06/2021 00:16, Patrick O'Callaghan wrote:
On Sun, 2021-06-20 at 23:08 +0800, Ed Greshko wrote:
On 19/06/2021 20:09, Patrick O'Callaghan wrote:
There are three files:
live-blame (output of systemd-analyze blame) live-boot.svg (output of systemd-analyze plot) live-journal (output of journal after logging into KDE)
Could you also upload the complete journal from start of boot until the login screen is shown for a boot with the delay, for comparison.
https://drive.google.com/drive/folders/1H4XNTtZRaEgPUlGif2w-CR4Fof5OIqE0?usp...
Access denied.
Try it now.
poc
On 21/06/2021 02:50, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 02:41 +0800, Ed Greshko wrote:
On 21/06/2021 00:16, Patrick O'Callaghan wrote:
On Sun, 2021-06-20 at 23:08 +0800, Ed Greshko wrote:
On 19/06/2021 20:09, Patrick O'Callaghan wrote:
There are three files:
live-blame (output of systemd-analyze blame) live-boot.svg (output of systemd-analyze plot) live-journal (output of journal after logging into KDE)
Could you also upload the complete journal from start of boot until the login screen is shown for a boot with the delay, for comparison.
https://drive.google.com/drive/folders/1H4XNTtZRaEgPUlGif2w-CR4Fof5OIqE0?usp...
Access denied.
Try it now.
OK, it works.
How about "systemd-analyse time" for installed?
On 21/06/2021 02:50, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 02:41 +0800, Ed Greshko wrote:
On 21/06/2021 00:16, Patrick O'Callaghan wrote:
On Sun, 2021-06-20 at 23:08 +0800, Ed Greshko wrote:
On 19/06/2021 20:09, Patrick O'Callaghan wrote:
There are three files:
live-blame (output of systemd-analyze blame) live-boot.svg (output of systemd-analyze plot) live-journal (output of journal after logging into KDE)
Could you also upload the complete journal from start of boot until the login screen is shown for a boot with the delay, for comparison.
https://drive.google.com/drive/folders/1H4XNTtZRaEgPUlGif2w-CR4Fof5OIqE0?usp...
Access denied.
Try it now.
Before doing more "pillow" research, the first thing I wonder about is....
30.578s dracut-initqueue.service
On my system it is...
21ms dracut-initqueue.service
If you were to boot your installed system with the dock disconnected what would you see?
On Mon, 2021-06-21 at 03:48 +0800, Ed Greshko wrote:
On 21/06/2021 02:50, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 02:41 +0800, Ed Greshko wrote:
On 21/06/2021 00:16, Patrick O'Callaghan wrote:
On Sun, 2021-06-20 at 23:08 +0800, Ed Greshko wrote:
On 19/06/2021 20:09, Patrick O'Callaghan wrote:
There are three files:
live-blame (output of systemd-analyze blame) live-boot.svg (output of systemd-analyze plot) live-journal (output of journal after logging into KDE)
Could you also upload the complete journal from start of boot until the login screen is shown for a boot with the delay, for comparison.
https://drive.google.com/drive/folders/1H4XNTtZRaEgPUlGif2w-CR4Fof5OIqE0?usp...
Access denied.
Try it now.
OK, it works.
How about "systemd-analyse time" for installed?
$ systemd-analyze time
Startup finished in 9.256s (firmware) + 3.026s (loader) + 1.186s (kernel) + 31.678s (initrd) + 3min 46.464s (userspace) = 4min 31.611s
graphical.target reached after 20.069s in userspace
poc
On Mon, 2021-06-21 at 03:55 +0800, Ed Greshko wrote:
On 21/06/2021 02:50, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 02:41 +0800, Ed Greshko wrote:
On 21/06/2021 00:16, Patrick O'Callaghan wrote:
On Sun, 2021-06-20 at 23:08 +0800, Ed Greshko wrote:
On 19/06/2021 20:09, Patrick O'Callaghan wrote:
There are three files:
live-blame (output of systemd-analyze blame) live-boot.svg (output of systemd-analyze plot) live-journal (output of journal after logging into KDE)
Could you also upload the complete journal from start of boot until the login screen is shown for a boot with the delay, for comparison.
https://drive.google.com/drive/folders/1H4XNTtZRaEgPUlGif2w-CR4Fof5OIqE0?usp...
Access denied.
Try it now.
Before doing more "pillow" research, the first thing I wonder about is....
30.578s dracut-initqueue.service
On my system it is...
21ms dracut-initqueue.service
If you were to boot your installed system with the dock disconnected what would you see?
$ systemd-analyze blame | grep dracut-initqueue.service
486ms dracut-initqueue.service
If I power the dock on after startup is complete, one drive appears immediately and the other takes 30 seconds or so, so the delay is not being caused by the boot process itself. It must be the hardware (the drive or the dock) taking that long for whatever reason, possibly power management as George suggested. As I've said, my goal is to convince the kernel that it doesn't need to wait for this so as to continue with the startup.
poc
On 21/06/2021 05:48, Patrick O'Callaghan wrote:
$ systemd-analyze blame |grep dracut-initqueue.service 486ms dracut-initqueue.service
If I power on the dock on after startup is complete, one drive appears immediately and the other takes 30 seconds or so, so the delay is not being caused by the boot process itself. It must be the hardware (the drive or the dock) taking that long for whatever reason, possibly power management as George suggested. As I've said, my goal is to convince the kernel that it doesn't need to wait for this so as to continue with the startup.
Right. I just wanted to confirm that, it would appear, the 30sec is in that service/area.
So far I've not found a configuration file/setting which would tell it "don't look here...".
On 21/06/2021 06:02, Ed Greshko wrote:
On 21/06/2021 05:48, Patrick O'Callaghan wrote:
$ systemd-analyze blame |grep dracut-initqueue.service 486ms dracut-initqueue.service
If I power on the dock on after startup is complete, one drive appears immediately and the other takes 30 seconds or so, so the delay is not being caused by the boot process itself. It must be the hardware (the drive or the dock) taking that long for whatever reason, possibly power management as George suggested. As I've said, my goal is to convince the kernel that it doesn't need to wait for this so as to continue with the startup.
Right. I just wanted to confirm that, it would appear, the 30sec is in that service/area.
So far I've not found a configuration file/setting which would tell it "don't look here...".
This may be a bit of work, but *may* help define/narrow the issue.
https://fedoraproject.org/wiki/How_to_debug_Dracut_problems
On 21/06/2021 06:02, Ed Greshko wrote:
On 21/06/2021 05:48, Patrick O'Callaghan wrote:
$ systemd-analyze blame |grep dracut-initqueue.service 486ms dracut-initqueue.service
If I power on the dock on after startup is complete, one drive appears immediately and the other takes 30 seconds or so, so the delay is not being caused by the boot process itself. It must be the hardware (the drive or the dock) taking that long for whatever reason, possibly power management as George suggested. As I've said, my goal is to convince the kernel that it doesn't need to wait for this so as to continue with the startup.
Right. I just wanted to confirm that, it would appear, the 30sec is in that service/area.
So far I've not found a configuration file/setting which would tell it "don't look here...".
You're running software raid on the dock drives, right? Is that the only place you're using raid?
Does your system have an /etc/mdadm.conf?
I wonder if adding this
mdadmconf="{yes|no}"
    Include local /etc/mdadm.conf (default=yes)
and regenerating the initramfs would help.
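Concretely, that would be something like the following (the drop-in file name is arbitrary):

# /etc/dracut.conf.d/no-mdadmconf.conf
mdadmconf="no"

# then regenerate the initramfs for the running kernel:
sudo dracut -f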
On 6/20/21 3:48 PM, Patrick O'Callaghan wrote:
If I power on the dock on after startup is complete, one drive appears immediately and the other takes 30 seconds or so, so the delay is not being caused by the boot process itself. It must be the hardware (the drive or the dock) taking that long for whatever reason, possibly power management as George suggested.
Is it always the same drive taking so long? If so, that's the culprit.
On Sun, Jun 20, 2021 at 3:48 PM Patrick O'Callaghan pocallaghan@gmail.com wrote:
If I power on the dock on after startup is complete, one drive appears immediately and the other takes 30 seconds or so, so the delay is not being caused by the boot process itself. It must be the hardware (the drive or the dock) taking that long for whatever reason, possibly power management as George suggested. As I've said, my goal is to convince the kernel that it doesn't need to wait for this so as to continue with the startup.
dmesg will show this whole sequence: dock appearing, bus appearing, drive on bus appearing, partition map appearing. I couldn't open the previous journal log provided, it wasn't publicly visible or I'd have taken a gander.
On Mon, 2021-06-21 at 06:59 +0800, Ed Greshko wrote:
On 21/06/2021 06:02, Ed Greshko wrote:
On 21/06/2021 05:48, Patrick O'Callaghan wrote:
$ systemd-analyze blame |grep dracut-initqueue.service 486ms dracut-initqueue.service
If I power on the dock on after startup is complete, one drive appears immediately and the other takes 30 seconds or so, so the delay is not being caused by the boot process itself. It must be the hardware (the drive or the dock) taking that long for whatever reason, possibly power management as George suggested. As I've said, my goal is to convince the kernel that it doesn't need to wait for this so as to continue with the startup.
Right. I just wanted to confirm that, it would appear, the 30sec is in that service/area.
So far I've not found a configuration file/setting which would tell it "don't look here...".
You're running software raid on the dock drives, right? Is that the only place you're using raid?
I'm only using BTRFS raid.
Does you system have a /etc/mdadm.conf?
No. I did have one under F33, but since doing a clean install of F34 that's no longer the case.
poc
On Sun, 2021-06-20 at 17:40 -0600, Joe Zeff wrote:
On 6/20/21 3:48 PM, Patrick O'Callaghan wrote:
If I power on the dock on after startup is complete, one drive appears immediately and the other takes 30 seconds or so, so the delay is not being caused by the boot process itself. It must be the hardware (the drive or the dock) taking that long for whatever reason, possibly power management as George suggested.
Is it always the same drive taking so long? If so, that's the culprit.
Yes, that's been established (strictly, I haven't bothered swapping the drives in the dock to see what happens, but it's immaterial). Again, the issue is not that one drive takes time to spin up, but that the system start-up waits for it when it doesn't need to.
poc
On Sun, 2021-06-20 at 18:00 -0600, Chris Murphy wrote:
On Sun, Jun 20, 2021 at 3:48 PM Patrick O'Callaghan pocallaghan@gmail.com wrote:
If I power on the dock on after startup is complete, one drive appears immediately and the other takes 30 seconds or so, so the delay is not being caused by the boot process itself. It must be the hardware (the drive or the dock) taking that long for whatever reason, possibly power management as George suggested. As I've said, my goal is to convince the kernel that it doesn't need to wait for this so as to continue with the startup.
dmesg will show this whole sequence: dock appearing, bus appearing, drive on bus appearing, partition map appearing. I couldn't open the previous journal log provided, it wasn't publicly visible or I'd have taken a gander.
The logs are now publicly visible at: https://drive.google.com/drive/folders/1nGwVkeTJh5hz4dYRBD7Ikpx6q4MaPki9?usp...
I've included my home-grown 'dock' scripts for completeness, plus logs of a Fedora Live boot (with no delay) and my current installed system (with the delay), both with unchanged hardware.
poc
On Mon, 2021-06-21 at 06:19 +0800, Ed Greshko wrote:
On 21/06/2021 06:02, Ed Greshko wrote:
On 21/06/2021 05:48, Patrick O'Callaghan wrote:
$ systemd-analyze blame |grep dracut-initqueue.service 486ms dracut-initqueue.service
If I power on the dock on after startup is complete, one drive appears immediately and the other takes 30 seconds or so, so the delay is not being caused by the boot process itself. It must be the hardware (the drive or the dock) taking that long for whatever reason, possibly power management as George suggested. As I've said, my goal is to convince the kernel that it doesn't need to wait for this so as to continue with the startup.
Right. I just wanted to confirm that, it would appear, the 30sec is in that service/area.
So far I've not found a configuration file/setting which would tell it "don't look here...".
This may be a bit of work, but *may* help define/narrow the issue.
Hmm. I see there's a way to breakpoint the boot process e.g. at the pre-udev stage, but I'm not sure what I can achieve by doing that.
poc
On 21/06/2021 17:20, Patrick O'Callaghan wrote:
The logs are now publicly visible at: https://drive.google.com/drive/folders/1nGwVkeTJh5hz4dYRBD7Ikpx6q4MaPki9?usp...
I've included my home-grown 'dock' scripts for completeness, plus logs of a Fedora Live boot (with no delay) and my current installed system (with the delay), both with unchanged hardware.
How about supplying the dmesg output?
Wondering if the BTRFS raid is detected at that early stage.
On Mon, 2021-06-21 at 17:43 +0800, Ed Greshko wrote:
On 21/06/2021 17:20, Patrick O'Callaghan wrote:
The logs are now publicly visible at: https://drive.google.com/drive/folders/1nGwVkeTJh5hz4dYRBD7Ikpx6q4MaPki9?usp...
I've included my home-grown 'dock' scripts for completeness, plus logs of a Fedora Live boot (with no delay) and my current installed system (with the delay), both with unchanged hardware.
How about supplying the dmesg output?
Wondering if the BTRFS raid is detected at that early stage.
Uploaded to the installed-boot folder.
I thought the contents of dmesg were already in the journal, but I may have misunderstood their relationship.
poc
On 21/06/2021 18:17, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 17:43 +0800, Ed Greshko wrote:
On 21/06/2021 17:20, Patrick O'Callaghan wrote:
The logs are now publicly visible at: https://drive.google.com/drive/folders/1nGwVkeTJh5hz4dYRBD7Ikpx6q4MaPki9?usp...
I've included my home-grown 'dock' scripts for completeness, plus logs of a Fedora Live boot (with no delay) and my current installed system (with the delay), both with unchanged hardware.
How about supplying the dmesg output?
Wondering if the BTRFS raid is detected at that early stage.
Uploaded to the installed-boot folder.
I thought the contents of dmesg were already in the journal, but I may have misunderstood their relationship.
Oh, I forgot to add, could you use "dmesg -T"? :-)
On 21/06/2021 18:27, Ed Greshko wrote:
On 21/06/2021 18:17, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 17:43 +0800, Ed Greshko wrote:
On 21/06/2021 17:20, Patrick O'Callaghan wrote:
The logs are now publicly visible at: https://drive.google.com/drive/folders/1nGwVkeTJh5hz4dYRBD7Ikpx6q4MaPki9?usp...
I've included my home-grown 'dock' scripts for completeness, plus logs of a Fedora Live boot (with no delay) and my current installed system (with the delay), both with unchanged hardware.
How about supplying the dmesg output?
Wondering if the BTRFS raid is detected at that early stage.
Uploaded to the installed-boot folder.
I thought the contents of dmesg were already in the journal, but I may have misunderstood their relationship.
Oh, I forgot to add, could you use "dmesg -T"? :-)
I just realized I may have been a bit ambiguous.
I would like the journal and dmesg -T output at the same time so date/time are the same.
On Mon, 2021-06-21 at 18:57 +0800, Ed Greshko wrote:
On 21/06/2021 18:27, Ed Greshko wrote:
On 21/06/2021 18:17, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 17:43 +0800, Ed Greshko wrote:
On 21/06/2021 17:20, Patrick O'Callaghan wrote:
The logs are now publicly visible at: https://drive.google.com/drive/folders/1nGwVkeTJh5hz4dYRBD7Ikpx6q4MaPki9?usp...
I've included my home-grown 'dock' scripts for completeness, plus logs of a Fedora Live boot (with no delay) and my current installed system (with the delay), both with unchanged hardware.
How about supplying the dmesg output?
Wondering if the BTRFS raid is detected at that early stage.
Uploaded to the installed-boot folder.
I thought the contents of dmesg were already in the journal, but I may have misunderstood their relationship.
Oh, I forgot to add, could you use "dmesg -T"? :-)
I just realized I may have been a bit ambiguous.
I would like the journal and dmesg -T output at the same time so date/time are the same.
To avoid ambiguity, maybe you should tell me the journal options you'd like (the timestamps in the uploaded logs don't show wallclock time).
poc
On 21/06/2021 19:48, Patrick O'Callaghan wrote:
To avoid ambiguity, maybe you should tell me the journal options you'd like (the timestamps in the uploaded logs don't show wallclock time).
The journal times of what you've supplied seem fine. They show
Jun 20 15:44:50 for example
and dmesg -T would show
[Sun Jun 20 18:35:35 2021]
So, they are easy to match up.
On 21/06/2021 18:17, Patrick O'Callaghan wrote:
I thought the contents of dmesg were already in the journal, but I may have misunderstood their relationship.
I asked for dmesg since there is a gap in dmesg but not a corresponding gap in the journal.
[ 30.093669] logitech-hidpp-device 0003:046D:400A.0005: HID++ 2.0 device connected.
[ 30.389335] rfkill: input handler enabled
[ 299.942841] usb 4-3: new SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[ 299.955717] usb 4-3: New USB device found, idVendor=174c, idProduct=55aa, bcdDevice= 1.00
But I don't know what real time that happened, thus the -T.
On Mon, 2021-06-21 at 20:02 +0800, Ed Greshko wrote:
On 21/06/2021 19:48, Patrick O'Callaghan wrote:
To avoid ambiguity, maybe you should tell me the journal options you'd like (the timestamps in the uploaded logs don't show wallclock time).
The journal times of what you've supplied seem fine. They show
Jun 20 15:44:50 for example
and dmesg -T would show
[Sun Jun 20 18:35:35 2021]
So, they are easy to match up.
I've replaced the dmesg file with the output of 'dmesg -T'.
poc
On 21/06/2021 20:31, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 20:02 +0800, Ed Greshko wrote:
On 21/06/2021 19:48, Patrick O'Callaghan wrote:
To avoid ambiguity, maybe you should tell me the journal options you'd like (the timestamps in the uploaded logs don't show wallclock time).
The journal times of what you've supplied seem fine. They show
Jun 20 15:44:50 for example
and dmesg -T would show
[Sun Jun 20 18:35:35 2021]
So, they are easy to match up.
I've replaced the dmesg file with the output of 'dmesg -T'.
I thought I mentioned they should have been taken at the same time.
journal starts at
Jun 20 15:44:50
dmesg
Sun Jun 20 22:38:39 2021
Kinda hard to match things that way. :-)
On Mon, Jun 21, 2021 at 3:20 AM Patrick O'Callaghan pocallaghan@gmail.com wrote:
The logs are now publicly visible at: https://drive.google.com/drive/folders/1nGwVkeTJh5hz4dYRBD7Ikpx6q4MaPki9?usp...
From the live boot (the least complicated one to look at for starters), there is an anomaly:
Jun 19 08:46:41 fedora kernel: BTRFS: device label fedora_localhost-live devid 1 transid 1768306 /dev/sda3 scanned by systemd-udevd (602)
Jun 19 08:46:41 fedora kernel: BTRFS: device label storage devid 1 transid 3481192 /dev/sdc1 scanned by systemd-udevd (595)
Jun 19 08:46:44 fedora kernel: BTRFS: device label RAID devid 1 transid 1973 /dev/sdd scanned by systemd-udevd (595)
Jun 19 08:46:56 fedora systemd[1]: Mounted /sysroot.
Jun 19 12:47:14 fedora kernel: BTRFS: device label RAID devid 2 transid 1973 /dev/sde scanned by systemd-udevd (1000)
Rewinding to see why this device shows up so much later (ignore the 8 vs 12 hour jump, which is a clock artifact and not real), even though it's not holding up the live boot:
Jun 19 08:46:41 fedora kernel: scsi host6: uas
Jun 19 08:46:41 fedora kernel: usbcore: registered new interface driver uas
Jun 19 08:46:41 fedora kernel: scsi 6:0:0:0: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
Jun 19 08:46:41 fedora kernel: scsi 6:0:0:1: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
Jun 19 08:46:41 fedora kernel: sd 6:0:0:0: Attached scsi generic sg4 type 0
Jun 19 08:46:41 fedora kernel: sd 6:0:0:1: Attached scsi generic sg5 type 0
Jun 19 08:46:41 fedora kernel: sd 6:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jun 19 08:46:41 fedora kernel: sd 6:0:0:0: [sdd] 4096-byte physical blocks
Jun 19 08:46:41 fedora kernel: sd 6:0:0:0: [sdd] Write Protect is off
Jun 19 08:46:41 fedora kernel: sd 6:0:0:0: [sdd] Mode Sense: 43 00 00 00
Jun 19 08:46:41 fedora kernel: sd 6:0:0:1: [sde] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jun 19 08:46:41 fedora kernel: sd 6:0:0:1: [sde] 4096-byte physical blocks
Jun 19 08:46:41 fedora kernel: sd 6:0:0:1: [sde] Write Protect is off
Jun 19 08:46:41 fedora kernel: sd 6:0:0:1: [sde] Mode Sense: 43 00 00 00
Jun 19 08:46:41 fedora kernel: sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 19 08:46:41 fedora kernel: sd 6:0:0:0: [sdd] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
Jun 19 08:46:44 fedora kernel: sd 6:0:0:0: [sdd] Attached SCSI disk
Jun 19 12:47:12 fedora kernel: sd 6:0:0:1: tag#2 uas_eh_abort_handler 0 uas-tag 2 inflight: IN
Jun 19 12:47:12 fedora kernel: sd 6:0:0:1: tag#2 CDB: Mode Sense(6) 1a 00 08 00 18 00
Jun 19 12:47:12 fedora kernel: scsi host6: uas_eh_device_reset_handler start
Jun 19 12:47:12 fedora kernel: usb 4-3: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
Jun 19 12:47:12 fedora kernel: scsi host6: uas_eh_device_reset_handler success
Jun 19 12:47:12 fedora kernel: sd 6:0:0:1: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 19 12:47:12 fedora kernel: sd 6:0:0:1: [sde] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
Jun 19 12:47:14 fedora kernel: sd 6:0:0:1: [sde] Attached SCSI disk
Jun 19 12:47:14 fedora kernel: BTRFS: device label RAID devid 2 transid 1973 /dev/sde scanned by systemd-udevd (1000)
Both sd 6:0:0:0: (/dev/sdd) and sd 6:0:0:1: (/dev/sde) are found at the same time, but there's a uas-related reset that happens only on /dev/sde. I don't know what this means:
Jun 19 12:47:12 fedora kernel: sd 6:0:0:1: tag#2 uas_eh_abort_handler 0 uas-tag 2 inflight: IN
Jun 19 12:47:12 fedora kernel: sd 6:0:0:1: tag#2 CDB: Mode Sense(6) 1a 00 08 00 18 00
But that's the clue that there's some kind of communication issue between the drive and the kernel. If these are both in SATA USB enclosures, I'd ask on the linux-usb list what these messages mean and why the file system on the drive producing them isn't recognized by the kernel until so much later.
You could switch only the cables around and see if the problem follows the cables; then switch the drives to see if it follows the drives. I would sooner suspect the drive enclosure than the cable, but since I've got no clue what the above two messages mean, it's just an iterative process.
I do recommend including lsusb -v output in the initial email to linux-usb, and maybe comparing the output for the two enclosures to see if there's anything different. The make/model might be identical, but it's possible a partial explanation lies in a chipset difference, or a revision of the chipset. There are a bunch of usb and uas driver quirks that can be applied to work around problems like this. If you report the problem and the enclosures report enough differentiation to key a quirk on, the developers will apply it by default in a future kernel. If not, it becomes something you apply at every boot.
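As a hedged example of gathering that, using the vendor/product ID shown in the dmesg output above (174c:55aa, the ASMedia bridge; the output filename is arbitrary):

lsusb -v -d 174c:55aa > asm1156-descriptors.txt

If there really are two separate enclosures, comparing their outputs should show any chipset or firmware (bcdDevice) difference worth mentioning in the report.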
On 6/21/21 3:14 AM, Patrick O'Callaghan wrote:
Yes, that's been established (strictly, I haven't bothered swapping the drives in the dock to see what happens, but it's immaterial). Again, the issue is not that one drive takes time to spin up, but that the system start-up waits for it when it doesn't need to.
Even so, if it's always the same drive, there might be something wrong with it and if so, simply replacing that drive might be the simplest solution.
On Mon, Jun 21, 2021 at 10:58 AM Chris Murphy lists@colorremedies.com wrote:
Jun 19 12:47:12 fedora kernel: sd 6:0:0:1: tag#2 uas_eh_abort_handler 0 uas-tag 2 inflight: IN
Jun 19 12:47:12 fedora kernel: sd 6:0:0:1: tag#2 CDB: Mode Sense(6) 1a 00 08 00 18 00
Yeah and in the install-boot log it happens again:
Jun 20 15:45:20 Bree kernel: sd 6:0:0:1: tag#16 uas_eh_abort_handler 0 uas-tag 1 inflight: IN
Jun 20 15:45:20 Bree kernel: sd 6:0:0:1: tag#16 CDB: Mode Sense(6) 1a 00 08 00 18 00
Jun 20 15:45:20 Bree kernel: scsi host6: uas_eh_device_reset_handler start
Jun 20 15:45:20 Bree kernel: usb 4-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Jun 20 15:45:20 Bree kernel: scsi host6: uas_eh_device_reset_handler success
Jun 20 15:45:20 Bree kernel: sd 6:0:0:1: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 20 15:45:20 Bree kernel: sd 6:0:0:1: [sde] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
Jun 20 15:45:20 Bree kernel: sd 6:0:0:1: [sde] Attached SCSI disk
What's new here is the explicit USB reset message.
Jun 20 15:44:50 Bree kernel: usb 4-3: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Jun 20 15:44:50 Bree kernel: usb 4-3: New USB device found, idVendor=174c, idProduct=55aa, bcdDevice= 1.00
Jun 20 15:44:50 Bree kernel: usb 4-3: New USB device strings: Mfr=2, Product=3, SerialNumber=1
Jun 20 15:44:50 Bree kernel: usb 4-3: Product: ASM1156-PM
Jun 20 15:44:50 Bree kernel: usb 4-3: Manufacturer: ASMT
Jun 20 15:44:50 Bree kernel: usb 4-3: SerialNumber: 00000000000000000000
Curiously there is no USB entry for a second ASM1156 product, even though there are two:
Jun 20 15:44:50 Bree kernel: scsi 6:0:0:0: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
Jun 20 15:44:50 Bree kernel: scsi 6:0:0:1: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
Jun 20 15:44:50 Bree kernel: sd 6:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jun 20 15:44:50 Bree kernel: sd 6:0:0:1: [sde] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
And they each have their own /dev node. So is one in a USB enclosure and the other isn't? Or maybe they are both just appearing as usb 4-3 even though they get different scsi id's - that's probably it. But then one of them is having some sort of issue, even if it's just confusion that results in the kernel needing to do a reset on it. But *shrug*, this is the joy of USB; it's not necessarily a hardware problem per se.
I've got one SATA USB enclosure that tries to use the uas driver if directly connected to an Intel NUC. And I get no end of grief from it (this is with a pre 5.0 kernel I'm sure, it may very well have since been fixed in the kernel). All kinds of uas-related errors. Plug it into an externally powered USB hub, and it doesn't use the uas driver and I don't have any problems with it, and it reads/writes at approximately the drive's spec'd performance, depending on where on the platter the head is at.

As it turns out, a USB hub is very much like an ethernet hub. It's not just amplifying a signal, it's reading it, parsing it, and rewriting out that entire stream, as well as any other stream from another connected device. They're a PITA but kind of an engineering marvel (from my perspective anyway). So the hub does in effect seem to normalize the command/control/data streams from myriad devices so they have a better chance of playing well together. It's almost like the idea is "we'll use crap chipsets in the devices and hosts themselves, and just let the hubs figure it all out".
And as it turns out, even the well-behaved SATA USB enclosures have transient read and write errors (one a day, then 10 times in a row) if directly connected to that same Intel NUC. With the *externally* powered (not bus-powered) hub, zero problems. For years.
So if both drives are in SATA USB enclosures, and if they're both connected to ports on a dock, you might track down or borrow an externally powered hub and connect both of the drives to that hub, and the hub to the dock. And see if this problem goes away.
On Mon, 2021-06-21 at 21:18 +0800, Ed Greshko wrote:
On 21/06/2021 20:31, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 20:02 +0800, Ed Greshko wrote:
On 21/06/2021 19:48, Patrick O'Callaghan wrote:
To avoid ambiguity, maybe you should tell me the journal options you'd like (the timestamps in the uploaded logs don't show wallclock time).
The journal times of what you've supplied seem fine. They show
Jun 20 15:44:50 for example
and dmesg -T would show
[Sun Jun 20 18:35:35 2021]
So, they are easy to match up.
I've replaced the dmesg file with the output of 'dmesg -T'.
I thought I mentioned they should have been taken at the same time.
journal starts at
Jun 20 15:44:50
dmesg
Sun Jun 20 22:38:39 2021
Kinda hard to match things that way. :-)
OK, see reply to Chris below.
poc
On Mon, 2021-06-21 at 13:13 -0600, Chris Murphy wrote:
On Mon, Jun 21, 2021 at 10:58 AM Chris Murphy lists@colorremedies.com wrote:
Jun 19 12:47:12 fedora kernel: sd 6:0:0:1: tag#2 uas_eh_abort_handler 0 uas-tag 2 inflight: IN
Jun 19 12:47:12 fedora kernel: sd 6:0:0:1: tag#2 CDB: Mode Sense(6) 1a 00 08 00 18 00
Yeah and in the install-boot log it happens again:
Jun 20 15:45:20 Bree kernel: sd 6:0:0:1: tag#16 uas_eh_abort_handler 0 uas-tag 1 inflight: IN
Jun 20 15:45:20 Bree kernel: sd 6:0:0:1: tag#16 CDB: Mode Sense(6) 1a 00 08 00 18 00
Jun 20 15:45:20 Bree kernel: scsi host6: uas_eh_device_reset_handler start
Jun 20 15:45:20 Bree kernel: usb 4-3: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Jun 20 15:45:20 Bree kernel: scsi host6: uas_eh_device_reset_handler success
Jun 20 15:45:20 Bree kernel: sd 6:0:0:1: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 20 15:45:20 Bree kernel: sd 6:0:0:1: [sde] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
Jun 20 15:45:20 Bree kernel: sd 6:0:0:1: [sde] Attached SCSI disk
What's new here is the explicit USB reset message.
I rebooted and took some new logs. However this time, for whatever reason, there was no 30-second delay. The logs are prefixed "nodelay-*" in the same directory.
I then rebooted and this time there was a delay. These logs are labelled "delay-*".
One interesting point is that in both cases I have to wait before getting some of these, e.g.
$ systemd-analyze blame
Bootup is not yet finished (org.freedesktop.systemd1.Manager.FinishTimestampMonotonic=0).
Please try again later.
Hint: Use 'systemctl list-jobs' to see active jobs
[poc@Bree ~]$ systemctl list-jobs
JOB UNIT               TYPE  STATE
294 dock-watch.service start running
1 jobs listed.
IOW the system thinks that boot hasn't finished because the dock-watch unit is still running, even after I've already logged in. Probably not important.
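(That behaviour would be consistent with dock-watch.service being a long-running script declared as Type=oneshot, whose start job stays active until the process exits; a Type=simple service's start job completes as soon as the process is forked, so it doesn't hold up the "bootup finished" state. The real unit's contents aren't shown in this thread, so the following is purely a hypothetical sketch of that variant:

# hypothetical dock-watch.service -- the actual unit and script path are not shown in the thread
[Unit]
Description=Watch the external dock

[Service]
Type=simple
ExecStart=/usr/local/bin/dock-watch

[Install]
WantedBy=multi-user.target
)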
nodelay-journal doesn't have the reset message, so that is clearly a clue to what's going on.
[...]
Curiously there is no USB entry for a second ASM1156 product, even though there are two:
Jun 20 15:44:50 Bree kernel: scsi 6:0:0:0: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
Jun 20 15:44:50 Bree kernel: scsi 6:0:0:1: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
Jun 20 15:44:50 Bree kernel: sd 6:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jun 20 15:44:50 Bree kernel: sd 6:0:0:1: [sde] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
And they each have their own /dev node. So is one in a USB enclosure and the other isn't? Or maybe they are both just appearing as usb 4-3 even though they get different scsi id's - that's probably it. But then one of them is having some sort of issue, even if it's just confusion that results in the kernel needing to do a reset on it. But *shrug*, this is the joy of USB; it's not necessarily a hardware problem per se.
There is a single dock with two slots and no other type of enclosure. The disks are internal SATA units inserted directly into the slots. The dock has a single dedicated USB-3 connection direct to the system motherboard with no intervening hub or splitter. It is independently powered via a wall socket and power block.
poc
On Mon, 2021-06-21 at 11:47 -0600, Joe Zeff wrote:
On 6/21/21 3:14 AM, Patrick O'Callaghan wrote:
Yes, that's been established (strictly, I haven't bothered swapping the drives in the dock to see what happens, but it's immaterial). Again, the issue is not that one drive takes time to spin up, but that the system start-up waits for it when it doesn't need to.
Even so, if it's always the same drive, there might be something wrong with it and if so, simply replacing that drive might be the simplest solution.
Just for kicks, I've uploaded the smartctl readings for both drives to the same cloud folder:
https://drive.google.com/drive/folders/1H4XNTtZRaEgPUlGif2w-CR4Fof5OIqE0?usp...
I don't see anything alarming but I'm no expert.
poc
On 22/06/2021 05:41, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 21:18 +0800, Ed Greshko wrote:
On 21/06/2021 20:31, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 20:02 +0800, Ed Greshko wrote:
I thought I mentioned they should have been taken at the same time.
journal starts at
Jun 20 15:44:50
dmesg
Sun Jun 20 22:38:39 2021
Kinda hard to match things that way. :-)
OK, see reply to Chris below.
Right.
Well, I think it is well understood that the HW in some form is responsible for the 30 sec delay in boot times. I see 3 avenues open:
1. Continue to search for the HW cause and hopefully fix it without incurring cost.
2. Find a way to ignore the HW during boot.
3. Live with the 30 second delay.
I was looking at #2 but couldn't find a way with BTRFS raid. Maybe with software raid and ext4.
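One hedged diagnostic along the lines of #2 in the list above, before changing anything, is to check whether some unit still pulls in the udev settle wait and which unit the remaining wait is actually attached to (both are standard systemd commands):

systemctl list-dependencies --reverse systemd-udev-settle.service
systemd-analyze critical-chain

If nothing still requires systemd-udev-settle.service, the critical chain at least shows where the wait is really spent.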
I do wonder how many brain-seconds have collectively been used in search of a solution. :-) :-)
On Tue, 2021-06-22 at 08:42 +0800, Ed Greshko wrote:
On 22/06/2021 05:41, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 21:18 +0800, Ed Greshko wrote:
On 21/06/2021 20:31, Patrick O'Callaghan wrote:
On Mon, 2021-06-21 at 20:02 +0800, Ed Greshko wrote:
I thought I mentioned they should have been taken at the same time.
journal starts at
Jun 20 15:44:50
dmesg
Sun Jun 20 22:38:39 2021
Kinda hard to match things that way. :-)
OK, see reply to Chris below.
Right.
Well, I think it is well understood that the HW in some form is responsible for the 30 sec delay in boot times. I see 3 avenues open:
1. Continue to search for the HW cause and hopefully fix it without incurring cost.
2. Find a way to ignore the HW during boot.
3. Live with the 30 second delay.
I was looking at #2 but couldn't find a way with BTRFS raid. Maybe with software raid and ext4.
I do wonder how many brain-seconds have collectively been used in search of a solution. :-) :-)
Well, I'm grateful to everyone who's chipped in, especially you and Chris, but don't feel in any way obliged.
One other data point and I'll leave it unless anything else turns up: I switched the two drives in the dock and got this from dmesg:
[Tue Jun 22 10:52:03 2021] usb 4-3: USB disconnect, device number 2
[Tue Jun 22 10:52:03 2021] sd 6:0:0:0: [sdd] Synchronizing SCSI cache
[Tue Jun 22 10:52:03 2021] sd 6:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[Tue Jun 22 10:52:03 2021] sd 6:0:0:1: [sde] Synchronizing SCSI cache
[Tue Jun 22 10:52:03 2021] sd 6:0:0:1: [sde] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[Tue Jun 22 10:52:27 2021] usb 4-3: new SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[Tue Jun 22 10:52:27 2021] usb 4-3: New USB device found, idVendor=174c, idProduct=55aa, bcdDevice= 1.00
[Tue Jun 22 10:52:27 2021] usb 4-3: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[Tue Jun 22 10:52:27 2021] usb 4-3: Product: ASM1156-PM
[Tue Jun 22 10:52:27 2021] usb 4-3: Manufacturer: ASMT
[Tue Jun 22 10:52:27 2021] usb 4-3: SerialNumber: 00000000000000000000
[Tue Jun 22 10:52:27 2021] scsi host6: uas
[Tue Jun 22 10:52:27 2021] scsi 6:0:0:0: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
[Tue Jun 22 10:52:28 2021] scsi 6:0:0:1: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: Attached scsi generic sg4 type 0
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: Attached scsi generic sg5 type 0
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] 4096-byte physical blocks
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Write Protect is off
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Mode Sense: 43 00 00 00
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] 4096-byte physical blocks
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] Write Protect is off
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] Mode Sense: 43 00 00 00
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Attached SCSI disk
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: tag#26 uas_eh_abort_handler 0 uas-tag 2 inflight: IN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<***
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: tag#26 CDB: Mode Sense(6) 1a 00 08 00 04 00
[Tue Jun 22 10:52:58 2021] scsi host6: uas_eh_device_reset_handler start
[Tue Jun 22 10:52:58 2021] usb 4-3: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[Tue Jun 22 10:52:58 2021] scsi host6: uas_eh_device_reset_handler success
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: [sde] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: [sde] Attached SCSI disk
The uas message is again from device 6:0:0:1 as before, even though the disks have been swapped. IOW the issue definitely comes from the dock, not from the physical drives themselves.
Thanks again.
poc
On Mon, 21 Jun 2021 at 21:45, Ed Greshko ed.greshko@greshko.com wrote:
I do wonder how many brain-seconds have collectively been used in search of a solution. :-) :-)
A lot more than goes into the design of cheap docks sold on Amazon, but the search has educational value.
On 22/06/2021 18:09, George N. White III wrote:
On Mon, 21 Jun 2021 at 21:45, Ed Greshko <ed.greshko@greshko.com> wrote:
I do wonder how many brain-seconds have collectively been used in search of a solution. :-) :-)
A lot more than goes into the design of cheap docks sold on Amazon, but the search has educational value.
Indeed it has. And, as I told poc, without his pain I wouldn't have traced down an issue that has bothered me for the longest time.
Does that make me a masochist or sadist? I always mix those up. :-)
On 22/06/2021 18:04, Patrick O'Callaghan wrote:
Well, I'm grateful to everyone who's chipped in, especially you and Chris, but don't feel in any way obliged.
That's good. As well you shouldn't. I only just thought about that in connection with the time I spent trying to get my NFS mounts to happen at boot. I gave up on it until this thread. And, IMO, it is only because of this thread that I finally tracked down the cause. So, I'm grateful for your pain. :-) :-)
One other data point and I'll leave it unless anything else turns up: I switched the two drives in the dock and got this from dmesg:
[Tue Jun 22 10:52:03 2021] usb 4-3: USB disconnect, device number 2
[Tue Jun 22 10:52:03 2021] sd 6:0:0:0: [sdd] Synchronizing SCSI cache
[Tue Jun 22 10:52:03 2021] sd 6:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[Tue Jun 22 10:52:03 2021] sd 6:0:0:1: [sde] Synchronizing SCSI cache
[Tue Jun 22 10:52:03 2021] sd 6:0:0:1: [sde] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[Tue Jun 22 10:52:27 2021] usb 4-3: new SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[Tue Jun 22 10:52:27 2021] usb 4-3: New USB device found, idVendor=174c, idProduct=55aa, bcdDevice= 1.00
[Tue Jun 22 10:52:27 2021] usb 4-3: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[Tue Jun 22 10:52:27 2021] usb 4-3: Product: ASM1156-PM
[Tue Jun 22 10:52:27 2021] usb 4-3: Manufacturer: ASMT
[Tue Jun 22 10:52:27 2021] usb 4-3: SerialNumber: 00000000000000000000
[Tue Jun 22 10:52:27 2021] scsi host6: uas
[Tue Jun 22 10:52:27 2021] scsi 6:0:0:0: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
[Tue Jun 22 10:52:28 2021] scsi 6:0:0:1: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: Attached scsi generic sg4 type 0
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: Attached scsi generic sg5 type 0
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] 4096-byte physical blocks
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Write Protect is off
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Mode Sense: 43 00 00 00
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] 4096-byte physical blocks
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] Write Protect is off
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] Mode Sense: 43 00 00 00
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Attached SCSI disk
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: tag#26 uas_eh_abort_handler 0 uas-tag 2 inflight: IN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<***
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: tag#26 CDB: Mode Sense(6) 1a 00 08 00 04 00
[Tue Jun 22 10:52:58 2021] scsi host6: uas_eh_device_reset_handler start
[Tue Jun 22 10:52:58 2021] usb 4-3: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[Tue Jun 22 10:52:58 2021] scsi host6: uas_eh_device_reset_handler success
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: [sde] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: [sde] Attached SCSI disk
The uas message is again from device 6:0:0:1 as before, even though the disks have been swapped. IOW the issue definitely comes from the dock, not from the physical drives themselves.
That would be my conclusion as well.
On Tue, 2021-06-22 at 18:17 +0800, Ed Greshko wrote:
On 22/06/2021 18:04, Patrick O'Callaghan wrote:
Well, I'm grateful to everyone who's chipped in, especially you and Chris, but don't feel in any way obliged.
That's good. As well you shouldn't. I only just thought about that in connection with the time I spent trying to get my NFS mounts to happen at boot. I gave up on it until this thread. And, IMO, it is only because of this thread that I finally tracked down the cause. So, I'm grateful for your pain. :-) :-)
Think nothing of it :-)
poc
On Mon, Jun 21, 2021 at 4:10 PM Patrick O'Callaghan pocallaghan@gmail.com wrote:
There is a single dock with two slots and no other type of enclosure. The disks are internal SATA units inserted directly into the slots. The dock has a single dedicated USB-3 connection direct to the system motherboard with no intervening hub or splitter. It is independently powered via a wall socket and power block.
Do the error messages I referred to happen when the system is booted with the drives attached separately? Or do they happen only when a particular port on the dock is used?
On Tue, Jun 22, 2021 at 4:05 AM Patrick O'Callaghan pocallaghan@gmail.com wrote:
One other data point and I'll leave it unless anything else turns up: I switched the two drives in the dock and got this from dmesg:
[Tue Jun 22 10:52:03 2021] usb 4-3: USB disconnect, device number 2
[Tue Jun 22 10:52:03 2021] sd 6:0:0:0: [sdd] Synchronizing SCSI cache
[Tue Jun 22 10:52:03 2021] sd 6:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[Tue Jun 22 10:52:03 2021] sd 6:0:0:1: [sde] Synchronizing SCSI cache
[Tue Jun 22 10:52:03 2021] sd 6:0:0:1: [sde] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[Tue Jun 22 10:52:27 2021] usb 4-3: new SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[Tue Jun 22 10:52:27 2021] usb 4-3: New USB device found, idVendor=174c, idProduct=55aa, bcdDevice= 1.00
[Tue Jun 22 10:52:27 2021] usb 4-3: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[Tue Jun 22 10:52:27 2021] usb 4-3: Product: ASM1156-PM
[Tue Jun 22 10:52:27 2021] usb 4-3: Manufacturer: ASMT
[Tue Jun 22 10:52:27 2021] usb 4-3: SerialNumber: 00000000000000000000
[Tue Jun 22 10:52:27 2021] scsi host6: uas
[Tue Jun 22 10:52:27 2021] scsi 6:0:0:0: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
[Tue Jun 22 10:52:28 2021] scsi 6:0:0:1: Direct-Access ASMT ASM1156-PM 0 PQ: 0 ANSI: 6
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: Attached scsi generic sg4 type 0
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: Attached scsi generic sg5 type 0
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] 4096-byte physical blocks
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Write Protect is off
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Mode Sense: 43 00 00 00
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] 4096-byte physical blocks
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] Write Protect is off
[Tue Jun 22 10:52:28 2021] sd 6:0:0:1: [sde] Mode Sense: 43 00 00 00
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[Tue Jun 22 10:52:28 2021] sd 6:0:0:0: [sdd] Attached SCSI disk
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: tag#26 uas_eh_abort_handler 0 uas-tag 2 inflight: IN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<***
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: tag#26 CDB: Mode Sense(6) 1a 00 08 00 04 00
[Tue Jun 22 10:52:58 2021] scsi host6: uas_eh_device_reset_handler start
[Tue Jun 22 10:52:58 2021] usb 4-3: reset SuperSpeed Gen 1 USB device number 4 using xhci_hcd
[Tue Jun 22 10:52:58 2021] scsi host6: uas_eh_device_reset_handler success
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: [sde] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[Tue Jun 22 10:52:58 2021] sd 6:0:0:1: [sde] Attached SCSI disk
The uas message is again from device 6:0:0:1 as before, even though the disks have been swapped. IOW the issue definitely comes from the dock, not from the physical drives themselves.
I don't know if it's coming from the dock's usb chipset or the usb-sata adapter on the drive. That adapter has a chipset on it also. These are exactly the sorts of problems often resolved by putting both drives on a USB hub, and then the hub into the dock.
On Tue, 2021-06-22 at 22:45 -0600, Chris Murphy wrote:
On Mon, Jun 21, 2021 at 4:10 PM Patrick O'Callaghan pocallaghan@gmail.com wrote:
There is a single dock with two slots and no other type of enclosure. The disks are internal SATA units inserted directly into the slots. The dock has a single dedicated USB-3 connection direct to the system motherboard with no intervening hub or splitter. It is independently powered via a wall socket and power block.
Do the error messages I referred to happen when the system is booted with the drives attached separately? Or do they happen only when a particular port on the dock is used?
The only way I can connect them is through the dock. They don't have their own USB connectors.
poc
On Tue, 2021-06-22 at 22:54 -0600, Chris Murphy wrote:
The uas message is again from device 6:0:0:1 as before, even though the disks have been swapped. IOW the issue definitely comes from the dock, not from the physical drives themselves.
I don't know if it's coming from the dock's usb chipset or the usb-sata adapter on the drive. That adapter has a chipset on it also. These are exactly the sorts of problems often resolved by putting both drives on a USB hub, and then the hub into the dock.
I don't think the drives have internal USB-SATA conversion. These are internal SATA drives, so I assumed any USB stuff was being handled by the dock.
In any case, I tried plugging the dock into a USB-3 hub, but it made no difference.
I'll consider taking this to the linux-usb list as you suggested earlier.
poc
On 12/06/2021 20:45, Joe Zeff wrote:
On 6/12/21 11:54 AM, Patrick O'Callaghan wrote:
Not my case. I don't have systemd-udev-settle.service. This is a fresh install of F34.
Ok, you can still use systemd-analyze blame to find out what's causing the delay.
On my F34 boot systemd-udev-settle.service takes only 1.921s but the real biggie is
1min 23.232s plymouth-quit-wait.service
How can I go about diagnosing and fixing that? In 2021 i7 machines should not be taking literally minutes to boot.
Thanks, Chris R.
On 24/06/2021 18:42, Christopher Ross wrote:
On 12/06/2021 20:45, Joe Zeff wrote:
On 6/12/21 11:54 AM, Patrick O'Callaghan wrote:
Not my case. I don't have systemd-udev-settle.service. This is a fresh install of F34.
Ok, you can still use systemd-analyze blame to find out what's causing the delay.
On my F34 boot systemd-udev-settle.service takes only 1.921s but the real biggie is
1min 23.232s plymouth-quit-wait.service
How can I go about diagnosing and fixing that? In 2021 i7 machines should not be taking literally minutes to boot.
Suggest you start a fresh thread. You'd be hijacking this thread which is already a kinda hijack. So, send an email to this list with (suggestion) the Subject of
plymouth-quit-wait taking too long
On Thu, 24 Jun 2021 at 07:42, Christopher Ross fdra6390@tebibyte.org wrote:
On 12/06/2021 20:45, Joe Zeff wrote:
On 6/12/21 11:54 AM, Patrick O'Callaghan wrote:
Not my case. I don't have systemd-udev-settle.service. This is a fresh install of F34.
Ok, you can still use systemd-analyze blame to find out what's causing the delay.
On my F34 boot systemd-udev-settle.service takes only 1.921s but the real biggie is
1min 23.232s plymouth-quit-wait.service
How can I go about diagnosing and fixing that? In 2021 i7 machines should not be taking literally minutes to boot.
Plymouth is not responsible for the delay -- it provides the splash screen during boot. This was discussed at: < https://askubuntu.com/questions/1119167/slow-boot-issue-due-to-plymouth-quit...
Unlike many Ubuntu discussions, this one is excellent. A good original post with lots of detail, and a response showing how to analyze what plymouth is doing.
Please start a new thread and provide more detail. Are there long pauses before the splash screen appears or between ending the splash screen and the login screen?
On Thu, 2021-06-24 at 11:42 +0100, Christopher Ross wrote:
On 12/06/2021 20:45, Joe Zeff wrote:
On 6/12/21 11:54 AM, Patrick O'Callaghan wrote:
Not my case. I don't have systemd-udev-settle.service. This is a fresh install of F34.
Ok, you can still use systemd-analyze blame to find out what's causing the delay.
On my F34 boot systemd-udev-settle.service takes only 1.921s but the real biggie is
1min 23.232s plymouth-quit-wait.service
How can I go about diagnosing and fixing that? In 2021 i7 machines should not be taking literally minutes to boot.
On my system (also an i7) that takes only 5s. Presumably something is holding it up. Try running 'systemd-analyze plot > trace.svg' followed by 'eog trace.svg' and look at the resulting chart.
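Another quick check worth trying (standard systemd-analyze usage; whether it pinpoints the culprit depends on what's actually blocking):

systemd-analyze critical-chain plymouth-quit-wait.service

As noted above, plymouth-quit-wait.service mostly just waits for the graphical session to take over, so the unit it's chained behind is usually the more interesting one.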
poc
On 24/06/2021 12:20, Patrick O'Callaghan wrote:
On Thu, 2021-06-24 at 11:42 +0100, Christopher Ross wrote:
How can I go about diagnosing and fixing that? In 2021 i7 machines should not be taking literally minutes to boot.
On my system (also an i7) that takes only 5s. Presumably something is holding it up. Try running 'systemd-analyze plot > trace.svg' followed by 'eog trace.svg' and look at the resulting chart.
I have started a new thread, per Ed and George's suggestions.
Regards, Chris R.
On Tue, 2021-06-22 at 22:54 -0600, Chris Murphy wrote:
The uas message is again from device 6:0:0:1 as before, even though the disks have been swapped. IOW the issue definitely comes from the dock, not from the physical drives themselves.
I don't know if it's coming from the dock's usb chipset or the usb-sata adapter on the drive. That adapter has a chipset on it also. These are exactly the sorts of problems often resolved by putting both drives on a USB hub, and then the hub into the dock.
Just to close this off, I did some Googling and came up with this:
https://zerosandones.co.uk/2021/06/usb-drive-issue-uas_eh_abort_handler-0-ua...
I blacklisted the dock in /etc/modprobe.d/blacklist_uas.conf, rebooted, and the delay problem disappeared.
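For anyone hitting the same thing, a sketch of what that file typically contains, assuming the usb-storage quirks approach described in the linked article (the 174c:55aa ID is the ASMedia bridge from the dmesg output earlier in the thread):

# /etc/modprobe.d/blacklist_uas.conf
# 'u' tells usb-storage to ignore the uas driver for this vendor:product ID
# and fall back to plain usb-storage for the bridge in the dock.
options usb-storage quirks=174c:55aa:u

Depending on when the dock is probed, the initramfs may also need rebuilding (dracut -f) for the option to take effect during early boot.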
poc