I have an external USB-3 2-disk docking station, and a script which can power up and down the drives as needed.
I have a systemd automount unit that correctly powers up the dock when accessed, then mounts the drives (thanks Ed).
After an idle time, automount unmounts the drives. A script detects when this happens and powers them down ... *at which point they immediately power up again, and remain up until I intervene manually, even though they are unmounted*.
This never happens if I run the script directly from the command line (i.e. the drives power down and stay down).
Clearly the docking unit isn't just doing this flakily on its own. Something is making it happen, and I've no idea how to discover what it is except that it seems to be correlated with systemd in some way.
All of the above is 100% reproducible.
I'm open to suggestions if anyone has any ideas.
poc
On 27/03/2021 07:20, Patrick O'Callaghan wrote:
This never happens if I run the script directly from the command line (i.e. the drives power down and stay down).
Clearly the docking unit isn't just doing this flakily on its own. Something is making it happen, and I've no idea how to discover what it is except that it seems to be correlated with systemd in some way.
Just "thinking out loud" here.
When you run the script from the command line it is after you've determined the unmount has happened. How much time do you think elapsed between the actual unmount and running of the script? Do you have a delay between the time inotifywait detects unmount and powering down the drives? If so, does it help if you extend the delay?
I have never used it for this, but wireshark does have the ability to monitor USB. Don't know if that will reveal anything or be of any help.
On Fri, Mar 26, 2021 at 5:21 PM Patrick O'Callaghan pocallaghan@gmail.com wrote:
I have an external USB-3 2-disk docking station, and a script which can power up and down the drives as needed.
I have a systemd automount unit that correctly powers up the dock when accessed, then mounts the drives (thanks Ed).
After an idle time, automount unmounts the drives. A script detects when this happens and powers them down ... *at which point they immediately power up again, and remain up until I intervene manually, even though they are unmounted*.
This never happens if I run the script directly from the command line (i.e. the drives power down and stay down).
Clearly the docking unit isn't just doing this flakily on its own. Something is making it happen, and I've no idea how to discover what it is except that it seems to be correlated with systemd in some way.
All of the above is 100% reproducible.
I'm open to suggestions if anyone has any ideas.
smartd?
I use this line in smartd.conf to keep it from waking up the drive all the time.
/dev/disk/by-id/wwn-0x5000c500a93cae8a -l selftest -s L/../15/./22 -n standby,250
Since it's unmounted, fatrace won't work, but blktrace will..
blktrace -d /dev/sdb -o - | blkparse -i -
It will generate a lot of lines but it'll also report the process that's sending commands to the drive.
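Since the raw trace is long, one way to make it readable is to tally requests per process. The following is only a sketch: the function name tally_by_process is made up here, and it assumes blkparse's default line layout (device, CPU, sequence, timestamp, PID, action, RWBS, sector + size, then the process name in brackets). Requests that blkparse prints in a different layout are simply not counted.

```sh
# Tally block-layer requests per process from blkparse text on stdin.
# Assumed line layout (blkparse default):
#   dev cpu seq time pid action rwbs sector + size [process]
# Only Q (queued) and D (issued) lines are counted. On C (completed) lines
# the trailing bracket holds an error code, not a process name.
tally_by_process() {
    awk '($6 == "Q" || $6 == "D") && $NF ~ /^\[.*\]$/ {
             name = substr($NF, 2, length($NF) - 2)   # strip the brackets
             count[name]++
         }
         END { for (n in count) printf "%d %s\n", count[n], n }' |
        sort -k1,1nr -k2   # busiest process first, then by name
}
```

Run as root with a bounded trace time so the pipeline ends by itself, for example: blktrace -d /dev/sdb -w 30 -o - | blkparse -i - | tally_by_process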
On Sat, 2021-03-27 at 09:50 +0800, Ed Greshko wrote:
Just "thinking out loud" here.
When you run the script from the command line it is after you've determined the unmount has happened. How much time do you think elapsed between the actual unmount and running of the script? Do you have a delay between the time inotifywait detects unmount and powering down the drives? If so, does it help if you extend the delay?
The power-down script loops as long as it detects that the drives are spun up (using hdparm -C), since they take a few seconds to actually spin down. I can physically see them being spun down, then a couple of seconds later they spin up again, which can only be because something is accessing them. If I then invoke the same power-down script manually, they power down and stay down.
I've now modified the script to run an outer loop three times over the power-down-and-check sequence, which so far seems to do the trick, but it's a real kludge.
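For readers who want to try the same approach, the power-down-and-check sequence with an outer retry loop might look roughly like the sketch below. This is not the actual script from this thread: the function names are invented, and it assumes the drives are put into standby with hdparm -y, whereas the thread only confirms that hdparm -C is used for the check.

```sh
# Succeeds if `hdparm -C` reports the drive in standby.
is_standby() {
    hdparm -C "$1" 2>/dev/null | grep -q 'standby'
}

# spin_down_and_verify DEVICE [ATTEMPTS] [SETTLE_SECONDS] [CHECKS]
# Requests standby, then polls CHECKS times, SETTLE_SECONDS apart.
# If the drive is found spun up again, the whole sequence is repeated,
# up to ATTEMPTS times. Returns 0 once the drive stays down, 1 if it
# never does, 2 if the standby request itself fails.
spin_down_and_verify() {
    local dev="$1" attempts="${2:-3}" settle="${3:-5}" checks="${4:-3}"
    local attempt=1 check stayed_down
    while [ "$attempt" -le "$attempts" ]; do
        hdparm -y "$dev" >/dev/null || return 2   # assumption: -y is the spin-down command
        stayed_down=1
        check=1
        while [ "$check" -le "$checks" ]; do
            sleep "$settle"
            if ! is_standby "$dev"; then
                stayed_down=0                     # something woke the drive
                break
            fi
            check=$((check + 1))
        done
        if [ "$stayed_down" -eq 1 ]; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, spin_down_and_verify /dev/sdb 3 5 3 tries up to three times and checks three times at 5-second intervals after each attempt. Sleeping before the first check also gives the drive time to finish spinning down before anything queries it.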
I have never used it for this, but wireshark does have the ability to monitor USB. Don't know if that will reveal anything or be of any help.
Only as a last resort. I don't fancy having to learn USB command sequences.
poc
On Fri, 2021-03-26 at 21:46 -0600, Chris Murphy wrote:
smartd?
I use this line in smartd.conf to keep it from waking up the drive all the time.
/dev/disk/by-id/wwn-0x5000c500a93cae8a -l selftest -s L/../15/./22 -n standby,250
Since it's unmounted, fatrace won't work, but blktrace will..
blktrace -d /dev/sdb -o - | blkparse -i -
It will generate a lot of lines but it'll also report the process that's sending commands to the drive.
OK, that looks like it's worth investigating.
poc
Wireshark won't help, as it won't tell you which process is doing it. And decoding the commands will be very difficult.
You should probably send the power-down command and wait a few seconds before starting the loop. It seems pretty likely that *ANY* command received in the middle of the spin-down aborts it and powers the disk fully back up, whereas once the disk is completely spun down that same command does not spin it back up. This assumes the disk has two operational states: in the spun-up state any command causes a spin-up, while in the spun-down state certain commands don't. Until the spin-down has completed, the disk would still appear to be in the first state.
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
On Sat, 2021-03-27 at 07:59 -0500, Roger Heflin wrote:
Wireshark won't help, as it won't tell you which process is doing it. And decoding the commands will be very difficult.
Yes, that's what I thought.
You should probably send the power-down command and wait a few seconds before starting the loop. It seems pretty likely that *ANY* command received in the middle of the spin-down aborts it and powers the disk fully back up, whereas once the disk is completely spun down that same command does not spin it back up. This assumes the disk has two operational states: in the spun-up state any command causes a spin-up, while in the spun-down state certain commands don't. Until the spin-down has completed, the disk would still appear to be in the first state.
Although that seems very reasonable, it doesn't explain why the *exact same script* works when used from the command line, and doesn't work when called from the systemd unit. All the looping, delaying, checking etc. is within the script itself.
poc
So on the command line the script never exits? And neither does it exit in systemd? So in both cases it is always running.
In the command-line case you leave it running all of the time, so in either case it should notice the umount at the same time?
I.e. you don't run it when you detect the umount; it is already running and always watching? What is it using to detect the umount? Is the command-line script already fully running when the automount happens, or do you start it manually after the disk has already been unmounted?
I have had issues before with the notifies triggering a create event and running the associated script so fast that the triggered script copied an empty file. In my case it was triggering on a file create and the script was starting and copying the file before the file was even written to. And the code writing to the file should have been a quick open/write/close that had very little delay in it but the notify on the file create + calling a script was beating the file being finished. I had to put a short sleep in it to give it a bit to complete the file write/close.
On Sat, 2021-03-27 at 11:29 -0500, Roger Heflin wrote:
So on the command line the script never exits? And neither does it exit in systemd? So in both cases it is always running.
Not at all. The script was exiting because it thought it had successfully spun down the drive. The problem was that the drive would immediately spin up again (after the script had exited). In the most recent version, with extra looping and checks, it now seems to be working correctly in both cases (systemd and command line).
In the command-line case you leave it running all of the time, so in either case it should notice the umount at the same time?
It's not running all the time. It starts when automount triggers it, waits for an inotifywait unmount event, spins down, then exits.
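The wait-then-act flow described here could be sketched as follows. The function names and the settle delay are illustrative assumptions rather than the real script; the only tool-specific detail relied on is inotifywait's unmount event.

```sh
# wait_for_unmount MOUNTPOINT
# Blocks until inotifywait reports that the filesystem holding MOUNTPOINT
# was unmounted. -qq suppresses all output except fatal errors.
wait_for_unmount() {
    inotifywait -qq -e unmount "$1"
}

# on_unmount MOUNTPOINT SETTLE_SECONDS COMMAND [ARGS...]
# Waits for the unmount, pauses so nothing is still touching the drive,
# then runs COMMAND and returns its exit status. Returns 1 without
# running COMMAND if inotifywait fails.
on_unmount() {
    local mountpoint="$1" settle="$2"
    shift 2
    wait_for_unmount "$mountpoint" || return 1
    sleep "$settle"
    "$@"
}
```

A call such as on_unmount /mnt/dock 5 /usr/local/bin/dock-power-down (both paths hypothetical) would then run once per automount cycle and exit after the power-down command finishes.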
I.e. you don't run it when you detect the umount; it is already running and always watching? What is it using to detect the umount? Is the command-line script already fully running when the automount happens, or do you start it manually after the disk has already been unmounted?
I have had issues before with the notifies triggering a create event and running the associated script so fast that the triggered script copied an empty file. In my case it was triggering on a file create and the script was starting and copying the file before the file was even written to. And the code writing to the file should have been a quick open/write/close that had very little delay in it but the notify on the file create + calling a script was beating the file being finished. I had to put a short sleep in it to give it a bit to complete the file write/close.
The script has sleeps at various points to prevent this kind of race. That isn't the issue. The script logs what it's doing and it works correctly. The unexplained spin-up is being caused by something else.
As mentioned previously, I've now added extra checks to the script to detect if such a spin-up occurs and repeat the spin-down command. So far it seems to need this three times before it settles.
For now I'm going to leave it at that for a day or two to see if it keeps working. I'll come back to it if there's anything new to report.
poc
On 27/03/2021 20:18, Patrick O'Callaghan wrote:
but it's a real kludge
Previously you had said that getting all of this to work was a "hobby". Not nice to call the output of your hobby a "kludge". :-) :-)
But, you do have to admit that you're putting things together to do something that probably wasn't their original intention.
And, no coffee, you're using USB devices, right? USB is not in my wheelhouse. Does "something" scan USB buses from time to time?
On Sun, 2021-03-28 at 07:46 +0800, Ed Greshko wrote:
On 27/03/2021 20:18, Patrick O'Callaghan wrote:
but it's a real kludge
Previously you had said that getting all of this to work was a "hobby". Not nice to call the output of your hobby a "kludge". :-) :-)
Much worse if it was a kludge and not a hobby :-)
But, you do have to admit that you're putting things together to do something that probably wasn't their original intention.
Isn't that what hacking is all about?
And, no coffee, you're using USB devices, right? USB is not in my wheelhouse. Does "something" scan USB buses from time to time?
No idea. I'll look at what Chris Murphy suggested but right now I'm more in the "if it works, don't fix it" mode.
poc
On Fri, 2021-03-26 at 21:46 -0600, Chris Murphy wrote:
smartd?
I use this line in smartd.conf to keep it from waking up the drive all the time.
/dev/disk/by-id/wwn-0x5000c500a93cae8a -l selftest -s L/../15/./22 -n standby,250
Since it's unmounted, fatrace won't work, but blktrace will..
blktrace -d /dev/sdb -o - | blkparse -i -
It will generate a lot of lines but it'll also report the process that's sending commands to the drive.
Just to close this off: I looked briefly at blktrace but haven't had the energy to follow it up (the output needs a fair amount of decoding, even with blkparse). I have since switched from MD/ext4 to BTRFS/Raid1, and the dock now powers down the first time and doesn't come back up again. I'm assuming that MD was the culprit all along for some reason.
poc