Testing for proper disk mount & dismount
by pmkellly@frontier.com
In the QA meeting today, Adam suggested we discuss this here.
Back in late 2018, Alan Jenkins posted a note to this list concerning a
problem he had observed with drives not being properly dismounted. No
one seemed to reply and I had seen the same problem. I swapped a few
emails with Alan to get particulars and wrote a proposal for a test to
be added to our basic test matrix. I sent the proposal to this list and
after some discussion, I found out that a similar test was in the
matrix, but had been removed. There were comments about discussing it in
our weekly meetings, but I got busy and forgot to request an agenda
item. I just remembered it and thought I would bring it up to see if we
want to discuss this or drop it. The proposed test is below. I haven't
seen the problem recur, but I have only been checking with
“journalctl -b /usr/lib/systemd/systemd-fsck” after restarts.
How to test:
1. On a running system, change to a virtual console by pressing
Ctrl+Alt+F2
Result: A virtual console appears with a login prompt.
2. At the virtual console, login as the root user
Result: Login accepted
3. Halt the system by running the command: “halt”
Result: The `halt` is accepted and halts the system. The screen
is left powered on, showing the final shutdown messages. No system
filesystem / LVM device is left mounted / active when the system finally
halts. In some cases you might see a number of retries. This is okay as
long as the last retry is successful.
4. Read the on-screen messages.
Result: Check for messages indicating failures. Things like
“journal recovery” are a problem.
5. You now need to manually re-boot the system. On most hardware
(which complies with ACPI), you can manually power off by holding the
power button down for five seconds. Then press the power button to power
on again.
Result: When the system boots, either after a halt, reboot or
shutdown operation, the system successfully boots without error. All
expected disk partitions are cleanly mounted. Check boot logs to see
that they do not show any “fsck” (filesystem repair) operations, or
“recovering journal” (ext3/4 journal recovery). The boot logs only need
to be checked after one shutdown - reboot cycle. The logs can be checked
using the command “journalctl -b /usr/lib/systemd/systemd-fsck”. A
result similar to the following indicates clean mounting:
“-- Logs begin at Mon 2018-11-19 13:52:18 EST, end at Sat 2019-01-12 12:27:48 ES>
Jan 12 08:37:25 localhost.localdomain systemd-fsck[503]: /dev/mapper/fedora-roo>
Jan 12 08:37:36 localhost.localdomain systemd-fsck[745]: /dev/mapper/fedora-hom>
Jan 12 08:37:36 localhost.localdomain systemd-fsck[743]: /dev/sda1: clean, 412/>”
6. After the system boots, again change to a virtual console by
pressing Ctrl+Alt+F2.
Result: Virtual console appears
7. At the virtual console, login as the root user
Result: Login successful
8. Reboot the system by running the command: “reboot”
Result: The `reboot` is accepted and initiates a system reboot.
The system reboots with no additional user interaction. Note: Manually
booting the system may be required if the previous step fails.
9. After the system boots, once again change to a virtual console
by pressing Ctrl+Alt+F2.
Result: Virtual console appears.
10. At the virtual console, login as a non-root user. If no
non-root user accounts are available, you can create a new user account
as follows: Login as the root user and use the command: “useradd” to add
a non-root user. Logout of root and login as the new non-root user.
Result: User creation successful if used. Non-root login successful.
11. Power off the system by running the shutdown command. Consult
the man page for different acceptable [TIME] values. For example, to
power off the system immediately, type the following command: “shutdown now”
Result: The shutdown is accepted and powers off the system
without error.
12. Lastly, power on the system. Check that it boots successfully.
Result: When the system boots, either after a halt, reboot or
shutdown operation, the system successfully boots without error, and all
expected disk partitions are cleanly mounted.
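The log check in step 5 could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and the search patterns ("recovering journal" from the step above, plus e2fsck's "FILE SYSTEM WAS MODIFIED" banner) are assumptions that may need adjusting for other filesystems.

```shell
#!/bin/sh
# Sketch: scan systemd-fsck journal output for signs of an unclean unmount.
# The patterns below are assumptions; adjust them for the filesystems in use.

# Reads log text on stdin.
# Returns 0 if no suspicious line is found, 1 otherwise.
fsck_log_is_clean() {
    if grep -i -E 'recovering journal|FILE SYSTEM WAS MODIFIED' >/dev/null; then
        return 1
    fi
    return 0
}

# Hypothetical wrapper around the command from step 5 (current boot only).
check_current_boot() {
    journalctl -b --no-pager /usr/lib/systemd/systemd-fsck | fsck_log_is_clean
}
```

Running `check_current_boot && echo clean` after each restart would give a quick pass/fail, but reading the actual log lines is still worthwhile when it fails.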
Have a Great Day!
Pat (tablepc)
4 years, 7 months
Re: Testing for proper disk mount & dismount
by pmkellly@frontier.com
On 9/9/19 14:34, Adam Williamson wrote:
> On Mon, 2019-09-09 at 14:09 -0400, pmkellly(a)frontier.com wrote:
>> In the QA meeting today, Adam thought we can discuss this here.
>>
>> Back in late 2018, Alan Jenkins posted a note to this list concerning a
>> problem he had observed with drives not being properly dismounted. No
>> one seemed to reply and I had seen the same problem. I swapped a few
>> e'mails with Alan to get particulars and wrote a proposal for a test to
>> be added to our basic test matrix. I sent the proposal to this list and
>> after some discussion, I found out that a similar test was in the
>> matrix, but had been removed.
>
> Do you have a specific reference for this part? The test itself and the
> old version of the matrix should still be accessible via page history
> etc. It'd be useful to see them. Thanks!
>
Sorry, here is the original test case:
https://fedoraproject.org/wiki/QA:Testcase_base_shutdown/reboot
I couldn't find the version of the matrix that included this case.
This is what I used as the base for the proposal and just added a couple
things around checking the disks. The recommendation is that this test
only be done on bare metal installs.
Have a Great Day!
Pat (tablepc)
4 years, 7 months
Proposed test case for disk dismount
by pmkellly@frontier.com
Back in late 2018, Alan Jenkins posted a note to this list concerning a
problem he had observed with drives not being properly dismounted on
restarts or power-down/startup cycles. No one seemed to reply and I had seen
the same problem. I swapped a few emails with Alan to get particulars
and wrote a proposal for a test to be added to our basic test matrix. I
sent the proposal to this list and after some discussion, I found out
that a similar test was in the matrix, but had been removed.
I tried resubmitting an updated version to the list, but got no
response, probably because the F30 release was close.
Here is the original test case:
https://fedoraproject.org/wiki/QA:Testcase_base_shutdown/reboot
I couldn't find the version of the matrix that included this case.
This is what I used as the base for the proposal and just added a couple
things around checking the disks. The recommendation is that this test
only be done on bare metal installs.
Here is the proposed test process:
Version 1.1
How to test:
1. On a running system, change to a virtual console by pressing
Ctrl+Alt+F2
Result: A virtual console appears with a login prompt.
2. At the virtual console, login as the root user
Result: Login accepted
3. Halt the system by running the command: “halt”
Result: The `halt` is accepted and halts the system. The screen
is left powered on, showing the final shutdown messages. No system
filesystem / LVM device is left mounted / active when the system finally
halts. In some cases you might see a number of retries. This is okay as
long as the last retry is successful.
4. Read the on-screen messages.
Result: Check for messages indicating failures. Things like
“journal recovery” are a problem.
5. You now need to manually re-boot the system. On most hardware
(which complies with ACPI), you can manually power off by holding the
power button down for five seconds. Then press the power button to power
on again.
Result: When the system boots, either after a halt, reboot or
shutdown operation, the system successfully boots without error. All
expected disk partitions are cleanly mounted. Check boot logs to see
that they do not show any “fsck” (filesystem repair) operations, or
“recovering journal” (ext3/4 journal recovery). The boot logs only need
to be checked after one shutdown - reboot cycle. The logs can be checked
using the command “journalctl -b /usr/lib/systemd/systemd-fsck”. A
result similar to the following indicates clean mounting:
“-- Logs begin at Mon 2018-11-19 13:52:18 EST, end at Sat 2019-01-12 12:27:48 ES>
Jan 12 08:37:25 localhost.localdomain systemd-fsck[503]: /dev/mapper/fedora-roo>
Jan 12 08:37:36 localhost.localdomain systemd-fsck[745]: /dev/mapper/fedora-hom>
Jan 12 08:37:36 localhost.localdomain systemd-fsck[743]: /dev/sda1: clean, 412/>”
6. After the system boots, again change to a virtual console by
pressing Ctrl+Alt+F2.
Result: Virtual console appears
7. At the virtual console, login as the root user
Result: Login successful
8. Reboot the system by running the command: “reboot”
Result: The `reboot` is accepted and initiates a system reboot.
The system reboots with no additional user interaction. Note: Manually
booting the system may be required if the previous step fails.
9. After the system boots, once again change to a virtual console
by pressing Ctrl+Alt+F2.
Result: Virtual console appears.
10. At the virtual console, login as a non-root user. If no
non-root user accounts are available, you can create a new user account
as follows: Login as the root user and use the command: “useradd” to add
a non-root user. Logout of root and login as the new non-root user.
Result: User creation successful if used. Non-root login successful.
11. Power off the system by running the shutdown command. Consult
the man page for different acceptable [TIME] values. For example, to
power off the system immediately, type the following command: “shutdown now”
Result: The shutdown is accepted and powers off the system
without error.
12. Lastly, power on the system. Check that it boots successfully.
Result: When the system boots, either after a halt, reboot or
shutdown operation, the system successfully boots without error, and all
expected disk partitions are cleanly mounted.
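For the "all expected disk partitions are mounted" part of the final result, a small helper along these lines might save some eyeballing. It is a sketch under assumptions: `missing_mounts` is a hypothetical name, the input is expected to be in the format of /proc/mounts, and it only checks that a mount point is present, not that the filesystem was clean.

```shell
#!/bin/sh
# Sketch: report expected mount points that are absent from a mounts table.
# Usage: missing_mounts MOUNTS_FILE MOUNTPOINT...
# MOUNTS_FILE is assumed to use the /proc/mounts layout, where the second
# whitespace-separated field is the mount point.
missing_mounts() {
    mounts_file=$1
    shift
    for mp in "$@"; do
        # awk exits 0 only if some line has $mp as its mount point.
        if ! awk -v want="$mp" '$2 == want { found = 1 } END { exit !found }' "$mounts_file"; then
            printf '%s\n' "$mp"
        fi
    done
}
```

For example, `missing_mounts /proc/mounts / /boot /home` prints nothing when all three are mounted. The list of mount points depends on the install layout.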
Have a Great Day!
Pat (tablepc)
4 years, 6 months
Re: Link to test case for drive dismount
by pmkellly@frontier.com
On 11/4/19 15:34, Chris Murphy wrote:
> On Mon, Nov 4, 2019 at 7:04 PM pmkellly(a)frontier.com
> <pmkellly(a)frontier.com> wrote:
>>
>> Here is the link to the draft:
>>
>> https://fedoraproject.org/wiki/User:Tablepc/Draft_testcase_reboot
>>
>>
>
> Why is "Be sure to reclaim all disk space" important for this test
> case? It really shouldn't matter what the layout is.
>
Chris,
I just wanted to be sure it was a clean install so there would be enough
space for the install and there wouldn't be any dual boot or other stuff
hanging around that might make trouble. If you think it will be all
right I will defer to you on this.
Have a Great Day!
Pat (tablepc)
4 years, 5 months
Re: Link to test case for drive dismount
by Chris Murphy
On Mon, Nov 4, 2019 at 10:26 PM pmkellly(a)frontier.com
<pmkellly(a)frontier.com> wrote:
>
>
> On 11/4/19 15:34, Chris Murphy wrote:
> > On Mon, Nov 4, 2019 at 7:04 PM pmkellly(a)frontier.com
> > <pmkellly(a)frontier.com> wrote:
> >>
> >> Here is the link to the draft:
> >>
> >> https://fedoraproject.org/wiki/User:Tablepc/Draft_testcase_reboot
> >>
> >>
> >
> > Why is "Be sure to reclaim all disk space" important for this test
> > case? It really shouldn't matter what the layout is.
> >
>
> Chris,
>
> I just wanted to be sure it was a clean install so there would be enough
> space for the install and there wouldn't be any dual boot or other stuff
> hanging around that might make trouble. If you think it will be all
> right I will defer to you on this.
Let's see what Adam thinks.
My opinion is that no matter what the layout is, everything should
either be cleanly unmounted, or remounted ro, or FIFREEZE() followed
by FITHAW() and in all three of those cases the next boot's system log
will show clean file systems that do not have dirty bits set, and do
not need log replay (file system journal recovery). The information
that's needed if any of that is not true: the next boot's system log
(shows if anything is dirty and needs replay/fixup), the prior boot's
system log, and potentially the shutdown-log.txt produced from
following these instructions:
Shutdown Completes Eventually
https://freedesktop.org/wiki/Software/systemd/Debugging/#index2h1
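As a rough sketch of gathering those three pieces for a bug report: the function name, the output file names, and the `JOURNALCTL` / `SHUTDOWN_LOG` override variables are made up for illustration, and the default `/shutdown-log.txt` location is an assumption based on the linked instructions. It also assumes the journal is persistent, otherwise the previous boot's log will not be available.

```shell
#!/bin/sh
# Sketch: collect the current and previous boot logs into a directory.
# Usage: collect_boot_logs OUTDIR
# JOURNALCTL and SHUTDOWN_LOG are hypothetical overrides, handy for testing.
collect_boot_logs() {
    outdir=$1
    jctl=${JOURNALCTL:-journalctl}
    shutdown_log=${SHUTDOWN_LOG:-/shutdown-log.txt}

    mkdir -p "$outdir" || return 1

    # -b 0 is the current boot, -b -1 the boot before it.
    "$jctl" -b 0 --no-pager > "$outdir/boot-current.log"
    "$jctl" -b -1 --no-pager > "$outdir/boot-previous.log"

    # Only present if the shutdown debugging instructions were followed.
    if [ -f "$shutdown_log" ]; then
        cp "$shutdown_log" "$outdir/"
    fi
}
```

Run as root right after the boot that shows the dirty file system, e.g. `collect_boot_logs /root/dismount-bug`.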
The more interesting part of this is whether we can block release on
such a bug. A dirty file system isn't always a form of corruption, and
we also aren't using atomic updates or file systems by default. I'm
not sure what release criterion would apply.
From basic:
"It must be possible to trigger a clean system shutdown using standard
console commands."
Does "clean" refer to file systems at all? And if so which ones? If
/home is not cleanly unmounted, there might be a bug, but it also
might be a benign problem because there's nothing on /home we need for
booting, and when it's mounted at the next startup, its fs journal
will be replayed. Maybe this could apply to the EFI system partition
and boot volumes? Does shutdown also apply to reboots? I think this
criterion is mostly about making sure a complete shutdown+poweroff is
possible by CLI.
From beta:
"A system installed without a graphical package set must boot to a
working login prompt without any unintended user intervention, and all
virtual consoles intended to provide a working login prompt must do
so. "
This really only applies right after installation, I think, not when
such a problem manifests after a bunch of packages are installed,
customizations made, and dozens or hundreds of reboots.
From final:
"All known bugs that can cause corruption of user data must be fixed
or documented at Common F31 bugs."
It's not a bug that GRUB and most other bootloaders can't read file
system journals, and therefore demand that at least one of: umount,
remount ro, FIFREEZE() then FITHAW()
--
Chris Murphy
4 years, 5 months
Re: Link to test case for drive dismount
by Chris Murphy
On Tue, Nov 5, 2019 at 6:45 PM Chris Murphy <lists(a)colorremedies.com> wrote:
>
> On Mon, Nov 4, 2019 at 10:26 PM pmkellly(a)frontier.com
> <pmkellly(a)frontier.com> wrote:
> >
> >
> > On 11/4/19 15:34, Chris Murphy wrote:
> > > On Mon, Nov 4, 2019 at 7:04 PM pmkellly(a)frontier.com
> > > <pmkellly(a)frontier.com> wrote:
> > >>
> > >> Here is the link to the draft:
> > >>
> > >> https://fedoraproject.org/wiki/User:Tablepc/Draft_testcase_reboot
> > >>
> > >>
> > >
> > > Why is "Be sure to reclaim all disk space" important for this test
> > > case? It really shouldn't matter what the layout is.
> > >
> >
> > Chris,
> >
> > I just wanted to be sure it was a clean install so there would be enough
> > space for the install and there wouldn't be any dual boot or other stuff
> > hanging around that might make trouble. If you think it will be all
> > right I will defer to you on this.
>
> Let's see what Adam thinks.
>
> My opinion is that no matter what the layout is, everything should
> either be cleanly unmounted, or remounted ro, or FIFREEZE() followed
> by FITHAW() and in all three of those cases the next boot's system log
> will show clean file systems that do not have dirty bits set, and do
> not need log replay (file system journal recovery). The information
> that's needed if any of that is not true: the next boot's system log
> (shows if anything is dirty and needs replay/fixup), the prior boot's
> system log, and potentially the shutdown-log.txt produced from
> following these instructions:
>
> Shutdown Completes Eventually
> https://freedesktop.org/wiki/Software/systemd/Debugging/#index2h1
>
> The more interesting part of this is whether we can block release on
> such a bug. A dirty file system isn't always a form of corruption, and
> we also aren't using atomic updates or file systems by default. I'm
> not sure what release criterion would apply.
>
> From basic:
> "It must be possible to trigger a clean system shutdown using standard
> console commands."
>
> Does "clean" refer to file systems at all? And if so which ones? If
> /home is not cleanly unmounted, there might be a bug, but it also
> might be a benign problem because there's nothing on /home we need for
> booting, and when it's mounted at the next startup, its fs journal
> will be replayed. Maybe this could apply to the EFI system partition
> and boot volumes? Does shutdown also apply to reboots? I think this
> criterion is mostly about making sure a complete shutdown+poweroff is
> possible by CLI.
>
> From beta:
> "A system installed without a graphical package set must boot to a
> working login prompt without any unintended user intervention, and all
> virtual consoles intended to provide a working login prompt must do
> so. "
>
> Really only applies following installation, I think. Not after a bunch
> of packages are installed, customizations made, and dozens or hundreds
> of reboots later, such a problem manifests.
>
> From final:
> "All known bugs that can cause corruption of user data must be fixed
> or documented at Common F31 bugs."
>
> It's not a bug that GRUB and most other bootloaders, can't read file
> system journals, and therefore demand that at least one of: umount,
> remount ro, FIFREEZE() then FITHAW()
(suddenly it sent before I was done)
... at least one of: umount, remount ro, FIFREEZE() then FITHAW() must
succeed. It is a known bootloader deficiency. Any of those three
things will ensure journal entries are flushed to the file system
before a reboot/shutdown, and therefore GRUB will have an accurate
view of file system state.
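For ext3/4 specifically, one way to look at the on-disk state directly is to parse the superblock summary from `tune2fs -l DEVICE` (or `dumpe2fs -h DEVICE`). The sketch below assumes that output format ("Filesystem state:" and "Filesystem features:" lines, with `needs_recovery` listed when the journal still has to be replayed); the function name is made up. Note that a mounted ext4 file system normally shows `needs_recovery`, so this only makes sense on an unmounted device, for example from a live image.

```shell
#!/bin/sh
# Sketch: decide from `tune2fs -l` style output whether an ext3/4 file
# system was left clean. Reads the superblock summary on stdin.
# Returns 0 if the state is exactly "clean" and needs_recovery is not set.
ext_fs_is_clean() {
    awk '
        /^Filesystem state:/ {
            s = $0
            sub(/^Filesystem state:[ \t]*/, "", s)
            state = s
        }
        /^Filesystem features:/ {
            if ($0 ~ /needs_recovery/) dirty = 1
        }
        END { exit !(state == "clean" && !dirty) }
    '
}
```

Usage would be something like `tune2fs -l /dev/sda1 | ext_fs_is_clean && echo clean`, with the device name depending on the system.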
--
Chris Murphy
4 years, 5 months
Re: Link to test case for drive dismount
by pmkellly@frontier.com
On 11/5/19 13:48, Chris Murphy wrote:
> On Tue, Nov 5, 2019 at 6:45 PM Chris Murphy <lists(a)colorremedies.com> wrote:
>>
>> On Mon, Nov 4, 2019 at 10:26 PM pmkellly(a)frontier.com
>> <pmkellly(a)frontier.com> wrote:
>>>
>>>
>>> On 11/4/19 15:34, Chris Murphy wrote:
>>>> On Mon, Nov 4, 2019 at 7:04 PM pmkellly(a)frontier.com
>>>> <pmkellly(a)frontier.com> wrote:
>>>>>
>>>>> Here is the link to the draft:
>>>>>
>>>>> https://fedoraproject.org/wiki/User:Tablepc/Draft_testcase_reboot
>>>>>
>>>>>
>>>>
>>>> Why is "Be sure to reclaim all disk space" important for this test
>>>> case? It really shouldn't matter what the layout is.
>>>>
>>>
>>> Chris,
>>>
>>> I just wanted to be sure it was a clean install so there would be enough
>>> space for the install and there wouldn't be any dual boot or other stuff
>>> hanging around that might make trouble. If you think it will be all
>>> right I will defer to you on this.
>>
>> Let's see what Adam thinks.
>>
>> My opinion is that no matter what the layout is, everything should
>> either be cleanly unmounted, or remounted ro, or FIFREEZE() followed
>> by FITHAW() and in all three of those cases the next boot's system log
>> will show clean file systems that do not have dirty bits set, and do
>> not need log replay (file system journal recovery). The information
>> that's needed if any of that is not true: the next boot's system log
>> (shows if anything is dirty and needs replay/fixup), the prior boot's
>> system log, and potentially the shutdown-log.txt produced from
>> following these instructions:
>>
>> Shutdown Completes Eventually
>> https://freedesktop.org/wiki/Software/systemd/Debugging/#index2h1
>>
>> The more interesting part of this is whether we can block release on
>> such a bug. A dirty file system isn't always a form of corruption, and
>> we also aren't using atomic updates or file systems by default. I'm
>> not sure what release criterion would apply.
>>
>> From basic:
>> "It must be possible to trigger a clean system shutdown using standard
>> console commands."
>>
>> Does "clean" refer to file systems at all? And if so which ones? If
>> /home is not cleanly unmounted, there might be a bug, but it also
>> might be a benign problem because there's nothing on /home we need for
>> booting, and when it's mounted at the next startup, its fs journal
>> will be replayed. Maybe this could apply to the EFI system partition
>> and boot volumes? Does shutdown also apply to reboots? I think this
>> criterion is mostly about making sure a complete shutdown+poweroff is
>> possible by CLI.
>>
>> From beta:
>> "A system installed without a graphical package set must boot to a
>> working login prompt without any unintended user intervention, and all
>> virtual consoles intended to provide a working login prompt must do
>> so. "
>>
>> Really only applies following installation, I think. Not after a bunch
>> of packages are installed, customizations made, and dozens or hundreds
>> of reboots later, such a problem manifests.
>>
>> From final:
>> "All known bugs that can cause corruption of user data must be fixed
>> or documented at Common F31 bugs."
>>
>> It's not a bug that GRUB and most other bootloaders, can't read file
>> system journals, and therefore demand that at least one of: umount,
>> remount ro, FIFREEZE() then FITHAW()
> (suddenly it sent before I was done)
>
> ... at least one of: umount, remount ro, FIFREEZE() then FITHAW() must
> succeed. It is a known bootloader deficiency. Any of those three
> things will ensure journal entries are flushed to the file system
> before a reboot/shutdown, and therefore GRUB will have an accurate
> view of file system state.
>
>
Can we please have "Test case for drive dismount" as an official agenda
item for our next QA meeting? Discussing this via the list is not working.
Chris Murphy was the only respondent and he brought up some points that
seem like they would be good for group discussion. The link to the draft is:
https://fedoraproject.org/wiki/User:Tablepc/Draft_testcase_reboot
Have a Great Day!
Pat (tablepc)
4 years, 5 months