Greetings.
For a long while we were plagued by an issue where our buildvm-armv7-01/02/03 VMs would get 'stuck' on nightly rawhide and branched composes.
Laura found a workaround for this issue: we set an obscure sysctl and they no longer hang. However, it would be nice if they didn't hang in cases like this for everyone else out of the box, so the upstream kernel developers would like us to do some more debugging of the issue and send them the results, to try and get the default case working right.
So, what I would like to do:
* Take buildvm-armv7-03 and install https://koji.fedoraproject.org/koji/taskinfo?taskID=25509848 on it and reboot it into that kernel.
* Disable the sysctl we set in ansible for it:
diff --git a/roles/koji_builder/tasks/main.yml b/roles/koji_builder/tasks/main.yml
index 6b6ecf0..75c427c 100644
--- a/roles/koji_builder/tasks/main.yml
+++ b/roles/koji_builder/tasks/main.yml
@@ -276,4 +276,4 @@
   sysctl: name=vm.highmem_is_dirtyable value=1 state=present sysctl_set=yes reload=yes
   tags:
   - koji_builder
-  when: inventory_hostname.startswith(('buildvm-armv7-01.arm', 'buildvm-armv7-02', 'buildvm-armv7-03'))
+  when: inventory_hostname.startswith(('buildvm-armv7-01.arm', 'buildvm-armv7-02'))
* Wait for a compose to hang on it and gather the information needed, then put everything back the way it was before.
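For anyone checking the condition change, here is a rough sketch of what the two "when:" expressions select. The prefix tuples are copied from the diff; the full hostnames are made up for illustration and may not match the real inventory, and the helper names are hypothetical.

```python
# Prefix tuples copied from the "when:" condition before and after the diff.
BEFORE = ('buildvm-armv7-01.arm', 'buildvm-armv7-02', 'buildvm-armv7-03')
AFTER = ('buildvm-armv7-01.arm', 'buildvm-armv7-02')

# Hypothetical inventory hostnames; the real FQDNs may differ.
HOSTS = [
    'buildvm-armv7-01.arm.fedoraproject.org',
    'buildvm-armv7-02.arm.fedoraproject.org',
    'buildvm-armv7-03.arm.fedoraproject.org',
]


def sysctl_hosts(hosts, prefixes):
    """Return the hosts for which the condition would be true.

    str.startswith accepts a tuple and matches if any prefix matches,
    which is what the expression in the task relies on.
    """
    return [h for h in hosts if h.startswith(prefixes)]


def dropped_hosts(hosts, before, after):
    """Return the hosts that matched the old condition but not the new one."""
    still_matching = set(sysctl_hosts(hosts, after))
    return [h for h in sysctl_hosts(hosts, before) if h not in still_matching]


if __name__ == "__main__":
    print("sysctl task still applies to:", sysctl_hosts(HOSTS, AFTER))
    print("sysctl task no longer applies to:", dropped_hosts(HOSTS, BEFORE, AFTER))
```

With these assumed names, only the armv7-03 builder falls out of the condition. Skipping the task only stops ansible from enforcing the setting; a value already written on the box would presumably still have to be reverted there by hand.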
Note that this causes no compose failures, just a delay as the compose waits for that job to finish, and rebooting the box causes the job to restart and complete fine on that box.
We could hold off and do this after freeze, but I have a feeling freeze is going to be long and I'd prefer to get the info to upstream while they are still interested in looking into it.
Thoughts? +1s? -1s? rotten fruit?
kevin
+1 and an apple.
infrastructure mailing list -- infrastructure@lists.fedoraproject.org To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org
On Mon, Mar 12, 2018 at 9:40 PM, Patrick Uiterwijk puiterwijk@redhat.com wrote:
+1 and an apple.
(non-rotten)
Also, you're hopeful!
Definitely +1 and no rotten fruit, sounds reasonable and would be fantastic to get this issue closed out once and for all.
Agreed +1. I don't have any rotten fruit.