Hi,
I finally managed to reproduce the error on a local box. After a reboot like the one in [1], the tool cannot SSH back into the VM. With debug mode on, it still fails for some time and then, seemingly at random, allows SSH again.
I could not reproduce this with the same images on our OpenStack cloud. Any tips to help find the cause would be appreciated.
[1] https://apps.fedoraproject.org/autocloud/jobs/1845/output#264
Kushal
On 01/09/2017 08:31 AM, Kushal Das wrote:
> I finally managed to reproduce the error on a local box. After a reboot like the one in [1], the tool cannot SSH back into the VM.
Can we get together and debug this?
On Mon, Jan 09, 2017 at 10:16:51AM -0500, Dusty Mabe wrote:
>> After a reboot like the one in [1], the tool cannot SSH back into the VM.
> Can we get together and debug this?
Is this a key-generation entropy problem?
On 09/01/17, Matthew Miller wrote:
> Is this a key-generation entropy problem?
It could be an entropy problem. I have tried many things over the last few hours, but nothing has helped so far. Still looking.
Kushal
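A quick way to test the entropy theory is to watch the kernel's entropy estimate on the guest (for example from the serial console) right after the reboot. The sketch below is a minimal Python helper, assuming a Linux guest; it reads the standard procfs file `/proc/sys/kernel/random/entropy_avail`, and the 200-bit cut-off in `LOW_ENTROPY_BITS` is an arbitrary illustrative assumption, not a kernel-defined limit.

```python
from pathlib import Path

# Linux procfs file holding the kernel's estimate of available entropy, in bits.
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")

# Illustrative cut-off only (an assumption, not a kernel-defined limit).
LOW_ENTROPY_BITS = 200


def parse_entropy(text: str) -> int:
    """Parse the single integer stored in entropy_avail."""
    return int(text.strip())


def read_entropy_avail(path: Path = ENTROPY_PATH) -> int:
    """Read the current entropy estimate from procfs."""
    return parse_entropy(path.read_text())


def looks_starved(bits: int, threshold: int = LOW_ENTROPY_BITS) -> bool:
    """Heuristic flag: True when the estimate is below the chosen threshold."""
    return bits < threshold


if __name__ == "__main__":
    if ENTROPY_PATH.exists():
        bits = read_entropy_avail()
        print(f"entropy_avail={bits} bits, low={looks_starved(bits)}")
    else:
        print(f"{ENTROPY_PATH} not found; this check only works on Linux")
```

Newer kernels changed how this value is reported, so treat it as a rough signal rather than proof either way. Comparing readings from a boot where SSH fails with one where it works is more telling than any single number.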
Finally managed to isolate the issue. If we boot the image with only one CPU, the error comes up; with 2 or more CPUs, there are no issues at all. Now the question is whether we should run local testing on Autocloud with 2 CPUs or get this issue fixed somehow.
Kushal
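To compare the 1-CPU and 2-CPU boots with numbers rather than impressions, one option is to poll the guest after the reboot and record how long it takes before sshd answers. The sketch below is a hypothetical helper (`wait_for_ssh_banner` is not part of Autocloud); it waits for the identification string an SSH server sends on connect, and the default 300-second timeout and 2-second poll interval are arbitrary assumptions.

```python
import socket
import time
from typing import Optional


# Hypothetical helper, not part of Autocloud.
def wait_for_ssh_banner(host: str, port: int = 22, timeout: float = 300.0,
                        interval: float = 2.0) -> Optional[float]:
    """Poll host:port until the server sends an SSH identification string.

    Returns the seconds elapsed until a banner starting with b"SSH-" arrived,
    or None if none arrived before `timeout` seconds.
    """
    start = time.monotonic()
    deadline = start + timeout
    while time.monotonic() < deadline:
        try:
            # The timeout applies to both connect() and recv() on this socket.
            with socket.create_connection((host, port), timeout=interval) as sock:
                banner = sock.recv(256)
                if banner.startswith(b"SSH-"):
                    return time.monotonic() - start
        except OSError:
            pass  # refused, reset or timed out: the guest is not ready yet
        time.sleep(interval)
    return None
```

For example, `wait_for_ssh_banner("VM_IP")`, with `VM_IP` standing in for the guest's address, returns the seconds until the banner appeared, or `None`. A banner only shows that sshd is accepting connections; if the failure happens later, during key exchange or authentication, a full `ssh` login attempt would still be needed to confirm it.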
On Wed, Jan 18, 2017 at 10:05:41AM +0530, Kushal Das wrote:
> Now the question is whether we should run local testing on Autocloud with 2 CPUs or get this issue fixed somehow.
> Kushal
> --
> Fedora Cloud Engineer
> CPython Core Developer
> https://kushaldas.in
> https://dgplug.org
I think we need to find out what the issue is and fix it.
> I think we need to find out what the issue is and fix it.
+1. I wonder if this is actually a widespread problem that we've only happened to notice this way.
On Wed, 2017-01-18 at 09:28 -0500, Matthew Miller wrote:
>> I think we need to find out what the issue is and fix it.
> +1. I wonder if this is actually a widespread problem that we've only happened to notice this way.
Someone was actually asking me yesterday about a bug where they couldn't SSH into a freshly booted system for several minutes. I wonder if it might be the same thing...
On Thu, Jan 19, 2017 at 08:48:20AM -0800, Adam Williamson wrote:
> Someone was actually asking me yesterday about a bug where they couldn't SSH into a freshly booted system for several minutes. I wonder if it might be the same thing...
> --
> Adam Williamson
> Fedora QA Community Monkey
> IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
> http://www.happyassassin.net
It was Kushal, IIRC, and it is the same issue.
On 19/01/17, Mike Ruckman wrote:
> It was Kushal, IIRC, and it is the same issue.
It seems I created a cyclic graph by asking my question across different channels :)
Kushal
On Thu, Jan 19, 2017 at 4:48 PM, Adam Williamson <adamwill@fedoraproject.org> wrote:
> Someone was actually asking me yesterday about a bug where they couldn't SSH into a freshly booted system for several minutes. I wonder if it might be the same thing...
That sounds like some form of DNS failure, with a lookup waiting to time out before it falls back.
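To check the DNS theory from inside the guest, one could time a reverse lookup of the SSH client's address, the kind of lookup sshd performs when its `UseDNS` option is enabled. Which lookup is actually slow is an assumption here; the thread does not say. The hypothetical helper below wraps `socket.gethostbyaddr` and takes the resolver as a parameter so it can be swapped out in tests.

```python
import socket
import time
from typing import Callable, Optional, Tuple

# Same shape as socket.gethostbyaddr: ip -> (hostname, aliases, addresses).
Resolver = Callable[[str], Tuple[str, list, list]]


# Hypothetical helper, not part of Autocloud or OpenSSH.
def time_reverse_lookup(ip: str,
                        resolver: Resolver = socket.gethostbyaddr
                        ) -> Tuple[float, Optional[str]]:
    """Time a reverse DNS lookup of `ip`.

    Returns (elapsed_seconds, hostname); hostname is None when the lookup
    fails. A lookup that fails only after many seconds would fit the
    "waiting to time out before it falls back" explanation.
    """
    start = time.monotonic()
    try:
        hostname: Optional[str] = resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        hostname = None
    return time.monotonic() - start, hostname
```

Running `time_reverse_lookup("192.0.2.10")` on the guest, with that documentation address replaced by the real client IP, and seeing an elapsed time of many seconds followed by `None` would support the timeout-then-fallback explanation.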