On 9/26/18 5:03 AM, Neal Becker wrote:
Rick Stevens wrote:
> On 9/25/18 12:32 PM, Neal Becker wrote:
>> I'm using f28 cloud on AWS as a compute farm. It seems that instances
>> randomly shut down within hours of starting. An example log:
>>
>> ...
>> Fedora 28 (Cloud Edition)
>> Kernel 4.16.3-301.fc28.x86_64 on an x86_64 (ttyS0)
>>
>> Stopping Restore /run/initramfs on shutdown...
>> [  OK  ] Removed slice system-sshd\x2dkeygen.slice.
>> Stopping User Manager for UID 1000...
>> ...
>>
>> In this case it seems to have spontaneously shut down after about 4 hours.
>> This happens with high probability - maybe 2/10 of the instances I start
>> spontaneously shut down.
>>
>> Any ideas what's going on? I'm just wondering if this is something
>> specific to Fedora Cloud Edition, because it doesn't seem to be a common
>> complaint on AWS (most of which is Ubuntu).
>
> Are you getting emails from AWS that they're shutting down your
> instance? AWS does some testing and, should your instance fail their
> tests, they will shut it down "to protect others sharing the hardware".
> If this is what's happening, you should get an email about it (we get
> one perhaps 20% of the time) and if not, check the AWS admin portal
> under "Events" right after a restart. There should be a record about it.
> That record goes away after a while (not sure how long it hangs around).
>
> In my experience, AWS is rather vague as to just _what_ tests they use
> to determine if your instance is dangerous so it can be difficult to fix
> your code. We've got some AWS stuff that's been up for well over a year,
> but others they shut down because they fail these mysterious tests.
>
> If you're using instance store disks, the disk image is purged when you
> restart your instance so your logs probably don't contain why the system
> shut down the last time. The only way to hang onto that stuff is to use
> persistent (EBS) storage for your machine--at least for the logs (I'd
> recommend st1-type storage for logs). Persistent storage at AWS can get
> expensive depending on how big it is, but it may be necessary to sort
> this out. Once figured out, you can get rid of the EBS storage to
> minimize costs.
>
> This may be a Fedora Cloud issue. It may be something you're doing in an
> application. It may be AWS protecting itself. Hard to tell.

Shutdowns occur with very high probability within a few hours. Something
like 20% of my machines shut down within a few hours of starting. I suspect
it is the machines with a high load average that shut down. But that's not
behavior I'd expect from Fedora Workstation! I'm wondering if there's
something about the Fedora Cloud setup causing this?

Please check the AWS portal and see if they're killing your machines or
if they're shutting down of their own accord. And as I said before,
you may need to set up an EBS st1 storage volume and mount it at
/var/log to persist logs across reboots so you can examine them when you
bring the machine back up.
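
If clicking through the portal for every instance gets tedious, the same
question (did AWS stop it, or did the box shut itself down?) can probably be
asked through the API. What follows is only a rough sketch, assuming boto3 is
installed and credentials are configured. It reads the StateReason and
StateTransitionReason fields that describe_instances returns. The function
names and the "guest"/"aws"/"account" labels are made up for illustration,
and the prefix-based grouping is my reading of the reason codes, not
something AWS spells out in those terms.

```python
def classify_stop(instance):
    """Guess who stopped an instance from one describe_instances() record.

    Returns (who, code, transition_reason). The 'who' labels are invented
    for this sketch; the grouping by code prefix is an assumption.
    """
    code = instance.get("StateReason", {}).get("Code", "")
    if code == "Client.InstanceInitiatedShutdown":
        who = "guest"      # shutdown/poweroff was run inside the instance
    elif code.startswith("Server."):
        who = "aws"        # AWS side, e.g. Server.ScheduledStop
    elif code.startswith("Client."):
        who = "account"    # customer side, e.g. Client.UserInitiatedShutdown
    else:
        who = "unknown"
    return who, code, instance.get("StateTransitionReason", "")


def stopped_instances(region):
    """Yield (instance_id, classification) for stopped/terminated instances."""
    import boto3  # third-party; imported here so classify_stop works without it

    ec2 = boto3.client("ec2", region_name=region)
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[{"Name": "instance-state-name",
                  "Values": ["stopped", "terminated"]}]
    )
    for page in pages:
        for reservation in page["Reservations"]:
            for inst in reservation["Instances"]:
                yield inst["InstanceId"], classify_stop(inst)


if __name__ == "__main__":
    # "us-east-1" is a placeholder; use whatever region the farm runs in.
    for instance_id, result in stopped_instances("us-east-1"):
        print(instance_id, *result)
```

A "guest" result would point at something inside Fedora (or your
application) calling for the shutdown, which is where the persisted logs
come in. Terminated instances only stay visible to the API for a short
while, so this needs to be run soon after the event.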

It might be an idea to set up a small AWS instance with EBS storage at
/var/log as a log server and have all your other instances log to it.
That way you'd capture the logs from all of your AWS instances on a
single EBS storage volume.
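
For the system logs you would most likely do the forwarding in the syslog
daemon itself. If the compute jobs are Python, though, the application side
of the same idea can be sketched with the standard library's SysLogHandler,
which sends records to a remote syslog listener over UDP by default. This is
a minimal illustration, not a tested setup: the function name, the logger
name "farm", and the host "logs.internal.example" are placeholders, and it
assumes the log server accepts syslog on UDP port 514.

```python
import logging
import logging.handlers
import socket


def remote_logger(host, port=514, name="farm"):
    """Return a logger that forwards records to a syslog server over UDP."""
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix each record with this machine's hostname so the log server
    # can tell the compute instances apart.
    hostname = socket.gethostname().replace("%", "%%")
    handler.setFormatter(
        logging.Formatter(hostname + " %(name)s: %(levelname)s %(message)s")
    )
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Placeholder host name; substitute the log server's private address.
    log = remote_logger("logs.internal.example")
    log.info("job started")
```

Since UDP is fire-and-forget, a record sent just as the machine goes down
may still be lost, but whatever arrived before that survives on the log
server's volume.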
----------------------------------------------------------------------
- Rick Stevens, Systems Engineer, AllDigital ricks(a)alldigital.com -
- AIM/Skype: therps2 ICQ: 226437340 Yahoo: origrps2 -
- -
- Politicians are the opposite of pickpockets because you never see -
- them take their hand out of your pocket. -
- -- Larry Fine -
----------------------------------------------------------------------