vote for systemd: Nay (now working but still Voting Nay)

Tue Jul 2 20:57:47 UTC 2013

Thanks Michal,
your answer was really positive and encouraged me to proceed further.

So I now have an FC18 running within a container under an EL6.4 HOST
with kernel 3.9.4 (big smile).

Problems started to unlock themselves once I decided to bypass
network.service altogether, starting network and sshd manually
(ifup lo; ifup eth0; /usr/sbin/sshd). I am now able to work in a quiet
room with multiple screens available to poke around and catch
fast-scrolling log messages.
(You should never forget about the poor sysadmin freezing in front of
the server room console when your software is reporting a problem and
is not able to run :-}).

As expected, the problem lay in a very small detail (within /etc/fstab).

Not working
/vzgot		/		ext4	defaults	0 0
proc		/proc		proc	defaults	0 0
sysfs		/sys		sysfs	defaults	0 0
devpts		/dev/pts		devpts	defaults	0 0
tmpfs		/dev/shm		tmpfs	defaults	0 0

Working
#/vzgot		/		ext4	defaults	0 0
proc		/proc		proc	defaults	0 0
sysfs		/sys		sysfs	defaults	0 0
devpts		/dev/pts		devpts	defaults	0 0
tmpfs		/dev/shm		tmpfs	defaults	0 0
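
The only difference between the two files is the "/" line, whose source
is /vzgot. Why that line trips systemd is not established here; one
guess (an assumption, not something verified) is that /vzgot is not a
block device visible inside the container. A minimal sketch of a check
for such entries follows. The function names are hypothetical and the
test is only a heuristic, since a bind mount may legitimately use a
directory as its source.

```python
import os
import stat


def parse_fstab(text):
    """Yield (source, mountpoint, fstype) for each non-comment fstab line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            yield fields[0], fields[1], fields[2]


def is_block_device(path):
    """Return True only if path exists and is a block device."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        return False


def suspicious_entries(text, device_ok=is_block_device):
    """Return entries whose source is an absolute path failing device_ok.

    Assumption: pseudo filesystems (proc, sysfs, devpts, tmpfs) use a
    source that does not start with "/", so they are never flagged.
    """
    return [
        (src, mnt, fstype)
        for src, mnt, fstype in parse_fstab(text)
        if src.startswith("/") and not device_ok(src)
    ]
```

Run inside the container as suspicious_entries(open("/etc/fstab").read()),
this would flag the "/vzgot / ext4" line of the non-working file whenever
/vzgot is not a block device there, and nothing in the working file.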

The fact that systemd was not able to cope with this /etc/fstab is quite
acceptable (even if upstart and init have no problem with it). The fact
that such a small problem drives systemd into an emergency state without
reporting it clearly is another question. When the last prominent line
before the prompt for the maintenance password is
"Not able to exec /bin/plymouth, <no such file>",
you ask yourself what mess you are in.
The fact that the line just below says "Please see journal" while the
journal is not available (empty) just compounds the effect.

Once I was able to log in via remote SSH in emergency.service mode, I
played with different services, trying "ignore-dependencies", but never
got a clear message about what was missing.
Success was more a lucky guess than the result of a structured approach.

So, no, sorry, systemd doesn't make the grade as "production level" (not yet? or never?).

May I propose some ways to improve it:
- the journal should be accessible regardless of systemd status or trouble.
- when list-dependencies is displayed for a service, you should mark
    which dependencies are already running (or were not successfully
    started?); think about the poor sysadmin!
- You should have a way to proceed in a 'step by step' boot mode
    (avoiding fast-scrolling parallel reports)

- On a more philosophical side:
    * linking PID1 and systemd seems to me a problem (why it is
      mandatory still escapes me); you are limiting your
      troubleshooting context (double-check your design).
    * the fact that systemd is taking over more and more functionality
      in order to work should trigger a loud alarm signal about your
      design (did I understand today's mail correctly? you can't use
      logrotate to expire/archive the journal.... :/ )

Bug:
- After a very quick check, there may be a bug in the way systemd
    handles 'int reboot(int cmd);'. I have the strong feeling systemd
    is not feeding WTERMSIG(status), but this is very preliminary,
    I could be wrong....
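
For reference on the reboot() point: reboot(2) documents that since
Linux 3.4, when reboot() is called inside a non-initial PID namespace,
the namespace's init is terminated and wait() in the parent reports
SIGHUP for a restart and SIGINT for a halt or power-off. A container
launcher could decode that status roughly as sketched below. The names
classify_container_exit and demo_child_killed_by are hypothetical, and
the sketch says nothing about what vzgot or systemd actually do; the
demo only imitates the kernel by having a child kill itself.

```python
import os
import signal


def classify_container_exit(status):
    """Map a wait() status of the container's init to an intent."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        if sig == signal.SIGHUP:
            return "reboot"   # LINUX_REBOOT_CMD_RESTART inside the namespace
        if sig == signal.SIGINT:
            return "halt"     # LINUX_REBOOT_CMD_HALT or POWER_OFF
        return "killed by signal %d" % sig
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unknown"


def demo_child_killed_by(sig):
    """Fork a child that dies from sig, then classify its wait status."""
    pid = os.fork()
    if pid == 0:
        # Restore the default action so the signal is fatal in the child.
        signal.signal(sig, signal.SIG_DFL)
        os.kill(os.getpid(), sig)
        os._exit(0)  # not reached when the signal terminates the child
    _, status = os.waitpid(pid, 0)
    return classify_container_exit(status)
```

If init simply exits instead of calling reboot(), the parent sees
WIFEXITED rather than WIFSIGNALED, which would look like the missing
WTERMSIG(status) described above.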

As you requested, I can provide you with "vzgot", my container
application (which flavor/distribution RPM do you want? src.rpm is
available too). While not a fork of LXC, I think vzgot is very close to
LXC in the way the container is started; the difference is more about
container definition. With vzgot, you just need DNS resolution (for the
container's IPs) and a config_list linking the container name to a
distribution name, a template name and an architecture. With that data,
vzgot is able to create a running container by itself. I tried to keep
the container setup as lean, simple and flexible as possible.

I put that project in sleep mode because a problem I reported 3 years ago
(a syslog+printk cross leakage between HOST and containers) seemed to
be very difficult to address within the kernel. But... very good
news yesterday! The problem is fixed in kernel 3.10.0; maybe
it is time to work on vzgot again?

Quoting Michal Schmidt <mschmidt at redhat.com>:

> On 07/02/2013 04:08 PM, Jean-Marc Pigeon wrote:
>> I was not expecting to have it fully working at the first attempt in my
>> own container design,
>
> Would you be willing to provide some details about your container  
> design? Ideally including the code to allow others to reproduce the  
> problems you saw.
>
> Have you seen these recommendations?:
> http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface/
>
>> but I was expecting systemd (using systemctl very detailed status) to give
>>  me a very good insight about issues which could occur.
>>
>> The real goal was to learn how to use systemd components to diagnose an "in
>>  trouble" real system, a kind of flight simulator exercise, so that we
>> would be ready in the future to do quick diagnosis if one of our server
>> in a rack had trouble to boot or reboot with EL7.
>
> Interesting exercise, but I am afraid by running it in a custom
> container design and running under a host that itself is not using  
> systemd you uncovered an entirely different class of problems than  
> what can happen when running it on the host.
>
>> This small exercise turned out very ugly very quickly, I worked very hard
>> trying all the tricks and bypass I could think about to collect data. To
>> my dismay I
>> was unable to get a predictable behaviour, nor reliable data from
>> systemd, even in the emergency.service mode.
>> After a while, I was forced to face it, systemd won't help me, not even
>> start the system in a minimal mode,
>> I was not able to go beyond kernel level with systemd in control,
>> services started were a total mess and container was totally locked up,
>> with no exploitable data provided.
>
> Not sure how much of it relates to container environments, but have  
> you seen this?:
> http://freedesktop.org/wiki/Software/systemd/Debugging/
>
> My first goal when debugging issues like this would be to make sure  
> I can see the debugging output of systemd itself (i.e. with  
> log_level set to debug and log_target to something I can read -  
> probably "console" in the case of a container).
>
>> (Quickly: we had interesting situation within the noisy and cold server
>> room using the emergency.service console
>> such as:
>> $ systemctl start systemd-journald.service
>> --> "unable to comply!" a dependency job for systemd-journald.service
>> failed, see journalctl -xn.
>
> This is when logging to "kmsg" (the dmesg buffer) or "console" can  
> really help find out the problem.
>
>> I ended up asking myself 'what part of this puzzle am I missing?',
>> I dug around in Google about systemd and I was stunned by the results, I
>> found
>> my concerns were already expressed multiple times with more talented
>> words than mine
>> and this as early as 2010. Since that time it is my understanding
>> systemd continuously tries to resolve problems
>> by increasing its complexity and extending its dependencies and its
>> centrality.
>>
>> this is wrong, this is very very wrong.
>> A program as complex as systemd can't be a mandatory PID1 in an open
>> environment as UNIX.
>
> From the above paragraphs I get the feeling you may be missing the  
> fact that not all of "systemd" runs in PID1. There are more  
> components in the "systemd" project, such as journald, logind, ...   
> - they run as separate processes. There is some ambiguity when  
> talking about "systemd". Sometimes it refers only to the service  
> manager (PID1), and sometimes to the whole suite.
>
>> BTW and to go a little bit beyond the systemd case, since 1991,
>> FC18 is the very first distribution I was NOT successful in
>> installing on a plain hardware
>
> I heard F19 was released today with an improved Anaconda :-)
>
> Michal
-- 
A bientôt
===========================================================
Jean-Marc Pigeon                        E-Mail: jmp at safe.ca
SAFE Inc.                             Phone: (514) 493-4280
   Clement, 'a kiss solution' to get rid of SPAM (at last)
      Clement' Home base <"http://www.clement.safe.ca">
===========================================================