A few weeks ago, I received a bug report about one of our Fedora packages: a request to have its init configuration migrated to systemd. A quick search within our Fedora repo showed that systemd has been available since FC14, so it was about time we adapted our package, and we did. Service definition is simple enough and the documentation is well done; it was really easy to use systemd to start our application daemon. Service definitions lack a small piece of functionality we needed at the installation/configuration phase, but we found a workaround within systemd (which, while not perfect, works).
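For anyone who has not written one yet, a minimal sketch of what such a service definition might look like (daemon name and paths are hypothetical, not our actual package):

    # /usr/lib/systemd/system/ourdaemon.service (hypothetical)
    [Unit]
    Description=Our application daemon
    After=network.target

    [Service]
    Type=forking
    ExecStart=/usr/sbin/ourdaemon
    PIDFile=/var/run/ourdaemon.pid

    [Install]
    WantedBy=multi-user.target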
We now have our main RPM requiring a secondary sysvinit or systemd RPM, depending on the distribution flavor. Nice and easy.
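For illustration, the spec-file conditional amounts to something like this (package names hypothetical):

    %if 0%{?fedora} >= 14
    Requires: ourapp-systemd
    %else
    Requires: ourapp-sysvinit
    %endif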
Reading about systemd's features, I told myself it could be the right tool to revive an old project of mine that exploits the kernel's container features, and to get the latest Fedora (FC18) running within a container under a fresh kernel (3.9.4).
This little project gave satisfactory results with various distributions when I designed and tested it two years ago. First I checked it with a standard EL6.4 template (400 MB) under this new kernel (3.9.4, HOST EL6.4) to see whether my tool was still operational. Everything went perfectly, so I was ready to test FC18. The selected FC18 template is a very standard one (a 939 MB tgz file) which, and this is a key factor, was proven to work fully "as is" in an OpenVZ container (kernel 2.6.32-042stab076.8). "As is" means the template was never tailored for an OpenVZ container (it is used out of the box there) and could be used to seed a working HOST too.
I was not expecting it to work fully on the first attempt in my own container design, but I was expecting systemd (via systemctl's very detailed status) to give me very good insight into whatever issues occurred.
The real goal was to learn how to use systemd components to diagnose an "in trouble" real system, a kind of flight-simulator exercise, so that we would be ready in the future to make a quick diagnosis if one of our servers in a rack had trouble booting or rebooting with EL7.
If the result of this exercise was positive enough, why not try installing systemd within our current deployment, since systemd is sysvinit-compatible?
The exercise would be considered a success if I were able to log in to the FC18 container from a remote location via SSH, with the SSH port protected by the container's own iptables (a very minimal number of services started: a "safe haven" mode for recovering a system from trouble).
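The "safe haven" ruleset I had in mind is sketched below (illustrative only, not the exact rules used):

    # drop everything inbound except loopback, established traffic and SSH
    iptables -P INPUT DROP
    iptables -A INPUT -i lo -j ACCEPT
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -p tcp --dport 22 -j ACCEPT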
This small exercise turned very ugly very quickly. I worked very hard, trying every trick and bypass I could think of to collect data. To my dismay, I was unable to get predictable behaviour, or reliable data, out of systemd, even in emergency.service mode. After a while I was forced to face it: systemd would not help me, would not even start the system in a minimal mode. I was not able to get beyond the kernel level with systemd in control; the services that started were a total mess, the container was totally locked up, and no exploitable data was provided. (Quickly: in the noisy and cold server room, the emergency.service console gave us interesting situations such as:
$ systemctl start systemd-journald.service
--> "Unable to comply!" A dependency job for systemd-journald.service failed; see journalctl -xn.
$ journalctl -xn
--> "Unable to comply!" No journal files were found.)
Let's be blunt. From what I have seen: in a perfect world, systemd is obviously a nice gadget; in the real world, systemd is the perfect tool to transform a small problem into a terminal "cascading failure" event.
I sent a private email to Lennart about my 'little concern', giving more details, trying to explain as well as I could, and suggesting solutions (mainly for brainstorming purposes).
Lennart answered quickly, and rejected my "worries" with a wave of his hand. To summarize, his answer was: "systemd can work only as PID1, you are out of spec, we do not support openvz, good luck".
Obviously, he didn't understand that I was NOT trying to run systemd on an OpenVZ kernel but on a plain 3.9.4 kernel, nor was I requesting help to get FC18 running inside the container; rather, I was pointing out systemd's difficulties in coping with "hostile conditions" while on init-process duty. Trouble is, by definition, always 'out of spec'.
The part about "systemd can only work as PID1" increased my concerns by an order of magnitude. I ended up asking myself, 'What part of this puzzle am I missing?' I dug around Google about systemd and was stunned by the results: I found my concerns had already been expressed multiple times, in more talented words than mine, as early as 2010. Since that time, my understanding is that systemd has continuously tried to resolve problems by increasing its complexity and extending its dependencies and its centrality.
This is wrong, very, very wrong. A program as complex as systemd cannot be a mandatory PID1 in an environment as open as UNIX.
We just defined a new oxymoron: "PID1 systemd".
The next paragraph of this email is dedicated to the Red Hat person reading this mailing list as part of their "technology watch" duty.
===-- It is my understanding that EL7 will include systemd as its init process. In systemd's current working state, if it is included within EL7 as a mandatory PID1, we will not deploy EL7 within our server racks. Either we will stay with EL6 or we will move to another distribution (or another OS). Adding a kernel-type program on top of the kernel just moves the big-trouble troubleshooting process from a 4-combination matrix (hardware+kernel) to an 8-combination matrix (hardware+kernel+systemd) that must be worked through before you are able to access and work on the system. Reading around via Google tells me I am not the only one contemplating this very dilemma. --===
BTW, and to go a little beyond the systemd case: since 1991, FC18 is the very first distribution I have NOT been able to install on plain hardware (not speaking about containers here; this is very plain hardware with software-RAID disks. On the same hardware, with the same configuration parameters, EL6.X, Mageia-3 and Slackware-14.0 installs are A-OK).
I am starting to wonder whether we (and this "we" includes dev contributors and myself) could be on the wrong path in the way we implement software in Fedora. To summarize: it is very easy to write code for an open platform, far more difficult to write code that keeps the platform open. (But this is another story, maybe for another time... :-}})
That's all folks... :-} Did I say "Nay" to systemd?
On 07/02/2013 04:08 PM, Jean-Marc Pigeon wrote:
I was not expecting it to work fully on the first attempt in my own container design,
Would you be willing to provide some details about your container design? Ideally including the code to allow others to reproduce the problems you saw.
Have you seen these recommendations?: http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface/
but I was expecting systemd (via systemctl's very detailed status) to give me very good insight into whatever issues occurred.
The real goal was to learn how to use systemd components to diagnose an "in trouble" real system, a kind of flight-simulator exercise, so that we would be ready in the future to make a quick diagnosis if one of our servers in a rack had trouble booting or rebooting with EL7.
Interesting exercise, but I am afraid that by running it in a custom container design, under a host that itself is not using systemd, you uncovered an entirely different class of problems than what can happen when running it on the host.
This small exercise turned very ugly very quickly. I worked very hard, trying every trick and bypass I could think of to collect data. To my dismay, I was unable to get predictable behaviour, or reliable data, out of systemd, even in emergency.service mode. After a while I was forced to face it: systemd would not help me, would not even start the system in a minimal mode. I was not able to get beyond the kernel level with systemd in control; the services that started were a total mess, the container was totally locked up, and no exploitable data was provided.
Not sure how much of it relates to container environments, but have you seen this?: http://freedesktop.org/wiki/Software/systemd/Debugging/
My first goal when debugging issues like this would be to make sure I can see the debugging output of systemd itself (i.e. with log_level set to debug and log_target set to something I can read, probably "console" in the case of a container).
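Concretely, that means booting with something like:

    # on the kernel command line (or the container's init arguments):
    systemd.log_level=debug systemd.log_target=console
    # or, to log to the kernel ring buffer instead:
    systemd.log_level=debug systemd.log_target=kmsg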
(Quickly: in the noisy and cold server room, the emergency.service console gave us interesting situations such as:
$ systemctl start systemd-journald.service
--> "Unable to comply!" A dependency job for systemd-journald.service failed; see journalctl -xn.
This is when logging to "kmsg" (the dmesg buffer) or "console" can really help find out the problem.
I ended up asking myself, 'What part of this puzzle am I missing?' I dug around Google about systemd and was stunned by the results: I found my concerns had already been expressed multiple times, in more talented words than mine, as early as 2010. Since that time, my understanding is that systemd has continuously tried to resolve problems by increasing its complexity and extending its dependencies and its centrality.
This is wrong, very, very wrong. A program as complex as systemd cannot be a mandatory PID1 in an environment as open as UNIX.
From the above paragraphs I get the feeling you may be missing the fact that not all of "systemd" runs in PID1. There are more components in the "systemd" project, such as journald, logind, and others; they run as separate processes. There is some ambiguity when talking about "systemd": sometimes it refers only to the service manager (PID1), and sometimes to the whole suite.
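You can see the split directly on any running systemd machine (illustrative, abridged output; note that ps truncates comm to 15 characters):

    $ ps -eo pid,comm | grep systemd
        1 systemd           <- the service manager, PID 1
      ... systemd-journal   <- journald, a separate process
      ... systemd-logind    <- logind, a separate process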
BTW, and to go a little beyond the systemd case: since 1991, FC18 is the very first distribution I have NOT been able to install on plain hardware
I heard F19 was released today with an improved Anaconda :-)
Michal
Thanks, Michal; your answer was really positive and encouraged me to proceed further.
So I now have FC18 running within a container under an EL6.4 HOST with kernel 3.9.4 (big smile).
Problems started to unlock themselves once I decided to bypass network.service altogether, starting the network and sshd manually (ifup lo; ifup eth0; /usr/sbin/sshd). I was now able to work in a quiet room with multiple screens available, to poke around and catch fast-scrolling log messages. (You should never forget about the poor sysadmin freezing in front of the server-room console when your software is reporting a problem and not able to run :-})
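For the record, the bypass amounts to running, from the container's console:

    ifup lo          # bring up the loopback interface
    ifup eth0        # bring up the primary interface
    /usr/sbin/sshd   # start sshd directly, outside systemd's control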
As expected, the problem came down to a very small detail (within /etc/fstab):
Not working:

    /vzgot    /          ext4    defaults  0 0
    proc      /proc      proc    defaults  0 0
    sysfs     /sys       sysfs   defaults  0 0
    devpts    /dev/pts   devpts  defaults  0 0
    tmpfs     /dev/shm   tmpfs   defaults  0 0
Working:

    #/vzgot   /          ext4    defaults  0 0
    proc      /proc      proc    defaults  0 0
    sysfs     /sys       sysfs   defaults  0 0
    devpts    /dev/pts   devpts  defaults  0 0
    tmpfs     /dev/shm   tmpfs   defaults  0 0
The fact that systemd was not able to cope with this /etc/fstab is quite acceptable (even if upstart and init have no problem with it). The fact that such small trouble drives systemd into an emergency state without reporting it clearly is another question. When the last prominent line before the maintenance-password prompt is something like "Not able to exec /bin/plymouth, <no such file>", you ask yourself what mess you are in. The fact that the line just below says "Please see journal" while the journal is not available (empty) just compounds the effect.
Once I was able to log in via remote SSH in emergency.service mode, I played with different services, trying "--ignore-dependencies", but never got a clear message about what was missing. Success was more a lucky guess than the result of a structured approach.
So, no, sorry, systemd doesn't grade as "production level" (not yet? or never?).
May I propose some ways to improve it:
- The journal should be accessible regardless of systemd's status or trouble.
- When list-dependencies output is displayed, dependencies already running (or not successfully started?) should be marked; think about the poor sysadmin!
- There should be a way to proceed in a 'step by step' boot mode (avoiding the fast-scrolling parallel report).
- On a more philosophical side:
  * Linking PID1 and systemd seems to me a problem (why it is mandatory still escapes me); you are limiting your troubleshooting context (double-check your design).
  * The fact that systemd keeps absorbing more and more functionality in order to work should trigger a loud alarm signal about your design (did I understand today's mail correctly that you can't use logrotate to expire/archive the journal... :/ )
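For what it is worth, journald does have size management of its own in journald.conf (a sketch; see journald.conf(5) for the authoritative options):

    [Journal]
    SystemMaxUse=200M    # cap the persistent journal on disk
    RuntimeMaxUse=50M    # cap the volatile journal in /run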
Bug:
- After a very quick check, there may be a bug in the way systemd handles 'int reboot(int cmd);'. I have the strong feeling systemd is not feeding WTERMSIG(status), but this is very preliminary; I could be wrong...
As you requested, I can provide you with "vzgot", my container application (which flavor/distribution RPM do you want? The src.rpm is available too). While not a fork of LXC, I think vzgot is very close to LXC in the way the container is started; the difference is more about container definition. With vzgot, you just need DNS resolution (for the container's IPs) and a config_list linking a container name to a distribution name, a template name and an architecture. With that data, vzgot is able to create a running container by itself. I tried to make the container setup as lean, simple and flexible as possible.
I had put that project into sleep mode because a problem I reported 3 years ago (a syslog+printk cross-leakage between the HOST and containers) seemed very difficult to address within the kernel. But!... very good news yesterday: the problem is fixed in kernel 3.10.0. Maybe it is time to work on vzgot again?
Quoting Michal Schmidt mschmidt@redhat.com:
On 07/02/2013 04:08 PM, Jean-Marc Pigeon wrote:
I was not expecting it to work fully on the first attempt in my own container design,
Would you be willing to provide some details about your container design? Ideally including the code to allow others to reproduce the problems you saw.
Have you seen these recommendations?: http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface/
but I was expecting systemd (via systemctl's very detailed status) to give me very good insight into whatever issues occurred.
The real goal was to learn how to use systemd components to diagnose an "in trouble" real system, a kind of flight-simulator exercise, so that we would be ready in the future to make a quick diagnosis if one of our servers in a rack had trouble booting or rebooting with EL7.
Interesting exercise, but I am afraid that by running it in a custom container design, under a host that itself is not using systemd, you uncovered an entirely different class of problems than what can happen when running it on the host.
This small exercise turned very ugly very quickly. I worked very hard, trying every trick and bypass I could think of to collect data. To my dismay, I was unable to get predictable behaviour, or reliable data, out of systemd, even in emergency.service mode. After a while I was forced to face it: systemd would not help me, would not even start the system in a minimal mode. I was not able to get beyond the kernel level with systemd in control; the services that started were a total mess, the container was totally locked up, and no exploitable data was provided.
Not sure how much of it relates to container environments, but have you seen this?: http://freedesktop.org/wiki/Software/systemd/Debugging/
My first goal when debugging issues like this would be to make sure I can see the debugging output of systemd itself (i.e. with log_level set to debug and log_target set to something I can read, probably "console" in the case of a container).
(Quickly: in the noisy and cold server room, the emergency.service console gave us interesting situations such as:
$ systemctl start systemd-journald.service
--> "Unable to comply!" A dependency job for systemd-journald.service failed; see journalctl -xn.
This is when logging to "kmsg" (the dmesg buffer) or "console" can really help find out the problem.
I ended up asking myself, 'What part of this puzzle am I missing?' I dug around Google about systemd and was stunned by the results: I found my concerns had already been expressed multiple times, in more talented words than mine, as early as 2010. Since that time, my understanding is that systemd has continuously tried to resolve problems by increasing its complexity and extending its dependencies and its centrality.
This is wrong, very, very wrong. A program as complex as systemd cannot be a mandatory PID1 in an environment as open as UNIX.
From the above paragraphs I get the feeling you may be missing the fact that not all of "systemd" runs in PID1. There are more components in the "systemd" project, such as journald, logind, and others; they run as separate processes. There is some ambiguity when talking about "systemd": sometimes it refers only to the service manager (PID1), and sometimes to the whole suite.
BTW, and to go a little beyond the systemd case: since 1991, FC18 is the very first distribution I have NOT been able to install on plain hardware
I heard F19 was released today with an improved Anaconda :-)
Michal
On Tue, 02.07.13 16:57, Jean-Marc Pigeon (jmp@safe.ca) wrote:
As expected, the problem came down to a very small detail (within /etc/fstab):
Not working:

    /vzgot    /          ext4    defaults  0 0
    proc      /proc      proc    defaults  0 0
    sysfs     /sys       sysfs   defaults  0 0
    devpts    /dev/pts   devpts  defaults  0 0
    tmpfs     /dev/shm   tmpfs   defaults  0 0
Working:

    #/vzgot   /          ext4    defaults  0 0
    proc      /proc      proc    defaults  0 0
    sysfs     /sys       sysfs   defaults  0 0
    devpts    /dev/pts   devpts  defaults  0 0
    tmpfs     /dev/shm   tmpfs   defaults  0 0
Note that in a systemd world fstab shouldn't really list any of the virtual file systems like procfs, sysfs, devpts, /dev/shm, unless you have specific mount options that need to override the defaults. Also, the root file system doesn't need to be listed. It is hence a good idea to leave fstab out entirely if you run things in a container.
The fact that the line just below says "Please see journal" while the journal is not available (empty) just compounds the effect.
How did you access the journal? The journal is actually available pretty much all the time. It logs to /run as long as /var is not there, to make this work (very much unlike classic syslog, btw).
So, no, sorry, systemd doesn't grade as "production level" (not yet? or never?).
Well, as mentioned you altered the most low-level parts of the unit dep tree. So yeah, a setup like that certainly is not "production level", but that's hardly our fault.
May I propose some ways to improve it:
- The journal should be accessible regardless of systemd's status or trouble.
It is. journalctl directly accesses all journal files and starts very early in the boot process, including in the initrd (hint: this is *much* earlier than classic syslog). And for the time even before the journal is around, we log to kmsg (i.e. dmesg).
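Concretely, the volatile journal can be read directly even when journald itself is in trouble:

    # read the early-boot journal stored under /run (before /var is up)
    journalctl -D /run/log/journal
    # from the host, point journalctl at the container's journal directory
    journalctl -D /path/to/container/rootfs/run/log/journal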
- There should be a way to proceed in a 'step by step' boot mode (avoiding the fast-scrolling parallel report).
systemd.confirm_spawn=yes
But disabling the parallelization doesn't really work. If a service foo triggers starting of a service bar while it is starting up, and needs an answer from bar before it can proceed, how do you want to ever solve this? You need to start both foo and bar at the same time.
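A sketch of the kind of situation meant here (unit names hypothetical): foo connects to bar's socket while foo itself is still starting, so the two startups must overlap.

    # bar.socket -- the activation point for bar.service
    [Socket]
    ListenStream=/run/bar.sock

    # foo.service -- talks to /run/bar.sock during its own startup;
    # the first connection is what triggers bar.service to start
    [Unit]
    Requires=bar.socket
    After=bar.socket

    [Service]
    ExecStart=/usr/bin/foo --backend /run/bar.sock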
- On a more philosophical side:
  * Linking PID1 and systemd seems to me a problem (why it is mandatory still escapes me),
systemd is an init system. Init systems run as PID 1. This is how Unix works.
Bug:
- After a very quick check, there may be a bug in the way systemd handles 'int reboot(int cmd);'. I have the strong feeling systemd is not feeding WTERMSIG(status), but this is very preliminary; I could be wrong...
Hmm? I cannot parse this.
Lennart
Quoting Lennart Poettering mzerqung@0pointer.de:
On Tue, 02.07.13 16:57, Jean-Marc Pigeon (jmp@safe.ca) wrote:
As expected the problem stand on a very small detail (within /etc/fstab)
Note that in a systemd world fstab shouldn't really list any of the virtual file systems like procfs, sysfs, devpts, /dev/shm, unless you have specific mount options that need to override the defaults. Also, the root file system doesn't need to be listed. It is hence a good idea to leave fstab out entirely if you run things in a container.
I beg to disagree. As I want to keep the container as close as possible to a pristinely installed distribution, and as /etc/fstab is part of it once installed, fstab must be present in the container.
The fact that the line just below says "Please see journal" while the journal is not available (empty) just compounds the effect.
How did you access the journal? The journal is actually available pretty much all the time. It logs to /run as long as /var is not there, to make this work (very much unlike classic syslog, btw).
I am just reporting an observation: there are conditions where you get "Please see journal" but the journal is empty.
So, no, sorry, systemd doesn't grade as "production level" (not yet? or never?).
Well, as mentioned you altered the most low-level parts of the unit dep tree. So yeah, a setup like that certainly is not "production level", but that's hardly our fault.
This issue was already covered in a previous email.
May I propose some ways to improve it:
- The journal should be accessible regardless of systemd's status or trouble.
It is. journalctl directly accesses all journal files and starts very early in the boot process, including in the initrd (hint: this is *much* earlier than classic syslog). And for the time even before the journal is around, we log to kmsg (i.e. dmesg).
- There should be a way to proceed in a 'step by step' boot mode (avoiding the fast-scrolling parallel report).
systemd.confirm_spawn=yes
I hadn't noticed this; I am not sure it is what I think we need. I need to check.
But disabling the parallelization doesn't really work. If a service foo triggers starting of a service bar while it is starting up, and needs an answer from bar before it can proceed, how do you want to ever solve this? You need to start both foo and bar at the same time.
You need a step-by-step mode to sort out problems: instead of flushing all the data to the console, it gives the sysadmin time to see what is happening (very good when the problem occurs very early in the process). In step-by-step mode, I would LOCK in a situation such as "bar needs foo AND foo needs bar"; that seems to me the definition of a deadly embrace, and prone to subtle "timing issue" problems. systemd should detect such a situation and complain about it. STEP-by-STEP mode is obviously a debug mode.
- On a more philosophical side:
  * Linking PID1 and systemd seems to me a problem (why it is mandatory still escapes me),
systemd is an init system. Init systems run as PID 1. This is how Unix works.
Yes and I agree, But according my understanding of systemd, many function done by systemd do not need to be PID1. In fact complexity and smart action could be moved away from PID1 process keeping PID1 part, lean and simple. Such way, you could start systemd "smart and interesting" part from a shell script (which open a lot flexibility an robustness).
Bug:
- After a very quick check, there may be a bug in the way systemd handles 'int reboot(int cmd);'. I have the strong feeling systemd is not feeding WTERMSIG(status), but this is very preliminary; I could be wrong...
Hmm? I cannot parse this.
OK... when a reboot is issued by the admin within the container, there is a way to advise the container supervisor (the process above systemd): it is done by reporting a signal status. 'Plain init' and 'upstart' do that properly; it seems to me systemd does not... I didn't double-check this, so it is a very preliminary report. (I hope this time you can parse my explanation.)
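For what it is worth, a minimal sketch of the supervisor side in C (not vzgot's actual code), assuming the PID-namespace reboot semantics the kernel provides since 3.4: when the container's init calls reboot(), that init is terminated, and the supervisor's wait status carries SIGHUP for a restart or SIGINT for halt/poweroff:

    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    /* wait for the container's init (child_pid) and decode why it died */
    static void supervise(pid_t child_pid) {
        int status;
        if (waitpid(child_pid, &status, 0) < 0) {
            perror("waitpid");
            return;
        }
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGHUP)
            printf("container requested a reboot\n");
        else if (WIFSIGNALED(status) && WTERMSIG(status) == SIGINT)
            printf("container requested halt/poweroff\n");
        else if (WIFEXITED(status))
            printf("container init exited with status %d\n", WEXITSTATUS(status));
    }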
Many thanks, Lennart.
Lennart
-- Lennart Poettering - Red Hat, Inc.
On Tue, 02.07.13 10:08, Jean-Marc Pigeon (jmp@safe.ca) wrote:
This little project gave satisfactory results with various distributions when I designed and tested it two years ago. First I checked it with a standard EL6.4 template (400 MB) under this new kernel (3.9.4, HOST EL6.4) to see whether my tool was still operational. Everything went perfectly, so I was ready to test FC18. The selected FC18 template is a very standard one (a 939 MB tgz file) which, and this is a key factor, was proven to work fully "as is" in an OpenVZ container (kernel 2.6.32-042stab076.8). "As is" means the template was never tailored for an OpenVZ container (it is used out of the box there) and could be used to seed a working HOST too.
So, as you mentioned earlier, you used this recipe to set up your image:
https://gist.github.com/fabaff/5512671
This recipe is very broken (which I already told you), and by using this, you create an OS image that changes a number of early boot things in a way that will break things if you then try to boot the same image on bare metal.
The things it changes are early-boot things, very low-level stuff. You interfered with much of the most basic OS initialization stuff there (masking sysinit.target!), and if you do this then you really need to know what you are doing. And you should not expect that the same image will then continue to boot on normal hardware.
I understand you are new to systemd, but you chose to alter the lowest-level bits of the OS, and were subsequently lost. This is certainly very understandable, but please accept that this is not our primary use case. We assume that if you touch that kind of low-level stuff, and alter the early-boot dep tree, then you know how to help yourself. The more low-level you go, the more expertise you need.
We are working on making systemd work out-of-the-box on container managers. libvirt-lxc and systemd-nspawn are relatively nicely integrated with systemd so that things just work without any manual recipe. OpenVZ is not something we test against, and we certainly do not test systems that have been modified to work with OpenVZ but then are attempted to be booted on bare-metal hardware again.
From the systemd side, it is our goal to ensure that systemd will work fine on bare metal, inside a VM and in a container, with the exact same image, without any alteration. With nspawn/libvirt-lxc we are very close to that. However, all that work will be incomplete unless Fedora as a whole starts to care about this (for example, by giving the various container managers a role like a "release architecture", to ensure they just work with unmodified future Fedora releases), and unless the various container managers actually start to implement the most basic common interfaces like the ones we documented:
http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface/
So, please work with Fedora and with the container vendor of your choice to make all this stuff just work out-of-the-box. And please don't use recipes like the one you linked; they made things worse, not better. You don't need any recipes at all if your container manager just implements the ContainerInterface mini-spec...
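For reference, the most basic part of that mini-spec is that the container manager identifies itself to the container's PID 1 through the $container environment variable, roughly like this (manager name illustrative):

    # when spawning the container's init:
    exec env container=vzgot /usr/lib/systemd/systemd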
Being able to migrate OS images between VMs, bare metal, containers, in all directions without alteration is absolutely a worthy goal, and not far off, if people actually start caring.
Lennart
The FC18 template is a very standard one (a 939 MB tgz file) which, and this is a key factor, was proven to work fully "as is" in an OpenVZ container (kernel 2.6.32-042stab076.8). "As is" means the template was never tailored for an OpenVZ container (it is used out of the box there) and could be used to seed a working HOST too.
So, as you mentioned earlier, you used this recipe to set up your image:
How many times need I say that I did NOT use that solution (it was a bad solution)? I sent you that reference to tell you that others must certainly have had the problem too, and found what I think is an unsatisfactory solution... Here is the exact wording I used in my personal email to you about my trouble:
"""" 25/06/2013 Via google, I was able to find a way to have fc18 working under LXC. The proposed way was to eviscerate many services unit from systemd (ln -s /dev/null /etc/systemd/system/udev-settle.service, etc..), see https://gist.github.com/fabaff/5512671. Applying such dramatique measure make fc18 container indeed working, but such solution are not good for both systemd and LXC and prove I am not the only one not able to figure out systemd setup needs. """"
It seems at least we agree it was not a good solution. I want to keep the release as pristine as possible within the container (mine is of the IaaS kind; see yesterday's email from James Bottomley about the different meanings of the word 'container').
Being able to migrate OS images between VMs, bare metal, containers, in all directions without alteration is absolutely a worthy goal, and not far off, if people actually start caring.
I fully agree. The problem is not there.
Lennart
-- Lennart Poettering - Red Hat, Inc.