Is this proof that systemd is completely broken?
Sam Varshavchik
mrsam at courier-mta.com
Sat Jul 12 14:00:45 UTC 2014
Now that I have your attention, the background is as follows. This is a
server with only statically configured network interfaces, all set up via
/etc/sysconfig/network-scripts. NetworkManager is not installed.
The server is regularly updated to current Fedora packages. For the last
month or so, it has failed to reliably come up in a sane state. Once it
responds to pings and I ssh in to examine the aftermath, the boot logs of
all the network services (named-chroot, httpd, dhcpd, and privoxy) are
consistent: each one claims that no network interfaces were up at the time
it was started.
After finally getting pissed about having to manually re-brain the server
each time it boots, I attached a console monitor and observed that the boot
goes /very/ quickly, and the console login prompt comes up about 20-30
seconds before the server even starts responding to pings. It looks like the
multi-user target is reached long before networking even comes up.
Last week, I commented on the following curiosity: after sifting through
systemd's documentation, I found that it claims that "network.target" is
reached only after basic networking is up, and "network-online.target" is
reached only after all network interfaces are initialized.
Problem number one is that all servers specify "After=network.target", when,
according to how I interpret this, they should all really specify
"After=network-online.target".
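To make that concrete, here is a minimal sketch of a drop-in that would move
one unit, say named-chroot.service, behind network-online.target. The helper
name write_network_online_dropin and the file name network-online.conf are
made up for illustration. The Wants= line is there because After= only
orders units and does not by itself pull network-online.target into the
boot. Whether this cures the problem on this particular server is untested.

```shell
#!/bin/sh
# Sketch only: write a drop-in that orders a unit after network-online.target.
# write_network_online_dropin is a hypothetical helper, not part of systemd.
write_network_online_dropin() {
    unit="$1"                          # e.g. named-chroot.service
    base="${2:-/etc/systemd/system}"   # pass another directory for a dry run
    dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    # Drop-ins in <unit>.d/*.conf are merged into the packaged unit file.
    cat > "$dir/network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
    # Print the path that was written so the caller can inspect it.
    printf '%s\n' "$dir/network-online.conf"
}
```

On a real system this would be followed by "systemctl daemon-reload", e.g.
"write_network_online_dropin named-chroot.service && systemctl daemon-reload".
It only helps if something that actually waits for the interfaces is ordered
before network-online.target.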
After that, it came to my attention that there's a NetworkManager optional
subpackage that installs a service that waits for network interfaces to come
up, and it's specified as "Before=network.target network-online.target". It
seems fairly obvious to me that it should really be
"Before=network-online.target" and "After=network.target", with all other
services that require a functioning network specifying
"After=network-online.target". That made logical sense to me, but it seems
that this confusing arrangement makes logical sense to someone else, so,
whatever. I do not have NetworkManager installed, but, I figure, why not
take a crack at whipping up a dirty hack that basically does the same thing?
But the unexpected result of the hack is that it seems to provide solid
proof that systemd's dependency resolution is not working. Before I
Bugzilla this (as little hope as one might have of getting anything useful
done by Bugzillaing this), I'd like to hear some consensus that I am
interpreting the following data right. Who knows, I might actually have made
a mistake somewhere.
Let's take a look at what named-chroot.service says:
[Unit]
Description=Berkeley Internet Name Domain (DNS)
Wants=nss-lookup.target
Before=nss-lookup.target
After=network.target
Are we all in agreement that named-chroot.service should only be started
after network.target gets reached? Ok.
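One way to check this against what systemd itself has loaded, rather than
against the unit file on disk, is to look at the unit's effective After=
list. A rough sketch, assuming the "After=unit1 unit2 ..." output format of
"systemctl show -p After"; ordered_after is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only. ordered_after is a hypothetical helper, not a systemd tool.
# It reads one "After=unit1 unit2 ..." line on stdin, the format printed by
# `systemctl show -p After UNIT`, and succeeds if unit $1 is in that list.
ordered_after() {
    # Split on '=' and spaces so every unit name lands on its own line,
    # then look for an exact, literal match.
    tr ' =' '\n\n' | grep -qxF -- "$1"
}

# Canned input so the sketch runs without systemd; on a live system, pipe in
# the output of: systemctl show -p After named-chroot.service
if echo "After=basic.target network.target" | ordered_after network.target; then
    echo "ordered after network.target"
fi
```

On the server itself the check would read
"systemctl show -p After named-chroot.service | ordered_after network.target".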
Now, here's my hack, which is basically a clone of that NetworkManager
subpackage:
# cat /etc/systemd/system/wait-for-network.service
[Unit]
Description=Wait for network ports to be initialized
Before=network.target network-online.target
[Service]
Type=oneshot
ExecStart=/root/bin/wait-for-network
[Install]
WantedBy=multi-user.target
Are we all in agreement that:
1) This is a one-shot service, and according to systemd's documentation,
systemd must wait until this script is complete, before it's considered
started.
2) Until it's complete, network.target isn't reached.
3) Therefore, this script must finish before systemd should start
named-chroot.service.
Yet, after testing this script, then activating it, the server still came up
utterly brainless after the reboot. The results:
systemctl status named-chroot.service reports:
named-chroot.service - Berkeley Internet Name Domain (DNS)
Loaded: loaded (/usr/lib/systemd/system/named-chroot.service; enabled)
Active: active (running) since Sat 2014-07-12 09:24:29 EDT; 3min 28s ago
…
So, systemd started named-chroot.service at 09:24:29.
My script logs the current timestamp. The output from
/root/bin/wait-for-network was as follows:
Sat Jul 12 09:24:27 2014
Interface: lo is up
Sat Jul 12 09:24:32 2014
Interface: lan0 is up
Interface: lo is up
Interface: wan0 is down
Sat Jul 12 09:24:37 2014
Interface: lan0 is up
Interface: lo is up
Interface: wan0 is up
systemd started this script at 09:24:27. This script spun its wheels until
09:24:37, at which time all network interfaces finally came up. I'm happy to
post the contents of this short script; however I don't think that it's
relevant here, because the problem is that this script was running when
systemd decided to run named-chroot.service, even though, according to the
above, this should not happen.
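Spelled out as arithmetic, using only the three timestamps quoted above (the
helper names to_seconds and started_during are made up for this sketch, and
it assumes all three times fall on the same day):

```shell
#!/bin/sh
# Sketch only: did the service start while the oneshot script was running?

# Convert HH:MM:SS to seconds since midnight.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# started_during SERVICE_START SCRIPT_START SCRIPT_END
# Succeeds if SERVICE_START lies in [SCRIPT_START, SCRIPT_END).
started_during() {
    svc=$(to_seconds "$1")
    begin=$(to_seconds "$2")
    end=$(to_seconds "$3")
    [ "$svc" -ge "$begin" ] && [ "$svc" -lt "$end" ]
}

# named-chroot at 09:24:29; wait-for-network from 09:24:27 to 09:24:37.
if started_during 09:24:29 09:24:27 09:24:37; then
    echo "named-chroot.service started while wait-for-network was still running"
else
    echo "named-chroot.service started outside the wait-for-network run"
fi
# prints: named-chroot.service started while wait-for-network was still running
```

Timestamps with sub-second precision for both units can be pulled from the
journal with "journalctl -b -o short-precise -u named-chroot.service -u
wait-for-network.service", which avoids relying on the script's own clock
output.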
So, either I'm misreading the description of "oneshot" in
systemd.service(5) and of "Before" and "After" in systemd.unit(5), or systemd
is completely broken. I think that my understanding of systemd's
documentation is very reasonable. So, either systemd is broken, or, if it's
supposedly working how it should be working, its documentation is crap and
impossible to follow. I see no other possibilities.