Is this proof that systemd is completely broken?

Sam Varshavchik mrsam at courier-mta.com
Sat Jul 12 14:00:45 UTC 2014


Now that I have your attention, here's the background. This is a server
whose network interfaces are all statically configured via
/etc/sysconfig/network-scripts. NetworkManager is not installed.

The server is regularly updated to current Fedora packages. For the last
month or so, it has reliably failed to come up in a sane state. Once it
responds to pings, I ssh in and examine the aftermath, and the logs are
consistent: the boot log of every network service (named-chroot, httpd,
dhcpd, and privoxy) claims that no network interfaces were up when that
service started.

After finally getting pissed about having to manually re-brain the server
each time it boots, I attached a console monitor and observed that the boot
goes /very/ quickly: the console login prompt comes up about 20-30 seconds
before the server even starts responding to pings. It looks like the
multi-user target is reached long before networking even comes up.

Last week, I commented on the following curiosity: after sifting through
systemd's documentation, I found that it claims "network.target" is reached
only after basic networking is up, and "network-online.target" is reached
only after all network interfaces are initialized.

Problem number one is that all the network services' units specify
"After=network.target" when, as I interpret this, they should really specify
"After=network-online.target".

After that, I learned that an optional NetworkManager subpackage installs a
service that waits for network interfaces to come up, and that service is
declared with "Before=network.target network-online.target". It seems fairly
obvious to me that it should really be "After=network.target" and
"Before=network-online.target", with every other service that requires a
functioning network specifying "After=network-online.target". That
arrangement makes logical sense to me, but the current, confusing one
apparently makes logical sense to someone else, so, whatever. I do not have
NetworkManager installed, but I figured: why not whip up a dirty hack that
does basically the same thing?

The unexpected result of the hack is that it seems to provide solid proof
that systemd's dependency resolution is not working. Before I Bugzilla this
(however little hope one might have of getting anything useful done that
way), I'd like to hear some consensus that I am interpreting the following
data correctly. Who knows, I might actually have made a mistake somewhere.

Let's take a look at what named-chroot.service says:

[Unit]
Description=Berkeley Internet Name Domain (DNS)
Wants=nss-lookup.target
Before=nss-lookup.target
After=network.target

Are we all in agreement that named-chroot.service should only be started  
after network.target gets reached? Ok.

Now, here's my hack, which is basically a clone of the service from that
NetworkManager subpackage:

# cat /etc/systemd/system/wait-for-network.service
[Unit]
Description=Wait for network ports to be initialized
Before=network.target network-online.target

[Service]
Type=oneshot
ExecStart=/root/bin/wait-for-network

[Install]
WantedBy=multi-user.target
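
The script behind ExecStart= is not included in this post. As a purely
hypothetical stand-in (an assumption, not the actual /root/bin/wait-for-network),
the Python sketch below polls the IFF_UP bit in each
/sys/class/net/<iface>/flags file every few seconds and prints reports in the
same format as the script output quoted later. The interface names come from
that output; the interval and timeout defaults are made up. Note that "up"
here means administratively up, which is not the same as having carrier.

```python
import os
import time

IFF_UP = 0x1  # "interface is up" bit from <linux/if.h>


def read_states(sysfs_net="/sys/class/net"):
    """Map each interface name to True if its IFF_UP flag is set.

    Each /sys/class/net/<iface>/flags file holds a hex string such as 0x1003.
    Entries without a flags file (e.g. bonding_masters) are skipped.
    """
    states = {}
    for iface in sorted(os.listdir(sysfs_net)):
        path = os.path.join(sysfs_net, iface, "flags")
        if not os.path.isfile(path):
            continue
        with open(path) as f:
            states[iface] = bool(int(f.read().strip(), 16) & IFF_UP)
    return states


def report(states):
    """Format states like the "Interface: X is up/down" lines in the post."""
    return [f"Interface: {name} is {'up' if up else 'down'}"
            for name, up in sorted(states.items())]


def wait_for_network(expected, sysfs_net="/sys/class/net",
                     interval=5, timeout=120):
    """Poll until every interface named in `expected` exists and is up.

    Returns True on success, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        states = read_states(sysfs_net)
        print(time.ctime())  # e.g. "Sat Jul 12 09:24:27 2014"
        for line in report(states):
            print(line)
        # Interfaces that do not exist yet count as "not up".
        if all(states.get(name, False) for name in expected):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

To serve as a oneshot ExecStart= target, it would need a #!/usr/bin/python3
shebang and a final `sys.exit(0 if wait_for_network(["lan0", "wan0"]) else 1)`.
It takes an explicit list of expected interfaces because, in the first report
of the quoted output, only lo existed; a check over whatever interfaces happen
to be present would have declared success immediately.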

Are we all in agreement that:

1) This is a oneshot service, so according to systemd's documentation,
systemd must wait until this script completes before the service is
considered started.

2) Until it completes, network.target isn't reached.

3) Therefore, systemd must not start named-chroot.service until this script
finishes.
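
Points 1) to 3) rely on ordering being transitive: wait-for-network.service
before network.target, and network.target before named-chroot.service. A toy
Python model of that reading (an illustration of the interpretation above, not
of systemd's actual transaction logic, which also has to account for things
like Wants=/Requires= and default dependencies) turns the quoted
Before=/After= lines into a graph and sorts it with the standard library's
graphlib.TopologicalSorter (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Only the ordering directives quoted in this post.
units = {
    "named-chroot.service": {"After": ["network.target"],
                             "Before": ["nss-lookup.target"]},
    "wait-for-network.service": {"Before": ["network.target",
                                            "network-online.target"]},
}


def ordering_graph(units):
    """Map each unit to the set of units that must be started before it.

    After=X on unit U makes X a predecessor of U;
    Before=X on unit U makes U a predecessor of X.
    """
    graph = {}
    for unit, deps in units.items():
        graph.setdefault(unit, set())
        for other in deps.get("After", []):
            graph[unit].add(other)
            graph.setdefault(other, set())
        for other in deps.get("Before", []):
            graph.setdefault(other, set()).add(unit)
    return graph


# static_order() yields every unit after all of its predecessors.
order = list(TopologicalSorter(ordering_graph(units)).static_order())
for unit in order:
    print(unit)
```

Any order this produces places wait-for-network.service ahead of
network.target, and network.target ahead of named-chroot.service, which is
exactly the chain the three points above describe.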

Yet after testing this script and then enabling it, the server still came up
utterly brainless after a reboot. The results:

systemctl status named-chroot.service reports:

named-chroot.service - Berkeley Internet Name Domain (DNS)
   Loaded: loaded (/usr/lib/systemd/system/named-chroot.service; enabled)
   Active: active (running) since Sat 2014-07-12 09:24:29 EDT; 3min 28s ago
…

So, systemd started named-chroot.service at 09:24:29.

My script logs the current timestamp. The output from
/root/bin/wait-for-network was as follows:

Sat Jul 12 09:24:27 2014
Interface: lo is up
Sat Jul 12 09:24:32 2014
Interface: lan0 is up
Interface: lo is up
Interface: wan0 is down
Sat Jul 12 09:24:37 2014
Interface: lan0 is up
Interface: lo is up
Interface: wan0 is up

systemd started this script at 09:24:27, and it spun its wheels until
09:24:37, when all network interfaces finally came up. I'm happy to post the
contents of this short script, but I don't think they're relevant here: the
problem is that the script was still running when systemd decided to start
named-chroot.service at 09:24:29, which, according to the above, should not
happen.

So either I'm misreading the description of "oneshot" in systemd.service(5)
and of "Before" and "After" in systemd.unit(5), or systemd is completely
broken. I think my understanding of systemd's documentation is very
reasonable. So either systemd is broken, or, if it is working as intended,
its documentation is crap and impossible to follow. I see no other
possibilities.




More information about the users mailing list