Network availability systemd dependency failure at boot

Sam Varshavchik mrsam at courier-mta.com
Sat Jul 5 17:00:29 UTC 2014


Ed Greshko writes:

> On 07/05/14 20:13, Sam Varshavchik wrote:
> > So, how should this mess get fixed? Start filing bugs against all these
> > packages, requesting a change to their systemd service file, to state a
> > dependency on network-online.target?
>
> FWIW, I'm running a fully updated F20 system and not seeing any problems for  
> httpd and named

Neither did I, until either the last, or the next to last, systemd update.

> I also run with NetworkManager-wait-online.service enabled.  There was a  
> specific reason I started running with that enabled....don't remember why.   
> But, you may want to check that.

The server with dhcp, httpd, named, and privoxy does not have NetworkManager  
installed. Both the WAN and the LAN ports are configured as static IPs.

The server with innd installed has NetworkManager, so I could theoretically  
enable it there.

http://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ documents an
alternative service, systemd-networkd-wait-online.service, which does not
appear to actually exist anywhere, and is not installed by any package.

The more I dig into the config files, the bigger the clusterfark this  
appears to be.

The starting point is the above documentation for network.target and
network-online.target. The above is supposed to be the authoritative documentation,
directly referenced from the man pages. Starting with that, I look at what  
network-online.target actually says:

[Unit]
Description=Network is Online
Documentation=man:systemd.special(7)
Documentation=http://www.freedesktop.org/wiki/Software/systemd/NetworkTarget
After=network.target

It doesn't do anything, it's just a symbolic target. That's fine, so the
intent is that stuff that actually needs network connections should declare
"After=network-online.target". Then, whatever system service is responsible  
for initializing the static network connections would declare both  
"After=network.target" and "Before=network-online.target", so it runs after  
basic networking is up. Once it succeeds in initializing the network  
connections, it terminates, network-online.target now gets reached, and all  
the services that depend on established network connections can now run.  
That seems to be the desired strategy.
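
As a rough sketch of the consumer side of that strategy, a service that
needs working network connections could pick up the ordering through a
drop-in, without editing the packaged unit file. The drop-in path and file
name below are made up for illustration, and httpd.service is used only as
an example. The Wants= line is there because After= alone only orders units
and does not pull network-online.target into the boot transaction.

```ini
# Hypothetical drop-in: /etc/systemd/system/httpd.service.d/network-online.conf
# (the file name is an assumption; any *.conf file in that directory is read)
[Unit]
# Pull network-online.target into the transaction...
Wants=network-online.target
# ...and do not start this service until that target has been reached.
After=network-online.target
```

A "systemctl daemon-reload" is needed before the drop-in takes effect. It
only delays the service if something is actually ordered
Before=network-online.target; otherwise the target is reached right after
network.target.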

Sounds great. This is actually not such a bad plan of action. It might
actually make sense, presuming that all servers that depend on established  
network connections would specify "After=network-online.target", and not  
"After=network.target", as they do now. Of course, as I discovered, only  
kdump.service actually does this. So, this is the first thing that goes off  
the rails. But the rest of the train quickly follows:

Now, given the initial design, one would automatically assume that  
NetworkManager-wait-online.service would follow the master plan, and specify  
"After=network.target" and "Before=network-online.target", putting all the  
jigsaw pieces in the correct order. But no, this is what
NetworkManager-wait-online.service actually says:

[Unit]
Description=Network Manager Wait Online
Requisite=NetworkManager.service
After=NetworkManager.service
Wants=network.target
Before=network.target network-online.target

It specifies that it should be reached /before/ *both* network.target and
network-online.target, rather than after network.target and before
network-online.target.
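
For comparison, a [Unit] section that follows the ordering described above
might read roughly as follows. This is only a sketch of the expected
dependencies, not what any package ships; apart from the After= and Before=
lines, everything is carried over from the unit quoted above.

```ini
[Unit]
Description=Network Manager Wait Online
Requisite=NetworkManager.service
Wants=network.target
# Start only once NetworkManager is running and network.target is reached...
After=NetworkManager.service network.target
# ...and hold back network-online.target until the wait has finished.
Before=network-online.target
```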

This really looks like somebody just said "eh, I'm just too lazy to fix all
services that should really be executed after reaching
network-online.target, I'm just going to fix this by executing
NetworkManager-wait-online.service before network.target is reached, and
before all the servers that currently require network.target get forked off".

Brilliant.

So, enabling NetworkManager-wait-online.service is required on servers that  
run dhcp, named, httpd, and other servers. If it's not enabled, a roll of  
the dice will determine whether any of them will come up properly. And I'll  
bet none of these RPMs enable it, which is needed for this hack to work.  
And, if NetworkManager is not enabled, with all network interfaces being  
initialized to static IPs in /etc/sysconfig/network-scripts, I don't see a  
way to get this right. It may or may not work, depending on the order  
systemd chooses to execute scripts, and how long they take. Even the kernel  
version could be a factor – how long the kernel takes to initialize each  
network interface.
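
For the static-IP case, one conceivable stopgap is a local oneshot unit that
blocks until the addresses are configured, placed where the ordering
described earlier puts such a service. Everything in this sketch is
hypothetical: the unit name, and the /usr/local/sbin/wait-for-static-ip
script, which would have to be written locally. It also only helps services
that order themselves After=network-online.target, which almost none
currently do.

```ini
# Hypothetical unit: /etc/systemd/system/wait-for-static-ip.service
[Unit]
Description=Wait for statically configured interfaces (local sketch)
After=network.target
Before=network-online.target

[Service]
Type=oneshot
# Assumed local script that exits 0 once the expected addresses are up.
ExecStart=/usr/local/sbin/wait-for-static-ip
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```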

And the documented alternative, "systemd-networkd-wait-online.service", is  
still nowhere to be found. yum whatprovides comes up empty.

It should be fun watching all of this implode from the sidelines, as all  
servers running DHCP and httpd get updated to RHEL 7. Some of them will be  
fine. Some of them will randomly fail to come up fully. Those that do manage
to work initially will suddenly break at some later point, when a systemd
update, or a kernel update, subtly changes the order in which stuff gets
forked off from systemd.

Lots of fun.
