Network availability systemd dependency failure at boot
Sam Varshavchik
mrsam at courier-mta.com
Sat Jul 5 17:00:29 UTC 2014
Ed Greshko writes:
> On 07/05/14 20:13, Sam Varshavchik wrote:
> > So, how should this mess get fixed? Start filing bugs against all these
> > packages, requesting a change to their systemd service file, to state a
> > dependency on network-online.target?
>
> FWIW, I'm running a fully updated F20 system and not seeing any problems for
> httpd and named
Neither did I, until either the last, or the next to last, systemd update.
> I also run with NetworkManager-wait-online.service enabled. There was a
> specific reason I started running with that enabled....don't remember why.
> But, you may want to check that.
The server with dhcp, httpd, named, and privoxy does not have NetworkManager
installed. Both the WAN and the LAN ports are configured as static IPs.
The server with innd installed has NetworkManager, so I could theoretically
enable it there.
http://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ documents an
alternative service, systemd-networkd-wait-online.service, which does not
appear to actually exist anywhere, and is not installed by any package.
The more I dig into the config files, the bigger the clusterfark this
appears to be.
The starting point is the above documentation for network.target and
network-online.target. The above is supposed to be the authoritative
documentation, directly referenced from the man pages. Starting with that, I
look at what network-online.target actually says:
[Unit]
Description=Network is Online
Documentation=man:systemd.special(7)
Documentation=http://www.freedesktop.org/wiki/Software/systemd/NetworkTarget
After=network.target
It doesn't do anything, it's just a symbolic target. That's fine, so the
intent is that stuff that actually needs network connections should declare
"After=network-online.target". Then, whatever system service is responsible
for initializing the static network connections would declare both
"After=network.target" and "Before=network-online.target", so it runs after
basic networking is up. Once it succeeds in initializing the network
connections, it terminates, network-online.target now gets reached, and all
the services that depend on established network connections can now run.
That seems to be the desired strategy.
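As a rough sketch of what such a service could look like for static
interfaces (the unit name and the script path are hypothetical, made up
purely for illustration):

```ini
# static-net-wait.service: hypothetical unit name, for illustration only
[Unit]
Description=Wait for static network interfaces to come up
After=network.target
Before=network-online.target

[Service]
Type=oneshot
# Hypothetical script: assumed to block until the static addresses are configured
ExecStart=/usr/local/sbin/wait-for-static-net
RemainAfterExit=yes

[Install]
WantedBy=network-online.target
```

Because it is a oneshot, systemd does not consider it started until the
script exits, so network-online.target is held back until then.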
Sounds great. This is actually not such a bad plan of action. It might
actually make sense, presuming that all servers that depend on established
network connections would specify "After=network-online.target", and not
"After=network.target", as they do now. Of course, as I discovered, only
kdump.service actually does this. So, this is the first thing that goes off
the rails. But the rest of the train quickly follows:
Now, given the initial design, one would automatically assume that
NetworkManager-wait-online.service would follow the master plan, and specify
"After=network.target" and "Before=network-online.target", putting all the
jigsaw pieces in the correct order. But no, this is what
NetworkManager-wait-online.service actually says:
[Unit]
Description=Network Manager Wait Online
Requisite=NetworkManager.service
After=NetworkManager.service
Wants=network.target
Before=network.target network-online.target
It specifies that it should run /before/ *both* network.target and
network-online.target, rather than after network.target and before
network-online.target.
This really looks like somebody just said "eh, I'm just too lazy to fix all
services that should really be executed after reaching
network-online.target, I'm just going to fix this by executing
NetworkManager-wait-online.service before network.target is reached, and
before all the servers that currently require network.target get forked off".
Brilliant.
So, enabling NetworkManager-wait-online.service is required on servers that
run dhcp, named, httpd, and similar daemons. If it's not enabled, a roll of
the dice will determine whether any of them will come up properly. And I'll
bet none of these RPMs enable it, even though this hack cannot work without
it.
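For comparison, the per-package change I asked about at the top would amount
to something like the following drop-in, sketched here for named. The
drop-in file name is an assumption. Wants= is included because After= only
orders units and does not, by itself, pull network-online.target into the
boot transaction.

```ini
# /etc/systemd/system/named.service.d/network-online.conf
# (hypothetical file name, for illustration only)
[Unit]
# Pull network-online.target into the boot transaction...
Wants=network-online.target
# ...and do not start named until that target has been reached
After=network-online.target
```

This only helps if something is actually ordered before
network-online.target and blocks until the network is up.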
And, if NetworkManager is not enabled, with all network interfaces being
initialized to static IPs in /etc/sysconfig/network-scripts, I don't see a
way to get this right. It may or may not work, depending on the order
systemd chooses to execute scripts, and how long they take. Even the kernel
version could be a factor – how long the kernel takes to initialize each
network interface.
And the documented alternative, "systemd-networkd-wait-online.service", is
still nowhere to be found. yum whatprovides comes up empty.
It should be fun watching all of this implode from the sidelines, as all
servers running DHCP and httpd get updated to RHEL 7. Some of them will be
fine. Some of them will randomly fail to come up fully. As for those that do
manage to work initially, at some point a systemd update, or a kernel
update, will subtly change the order in which stuff gets forked off from
systemd, and suddenly break them.
Lots of fun.