Wiki -> https://fedoraproject.org/wiki/Changes/Enable_IPv4_Address_Conflict_Detectio...
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.
== Summary ==
Enable IPv4 Address Conflict Detection by default in NetworkManager.
== Owner ==
* Name: [[User:bengal| Beniamino Galvani]], [[User:ihuguet| Íñigo Huguet ]]
* Email: bgalvani@redhat.com, ihuguet@redhat.com
== Detailed Description ==
Duplicate IPv4 addresses on the same physical network are a common source of networking issues, and they are hard for users to diagnose.
To the rescue comes [https://www.rfc-editor.org/rfc/rfc5227 RFC 5227] (“IPv4 Address Conflict Detection”) which provides a mechanism to detect address conflicts. A host implementing Address Conflict Detection (from now on “ACD”) sends ARP probes for each IP address it wants to use; if another host replies, the address is already in use and can’t be configured on the interface.
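To make the probe format concrete, here is a minimal Python sketch that builds the ARP payload of a probe as RFC 5227 describes it: an ARP request whose sender IP is all zeroes and whose target IP is the address being checked. The function name `build_arp_probe`, the MAC address and the IP address are illustrative assumptions. The sketch only constructs the bytes and does not send anything, and it is not how NetworkManager implements probing.

```python
import socket
import struct

ARP_HTYPE_ETHERNET = 1      # hardware type: Ethernet
ARP_PTYPE_IPV4 = 0x0800     # protocol type: IPv4
ARP_OP_REQUEST = 1          # opcode: request


def build_arp_probe(sender_mac: bytes, candidate_ip: str) -> bytes:
    """Return the 28-byte ARP payload of a probe for candidate_ip.

    Hypothetical helper for illustration only.
    """
    if len(sender_mac) != 6:
        raise ValueError("sender_mac must be exactly 6 bytes")
    return struct.pack(
        "!HHBBH6s4s6s4s",
        ARP_HTYPE_ETHERNET,
        ARP_PTYPE_IPV4,
        6,                               # hardware address length
        4,                               # protocol address length
        ARP_OP_REQUEST,
        sender_mac,                      # sender hardware address
        b"\x00" * 4,                     # sender IP: all zeroes in a probe
        b"\x00" * 6,                     # target hardware address: zeroes
        socket.inet_aton(candidate_ip),  # target IP: the address being probed
    )


# Example values (made up): a locally administered MAC and a private address.
probe = build_arp_probe(bytes.fromhex("525400123456"), "192.168.1.10")
print(len(probe), probe.hex())
```

The all-zero sender IP is what distinguishes a probe from an ordinary ARP request: it lets the host ask about an address without polluting other hosts' ARP caches with a claim it has not yet verified.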
Note that this mechanism applies to both static and DHCP addresses. It might seem unnecessary for DHCP, as a well-behaving server should give out unique leases; however, there could be hosts on the network not using DHCP. Indeed, [https://www.rfc-editor.org/rfc/rfc2131 RFC 2131] (Dynamic Host Configuration Protocol) specifies that the client should probe the newly received address and should send a DHCPDECLINE to the DHCP server if the address is already in use.
In Fedora 39, ACD is disabled by default; it can be enabled by setting the property “ipv4.dad-timeout” to a positive value in a connection profile. The property name contains “DAD”, which stands for “duplicate address detection” and is another name for ACD. The property specifies the maximum timeout in milliseconds used to check for the presence of duplicate IP addresses on the network. If a duplicate is found, a warning is logged; in the DHCP case, NetworkManager tries to get a different lease, while in the static case, the address is just skipped.
This change aims at enabling ACD by default in Fedora 40, by setting the default value to 3000ms. Note that this change is only about IPv4; IPv6 always performs a duplicate check for each address that is configured, as specified by RFC 4862.
== Benefit to Fedora ==
NetworkManager will not configure IPv4 addresses that are detected as duplicates. This will save users from having to debug obscure connectivity issues. Instead, NetworkManager will report an error and indicate the MAC address of the conflicting host.
== Scope ==
* Proposal owners: change the default value and verify that the upstream test suite shows no regressions.
* Other developers: N/A (not needed for this Change)
* Release engineering: [https://pagure.io/releng/issues #Releng issue number]
* Policies and guidelines: N/A (not needed for this Change)
* Trademark approval: N/A (not needed for this Change)
== Upgrade/compatibility impact ==
The change in default behavior will affect all users who install or upgrade to the new Fedora release.
== How To Test ==
To test the effect of the change on F39, add the following configuration snippet to the file `/etc/NetworkManager/conf.d/20-ipv4-dad.conf` and then restart the NetworkManager service:
[connection-dad-default]
ipv4.dad-timeout=3000
To trigger a conflict, configure the local machine with a static address that is already in use by another host. Bringing up the connection will then fail and report an address conflict.
== User Experience ==
Enabling ACD will cause an additional delay when bringing up interfaces, because NetworkManager first needs to probe the address. The delay is between 1.5 and 3 seconds, because RFC 5227 requires the probe interval to be randomized. The delay will affect both static and DHCP connections.
In case users want to avoid this delay, ACD can be disabled for the specific connection profile by setting property `ipv4.dad-timeout=0`, or globally by adding the following configuration snippet to `/etc/NetworkManager/conf.d/20-ipv4-dad.conf`:
[connection-dad-default]
ipv4.dad-timeout=0
In exchange for this small delay, users will be able to discover a potential conflict immediately. If the address is static, the activation will fail and report an error. For DHCP, NetworkManager will send a DHCPDECLINE message to the server and try to get a different lease. In all cases, the conflicting address will be skipped and the network will not be brought into an inconsistent state.
== Dependencies ==
N/A
== Contingency Plan ==
* Contingency mechanism: Revert the change and try again in the next Fedora release.
* Contingency deadline: Beta freeze
* Blocks release? No
== Documentation ==
The “nm-settings” man page will indicate the new default value. No other documentation changes are required.
== Release Notes ==
The change needs to be mentioned in the release notes.
Once upon a time, Aoife Moloney amoloney@redhat.com said:
> Enable IPv4 Address Conflict Detection by default in NetworkManager.
Huh, I didn't realize NM didn't already do this... ye olde network-scripts did.
> To the rescue comes [https://www.rfc-editor.org/rfc/rfc5227 RFC 5227] (“IPv4 Address Conflict Detection”) which provides a mechanism to detect address conflicts. A host implementing Address Conflict Detection (from now on “ACD”) sends ARP probes for each IP address it wants to use; if another host replies, the address is already in use and can’t be configured on the interface.
How does NM handle a duplicate address if there are multiple addresses configured on the interface? Does it continue with the non-dupe addresses or deconfigure the whole interface?
When there are multiple addresses configured, does NM run DAD in series or parallel?
> This change aims at enabling ACD by default in Fedora 40, by setting the default value to 3000ms.
3 seconds seems kind of high (IIRC network-scripts used 1 second).
On Wed, Dec 20, 2023 at 01:51:01PM -0600, Chris Adams wrote:
> Once upon a time, Aoife Moloney amoloney@redhat.com said:
>> Enable IPv4 Address Conflict Detection by default in NetworkManager.
> Huh, I didn't realize NM didn't already do this... ye olde network-scripts did.
>> To the rescue comes [https://www.rfc-editor.org/rfc/rfc5227 RFC 5227] (“IPv4 Address Conflict Detection”) which provides a mechanism to detect address conflicts. A host implementing Address Conflict Detection (from now on “ACD”) sends ARP probes for each IP address it wants to use; if another host replies, the address is already in use and can’t be configured on the interface.
> How does NM handle a duplicate address if there are multiple addresses configured on the interface? Does it continue with the non-dupe addresses or deconfigure the whole interface?
It continues with only the non-duplicate addresses. A warning will be visible in the journal indicating which address(es) failed ACD and the MAC address of the conflicting host(s).
If all the IPv4 addresses are found to be duplicates, the IPv4 address family fails. Normally, NetworkManager also tries IPv6, but that depends on other connection parameters such as 'ipv6.method' and 'ipv4.may-fail'.
> When there are multiple addresses configured, does NM run DAD in series or parallel?
The probe is done in parallel for all addresses at the same time.
>> This change aims at enabling ACD by default in Fedora 40, by setting the default value to 3000ms.
> 3 seconds seems kind of high (IIRC network-scripts used 1 second).
network-scripts do [1]:
/sbin/arping -c 2 -w ${ARPING_WAIT:-3} -D -I ${REALDEVICE} ${ipaddr[$idx]}
which waits 2 seconds by default.
In the original RFC, the duration of the ACD process is between 4 and 7 seconds (depending on randomization), which is clearly too long on modern hardware.
In the Fedora change proposal, the default ACD interval in NM is set to up to 3 seconds and is subject to the same randomization; in practice it would be between ~1.7 and 3 seconds. Perhaps that's still too much, and we can safely decrease it to e.g. 1 second max to reduce the activation delay.
Beniamino
[1] https://github.com/fedora-sysv/initscripts/blob/10.19/network-scripts/ifup-e...
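The "4 to 7 seconds" and "~1.7 to 3 seconds" figures above can be checked with a short calculation. The sketch below uses the probing constants from RFC 5227 and assumes, purely for illustration, that a shorter timeout shrinks every interval by the same factor so that the longest possible run equals the configured timeout. The function names are hypothetical and the code is not taken from NetworkManager.

```python
# RFC 5227 probing constants, in seconds.
PROBE_WAIT = 1.0     # upper bound of the initial random delay
PROBE_NUM = 3        # number of probes sent
PROBE_MIN = 1.0      # minimum gap between two probes
PROBE_MAX = 2.0      # maximum gap between two probes
ANNOUNCE_WAIT = 2.0  # wait after the last probe before using the address


def acd_duration_bounds(scale: float = 1.0) -> tuple[float, float]:
    """Shortest and longest possible probing time, multiplied by scale."""
    gaps = PROBE_NUM - 1
    shortest = 0.0 + gaps * PROBE_MIN + ANNOUNCE_WAIT
    longest = PROBE_WAIT + gaps * PROBE_MAX + ANNOUNCE_WAIT
    return shortest * scale, longest * scale


def bounds_for_timeout(timeout_ms: int) -> tuple[float, float]:
    """Bounds when the longest run is squeezed into timeout_ms.

    Assumption: linear scaling of all intervals (not verified against NM).
    """
    _, rfc_longest = acd_duration_bounds()
    return acd_duration_bounds(scale=(timeout_ms / 1000.0) / rfc_longest)


print(acd_duration_bounds())  # (4.0, 7.0)
print(tuple(round(x, 2) for x in bounds_for_timeout(3000)))  # (1.71, 3.0)
```

Under the same assumption, lowering the timeout to 1000 ms would give a delay of roughly 0.57 to 1 second.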
Once upon a time, Beniamino Galvani bgalvani@redhat.com said:
> network-scripts do [1]:
> /sbin/arping -c 2 -w ${ARPING_WAIT:-3} -D -I ${REALDEVICE} ${ipaddr[$idx]}
> which waits 2 seconds by default.
Ahh, sorry, that's what I get for depending on memory. :)
> In the original RFC, the duration of the ACD process is between 4 and 7 seconds (depending on randomization), which is clearly too long on modern hardware.
Definitely agree.
> In the Fedora change proposal, the default ACD interval in NM is set to up to 3 seconds and is subject to the same randomization; in practice it would be between ~1.7 and 3 seconds. Perhaps that's still too much, and we can safely decrease it to e.g. 1 second max to reduce the activation delay.
Yeah, I think sending 2-3 requests separated by maybe 0.2 seconds, and waiting another 0.2 seconds for a reply (so a total of 0.8 seconds) is sufficient for modern networks. A number of DHCP servers do a ping before issue as well (although there's no good way for a DHCP client to tell), so it's just adding to the amount of time before the network becomes usable.
DAD/ACD is a good thing, I'd just like to see the impact minimized. The time taken at boot is not a big deal (as users have to log in and start applications and such), but the time taken on resume from sleep is more noticeable (open the notebook lid, unlock, then... wait).
Thinking about servers... this would happen before network-online.target is triggered, right? Any services that try to bind to configured IPs or the like need to still work.
On Thu, Dec 21, 2023 at 10:35:45AM -0600, Chris Adams wrote:
> Once upon a time, Beniamino Galvani bgalvani@redhat.com said:
>> network-scripts do [1]:
>> /sbin/arping -c 2 -w ${ARPING_WAIT:-3} -D -I ${REALDEVICE} ${ipaddr[$idx]}
>> which waits 2 seconds by default.
> Ahh, sorry, that's what I get for depending on memory. :)
>> In the original RFC, the duration of the ACD process is between 4 and 7 seconds (depending on randomization), which is clearly too long on modern hardware.
> Definitely agree.
>> In the Fedora change proposal, the default ACD interval in NM is set to up to 3 seconds and is subject to the same randomization; in practice it would be between ~1.7 and 3 seconds. Perhaps that's still too much, and we can safely decrease it to e.g. 1 second max to reduce the activation delay.
> Yeah, I think sending 2-3 requests separated by maybe 0.2 seconds, and waiting another 0.2 seconds for a reply (so a total of 0.8 seconds) is sufficient for modern networks.
Right, by setting a maximum duration of e.g. 0.8 seconds, we get 3 probes spaced between 90ms and 270ms and a final wait time of 180ms, according to the algorithm from RFC 5227.
> A number of DHCP servers do a ping before issue as well (although there's no good way for a DHCP client to tell), so it's just adding to the amount of time before the network becomes usable.
> DAD/ACD is a good thing, I'd just like to see the impact minimized. The time taken at boot is not a big deal (as users have to log in and start applications and such), but the time taken on resume from sleep is more noticeable (open the notebook lid, unlock, then... wait).
> Thinking about servers... this would happen before network-online.target is triggered, right? Any services that try to bind to configured IPs or the like need to still work.
The short answer is: enabling ACD will not affect services that bind to configured IPs, because ACD is done before the connection becomes activated, which is a pre-requisite for network-online.target.
In practice, it's a bit more complex than that. network-online.target is emitted after all NM connections succeed. The meaning of "success" depends on properties "ipv4.may-fail" and "ipv6.may-fail" of the connection profile. Normally they are both set to "yes" and this means that just one of IPv4 and IPv6 is enough to reach the activated state.
If the connection has static IPv4 addresses and "auto" IPv6 (i.e. SLAAC plus optionally DHCPv6), before enabling ACD it was guaranteed that IPv4 addresses were added before reaching network-online. After enabling ACD, both IPv4 ACD and IPv6 SLAAC are started in parallel and the first that completes will make the connection succeed. However, in practice IPv6 also requires DAD and the timeout is longer than the IPv4 ACD timeout; so, services that bind to static IPv4 addresses can still rely on the addresses being present after network-online.target is reached.
Of course, for services that bind to IPv4 addresses, the best solution is to set "ipv4.may-fail=no" (or, for IPv6 addresses, "ipv6.may-fail=no") in the connection profile. That is required when using "auto" methods, in order to avoid the situation where the connection succeeds after the "other" address family completes.
Beniamino
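The interaction between "may-fail" and the moment a connection counts as activated can be illustrated with a small model. This is a deliberately simplified Python sketch of the behaviour described in the message above, not NetworkManager's actual state machine. The function name `activation_time` and the completion times in the example are hypothetical, and `None` stands for an address family that failed.

```python
from typing import Optional


def activation_time(
    ipv4_done: Optional[float],
    ipv6_done: Optional[float],
    ipv4_may_fail: bool = True,
    ipv6_may_fail: bool = True,
) -> Optional[float]:
    """Time at which the connection is considered activated, or None.

    Simplified model: a family with may-fail=no is required; if no family
    is required, the first family to complete activates the connection.
    """
    families = [(ipv4_done, ipv4_may_fail), (ipv6_done, ipv6_may_fail)]
    # A required family that failed makes the whole activation fail.
    if any(done is None and not may_fail for done, may_fail in families):
        return None
    # Every required family must be complete before activation.
    required = [done for done, may_fail in families if not may_fail]
    if required:
        return max(required)
    # Otherwise the first family that completes is enough.
    finished = [done for done, _ in families if done is not None]
    return min(finished) if finished else None


# Hypothetical timings: IPv4 ACD completes at 3.0 s, IPv6 at 1.0 s.
print(activation_time(3.0, 1.0))                       # 1.0
print(activation_time(3.0, 1.0, ipv4_may_fail=False))  # 3.0
```

In the first call the connection is activated before the IPv4 address is present, which is the situation "ipv4.may-fail=no" avoids. As noted above, IPv6 normally takes longer than IPv4 ACD in practice, so the timings here are chosen only to show the ordering problem.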
Once upon a time, Beniamino Galvani bgalvani@redhat.com said:
> In practice, it's a bit more complex than that. network-online.target is emitted after all NM connections succeed. The meaning of "success" depends on properties "ipv4.may-fail" and "ipv6.may-fail" of the connection profile. Normally they are both set to "yes" and this means that just one of IPv4 and IPv6 is enough to reach the activated state.
> If the connection has static IPv4 addresses and "auto" IPv6 (i.e. SLAAC plus optionally DHCPv6), before enabling ACD it was guaranteed that IPv4 addresses were added before reaching network-online. After enabling ACD, both IPv4 ACD and IPv6 SLAAC are started in parallel and the first that completes will make the connection succeed. However, in practice IPv6 also requires DAD and the timeout is longer than the IPv4 ACD timeout; so, services that bind to static IPv4 addresses can still rely on the addresses being present after network-online.target is reached.
> Of course, for services that bind to IPv4 addresses, the best solution is to set "ipv4.may-fail=no" (or, for IPv6 addresses, "ipv6.may-fail=no") in the connection profile. That is required when using "auto" methods, in order to avoid the situation where the connection succeeds after the "other" address family completes.
Thanks for that detailed explanation! I hadn't seen that level of detail about what network-online.target actually means.
On Wed, Dec 20, 2023 at 7:51 PM Chris Adams linux@cmadams.net wrote:
> Once upon a time, Aoife Moloney amoloney@redhat.com said:
>> Enable IPv4 Address Conflict Detection by default in NetworkManager.
> Huh, I didn't realize NM didn't already do this... ye olde network-scripts did.
As I recall, depending on configuration(s), systemd-networkd has done so for quite some time. Off hand I do not recall its various values, but it might make sense to align the settings.
On Thu, Dec 21, 2023 at 04:51:37PM +0000, Gary Buhrmaster wrote:
> On Wed, Dec 20, 2023 at 7:51 PM Chris Adams linux@cmadams.net wrote:
>> Once upon a time, Aoife Moloney amoloney@redhat.com said:
>>> Enable IPv4 Address Conflict Detection by default in NetworkManager.
>> Huh, I didn't realize NM didn't already do this... ye olde network-scripts did.
> As I recall, depending on configuration(s), systemd-networkd has done so for quite some time. Off hand I do not recall its various values, but it might make sense to align the settings.
systemd-networkd supports DAD for both static and dynamic addresses, and it's disabled by default. For static addresses, DAD must be enabled via:
[Address]
Address=192.168.1.1/24
DuplicateAddressDetection=ipv4
For DHCP, it is enabled by setting SendDecline=true in the [DHCPv4] section.
In both cases, the maximum timeout is the one from RFC 5227 (9 seconds) and it's not configurable. I have filed an issue to make the value configurable:
https://github.com/systemd/systemd/issues/30724
Beniamino