Currently I am connecting to a VPN that provides a few DNS search entries. One of the domains on the search path is having DNS resolution problems. This is not per se the problem I am writing this email about.
The problem is that starting Firefox and Thunderbird takes a long time, and it took me a while to find that the DNS resolution problem was the origin of these timeouts. I am not using the domain that is having resolution problems.
The real culprit is the default `fedora` hostname, instead of localhost. In a Wireshark capture there are DNS searches for fedora.domain_failing.tld when starting Firefox and Thunderbird. The presence of the search path in the generated /etc/resolv.conf isn't the cause of these DNS searches; I edited the entries out while the VPN was still active.
Even 'ping fedora' starts doing these searches with the search paths appended. 'ping localhost' doesn't do that. The only workaround I have found for this issue is to add fedora to the localhost entries in /etc/hosts.
This is in some way a DNS leak. Even on a VPN with perfectly working DNS resolution, the fedora name should not be searched on these domains unless I am using the full fedora hostname on these domains. It is even worse when it happens simply by starting applications like Firefox or Thunderbird.
Maybe changing the default hostname to fedora wasn't a good idea after all, or at least fedora should be added to the default /etc/hosts.
Hi,
I have a couple different ideas of what could be going wrong. Let's test a few things. First, please run:
$ cat /etc/nsswitch.conf | grep hosts | tail -1
If it is our default configuration, it should say:
hosts: files mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] myhostname dns
Now, see what happens if you disable systemd-resolved:
$ sudo systemctl stop systemd-resolved.service
Does the bug go away? If so, it's likely a systemd-resolved bug to be fixed. (Reenable systemd-resolved with 'sudo systemctl start systemd-resolved.service'.)
If the bug does NOT go away, then let's test one more thing: please edit /etc/authselect/user-nsswitch.conf as root and change the hosts line to look like this:
hosts: files myhostname mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] dns
Then run:
$ sudo authselect apply-changes
Does the bug go away? I think that should almost certainly "fix" it. If so, you have a good workaround, and we know the problem must be caused by avahi, and we should reconsider our NSS configuration. But if the bug does not go away after this big hammer, then it must be a Firefox/Thunderbird bug, because if they try to resolve anything that doesn't exactly match the local hostname, then of course we have to do some DNS.
I'm interested to see your results,
Michael
On 3/24/21 11:26 PM, Michael Catanzaro wrote:
Hi,
I have a couple different ideas of what could be going wrong. Let's test a few things. First, please run:
$ cat /etc/nsswitch.conf | grep hosts | tail -1
If it is our default configuration, it should say:
hosts: files mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] myhostname dns
Exactly the same output. nsswitch.conf is pointing to the default /etc/authselect/nsswitch.conf.
Now, see what happens if you disable systemd-resolved:
$ sudo systemctl stop systemd-resolved.service
This doesn't properly disable systemd-resolved. There is a DNS resolution error or two, and then the service is started again automatically (probably by socket activation).
I disabled it entirely by setting dns=default in NetworkManager and renaming the /etc/resolv.conf symlink to another name.
Does the bug go away? If so, it's likely a systemd-resolved bug to be fixed. (Reenable systemd-resolved with 'sudo systemctl start systemd-resolved.service'.)
No, the bug doesn't go away. The fedora name is still searched on all search domains (traced with Wireshark) instead of getting a simple direct local response as happens with localhost.
If the bug does NOT go away, then let's test one more thing: please edit /etc/authselect/user-nsswitch.conf as root and change the hosts line to look like this:
hosts: files myhostname mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] dns
Then run:
$ sudo authselect apply-changes
With this the bug goes away.
Does the bug go away? I think that should almost certainly "fix" it. If so, you have a good workaround, and we know the problem must be caused by avahi, and we should reconsider our NSS configuration. But if the bug does not go away after this big hammer, then it must be a Firefox/Thunderbird bug, because if they try to resolve anything that doesn't exactly match the local hostname, then of course we have to do some DNS.
Note that it isn't a Firefox and Thunderbird issue. 'ping fedora' has these long DNS timeouts looking up fedora on the search domains. I agree that it is weird that these applications are doing lookups with the hostname, but ping should not be doing them for fedora either, just as localhost doesn't end up as queries on the search domains.
I'm interested to see your results,
Michael
OK, so then the problem here is avahi, or more specifically, that nss-mdns4_minimal is listed before nss-resolve and nss-myhostname. We need nss-myhostname to come before nss-mdns4_minimal. Drat. We spent a long time thinking about the order the NSS modules should be listed, but then made a last-minute change to move nss-mdns4_minimal forward in order to work around a bug with systemd-resolved not handling mDNS properly: https://bugzilla.redhat.com/show_bug.cgi?id=1867830. When we moved nss-mdns4_minimal, we should have moved nss-myhostname too.
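To make the ordering problem concrete, here is a minimal sketch that inspects a single hosts line and reports whether myhostname is listed before mdns4_minimal. The helper name hosts_order_status is made up for illustration and is not part of any Fedora package; it assumes the line starts with "hosts:" and only looks at module names, not at the bracketed actions.

```shell
# hosts_order_status (hypothetical helper): takes one nsswitch "hosts:" line
# as its argument and prints
#   ok          if myhostname comes before mdns4_minimal (or mdns4_minimal is absent)
#   misordered  if myhostname comes after mdns4_minimal
#   missing     if myhostname is not listed at all
hosts_order_status() {
    printf '%s\n' "$1" | awk '
        {
            my = 0; mdns = 0
            # Field 1 is assumed to be "hosts:", so start at field 2.
            for (i = 2; i <= NF; i++) {
                if ($i == "myhostname" && !my) my = i
                if ($i == "mdns4_minimal" && !mdns) mdns = i
            }
            if (!my) print "missing"
            else if (!mdns || my < mdns) print "ok"
            else print "misordered"
        }'
}
```

Something like `hosts_order_status "$(grep '^hosts:' /etc/nsswitch.conf)"` should report misordered on the default configuration quoted earlier in this thread.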
This is a little hard to fix because the scriptlets that write this configuration are fragile, maintained in multiple places (systemd and avahi), and depend on assumptions about the previous state of the file (created by authselect normally, but possibly also by the glibc package). So the way our RPMs configure this line is really quite fragile unfortunately. We will have to discuss the best way to fix things.
For now, keep nss-myhostname at the start of the line, right after files. We will probably need to find a way to either (a) fix systemd-resolved to handle mDNS properly, so we can move it after nss-resolve, where it really belongs, or (b) move nss-myhostname in front of nss-mdns4_minimal.
Michael
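As a rough illustration of option (b), the following sketch rewrites one hosts line so that myhostname sits right after files. It is only a text transformation on a string: the function name move_myhostname_first is hypothetical, it does not edit any file, and it is not what the systemd or avahi scriptlets actually do. On a real system the change would still go through /etc/authselect/user-nsswitch.conf and 'authselect apply-changes' as described earlier in the thread.

```shell
# move_myhostname_first (hypothetical helper): takes one nsswitch line as its
# argument and prints it with myhostname placed directly after files.
# Lines that do not start with "hosts:" are printed unchanged.
# Runs of whitespace in a rewritten line are collapsed to single spaces.
move_myhostname_first() {
    printf '%s\n' "$1" | awk '
        $1 == "hosts:" {
            out = $1
            placed = 0
            for (i = 2; i <= NF; i++) {
                if ($i == "myhostname") continue      # drop the old position
                out = out " " $i
                if ($i == "files" && !placed) {       # re-add right after files
                    out = out " myhostname"
                    placed = 1
                }
            }
            # No "files" entry found: put myhostname first instead.
            if (!placed) out = $1 " myhostname" substr(out, length($1) + 1)
            print out
            next
        }
        { print }'
}
```

Running it on the default line should give the same line that was suggested as a workaround above, and running it a second time should leave the result unchanged.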
On Thu, Mar 25 2021 at 09:26:19 AM -0500, Michael Catanzaro mcatanzaro@gnome.org wrote:
We spent a long time thinking about the order the NSS modules should be listed, but then made a last-minute change to move nss-mdns4_minimal forward in order to work around a bug with systemd-resolved not handling mDNS properly: https://bugzilla.redhat.com/show_bug.cgi?id=1867830.
Looking at that bug, it seems to be a NetworkManager issue. I'm not familiar with mDNS and actually don't know how it's supposed to work....
On Thu, Mar 25 2021 at 09:26:19 AM -0500, Michael Catanzaro mcatanzaro@gnome.org wrote:
For now, keep nss-myhostname at the start of the line, right after files. We will probably need to find a way to either (a) fix systemd-resolved to handle mDNS properly, so we can move it after nss-resolve, where it really belongs, or (b) move nss-myhostname in front of nss-mdns4_minimal.
OK, I've reported https://bugzilla.redhat.com/show_bug.cgi?id=1943199. It requires further discussion though, to see if the systemd package maintainers agree.
On Thu, Mar 25, 2021 at 09:54:11AM -0500, Michael Catanzaro wrote:
On Thu, Mar 25 2021 at 09:26:19 AM -0500, Michael Catanzaro mcatanzaro@gnome.org wrote:
For now, keep nss-myhostname at the start of the line, right after files. We will probably need to find a way to either (a) fix systemd-resolved to handle mDNS properly, so we can move it after nss-resolve, where it really belongs, or (b) move nss-myhostname in front of nss-mdns4_minimal.
OK, I've reported https://bugzilla.redhat.com/show_bug.cgi?id=1943199. It requires further discussion though, to see if the systemd package maintainers agree.
Yeah, I think that's the way to go.
Zbyszek
I've been having problems with DNS resolution in F33 as well: I use F5 VPN (work requirement).
I tried your nsswitch recipe, but got some errors:
authselect apply-changes
[error] [/etc/nsswitch.conf] is not a symbolic link!
[error] [/etc/nsswitch.conf] was not created by authselect!
[error] Unexpected changes to the configuration were detected.
[error] Refusing to activate profile unless those changes are removed or overwrite is requested.
Some unexpected changes to the configuration were detected. Use 'select' command instead.
and the DNS still returns 'Name or service not known'
I've been successfully fixing my problem by explicitly running
sudo resolvectl dns eno1 <my.DNS.server.ip>
sudo resolvectl domain eno1 <my.domain>
As to the issues with F5, I see that it rewrites /etc/hosts
#F5 Networks Inc. :File modified by VPN process
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
12.34.56.78 f5.server.mycompany.com
BTW, why isn't RPM seeing that change?
rpm -qf /etc/hosts
setup-2.13.7-2.fc33.noarch
rpm -q --verify setup
.M.......  c /etc/fstab
S.5....T.  c /etc/printcap
.M....G..  g /var/log/lastlog
On Fri, Mar 26 2021 at 01:24:35 PM -0400, przemek klosowski via devel devel@lists.fedoraproject.org wrote:
As to the issues with F5, I see that it rewrites /etc/hosts
You can ask them to fix their software according to my instructions here:
https://blogs.gnome.org/mcatanzaro/2020/12/17/understanding-systemd-resolved...
under the heading "Custom VPN Software." Your problem is unrelated to this thread, so if you want to discuss it further, please create a new topic rather than respond further here.
Hi,
I would guess your domainname is not (none), and the hostname -f value is fedora.domain_failing.tld. One fix might be to change the hostname of the machine so that it does not contain a domain suffix. Then only the explicitly configured search domains would apply.
On 3/25/21 2:51 AM, Robert Marcano via devel wrote:
Currently I am connecting to a VPN that provides a few DNS search entries. One of the domains on the search path is having DNS resolution problems. This is not per se the problem I am writing this email about.
The problem is that starting Firefox and Thunderbird takes a long time, and it took me a while to find that the DNS resolution problem was the origin of these timeouts. I am not using the domain that is having resolution problems.
Would dig fedora.domain_failing.tld take long before the VPN is established? Does it time out while connecting or after the connection is up? A timeout might mean that some of the servers provided by the connection do not respond, or that the route to them does not work. Even searches should just mean more packets, not a visibly longer delay.
The real culprit is the default `fedora` hostname, instead of localhost. In a Wireshark capture there are DNS searches for fedora.domain_failing.tld when starting Firefox and Thunderbird. The presence of the search path in the generated /etc/resolv.conf isn't the cause of these DNS searches; I edited the entries out while the VPN was still active.
Try not commenting it out, but overriding the default system value in /etc/resolv.conf: search .
Even 'ping fedora' starts doing these searches with the search paths appended. 'ping localhost' doesn't do that. The only workaround I have found for this issue is to add fedora to the localhost entries in /etc/hosts.
That is likely because localhost is in /etc/hosts, which is read by files in nsswitch. But DNS queries (if systemd-resolved is disabled) are configured by /etc/resolv.conf.
This is in some way a DNS leak. Even on a VPN with perfectly working DNS resolution, the fedora name should not be searched on these domains unless I am using the full fedora hostname on these domains. It is even worse when it happens simply by starting applications like Firefox or Thunderbird.
Are you sure you do not have the hostname set to an FQDN? Have you tried setting it to a relative name (no dots)?
Maybe changing the default hostname to fedora wasn't a good idea after all, or at least fedora should be added to the default /etc/hosts.
It should not be necessary unless an FQDN is used as the hostname. The "fedora" value should be completely OK. But I guess even when connecting to the VPN, it should not time out. DNS settings should be changed only after the VPN is connected and ready to forward packets. Are you sure no IP range conflicts with the DNS servers in use?
Cheers, Petr
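The "relative name (no dots)" question above can be checked with a very small sketch like the one below. The function name is_fqdn_like is invented for this example, and it only tests for a dot in the name; it does not consult DNS or the domainname.

```shell
# is_fqdn_like (hypothetical helper): succeeds (exit status 0) when the
# given name contains at least one dot, i.e. it is not a bare relative name.
is_fqdn_like() {
    case $1 in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For instance, `is_fqdn_like "$(hostname)" && echo "hostname carries a domain suffix"` would print nothing on a machine whose hostname is just fedora.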
On 3/25/21 7:30 AM, Petr Menšík wrote:
Hi,
I would guess your domainname is not (none), and the hostname -f value is fedora.domain_failing.tld. One fix might be to change the hostname of the machine so that it does not contain a domain suffix. Then only the explicitly configured search domains would apply.
No:
# hostname -f
fedora
On 3/25/21 2:51 AM, Robert Marcano via devel wrote:
Currently I am connecting to a VPN that provides a few DNS search entries. One of the domains on the search path is having DNS resolution problems. This is not per se the problem I am writing this email about.
The problem is that starting Firefox and Thunderbird takes a long time, and it took me a while to find that the DNS resolution problem was the origin of these timeouts. I am not using the domain that is having resolution problems.
Would dig fedora.domain_failing.tld take long before the VPN is established? Does it time out while connecting or after the connection is up? A timeout might mean that some of the servers provided by the connection do not respond, or that the route to them does not work. Even searches should just mean more packets, not a visibly longer delay.
It doesn't take long, because fedora.domain_failing.tld fails fast on the default network DNS; domain_failing.tld is a domain only available on the VPN.
The real culprit is the default `fedora` hostname, instead of localhost. In a Wireshark capture there are DNS searches for fedora.domain_failing.tld when starting Firefox and Thunderbird. The presence of the search path in the generated /etc/resolv.conf isn't the cause of these DNS searches; I edited the entries out while the VPN was still active.
Try not commenting it out, but overriding the default system value in /etc/resolv.conf: search .
Even 'ping fedora' starts doing these searches with the search paths appended. 'ping localhost' doesn't do that. The only workaround I have found for this issue is to add fedora to the localhost entries in /etc/hosts.
That is likely because localhost is in /etc/hosts, which is read by files in nsswitch. But DNS queries (if systemd-resolved is disabled) are configured by /etc/resolv.conf.
This is in some way a DNS leak. Even on a VPN with perfectly working DNS resolution, the fedora name should not be searched on these domains unless I am using the full fedora hostname on these domains. It is even worse when it happens simply by starting applications like Firefox or Thunderbird.
Are you sure you do not have the hostname set to an FQDN? Have you tried setting it to a relative name (no dots)?
Maybe changing the default hostname to fedora wasn't a good idea after all, or at least fedora should be added to the default /etc/hosts.
It should not be necessary unless an FQDN is used as the hostname. The "fedora" value should be completely OK. But I guess even when connecting to the VPN, it should not time out. DNS settings should be changed only after the VPN is connected and ready to forward packets. Are you sure no IP range conflicts with the DNS servers in use?
Cheers, Petr
On 3/24/21 9:51 PM, Robert Marcano wrote:
Currently I am connecting to a VPN that provides a few DNS search entries. One of the domains on the search path is having DNS resolution problems. This is not per se the problem I am writing this email about.
The problem is that starting Firefox and Thunderbird takes a long time, and it took me a while to find that the DNS resolution problem was the origin of these timeouts. I am not using the domain that is having resolution problems.
The real culprit is the default `fedora` hostname, instead of localhost. In a Wireshark capture there are DNS searches for fedora.domain_failing.tld when starting Firefox and Thunderbird. The presence of the search path in the generated /etc/resolv.conf isn't the cause of these DNS searches; I edited the entries out while the VPN was still active.
Even 'ping fedora' starts doing these searches with the search paths appended. 'ping localhost' doesn't do that. The only workaround I have found for this issue is to add fedora to the localhost entries in /etc/hosts.
This is in some way a DNS leak. Even on a VPN with perfectly working DNS resolution, the fedora name should not be searched on these domains unless I am using the full fedora hostname on these domains. It is even worse when it happens simply by starting applications like Firefox or Thunderbird.
Maybe changing the default hostname to fedora wasn't a good idea after all, or at least fedora should be added to the default /etc/hosts.
About the default fedora transient hostname change: this has caused more problems than it really solved.
Some time ago the default HOSTNAME environment variable was changed in /etc/profile to use
HOSTNAME=`/usr/bin/hostnamectl --transient`
This didn't cause any problems initially because the default was localhost.localdomain, but now it is fedora. If you reach the desktop before plugging your laptop into the network and your network's DHCP server then assigns you a hostname, you get an entire session where HOSTNAME isn't resolvable, because fedora is only resolvable while the transient hostname is set to fedora, and it was overridden by the DHCP server.
Tilix was one of the programs that had problems with this; you get an annoying warning. I solved this by adding HOSTNAME=`hostname` to .bashrc.
IMHO the fedora name should always be resolvable the same way as localhost, or just be removed. It is not right that fedora is resolved only while the DHCP server isn't assigning you a new hostname. You never know when a DHCP server will decide to send you one, especially if you move around many WiFi hotspots.
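The .bashrc workaround can be written a little more defensively, as in the sketch below. It re-reads the current hostname each time a shell starts instead of keeping the value captured at login. The function name current_hostname and the fallback to 'uname -n' are my own additions for illustration, not something from /etc/profile.

```shell
# current_hostname (hypothetical helper): print the hostname the kernel
# reports right now, rather than a value cached when the session started.
current_hostname() {
    if command -v hostname >/dev/null 2>&1; then
        hostname
    else
        # Assumption: uname -n reports the same node name when the
        # hostname command is not installed.
        uname -n
    fi
}

# Refresh HOSTNAME for this shell and its children.
HOSTNAME=$(current_hostname)
export HOSTNAME
```

This only helps shells started after the DHCP server has changed the name; programs that are already running keep the old value.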
On Thu, Mar 25 2021 at 08:37:03 AM -0400, Robert Marcano via devel devel@lists.fedoraproject.org wrote:
IMHO the fedora name should always be resolvable the same way as localhost, or just be removed. It is not right that fedora is resolved only while the DHCP server isn't assigning you a new hostname. You never know when a DHCP server will decide to send you one, especially if you move around many WiFi hotspots.
The problem is not specific to the name "fedora" though. Our intended behavior is to resolve the local hostname locally, without doing any DNS, regardless of what its name is. This broke in Fedora 33 and you're just the first person to notice and complain afaik.
So editing /etc/hosts is not the right solution. We need to change our NSS configuration. I'll send another mail about this.
On 3/25/21 10:21 AM, Michael Catanzaro wrote:
On Thu, Mar 25 2021 at 08:37:03 AM -0400, Robert Marcano via devel devel@lists.fedoraproject.org wrote:
IMHO the fedora name should always be resolvable the same way as localhost, or just be removed. It is not right that fedora is resolved only while the DHCP server isn't assigning you a new hostname. You never know when a DHCP server will decide to send you one, especially if you move around many WiFi hotspots.
The problem is not specific to the name "fedora" though. Our intended behavior is to resolve the local hostname locally, without doing any DNS, regardless of what its name is. This broke in Fedora 33 and you're just the first person to notice and complain afaik.
Thanks for looking at it. Without the broken VPN DNS and Firefox doing some weird lookup of the hostname at startup, I would never have noticed it.
I wonder if the Firefox thing has to do with their policies. Apparently they have to be processed very early at startup, and maybe there is something in there slowing things down, but that would be a different bug from this one.
So editing /etc/hosts is not the right solution. We need to change our NSS configuration. I'll send another mail about this.
On Wed, Mar 24, 2021 at 7:51 PM Robert Marcano via devel devel@lists.fedoraproject.org wrote:
Maybe changing the default hostname to fedora wasn't a good idea after all, or at least fedora should be added to the default /etc/hosts.
Note that setting the hostname to "fedora" also led to log spam, for me at least:
https://bugzilla.redhat.com/show_bug.cgi?id=1893223
I am interested to learn whether that happened to anybody else.