default local DNS caching name server

Paul Wouters paul at nohats.ca
Sun Apr 13 16:16:35 UTC 2014


On Sun, 13 Apr 2014, William Brown wrote:

>> Yes. It depends on the "trustworthiness" of the network and/or the
>> preconfiguration for some of your own networks that you join.
>
> Not really: Every network you join, you have to semi-trust. If you don't
> trust it, why did you join it?

You don't always control which networks your device roams onto. If I agree
to join "starbucks" on my street, my phone will connect to any network
named "starbucks", even if it is yours. So, to separate networks the user
_knowingly_ joined from the rest, we drew the line at "plugged in
physically or provided the authentication credentials".

>> Works reasonably well with unbound+dnssec-trigger; could use better NM
>> integration for captive portals.
>
> But you can't account for every captive portal in the world. This is why
> the cache is a bad idea, because you can't possibly account for every
> system that is captive like this.

Yes, we can, by monitoring for "captivity signs" when a new network is
joined. Again, please yum install dnssec-trigger on your laptop, start
the dnssec-trigger applet once, and go have a coffee outside. Let us
know your experience.
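
Concretely, that amounts to something like the following (the service
name is taken from the Fedora dnssec-trigger package, so double-check it
on your install):

sudo yum install dnssec-trigger
sudo systemctl enable dnssec-triggerd.service
sudo systemctl start dnssec-triggerd.service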

>>> Case 2: Moderate home user. They have a little knowledge of DNS, and
>>> have set up a system like OpenWRT or gargoyle on their router. They
>>> have their own zone, .local. This means that their DHCP provides the
>>> DNS IP of the router to clients.
>>
>> Same if their wifi is closed (e.g. WPA2); if their wifi is open, the
>> .local forward will need an exception in NM.
>
> What if I call my network .concrete. Or .starfish. Or any other weird
> thing I have seen on personal networks. Again, you cannot bypass the
> local network DNS as the forwarder. You must respect it.

We will! If your DHCP has:

    option domain-name-servers  10.1.2.3;
    option domain-name "starfish";

Then unbound would get a forward configured to use 10.1.2.3 for the
domain starfish, basically by calling:

sudo unbound-control forward_add starfish 10.1.2.3
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist

When you leave the network, forward_remove is called:

sudo unbound-control forward_remove starfish
sudo unbound-control flush starfish
sudo unbound-control flush_requestlist

>> When connecting to their LAN or secure wifi, same as above for one
>> forwarding zone. Multiple forwarding zones would need to be configured.
>> If it is an enterprise, they might need their corporate CAs as well as
>> their zones configuration, so a corporate rpm package would make sense.
>
> How do you plan to make this work? You can't magically discover all the
> DNS zones hosted in an enterprise. At my work we run nearly 100 zones,
> and they are all based at different points (i.e. a.com, b.com, c.com).
> You cannot assume a business has just "a.com" and you can forward all
> queries for subtree.a.com to that network server.

If you are that large a business, you should really have a corporate-built
rpm package with your enterprise information such as the local CA, local
zones, etc. DNS forwarder zones can currently be dropped into
/etc/unbound/*.d/. I would expect we would make this software-neutral via
NM integration, where an NM unbound plugin would use those directories. We
could add a per-network option that specifies to use a forward for "."
(everything) instead of just the DHCP-specified domain, or perhaps even do
this for trusted (see above) networks.

However, that should not be the default for open wifi networks for
security reasons.
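
To illustrate the drop-in approach, such a file could look something
like this (the path and zone names here are made up for the example):

# /etc/unbound/conf.d/corp.conf -- hypothetical corporate drop-in
forward-zone:
    name: "a.com"
    forward-addr: 10.1.2.3
forward-zone:
    name: "b.com"
    forward-addr: 10.1.2.3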


> Again, you *must* respect the DHCP provided DNS server as the forwarder
> else you will savagely break things.

And not doing anything will cause people to have insecure DNS. So I think
the question should be turned around a little bit. There is a need for
DNSSEC on the end nodes - how can we best facilitate that while trying
to be as supportive of current deployments as we can be? That is what
we are trying to do. If you only counter with "I require insecure DNS
for my network to function" or "all cache is evil", then you are not
open-minded enough about the realities of the requirement for DNSSEC
support.

>> Same, already works if you only need the one domain that is negotiated
>> via the VPN (e.g. the IKE XAUTH domain).
>
> You can negotiate more than one domain on a VPN .... again, see above.

Not with IPsec/XAUTH. If more domains can come in via openvpn or
something, then I would assume the existing openvpn unbound plugin
already deals with that case. If not, please file a bug and we will fix
it.

>> We are not suggesting that for LAN or secure wifi. In those cases the
>> forward will be added. However, you don't want those forwards for open
>> wifi, or else I can bring up "linksys", push you a forward for your
>> internal.domain.com, and mislead you into thinking you would be going
>> over your VPN.
>
> This is a more serious problem than a caching resolver could hope to
> solve, as it shows malicious intent.

I'm sorry, but I don't understand what you are trying to say here.

>>> Case 1: The user doesn't know much about DNS. the ISP might be reliable
>>> or unreliable. If we assume as discussed that the cache is flushed on
>>> network change, they will have an empty cache.
>>
>> The cache is never fully flushed. It is only flushed for the domain
>> obtained via DHCP or VPN, because those entries can change. Nothing
>> else is flushed. If the upstream ISP could have spoofed them, so be
>> it - the publisher of the domains could have used DNSSEC to prevent
>> that from happening.
>
> No no no!!!! You need to flush *all* entries. Consider what I resolve
> www.google.com to. That changes *per* ISP because google provides
> different DNS endpoints and zones to ISPs to optimise traffic! So when I
> use google at work, I'm now getting a suboptimal route to their servers!

google publishes TTLs for that, and those are honoured. If google
requires different records when you switch ISPs, they need to use
shorter TTLs. The publisher decides here, not the consumer.
Additionally, to resolve these issues, there is a new draft that has
been implemented by some (such as opendns, which specifically has this
problem at a large scale):

https://tools.ietf.org/html/draft-vandergaast-edns-client-subnet-02

So I consider this a solved problem, even if code and deployment are
not fully there yet.
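
For those who want to play with it: newer builds of dig can send the
client-subnet option, so you can see what an authoritative server would
hand out for a given client network (check your dig version first;
192.0.2.0/24 is just a documentation prefix used for the example):

dig +subnet=192.0.2.0/24 www.google.com A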

> So that's a valid point: A non-caching unbound that caps TTLs is a good
> idea, but as you say, you can't stop a dodgy ISP.

Actually, you can! A captive hotspot is not much different from a dodgy
ISP. unbound tries its best to not use any DNS server that messes with
DNS. So ISPs like Rogers, who like to rewrite DNS packets, are explicitly
not used by unbound - it prefers to become a full recursive server
without offloading to any forwarder if the forwarder is that malicious.
We even run DNS resolvers as Fedora infrastructure that provide DNS
over TCP port 80 and DNS over TLS on port 443, as alternatives to work
around those broken ISPs that also block port 53 in an attempt to force
you to use their DNS lies.
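
As a rough sketch, pointing unbound at such a fallback resolver over
TLS on port 443 looks like this (the address below is a placeholder,
not the actual Fedora infrastructure server):

server:
    ssl-upstream: yes
forward-zone:
    name: "."
    forward-addr: 192.0.2.1@443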

>>> Case 2: The user does know a bit. But when they change name records,
>>> they may not be able to work out why a workstation can't resolve
>>> names like other clients can.
>>
>> While we could flush the entire cache on (dis)connect, I think that's
>> rather drastic for this kind of odd use-case. If the user runs their own
>> zone and their own records, they should know about DNS and TTLs. But
>> even so, NM could offer an option to flush the DNS cache.
>
> But this isn't even an odd use case. There are enough power users in the
> world who do this. It's not just computer enthusiasts, I know a chemist
> who did this, and others. You can't just assume a generic case, and then
> break it for others.

If you are changing DNS records, you need to understand TTLs and cache
flushing. If you don't, then sure, you can be the clueless windows user
who reboots their machine. I care much more about some of the more
realistic use cases of Fedora machines connected over 3G, where latency
matters and flushing the entire cache would cause both more traffic and
more latency. And about things like pre-fetching, where we renew cached
DNS entries that are still being served from the cache, to avoid an
outage when the record expires.
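
Both of those are plain unbound features today: pre-fetching is a
single server option, and a changed record can be flushed individually
instead of dropping the whole cache:

server:
    prefetch: yes

sudo unbound-control flush www.example.com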

>>> Case 3: This user does understand DNS, and they don't need DNS cache.
>>
>> That depends. You need caching for DNSSEC validation, so really, every
>> device needs a cache, unless you want to outsource your DNSSEC
>> validation over an insecure transport (LAN). That seems like a very bad
>> idea.
>
> If your lan is insecure, you have other issues. That isn't the problem
> you are trying to solve.

Yes it is. When I'm at the coffee shop, my LAN is insecure. I don't want
to trust DNS answers coming in. I want to validate those using DNSSEC
on my own device. So I need to run a validating recursive (caching) nameserver
for very valid security reasons - so that the guy next to me cannot spoof
paypal.com.
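
Enabling that validation on the device is cheap; in unbound it is
essentially one trust anchor line (the path below is the Fedora default
location for the root key, assuming a stock install):

server:
    auto-trust-anchor-file: "/var/lib/unbound/root.key"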

>>> They have bind / named setup, and they would like to rely on that
>>> instead.
>>
>> They can. DNS caches are chained. There is no reason to say you cannot
>> run your own cache and have a network-based cache.
>
> But you don't *need* it. I went to the effort of setting up my own bind
> to cache; I shouldn't need it on my system. Again, local caches cause
> all kinds of issues. A home user is likely to toy with things, set a
> high-ish TTL, say even 10 minutes, and change records on their server.
> Then their records appear broken, because the local cache hasn't
> expired yet.

See above, where the same argument was discussed. But also, you would
have the exact same problem on many devices on your network that won't
throw away that DNS record immediately: in-browser caches, the OSX
system-wide cache, and who knows what your PVR, game console and TV do
these days. If this worked for you in the past, you were lucky AND you
engineered it to work. If you handed that solution to your
unknowledgeable chemist, it's time to update their solution to meet the
modern demand of facilitating DNSSEC on every device.

>>> When they change records in their local zones, they don't want
>>> to have to flush caches etc. If their ISP is unreliable, or their own
>>> DNS is unreliable, a DNS cache will potentially mask this issue delaying
>>> them from noticing / solving the problem.
>>
>> This is becoming really contrived. Again, if you think this is a real
>> scenario (I don't think it is) then you could run unbound with ttl=0.
>> But requiring that we automagically understand what a local zone is,
>> automagically know when a remote authoritative DNS server changes
>> data, while not being willing to enforce that with ttl=0, and using
>> that as an argument against any unbound solution that provides a
>> security feature (DNSSEC), is getting a little unrealistic. If you
>> want your laptop to start validating TLSA and SSHFP and OPENPGPKEY
>> records, you need DNSSEC validation on the device. The question should
>> be "how do you change your network requirements to meet that goal?".
>> Yes, enforcing security comes at a price.
>
> It's not contrived: this is a common network setup for all the people I
> know who are enthusiasts; it is how they set up their home networks.
> This is why it's a use case.

I suggest you keep a close eye on the IETF HOMENET people, because
DNSSEC is coming into your home automation one way or the other, and if
you depend on this system, you will run into trouble in the future.

>> Let me use your scenario based on TLS. You want to be able to change
>> your TLS certificates and the private CA you regenerate at any time,
>> without any browser on your network ever giving you a popup warning.
>> You know you cannot ask this - it goes against the security model. The
>> same applies for DNS with DNSSEC. Security demands that we do
>> validation and caching, and we should try to make that as flexible and
>> painless as possible.
>
> The issue is that by adding DNSSEC in this way, you are going to cause a
> great deal of pain because of these caches. Add DNSSEC, but if you need
> to cache, cache for the most minimal time possible.

As I argued over the last few days, I do not see this "great deal of
pain". I've provided an unbound workaround for you, and your corner
case can be dealt with via a new NM option.

> It's linked to the other cases. It's the point that local system caches
> aren't needed as you have access to highly reliable DNS systems.

You will just have to come to terms with the fact that caches are needed
when you are doing constant DNSSEC validation. So your argument that
caches are not needed might have been true in the past, but is no
longer. Now let's work on ensuring your exception cases can be supported
in the presence of caches.

> Additionally, business networks are "trusted" so you can trust their DNS
> caches etc. (to a point)

Business networks are never compromised? But as I stated, we already
said we will do the forward using the DHCP-supplied nameserver in the
case of a LAN or secured wifi connection.

>>> Case 8: VPNs are a bit unreliable, and have relatively high(ish)
>>> latency. But mostly they are quite good, e.g. openvpn. A DNS cache
>>> *might* help here in case of traffic loss. Again, this would be
>>> masking a greater issue though, and could be better solved with TCP
>>> DNS queries rather than UDP.
>>
>> The VPN cases already work very well in Fedora. I seamlessly connect
>> and disconnect from the redhat VPN. Resources that are available only
>> via the VPN are never blocked by a wrong DNS cache I got from when the
>> VPN was down. VPNs are a non-issue.
>
> Consider a business with external and internal DNS zones. This becomes
> an issue in this case. If you have cached, say, "website.example.com" to
> the external IP, and that is DMZed somehow on the internal network, when
> you change to VPN, you need to use the internal view of that zone
> instead. But you can't: the name is cached.

Which is why we flush the cache for the domain in question when we
detect a network change - see the unbound commands above. This is a
solved problem. Every day, when my VPN is up I reach bugzilla.redhat.com
on its internal IP, and when my VPN is down I reach bugzilla.redhat.com
on its external IP. Without any manual intervention. It just works.

> No, cache is not a feature. It's a chronic issue.

Then please let us know what you intend to replace DNS with. The reason
DNS has worked for over 20 years is because it is a caching system.

> Look at windows
> systems that service desks around the world always advise the first step
> is reboot: Why? Flush DNS caches (or other things). When you can't get
> to a website? Restart the web browser, to flush the cache. Intermittent
> network issues for different people on a network? The cache is allowing
> some people to work, but masking the issue to them. It's not allowing
> people to quickly and effectively isolate issues.

If DNS caching were the only reason Windows machines need a reboot, I'm
sure Microsoft would have fixed that by now. Let's remain honest here
and say there are 1001 reasons why Windows users reboot their machines.
DNS might be one of them, but it has no relationship to the discussion
we are having right now.

> DNSSEC is a good idea: Caches are a problem.

We disagree.

> If this really is to be used, I cannot stress enough that a cache must
> be completely flushed every time the default route or network interface
> changes. You can't, and I can't, possibly conceive of every network
> setup in the world. If you make assumptions like this, systems will
> break and Fedora will be blamed.

Consider some of the options I suggested for addition to NM to
accommodate your scenario, or suggest alternatives. If you believe the
only solution is "no cache ever", then there is not much more we can
talk about. And if the majority of Fedora users prefers an insecure
no-cache setup over a DNSSEC-cache solution, I guess I will go elsewhere
and stop running Fedora.

Paul

