default local DNS caching name server

Chuck Anderson cra at WPI.EDU
Sat Apr 12 17:04:04 UTC 2014


On Sat, Apr 12, 2014 at 12:06:23PM -0400, Paul Wouters wrote:
> On Sat, 12 Apr 2014, Chuck Anderson wrote:
> 
> >Okay, so here is where you and I differ then.  We need a solution to
> >run everywhere, on every system, in every use case.
> 
> Sounds like wanting ponies? Obviously I fully agree with a solution that
> works everywhere, all the time, for everyone, however they want it :)
> 
> > The local DNS
> >daemon (note that I didn't say "cache" this time) should be a part of
> >the Base OS like init/systemd is.  It should be small, unobtrusive,
> >and do very little, namely the one thing we need: handle failover
> >between multiple DNS servers.  I would use the term "DNS proxy" but
> >that term is too overloaded with other connotations and preconceived
> >ideas.
> 
> Handling failover requires keeping state of previous queries and
> outstanding requests to determine which servers are bad or not. Mind
> you, unbound allows you to set a max TTL on any record received using
> cache-max-ttl=0, so you can very easily implement this idea. I think
> it is a bad idea, because your solution violates your own principle
> above: it interferes with my use case of optimising DNS caches, reducing
> unnecessary latency, and doing things like pre-fetching of low-TTL
> records.
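
(For what it's worth, that knob lives in the server: clause of
unbound.conf; setting it to 0 caps every cached TTL at zero, which
approximates a non-caching proxy:)

    server:
        # Cap how long any record may stay in the cache.  With 0,
        # nothing is retained across queries.
        cache-max-ttl: 0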

Of course there would be /some/ state kept.  It just wouldn't cache
the data; it would only use the state of recent queries and response
times to determine that a resolver was dead and start sending those
queries to another resolver.  It would basically do exactly what
glibc's stub resolver does now, but ONCE for the entire system rather
than having each process do it independently.  I would want this
daemon to be as lightweight as possible, to minimize any interference
with optimising DNS caches, latency, etc., and so that it could be used
everywhere, just like systemd is used on all Fedora systems and some
form of "init" is used on all Linux systems.
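
To make that concrete, here is a rough sketch in Python of the only
state I'd want this daemon to keep (the names and thresholds are made
up for illustration; this is not a real implementation):

    import socket
    import time

    RESOLVERS = ["192.0.2.1", "192.0.2.2"]  # e.g. taken from resolv.conf
    TIMEOUT = 2.0       # seconds to wait per query, like glibc's timeout:n
    MAX_FAILURES = 3    # consecutive timeouts before a server is "dead"
    RETRY_AFTER = 60.0  # seconds before a dead server gets another chance

    failures = {r: 0 for r in RESOLVERS}
    dead_until = {r: 0.0 for r in RESOLVERS}

    def forward(query):
        """Forward one raw DNS query, failing over between resolvers."""
        for server in RESOLVERS:
            if time.time() < dead_until[server]:
                continue  # skip servers we recently declared dead
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sock.settimeout(TIMEOUT)
            try:
                sock.sendto(query, (server, 53))
                reply, _ = sock.recvfrom(4096)
                failures[server] = 0  # an answer proves the server alive
                return reply
            except socket.timeout:
                failures[server] += 1
                if failures[server] >= MAX_FAILURES:
                    dead_until[server] = time.time() + RETRY_AFTER
            finally:
                sock.close()
        raise RuntimeError("no configured resolver answered")

Note there is no cache anywhere in that sketch: no records are kept,
only per-server liveness state.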

Another way to think of this is to separate out the built-in logic in
unbound/BIND/dnsmasq/etc. that determines when an authoritative server
is dead and apply it to all queries that are made by glibc's stub
resolver.  Or separate out the logic that glibc uses to determine when
a nameserver in /etc/resolv.conf is dead and make that a system-wide
daemon.
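
Today every process re-learns that state by itself, driven by the
usual knobs in /etc/resolv.conf (see resolv.conf(5)); the daemon would
apply the equivalent logic once, system-wide:

    nameserver 192.0.2.1
    nameserver 192.0.2.2
    # per-query timeout (seconds), retries per server, and
    # round-robin rotation across the listed servers
    options timeout:2 attempts:2 rotate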

> In DNS, the publisher of data tells you how long the data should be valid
> for. If they want the record not to be cached at all, they can set the TTL
> to 0. Why should we deploy a daemon that does not provide the very useful
> feature of caching in general (especially when doing DNSSEC validation),
> when people who wish not to be cached already have a way out: publishing
> records with TTL=0? If you want to be Akamai, you can!

Because things get messy once you start caching on the end-user
system.  Sure, you can optionally have that messiness (and I'd argue
that for Fedora Workstation it would be a sane default), but for
Fedora Server I think it is too heavyweight a solution to run
everywhere, and you agreed that running it in VMs is probably not
desired.

If the lightweight dnslookupd process is configured to forward the
request to a local unbound+dnssec-triggerd, then everything from that
point will work in the same way it does today with local caching, TTL
handling, DNSSEC, etc.  But that should be /optional/.  I'm arguing
that dnslookupd should be on by default everywhere.
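
To sketch the wiring (the addresses and the unbound port below are
hypothetical, just for illustration):

    # /etc/resolv.conf -- everything queries dnslookupd:
    nameserver 127.0.0.1

    # unbound.conf -- unbound moves off port 53 so dnslookupd can own it:
    server:
        interface: 127.0.0.1
        port: 8053

    # dnslookupd forwards to 127.0.0.1:8053 first and falls back to the
    # DHCP-provided resolvers only if unbound stops answering.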

> >dnslookupd keeps track of up/down DNS servers via some health check
> >mechanism, and switches between them appropriately.
> 
> I tend to call heartbeats/keepalives "make deads". They often do the
> opposite. Why invent a whole new health-check protocol when you can
> simply send DNS queries and use existing strategies to prefer the
> nearest/fastest servers? These kinds of selection/preference
> heuristics are part of any decent DNS implementation. There is no
> need to re-invent the wheel.

It doesn't need to do active heartbeats--it could passively watch the
queries/responses that it is forwarding to the resolver, decide from
those whether a server is dead, and stop querying it until the next
one fails, just like glibc does today.
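
In rough Python terms (hypothetical names again, mirroring the earlier
sketch), the passive version piggybacks on traffic it is already
forwarding and sends no probes of its own:

    import time

    TIMEOUT = 2.0       # seconds before an outstanding query is a timeout
    MAX_FAILURES = 3    # consecutive timeouts before marking a server dead
    RETRY_AFTER = 60.0  # seconds before a dead server gets another chance

    pending = {}        # (server, query-id) -> time the query was sent
    failures = {}       # server -> consecutive timeouts observed
    dead_until = {}     # server -> time before which we skip it

    def on_query_sent(server, qid):
        pending[(server, qid)] = time.time()

    def on_response(server, qid):
        pending.pop((server, qid), None)
        failures[server] = 0          # real traffic proves the server alive

    def on_tick():
        now = time.time()
        for (server, qid), sent in list(pending.items()):
            if now - sent > TIMEOUT:  # the response never came back
                del pending[(server, qid)]
                failures[server] = failures.get(server, 0) + 1
                if failures[server] >= MAX_FAILURES:
                    dead_until[server] = now + RETRY_AFTER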

For the use cases you desire, with full caching and DNSSEC, dnslookupd
shouldn't get in the way.  All applications (via glibc) would query
127.0.0.1, and dnslookupd would immediately forward those requests to
the local unbound+dnssec-triggerd setup.  Dnslookupd would only take
action if unbound died for some reason (and if there was an alternate
DNS resolver to switch to).


