Koozali.org: home of the SME Server

DNS problems... dnscache, dnscache.forwarder, tinydns, where to begin?

FreakWent
I'm on 7.2 and I'm having a recurring DNS problem.  If the link changes for some reason, mostly due to the ISP shuffling addresses around, DNS stops working.  I can add OpenDNS's servers to resolv.conf, and this fixes local resolution on the server but doesn't help the LAN clients.

Basically it's the 192.168.0.1:53 listener that gets stuffed up. Random twiddling used to fix it, but by the time I got it working I could never figure out what I did that worked.  I've tried listing or not listing my ISP's servers in the /sbin/e-smith/console setup and neither config works.

Now I'm at a point where a reboot works for the first 5 to 15 minutes, then it dies again.

The thing is, I'm actually an experienced admin, but I'm a lot more comfortable with the traditional /etc/init.d model; all this runsv business freaks me out.  I want "/etc/init.d/dnscache stop" to kill all the dnscache processes, but it doesn't.  Even "/etc/init.d/squid restart" doesn't work properly.

What I'd like from the other forum people then is either
1) Some kind of "repair" system that will revert all the necessary DNS pieces to a known good config (Get 7.3 maybe?), or
2) A basic breakdown of the architecture and what all the different bits are supposed to be doing

Can anyone help me with this? Has anyone else encountered similar problems?


mmccarn
« Reply #1 on: November 07, 2008, 04:50:21 PM »
What is the basic reason that your DNS is failing?  You mention that your ISP is shuffling addresses around - do you mean by this that your WAN IP is changing?

On my SME servers (6.01, 7.0rc3, 7.1, 7.2, 7.3) I have never had any trouble with:
- Use 'Dyndns.org' to manage the public internet address for my SME server
- Set the SME server to 'resolve locally'

LAN users get the LAN address for my SME; WAN users get the correct-but-ever-changing WAN address for my SME.

I've never had any trouble accessing other sites on the Internet (my clients, for example) that are also using Dyndns - the SME dns server correctly times out and gets the new address whenever the address changes...

So, my feeling is that you should be able to get your DNS working using the default SME DNS settings, but I suspect there's something I'm not understanding about your configuration...

FreakWent
« Reply #2 on: November 07, 2008, 11:02:03 PM »
Thanks for replying!

I'm trying to determine the basic reason.

Yes, I'm referring to WAN addresses.

All external DynDNS stuff works fine.  What doesn't work is the feature of the server whereby LAN clients send it DNS queries, and it looks them up externally, then responds to the client.

This is not about the LAN clients finding the SME server, but about the server resolving google.com or jabber.org on their behalf.

From the client to the server:
08:35:10.326753 IP 192.168.0.69.49451 > 192.168.0.1.domain:  60095+ A? google.com. (28)

And a reply:
08:35:12.968212 IP 192.168.0.1.domain > 192.168.0.69.49444:  26359 ServFail 0/0/0 (28)

But if I run a capture on the server, I see no DNS packets trying to exit the server to resolve this host!

The symptoms persist even with the firewall down.  If I add an external resolver to resolv.conf then various processes on the server use that and work ok (anti-spam blacklist lookups, etc) but this does nothing for the clients.

I expect to see packets flying off to the root servers asking for the name servers for .com and so on, or at least something heading to the ISP's name servers (which I haven't configured, but which the server is told of during the pppoe dhcp assignment). Instead, the system seems to think it already knows that no domain exists, and won't try to resolve any of them.

CharlieBrady
« Reply #3 on: November 08, 2008, 01:15:30 AM »
Quote
I'm on 7.2 and I'm having a recurring DNS problem.  If the link changes for some reason, mostly due to the ISP shuffling addresses around, DNS stops working.  I can add OpenDNS's servers to resolv.conf, and this fixes local resolution on the server but doesn't help the LAN clients.

Basically it's the 192.168.0.1:53 listener that gets stuffed up. Random twiddling used to fix it, but by the time I got it working I could never figure out what I did that worked.

Random twiddling that you can't accurately describe certainly makes it more difficult for us to help you.

Quote
Now I'm at a point where a reboot works for the first 5 to 15 minutes, then it dies again.

You say above that the problem coincides with changes in the ISP link. Here you suggest that it dies 5 to 15 minutes after reboot without any ISP link changes. Only one of those can be true.

You ask "where to begin". The place to begin is to understand what is not working before trying to fix it. Random fiddling just confuses you, and possibly makes things worse. If anything doesn't "just work", report via the bug tracker.

Quote
The thing is, I'm actually an experienced admin, but I'm a lot more comfortable with the traditional /etc/init.d model; all this runsv business freaks me out.

Don't let it freak you out. Stay calm, and do a little reading on supervise and runit (very similar programs). It'll be worth it in the long run.

Here's a good backgrounder:

http://thedjbway.org/daemontools.html

Quote
What I'd like from the other forum people then is either
1) Some kind of "repair" system that will revert all the necessary DNS pieces to a known good config (Get 7.3 maybe?), or

config delete dnscache
config delete dnscache.forwarder
signal-event post-upgrade ; signal-event reboot

Quote
2) A basic breakdown of the architecture and what all the different bits are supposed to be doing

/etc/resolv.conf should direct all DNS queries to dnscache running on $LocalIP. That dnscache will forward all queries to another dnscache instance running on 127.0.0.2. That second dnscache will forward queries for local names to tinydns on 127.0.0.1, and will resolve everything else by asking the root name servers and following delegation responses.
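
As a rough sketch of how each hop in that chain could be checked by hand: this assumes dig (from bind-utils) is installed, that $LocalIP is 192.168.0.1 as in this thread, and that each instance accepts queries coming from the server itself. probe_stage is a made-up helper name, not an SME Server tool.

```shell
#!/bin/bash
# Sketch only: ask one stage of the resolver chain for an A record.
# Usage: probe_stage LABEL SERVER_IP NAME
probe_stage() {
    local label=$1 server=$2 name=$3 answer
    # +short prints only the answer data; dig's own diagnostics start
    # with ";", so drop those lines before deciding.
    answer=$(dig +short +time=2 +tries=1 "@$server" "$name" A 2>/dev/null | grep -v '^;')
    if [ -n "$answer" ]; then
        echo "$label ($server): ok"
    else
        echo "$label ($server): no answer"
    fi
}
```

For example, run "probe_stage frontend 192.168.0.1 amazon.com", then "probe_stage forwarder 127.0.0.2 amazon.com", then "probe_stage tinydns 127.0.0.1" with one of your own local host names. The first stage that reports no answer is probably the place to start looking.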


Quote
Can anyone help me with this? Has anyone else encountered similar problems?

The limited information you've provided suggests that the dnscache instance running on 127.0.0.2 isn't sending replies, except for 5 to 15 minutes after a reboot. That might happen if it was unable to log.

Reset the configuration as instructed, and open a bug report if you still experience problems. It would be a good idea for you to upgrade to 7.3 while you are at it.



FreakWent
« Reply #4 on: November 09, 2008, 03:20:22 AM »
This is great, thanks heaps for the pointers.

Resetting the configuration with the config delete commands did not fix the problem.

Upgrading to 7.3 also did not help.

So now that I know what it's supposed to be doing, I can investigate properly:


bash-3.00# /sbin/iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Firewall is off to rule that out.



bash-3.00# nslookup
> server
Default server: 192.168.0.1
Address: 192.168.0.1#53
> amazon.com
;; connection timed out; no servers could be reached


Hrm. Capture on loopback shows:
13:05:51.343805 IP 127.0.0.2.19188 > 127.0.0.2.domain:  29669+ A? amazon.com. (28)
13:05:54.363004 IP 127.0.0.2.54391 > 127.0.0.2.domain:  58690+ A? amazon.com. (28)
13:05:56.344770 IP 192.168.0.1.32864 > 192.168.0.1.domain:  22145+ A? amazon.com. (28)
13:05:56.345297 IP 127.0.0.2.28339 > 127.0.0.2.domain:  64205+ A? amazon.com. (28)
13:05:59.364383 IP 127.0.0.2.63895 > 127.0.0.2.domain:  6075+ A? amazon.com. (28)
13:06:05.382827 IP 127.0.0.2.46545 > 127.0.0.2.domain:  33379+ A? amazon.com. (28)
13:06:10.384327 IP 127.0.0.2.57063 > 127.0.0.2.domain:  40789+ A? amazon.com. (28)

So I'm seeing repeated requests to the 127.0.0.2 forwarder, which is, it seems, sending them back to 192.168.0.1 maybe?  Either way, eventually we get

13:06:50.398912 IP 192.168.0.1.domain > 192.168.0.1.32864:  22145 ServFail 0/0/0 (28)
13:06:50.398969 IP 192.168.0.1 > 192.168.0.1: icmp 64: 192.168.0.1 udp port 32864 unreachable

and of course no DNS traffic is seen over ppp0.
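
For anyone repeating this, a small sketch for tallying a capture like the one above. It only pattern-matches tcpdump's text output in the form shown in this post ("A?" for queries, "ServFail" for failures); summarize_dns_capture is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: count A queries and ServFail replies in tcpdump text
# output read from stdin.
summarize_dns_capture() {
    awk '
        / A[?] /     { queries++ }    # lines like "29669+ A? amazon.com."
        / ServFail / { servfail++ }   # lines like "22145 ServFail 0/0/0"
        END { printf "queries=%d servfail=%d\n", queries, servfail }
    '
}
```

Something like "tcpdump -n -i lo -c 50 port 53 | summarize_dns_capture" on loopback, and the same with -i ppp0, would show whether queries ever leave the box. Lots of queries on lo and none on ppp0 is the pattern described here.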

Poking further I find what I should have found in the first place.  dnscache's log reports the cause of the servfail:

servfail mirrorlist.centos.org. input/output error

And dnscache.forwarder shows "unable to bind TCP socket: address already used"
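
A minimal sketch of pulling just those two kinds of message out of a log. The log file is passed in as an argument; on my box I'd expect /var/log/dnscache/current and /var/log/dnscache.forwarder/current, but treat those paths as an assumption. scan_dns_log is a made-up name.

```shell
#!/bin/bash
# Sketch only: show the last few servfail / bind errors in one log file.
# Usage: scan_dns_log LOGFILE
scan_dns_log() {
    local file=$1
    if [ ! -r "$file" ]; then
        echo "$file: not readable"
        return 1
    fi
    # Match the two messages quoted in this post; keep the newest five.
    grep -E 'servfail|unable to bind' "$file" | tail -n 5
}
```

Checking the logs first would have saved most of the packet capturing above.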

So... something else must have bound the socket first.  Looking in /services, dnscache.dead2 is there from earlier, umm... twiddling.
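
A sketch of how to spot leftovers like that: list anything in the service directory whose name starts with dnscache but isn't one of the two expected instances. The directory is an argument; the /service default is an assumption about where the supervised services live, and find_stray_dns_services is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: print dnscache* entries other than the two expected ones.
# Usage: find_stray_dns_services [SERVICE_DIR]
find_stray_dns_services() {
    local dir=${1:-/service} entry name
    for entry in "$dir"/dnscache*; do
        # Skip the unexpanded pattern when nothing matches; keep
        # dangling symlinks, since they are suspicious too.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        name=${entry##*/}
        case $name in
            dnscache|dnscache.forwarder) ;;   # expected instances
            *) echo "$name" ;;                # anything else is a stray
        esac
    done
}
```

Here it would have printed dnscache.dead2. Something like "netstat -anp | grep ':53 '" should also show which process actually holds the port.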


It's true then that "Random fiddling just confuses you, and possibly makes things worse", so I'll reboot and post back here if it's still broken.

FreakWent
« Reply #5 on: November 09, 2008, 03:24:41 AM »
Lovely.  Sorry for wasting your time, and thanks for giving it!

CharlieBrady
« Reply #6 on: November 10, 2008, 02:08:34 AM »
Quote
Lovely.

Does that mean that after you removed the dnscache.dead2 directory (or symlink) and rebooted, the problem went away?

If so, good news.

Thanks for the apology and the thanks.

feby
I'm having the same problem. How did you manage to fix it?

mmccarn
I'd recommend opening a new thread rather than tacking on to one that's over 3 yrs old.

Having said that, the only causes I've seen for DNS failure on an SME server are a full disk (when my SME's root partition fills up, DNS is the first service to fail) or a damaged configuration.

Check to make sure you have free space on your root partition.
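
A quick sketch of that check, assuming the POSIX "df -P" output format. The 90% threshold is an arbitrary example, and both function names are made up.

```shell
#!/bin/bash
# Sketch only: report how full the root partition is.
root_usage_pct() {
    # df -P keeps each filesystem on one line; field 5 is e.g. "87%".
    df -P / | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Usage: check_root_space [LIMIT_PERCENT]
check_root_space() {
    local limit=${1:-90} used
    used=$(root_usage_pct)
    if [ "$used" -ge "$limit" ]; then
        echo "root partition is ${used}% full"
        return 1
    fi
    echo "root partition is ${used}% used"
}
```

Running "check_root_space" with no argument uses the 90% example threshold; a non-zero return means it's time to free some space before looking any further at DNS.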

Charlie's post above (Reply #3, November 2008) describes how to reset the dnscache configuration.