On Fri, 18 Feb 2005, Adam wrote:

> (I also posted this on scalug)
> A friend of mine has an interesting situation. Their DNS server gets around 2 
> to 3 million dns queries a day its running on a quad xeon system with 4 gigs 
> of ram. Its running Bind 9.2.2.
> At least twice a week if not more their server will stop looking up domains 
> that are not cached.
> They end up having to restart bind to get things working again.

Have they done any debugging to narrow it down?  Is named still getting 
CPU time after it stops?  Is named sending out any queries to the root 
servers?

It sounds like it may have an old or bad root hints file, and it's 
periodically picking a bad root server address.

Next time it happens, run ps a couple of times and confirm that the CPU 
time for named is increasing, just to prove named is still running.  Then 
run tcpdump (or a similar packet sniffer) on the box and see if named is 
sending queries out to the root servers.  If it's getting millions of 
queries per day, you should see a bunch of outgoing queries to the root 
servers.  If you see the outgoing queries but no responses, check whether 
the IP it's sending to is valid and responding to queries.
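
For that last check, here is a rough sketch of one way to test whether a 
given address answers DNS queries at all.  It is not part of BIND; the 
function names (build_query, server_responds) and the 3-second timeout are 
just my own choices for illustration.  It sends a plain UDP query for the 
root NS records and reports whether any matching reply comes back.

```python
import socket
import struct
import sys


def build_query(name=".", qtype=2, query_id=0x1234):
    """Build a minimal DNS query packet (qtype 2 = NS, class IN)."""
    # Header: ID, flags (0 = standard query, recursion not requested),
    # 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # Encode the name as length-prefixed labels ending in a zero byte.
    labels = [label for label in name.split(".") if label]
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def server_responds(ip, port=53, timeout=3.0):
    """Return True if ip answers our query with a matching DNS response."""
    query = build_query()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect() on UDP limits replies to this peer.
            sock.connect((ip, port))
            sock.send(query)
            data = sock.recv(4096)
        except OSError:
            # Timeout, unreachable host, refused port, and so on.
            return False
    # A reply must carry our query ID and have the QR (response) bit set.
    return len(data) >= 12 and data[:2] == query[:2] and bool(data[2] & 0x80)


if __name__ == "__main__":
    # Pass the addresses seen in the tcpdump output as arguments.
    for address in sys.argv[1:]:
        state = "responding" if server_responds(address) else "NOT responding"
        print(address, state)
```

Feed it the destination addresses that show up in the tcpdump output.  For 
comparison, 198.41.0.4 (a.root-servers.net) should answer.  A single lost 
UDP packet will also show up as "NOT responding", so try an address a few 
times before deciding it is dead.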