Ascend Archive

(ASCEND) Re: Monitoring Access Control



-We run Ascend's Access Control under NT.

-I use a Perl script to monitor it.  Runs on a Unix box but should be
trivial to port to Perl under NT.  Script attached.

-Notice that this script doesn't scream if the Radius box isn't answering
pings.  I have another app -- Nodewatch -- which has already screamed at
me if the box isn't answering pings -- I didn't want this script telling
me something I already knew.

-Here is the catch we've seen.  We authenticate users against the NT SAM
 ... and whatever service (daemon) runs the NT SAM can crump without
taking down the box's IP stack or the Radius (Access Control) service.
Thus, radcheck queries the box, Access Control answers, and the script
thinks everything is fine ... but when an actual authentication request
hits Access Control, Access Control turns around and asks the NT SAM ...
the NT SAM has died (the box is no longer functioning as a domain
controller) ... and whatever Access Control hears back, it interprets as
"no, bad username/password", and the user is denied access.  If I
hard-code a password into the Users file, Access Control will validate
that username/password combination just fine, even during one of these
incidents.  Notice how diabolical this failure is.  It doesn't matter how
many secondary Radius servers I have -- the Maxen won't even attempt to
try them -- they think the primary Radius server is functioning fine.
And, in fact, Radius is functioning fine ... but the security database
(SAM) sitting behind Radius has crumped.

-Sitting in front of the NT box when it is in this state ... there is
nothing you can do.  You can't log in -- the SAM is dead, no
username/password combination will work.  Have to reboot.

-The answer, to my mind, is to write a new script which builds a Radius
request and attempts to validate a test user.  In fact, we want a more
complete solution:  the script will dial a modem and attempt to create a
PPP session with the Max.  This will test the entire path:  PSTN, our
inbound PRI, our PBXes and corporate voice net, the Maxen, the data net
between the Maxen and the Access Control boxes, Access Control, and the NT
SAM.  Regrettably, our coding backlog has passed six months, which is as
far out as we predict, so I don't have anything beyond a pipe dream to
offer here.
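
A minimal sketch of the first half of that idea (build a Radius request
and try to validate a test user), not a finished monitor.  The packet
layout follows RFC 2865.  Everything else is an assumption: the host,
shared secret, and test account come from the command line, the
NAS-Identifier value "monitor" is made up, the port defaults to 1812
(older servers often listen on 1645), and the monitoring box must be
configured as a client on the Radius server.  It has not been tried
against Access Control, and it does not verify the response authenticator.

```perl
#!/usr/local/bin/perl
# Sketch only: send one RADIUS Access-Request for a test user and
# report accept / reject / timeout / error.
use strict;
use warnings;
use Digest::MD5 qw(md5);
use IO::Socket::INET;
use IO::Select;

# Hide the password as RFC 2865 section 5.2 describes: pad to a multiple
# of 16 bytes, then XOR each block with MD5(secret . previous block),
# using the request authenticator as the first "previous block".
sub hide_password {
  my ($pass, $secret, $authenticator) = @_;
  $pass .= "\0" x ((16 - length($pass) % 16) % 16);
  $pass = "\0" x 16 if $pass eq '';
  my ($prev, $out) = ($authenticator, '');
  for (my $i = 0; $i < length($pass); $i += 16) {
    $prev = substr($pass, $i, 16) ^ md5($secret . $prev);
    $out .= $prev;
  }
  return $out;
}

# Build the packet: code 1 (Access-Request), identifier, total length,
# 16-byte authenticator, then User-Name (type 1), User-Password (type 2),
# and NAS-Identifier (type 32) attributes.
sub build_access_request {
  my ($id, $authenticator, $secret, $user, $pass) = @_;
  my $hidden = hide_password($pass, $secret, $authenticator);
  my $nas    = 'monitor';    # hypothetical NAS-Identifier value
  my $attrs  = pack('CCa*', 1,  2 + length($user),   $user)
             . pack('CCa*', 2,  2 + length($hidden), $hidden)
             . pack('CCa*', 32, 2 + length($nas),    $nas);
  return pack('CCna16', 1, $id, 20 + length($attrs), $authenticator) . $attrs;
}

# Returns 'accept', 'reject', 'timeout', or 'error'.
sub check_auth {
  my ($host, $secret, $user, $pass, $port) = @_;
  $port ||= 1812;
  my $sock = IO::Socket::INET->new(PeerAddr => $host, PeerPort => $port,
                                   Proto => 'udp') or return 'error';
  # rand() is not a strong random source; acceptable for a sketch only.
  my $authenticator = join '', map { chr(int(rand(256))) } 1 .. 16;
  my $id = int(rand(256));
  $sock->send(build_access_request($id, $authenticator, $secret, $user, $pass))
    or return 'error';
  return 'timeout' unless IO::Select->new($sock)->can_read(5);
  my $reply = '';
  defined $sock->recv($reply, 4096) or return 'error';
  return 'error' unless length $reply;
  my $code = unpack('C', $reply);
  return $code == 2 ? 'accept' : $code == 3 ? 'reject' : 'error';
}

# Usage: radauth_check host secret user password [port]
if (@ARGV >= 4) {
  print check_auth(@ARGV), "\n";
}
```

A monitor built along these lines would page when the test user gets
"reject" or "timeout" from a box that radcheck reports as healthy, which
is the SAM failure described above.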

-Now, of course, if the NT SAM behind your Access Control boxes never
crumps, then you won't care about this issue.

--sk

Stuart Kendrick
FHCRC

#!/usr/local/bin/perl
# This script runs from cron.  If it notices that  
# RADIUS is not running, it logs to syslog and pages Duty.  --sk 4-13-98

use Sys::Syslog;
use POSIX;

$alldead = "Duty:  All RADIUS servers are dead.  Remote Access is shut down.  --/home/netops/bin/monitor_radius";
$fail = "No reply";
$itlives = "Duty:  RADIUS lives.  --/home/netops/bin/monitor_radius";
$mailer = "/usr/bin/mailx -s";
$pager = "/opt/local/bin/qpage -f \"\" -p";
$pinger = "/opt/local/sbin/fping";
$succeed = "is responding";
$tool = "/home/radops/usr/sbin/radcheck -d /home/radops/etc/raddb ";
$whopage = "duty";
$whomail = "skendric\@fhcrc.org";
@radhost = qw
(
	fhcrc-nt	fhcrc-fh
);

# Ping RADIUS servers
$result = `$pinger @radhost`;
@result = split(/\n/, $result);
foreach $result (@result) {
  if ($result =~ /^(\S+) is alive/) {
    $ping_result{$1} = 1;
  } elsif ($result =~ /^(\S+) is unreachable/) {
    $ping_result{$1} = -1;
  } elsif ($result =~ /^(\S+) address not found/) {
    $ping_result{$1} = 0;
  } else {
    # No pattern matched, so $1 holds no hostname from this line; ignore it.
    next;
  }
}

# Test for presence of RADIUS.  Don't bother querying boxes
# which didn't respond to pings.
foreach $radhost (@radhost) {
  if ($ping_result{$radhost} == 1) {
    $lookup = `$tool $radhost`;
    @lookup = split(/\n/, $lookup);
    foreach $lookup (@lookup)  {
      if ($lookup =~ /$fail/) {
        $lookup_result{$radhost} = 0;
      } elsif ($lookup =~ /$succeed/) {
        $lookup_result{$radhost} = 1;
      }
    }
  }
}

# Analyze results
# If all RADIUS servers failed to perform the lookup ($sum == 0), then
# scream.
$msg = "/home/netops/tmp/allrad.msg";
$sum = 0;
foreach $radhost (@radhost) {
  $sum += $lookup_result{$radhost};
} 
if ($sum == 0) {
  open FILE, ">$msg";
  print FILE "$alldead";
  close FILE;
  $result = `$pager $whopage < $msg`;
  $result = `$mailer "RADIUS is broken" $whomail < $msg`;
  &openlog('monitor_radius', '', 'local1');
  &syslog("warning", "$alldead");
  &closelog();
}
# If RADIUS is alive but the dead message is still around, assume that RADIUS
# was only recently resuscitated, tell people, and erase the message.
elsif (($sum > 0) && (-r $msg)) {
  open FILE, ">$msg";
  print FILE "$itlives";
  close FILE;
  $result = `$pager $whopage < $msg`;
  $result = `$mailer "RADIUS lives" $whomail < $msg`;
  &openlog('monitor_radius', '', 'local1');
  &syslog("warning", "$itlives");
  &closelog();
  unlink $msg;
}
# If at least one RADIUS server is dead, then analyze the results for each, and
# scream for each RADIUS server which failed to answer the query.  However, if
# a "dead" message is still hanging around, assume everyone already knows
# and don't scream.
# Analyze the results of radcheck.
# Case A:  if a RADIUS server failed and the related msg file doesn't exist,
#	   then scream.
# Case B:  if a RADIUS server failed and the related msg file does exist, do
#	   nothing.
# Case C:  if a RADIUS server succeeded and the related msg file does exist,
#	   crow about it and erase the msg file.
foreach $radhost (@radhost) {
  $itbroken = "Duty:  RADIUS authentication on $radhost is broken.  --/home/netops/bin/monitor_radius";
  $itfixed = "Duty:  RADIUS authentication on $radhost is fixed.  --/home/netops/bin/monitor_radius";
  $msg = "/home/netops/tmp/$radhost.msg";
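  # Only scream about RADIUS on boxes that answered pings; Nodewatch
  # already reports boxes that are down.
  if (($ping_result{$radhost} == 1) && ($lookup_result{$radhost} == 0) && (!(-r $msg))) {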
    open FILE, ">$msg";
    print FILE "$itbroken";
    close FILE;
    $result = `$pager $whopage < $msg`;
    $result = `$mailer "RADIUS on $radhost is broken" $whomail < $msg`;
    &openlog('monitor_radius', '', 'local1');
    &syslog("warning", "$itbroken");
    &closelog();
  }
  elsif (($lookup_result{$radhost} == 1) && (-r $msg)) {
    open FILE , ">$msg";
    print FILE "$itfixed";
    close FILE;
    $result = `$pager $whopage < $msg`;
    $result = `$mailer "RADIUS on $radhost is fixed" $whomail < $msg`;
    &openlog('monitor_radius', '', 'local1');
    &syslog("warning", "$itfixed");
    &closelog();
    unlink $msg;
  }
}
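
The Case A/B/C logic in that last loop boils down to a two-input
decision: did the lookup succeed, and is a "broken" message file already
on disk?  A small sketch of just that decision, kept separate from the
paging and syslog calls so it can be checked on its own.  The name
next_action and its return values are made up for illustration; they are
not part of the script above.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Decide what to do for one server.
#   'scream' = Case A: lookup failed, no message file yet
#   'crow'   = Case C: lookup succeeded, message file still around
#   'quiet'  = Case B (failed, already reported) or healthy with no file
sub next_action {
  my ($lookup_ok, $msg_exists) = @_;
  return 'scream' if !$lookup_ok && !$msg_exists;
  return 'crow'   if  $lookup_ok &&  $msg_exists;
  return 'quiet';
}

# Print the full decision table.
for my $case ([0, 0], [0, 1], [1, 1], [1, 0]) {
  printf "lookup_ok=%d msg_exists=%d -> %s\n", @$case, next_action(@$case);
}
```

The message file doubles as the "already reported" flag, which is what
keeps cron from paging Duty on every run while a server stays broken.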


++ Ascend Users Mailing List ++
To unsubscribe:	send unsubscribe to ascend-users-request@bungi.com
To get FAQ'd:	<http://www.nealis.net/ascend/faq>