while i won't try to pass my advice off here as sage, i have a  
considerable amount of experience working in very large networks w/ 
carriers (disclaimer: i work for a very large network equipment  
vendor, likely the one you're discussing based on the model  
numbers).  i would agree that your assessment here is dead on: it's
a process thing.

bear in mind, vendors come up with default configurations that
attempt to address the widest swath of deployment scenarios.  while i
wouldn't advocate some of the less reasoned responses for dealing
w/ people who attempt to be "helpful", putting some process in place
and configuring the upstream equipment so that it protects against
this situation will make your life a lot easier going forward.

On Jan 27, 2006, at 11:26 PM, Torleiv Ringer wrote:
> Hello,
>
> Here is something that I must ask for guidance from the more
> experienced administrators. I must say up front that I am not a
> networking specialist. My forte is in our "custom" system.
>
> We have a situation at work where someone misconfigured a switch and
> it caused some major failures that were hard to trace. We have a
> unique environment where we route custom packets through a brand-new
> Cisco 4500. We have new Cisco 3560 switches that distribute links to
> each of two rooms that have "custom" digital equipment. This is a new
> setup for us, and is mission-critical. We have moved from an analog
> network to this new digital UDP based system. In total we have about
> 14 3560s, and plugged into these are about 200 other "custom" switches
> that are vendor-specific.
>
> The point of contention is that someone thought they were doing the
> right thing and jumped in where they were not asked to by installing a
> new 3560.
>
>   *) This person set the switch up "hot" (on the network)
>   *) They used two uplink ports, intending on ganging them together
>   *) They did not properly set the ports into a channel-group
>
> This made the 3560 seem like a router and flooded all of our custom
> switches with so much traffic that the devices could not effectively
> talk to each other. This would be sorta OK in a TCP environment, but
> we have a UDP based system that relies heavily on very low latency.
>
> So here is my question:
>
> I am being pushed by the higher-ups to come up with a software
> solution for this problem, which I feel is a process problem. The
> process should be to NOT SET THE SWITCH UP ON THE F**KING NETWORK! And
> to have another person verify the setup prior to bringing up a new
> piece of equipment on the network that is mission-critical. Beyond
> that the person just went and did it without coordinating with anyone.
>

i would go a step further and put the unused ports on any existing
infrastructure gear into a shut state.  i know you can lock these
network elements down to prevent someone from merely attaching gear
to the network and interoperating with it.  further, you can turn
TACACS on and use privilege levels to allow folks to view things w/o
necessarily giving them the keys to the kingdom on the network.
additionally, when people know that command logging and accounting
are turned on, they're much less likely to do, well, dumb stuff.
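
as a rough illustration of the TACACS and accounting idea, something
along these lines is a common starting point on IOS.  treat it as a
sketch, not a drop-in config: the server address (192.0.2.10), the
key, and the fallback username are placeholders i made up, and exact
command syntax varies between IOS releases, so check it against the
documentation for what you're actually running.

```
! sketch only: server address, key, and username are placeholders
aaa new-model
tacacs-server host 192.0.2.10 key EXAMPLE-KEY
!
! authenticate logins against the TACACS+ server, falling back to
! local accounts if the server can't be reached
aaa authentication login default group tacacs+ local
!
! let the server decide what privilege level each user's session gets
aaa authorization exec default group tacacs+ local
!
! authorize and record every privileged (level 15) command
aaa authorization commands 15 default group tacacs+ local
aaa accounting exec default start-stop group tacacs+
aaa accounting commands 15 default start-stop group tacacs+
!
! local fallback account, used only if the server is unreachable
username fallback-admin privilege 15 secret EXAMPLE-SECRET
```

the per-user privilege levels themselves would live on the TACACS+
server, which is where you'd give most folks a view-only level.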

depending on your configuration and situation, i would recommend
going through your configurations and using the 'shutdown' command on
unused ports, so that bringing anything online in the network
requires coordinated action.  it's a little more work, but it usually
results in very stable configurations and people not glibly plugging
things into the network.  in short, there are some very reasonable
hardening guidelines that you can follow to put a little structure
around things and prevent meltdowns caused by cockpit error.
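
for the unused ports, the config is about as simple as it gets.  a
minimal sketch, assuming (purely for illustration) that ports 13
through 24 on a given switch are the unused ones:

```
! sketch only: the port range is a placeholder for whatever is unused
interface range GigabitEthernet0/13 - 24
 description UNUSED - do not enable without coordination
 switchport mode access
 shutdown
```

bringing a port into service then takes a deliberate 'no shutdown' on
that interface, which is exactly the point where your process (and a
second pair of eyes) can be applied.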

> Should I bow to the pressure and force our vendor to "fix" their
> software to be able to function in an abnormal network setup? This
> would allow certain folks to save face while straining our
> relationship with our vendor.

without more details on what your operating environment is here and  
what the objectives are, i would say that this is difficult and not  
likely to be a successful undertaking.  vendors don't lightly change  
default configurations and in the scenario you've outlined, it  
doesn't sound like you're doing anything that's

> Or
>
> Should I instill a process such that this would never happen again and
> put the lock-down on people who configure devices in/on this network?
> This involves disallowing the people who are supposed to be the
> networking specialists from configuring the "custom" network.
>
> Or
>
> Is there a Cisco configuration that can be used to disallow "unknown"
> routers on the VLAN? This seems unlikely to me.

there are some features that you can use to require that network
elements authenticate themselves onto the network, and some common
best practices which accomplish something very similar.  these don't
usually work at the VLAN level (802.1X, for example, works at the
port level), but you can do some simple things to prevent total
topology meltdown in a purely switched network.
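
to make the "simple things" a bit more concrete, here is a sketch of
the sort of port-level protection i have in mind: BPDU guard and
storm control on edge ports, plus 802.1X on a port where it makes
sense.  the interface ranges, the 1% threshold, and the RADIUS server
address and key are all placeholders, and the 802.1X interface
commands in particular differ between IOS releases (newer ones use
'authentication port-control auto'), so take this as a starting
point to check against your own docs rather than something to paste
in.

```
! sketch only: interfaces, thresholds, and server details are placeholders
!
! edge ports that should only ever see end devices.  bpduguard
! err-disables the port if anything sending BPDUs (i.e. a switch)
! is plugged in; storm-control caps broadcast at 1% of link bandwidth
! and shuts the port down if that is exceeded.
interface range GigabitEthernet0/1 - 12
 switchport mode access
 spanning-tree portfast
 spanning-tree bpduguard enable
 storm-control broadcast level 1.00
 storm-control action shutdown
!
! 802.1X: a device has to authenticate before the port forwards traffic
aaa new-model
aaa authentication dot1x default group radius
radius-server host 192.0.2.20 key EXAMPLE-KEY
dot1x system-auth-control
!
interface GigabitEthernet0/1
 switchport mode access
 dot1x port-control auto
```

a port that gets err-disabled this way stays down until someone
deliberately brings it back, which again forces the coordination
you're after.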

> It's one or the other at this point, as we have lost a lot of
> credibility in this situation, and we must move forward with
> implementation. This is the second time now that a misconfigured
> switch has been setup hot on the "custom" network.
>
> Has anyone had a similar situation?
>
> Thanks in advance for your sage advice.
>
>
> p.s. No, at this point I cannot divulge what the "custom" is.

{ snipped - misc. signatures }

-- 
steve ulrich                       sulrich at botwerks.org
PGP: 8D0B 0EE9 E700 A6CF ABA7  AE5F 4FD4 07C9 133B FAFC