Ascend Archive

Re: (ASCEND) Ascend going mad ? (was: Stable code for Max 1800)



On Tue, Apr 21, 1998 at 05:43:10PM -0400, Tim Basher wrote:
> > It appears
> > that the 1800 uses _one_ BRI as clock source and gets out of sync if the
> > clocks on other BRIs drift noticeably away from it. This causes massive
> > packet loss with all upper layer protocols (ping measurements show up to 50%
> > lost packets). The problem disappears (on upper layers) when PPP link
> > compression is disabled, but this is just an observation, no solution.
> 
> I am sorry, it is not that I don't believe that you are seeing a problem...
> but your explanation doesn't make sense to me.

It does not make much sense, Ok. Our observations were the packet loss
and the fact that it disappears when Stac is switched off.

> How can the use of STAC at the network layer have anything to do with 
> the physical layer's ability to establish sync? 

In an orthogonal world, it should not.

> If you really have a clock sync problem then you should be seeing massive
> CRC errors and packets (compressed or not) should be having problems.  
>  
> The fact that you say the "problem disappears" would point away from the
> issue being clock sync and more to a bug in the STAC implementation.

May well be. It's not my job to find it out ;-)

> In fact, I would expect that using compression should help to reduce the 
> impact of a clock sync problem since STAC will reduce the size of the packets   
> and thus should reduce the chance of an error occurring during the transmission
> of a single packet.
> 
>   example: if you have 1 bit error out of every 1000 bits then if your
>            packets are only 800 bits long, you have a better chance of
>            sending a packet without an error than if your packets are
>            1600 bits long.
> 
> I can believe there are clock sync problems and I can believe that there
> are STAC problems, I just cannot see how the two are related as you report,
> such that disabling compression causes the problem to "disappear".
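
For what it is worth, the arithmetic behind that example can be sketched
as follows. This assumes independent bit errors at a fixed rate, which is
an idealization (a slipping clock would more likely produce bursts), and
the function name is only illustrative:

```python
def error_free_probability(bit_error_rate, packet_bits):
    """Chance that a packet of packet_bits bits arrives with no bit error,
    assuming every bit fails independently with probability bit_error_rate."""
    return (1.0 - bit_error_rate) ** packet_bits


# The two packet sizes from the quoted example, at 1 bit error per 1000 bits.
for bits in (800, 1600):
    print(bits, "bits:", round(error_free_probability(1 / 1000, bits), 3))
```

Under that assumption the 800-bit packet gets through intact a bit less
than half of the time and the 1600-bit packet only about one time in
five, so the per-packet argument as such holds.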

I can imagine Stac worsening the packet loss on a slightly disturbed
line. Any hosed frame will cause the sync between the compressor and
the decompressor to be lost. If a frame is dropped because it came in
with a bad FCS, the _next_ received frame will be dropped, too - the
decompressor will send a Reset-Request instead. The compressor
will reset its state, send a Reset-Ack and then continue
sending the next packets it gets from the upper layers (this is actually
true for Stac-Draft-9; MS-Stac behaves better and loses fewer frames;
Stac - hmm, cannot say, no docs available). Now if it is again
an orthogonal world, I would expect the following behavior on such
a line when tested with one ICMP echo request per second:

1) No Stac: Packet loss of X due to FCS errors.
2) Stac: Packet loss of roughly 2*X due to history resets. Likely a bit
   more, because Reset-Requests and -Acks are candidates for loss, too.
   Not to forget history resets cause packets to grow. So maybe 2.2*X.
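
A rough simulation of that expectation, as a sketch only. The model is
my simplification, not the Ascend code: every frame is hit independently
with the same probability, a good frame that arrives while the
decompressor is out of sync is dropped and triggers a reset, and the
reset only succeeds if both the Reset-Request and the Reset-Ack survive.
Packet growth after a history reset is ignored, and all names are made
up for the illustration:

```python
import random


def simulate(frame_error_rate, frames=200_000, seed=1):
    """Return (loss_without_stac, loss_with_stac) as fractions of frames sent."""
    link = random.Random(seed)         # decides which data frames get a bad FCS
    control = random.Random(seed + 1)  # decides whether Reset-Request/-Ack survive
    lost_plain = 0
    lost_stac = 0
    in_sync = True                     # decompressor history matches the compressor
    for _ in range(frames):
        if link.random() < frame_error_rate:
            # Bad FCS: the frame is lost either way, and with Stac the
            # decompressor history is now out of step.
            lost_plain += 1
            lost_stac += 1
            in_sync = False
        elif not in_sync:
            # Good frame, but it cannot be decompressed: dropped, and a
            # Reset-Request goes out instead.
            lost_stac += 1
            request_ok = control.random() >= frame_error_rate
            ack_ok = control.random() >= frame_error_rate
            in_sync = request_ok and ack_ok
        # Otherwise: good frame and history in sync, so it is delivered.
    return lost_plain / frames, lost_stac / frames


plain, stac = simulate(0.05)
print("no Stac:", plain, "Stac:", stac, "ratio:", stac / plain)
```

With a frame error rate of 5% this model puts the Stac loss at roughly
twice the plain loss, a little above 2*X because lost resets keep the
decompressor out of sync for longer.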

I don't actually see this behavior, but I can't debug the code in the
Ascends to find out what's happening ;-)

So how did I come to this "sync loss guess" at all? When I first
described the problem here on the list (Aug 97) we only knew that there
was packet loss with Stac and none without. I got this reply from Jim
Howard:

-----------------------------------------------------------------------
Date: Mon, 11 Aug 1997 15:01:18 -0400
To: Andre Beck <beck@ibh-dd.de>
From: Jim Howard <jhoward@lyceum.com>
Subject: Re: (ASCEND) Weird Max1800/Stac phenomenon

I have two Max 1800's, one of which has 8-BRI lines
all loaded with permanently dialed connections.

I had seen this before, and still in Rel 5.0Ai13,
and thought it was a clock problem stemming from
the 1800 using a single incoming line's clock
as the transmit clock on all lines, and could usually
correct the packet loss by disabling the first two lines,
to cause the clock source to move to line three, then reenabling 1 and 2.

To test your solution, I rebooted my 1800 to cause clock to go back to line 1,
saw that there was packet loss, and then turned off STAC on all
connection profiles and bumped them offline. 
I am still seeing data lost at the telco level, (suspected clocking sync)
but with no compression the IP packets seem to get retransmitted,
and at the IP network level my applications never see the loss.

If you can get ascend support on the case in the next two weeks,
they can take a look at my system as well, after that 
my PRIs should be installed and the 1800 will be taken offline.

-Jim H
At 17:05 08/11/1997 +0200, Andre Beck wrote:
>before eventually opening a TR for this now that we can reproduce it
>reliably. We drive several S0 leased lines from a central Max1800.
>There are P50s at the other ends, with Leased/Unused configurations.
>The lines use MPP, LQM and usually worked great. However, we recognized
>some small to medium packet loss on the lines when doing ping probes
>and an occasional sluggishness with TCP traffic. All these
>symptoms go instantly away as soon as we _disable_ Stac compression
>on these lines. We first had the impression that the impact of the

>Has anyone seen such a phenomenon yet? We were repeatedly told that the
>1800 has no known problems of this kind, but we have observed them in
>several cases with different units but couldn't really reproduce it when
>needed. 

-------------------------------------------------------------------------

This sounds like a very good guess on where the problem is coming from.
I don't _know_ whether it is true, but it is the only guess that goes
deeper than my own, so I have to believe it - until I either go
debug this with BRI sniffers and whatever myself or Ascend comes up
with a word about it.

BTW, my mail was not meant personally. I had to let off a bit of steam, and
it was IMHO time for it. I'm back talking with Ascend support and we
are searching for this again. If we come up with a solution or just
finally know what's going on there, I'll spread the word here.

Andre.
-- 

Kanther-Line: PGP SSH IDEA MD5 GOST RIPE-MD160 3DES RSA FEAL32 RC4

+-o-+--------------------------------------------------------+-o-+
| o |               \\\- Brain Inside -///                   | o |
| o |                   ^^^^^^^^^^^^^^                       | o |
| o | Andre' Beck  (ABPSoft)   AB10-RIPE   XLink PoP Dresden | o |
+-o-+--------------------------------------------------------+-o-+
++ Ascend Users Mailing List ++
To unsubscribe:	send unsubscribe to ascend-users-request@bungi.com
To get FAQ'd:	<http://www.nealis.net/ascend/faq>