[ale] [Fwd: Re: NCR rel 17 panic]

Sun Apr 28 16:22:54 EDT 1996

This is a multi-part message in MIME format.

--------------45CD23EE2029
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

this too

--------------45CD23EE2029
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Date: Sun, 28 Apr 1996 11:37:06 +0300 (EET DST)
From: Linus Torvalds <torvalds at cs.Helsinki.FI>
To: Karsten Weiss <karsten at addx.tynet.sub.org>
Cc: ncr53c810 at Colorado.EDU
Subject: Re: NCR rel 17 panic
In-Reply-To: <199604260827.KAA02030 at boole.suse.de>
Message-Id: <Pine.LNX.3.91.960428113033.19350C-100000 at linux.cs.Helsinki.FI>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Thu, 25 Apr 1996, Karsten Weiss wrote:
> 
> Today I got the following kernel messages (running 1.3.95) while
> extracting a tar.gz file on one of my hard disks:

This looks like the errors that the Linux/alpha world has been plagued 
with. There is something wrong with the driver, probably related to 
disconnects (that's why it works for you in 1.2.13 - it didn't disconnect 
back then).

The simple fix is probably to disable disconnects (which is fine if you 
don't use SCSI tapes, but horrible if you do). Drew made some noises 
about having a handle on the hardware not doing what the documentation 
said it should do, but that was a long time ago and I haven't heard any 
more on it since.

I _suspect_ the problem is made worse by long SCSI commands: the new 
read-ahead code can result in pretty large reads, and maybe the 
disconnect problem gets worse then (disconnecting in the middle of a 
read? Can that even happen?). This is just idle speculation; I have no 
idea about the driver (but that might explain why we saw it on alphas 
first: they have a larger page size and are usually faster anyway, so 
they might have been more likely to request large SCSI reads even with 
the older code).

> Does anyone have an explanation? I've been running my system for a very
> long time now with kernel 1.2.13 and had no problems at all. After
> upgrading to 1.3.9x, for some reason I have already gotten this error
> twice (the first time I was making a system backup to my DAT drive
> when it happened. This was on kernel 1.3.94 I believe).

That does sound like a disconnect thing (the tape access...). Maybe a race 
condition in the driver when the disconnect happens at an inopportune time. 

Drew, any comments?

		Linus

--------------45CD23EE2029--