Re: Lost interrupt w/SCSI too (was Re: IDE, DMA, and lost interrupts.)

From: Jesse Pollard <pollard@cats-chateau.net>
Subject: Re: Lost interrupt w/SCSI too (was Re: IDE, DMA, and lost interrupts.)
Date: Mon, 28 Feb 2000 19:19:05 -0600


Hi again - a quick followup. The hang seems to have nothing to do with the
rsbac timer.

It occurs on the first attempt to write to the root file system. That write
is a test to verify that the system was booted with root read-only: the boot
script (/etc/rc.S) attempts to write a file and expects an error value back.
The script then performs an fsck of all disks.

After a successful fsck the root would be remounted read/write. The error
occurs during the write test, before the fsck. This is back to looking like a
logical error rather than a lost interrupt.
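
The read-only test described above can be sketched roughly as follows. This
is only an illustration, not the actual contents of /etc/rc.S: the function
name is_read_only and the probe file name are hypothetical, and the sketch
treats any failed write as "read-only" without checking the specific error.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the real boot script):
# returns 0 if a write into the given directory fails, 1 if it succeeds.
is_read_only() {
    dir=$1
    probe="$dir/.rw_probe.$$"    # hypothetical probe file name

    # Try to create the probe file. The subshell keeps a failed
    # redirection from terminating the calling script.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"           # clean up: the write went through
        return 1                 # file system is writable
    fi
    return 0                     # write failed: treat as read-only
}
```

A boot script could call is_read_only / before running fsck, and remount the
root with mount -o remount,rw / only after fsck succeeds. In the case reported
here, the failure shows up at the write attempt itself.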

The following message from me was unintentionally omitted from the rsbac
mailing list. For completeness, here it is:

On Mon, 28 Feb 2000, you wrote:
>Hi!
>
>I found this posting in linux-kernel, with similar wait_on_bh in 2.2.14.  
>It might be something outside RSBAC.
>
>Maybe we should use one of the kernel debugging patches, e.g. IKB, with  
>lockup detection.
>
>Amon.
[.... snip ...]

That is the type of error all right. It's possible that RSBAC added just the
right amount of delay somewhere that allowed the bug to show up. I do have
similar hardware:

2 PPro
2 SCSI controllers - BT958: scsi0, BT948: scsi1, with:

Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: Quantum  Model: XP34300W         Rev: L912
Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: SEAGATE  Model: ST43400N         Rev: 1028
Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 00
Vendor: SEAGATE  Model: ST32151N         Rev: 0284
Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 03 Lun: 00
Vendor: TEAC     Model: CD-R55S          Rev: 1.0J
Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 05 Lun: 00
Vendor: ARCHIVE  Model: Python 28388-XXX Rev: 5.45
Type:   Sequential-Access                ANSI SCSI revision: 02

AND two IDE devices:

hda: WDC AC31200F, ATA DISK drive
hdc: CD-ROM CDU77E, ATAPI CDROM drive

All disks end up being used very close to where it hung; I just didn't get it
in the syslog (wish I had, it would make tracing easier). I was beginning to
think about that rsbac timer I saw being activated. I think I'll see
what happens if I enable that printk in rsbacd - I'm curious mostly about
that spin_lock_irq/flush_signals/spin_unlock_irq section. I want to see
if it gets called during the hang.

-- 
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@cats-chateau.net

Any opinions expressed are solely my own.
