From: Jesse Pollard <pollard@cats-chateau.net>
Subject: Re: Lost interrupt w/SCSI too (was Re: IDE, DMA, and lost interrupts.)
Date: Mon, 28 Feb 2000 19:19:05 -0600
Next Article (by Author): Re: 1.0.9b-pre2 uploaded Jesse Pollard
Previous Article (by Author): Re: 1.0.9b-pre2 uploaded (2.2.13 SMP testing) Jesse Pollard
Top of Thread: Re: Lost interrupt w/SCSI too (was Re: IDE, DMA, and lost interrupts.) ao@morpork.shnet.org (A. Ott)
Hi again - a quick followup.

It seems to have nothing to do with the timer. It occurs on the first attempt to write to the root file system. That write is done to verify that the system was booted with root read-only: the test attempts to write a file and expects to receive an error value back. The boot script (/etc/rc.S) then performs an fsck of all disks, and after a successful fsck the root would be remounted read/write. The error occurs on the test.

This is back to looking like a logical error rather than a lost interrupt.

The following message from me was unintentionally omitted from the rsbac mailing list; for completeness, here it is:

On Mon, 28 Feb 2000, you wrote:
>Hi!
>
>I found this posting in linux-kernel, with similar wait_on_bh in 2.2.14.
>It might be something outside RSBAC.
>
>Maybe we should use one of the kernel debugging patches, e.g. IKB, with
>lockup detection.
>
>Amon.

[.... snip ...]

That is the type of error all right. It's possible that RSBAC added just the right amount of delay somewhere that allowed the bug to show up. I do have similar hardware:

2 PPro
2 SCSI controllers - BT958: scsi0, BT948: scsi1

with:

Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: Quantum  Model: XP34300W  Rev: L912
  Type: Direct-Access  ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE  Model: ST43400N  Rev: 1028
  Type: Direct-Access  ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 01 Lun: 00
  Vendor: SEAGATE  Model: ST32151N  Rev: 0284
  Type: Direct-Access  ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 03 Lun: 00
  Vendor: TEAC  Model: CD-R55S  Rev: 1.0J
  Type: CD-ROM  ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 05 Lun: 00
  Vendor: ARCHIVE  Model: Python 28388-XXX  Rev: 5.45
  Type: Sequential-Access  ANSI SCSI revision: 02

AND two IDE devices:

hda: WDC AC31200F, ATA DISK drive
hdc: CD-ROM CDU77E, ATAPI CDROM drive

All disks end up being used very close to where it hung; I just didn't get it in the syslog (wish I did, it would make tracing easier).

I was beginning to think about that rsbac timer I saw being activated. I think I'll see what happens if I enable that printk in rsbacd - I'm curious mostly about that spin_lock_irq/flush_signals/spin_unlock_irq section. I want to see if this gets called during the hang.

--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@cats-chateau.net

Any opinions expressed are solely my own.
-
To unsubscribe from the rsbac list, send a mail to
majordomo@morpork.shnet.org with
unsubscribe rsbac
as single line in the body.
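
For reference, the read-only root check described at the top of this message (attempt to write a file, look at the error value that comes back) can be sketched roughly as follows. This is only an illustration in Python, not the code in /etc/rc.S, which is a shell script; the names probe_write and root_is_read_only and the scratch file name are made up for this sketch. It assumes that a read-only mount reports EROFS on Linux.

```python
import errno
import os
import tempfile


def probe_write(directory):
    """Try to create and then remove a scratch file in `directory`.

    Returns 0 if the write succeeded, otherwise the errno of the failure.
    On Linux a read-only mount is expected to report errno.EROFS.
    The function name and file name are hypothetical.
    """
    path = os.path.join(directory, ".rw_probe.%d" % os.getpid())
    try:
        # O_EXCL ensures an existing file is never clobbered.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    os.unlink(path)
    return 0


def root_is_read_only(directory="/"):
    """True only if the probe failed specifically with EROFS."""
    return probe_write(directory) == errno.EROFS


if __name__ == "__main__":
    result = probe_write("/")
    if result == 0:
        print("root is writable")
    else:
        print("write to root failed: %s" % errno.errorcode.get(result, result))
```

The point of returning the raw error value is that a hang or an unexpected errno on this first write is distinguishable from the expected "read-only" answer, which is the step where the problem shows up here.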