Re: Lost interrupt w/SCSI too (was Re: IDE, DMA, and lost interrupts.)


From: ao@morpork.shnet.org (A. Ott)
Subject: Re: Lost interrupt w/SCSI too (was Re: IDE, DMA, and lost interrupts.)
Date: 28 Feb 2000 10:56:00 +0100



Hi!

I found this posting on linux-kernel; it shows a similar wait_on_bh
message under 2.2.14, so the problem might lie outside RSBAC.

Maybe we should use one of the kernel debugging patches, e.g. IKB, with  
lockup detection.

Amon.

> David Ford wrote:
> > As the subject of lost interrupts is brought up again, I feel like
> > pitching in again :)
> > If DMA is enabled in the kernel compile, the last message will instead be:
> >
> >  hda: lost interrupt
> >
> > and the laptop will lock up.
> ---
>
> 	I don't know if this is related or not...but thought I would report
> it.  System 2xPIII, 1, then 2 SCSI HD's.
>
> 	I've been running off of one SCSI disk on the 2.2.14 kernel for
> a while now -- no problem.  Yesterday I mounted a 2nd SCSI hard disk to do
> builds on.  I was remotely logged in.  I loaded it up with a large project
> and did a make. Then I remotely logged in on another window.  It just
> hung there.  Tried a third window.  It hung too.  So of course both
> of the login attempts would be accessing sda -- the make was accessing
> sdb.  Finally logged in after long wait -- tried a 'vi' on a file on
> sdb in one window and a 'top' in the other -- both hung.  Tried to kill
> the make -- that wouldn't work -- just ignored me -- finally suspended and
> issued kill -- that took forever.  Went to console and screen was in
> power save off -- at first unresponsive.  Then it turned on the screen
> to a 'frozen' screen saver (Matrix).  Machine was still pingable, but
> login attempts would time out.  Ping times were down in the .2 range, but
> occasionally there would be a ping time of over 1400ms, followed by a
> 2nd at 200+ms, then back to .2.  Had to power-cycle machine to reboot.
> Lots of messed up files in /var + /tmp (sda) and on my build disk (sdb).
> So there were lots of outstanding writes that were needing completion.
>
> Had to manually fsck 2 of the partitions (/var + /tmp) -- when it came back
> up I looked at the log -- the last thing before the reboot was:
> --------
> Feb  1 08:22:32 ishtar -- MARK --
> Feb  1 08:39:51 ishtar in.rlogind[8354]: connect from 192.111.23.234
> Feb  1 08:39:52 ishtar pam_rhosts_auth[8354]: allowed to law@nimue.corp.sgi.com as law
> Feb  1 08:40:09 ishtar su: (to root) law on /dev/pts/3
> Feb  1 08:43:17 ishtar su: (to root) law on /dev/pts/3
> Feb  1 09:02:32 ishtar -- MARK --
> Feb  1 09:04:43 ishtar kernel: scsi : aborting command due to timeout : pid 98419, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 24 1a ff 00 00 80 00
> Feb  1 09:04:43 ishtar kernel: scsi : aborting command due to timeout : pid 98420, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 24 1b 7f 00 00 50 00
> Feb  1 09:04:44 ishtar kernel: SCSI host 0 abort (pid 98420) timed out - resetting
> Feb  1 09:04:44 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb  1 09:04:44 ishtar kernel:
> Feb  1 09:04:44 ishtar kernel: wait_on_bh, CPU 0:
> Feb  1 09:04:44 ishtar kernel: irq:  0 [0 0]
> Feb  1 09:04:44 ishtar kernel: bh:   1 [0 1]
> Feb  1 09:04:47 ishtar kernel: <[c010a4b5]> <[c0165e42]> <[c0165eac]> <[c0165ebd]> <[c01746c5]> <[c0157c4f]> <[c013112b]> <[c0131616]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb  1 09:04:47 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb  1 09:05:17 ishtar kernel: scsi : aborting command due to timeout : pid 98431, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 14 87 00 00 18 00
> Feb  1 09:05:17 ishtar kernel: scsi : aborting command due to timeout : pid 98435, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 c4 00 57 00 00 28 00
> Feb  1 09:05:18 ishtar kernel: SCSI host 0 abort (pid 98431) timed out - resetting
> Feb  1 09:05:18 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb  1 09:05:19 ishtar kernel:
> Feb  1 09:05:18 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb  1 09:05:19 ishtar kernel:
> Feb  1 09:05:19 ishtar kernel: wait_on_bh, CPU 0:
> Feb  1 09:05:19 ishtar kernel: irq:  0 [0 0]
> Feb  1 09:05:19 ishtar kernel: bh:   1 [0 1]
> Feb  1 09:05:22 ishtar kernel: <[c010a4b5]> <[c018f1a4]> <[c018fd43]> <[c0195188]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb  1 09:05:22 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb  1 09:05:52 ishtar kernel: scsi : aborting command due to timeout : pid 98449, scsi0, channel 0, id 1, lun 0 Write (10) 00 01 9c 13 07 00 00 80 00
> Feb  1 09:05:52 ishtar kernel: scsi : aborting command due to timeout : pid 98450, scsi0, channel 0, id 1, lun 0 Write (10) 00 01 9c 13 87 00 00 80 00
> Feb  1 09:05:52 ishtar kernel: SCSI host 0 abort (pid 98449) timed out - resetting
> Feb  1 09:05:52 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb  1 09:05:53 ishtar kernel:
> Feb  1 09:05:53 ishtar kernel: wait_on_bh, CPU 1:
> Feb  1 09:05:53 ishtar kernel: irq:  0 [0 0]
> Feb  1 09:05:53 ishtar kernel: bh:   1 [1 0]
> Feb  1 09:05:56 ishtar kernel: <[c010a4b5]> <[c018f1a4]> <[c018fd43]> <[c0195188]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb  1 09:05:57 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb  1 09:07:02 ishtar kernel: scsi : aborting command due to timeout : pid 98706, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 24 29 17 00 00 50 00
> Feb  1 09:07:02 ishtar kernel: scsi : aborting command due to timeout : pid 98707, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 11 3f 00 00 08 00
> Feb  1 09:07:02 ishtar kernel: SCSI host 0 abort (pid 98707) timed out - resetting
> Feb  1 09:07:02 Feb  1 09:07:02 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb  1 09:07:03 ishtar kernel:
> Feb  1 09:07:03 ishtar kernel: wait_on_bh, CPU 0:
> Feb  1 09:07:03 ishtar kernel: irq:  0 [0 0]
> Feb  1 09:07:03 ishtar kernel: bh:   1 [0 1]
> Feb  1 09:07:06 ishtar kernel: <[c010a4b5]> <[c018f1a4]> <[c018fd43]> <[c0195188]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb  1 09:07:06 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb  1 09:07:36 ishtar kernel: scsi : aborting command due to timeout : pid 98721, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 c4 1d 3f 00 00 80 00
> Feb  1 09:07:36 ishtar kernel: scsi : aborting command due to timeout : pid 98722, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 c4 1d bf 00 00 80 00
> Feb  1 09:07:38 ishtar kernel: SCSI host 0 abort (pid 98722) timed out - resetting
> Feb  1 09:07:38 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb  1 09:07:41 ishtar kernel: (scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb  1 09:07:41 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb  1 09:08:11 ishtar kernel: scsi : aborting command due to timeout : pid 98776, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 1c 1f 00 00 80 00
> Feb  1 09:08:11 ishtar kernel: scsi : aborting command due to timeout : pid 98777, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 1c 9f 00 00 80 00
> Feb  1 09:08:13 ishtar kernel: SCSI host 0 abort (pid 98776) timed out - resetting
> Feb  1 09:08:13 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb  1 09:08:13 ishtar kernel:
> Feb  1 09:08:13 ishtar kernel: wait_on_bh, CPU 0:
> Feb  1 09:08:13 ishtar kernel: irq:  0 [0 0]
> Feb  1 09:08:13 ishtar kernel: bh:   1 [0 1]
> Feb  1 09:08:13 ishtar kernel: irq:  0 [0 0]
> Feb  1 09:08:13 ishtar kernel: bh:   1 [0 1]
> Feb  1 09:08:16 ishtar kernel: <[c010a4b5]> <[c0165e42]> <[c0165eac]> <[c0165ebd]> <[c01746c5]> <[c0157c4f]> <[c013112b]> <[c0131616]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb  1 09:08:16 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb  1 09:08:46 ishtar kernel: scsi : aborting command due to timeout : pid 98800, scsi0, channel 0, id 1, lun 0 Write (10) 00 01 44 19 5f 00 00 80 00
> Feb  1 09:08:46 ishtar kernel: scsi : aborting command due to timeout : pid 98801, scsi0, channel 0, id 1
> <...then the log was corrupted by a mail message appearing in the middle of the log.  >
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/
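
A note for anyone reading the quoted log: the byte strings after
"Write (10)" can be decoded to see which blocks the timed-out commands
were aimed at. The sketch below assumes the kernel printed the nine CDB
bytes that follow the opcode, in the standard WRITE(10) layout (flags,
4-byte big-endian LBA, group, 2-byte big-endian transfer length, control).
The names Write10 and decode_write10 and the 512-byte block size are
assumptions made for illustration; none of them come from the log.

```python
from typing import NamedTuple

# Assumption: 512-byte logical blocks (typical for disks of that era,
# but not stated anywhere in the log).
BLOCK_SIZE = 512


class Write10(NamedTuple):
    lba: int     # first logical block addressed by the command
    blocks: int  # number of blocks to transfer


def decode_write10(tail_hex: str) -> Write10:
    """Decode the nine hex bytes printed after 'Write (10)'.

    Assumed layout of those bytes (the opcode itself is not printed):
      [0]    flags
      [1:5]  logical block address, big-endian
      [5]    group number / reserved
      [6:8]  transfer length in blocks, big-endian
      [8]    control
    """
    raw = bytes.fromhex(tail_hex)
    if len(raw) != 9:
        raise ValueError(f"expected 9 bytes after the opcode, got {len(raw)}")
    lba = int.from_bytes(raw[1:5], "big")
    blocks = int.from_bytes(raw[6:8], "big")
    return Write10(lba, blocks)


if __name__ == "__main__":
    # The two commands that timed out at 09:04:43 (pids 98419 and 98420).
    first = decode_write10("00 00 24 1a ff 00 00 80 00")
    second = decode_write10("00 00 24 1b 7f 00 00 50 00")
    for cmd in (first, second):
        print(f"lba={cmd.lba:#x} blocks={cmd.blocks} "
              f"bytes={cmd.blocks * BLOCK_SIZE}")
    print("contiguous:", first.lba + first.blocks == second.lba)
```

Read this way, the two commands from 09:04:43 are back-to-back writes
to id 1: the second starts at the block where the first one ends.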


--
## CrossPoint v3.11 ##
-
To unsubscribe from the rsbac list, send a mail to
majordomo@morpork.shnet.org with
unsubscribe rsbac
as single line in the body.
