From: ao@morpork.shnet.org (A. Ott)
Subject: Re: Lost interrupt w/SCSI too (was Re: IDE, DMA, and lost interrupts.)
Date: 28 Feb 2000 10:56:00 +0100
Next Article (by Date): Re: Lost interrupt w/SCSI too (was Re: IDE, DMA, and lost interrupts.) Jesse Pollard
Previous Article (by Date): Re: 1.0.9b-pre2 uploaded (2.2.13 SMP testing) Jesse Pollard
Next in Thread: Re: Lost interrupt w/SCSI too (was Re: IDE, DMA, and lost interrupts.) Jesse Pollard
Articles sorted by: [Date]
[Author]
[Subject]
Hi!

I found this posting in linux-kernel, with a similar wait_on_bh in 2.2.14. It might be something outside RSBAC.

Maybe we should use one of the kernel debugging patches, e.g. IKB, with lockup detection.

Amon.

> David Ford wrote:
> > As the subject of lost interrupts is brought up again, I feel like
> > pitching in again :)
> > If DMA is enabled in the kernel compile, the last message will instead be:
> >
> > hda: lost interrupt
> >
> > and the laptop will lock up.
> ---
>
> I don't know if this is related or not...but thought I would report
> it. System 2xPIII, 1, then 2 SCSI HD's.
>
> I've been running off of one SCSI disk on the 2.2.14 kernel for
> a while now -- no problem. Yesterday I mounted a 2nd SCSI hard disk to do
> builds on. I was remotely logged in. I loaded it up with a large project
> and did a make. Then I remotely logged in on another window. It just
> hung there. Tried a third window. It hung too. So of course both
> of the login attempts would be accessing sda -- the make was accessing
> sdb. Finally logged in after a long wait -- tried a 'vi' on a file on
> sdb in one window and a 'top' in the other -- both hung. Tried to kill
> the make -- that wouldn't work -- just ignored me -- finally suspended it and
> issued kill -- that took forever. Went to console and screen was in
> power save off -- at first unresponsive. Then it turned on the screen
> to a 'frozen' screen saver (Matrix). Machine was still pingable, but
> login attempts would time out. Ping times were down in the .2 range, but
> occasionally there would be a ping time of over 1400ms, followed by a
> 2nd at 200+ms, then back to .2. Had to power-cycle machine to reboot.
> Lots of messed up files in /var + /tmp (sda) and on my build disk (sdb).
> So there were lots of outstanding writes that were needing completion.
>
> Had to manually fsck 2 of the partitions (/var + /tmp) -- when it came back
> up I looked at the log -- the last thing before the reboot was:
> --------
> Feb 1 08:22:32 ishtar -- MARK --
> Feb 1 08:39:51 ishtar in.rlogind[8354]: connect from 192.111.23.234
> Feb 1 08:39:52 ishtar pam_rhosts_auth[8354]: allowed to law@nimue.corp.sgi.com as law
> Feb 1 08:40:09 ishtar su: (to root) law on /dev/pts/3
> Feb 1 08:43:17 ishtar su: (to root) law on /dev/pts/3
> Feb 1 09:02:32 ishtar -- MARK --
> Feb 1 09:04:43 ishtar kernel: scsi : aborting command due to timeout : pid 98419, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 24 1a ff 00 00 80 00
> Feb 1 09:04:43 ishtar kernel: scsi : aborting command due to timeout : pid 98420, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 24 1b 7f 00 00 50 00
> Feb 1 09:04:44 ishtar kernel: SCSI host 0 abort (pid 98420) timed out - resetting
> Feb 1 09:04:44 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb 1 09:04:44 ishtar kernel:
> Feb 1 09:04:44 ishtar kernel: wait_on_bh, CPU 0:
> Feb 1 09:04:44 ishtar kernel: irq: 0 [0 0]
> Feb 1 09:04:44 ishtar kernel: bh: 1 [0 1]
> Feb 1 09:04:47 ishtar kernel: <[c010a4b5]> <[c0165e42]> <[c0165eac]> <[c0165ebd]> <[c01746c5]> <[c0157c4f]> <[c013112b]> <[c0131616]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb 1 09:04:47 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb 1 09:05:17 ishtar kernel: scsi : aborting command due to timeout : pid 98431, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 14 87 00 00 18 00
> Feb 1 09:05:17 ishtar kernel: scsi : aborting command due to timeout : pid 98435, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 c4 00 57 00 00 28 00
> Feb 1 09:05:18 ishtar kernel: SCSI host 0 abort (pid 98431) timed out - resetting
> Feb 1 09:05:18 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb 1 09:05:19 ishtar kernel:
> Feb 1 09:05:18 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb 1 09:05:19 ishtar kernel:
> Feb 1 09:05:19 ishtar kernel: wait_on_bh, CPU 0:
> Feb 1 09:05:19 ishtar kernel: irq: 0 [0 0]
> Feb 1 09:05:19 ishtar kernel: bh: 1 [0 1]
> Feb 1 09:05:22 ishtar kernel: <[c010a4b5]> <[c018f1a4]> <[c018fd43]> <[c0195188]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb 1 09:05:22 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb 1 09:05:52 ishtar kernel: scsi : aborting command due to timeout : pid 98449, scsi0, channel 0, id 1, lun 0 Write (10) 00 01 9c 13 07 00 00 80 00
> Feb 1 09:05:52 ishtar kernel: scsi : aborting command due to timeout : pid 98450, scsi0, channel 0, id 1, lun 0 Write (10) 00 01 9c 13 87 00 00 80 00
> Feb 1 09:05:52 ishtar kernel: SCSI host 0 abort (pid 98449) timed out - resetting
> Feb 1 09:05:52 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb 1 09:05:53 ishtar kernel:
> Feb 1 09:05:53 ishtar kernel: wait_on_bh, CPU 1:
> Feb 1 09:05:53 ishtar kernel: irq: 0 [0 0]
> Feb 1 09:05:53 ishtar kernel: bh: 1 [1 0]
> Feb 1 09:05:56 ishtar kernel: <[c010a4b5]> <[c018f1a4]> <[c018fd43]> <[c0195188]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb 1 09:05:57 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb 1 09:07:02 ishtar kernel: scsi : aborting command due to timeout : pid 98706, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 24 29 17 00 00 50 00
> Feb 1 09:07:02 ishtar kernel: scsi : aborting command due to timeout : pid 98707, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 11 3f 00 00 08 00
> Feb 1 09:07:02 ishtar kernel: SCSI host 0 abort (pid 98707) timed out - resetting
> Feb 1 09:07:02 Feb 1 09:07:02 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb 1 09:07:03 ishtar kernel:
> Feb 1 09:07:03 ishtar kernel: wait_on_bh, CPU 0:
> Feb 1 09:07:03 ishtar kernel: irq: 0 [0 0]
> Feb 1 09:07:03 ishtar kernel: bh: 1 [0 1]
> Feb 1 09:07:06 ishtar kernel: <[c010a4b5]> <[c018f1a4]> <[c018fd43]> <[c0195188]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb 1 09:07:06 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb 1 09:07:36 ishtar kernel: scsi : aborting command due to timeout : pid 98721, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 c4 1d 3f 00 00 80 00
> Feb 1 09:07:36 ishtar kernel: scsi : aborting command due to timeout : pid 98722, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 c4 1d bf 00 00 80 00
> Feb 1 09:07:38 ishtar kernel: SCSI host 0 abort (pid 98722) timed out - resetting
> Feb 1 09:07:38 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb 1 09:07:41 ishtar kernel: (scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb 1 09:07:41 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb 1 09:08:11 ishtar kernel: scsi : aborting command due to timeout : pid 98776, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 1c 1f 00 00 80 00
> Feb 1 09:08:11 ishtar kernel: scsi : aborting command due to timeout : pid 98777, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 6c 1c 9f 00 00 80 00
> Feb 1 09:08:13 ishtar kernel: SCSI host 0 abort (pid 98776) timed out - resetting
> Feb 1 09:08:13 ishtar kernel: SCSI bus is being reset for host 0 channel 0.
> Feb 1 09:08:13 ishtar kernel:
> Feb 1 09:08:13 ishtar kernel: wait_on_bh, CPU 0:
> Feb 1 09:08:13 ishtar kernel: irq: 0 [0 0]
> Feb 1 09:08:13 ishtar kernel: bh: 1 [0 1]
> Feb 1 09:08:13 ishtar kernel: irq: 0 [0 0]
> Feb 1 09:08:13 ishtar kernel: bh: 1 [0 1]
> Feb 1 09:08:16 ishtar kernel: <[c010a4b5]> <[c0165e42]> <[c0165eac]> <[c0165ebd]> <[c01746c5]> <[c0157c4f]> <[c013112b]> <[c0131616]> <6>(scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
> Feb 1 09:08:16 ishtar kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15.
> Feb 1 09:08:46 ishtar kernel: scsi : aborting command due to timeout : pid 98800, scsi0, channel 0, id 1, lun 0 Write (10) 00 01 44 19 5f 00 00 80 00
> Feb 1 09:08:46 ishtar kernel: scsi : aborting command due to timeout : pid 98801, scsi0, channel 0, id 1<...then the log was corrupted by a mail
> message appearing in the middle of the log.
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/

--
## CrossPoint v3.11 ##
-
To unsubscribe from the rsbac list, send a mail to majordomo@morpork.shnet.org with unsubscribe rsbac as single line in the body.
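
A note on reading the quoted log: each "aborting command due to timeout" line ends with the bytes of the timed-out Write (10) command, so the affected block range can be recovered from it. The sketch below is only an illustration, not part of the original report. It assumes the kernel prints the opcode as a name and then the remaining nine CDB bytes, and it relies on the standard WRITE(10) layout (flags, 4-byte big-endian LBA, group byte, 2-byte big-endian transfer length, control). The names decode_write10, ABORT_RE and SAMPLE_LINES are made up for this example; the sample lines are copied from the log above.

```python
import re

# Pattern for the "aborting command due to timeout" lines in the quoted log.
# Assumption: "Write (10)" is the opcode name and the 9 hex bytes that follow
# are CDB bytes 1..9 (the opcode byte itself is not printed).
ABORT_RE = re.compile(
    r"aborting command due to timeout : pid (?P<pid>\d+), "
    r"scsi(?P<host>\d+), channel (?P<channel>\d+), "
    r"id (?P<id>\d+), lun (?P<lun>\d+) "
    r"Write \(10\) (?P<cdb>[0-9a-f]{2}(?: [0-9a-f]{2}){8})"
)


def decode_write10(line):
    """Return pid, target id, LBA and block count for an abort line, or None."""
    m = ABORT_RE.search(line)
    if m is None:
        return None
    rest = bytes(int(b, 16) for b in m.group("cdb").split())
    return {
        "pid": int(m.group("pid")),
        "id": int(m.group("id")),
        # CDB bytes 2..5: logical block address, big-endian
        "lba": int.from_bytes(rest[1:5], "big"),
        # CDB bytes 7..8: transfer length in blocks, big-endian
        "blocks": int.from_bytes(rest[6:8], "big"),
    }


# Two lines copied verbatim from the quoted log.
SAMPLE_LINES = [
    "Feb 1 09:04:43 ishtar kernel: scsi : aborting command due to timeout : "
    "pid 98419, scsi0, channel 0, id 1, lun 0 "
    "Write (10) 00 00 24 1a ff 00 00 80 00",
    "Feb 1 09:04:43 ishtar kernel: scsi : aborting command due to timeout : "
    "pid 98420, scsi0, channel 0, id 1, lun 0 "
    "Write (10) 00 00 24 1b 7f 00 00 50 00",
]

if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(decode_write10(line))
```

Under these assumptions the first two aborted commands decode to 128 blocks at LBA 2366207 and 80 blocks at LBA 2366335, i.e. back-to-back writes to the disk at id 1. The block size is not in the log, so the sketch does not convert to bytes.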