Re: 1.0.9b-pre3 SMP test (was 1.0.9b-pre2 uploaded)


From: Jesse Pollard <pollard@cats-chateau.net>
Subject: Re: 1.0.9b-pre3 SMP test (was 1.0.9b-pre2 uploaded)
Date: Thu, 16 Mar 2000 21:25:24 -0600

Next Article (by Date): Re: Trusted Irix: http://oss.sgi.com/projects/ob1/index.html ao@morpork.shnet.org (A. Ott)
Previous Article (by Date): Trusted Irix: http://oss.sgi.com/projects/ob1/index.html Larry Colen
Next in Thread: Re: 1.0.9b-pre3 SMP test (was 1.0.9b-pre2 uploaded) ao@morpork.shnet.org (A. Ott)
Articles sorted by: [Date] [Author] [Subject]


On Wed, 15 Mar 2000, you wrote:
>********* ***************** ********** ****  *****   ***** ************
>  To subject Re: 1.0.9b-pre2 uploaded
>  pollard@cats-chateau.net (Jesse Pollard)  wrote:
>********** ******************** ******  ********  ******* *************
>
>> Hi,
>> A followup on SMP testing -
>> 1. I've finished reconfiguring my system. I now have a single 2G partition
>>    for testing. The test systems only use that one partition + swap.
>>    (Haven't finished the web reconfig...)
>> 2. I loaded 2.2.13 and 2.3.47 onto it. The problems still occur:
>>    a. Once I traced it down to a page fault (looked like a disk failure...),
>>       this under 2.2.13.
>>    b. Once I traced it to the keyboard, also under 2.2.13.
>>    c. Under 2.3.47, I couldn't get any output after the problem occurred. I
>>       could type in 6 characters and receive the echo. Then it hung. No
>>       output dump trace ever.
>>    d. Under 2.3.47, I tried a maintenance kernel, but the same thing occurred.
>>
>> In all three cases I did a little extra testing while booting:
>>
>> I added the following code to rc.S, after enabling swap and running
>> /bin/update:
>>
>> if [ "`/bin/uname -r`" != "2.2.13.SMP" ]; then
>>     echo "CRASH TEST - echo of output"
>>     echo "CRASH TEST" >/CRASH.TEST
>>     echo "append test" >>/CRASH.TEST
>>     echo "after append test"
>>     echo " reading contents of CRASH.TEST"
>>     cat /CRASH.TEST
>>     echo "beginning keyboard read test"
>>     echo "beginning keyboard read test" >>/CRASH.TEST
>>     read junk
>>     echo "READ ...${junk}..." >>/CRASH.TEST
>>
>>     echo " reading contents again:"
>>     cat /CRASH.TEST
>> fi
>>
>> When I boot the system I do have the disk write enabled to see if anything
>> occurred. The failure only occurs at the "read junk" line. I do get the
>> contents of the CRASH.TEST file output, even though it doesn't quite make it
>> to disk (might if I put a sync in there...)
>
>I admit I currently have no idea what happens here, but I will  
>reinvestigate the locking.
>
>> One other thing I noticed -- from another post (AUTH problems):
>>
>> > kernel: rsbac_reg_init(): Initializing RSBAC: REG module registration
>> > Mar  8 12:20:05 ganja kernel: rsbac_init(): Starting rsbacd thread
>> > Mar  8 12:20:05 ganja kernel: rsbac_init(): Setting RSBAC auto timer
>> > Mar  8 12:20:05 ganja kernel: rsbac_init(): Ready.
>>
>> I don't get the line "kernel: rsbac_init(): Ready.". This may be due to
>> it being the very first boot.
>
>No, from pre3 on it should always be there. It is logged on the same level  
>as the first one ('initializing'): KERN_INFO.
>
>> A default /rsbac/useraci file was created.
>
>Good.
>
>> If you have some debugging suggestions/configuration changes I'm ready to
>> try them out.
>
>Please try changing the rsbac locking functions in aci_data_structures.h  
>to using irqsave/irqrestore (see include/asm/spinlock.h). The flags  
>parameter should be correctly provided in all locking calls.
>
>This is to make sure that really nothing can bypass the locks, but it  
>cannot be a long term solution.

I switched to the pre3 version:

I do see the "ready" message in both SMP and uniprocessor tests shown below.
The SMP errors still occur too. The following (quickie) trace is what I
saw:

wait_on_irq, CPU 1:
irq: 0 [0 0]
bh:  1 [1 0]
<[c010bc21]>		__global_cli
<[c01f588b]>		vgacon_set_cursor_size
<[c01bf9be]>		update_region
<[c01c3542]>		con_flush_chars
<[c01c690c]>		opost_block

I tried the exact same configuration with a uniprocessor kernel. That
system behaved as expected (taken from the messages file):

....
Mar 16 21:09:31 tabby kernel: Partition check:
Mar 16 21:09:31 tabby kernel:  sda: sda1 sda2
Mar 16 21:09:31 tabby kernel:  sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 >
Mar 16 21:09:31 tabby kernel:  sdc: sdc1
Mar 16 21:09:31 tabby kernel:  sdd: sdd1
Mar 16 21:09:31 tabby kernel: Real Time Clock Driver v1.10
Mar 16 21:09:31 tabby kernel: Linux PCMCIA Card Services 3.1.11
Mar 16 21:09:31 tabby kernel:   options:  [pci] [cardbus]
Mar 16 21:09:31 tabby kernel: Databook TCIC-2 PCMCIA probe: not found.
Mar 16 21:09:31 tabby kernel: ds: no socket drivers loaded!
Mar 16 21:09:31 tabby kernel: rsbac_init(): Initializing RSBAC v1.0.9b-pre3
Mar 16 21:09:31 tabby kernel: rsbac_init(): compiled modules: MAC FC SIM PM MS FF AUTH REG ACL
Mar 16 21:09:31 tabby kernel: rsbac_init(): Registering RSBAC proc dir
Mar 16 21:09:31 tabby kernel: rsbac_init_pm(): Initializing RSBAC: PM subsystem
Mar 16 21:09:31 tabby kernel: rsbac_init_auth(): Initializing RSBAC: AUTH subsystem
Mar 16 21:09:31 tabby kernel: rsbac_init_acl(): Initializing RSBAC: ACL subsystem
Mar 16 21:09:31 tabby kernel: rsbac_reg_init(): Initializing RSBAC: REG module registration
Mar 16 21:09:31 tabby kernel: rsbac_init(): Starting rsbacd thread
Mar 16 21:09:31 tabby kernel: rsbac_init(): Setting RSBAC auto timer
Mar 16 21:09:31 tabby kernel: rsbac_init(): Ready.
Mar 16 21:09:31 tabby kernel: scsi0: Tagged Queuing now active for Target 0
Mar 16 21:09:31 tabby kernel: Adding Swap: 265064k swap-space (priority -1)
Mar 16 21:09:31 tabby kernel: rsbac_adf_request(): request CHANGE_OWNER, caller_pid 76, caller_prog_name rpc.portmap, caller_uid 0, target-type PROCESS, tid 76, attr owner, value 1, result NOT_GRANTED by AUTH
Mar 16 21:16:25 tabby syslogd 1.3-3: restart.

The last entry is from when I rebooted the system. The uniprocessor boot
hung at the "...request CHANGE_OWNER...NOT_GRANTED by AUTH" message, and
another mail message on the list already answered that one (I have to give
init, ... and the other daemons the privileges they need). I did get the
expected warning about the read/write root in the uniprocessor test.
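
To find which daemons need those privileges without waiting for each one to
hang in turn, the denials can be pulled out of the messages file. This is a
sketch only: it assumes each rsbac_adf_request() entry is a single line in
/var/log/messages (the wrapping in mail is not in the real log) and that the
field layout matches the entry shown above. list_auth_denials is a
hypothetical helper name, not an RSBAC tool.

```shell
#!/bin/sh
# list_auth_denials [FILE]: hypothetical helper, not part of RSBAC.
# Prints, once each, the program names whose CHANGE_OWNER requests were
# refused by the AUTH module. FILE defaults to /var/log/messages.
list_auth_denials() {
    grep 'rsbac_adf_request(): request CHANGE_OWNER' "${1:-/var/log/messages}" |
        grep 'result NOT_GRANTED by AUTH' |
        sed -n 's/.*caller_prog_name \([^,]*\),.*/\1/p' |
        sort -u
}
```

Run against the log above, it would print rpc.portmap; each name it lists is
a candidate for the privileges the AUTH module is refusing.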

Given the trace (__global_cli called from vgacon_set_cursor_size), it's
looking more and more like a missing lock in the VGA console driver...

It looks like my little keyboard test above isn't showing the problem
quite yet.
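
One way to get more out of the keyboard test is the sync mentioned in the
earlier message: flushing buffers after each step, so that after a hang the
copy of CRASH.TEST on disk shows how far the script got. A minimal sketch,
assuming sync(1) is available at that point in rc.S; crash_test and its
optional file argument are hypothetical, added only so the test can be
pointed at a scratch file.

```shell
#!/bin/sh
# crash_test [FILE]: hypothetical variant of the rc.S keyboard test.
# FILE defaults to /CRASH.TEST. Each step is followed by sync so the
# on-disk file records the last step completed before a hang.
crash_test() {
    f=${1:-/CRASH.TEST}
    echo "CRASH TEST" >"$f"
    echo "append test" >>"$f"
    sync
    echo "beginning keyboard read test" >>"$f"
    sync                       # if the machine hangs in read, this line is on disk
    read -r junk
    echo "READ ...${junk}..." >>"$f"
    sync
    echo " reading contents again:"
    cat "$f"
}
```

If the last line found on disk after a reset is "beginning keyboard read
test", the hang happened inside the read, not in the writes before it.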

I'm going to think about trying a serial console (a temporary link to one of
my firewall's unused serial lines...).

That may separate the problem from the VGA virtual terminals and the
console device.
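
The kernel is pointed at a serial console with the console= boot parameter
(for example console=ttyS0,9600, as described in the kernel's
Documentation/serial-console.txt; serial console support has to be compiled
in). So that each boot-test record shows which console was active, rc.S
could check the command line before running the keyboard test. A sketch
only; has_serial_console is a hypothetical helper that takes the command
line as an argument so it can be tested without /proc.

```shell
#!/bin/sh
# has_serial_console [CMDLINE]: hypothetical helper for rc.S.
# Succeeds if the kernel command line contains a console=ttyS* entry.
# CMDLINE defaults to the contents of /proc/cmdline.
has_serial_console() {
    cmdline=${1:-$(cat /proc/cmdline)}
    for arg in $cmdline; do      # unquoted on purpose: split into words
        case $arg in
            console=ttyS*) return 0 ;;
        esac
    done
    return 1
}
```

In rc.S it could be used as
`has_serial_console && echo "serial console active" >>/CRASH.TEST`.
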
-- 
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@cats-chateau.net

Any opinions expressed are solely my own.
-
To unsubscribe from the rsbac list, send a mail to
majordomo@morpork.shnet.org with
unsubscribe rsbac
as single line in the body.
