On Fri, Jan 04, 2002 at 04:36:35PM +0200, Matti Aarnio wrote:
> For the past few weeks I have wondered why my web-server machine
> is hanging semi-regularly. Over the weekend I tried something new.
> I have:
> - Two 30+ GB SCSI Ultra2-Wide disks
> - onboard AIC7XXX controller
> - Disks with identical partition maps
> - RAID-1 bound pairwise on those partitions
>   (RAIDTAB entries md3/md4/md5 - the md0/md1/md2 were on
>   another, older disk, which was removed later..)
> - EXT3 filesystem on all partitions (except the 2 G swap..)
>   Mounted with default options
> - machine with dual-P-III 750 MHz, and 768 MB memory (3*256MB)

.. running 2.4.17 code compiled with "SMP" disabled, but with
the APICs enabled (i.e. Local and IO-APIC).

It appears to me as:
  - SMP code, SMP mode: hangup in use
  - SMP code, "nosmp" boot option: hangup in use
  - UP code: works

Tool chain:
  $ gcc -v
  gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-97.1)
  $ ld -V
  GNU ld version 2.11.92.0.12 20011121

I have partial evidence that EXT3 may be part of the problem, as
another machine with RAID-1 disks and EXT2 filesystems does not
hang when running the RedHat 2.4.16-0.9custom kernel. That other
machine has, however, IDE disks.

Earlier experiments with the hanging box used that same RH kernel,
and hangups were observed there too..

> When the machine is up all the way, and the MD disks have finished
> syncing, I execute the command:
>
>   dd if=/dev/zero bs=1024k of=test.file count=8000
>
> which will lead to a hard system hangup where the keyboard won't
> react, the SCSI led shines constantly, but nothing happens.
> Right at the moment when the keyboard becomes unresponsive,
> the disk led will continue to flicker for a few seconds, but
> then the flicker will stop, and the led stays constantly on.

Of this flicker I am not entirely sure anymore. Maybe it happens,
maybe not.

I also tried SGI's kdb on 2.4.17, but when the system hangs,
"pause/break" won't react at all.
> The earlier guesstimate of using "noapic" has no effect on
> system hangups. The same command causes it quite soon. Even
> "noapic nosmp" does hang.
>
> The large amount of RAM may contribute, but this 3*256MB
> does not need e.g. PAE mode extensions.

/Matti Aarnio
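
The reproduction steps described in the report (wait until the MD arrays have finished syncing, then write a large file with dd) could be scripted roughly as below. This is only a sketch: the helper names `md_sync_active` and `reproduce_hang` are hypothetical, and it assumes that an in-progress sync shows up as "resync" or "recovery" in /proc/mdstat. Only the dd command line itself is taken from the report.

```shell
#!/bin/sh
# Sketch of the reproduction from the report; helper names are hypothetical.
# Assumption: a running sync appears as "resync" or "recovery" in /proc/mdstat.

# Succeeds (exit 0) while any md array listed in the given mdstat file
# is still syncing. Defaults to /proc/mdstat when no file is given.
md_sync_active() {
    grep -Eq 'resync|recovery' "${1:-/proc/mdstat}"
}

# Wait for the sync to finish, then run the write load from the report.
# WARNING: on the affected machine this is expected to hang the box.
reproduce_hang() {
    while md_sync_active; do
        sleep 60
    done
    # 8000 blocks of 1024k each, i.e. roughly 8 GB of zeroes
    dd if=/dev/zero bs=1024k of=test.file count=8000
}
```

Calling `reproduce_hang` from a console on the test machine would then run the same dd write once no array reports a sync in progress. Nothing is executed until the function is called explicitly.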
Follow-Ups:
- Re: 2.4.17 RAID-1 EXT3 reliable to hang.... (Andrew Morton <akpm _at_ zip.com.au>)