On Thu, 8 Jan 2004, martin f krafft wrote:

> Hi all,
>
> I operate a groupware server which is giving me a very hard time.
> It's an AMD Athlon XP 3000+ with 1 GB of RAM, and four 7200 rpm IDE
> hard drives, two attached to the primary channels of the onboard
> controller, and two to the primary channels of a Promise 20269 EIDE
> controller. The kernel is a 2.4.24 with the configuration I placed
> here:
>   ftp://ftp.madduck.net/scratch/config-2.4.24-gaia.gz
>
> The system is configured with 7 Software-RAID and three swap partitions:
>
>   md1:  /boot      (ext3) RAID 1 spanning hda1 and hdc1
>   md5:  /          (ext3) RAID 5 hda5/hdc5/hde5 with hdg5 as a spare
>   md6:  /usr       (ext3) RAID 5 hda6/hdc6/hde6 with hdg6 as a spare
>   md7:  /var       (ext3) RAID 5 hda7/hdc7/hde7 with hdg7 as a spare
>   md8:  /usr/local (ext3) RAID 5 hda8/hdc8/hde8 with hdg8 as a spare
>   md9:  /home      (ext3) RAID 5 hda9/hdc9/hde9 with hdg9 as a spare
>   md10: /tmp       (ext3) RAID 5 hda10/hdc10/hde10 with hdg10 as a spare
>
> hda2 holds a non-RAID rescue system with RAID 1/5 support
>
> hdc2, hde2, hdg2 are swap partitions of 256 MB each.
>
> hde1 and hdg1 are unused.
>
> The individual hard disks are identically tweaked with hdparm:
>
>   hdparm -A1 -B255 -c1 -d1 -p -u0 -W0 -Xudma6 /dev/hd{a,c,e,g}
>
> See the end of this mail for details.
>
> The system experiences severe stability problems, which I relate to
> the filesystem, RAID, or controller code, because it's reproducible
> with excessive disk operations. E.g., doing something like
>
>   rsync -a --exclude /tmp / /tmp/dump

<snip>

> Interestingly, just now, the machine crashed differently. `vmstat 1`
> was still running, but new processes could not be started, after the
> kernel reported a lot of oopses in user-space processes (e.g. rsync,
> top, zsh), as well as some of the kjournald oopses like above.
> I have included the footprint of the user-space program oopses
> further down. `vmstat 1` was happily printing the following away,
> when the system was already unusable.
> The b 127 value is
> interesting, as it has been continuously increasing (well, in
> a non-decreasing way) after a certain point, and somewhere on the
> way, the system reached the state of agnosia.
>
>   0 127 2 16124 10304 43004 682188 0 0 0 0 109  7 0 0 100
>   0 127 2 16124 10304 43004 682188 0 0 0 0 111  5 0 0 100
>   0 127 2 16124 10304 43004 682188 0 0 0 0 114  9 0 0 100
>   0 127 2 16124 10304 43004 682188 0 0 0 0 111  5 0 0 100
>   0 127 2 16124 10304 43004 682188 0 0 0 0 115  9 0 0 100
>   0 128 2 16124 10420 43004 682060 0 0 0 0 119 12 0 0 100
>   0 128 2 16124 10420 43004 682060 0 0 0 0 122 11 0 0 100
>
> Apart from these panics and hangups, the system also randomly issues
> segfaults to processes, or reports a kernel oops. These take the
> following form:

Hi Martin,

I can't help you much, but I believe your problem might be related to
faulty hardware.

Have you checked whether the memory is OK? Could you also try disabling
DMA on the Promise controller?
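For anyone following the thread, the two checks discussed here could be
sketched in shell roughly as below. This is only an illustration, not
something posted by either author. It assumes that hde and hdg are the
two drives on the Promise 20269 (the original mail does not say so
explicitly) and that "b" is the second column of `vmstat 1`, as in the
quoted rows. The function names dma_off_plan and blocked_range are
made up for this sketch.

```shell
#!/bin/sh
# Assumption: hde and hdg are the drives attached to the Promise 20269.
PROMISE_DRIVES="/dev/hde /dev/hdg"

# Print the hdparm commands that would switch DMA off (-d0) on those
# drives. They are only printed here; review them and run them as root.
dma_off_plan() {
    for drive in $PROMISE_DRIVES; do
        echo "hdparm -d0 $drive"
    done
}

# Read vmstat rows on stdin and print the lowest and highest value seen
# in the second column ("b" in the quoted output). Header lines are
# skipped because their second field is not a number.
blocked_range() {
    awk '$2 ~ /^[0-9]+$/ {
             v = $2 + 0
             if (n == 0 || v < min) min = v
             if (n == 0 || v > max) max = v
             n++
         }
         END { if (n > 0) print min, max }'
}
```

Something like `vmstat 1 10 | blocked_range` would then summarise ten
samples, and `dma_off_plan` lists the commands to try. Whether turning
DMA off changes the crash behaviour is exactly what the test is meant
to find out.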
Follow-Ups:
- Re: stability problems with 2.4.24/Software RAID/ext3 (Marcelo Tosatti)
- Re: stability problems with 2.4.24/Software RAID/ext3 (Martin F Krafft)
- stability problems with 2.4.24/Software RAID/ext3 (martin f krafft)