Re: [PATCH] [request for inclusion] Realtime LSM


On Thu, 2005-01-13 at 16:25 -0500, Lee Revell wrote:
> On Thu, 2005-01-13 at 22:07 +0100, Arjan van de Ven wrote:
> > On Thu, Jan 13, 2005 at 03:04:26PM -0600, Jack O'Quin wrote:
> > > 
> > > (Probably, this simplistic analysis misses some other, more subtle,
> > > factors.)
> > 
> > I think you can do nasty things to the locks held by those threads too
> > 
> > > 
> > > RT threads should not do FS writes of their own.  But, a badly broken
> > > or malicious one could, I suppose.  So, that might provide a mechanism
> > > for losing more data than usual.  Is that what you had in mind?
> > 
> > basically yes.
> > note that "FS writes" can come from various things, including library calls
> > made and such. But I think you got my point; even though it might seem a bit
> > theoretical it sure is unpleasant.
> > 
> 
> I added Con to the cc: because this thread is starting to converge with
> an email discussion we've been having.
> 
> The basic issue is that the current semantics of SCHED_FIFO seem to make
> the deadlock/data corruption due to runaway RT thread issue difficult.
> The obvious solution is a new scheduling class equivalent to SCHED_FIFO
> but with a mechanism for the kernel to demote the offending thread to
> SCHED_OTHER in an emergency.  The problem can be solved in userspace
> with a SCHED_FIFO watchdog thread that runs at a higher RT priority than
> all other RT processes.
> 
> This all seems to imply that introducing an rlimit for MAX_RT_PRIO is an
> excellent solution.  The RT watchdog thread could run as root, and the
> rlimit would be used to ensure that even nonroot users in the RT group
> could never preempt the watchdog thread.
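
The userspace watchdog described in the quoted text could be sketched
roughly as below. This is only an illustration under assumptions: the
names find_offenders, demote, become_watchdog and watchdog_pass are
hypothetical, the 98% threshold is borrowed from the reply that follows,
and how the watchdog measures each thread's CPU share is left out
entirely. The os.sched_* calls are the Linux-only scheduler interface
in Python's standard library; become_watchdog needs root or
CAP_SYS_NICE.

```python
import os


def find_offenders(cpu_share_by_pid, limit=0.98):
    """Return the pids whose measured CPU share exceeds `limit`.

    `cpu_share_by_pid` maps pid -> fraction of CPU used over the last
    sampling period. Producing that mapping is assumed to happen elsewhere.
    """
    return sorted(pid for pid, share in cpu_share_by_pid.items() if share > limit)


def demote(pid):
    """Move `pid` out of the RT class into SCHED_OTHER (priority must be 0)."""
    os.sched_setscheduler(pid, os.SCHED_OTHER, os.sched_param(0))


def become_watchdog(priority=None):
    """Put the calling process into SCHED_FIFO at a high RT priority.

    Defaults to the maximum SCHED_FIFO priority, so that no other RT
    thread can preempt the watchdog. Requires root or CAP_SYS_NICE.
    """
    if priority is None:
        priority = os.sched_get_priority_max(os.SCHED_FIFO)
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))


def watchdog_pass(cpu_share_by_pid, limit=0.98, demote_fn=demote):
    """Run one watchdog check: demote every offender and return their pids."""
    offenders = find_offenders(cpu_share_by_pid, limit)
    for pid in offenders:
        demote_fn(pid)
    return offenders
```

With an rlimit on the maximum RT priority, ordinary members of the RT
group would be capped below the priority that become_watchdog picks,
which is the property the quoted paragraph relies on.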

Just an idea: what about throttling runaway RT tasks?
If the system spends more than 98% of its time in RT tasks for 5s, consider
this a _fatal error_. Print an error message and throttle RT tasks by
inserting ticks in which only SCHED_OTHER tasks are allowed to run. For a
limit of 98% this means one SCHED_OTHER-only tick every 50 ticks.

The limit and timeout should be configurable and of course it can be
disabled.
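
To make the arithmetic concrete, here is a small userspace simulation
of the throttling idea. It is a sketch, not kernel code: the names
throttle_interval and simulate are made up for illustration, the window
is counted in ticks (5s would be 5000 ticks only if HZ=1000, which is
an assumption), and once throttling starts it is never lifted again.

```python
from collections import deque


def throttle_interval(limit_percent):
    """Return N such that one tick in every N is reserved for SCHED_OTHER.

    Rounds down, so at least (100 - limit_percent)% of ticks are reserved.
    """
    reserved = 100 - limit_percent
    if not 0 < reserved <= 100:
        raise ValueError("limit_percent must be in the range 0..99")
    return 100 // reserved


def simulate(rt_wants_cpu, limit_percent=98, window_ticks=5000):
    """Simulate tick-by-tick scheduling with the proposed RT throttle.

    `rt_wants_cpu` yields one bool per tick: True if some RT task is
    runnable. Returns a list with "RT" or "OTHER" for each tick.
    """
    interval = throttle_interval(limit_percent)
    window = deque(maxlen=window_ticks)  # RT usage of the last window_ticks ticks
    rt_in_window = 0
    throttled = False
    schedule = []
    for tick, wants in enumerate(rt_wants_cpu):
        # While throttled, every interval-th tick is SCHED_OTHER only.
        forced_other = throttled and tick % interval == 0
        runs_rt = wants and not forced_other
        if len(window) == window_ticks:
            rt_in_window -= window[0]  # this entry is about to fall out
        window.append(1 if runs_rt else 0)
        rt_in_window += 1 if runs_rt else 0
        # Fatal state: RT used more than limit_percent of a full window.
        if (not throttled and len(window) == window_ticks
                and rt_in_window * 100 > limit_percent * window_ticks):
            throttled = True  # a kernel would print the error message here
        schedule.append("RT" if runs_rt else "OTHER")
    return schedule
```

For example, simulate([True] * 200, limit_percent=98, window_ticks=100)
models a runaway RT task: the first 100 ticks all go to RT, after which
ticks 100 and 150 are handed to SCHED_OTHER.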

I know this goes against the rule that RT tasks preempt all SCHED_OTHER
tasks, but it is only meant for a fatal system state, so that the system
can recover sanely. A locked-up machine is the worse alternative.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

