Age | Commit message | Author |
|
Tasks can block on hotplug.lock in pin_current_cpu(), but their state
might not be TASK_RUNNING. The mutex wakeup then sets the state
unconditionally to TASK_RUNNING, which can cause spurious, unexpected
wakeups. We could provide a state-preserving mutex_lock() function,
but this is semantically backwards. So instead we convert hotplug.lock
to a spinlock for RT, which already has the state-preserving
semantics.
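A minimal sketch of the idea, with names made up for illustration (the real
code keeps the lock inside the hotplug state struct):
#ifdef CONFIG_PREEMPT_RT_FULL
/* RT "sleeping" spinlock: preserves task->saved_state across the wakeup */
static DEFINE_SPINLOCK(hotplug_rt_lock);
# define hotplug_lock()         spin_lock(&hotplug_rt_lock)
# define hotplug_unlock()       spin_unlock(&hotplug_rt_lock)
#else
static DEFINE_MUTEX(hotplug_mutex);
# define hotplug_lock()         mutex_lock(&hotplug_mutex)
# define hotplug_unlock()       mutex_unlock(&hotplug_mutex)
#endif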
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1330702617.25686.265.camel@gandalf.stny.rr.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
wait_task_inactive() will have a task sleep waiting for another
task to reach a certain state. But it ignores the rt_spin_lock saved
state and can return an incorrect result if the task it is waiting
for is blocked on an rt_spin_lock() and is waking up.
The rt_spin_locks save the task's state in the saved_state field,
and wait_task_inactive() must also test that state.
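A sketch of the combined check (the helper name is made up; saved_state is
the RT field mentioned above and is protected by pi_lock):
static bool task_state_match(struct task_struct *p, long match_state)
{
        bool match = false;

        raw_spin_lock_irq(&p->pi_lock);
        if (p->state == match_state || p->saved_state == match_state)
                match = true;
        raw_spin_unlock_irq(&p->pi_lock);

        return match;
}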
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20120301190345.979435764@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow.
Are there better solutions? Should it exist and return 0 on !-rt?
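A minimal sketch of such an entry in kernel/ksysfs.c (plus the matching
entry in the kernel_attrs[] array); whether it should also exist and
return 0 on !rt is exactly the open question above:
#ifdef CONFIG_PREEMPT_RT_FULL
static ssize_t realtime_show(struct kobject *kobj,
                             struct kobj_attribute *attr, char *buf)
{
        return sprintf(buf, "%d\n", 1);
}
KERNEL_ATTR_RO(realtime);       /* shows up as /sys/kernel/realtime */
#endif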
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
|
|
On 07/27/2011 04:37 PM, Thomas Gleixner wrote:
> - KGDB (not yet disabled) is reportedly unusable on -rt right now due
> to missing hacks in the console locking which I dropped on purpose.
>
To work around this in the short term you can use this patch, in
addition to the clocksource watchdog patch that Thomas brewed up.
Comments are welcome of course. Ultimately the right solution is to
change the separation between the console and the HW to have a polled mode
plus a work queue, so as not to introduce any kind of latency.
Thanks,
Jason.
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable
to network-based denial-of-service attacks. This patch therefore
makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq()
is running in ksoftirqd context. A wrapper layer is interposed so that
other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying
function __do_softirq_common() does the actual work.
The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is
that there might be a local_bh_enable() inside an RCU-preempt read-side
critical section. This local_bh_enable() can invoke __do_softirq()
directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just
calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be
an illegal RCU-preempt quiescent state in the middle of an RCU-preempt
read-side critical section. Therefore, quiescent states can only happen
in cases where __do_softirq() is invoked directly from ksoftirqd.
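A sketch of the wrapper layering described above (parameter name assumed):
static void __do_softirq_common(int need_rcu_bh_qs)
{
        /* only the ksoftirqd path reports an RCU-bh quiescent state */
        if (need_rcu_bh_qs)
                rcu_bh_qs(smp_processor_id());

        /* ... the actual softirq processing loop ... */
}

void __do_softirq(void)
{
        __do_softirq_common(0); /* e.g. called via local_bh_enable() */
}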
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The Linux kernel has long RCU-bh read-side critical sections that
intolerably increase scheduling latency under mainline's RCU-bh rules,
which include RCU-bh read-side critical sections being non-preemptible.
This patch therefore arranges for RCU-bh to be implemented in terms of
RCU-preempt for CONFIG_PREEMPT_RT_FULL=y.
This has the downside of defeating the purpose of RCU-bh, namely,
handling the case where the system is subjected to a network-based
denial-of-service attack that keeps at least one CPU doing full-time
softirq processing. This issue will be fixed by a later commit.
The current commit will need some work to make it appropriate for
mainline use, for example, it needs to be extended to cover Tiny RCU.
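For the update side, the mapping amounts to something like this sketch
(the read-side plumbing is more involved):
#ifdef CONFIG_PREEMPT_RT_FULL
# define call_rcu_bh            call_rcu
# define rcu_barrier_bh         rcu_barrier
# define synchronize_rcu_bh     synchronize_rcu
# define synchronize_rcu_bh_expedited   synchronize_rcu_expedited
#endif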
[ paulmck: Added a useful changelog ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
With RT_FULL we get the below wreckage:
[ 126.060484] =======================================================
[ 126.060486] [ INFO: possible circular locking dependency detected ]
[ 126.060489] 3.0.1-rt10+ #30
[ 126.060490] -------------------------------------------------------
[ 126.060492] irq/24-eth0/1235 is trying to acquire lock:
[ 126.060495] (&(lock)->wait_lock#2){+.+...}, at: [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060503]
[ 126.060504] but task is already holding lock:
[ 126.060506] (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
[ 126.060511]
[ 126.060511] which lock already depends on the new lock.
[ 126.060513]
[ 126.060514]
[ 126.060514] the existing dependency chain (in reverse order) is:
[ 126.060516]
[ 126.060516] -> #1 (&p->pi_lock){-...-.}:
[ 126.060519] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060524] [<ffffffff8150291e>] _raw_spin_lock_irqsave+0x4b/0x85
[ 126.060527] [<ffffffff810b5aa4>] task_blocks_on_rt_mutex+0x36/0x20f
[ 126.060531] [<ffffffff815019bb>] rt_mutex_slowlock+0xd1/0x15a
[ 126.060534] [<ffffffff81501ae3>] rt_mutex_lock+0x2d/0x2f
[ 126.060537] [<ffffffff810d9020>] rcu_boost+0xad/0xde
[ 126.060541] [<ffffffff810d90ce>] rcu_boost_kthread+0x7d/0x9b
[ 126.060544] [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060547] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060551]
[ 126.060552] -> #0 (&(lock)->wait_lock#2){+.+...}:
[ 126.060555] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
[ 126.060558] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060561] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
[ 126.060564] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060566] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
[ 126.060569] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
[ 126.060573] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
[ 126.060576] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
[ 126.060580] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
[ 126.060583] [<ffffffff81075425>] wake_up_process+0x15/0x17
[ 126.060585] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
[ 126.060590] [<ffffffff81081df9>] irq_exit+0x49/0x55
[ 126.060593] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
[ 126.060597] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
[ 126.060600] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
[ 126.060603] [<ffffffff810d582c>] irq_thread+0xde/0x1af
[ 126.060606] [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060608] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060611]
[ 126.060612] other info that might help us debug this:
[ 126.060614]
[ 126.060615] Possible unsafe locking scenario:
[ 126.060616]
[ 126.060617] CPU0 CPU1
[ 126.060619] ---- ----
[ 126.060620] lock(&p->pi_lock);
[ 126.060623] lock(&(lock)->wait_lock);
[ 126.060625] lock(&p->pi_lock);
[ 126.060627] lock(&(lock)->wait_lock);
[ 126.060629]
[ 126.060629] *** DEADLOCK ***
[ 126.060630]
[ 126.060632] 1 lock held by irq/24-eth0/1235:
[ 126.060633] #0: (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
[ 126.060638]
[ 126.060638] stack backtrace:
[ 126.060641] Pid: 1235, comm: irq/24-eth0 Not tainted 3.0.1-rt10+ #30
[ 126.060643] Call Trace:
[ 126.060644] <IRQ> [<ffffffff810acbde>] print_circular_bug+0x289/0x29a
[ 126.060651] [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
[ 126.060655] [<ffffffff810ab3aa>] ? trace_hardirqs_off_caller+0x1f/0x99
[ 126.060658] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060661] [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[ 126.060664] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060668] [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
[ 126.060671] [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[ 126.060674] [<ffffffff810d9655>] ? rcu_report_qs_rsp+0x87/0x8c
[ 126.060677] [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[ 126.060680] [<ffffffff810d9ea3>] ? rcu_read_unlock_special+0x9b/0x1c4
[ 126.060683] [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
[ 126.060687] [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
[ 126.060690] [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
[ 126.060693] [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
[ 126.060696] [<ffffffff810683da>] ? select_task_rq_rt+0x27/0xd5
[ 126.060701] [<ffffffff810a852a>] ? clockevents_program_event+0x8e/0x90
[ 126.060704] [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
[ 126.060708] [<ffffffff810a95dc>] ? tick_program_event+0x1f/0x21
[ 126.060711] [<ffffffff81075425>] wake_up_process+0x15/0x17
[ 126.060715] [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
[ 126.060718] [<ffffffff81081df9>] irq_exit+0x49/0x55
[ 126.060721] [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
[ 126.060724] [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
[ 126.060726] <EOI> [<ffffffff81072855>] ? migrate_disable+0x75/0x12d
[ 126.060733] [<ffffffff81080a61>] ? local_bh_disable+0xe/0x1f
[ 126.060736] [<ffffffff81080a70>] ? local_bh_disable+0x1d/0x1f
[ 126.060739] [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
[ 126.060742] [<ffffffff81502ac0>] ? _raw_spin_unlock_irq+0x3b/0x59
[ 126.060745] [<ffffffff810d582c>] irq_thread+0xde/0x1af
[ 126.060748] [<ffffffff810d5937>] ? irq_thread_fn+0x3a/0x3a
[ 126.060751] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
[ 126.060754] [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
[ 126.060757] [<ffffffff8109a760>] kthread+0x99/0xa1
[ 126.060761] [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[ 126.060764] [<ffffffff81069ed7>] ? finish_task_switch+0x87/0x10a
[ 126.060768] [<ffffffff81502ec4>] ? retint_restore_args+0xe/0xe
[ 126.060771] [<ffffffff8109a6c7>] ? __init_kthread_worker+0x8c/0x8c
[ 126.060774] [<ffffffff81509b10>] ? gs_change+0xb/0xb
Because irq_exit() does:
void irq_exit(void)
{
        account_system_vtime(current);
        trace_hardirq_exit();
        sub_preempt_count(IRQ_EXIT_OFFSET);
        if (!in_interrupt() && local_softirq_pending())
                invoke_softirq();
        ...
}
This triggers a wakeup, which uses RCU. Now, if the interrupted task has
t->rcu_read_unlock_special set, the RCU usage from the wakeup will end
up in rcu_read_unlock_special(). rcu_read_unlock_special() will test
for in_irq(), which will fail as we just decremented preempt_count
with IRQ_EXIT_OFFSET, and for in_serving_softirq(), which for
PREEMPT_RT_FULL reads:
int in_serving_softirq(void)
{
        int res;

        preempt_disable();
        res = __get_cpu_var(local_softirq_runner) == current;
        preempt_enable();
        return res;
}
Which will thus also fail, resulting in the above wreckage.
The 'somewhat' ugly solution is to open-code the preempt_count() test
in rcu_read_unlock_special().
Also, we're not at all sure how ->rcu_read_unlock_special gets set
here... so this is very likely a bandaid and more thought is required.
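In rcu_read_unlock_special() the open-coded check presumably ends up
looking something like this sketch (t and flags come from the surrounding
function; not necessarily the exact hunk):
        /*
         * SOFTIRQ_OFFSET, not SOFTIRQ_MASK: true only while actually
         * serving a softirq, not when bh is merely disabled.
         */
        if (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_OFFSET)) {
                local_irq_restore(flags);
                return;
        }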
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This fixes the following build error for the preempt-rt kernel.
make kernel/fork.o
CC kernel/fork.o
kernel/fork.c:90: error: section of 'tasklist_lock' conflicts with previous declaration
make[2]: *** [kernel/fork.o] Error 1
make[1]: *** [kernel/fork.o] Error 2
The rt kernel cache aligns the RWLOCK in DEFINE_RWLOCK by default.
The non-rt kernels explicitly cache align only the tasklist_lock in
kernel/fork.c
That can create a build conflict. This fixes the build problem by making the
non-rt kernels cache align RWLOCKs by default. The side effect is that
the other RWLOCKs are also cache aligned for non-rt.
This is a short term solution for rt only.
The longer term solution would be to push the cache aligned DEFINE_RWLOCK
to mainline. If there are objections, then we could create a
DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature.
Comments? Objections?
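The cache-aligned definition being proposed would look something like:
#define DEFINE_RWLOCK(name) \
        rwlock_t name __cacheline_aligned_in_smp = __RW_LOCK_UNLOCKED(name)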
Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex
based locking functions for preempt-rt.
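Illustratively, this means the familiar APIs resolve to rt_mutex based
"sleeping" variants when PREEMPT_RT_FULL is set, along these lines
(heavily simplified sketch, not the full substitution):
#ifdef CONFIG_PREEMPT_RT_FULL
# define spin_lock(lock)        rt_spin_lock(lock)
# define spin_unlock(lock)      rt_spin_unlock(lock)
# define read_lock(lock)        rt_read_lock(lock)
# define write_lock(lock)       rt_write_lock(lock)
#endif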
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In exit_pi_state_list() we have the following locking construct:
        spin_lock(&hb->lock);
        raw_spin_lock_irq(&curr->pi_lock);
        ...
        spin_unlock(&hb->lock);
In !RT this works, but on RT the migrate_enable() function which is
called from spin_unlock() sees atomic context due to the held pi_lock
and just decrements the migrate_disable_atomic counter of the
task. Now the next call to migrate_disable() sees the counter being
negative and issues a warning. That check should be in
migrate_enable() already.
Fix this by dropping pi_lock before unlocking hb->lock and reacquiring
pi_lock after that again. This is safe as the loop code re-evaluates the
list head under pi_lock.
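A sketch of the reordering described above (only the unlock sequence is
shown):
        /*
         * Drop pi_lock first so the migrate_enable() inside spin_unlock()
         * does not run in atomic context, then take pi_lock back for the
         * next loop iteration.
         */
        raw_spin_unlock_irq(&curr->pi_lock);
        spin_unlock(&hb->lock);
        raw_spin_lock_irq(&curr->pi_lock);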
Reported-by: Yong Zhang <yong.zhang@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Requeue with timeout causes a bug with PREEMPT_RT_FULL.
The bug comes from a timed out condition.
TASK 1                                  TASK 2
------                                  ------
futex_wait_requeue_pi()
  futex_wait_queue_me()
  <timed out>
                                        double_lock_hb();
raw_spin_lock(pi_lock);
if (current->pi_blocked_on) {
} else {
        current->pi_blocked_on = PI_WAKE_INPROGRESS;
        raw_spin_unlock(pi_lock);
        spin_lock(hb->lock); <-- blocked!
                                        plist_for_each_entry_safe(this) {
                                          rt_mutex_start_proxy_lock();
                                            task_blocks_on_rt_mutex();
                                              BUG_ON(task->pi_blocked_on)!!!!
The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab hb->lock, which it fails to do. As hb->lock is a mutex on RT,
TASK 1 blocks and its "pi_blocked_on" is set to hb->lock.
When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails
because TASK 1's pi_blocked_on is no longer set to that, but instead
set to hb->lock.
The fix:
When calling rt_mutex_start_proxy_lock(), a check is made to see
if the proxy task's pi_blocked_on is set. If so, exit early.
Otherwise set it to a new flag, PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, so it can handle things
appropriately.
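A sketch of the added check at the start of rt_mutex_start_proxy_lock(),
with lock->wait_lock already taken earlier in the function (simplified,
not necessarily the exact hunk):
#ifdef CONFIG_PREEMPT_RT_FULL
        raw_spin_lock_irq(&task->pi_lock);
        if (task->pi_blocked_on) {
                /* task already blocked on hb->lock: tell the caller to retry */
                raw_spin_unlock_irq(&task->pi_lock);
                raw_spin_unlock(&lock->wait_lock);
                return -EAGAIN;
        }
        /* mark the task so its own wakeup path knows a requeue is in flight */
        task->pi_blocked_on = PI_REQUEUE_INPROGRESS;
        raw_spin_unlock_irq(&task->pi_lock);
#endif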
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The processing of softirqs in irq thread context is a performance gain
for the non-rt workloads of a system, but it's counterproductive for
interrupts which are explicitly related to the realtime
workload. Allow such interrupts to prevent softirq processing in their
thread context.
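Usage would look roughly like this for an RT-critical device; the flag
name IRQF_NO_SOFTIRQ_CALL is the one used by the RT series, the rest
(handler, name, dev) is made up for illustration:
        ret = request_threaded_irq(irq, NULL, my_rt_thread_fn,
                                   IRQF_ONESHOT | IRQF_NO_SOFTIRQ_CALL,
                                   "my-rt-device", dev);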
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
When CONFIG_PREEMPT_RT_FULL is enabled, tasklets run as threads
and spinlocks turn into mutexes. But this can cause issues with
tasks disabling tasklets. A tasklet runs under ksoftirqd, and
if a tasklet is disabled with tasklet_disable(), the tasklet
count is increased. When a tasklet runs, it checks this counter
and, if it is set, adds itself back onto the softirq queue and
returns.
The problem arises in RT because ksoftirqd will see that a softirq
is ready to run (the tasklet softirq just re-armed itself) and will
not sleep, but instead run the softirqs again. The tasklet softirq
will still see that the count is non-zero, will not execute
the tasklet, and will requeue itself on the softirq again, which
causes ksoftirqd to run it again and again and again.
It gets worse because ksoftirqd runs as a real-time thread.
If it preempted the task that disabled tasklets, and that task
has migration disabled or can't run for other reasons, the tasklet
softirq will never run because the count will never be zero, and
ksoftirqd will go into an infinite loop. As an RT task, this
becomes a big problem.
This is a hack of a solution: have tasklet_disable() stop tasklets,
and when a tasklet runs while disabled, delay it instead of
requeueing it on the softirq. When tasklet_enable() is called and
tasklets are waiting, tasklet_enable() kicks the tasklets to continue.
This prevents the lockup caused by ksoftirqd going into an infinite loop.
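The enable side of that hack would look roughly like this (a sketch;
TASKLET_STATE_PENDING stands for the "tasklet was delayed while disabled"
marker described above and is an assumption here):
void tasklet_enable(struct tasklet_struct *t)
{
        if (!atomic_dec_and_test(&t->count))
                return;

        /* a tasklet tried to run while disabled: kick it now */
        if (test_and_clear_bit(TASKLET_STATE_PENDING, &t->state))
                tasklet_schedule(t);
}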
[ rostedt@goodmis.org: ported to 3.0-rt ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
ERROR: "in_serving_softirq" [net/sched/cls_cgroup.ko] undefined!
The above can be fixed by exporting in_serving_softirq
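Exporting it amounts to one extra line after the PREEMPT_RT_FULL
definition quoted earlier in this log:
int in_serving_softirq(void)
{
        int res;

        preempt_disable();
        res = __get_cpu_var(local_softirq_runner) == current;
        preempt_enable();
        return res;
}
EXPORT_SYMBOL(in_serving_softirq);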
Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1321235083-21756-2-git-send-email-jkacur@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Link: http://lkml.kernel.org/r/20110927124423.567944215@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Link: http://lkml.kernel.org/r/20110927124423.128129033@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
migrate_disable() can cause a bit of overhead in the RT kernel,
as changing the affinity is expensive to do at every lock encountered.
Since a running task cannot migrate, the actual disabling of migration
does not need to occur until the task is about to be scheduled out.
In most cases, a task that disables migration will enable it again
before it schedules, so this change improves performance tremendously.
[ Frank Rowand: UP compile fix ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Link: http://lkml.kernel.org/r/20110927124422.779693167@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
<NMI> [<ffffffff812dafd8>] spin_bug+0x94/0xa8
[<ffffffff812db07f>] do_raw_spin_lock+0x43/0xea
[<ffffffff814fa9be>] _raw_spin_lock_irqsave+0x6b/0x85
[<ffffffff8106ff9e>] ? migrate_disable+0x75/0x12d
[<ffffffff81078aaf>] ? pin_current_cpu+0x36/0xb0
[<ffffffff8106ff9e>] migrate_disable+0x75/0x12d
[<ffffffff81115b9d>] pagefault_disable+0xe/0x1f
[<ffffffff81047027>] copy_from_user_nmi+0x74/0xe6
[<ffffffff810489d7>] perf_callchain_user+0xf3/0x135
Now clearly we can't go around taking locks from NMI context, so cure
this by short-circuiting migrate_disable() when we're already in an
atomic context.
Add some extra debugging to avoid things like:
        preempt_disable();
        migrate_disable();
        preempt_enable();
        migrate_enable();
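A sketch of the short-circuit and the debug counter (the field name is
assumed, not taken from the patch):
void migrate_disable(void)
{
        struct task_struct *p = current;

        if (in_atomic()) {
#ifdef CONFIG_SCHED_DEBUG
                p->migrate_disable_atomic++;    /* pairs with a check in migrate_enable() */
#endif
                return;
        }

        /* ... normal, possibly sleeping migrate_disable() path ... */
}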
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1314967297.1301.14.camel@twins
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-wbot4vsmwhi8vmbf83hsclk6@git.kernel.org
|
|
Assigning mask = tsk_cpus_allowed(p) after p->migrate_disable = 0 ensures
that we won't see a mask change: no push/pull, we stack tasks on one CPU.
Also add a couple of fields to sched_debug for the next guy.
[ Build fix from Stratos Psomadakis <psomas@gentoo.org> ]
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1314108763.6689.4.camel@marge.simson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Make migrate_disable() be a preempt_disable() for !rt kernels. This
allows generic code to use it but still enforces that these code
sections stay relatively small.
A preemptible migrate_disable() accessible for general use would allow
people to grow arbitrary per-cpu crap instead of cleaning these things
up.
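A sketch of the resulting mapping (simplified; the RT side stays a real
function):
#ifdef CONFIG_PREEMPT_RT_FULL
extern void migrate_disable(void);
extern void migrate_enable(void);
#else
# define migrate_disable()      preempt_disable()
# define migrate_enable()       preempt_enable()
#endif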
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-275i87sl8e1jcamtchmehonm@git.kernel.org
|
|
Change from task_rq_lock() to raw_spin_lock(&rq->lock) to avoid a few
atomic ops. See comment on why it should be safe.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-cbz6hkl5r5mvwtx5s3tor2y6@git.kernel.org
|
|
RT added a two-byte migrate disable counter to the trace events and used
two bytes of the padding to make the change. The structures and all were
updated correctly, but the display in the event formats was not:
cat /debug/tracing/events/sched/sched_switch/format
name: sched_switch
ID: 51
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;
        field:unsigned short common_migrate_disable;    offset:8;       size:2; signed:0;
        field:int common_padding;       offset:10;      size:2; signed:0;
The field for common_padding has the correct size and offset, but the
use of "int" might confuse some parsers (and people that are reading
it). This needs to be changed to "unsigned short".
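After such a change the line would read:
        field:unsigned short common_padding;    offset:10;      size:2; signed:0;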
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1321467575.4181.36.camel@frodo
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
cpu_unplug_begin() should be called before CPU_DOWN_PREPARE, because
at CPU_DOWN_PREPARE cpu_active is cleared and sched_domain is
rebuilt. Otherwise the 'sync_unplug' thread will be running on the CPU
on which it was created instead of being bound to the CPU which is about
to go down.
I found this via an incorrect smp_processor_id() warning triggered by
sync_unplug/1; the trace is below:
(echo 1 > /sys/device/system/cpu/cpu1/online)
bash-1664 [000] 83.136620: _cpu_down: Bind sync_unplug to cpu 1
bash-1664 [000] 83.136623: sched_wait_task: comm=sync_unplug/1 pid=1724 prio=120
bash-1664 [000] 83.136624: _cpu_down: Wake sync_unplug
bash-1664 [000] 83.136629: sched_wakeup: comm=sync_unplug/1 pid=1724 prio=120 success=1 target_cpu=000
Wants to be folded back....
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Link: http://lkml.kernel.org/r/1318762607-2261-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When a retry happens, it's likely that the task has been migrated to
another cpu (unless the unplug failed), but it still dereferences the
original hotplug_pcp per-cpu data.
Update the pointer to hotplug_pcp in the retry path, so it points to
the current cpu.
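A sketch of the retry path after the fix; the helpers and fields here are
purely illustrative, only the re-read of the per-cpu pointer is the point:
void pin_current_cpu(void)
{
        struct hotplug_pcp *hp;

retry:
        hp = &__get_cpu_var(hotplug_pcp);       /* re-evaluate: we may have migrated */

        if (hotplug_fast_path(hp)) {            /* hypothetical: no unplug in progress */
                hp->refcount++;
                return;
        }

        wait_for_unplug_done(hp);               /* hypothetical: may sleep and migrate us */
        goto retry;
}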
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110728031600.GA338@windriver.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
No "\n" in the task name, otherwise the output will look a little odd.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Link: http://lkml.kernel.org/r/1318762607-2261-2-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
get_online_cpus() is a heavyweight function which involves a global
mutex. migrate_disable() wants a simpler construct which only prevents
a CPU from going down while a task is in a migrate-disabled section.
Implement a per-cpu lockless mechanism, which serializes on a global
mutex only in the real unplug case. That serialization affects only
tasks on the CPU which is to be brought down.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Instead of playing with non-preemption, introduce explicit
startup serialization. This is more robust and cleaner as
well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
When a task blocks on an rt lock, it saves the current state in
p->saved_state, so a lock-related wakeup will not destroy the
original state.
When a real wakeup happens while the task is already running due to a
lock wakeup, we update p->saved_state to TASK_RUNNING, but we do not
return success, which might cause another wakeup in the waitqueue
code while the task remains on the waitqueue list. Return success in
that case as well.
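A sketch of the relevant branch in try_to_wake_up() (simplified; the
lock-wakeup case itself is filtered out elsewhere, and success/out belong
to the surrounding function):
        if (!(p->state & state)) {
                /*
                 * The task might be running because of a lock wakeup;
                 * the real wakeup condition lives in saved_state.
                 */
                if (p->saved_state & state) {
                        p->saved_state = TASK_RUNNING;
                        success = 1;    /* this was the missing piece */
                }
                goto out;
        }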
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|