Age | Commit message (Collapse) | Author |
|
There are macros for static initializer for the three out of four
possible notifier types, that are:
ATOMIC_NOTIFIER_HEAD()
BLOCKING_NOTIFIER_HEAD()
RAW_NOTIFIER_HEAD()
This patch provides a static initilizer for the forth type to make it
complete.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Issue debugged by Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
sparc does not have CONFIG_EARLY_PRINTK option.
So early-printk-consolidate.patch breaks compilation:
arch/sparc/built-in.o: In function `setup_arch':
(.init.text+0x15e4): undefined reference to `early_console'
arch/sparc/built-in.o: In function `setup_arch':
(.init.text+0x15ec): undefined reference to `early_console'
The below addition fixes that.
Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Whenever a CPU receives a scheduling-clock interrupt, RCU checks to see
if the RCU core needs anything from this CPU. If so, RCU raises
RCU_SOFTIRQ to carry out any needed processing.
This approach has worked well historically, but it is undesirable on
NO_HZ_FULL CPUs. Such CPUs are expected to spend almost all of their time
in userspace, so that scheduling-clock interrupts can be disabled while
there is only one runnable task on the CPU in question. Unfortunately,
raising any softirq has the potential to wake up ksoftirqd, which would
provide the second runnable task on that CPU, preventing disabling of
scheduling-clock interrupts.
What is needed instead is for RCU to leave NO_HZ_FULL CPUs alone,
relying on the grace-period kthreads' quiescent-state forcing to
do any needed RCU work on behalf of those CPUs.
This commit therefore refrains from raising RCU_SOFTIRQ on any
NO_HZ_FULL CPUs during any grace periods that have been in effect
for less than one second. The one-second limit handles the case
where an inappropriate workload is running on a NO_HZ_FULL CPU
that features lots of scheduling-clock interrupts, but no idle
or userspace time.
Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Mike Galbraith <bitbucket@online.de>
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
neigh_priv_len is defined as u8. With all debug enabled struct
ipoib_neigh has 200 bytes. The largest part is sk_buff_head with 96
bytes and here the spinlock with 72 bytes.
The size value still fits in this u8 leaving some room for more.
On -RT struct ipoib_neigh put on weight and has 392 bytes. The main
reason is sk_buff_head with 288 and the fatty here is spinlock with 192
bytes. This does no longer fit into into neigh_priv_len and gcc
complains.
This patch changes neigh_priv_len from being 8bit to 16bit. Since the
following element (dev_id) is 16bit followed by a spinlock which is
aligned, the struct remains with a total size of 3200 (allmodconfig) /
2048 (with as much debug off as possible) bytes on x86-64.
On x86-32 the struct is 1856 (allmodconfig) / 1216 (with as much debug
off as possible) bytes long. The numbers were gained with and without
the patch to prove that this change does not increase the size of the
struct.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There was a reported deadlock on -rt which lockdep didn't report.
It turns out that in irq_exit() we tell lockdep that the hardirq
context ends and then do all kinds of locking afterwards.
To fix it, move trace_hardirq_exit() to the very end of irq_exit(), this
ensures all locking in tick_irq_exit() and rcu_irq_exit() are properly
recorded as happening from hardirq context.
This however leads to the 'fun' little problem of running softirqs
while in hardirq context. To cure this make the softirq code a little
more complex (in the CONFIG_TRACE_IRQFLAGS case).
Due to stack swizzling arch dependent trickery we cannot pass an
argument to __do_softirq() to tell it if it was done from hardirq
context or not; so use a side-band argument.
When we do __do_softirq() from hardirq context, 'atomically' flip to
softirq context and back, so that no locking goes without being in
either hard- or soft-irq context.
I didn't find any new problems in mainline using this patch, but it
did show the -rt problem.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-dgwc5cdksbn0jk09vbmcc9sa@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
|
|
|
|
The patch "net: gianfar: do not try to cleanup TX packets if they are
not done" for 3.12-rt left out the return value for gfar_clean_tx_ring().
This would cause an error when building this module. Note, this module
does not build on x86 and was not tested because of that.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The changes to move the migrate_disable() down in the trylocks()
caused race conditions to appear in the cpu hotplug code. The
migrate disables must be done before any of the rtmutexes are
taken, otherwise a lock may be held that prevents hotplug from
moving forward.
Link: http://lkml.kernel.org/r/20140429201308.63292691@gandalf.local.home
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Nicholas Mc Guire <der.herr@hofr.at>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Clark Williams <williams@redhat.com>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-8vdw4bfcsds27cvox6rpb334@git.kernel.org
|
|
It uses anon semaphores
|drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’:
|drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration]
| up_read_non_owner(&dc->writeback_lock);
| ^
|drivers/md/bcache/request.c: In function ‘request_write’:
|drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration]
| down_read_non_owner(&dc->writeback_lock);
| ^
either we get rid of those or we have to introduce them…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
The ntp code for notify_cmos_timer() is called from a hard interrupt
context. schedule_delayed_work() under PREEMPT_RT_FULL calls spinlocks
that have been converted to mutexes, thus calling schedule_delayed_work()
from interrupt is not safe.
Add a helper thread that does the call to schedule_delayed_work and wake
up that thread instead of calling schedule_delayed_work() directly.
This is only for CONFIG_PREEMPT_RT_FULL, otherwise the code still calls
schedule_delayed_work() directly in irq context.
Note: There's a few places in the kernel that do this. Perhaps the RT
code should have a dedicated thread that does the checks. Just register
a notifier on boot up for your check and wake up the thread when
needed. This will be a todo.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Completions have no long lasting callbacks and therefor do not need
the complex waitqueue variant. Use simple waitqueues which reduces the
contention on the waitqueue lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merged Steven's
static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp) {
- swait_wake(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
+ wake_up_all(&rnp->nocb_gp_wq[rnp->completed & 0x1]);
}
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
The function "swait_head_has_waiters()" was internalized into
wait-simple.c but it parallels the waitqueue_active of normal
waitqueue support. Given that there are over 150 waitqueue_active
users in drivers/ fs/ kernel/ and the like, lets make it globally
visible, and rename it to parallel the waitqueue_active accordingly.
We'll need to do this if we expect to expand its usage beyond RT.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
wait_queue is a swiss army knife and in most of the cases the
complexity is not needed. For RT waitqueues are a constant source of
trouble as we can't convert the head lock to a raw spinlock due to
fancy and long lasting callbacks.
Provide a slim version, which allows RT to replace wait queues. This
should go mainline as well, as it lowers memory consumption and
runtime overhead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
smp_mb() added by Steven Rostedt to fix a race condition with swait
wakeups vs adding items to the list.
|
|
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
This tracepoint is responsible for:
|[<814cc358>] __schedule_bug+0x4d/0x59
|[<814d24cc>] __schedule+0x88c/0x930
|[<814d3b90>] ? _raw_spin_unlock_irqrestore+0x40/0x50
|[<814d3b95>] ? _raw_spin_unlock_irqrestore+0x45/0x50
|[<810b57b5>] ? task_blocks_on_rt_mutex+0x1f5/0x250
|[<814d27d9>] schedule+0x29/0x70
|[<814d3423>] rt_spin_lock_slowlock+0x15b/0x278
|[<814d3786>] rt_spin_lock+0x26/0x30
|[<a00dced9>] gen6_gt_force_wake_get+0x29/0x60 [i915]
|[<a00e183f>] gen6_ring_get_irq+0x5f/0x100 [i915]
|[<a00b2a33>] ftrace_raw_event_i915_gem_ring_dispatch+0xe3/0x100 [i915]
|[<a00ac1b3>] i915_gem_do_execbuffer.isra.13+0xbd3/0x1430 [i915]
|[<810f8943>] ? trace_buffer_unlock_commit+0x43/0x60
|[<8113e8d2>] ? ftrace_raw_event_kmem_alloc+0xd2/0x180
|[<8101d063>] ? native_sched_clock+0x13/0x80
|[<a00acf29>] i915_gem_execbuffer2+0x99/0x280 [i915]
|[<a00114a3>] drm_ioctl+0x4c3/0x570 [drm]
|[<8101d0d9>] ? sched_clock+0x9/0x10
|[<a00ace90>] ? i915_gem_execbuffer+0x480/0x480 [i915]
|[<810f1c18>] ? rb_commit+0x68/0xa0
|[<810f1c6c>] ? ring_buffer_unlock_commit+0x1c/0xa0
|[<81197467>] do_vfs_ioctl+0x97/0x540
|[<81021318>] ? ftrace_raw_event_sys_enter+0xd8/0x130
|[<811979a1>] sys_ioctl+0x91/0xb0
|[<814db931>] tracesys+0xe1/0xe6
Chris Wilson does not like to move i915_trace_irq_get() out of the macro
|No. This enables the IRQ, as well as making a number of
|very expensively serialised read, unconditionally.
so it is gone now on RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Joakim Hernberg <jbh@alchemy.lu>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
The opencode part is gone in 1f83fee0 ("drm/i915: clear up wedged transitions")
the owner check is still there.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Luis captured the following:
| BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
| in_atomic(): 1, irqs_disabled(): 0, pid: 517, name: Xorg
| 2 locks held by Xorg/517:
| #0:
| (
| &dev->vbl_lock
| ){......}
| , at:
| [<ffffffffa0024c60>] drm_vblank_get+0x30/0x2b0 [drm]
| #1:
| (
| &dev->vblank_time_lock
| ){......}
| , at:
| [<ffffffffa0024ce1>] drm_vblank_get+0xb1/0x2b0 [drm]
| Preemption disabled at:
| [<ffffffffa008bc95>] i915_get_vblank_timestamp+0x45/0xa0 [i915]
| CPU: 3 PID: 517 Comm: Xorg Not tainted 3.10.10-rt7+ #5
| Call Trace:
| [<ffffffff8164b790>] dump_stack+0x19/0x1b
| [<ffffffff8107e62f>] __might_sleep+0xff/0x170
| [<ffffffff81651ac4>] rt_spin_lock+0x24/0x60
| [<ffffffffa0084e67>] i915_read32+0x27/0x170 [i915]
| [<ffffffffa008a591>] i915_pipe_enabled+0x31/0x40 [i915]
| [<ffffffffa008a6be>] i915_get_crtc_scanoutpos+0x3e/0x1b0 [i915]
| [<ffffffffa00245d4>] drm_calc_vbltimestamp_from_scanoutpos+0xf4/0x430 [drm]
| [<ffffffffa008bc95>] i915_get_vblank_timestamp+0x45/0xa0 [i915]
| [<ffffffffa0024998>] drm_get_last_vbltimestamp+0x48/0x70 [drm]
| [<ffffffffa0024db5>] drm_vblank_get+0x185/0x2b0 [drm]
| [<ffffffffa0025d03>] drm_wait_vblank+0x83/0x5d0 [drm]
| [<ffffffffa00212a2>] drm_ioctl+0x552/0x6a0 [drm]
| [<ffffffff811a0095>] do_vfs_ioctl+0x325/0x5b0
| [<ffffffff811a03a1>] SyS_ioctl+0x81/0xa0
| [<ffffffff8165a342>] tracesys+0xdd/0xe2
After a longer thread it was decided to drop the preempt_disable()/
enable() invocations which were meant for -RT and Mario Kleiner looks
for a replacement.
Cc: stable-rt@vger.kernel.org
Reported-By: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
On RT the trans_pcie->irq_lock lock is converted into a sleeping lock
and can't be used in primary irq handler. The lock is used in mutliple
places which means turning it into a raw lock could increase the
latency of the system.
For now both handlers are moved into the thread.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Clark Williams <clark.williams@gmail.com>
|
|
On !RT interrupt runs with interrupts disabled. On RT it's in a
thread, so no need to disable interrupts at all.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The lock is taken while reading two registers. On RT the first lock is
taken in hard irq where it might sleep and in the threaded irq.
The threaded irq runs in oneshot mode so the hard irq does not run until
the thread the completes so there is no reason to grab the lock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the
existing preempt_count each tasks sports now a preempt_lazy_count
which is manipulated on lock acquiry and release. This is slightly
incorrect as for lazyness reasons I coupled this on
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefor allows to exit the waking task the lock held region before
the woken task preempts. That also works better for cross CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial test do not expose any observable latency increasement, but
history shows that I've been proven wrong before :)
The lazy preemption mode is per default on, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non RT workload
performance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Since it is no longer invoked from the softirq people run into OOM more
often if the priority of the RCU thread is too low. Making boosting
default on RT should help in those case and it can be switched off if
someone knows better.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Running RCU out of softirq is a problem for some workloads that would
like to manage RCU core processing independently of other softirq work,
for example, setting kthread priority. This commit therefore moves the
RCU core work from softirq to a per-CPU/per-flavor SCHED_OTHER kthread
named rcuc. The SCHED_OTHER approach avoids the scalability problems
that appeared with the earlier attempt to move RCU core processing to
from softirq to kthreads. That said, kernels built with RCU_BOOST=y
will run the rcuc kthreads at the RCU-boosting priority.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
This uses a timer_list timer from the irq disabled guts of the idle
code. Disable it for now to prevent wreckage.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
This patch removes the recursive calls to migrate_disable/enable in
local_bh_disable/enable
the softirq-local-lock.patch introduces local_bh_disable/enable wich
decrements/increments the current->softirq_nestcnt and disable/enables
migration as well. as softirq_nestcnt (include/linux/sched.h conditioned
on CONFIG_PREEMPT_RT_BASE) already is tracking the nesting level of the
recursive calls to local_bh_disable/enable (all in kernel/softirq.c) - no
need to do it twice.
migrate_disable/enable thus can be conditionsed on softirq_nestcnt making
a transition from 0-1 to disable migration and 1-0 to re-enable it.
No change of functional behavior, this does noticably reduce the observed
nesting level of migrate_disable/enable
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
We can't rely on ksoftirqd anymore and we need to check the tasks
which run a particular softirq and if such a task is pi blocked ignore
the other pending bits of that task as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
trivial API cleanup - kernel/softirq.c was mimiking local_lock.
No change of functional behavior
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
The 3.x RT series removed the split softirq implementation in favour
of pushing softirq processing into the context of the thread which
raised it. Though this prevents us from handling the various softirqs
at different priorities. Now instead of reintroducing the split
softirq threads we split the locks which serialize the softirq
processing.
If a softirq is raised in context of a thread, then the softirq is
noted on a per thread field, if the thread is in a bh disabled
region. If the softirq is raised from hard interrupt context, then the
bit is set in the flag field of ksoftirqd and ksoftirqd is invoked.
When a thread leaves a bh disabled region, then it tries to execute
the softirqs which have been raised in its own context. It acquires
the per softirq / per cpu lock for the softirq and then checks,
whether the softirq is still pending in the per cpu
local_softirq_pending() field. If yes, it runs the softirq. If no,
then some other task executed it already. This allows for zero config
softirq elevation in the context of user space tasks or interrupt
threads.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Split out the inner handling function, so RT can reuse it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Avoid the percpu softirq_runner pointer magic by using a task flag.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
I discovered this bug when booting 3.4-rt on my powerpc box. It crashed
with the following report:
------------[ cut here ]------------
kernel BUG at /work/rt/stable-rt.git/kernel/rtmutex_common.h:75!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
Modules linked in:
NIP: c0000000004aa03c LR: c0000000004aa01c CTR: c00000000009b2ac
REGS: c00000003e8d7950 TRAP: 0700 Not tainted (3.4.11-test-rt19)
MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000082 XER: 20000000
SOFTE: 0
TASK = c00000003e8fdcd0[11] 'ksoftirqd/1' THREAD: c00000003e8d4000 CPU: 1
GPR00: 0000000000000001 c00000003e8d7bd0 c000000000d6cbb0 0000000000000000
GPR04: c00000003e8fdcd0 0000000000000000 0000000024004082 c000000000011454
GPR08: 0000000000000000 0000000080000001 c00000003e8fdcd1 0000000000000000
GPR12: 0000000024000084 c00000000fff0280 ffffffffffffffff 000000003ffffad8
GPR16: ffffffffffffffff 000000000072c798 0000000000000060 0000000000000000
GPR20: 0000000000642741 000000000072c858 000000003ffffaf0 0000000000000417
GPR24: 000000000072dcd0 c00000003e7ff990 0000000000000000 0000000000000001
GPR28: 0000000000000000 c000000000792340 c000000000ccec78 c000000001182338
NIP [c0000000004aa03c] .wakeup_next_waiter+0x44/0xb8
LR [c0000000004aa01c] .wakeup_next_waiter+0x24/0xb8
Call Trace:
[c00000003e8d7bd0] [c0000000004aa01c] .wakeup_next_waiter+0x24/0xb8 (unreliable)
[c00000003e8d7c60] [c0000000004a0320] .rt_spin_lock_slowunlock+0x8c/0xe4
[c00000003e8d7ce0] [c0000000004a07cc] .rt_spin_unlock+0x54/0x64
[c00000003e8d7d60] [c0000000000636bc] .__thread_do_softirq+0x130/0x174
[c00000003e8d7df0] [c00000000006379c] .run_ksoftirqd+0x9c/0x1a4
[c00000003e8d7ea0] [c000000000080b68] .kthread+0xa8/0xb4
[c00000003e8d7f90] [c00000000001c2f8] .kernel_thread+0x54/0x70
Instruction dump:
60000000 e86d01c8 38630730 4bff7061 60000000 ebbf0008 7c7c1b78 e81d0040
7fe00278 7c000074 7800d182 68000001 <0b000000> e88d01c8 387d0010 38840738
The rtmutex_common.h:75 is:
rt_mutex_top_waiter(struct rt_mutex *lock)
{
struct rt_mutex_waiter *w;
w = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter,
list_entry);
BUG_ON(w->lock != lock);
return w;
}
Where the waiter->lock is corrupted. I saw various other random bugs
that all had to with the softirq lock and plist. As plist needs to be
initialized before it is used I investigated how this lock is
initialized. It's initialized with:
void __init softirq_early_init(void)
{
local_irq_lock_init(local_softirq_lock);
}
Where:
#define local_irq_lock_init(lvar) \
do { \
int __cpu; \
for_each_possible_cpu(__cpu) \
spin_lock_init(&per_cpu(lvar, __cpu).lock); \
} while (0)
As the softirq lock is a local_irq_lock, which is a per_cpu lock, the
initialization is done to all per_cpu versions of the lock. But lets
look at where the softirq_early_init() is called from.
In init/main.c: start_kernel()
/*
* Interrupts are still disabled. Do necessary setups, then
* enable them
*/
softirq_early_init();
tick_init();
boot_cpu_init();
page_address_init();
printk(KERN_NOTICE "%s", linux_banner);
setup_arch(&command_line);
mm_init_owner(&init_mm, &init_task);
mm_init_cpumask(&init_mm);
setup_command_line(command_line);
setup_nr_cpu_ids();
setup_per_cpu_areas();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
One of the first things that is called is the initialization of the
softirq lock. But if you look further down, we see the per_cpu areas
have not been set up yet. Thus initializing a local_irq_lock() before
the per_cpu section is set up, may not work as it is initializing the
per cpu locks before the per cpu exists.
By moving the softirq_early_init() right after setup_per_cpu_areas(),
the kernel boots fine.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Clark Williams <clark@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Carsten Emde <cbe@osadl.org>
Cc: vomlehn@texas.net
Link: http://lkml.kernel.org/r/1349362924.6755.18.camel@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the rasie_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
Otherwise we get a deadlock like below:
[ 1044.042749] BUG: scheduling while atomic: ksoftirqd/21/141/0x00010003
[ 1044.042752] INFO: lockdep is turned off.
[ 1044.042754] Modules linked in:
[ 1044.042757] Pid: 141, comm: ksoftirqd/21 Tainted: G W 3.4.0-rc2-rt3-23676-ga723175-dirty #29
[ 1044.042759] Call Trace:
[ 1044.042761] <IRQ> [<ffffffff8107d8e5>] __schedule_bug+0x65/0x80
[ 1044.042770] [<ffffffff8168978c>] __schedule+0x83c/0xa70
[ 1044.042775] [<ffffffff8106bdd2>] ? prepare_to_wait+0x32/0xb0
[ 1044.042779] [<ffffffff81689a5e>] schedule+0x2e/0xa0
[ 1044.042782] [<ffffffff81071ebd>] hrtimer_wait_for_timer+0x6d/0xb0
[ 1044.042786] [<ffffffff8106bb30>] ? wake_up_bit+0x40/0x40
[ 1044.042790] [<ffffffff81071f20>] hrtimer_cancel+0x20/0x40
[ 1044.042794] [<ffffffff8111da0c>] perf_swevent_cancel_hrtimer+0x3c/0x50
[ 1044.042798] [<ffffffff8111da31>] task_clock_event_stop+0x11/0x40
[ 1044.042802] [<ffffffff8111da6e>] task_clock_event_del+0xe/0x10
[ 1044.042805] [<ffffffff8111c568>] event_sched_out+0x118/0x1d0
[ 1044.042809] [<ffffffff8111c649>] group_sched_out+0x29/0x90
[ 1044.042813] [<ffffffff8111ed7e>] __perf_event_disable+0x18e/0x200
[ 1044.042817] [<ffffffff8111c343>] remote_function+0x63/0x70
[ 1044.042821] [<ffffffff810b0aae>] generic_smp_call_function_single_interrupt+0xce/0x120
[ 1044.042826] [<ffffffff81022bc7>] smp_call_function_single_interrupt+0x27/0x40
[ 1044.042831] [<ffffffff8168d50c>] call_function_single_interrupt+0x6c/0x80
[ 1044.042833] <EOI> [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042840] [<ffffffff8168b970>] ? _raw_spin_unlock_irq+0x30/0x70
[ 1044.042844] [<ffffffff8168b976>] ? _raw_spin_unlock_irq+0x36/0x70
[ 1044.042848] [<ffffffff810702e2>] run_hrtimer_softirq+0xc2/0x200
[ 1044.042853] [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042857] [<ffffffff81045265>] __do_softirq_common+0xf5/0x3a0
[ 1044.042862] [<ffffffff81045c3d>] __thread_do_softirq+0x15d/0x200
[ 1044.042865] [<ffffffff81045dda>] run_ksoftirqd+0xfa/0x210
[ 1044.042869] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042873] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042877] [<ffffffff8106b596>] kthread+0xb6/0xc0
[ 1044.042881] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042886] [<ffffffff8168d994>] kernel_thread_helper+0x4/0x10
[ 1044.042889] [<ffffffff8107d98c>] ? finish_task_switch+0x8c/0x110
[ 1044.042894] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042897] [<ffffffff8168bd5d>] ? retint_restore_args+0xe/0xe
[ 1044.042900] [<ffffffff8106b4e0>] ? kthreadd+0x1e0/0x1e0
[ 1044.042902] [<ffffffff8168d990>] ? gs_change+0xb/0xb
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1341476476-5666-1-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
rwlocks and rwsems on RT do not allow multiple readers. Annotate the
lockdep acquire functions accordingly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
On -rt there is no softirq context any more and rwlock is sleepable,
disable softirq context test and rwlock+irq test.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The crypto notifier deadlocks on RT. Though this can be a real deadlock
on mainline as well due to fifo fair rwsems.
The involved parties here are:
[ 82.172678] swapper/0 S 0000000000000001 0 1 0 0x00000000
[ 82.172682] ffff88042f18fcf0 0000000000000046 ffff88042f18fc80 ffffffff81491238
[ 82.172685] 0000000000011cc0 0000000000011cc0 ffff88042f18c040 ffff88042f18ffd8
[ 82.172688] 0000000000011cc0 0000000000011cc0 ffff88042f18ffd8 0000000000011cc0
[ 82.172689] Call Trace:
[ 82.172697] [<ffffffff81491238>] ? _raw_spin_unlock_irqrestore+0x6c/0x7a
[ 82.172701] [<ffffffff8148fd3f>] schedule+0x64/0x66
[ 82.172704] [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0
[ 82.172708] [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c
[ 82.172713] [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141
[ 82.172716] [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f
[ 82.172719] [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182
[ 82.172722] [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e
[ 82.172726] [<ffffffff811debfd>] crypto_wait_for_test+0x49/0x6b
[ 82.172728] [<ffffffff811ded32>] crypto_register_alg+0x53/0x5a
[ 82.172730] [<ffffffff811ded6c>] crypto_register_algs+0x33/0x72
[ 82.172734] [<ffffffff81ad7686>] ? aes_init+0x12/0x12
[ 82.172737] [<ffffffff81ad76ea>] aesni_init+0x64/0x66
[ 82.172741] [<ffffffff81000318>] do_one_initcall+0x7f/0x13b
[ 82.172744] [<ffffffff81ac4d34>] kernel_init+0x199/0x22c
[ 82.172747] [<ffffffff81ac44ef>] ? loglevel+0x31/0x31
[ 82.172752] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
[ 82.172755] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
[ 82.172759] [<ffffffff81ac4b9b>] ? start_kernel+0x3ca/0x3ca
[ 82.172761] [<ffffffff814987c0>] ? gs_change+0x13/0x13
[ 82.174186] cryptomgr_test S 0000000000000001 0 41 2 0x00000000
[ 82.174189] ffff88042c971980 0000000000000046 ffffffff81d74830 0000000000000292
[ 82.174192] 0000000000011cc0 0000000000011cc0 ffff88042c96eb80 ffff88042c971fd8
[ 82.174195] 0000000000011cc0 0000000000011cc0 ffff88042c971fd8 0000000000011cc0
[ 82.174195] Call Trace:
[ 82.174198] [<ffffffff8148fd3f>] schedule+0x64/0x66
[ 82.174201] [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0
[ 82.174204] [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c
[ 82.174206] [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141
[ 82.174209] [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f
[ 82.174212] [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182
[ 82.174215] [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e
[ 82.174218] [<ffffffff811e4883>] cryptomgr_notify+0x280/0x385
[ 82.174221] [<ffffffff814943de>] notifier_call_chain+0x6b/0x98
[ 82.174224] [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12
[ 82.174227] [<ffffffff810677cd>] __blocking_notifier_call_chain+0x70/0x8d
[ 82.174230] [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16
[ 82.174234] [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50
[ 82.174236] [<ffffffff811dd7a1>] crypto_alg_mod_lookup+0x3e/0x74
[ 82.174238] [<ffffffff811dd949>] crypto_alloc_base+0x36/0x8f
[ 82.174241] [<ffffffff811e9408>] cryptd_alloc_ablkcipher+0x6e/0xb5
[ 82.174243] [<ffffffff811dd591>] ? kzalloc.clone.5+0xe/0x10
[ 82.174246] [<ffffffff8103085d>] ablk_init_common+0x1d/0x38
[ 82.174249] [<ffffffff8103852a>] ablk_ecb_init+0x15/0x17
[ 82.174251] [<ffffffff811dd8c6>] __crypto_alloc_tfm+0xc7/0x114
[ 82.174254] [<ffffffff811e0caa>] ? crypto_lookup_skcipher+0x1f/0xe4
[ 82.174256] [<ffffffff811e0dcf>] crypto_alloc_ablkcipher+0x60/0xa5
[ 82.174258] [<ffffffff811e5bde>] alg_test_skcipher+0x24/0x9b
[ 82.174261] [<ffffffff8106d96d>] ? finish_task_switch+0x3f/0xfa
[ 82.174263] [<ffffffff811e6b8e>] alg_test+0x16f/0x1d7
[ 82.174267] [<ffffffff811e45ac>] ? cryptomgr_probe+0xac/0xac
[ 82.174269] [<ffffffff811e45d8>] cryptomgr_test+0x2c/0x47
[ 82.174272] [<ffffffff81061161>] kthread+0x7e/0x86
[ 82.174275] [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa
[ 82.174278] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
[ 82.174281] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
[ 82.174284] [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c
[ 82.174287] [<ffffffff814987c0>] ? gs_change+0x13/0x13
[ 82.174329] cryptomgr_probe D 0000000000000002 0 47 2 0x00000000
[ 82.174332] ffff88042c991b70 0000000000000046 ffff88042c991bb0 0000000000000006
[ 82.174335] 0000000000011cc0 0000000000011cc0 ffff88042c98ed00 ffff88042c991fd8
[ 82.174338] 0000000000011cc0 0000000000011cc0 ffff88042c991fd8 0000000000011cc0
[ 82.174338] Call Trace:
[ 82.174342] [<ffffffff8148fd3f>] schedule+0x64/0x66
[ 82.174344] [<ffffffff814901ad>] __rt_mutex_slowlock+0x85/0xbe
[ 82.174347] [<ffffffff814902d2>] rt_mutex_slowlock+0xec/0x159
[ 82.174351] [<ffffffff81089c4d>] rt_mutex_fastlock.clone.8+0x29/0x2f
[ 82.174353] [<ffffffff81490372>] rt_mutex_lock+0x33/0x37
[ 82.174356] [<ffffffff8108a0f2>] __rt_down_read+0x50/0x5a
[ 82.174358] [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12
[ 82.174360] [<ffffffff8108a11c>] rt_down_read+0x10/0x12
[ 82.174363] [<ffffffff810677b5>] __blocking_notifier_call_chain+0x58/0x8d
[ 82.174366] [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16
[ 82.174369] [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50
[ 82.174372] [<ffffffff811debd6>] crypto_wait_for_test+0x22/0x6b
[ 82.174374] [<ffffffff811decd3>] crypto_register_instance+0xb4/0xc0
[ 82.174377] [<ffffffff811e9b76>] cryptd_create+0x378/0x3b6
[ 82.174379] [<ffffffff811de512>] ? __crypto_lookup_template+0x5b/0x63
[ 82.174382] [<ffffffff811e4545>] cryptomgr_probe+0x45/0xac
[ 82.174385] [<ffffffff811e4500>] ? crypto_alloc_pcomp+0x1b/0x1b
[ 82.174388] [<ffffffff81061161>] kthread+0x7e/0x86
[ 82.174391] [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa
[ 82.174394] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10
[ 82.174398] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13
[ 82.174401] [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c
[ 82.174403] [<ffffffff814987c0>] ? gs_change+0x13/0x13
cryptomgr_test spawns the cryptomgr_probe thread from the notifier
call. The probe thread fires the same notifier as the test thread and
deadlocks on the rwsem on RT.
Now this is a potential deadlock in mainline as well, because we have
fifo fair rwsems. If another thread blocks with a down_write() on the
notifier chain before the probe thread issues the down_read() it will
block the probe thread and the whole party is dead locked.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|