Age | Commit message (Collapse) | Author |
|
BSC9132 on-board FPGA connected on I2C bus, so add regmap support
i2c access control the QIXIS.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Change-Id: Iecaf76384cafd9d85b13f8f95055f460a9e48f24
Reviewed-on: http://git.am.freescale.net:8181/34989
Tested-by: Review Code-CDREVIEW <CDREVIEW@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
|
|
QIXIS System Logic FPGA support to manage system power. So we
through QIXIS to power off freescale SOC.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Change-Id: I7478cca11dcd23c8d9580c3f52c46946375573c4
Reviewed-on: http://git.am.freescale.net:8181/33082
Tested-by: Review Code-CDREVIEW <CDREVIEW@freescale.com>
Reviewed-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
|
|
Since both ppc and ppc64 have LE variants which are now reported by uname, add
that flag (__AUDIT_ARCH_LE) to syscall_get_arch() and add AUDIT_ARCH_PPC64LE
variant.
Without this, perf trace and auditctl fail.
Mainline kernel reports ppc64le (per a058801) but there is no matching
AUDIT_ARCH_PPC64LE.
Since 32-bit PPC LE is not supported by audit, don't advertise it in
AUDIT_ARCH_PPC* variants.
See:
https://www.redhat.com/archives/linux-audit/2014-August/msg00082.html
https://www.redhat.com/archives/linux-audit/2014-December/msg00004.html
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 63f13448d81c910a284b096149411a719cbed501)
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Change-Id: I8bee5c00b6d4e0f3a6a3d322b21c2f103bd9ce00
Reviewed-on: http://git.am.freescale.net:8181/33027
Reviewed-by: Scott Wood <scottwood@freescale.com>
Tested-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
Reviewed-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
|
|
For all arches which support audit implement syscall_get_arch()
They are all pretty easy and straight forward, stolen from how the call
to audit_syscall_entry() determines the arch.
Based-on-patch-by: Richard Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-ia64@vger.kernel.org
Cc: microblaze-uclinux@itee.uq.edu.au
Cc: linux-mips@linux-mips.org
Cc: linux@lists.openrisc.net
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
(cherry picked from commit ce5d112827e5c2e9864323d0efd7ec2a62c6dce0)
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Change-Id: I772d52f630cca58c583a8f9b42f396ffecacdd1e
Conflicts:
arch/mips/include/asm/syscall.h
Change-Id: I261719173454c5157a96eaf06c1deb9b2e3835d6
Reviewed-on: http://git.am.freescale.net:8181/33086
Reviewed-by: Scott Wood <scottwood@freescale.com>
Tested-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
Reviewed-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
|
|
The secure_computing function took a syscall number parameter, but
it only paid any attention to that parameter if seccomp mode 1 was
enabled. Rather than coming up with a kludge to get the parameter
to work in mode 2, just remove the parameter.
To avoid churn in arches that don't have seccomp filters (and may
not even support syscall_get_nr right now), this leaves the
parameter in secure_computing_strict, which is now a real function.
For ARM, this is a bit ugly due to the fact that ARM conditionally
supports seccomp filters. Fixing that would probably only be a
couple of lines of code, but it should be coordinated with the audit
maintainers.
This will be a slight slowdown on some arches. The right fix is to
pass in all of seccomp_data instead of trying to make just the
syscall nr part be fast.
This is a prerequisite for making two-phase seccomp work cleanly.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: x86@kernel.org
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
(backported from commit a4412fc9486ec85686c6c7929e7e829f62ae377e)
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Change-Id: I4109ed2560d19349927c3e3f7648022ae23db318
Reviewed-on: http://git.am.freescale.net:8181/33029
Reviewed-by: Scott Wood <scottwood@freescale.com>
Tested-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
Reviewed-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
|
|
Every caller of syscall_get_arch() uses current for the task and no
implementors of the function need args. So just get rid of both of
those things. Admittedly, since these are inline functions we aren't
wasting stack space, but it just makes the prototypes better.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux390@de.ibm.com
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
(backported from commit 5e937a9ae9137899c6641d718bd3820861099a09)
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Change-Id: Ifeefd84eeaa99445fdfc49ef782b01957dd67c00
Reviewed-on: http://git.am.freescale.net:8181/33023
Tested-by: Review Code-CDREVIEW <CDREVIEW@freescale.com>
Reviewed-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
|
|
Technical Details : Add support for the following BMI counters
and make them available to the DPA stats interface in the
User Space:
e_FM_PORT_COUNTERS_DISCARD_FRAME,
/* BMI stat counter */
e_FM_PORT_COUNTERS_RX_BAD_FRAME,
/* BMI Rx stat counter */
e_FM_PORT_COUNTERS_RX_LARGE_FRAME,
/* BMI Rx stat counter */
e_FM_PORT_COUNTERS_RX_LIST_DMA_ERR,
/* BMI Rx OP stat counter */
e_FM_PORT_COUNTERS_RX_OUT_OF_BUFFERS_DISCARD,
/* BMI Rx OP stat counter */
e_FM_PORT_COUNTERS_WRED_DISCARD,
/* BMI OP stat counter */
@Function FM_PORT_GetBmiCounters
@Description Read port's BMI stat counters and place them into
a designated structure of counters.
@Param[in] h_FmPort A handle to a FM Port module.
@Param[out] p_BmiStats counters structure
Change-Id: I464b5defc29e149252002c911b22e69343e61adf
Signed-off-by: Mandy Lavi <mandy.lavi@freescale.com>
Reviewed-on: http://git.am.freescale.net:8181/32755
Tested-by: Review Code-CDREVIEW <CDREVIEW@freescale.com>
Reviewed-by: Igal Liberman <Igal.Liberman@freescale.com>
Reviewed-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
|
|
This reverts commit 1aa49383a4e16ce0c98c73cf81e1a9b2938e68fc.
Change-Id: I67ab8f420b1d0777b8229dd91cba56b1bdc88d27
Reviewed-on: http://git.am.freescale.net:8181/33431
Tested-by: Review Code-CDREVIEW <CDREVIEW@freescale.com>
Reviewed-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
|
|
Technical Details : Add support for the following BMI counters
and make them available to the DPA stats interface in the
User Space:
e_FM_PORT_COUNTERS_DISCARD_FRAME,
/* BMI stat counter */
e_FM_PORT_COUNTERS_RX_BAD_FRAME,
/* BMI Rx stat counter */
e_FM_PORT_COUNTERS_RX_LARGE_FRAME,
/* BMI Rx stat counter */
e_FM_PORT_COUNTERS_RX_LIST_DMA_ERR,
/* BMI Rx OP stat counter */
e_FM_PORT_COUNTERS_RX_OUT_OF_BUFFERS_DISCARD,
/* BMI Rx OP stat counter */
e_FM_PORT_COUNTERS_WRED_DISCARD,
/* BMI OP stat counter */
@Function FM_PORT_GetBmiCounters
@Description Read port's BMI stat counters and place them into
a designated structure of counters.
@Param[in] h_FmPort A handle to a FM Port module.
@Param[out] p_BmiStats counters structure
Change-Id: I464b5defc29e149252002c911b22e69343e61adf
Signed-off-by: Mandy Lavi <mandy.lavi@freescale.com>
Reviewed-on: http://git.am.freescale.net:8181/32755
Tested-by: Review Code-CDREVIEW <CDREVIEW@freescale.com>
Reviewed-by: Igal Liberman <Igal.Liberman@freescale.com>
Reviewed-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
|
|
|
|
broadcast on PowerPC
The hrtimer broad support was back-ported from upstream for LS1, the Power
platform do not implement this yet. The tick_setup_hrtimer_broadcast
function was not defined on Power platform.
When build for PowerPC with this backport support, such error occurs:
In file included from include/linux/tick.h:9:0,
from arch/powerpc/kernel/idle.c:27:
include/linux/clockchips.h:193:13: error: 'tick_setup_hrtimer_broadcast'
defined but not used [-Werror=unused-function]
As hrtimer based broadcast was only backported for ARM, this patch try to
provide a workaround to fix the build error on Power platform.
In linux-next, PowerPC will support hrtimer based broadcast too. There will
be no build error and no workaround needed.
Signed-off-by: Alison Wang <alison.wang@freescale.com>
Change-Id: If0f61062d6aa3d91ff81d074f20765a5880a5168
Reviewed-on: http://git.am.freescale.net:8181/33071
Tested-by: Review Code-CDREVIEW <CDREVIEW@freescale.com>
Reviewed-by: Zhengxiong Jin <Jason.Jin@freescale.com>
|
|
In order to do so it was also required to introduce
a new if type for SGMII2.5G
Change-Id: Iddbe223d8b716c3ed348c7a8a53bee0c037f04f4
Signed-off-by: Mandy Lavi <mandy.lavi@freescale.com>
Reviewed-on: http://git.am.freescale.net:8181/23474
Tested-by: Review Code-CDREVIEW <CDREVIEW@freescale.com>
Reviewed-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
Reviewed-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
|
|
The key statistics for match tables are not accessible from user space
using fmlib. This update implements the support for the function
FM_PCD_MatchTableGetKeyStatistics to be accessible from user space.
Signed-off-by: Marian Chereji <marian.chereji@freescale.com>
Change-Id: Ibcf40fdcf7a60afc65b2f926c2a1474513ae8950
Reviewed-on: http://git.am.freescale.net:8181/25376
Tested-by: Review Code-CDREVIEW <CDREVIEW@freescale.com>
Reviewed-by: Mandy Lavi <Mandy.Lavi@freescale.com>
Reviewed-by: Honghua Yin <Hong-Hua.Yin@freescale.com>
|
|
Add qman_delete_cgr_safe() that can be called from any CPU.
This in turn schedules qman_delete_cgr() on the proper CPU.
Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Change-Id: I762e83108533a4e537a534e90073df26a6b7b09c
Reviewed-on: http://git.am.freescale.net:8181/28532
Tested-by: Review Code-CDREVIEW <CDREVIEW@freescale.com>
Reviewed-by: Marian Cristian Rotariu <marian.rotariu@freescale.com>
|
|
Introduce managed counterparts for alloc_percpu() and free_percpu().
Add devm_alloc_percpu() and devm_free_percpu() into the managed
interfaces list.
Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Change-Id: I93546348e7b0e1974fda8b6c7a3b3710ce45b724
Reviewed-on: http://git.am.freescale.net:8181/24140
Reviewed-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com>
Tested-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com>
Conflicts:
Documentation/driver-model/devres.txt
drivers/base/devres.c
|
|
Signed-off-by: Scott Wood <scottwood@freescale.com>
Conflicts:
arch/arm/kvm/mmu.c
arch/arm/mm/proc-v7-3level.S
arch/powerpc/kernel/vdso32/getcpu.S
drivers/crypto/caam/error.c
drivers/crypto/caam/sg_sw_sec4.h
drivers/usb/host/ehci-fsl.c
|
|
Completions have no long lasting callbacks and therefor do not need
the complex waitqueue variant. Use simple waitqueues which reduces the
contention on the waitqueue lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The function "swait_head_has_waiters()" was internalized into
wait-simple.c but it parallels the waitqueue_active of normal
waitqueue support. Given that there are over 150 waitqueue_active
users in drivers/ fs/ kernel/ and the like, lets make it globally
visible, and rename it to parallel the waitqueue_active accordingly.
We'll need to do this if we expect to expand its usage beyond RT.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
wait_queue is a swiss army knife and in most of the cases the
complexity is not needed. For RT waitqueues are a constant source of
trouble as we can't convert the head lock to a raw spinlock due to
fancy and long lasting callbacks.
Provide a slim version, which allows RT to replace wait queues. This
should go mainline as well, as it lowers memory consumption and
runtime overhead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
smp_mb() added by Steven Rostedt to fix a race condition with swait
wakeups vs adding items to the list.
|
|
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the
existing preempt_count each tasks sports now a preempt_lazy_count
which is manipulated on lock acquiry and release. This is slightly
incorrect as for lazyness reasons I coupled this on
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefor allows to exit the waking task the lock held region before
the woken task preempts. That also works better for cross CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial test do not expose any observable latency increasement, but
history shows that I've been proven wrong before :)
The lazy preemption mode is per default on, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non RT workload
performance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The 3.x RT series removed the split softirq implementation in favour
of pushing softirq processing into the context of the thread which
raised it. Though this prevents us from handling the various softirqs
at different priorities. Now instead of reintroducing the split
softirq threads we split the locks which serialize the softirq
processing.
If a softirq is raised in context of a thread, then the softirq is
noted on a per thread field, if the thread is in a bh disabled
region. If the softirq is raised from hard interrupt context, then the
bit is set in the flag field of ksoftirqd and ksoftirqd is invoked.
When a thread leaves a bh disabled region, then it tries to execute
the softirqs which have been raised in its own context. It acquires
the per softirq / per cpu lock for the softirq and then checks,
whether the softirq is still pending in the per cpu
local_softirq_pending() field. If yes, it runs the softirq. If no,
then some other task executed it already. This allows for zero config
softirq elevation in the context of user space tasks or interrupt
threads.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Avoid the percpu softirq_runner pointer magic by using a task flag.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the rasie_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
The netfilter code relies only on the implicit semantics of
local_bh_disable() for serializing wt_write_recseq sections. RT breaks
that and needs explicit serialization here.
Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
Bringing a CPU down is a pain with the PREEMPT_RT kernel because
tasks can be preempted in many more places than in non-RT. In
order to handle per_cpu variables, tasks may be pinned to a CPU
for a while, and even sleep. But these tasks need to be off the CPU
if that CPU is going down.
Several synchronization methods have been tried, but when stressed
they failed. This is a new approach.
A sync_tsk thread is still created and tasks may still block on a
lock when the CPU is going down, but how that works is a bit different.
When cpu_down() starts, it will create the sync_tsk and wait on it
to inform that current tasks that are pinned on the CPU are no longer
pinned. But new tasks that are about to be pinned will still be allowed
to do so at this time.
Then the notifiers are called. Several notifiers will bring down tasks
that will enter these locations. Some of these tasks will take locks
of other tasks that are on the CPU. If we don't let those other tasks
continue, but make them block until CPU down is done, the tasks that
the notifiers are waiting on will never complete as they are waiting
for the locks held by the tasks that are blocked.
Thus we still let the task pin the CPU until the notifiers are done.
After the notifiers run, we then make new tasks entering the pinned
CPU sections grab a mutex and wait. This mutex is now a per CPU mutex
in the hotplug_pcp descriptor.
To help things along, a new function in the scheduler code is created
called migrate_me(). This function will try to migrate the current task
off the CPU this is going down if possible. When the sync_tsk is created,
all tasks will then try to migrate off the CPU going down. There are
several cases that this wont work, but it helps in most cases.
After the notifiers are called and if a task can't migrate off but enters
the pin CPU sections, it will be forced to wait on the hotplug_pcp mutex
until the CPU down is complete. Then the scheduler will force the migration
anyway.
Also, I found that THREAD_BOUND need to also be accounted for in the
pinned CPU, and the migrate_disable no longer treats them special.
This helps fix issues with ksoftirqd and workqueue that unbind on CPU down.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
since c2f21ce ("locking: Implement new raw_spinlock")
include/linux/spinlock.h includes spin_unlock_wait() to wait for a concurren
holder of a lock. this patch just moves over to that API. spin_unlock_wait
covers both raw_spinlock_t and spinlock_t so it should be safe here as well.
the added rt-variant of read_seqbegin in include/linux/seqlock.h that is being
modified, was introduced by patch:
seqlock-prevent-rt-starvation.patch
behavior should be unchanged.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
If a low prio writer gets preempted while holding the seqlock write
locked, a high prio reader spins forever on RT.
To prevent this let the reader grab the spinlock, so it blocks and
eventually boosts the writer. This way the writer can proceed and
endless spinning is prevented.
For seqcount writers we disable preemption over the update code
path. Thaanks to Al Viro for distangling some VFS code to make that
possible.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
We hit the following bug with 3.6-rt:
[ 5.898990] BUG: scheduling while atomic: swapper/3/0/0x00000002
[ 5.898991] no locks held by swapper/3/0.
[ 5.898993] Modules linked in:
[ 5.898996] Pid: 0, comm: swapper/3 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
[ 5.898997] Call Trace:
[ 5.899011] [<ffffffff810804e7>] __schedule_bug+0x67/0x90
[ 5.899028] [<ffffffff81577923>] __schedule+0x793/0x7a0
[ 5.899032] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
[ 5.899034] [<ffffffff81577b89>] schedule+0x29/0x70
[ 5.899036] BUG: scheduling while atomic: swapper/7/0/0x00000002
[ 5.899037] no locks held by swapper/7/0.
[ 5.899039] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
[ 5.899040] Modules linked in:
[ 5.899041]
[ 5.899045] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
[ 5.899046] Pid: 0, comm: swapper/7 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
[ 5.899047] Call Trace:
[ 5.899049] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
[ 5.899052] [<ffffffff810804e7>] __schedule_bug+0x67/0x90
[ 5.899054] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
[ 5.899056] [<ffffffff81577923>] __schedule+0x793/0x7a0
[ 5.899059] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
[ 5.899062] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
[ 5.899068] [<ffffffff8130be64>] acpi_write_bit_register+0x33/0xb0
[ 5.899071] [<ffffffff81577b89>] schedule+0x29/0x70
[ 5.899072] [<ffffffff8130be13>] ? acpi_read_bit_register+0x33/0x51
[ 5.899074] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
[ 5.899077] [<ffffffff8131d1fc>] acpi_idle_enter_bm+0x8a/0x28e
[ 5.899079] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
[ 5.899081] [<ffffffff8107e5da>] ? this_cpu_load+0x1a/0x30
[ 5.899083] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
[ 5.899087] [<ffffffff8144c759>] cpuidle_enter+0x19/0x20
[ 5.899088] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
[ 5.899090] [<ffffffff8144c777>] cpuidle_enter_state+0x17/0x50
[ 5.899092] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
[ 5.899094] [<ffffffff8144d1a1>] cpuidle899101] [<ffffffff8130be13>] ?
As the acpi code disables interrupts in acpi_idle_enter_bm, and calls
code that grabs the acpi lock, it causes issues as the lock is currently
in RT a sleeping lock.
The lock was converted from a raw to a sleeping lock due to some
previous issues, and tests that showed it didn't seem to matter.
Unfortunately, it did matter for one of our boxes.
This patch converts the lock back to a raw lock. I've run this code on a
few of my own machines, one being my laptop that uses the acpi quite
extensively. I've been able to suspend and resume without issues.
[ tglx: Made the change exclusive for acpi_gbl_hardware_lock ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: John Kacur <jkacur@gmail.com>
Cc: Clark Williams <clark@redhat.com>
Link: http://lkml.kernel.org/r/1360765565.23152.5.camel@gandalf.local.home
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use of course), this should be esp easy now
that we have a kmap_atomic stack.
Something like the below.. it wants replacing all the preempt_disable()
stuff with pagefault_disable() && migrate_disable() of course, but then
you can flip kmaps around like below.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
and the pte content right away in the task struct.
Shortens the context switch code. ]
|
|
On 07/27/2011 04:37 PM, Thomas Gleixner wrote:
> - KGDB (not yet disabled) is reportedly unusable on -rt right now due
> to missing hacks in the console locking which I dropped on purpose.
>
To work around this in the short term you can use this patch, in
addition to the clocksource watchdog patch that Thomas brewed up.
Comments are welcome of course. Ultimately the right solution is to
change separation between the console and the HW to have a polled mode
+ work queue so as not to introduce any kind of latency.
Thanks,
Jason.
|
|
There are (probably rare) situations when a system crashed and the system
console becomes unresponsive but the network icmp layer still is alive.
Wouldn't it be wonderful, if we then could submit a sysreq command via ping?
This patch provides this facility. Please consult the updated documentation
Documentation/sysrq.txt for details.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
|
|
irq_work is processed in softirq context on -RT because we want to avoid
long latencies which might arise from processing lots of perf events.
The noHZ-full mode requires its callback to be called from real hardirq
context (commit 76c24fb ("nohz: New APIs to re-evaluate the tick on full
dynticks CPUs")). If it is called from a thread context we might get
wrong results for checks like "is_idle_task(current)".
This patch introduces a second list (hirq_work_list) which will be used
if irq_work_run() has been invoked from hardirq context and process only
work items marked with IRQ_WORK_HARD_IRQ.
This patch also removes arch_irq_work_raise() from sparc & powerpc like
it is already done for x86. Atleast for powerpc it is somehow
superfluous because it is called from the timer interrupt which should
invoke update_process_times().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
We need to protect the per cpu variable and prevent migration.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
called from softirq context, it may block the ksoftirqd() from running, in
which case, it may never wake up the msleep() causing the deadlock.
I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call,
running on CPU 8. The one ksoftirqd that is stuck, happens to be the one that
runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that
ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be
blocked on a lock that irq/74-qla2xxx holds, we have our deadlock.
The solution is not to convert the cpu_chill() back to a cpu_relax() as that
will re-create a possible live lock that the cpu_chill() fixed earlier, and may
also leave this bug open on other softirqs. The fix is to remove the
dependency on ksoftirqd from cpu_chill(). That is, instead of calling
msleep() that requires ksoftirqd to wake it up, use the
hrtimer_nanosleep() code that does the wakeup from hard irq context.
|Looks to be the lock of the block softirq. I don't have the core dump
|anymore, but from what I could tell the ksoftirqd was blocked on the
|block softirq lock, where the block softirq handler did a msleep
|(called by the qla2xxx interrupt handler).
|
|Looking at trigger_softirq() in block/blk-softirq.c, it can do a
|smp_callfunction() to another cpu to run the block softirq. If that
|happens to be the cpu where the qla2xx irq handler is doing the block
|softirq and is in a middle of a msleep(), I believe the ksoftirqd will
|try to run the softirq. If it does that, then BOOM, it's deadlocked
|because the ksoftirqd will never run the timer softirq either.
|I should have also stated that it was only one lock that was involved.
|But the lock owner was doing a msleep() that requires a wakeup by
|ksoftirqd to continue. If ksoftirqd happens to be blocked on a lock
|held by the msleep() caller, then you have your deadlock.
|
|It's best not to have any softirqs going to sleep requiring another
|softirq to wake it up. Note, if we ever require a timer softirq to do a
|cpu_chill() it will most definitely hit this deadlock.
Cc: stable-rt@vger.kernel.org
Found-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy: add the 4 | chapters from email]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable
to network-based denial-of-service attacks. This patch therefore
makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq()
is running in ksoftirqd context. A wrapper layer in interposed so that
other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying
function __do_softirq_common() does the actual work.
The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is
that there might be a local_bh_enable() inside an RCU-preempt read-side
critical section. This local_bh_enable() can invoke __do_softirq()
directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just
calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be
an illegal RCU-preempt quiescent state in the middle of an RCU-preempt
read-side critical section. Therefore, quiescent states can only happen
in cases where __do_softirq() is invoked directly from ksoftirqd.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The Linux kernel has long RCU-bh read-side critical sections that
intolerably increase scheduling latency under mainline's RCU-bh rules,
which include RCU-bh read-side critical sections being non-preemptible.
This patch therefore arranges for RCU-bh to be implemented in terms of
RCU-preempt for CONFIG_PREEMPT_RT_FULL=y.
This has the downside of defeating the purpose of RCU-bh, namely,
handling the case where the system is subjected to a network-based
denial-of-service attack that keeps at least one CPU doing full-time
softirq processing. This issue will be fixed by a later commit.
The current commit will need some work to make it appropriate for
mainline use, for example, it needs to be extended to cover Tiny RCU.
[ paulmck: Added a useful changelog ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Mike Galbraith captered the following:
| >#11 [ffff88017b243e90] _raw_spin_lock at ffffffff815d2596
| >#12 [ffff88017b243e90] rt_mutex_trylock at ffffffff815d15be
| >#13 [ffff88017b243eb0] get_next_timer_interrupt at ffffffff81063b42
| >#14 [ffff88017b243f00] tick_nohz_stop_sched_tick at ffffffff810bd1fd
| >#15 [ffff88017b243f70] tick_nohz_irq_exit at ffffffff810bd7d2
| >#16 [ffff88017b243f90] irq_exit at ffffffff8105b02d
| >#17 [ffff88017b243fb0] reschedule_interrupt at ffffffff815db3dd
| >--- <IRQ stack> ---
| >#18 [ffff88017a2a9bc8] reschedule_interrupt at ffffffff815db3dd
| > [exception RIP: task_blocks_on_rt_mutex+51]
| >#19 [ffff88017a2a9ce0] rt_spin_lock_slowlock at ffffffff815d183c
| >#20 [ffff88017a2a9da0] lock_timer_base.isra.35 at ffffffff81061cbf
| >#21 [ffff88017a2a9dd0] schedule_timeout at ffffffff815cf1ce
| >#22 [ffff88017a2a9e50] rcu_gp_kthread at ffffffff810f9bbb
| >#23 [ffff88017a2a9ed0] kthread at ffffffff810796d5
| >#24 [ffff88017a2a9f50] ret_from_fork at ffffffff815da04c
lock_timer_base() does a try_lock() which deadlocks on the waiter lock
not the lock itself.
This patch takes the waiter_lock with trylock so it should work from interrupt
context as well. If the fastpath doesn't work and the waiter_lock itself is
taken then it seems that the lock itself taken.
This patch also adds "rt_spin_unlock_after_trylock_in_irq" to keep lockdep
happy. If we managed to take the wait_lock in the first place we should also
be able to take it in the unlock path.
Cc: stable-rt@vger.kernel.org
Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Mike,
On Thu, 7 Nov 2013, Mike Galbraith wrote:
> On Thu, 2013-11-07 at 04:26 +0100, Mike Galbraith wrote:
> > On Wed, 2013-11-06 at 18:49 +0100, Thomas Gleixner wrote:
>
> > > I bet you are trying to work around some of the side effects of the
> > > occasional tick which is still necessary despite of full nohz, right?
> >
> > Nope, I wanted to check out cost of nohz_full for rt, and found that it
> > doesn't work at all instead, looked, and found that the sole running
> > task has just awakened ksoftirqd when it wants to shut the tick down, so
> > that shutdown never happens.
>
> Like so in virgin 3.10-rt. Box is x3550 M3 booted nowatchdog
> rcu_nocbs=1-3 nohz_full=1-3, and CPUs1-3 are completely isolated via
> cpusets as well.
well, that very same problem is in mainline if you add "threadirqs" to
the command line. But we can be smart about this. The untested patch
below should address that issue. If that works on mainline we can
adapt it for RT (needs a trylock(&base->lock) there).
Though it's not a full solution. It needs some thought versus the
softirq code of timers. Assume we have only one timer queued 1000
ticks into the future. So this change will cause the timer softirq not
to be called until that timer expires and then the timer softirq is
going to do 1000 loops until it catches up with jiffies. That's
anything but pretty ...
What worries me more is this one:
pert-5229 [003] d..h1.. 684.482618: softirq_raise: vec=9 [action=RCU]
The CPU has no callbacks as you shoved them over to cpu 0, so why is
the RCU softirq raised?
Thanks,
tglx
------------------
Message-id: <alpine.DEB.2.02.1311071158350.23353@ionos.tec.linutronix.de>
|CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This fixes the following build error for the preempt-rt kernel.
make kernel/fork.o
CC kernel/fork.o
kernel/fork.c:90: error: section of tasklist_lock conflicts with previous declaration
make[2]: *** [kernel/fork.o] Error 1
make[1]: *** [kernel/fork.o] Error 2
The rt kernel cache aligns the RWLOCK in DEFINE_RWLOCK by default.
The non-rt kernels explicitly cache align only the tasklist_lock in
kernel/fork.c
That can create a build conflict. This fixes the build problem by making the
non-rt kernels cache align RWLOCKs by default. The side effect is that
the other RWLOCKs are also cache aligned for non-rt.
This is a short term solution for rt only.
The longer term solution would be to push the cache aligned DEFINE_RWLOCK
to mainline. If there are objections, then we could create a
DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature.
Comments? Objections?
Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
With the migration pushdonw a few of the do{ }while(0)
loops became obsolete but got left over - this patch
only removes this fallout.
Patch applies on top of 3.12.9-rt13
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
pushdown of migrate_disable/enable from read_*lock* to the rt_read_*lock*
api level
general mapping to mutexes:
read_*lock*
`-> rt_read_*lock*
`-> __spin_lock (the sleeping spin locks)
`-> rt_mutex
The real read_lock* mapping:
read_lock_irqsave -.
read_lock_irq `-> rt_read_lock_irqsave()
`->read_lock ---------. \
read_lock_bh ------+ \
`--> rt_read_lock()
if (rt_mutex_owner(lock) != current){
`-> __rt_spin_lock()
rt_spin_lock_fastlock()
`->rt_mutex_cmpxchg()
migrate_disable()
}
rwlock->read_depth++;
read_trylock mapping:
read_trylock
`-> rt_read_trylock
if (rt_mutex_owner(lock) != current){
`-> rt_mutex_trylock()
rt_mutex_fasttrylock()
rt_mutex_cmpxchg()
migrate_disable()
}
rwlock->read_depth++;
read_unlock* mapping:
read_unlock_bh --------+
read_unlock_irq -------+
read_unlock_irqrestore +
read_unlock -----------+
`-> rt_read_unlock()
if(--rwlock->read_depth==0){
`-> __rt_spin_unlock()
rt_spin_lock_fastunlock()
`-> rt_mutex_cmpxchg()
migrate_disable()
}
So calls to migrate_disable/enable() are better placed at the rt_read_*
level of lock/trylock/unlock as all of the read_*lock* API has this as a
common path. In the rt_read* API of lock/trylock/unlock the nesting level
is already being recorded in rwlock->read_depth, so we can push down the
migrate disable/enable to that level and condition it on the read_depth
going from 0 to 1 -> migrate_disable and 1 to 0 -> migrate_enable. This
eliminates the recursive calls that were needed when migrate_disable/enable
was done at the read_*lock* level.
The approach to read_*_bh also eliminates the concerns raised with the
regards to api inbalances (read_lock_bh -> read_unlock+local_bh_enable)
Tested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|