|
Setting up IRQ routes is nothing IOAPIC specific. Extract everything
that really is generic code into irqchip.c and only leave the ioapic
specific bits to irq_comm.c.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Conflicts:
virt/kvm/irq_comm.c
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
|
|
The current irq_comm.c file contains pieces of code that are generic
across different irqchip implementations, as well as code that is
fully IOAPIC specific.
Split the generic bits out into irqchip.c.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Conflicts:
virt/kvm/irq_comm.c
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
|
|
The prototype has been stale for a while; I can't spot any real function
definition behind it. Let's just remove it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Conflicts:
include/linux/kvm_host.h
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
|
|
We have a capability enquiry system that allows user space to ask kvm
whether a feature is available.
The point behind this system is that we can have different kernel
configurations with different capabilities and user space can adjust
accordingly.
Because features can always be nonexistent, we can drop any #ifdefs
on CAP defines that could be used generically, like the irq routing
bits. These can be easily reused for non-IOAPIC systems as well.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
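The enquiry flow can be modeled in a few lines of userspace C. This is only a sketch of the concept; the capability numbers and the `supported` table are illustrative, not kvm's real definitions (user space actually probes via the KVM_CHECK_EXTENSION ioctl):

```c
#include <stdbool.h>

/* Illustrative cap numbers; the real values live in the kvm UAPI headers. */
enum { CAP_IRQ_ROUTING = 25, CAP_IRQFD = 32, NCAPS = 64 };

/* The kernel's supported-feature set varies with its configuration. */
static const bool supported[NCAPS] = {
    [CAP_IRQ_ROUTING] = true,   /* irq routing compiled into this kernel */
    /* CAP_IRQFD left false: feature not built in */
};

/* What a capability check conceptually does: report support, never fail. */
bool check_extension(int cap)
{
    return cap >= 0 && cap < NCAPS && supported[cap];
}
```

Because the check is cheap and always answers, there is no need to hide the CAP define itself behind an #ifdef: an absent feature simply reports false.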
|
|
Quite a bit of code in KVM has been conditionalized on the availability of
IOAPIC emulation. However, most of it is generically applicable to
platforms that don't have an IOAPIC, but a different type of irq chip.
Make code that only relies on IRQ routing, not on an APIC itself,
conditional on CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The concept of routing interrupt lines to an irqchip is nothing
that is IOAPIC specific. Every irqchip has a maximum number of pins
that can be linked to irq lines.
So let's add a new define that allows us to reuse generic code for
non-IOAPIC platforms.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
|
|
There are important bug fixes in the upstream patches
that got applied. This revert commit removes the old patches to
prepare for the new set.
------------------------------------------------------------------
Revert "KVM: PPC: MPIC: Restrict to e500 platforms"
This reverts commit 540983d28a0dbe60b8bd08d96448b6544ad60293.
Revert "KVM: PPC: MPIC: Add support for KVM_IRQ_LINE"
This reverts commit ce5692ad437dfa0e7eb35918c40cdbd119b50909.
Revert "KVM: PPC: Support irq routing and irqfd for in-kernel MPIC"
This reverts commit 4ad8621c44d1420090241eaa8d5594e72ae6e05f.
Revert "KVM: Move irqfd resample cap handling to generic code"
This reverts commit e5557be2f787cff8f4daab4b39d38b5822f41390.
Revert "KVM: Move irq routing setup to irqchip.c"
This reverts commit 4486cf9a7a4d823c43b54e1b6280f41bcc59022d.
Revert "KVM: Extract generic irqchip logic into irqchip.c"
This reverts commit 0028971f3b4251cc231989dd9570f426d8472a2b.
Revert "KVM: Move irq routing to generic code"
This reverts commit e144029a9451b391afcbe123d1895523654e2bc5.
Revert "KVM: Remove kvm_get_intr_delivery_bitmask"
This reverts commit a7e10a68a247bbcde502947cf2fd4a7722c30512.
Revert "KVM: Drop __KVM_HAVE_IOAPIC condition on irq routing"
This reverts commit 38deef57a0eef339636c8fb8b8207a5a24e241ad.
Revert "KVM: Introduce CONFIG_HAVE_KVM_IRQ_ROUTING"
This reverts commit 3b196a30f460bf0f1bd6737266ed479a597a5b58.
Revert "KVM: Add KVM_IRQCHIP_NUM_PINS in addition to KVM_IOAPIC_NUM_PINS"
This reverts commit 40602a78a8a12d17ca895ea0a2721441c155801f.
Revert "kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC"
This reverts commit 36e75cd4b65376dbee78ad92b084f9428868fe29.
Revert "kvm/ppc/mpic: in-kernel MPIC emulation"
This reverts commit 13ded2807a22aceff940ca1e282897e36fb0ba47.
Revert "kvm/ppc/mpic: adapt to kernel style and environment"
This reverts commit ac811c14ba3229ecbd39f3dc0b7f4c43ded36308.
Revert "kvm/ppc/mpic: remove some obviously unneeded code"
This reverts commit a60b865ac8d5cf3bff1e4a7dd43b6d8aaed87391.
Revert "kvm/ppc/mpic: import hw/openpic.c from QEMU"
This reverts commit 256cdf3f6df0561883ce801fd29595ecd031209a.
Revert "kvm: add device control API"
This reverts commit 8c848b9ed8b15aaccfb54511b22b205afc14f2d6.
------------------------------------------------------------------
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
|
|
Incoming packets are randomly corrupted by h/w, resulting
in varying errors. This workaround makes FS the default mode
on all affected SoCs by
- Disabling HS chirp signalling
- Forcing the EPS field of all packets to FS
This erratum does not affect FS mode.
Forces all HS devices to connect in FS mode on all SoCs
affected by this erratum:
P3041 and P2041 rev 1.0 and 1.1
P5020 and P5010 rev 1.0 and 2.0
P5040 and P1010 rev 1.0
The workaround can be disabled by specifying "no_erratum_a005275"
in the hwconfig string (on the u-boot command line)
Signed-off-by: Ramneek Mehresh <ramneek.mehresh@freescale.com>
Change-Id: Ie7b75b033220e4be44b5c769d7c187928d84dd6d
Reviewed-on: http://git.am.freescale.net:8181/1435
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Tested-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
|
|
Conflicts:
include/linux/preempt.h
|
|
Updated the API for the Traffic Manager counter according to the
implementation from QMan.
Signed-off-by: Aurelian Zanoschi <Aurelian.Zanoschi@freescale.com>
Change-Id: I3e0985d4dc402ba59754cec762fdd3a9210e938a
Reviewed-on: http://git.am.freescale.net:8181/2242
Reviewed-by: Floarea Anca Jeanina-B12569 <anca.floarea@freescale.com>
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Tested-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
|
|
Added support for a non-consistent storage profile reassembly counter
for FMANv3-capable platforms. For non-FMANv3 platforms the driver will
accept the stat selection but will always return 0.
Signed-off-by: Aurelian Zanoschi <Aurelian.Zanoschi@freescale.com>
Change-Id: I27501de84499c1db5085510eb7c320709786617e
Reviewed-on: http://git.am.freescale.net:8181/2241
Reviewed-by: Floarea Anca Jeanina-B12569 <anca.floarea@freescale.com>
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Tested-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
|
|
The purpose of the DPA Stats module is to provide to the application
a unitary method for retrieving counters that are spread at different
hardware or software locations.
Signed-off-by: Anca Jeanina FLOAREA <anca.floarea@freescale.com>
Change-Id: I3b4d886ef5aab00f6de6a330e068b7401bc24b6c
Reviewed-on: http://git.am.freescale.net:8181/2237
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Tested-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
|
|
The DPA IPSec component exports a set of functions used to:
- initialize the DPA IPSec module internal data structures
- create and configure full inbound IPSec hardware accelerated paths
- create and configure full outbound IPSec hardware accelerated paths
- replace expired SAs (after rekeying) without packet loss
During the initialization phase the DPA IPSec implementation
performs a series of actions meant to remove the need to perform
memory allocations and hardware/software object initializations
during the runtime phases.
INBOUND PATH:
_____________ ______________ _______________________________ __________
|| SA lookup ||--> || Decryption || --> || Inbound Policy Verification || --> || App Rx ||
||___________|| ||____________|| ||_____________________________|| ||________||
The inbound processing of an encrypted packet begins by determining the SA
that will be used for decryption and authentication. In accordance with the
RFC, the packets are classified based on a 3-tuple that uniquely identifies
the SA. This 3-tuple is formed from:
- the destination IP address in the IP header of the encrypted packet
- the value of the IP protocol field in the IP header of the encrypted packet
- the value in the SPI field in the ESP header
A special case is the one where the encrypted packets are encapsulated in a
UDP header in order to support NAT traversal. In this case the
classification key should contain the following fields:
- the destination IP address in the IP header of the encrypted packet
- the IP protocol field in the IP header of the encrypted packet
- the SPI field in the ESP header
- the source UDP port in the UDP header
- the destination UDP port in the UDP header
This lookup is offloaded to FMAN by means of the classifier API and the FMAN API.
When an encrypted packet matches an offloaded key, it is directed by the
hardware into the decryption process by enqueuing the packet to a SEC frame
queue (FQ). A shared descriptor (representing the decryption SA) is set on
this FQ and the SEC will begin the decryption process and then place the
clear text packet on an FQ that is input for an offline port (OH). Processing
continues with inbound policy verification done on the OH using the FMAN hardware.
After this step the packet is enqueued to an FQ created by the application,
which benefits from IPSec security.
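The two classification keys described above can be sketched as plain C structs. This is a minimal model for illustration only; field widths are assumed from the standard IPv4/ESP/UDP layouts, and the real driver builds these keys through the classifier API rather than with such structs:

```c
#include <stdint.h>

/* Basic 3-tuple SA lookup key for plain ESP traffic. */
struct sa_key {
    uint32_t dst_ip;     /* destination IP of the encrypted packet */
    uint8_t  ip_proto;   /* IP protocol field (ESP)                */
    uint32_t spi;        /* SPI from the ESP header                */
};

/* Extended key for UDP-encapsulated ESP (NAT traversal). */
struct sa_key_natt {
    uint32_t dst_ip;
    uint8_t  ip_proto;   /* UDP in this case      */
    uint32_t spi;
    uint16_t udp_sport;  /* source UDP port       */
    uint16_t udp_dport;  /* destination UDP port  */
};

/* Helper assembling the 3-tuple key from packet fields. */
struct sa_key make_sa_key(uint32_t dst_ip, uint8_t ip_proto, uint32_t spi)
{
    struct sa_key k = { .dst_ip = dst_ip, .ip_proto = ip_proto, .spi = spi };
    return k;
}
```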
OUTBOUND PATH:
__________ _________________ ______________ __________________
|| App Tx || --> || Policy Lookup || --> || Encryption || --> || Error Checking ||
||________|| ||_______________|| ||____________|| ||________________||
The primary function of the policy lookup block is to classify frames and
determine the correct SA on which they should be processed.
The DPA IPSec can be configured to build a policy key using any subset of the following fields:
- masked source IP address
- masked destination IP address
- optionally masked IP protocol field value
- masked source port value / ICMP type field value
- masked destination port value / ICMP code field value
IP fragmentation can be configured per policy and is performed, if required,
on the packets before being sent to the Encryption Block. A fragmentation header
manipulation identifier has to be passed when offloading the policy. If a
clear text packet hits an offloaded policy the packet will be directed by FMAN
hardware into the proper FQ for SEC processing.
After the SEC has completed all the required operations, a new frame is created
containing the ESP encapsulated packet. This frame will be sent to
the next block for further processing, i.e. input to an offline port where error
checking is done prior to forwarding the packet to an application-chosen FQ based
on the SA that processed that packet.
Signed-off-by: Andrei Varvara <andrei.varvara@freescale.com>
Signed-off-by: Mihai Serb
Change-Id: Id8a4afa1cfda42dd2ba1408614a5900cb7b80cee
Reviewed-on: http://git.am.freescale.net:8181/2235
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Tested-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
|
|
The packet classification offloading driver implements its functionalities
using the NetCommSw FMan driver. It exposes an easy-to-use API which can be
called either from kernel space or from user space (through the
classification offloading driver wrapper).
It is able to create or import (from FMD resources) 3 types of tables -
exact match table, indexed table and HASH table. Imported tables can be
either empty or prefilled.
It offers an API to insert, remove, or modify entries. It allows the users
to classify and
- enqueue
- multicast
- discard
- re-classify or
- return-to-KeyGen
network packets.
It is able to create or import (from FMD resources) header manipulation
operations and attach them to table entries. It allows runtime modification
of existing (created or imported) header manipulation operations.
It offers an API to create or import (from FMD resources) multicast groups.
It allows the user to add or remove members to existing multicast groups.
Signed-off-by: Marian Chereji <marian.chereji@freescale.com>
Signed-off-by: Radu Bulie <radu.bulie@freescale.com>
Change-Id: I854c0c3c2eba6d6f441cb46e502e6dbc623c48d5
Reviewed-on: http://git.am.freescale.net:8181/2233
Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
Tested-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com>
|
|
Completions have no long lasting callbacks and therefore do not need
the complex waitqueue variant. Use simple waitqueues, which reduces the
contention on the waitqueue lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
If a PI-boosted task's policy/priority is modified by a setscheduler()
call, we unconditionally dequeue and requeue the task if it is on the
runqueue, even if the new priority is lower than the current effective
boosted priority. This can result in undesired reordering of the
priority bucket list.
If the new priority is less than or equal to the current effective one,
just store the new parameters in the task struct and leave the
scheduler class and the runqueue untouched. This is handled when the
task deboosts itself. Only if the new priority is higher than the
effective boosted priority do we apply the change immediately.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: stable-rt@vger.kernel.org
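The decision described above boils down to a comparison against the boost-adjusted priority. A minimal userspace model (names and the "larger value = higher priority" convention are illustrative, not the kernel's):

```c
/* Toy task: normal priority from policy/param, plus a PI boost. */
typedef struct {
    int normal_prio;   /* priority from setscheduler()          */
    int boost_prio;    /* PI-boosted priority, 0 if unboosted   */
    int requeued;      /* times we dequeued/requeued the task   */
} task_t;

static int effective_prio(const task_t *t)
{
    return t->boost_prio > t->normal_prio ? t->boost_prio : t->normal_prio;
}

/* Returns 1 if the change had to be applied (requeue) immediately. */
int set_prio(task_t *t, int new_prio)
{
    if (new_prio <= effective_prio(t)) {
        /* Just record it; the requeue happens when the task deboosts. */
        t->normal_prio = new_prio;
        return 0;
    }
    /* New priority exceeds the effective boosted one: apply now. */
    t->normal_prio = new_prio;
    t->requeued++;
    return 1;
}
```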
|
|
wait_queue is a swiss army knife and in most cases the
complexity is not needed. For RT, waitqueues are a constant source of
trouble as we can't convert the head lock to a raw spinlock due to
fancy and long lasting callbacks.
Provide a slim version, which allows RT to replace wait queues. This
should go mainline as well, as it lowers memory consumption and
runtime overhead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
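The "slim" shape can be sketched as follows: with no per-entry callbacks, a wakeup is just a list walk, which is why the head lock can stay a raw (non-sleeping) lock on RT. All names here are illustrative, not the kernel's swait API:

```c
#include <stddef.h>

/* A waiter is just an intrusive list node plus a wakeup flag —
 * no function pointer, no exclusive/callback machinery. */
struct swaiter { struct swaiter *next; int woken; };
struct swait_head { struct swaiter *list; };

void swait_prepare(struct swait_head *h, struct swaiter *w)
{
    w->woken = 0;
    w->next = h->list;      /* push onto the head's waiter list */
    h->list = w;
}

/* Wake every waiter: with no callbacks this is a bounded list walk. */
int swake_up_all(struct swait_head *h)
{
    int n = 0;
    for (struct swaiter *w = h->list; w; w = w->next) {
        w->woken = 1;
        n++;
    }
    h->list = NULL;
    return n;
}

/* Two waiters queue up; one wakeup reaches both. */
int swait_demo(void)
{
    struct swait_head h = { 0 };
    struct swaiter a, b;
    swait_prepare(&h, &a);
    swait_prepare(&h, &b);
    return swake_up_all(&h) == 2 && a.woken && b.woken;
}
```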
|
|
On RT, write_seqcount_begin() disables preemption, and device_rename()
allocates memory with GFP_KERNEL and later grabs the sysfs_mutex.
Since I don't see a reason why this can't be a mutex, make it one. We
probably don't have that many reads at the same time in the hot path.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
|
|
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the
existing preempt_count each task now sports a preempt_lazy_count
which is manipulated on lock acquisition and release. This is slightly
incorrect as for laziness reasons I coupled this to
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side, instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefore allows the waking task to exit the lock held region before
the woken task preempts it. That also works better for cross CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial tests do not expose any observable latency increase, but
history shows that I've been proven wrong before :)
The lazy preemption mode is on by default, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non RT workload
performance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
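The flag selection described above can be reduced to a small decision function. A minimal sketch (names and the two-flag encoding are illustrative):

```c
/* SCHED_OTHER preempting SCHED_OTHER gets the lazy flag, deferring
 * the reschedule until the waking task leaves its lock held region;
 * any preemption involving an RT class task gets the hard flag. */
enum { NEED_RESCHED = 1, NEED_RESCHED_LAZY = 2 };

int resched_flag(int curr_is_rt_class, int woken_is_rt_class)
{
    if (curr_is_rt_class || woken_is_rt_class)
        return NEED_RESCHED;        /* no change for RT class */
    return NEED_RESCHED_LAZY;       /* fairness-driven, can wait */
}
```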
|
|
The netfilter code relies only on the implicit semantics of
local_bh_disable() for serializing xt_write_recseq sections. RT breaks
that and needs explicit serialization here.
Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
Make SLUB RT aware and remove the restriction in Kconfig.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The 3.x RT series removed the split softirq implementation in favour
of pushing softirq processing into the context of the thread which
raised it. Though this prevents us from handling the various softirqs
at different priorities. Now instead of reintroducing the split
softirq threads we split the locks which serialize the softirq
processing.
If a softirq is raised in context of a thread, then the softirq is
noted on a per thread field, if the thread is in a bh disabled
region. If the softirq is raised from hard interrupt context, then the
bit is set in the flag field of ksoftirqd and ksoftirqd is invoked.
When a thread leaves a bh disabled region, then it tries to execute
the softirqs which have been raised in its own context. It acquires
the per softirq / per cpu lock for the softirq and then checks,
whether the softirq is still pending in the per cpu
local_softirq_pending() field. If yes, it runs the softirq. If no,
then some other task executed it already. This allows for zero config
softirq elevation in the context of user space tasks or interrupt
threads.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
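The bh-enable path described above can be modeled with a pending bitmask: for each softirq raised in this thread's context, take that softirq's lock and run it only if it is still pending. A userspace sketch (locking elided; all names illustrative):

```c
enum { NR_SOFTIRQS = 10 };

struct softirq_state {
    unsigned pending;        /* per cpu local_softirq_pending() bits */
    int ran[NR_SOFTIRQS];    /* how often each vector executed       */
};

/* Executed when a thread leaves a bh disabled region: try to run
 * the softirqs that were raised in its own context. */
void local_bh_enable_model(struct softirq_state *s, unsigned raised_here)
{
    for (int nr = 0; nr < NR_SOFTIRQS; nr++) {
        if (!(raised_here & (1u << nr)))
            continue;
        /* the per softirq / per cpu lock would be taken here */
        if (s->pending & (1u << nr)) {   /* still pending?    */
            s->pending &= ~(1u << nr);
            s->ran[nr]++;                /* run the handler   */
        }
        /* else: some other task executed it already */
    }
}

/* Vectors 0 and 2 pending; we raised 0, 1 and 2 — vector 1 was
 * already handled elsewhere, so only 0 and 2 run here. */
int softirq_demo(void)
{
    struct softirq_state s = { .pending = 0x5 };
    local_bh_enable_model(&s, 0x7);
    return s.pending == 0 && s.ran[0] == 1 && s.ran[1] == 0 && s.ran[2] == 1;
}
```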
|
|
Avoid the percpu softirq_runner pointer magic by using a task flag.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
might_sleep() can tell us where interrupts have been disabled, but we
have no idea what disabled preemption. Add some debug infrastructure.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The plain spinlock while sufficient does not update the local_lock
internals. Use a proper local_lock function instead to ease debugging.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
Bringing a CPU down is a pain with the PREEMPT_RT kernel because
tasks can be preempted in many more places than in non-RT. In
order to handle per_cpu variables, tasks may be pinned to a CPU
for a while, and even sleep. But these tasks need to be off the CPU
if that CPU is going down.
Several synchronization methods have been tried, but when stressed
they failed. This is a new approach.
A sync_tsk thread is still created and tasks may still block on a
lock when the CPU is going down, but how that works is a bit different.
When cpu_down() starts, it will create the sync_tsk and wait for it
to report that the tasks currently pinned on the CPU are no longer
pinned. But new tasks that are about to be pinned will still be allowed
to do so at this time.
Then the notifiers are called. Several notifiers will bring down tasks
that will enter these locations. Some of these tasks will take locks
of other tasks that are on the CPU. If we don't let those other tasks
continue, but make them block until CPU down is done, the tasks that
the notifiers are waiting on will never complete as they are waiting
for the locks held by the tasks that are blocked.
Thus we still let the task pin the CPU until the notifiers are done.
After the notifiers run, we then make new tasks entering the pinned
CPU sections grab a mutex and wait. This mutex is now a per CPU mutex
in the hotplug_pcp descriptor.
To help things along, a new function in the scheduler code is created
called migrate_me(). This function will try to migrate the current task
off the CPU this is going down if possible. When the sync_tsk is created,
all tasks will then try to migrate off the CPU going down. There are
several cases where this won't work, but it helps in most cases.
After the notifiers are called and if a task can't migrate off but enters
the pin CPU sections, it will be forced to wait on the hotplug_pcp mutex
until the CPU down is complete. Then the scheduler will force the migration
anyway.
Also, I found that THREAD_BOUND tasks also need to be accounted for in the
pinned CPU count, and migrate_disable no longer treats them specially.
This helps fix issues with ksoftirqd and workqueues that unbind on CPU down.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
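The two-phase admission policy described above can be modeled compactly: before the notifiers run, new pinners are still admitted; afterwards they must park on the per-cpu hotplug mutex until the CPU is gone. A sketch with illustrative names (the real hotplug_pcp carries more state):

```c
/* Per-cpu hotplug state, heavily simplified. */
struct hotplug_pcp_model {
    int notifiers_done;  /* CPU-down notifiers have completed   */
    int refcount;        /* tasks currently pinned to this CPU  */
    int blocked;         /* tasks parked on the hotplug mutex   */
};

/* Returns 1 if the task got pinned, 0 if it was made to wait. */
int pin_current_cpu_model(struct hotplug_pcp_model *hp)
{
    if (hp->notifiers_done) {
        /* mutex_lock(&hp->mutex): wait until CPU down completes */
        hp->blocked++;
        return 0;
    }
    /* Notifier phase: still admit pinners so their lock holders
     * can make progress and the notifiers don't deadlock. */
    hp->refcount++;
    return 1;
}
```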
|
|
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
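The split behaviour can be sketched as follows; the `IS_RT` switch and the 1 ms sleep standing in for "a tick" are assumptions of this model, not the kernel's implementation:

```c
#include <time.h>

#define IS_RT 1   /* illustrative stand-in for CONFIG_PREEMPT_RT_FULL */

/* On RT, put the looping task to sleep for roughly a tick so the
 * preempted modifying side can make progress; elsewhere it is just
 * the cpu_relax() busy-wait hint. */
void cpu_chill(void)
{
#if IS_RT
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };
    nanosleep(&ts, NULL);
#else
    /* cpu_relax(): spin-wait hint, no sleep */
#endif
}
```

A retry loop would then read `while (!done) cpu_chill();` instead of spinning on cpu_relax() forever.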
|
|
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the raise_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
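The fix amounts to adding a preemption check right after interrupts are reenabled. A userspace model of the control flow (all names illustrative):

```c
/* Minimal model: waking the softirq daemon sets need_resched, and
 * without a check after local_irq_enable() the reschedule would be
 * delayed until some unrelated preemption point. */
struct cpu_state_model {
    int irqs_disabled;
    int need_resched;    /* set by the wakeup of ksoftirqd */
    int scheduled;       /* times we actually rescheduled  */
};

void raise_softirq_irqoff_model(struct cpu_state_model *c)
{
    c->irqs_disabled = 1;    /* caller runs with irqs off      */
    c->need_resched = 1;     /* wakeup_softirqd() marks resched */
    c->irqs_disabled = 0;    /* local_irq_enable()             */
    if (c->need_resched) {   /* the added preemption check     */
        c->scheduled++;
        c->need_resched = 0;
    }
}

int raise_softirq_demo(void)
{
    struct cpu_state_model c = { 0 };
    raise_softirq_irqoff_model(&c);
    return c.scheduled == 1 && !c.need_resched;
}
```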
|
|
If a low prio writer gets preempted while holding the seqlock write
locked, a high prio reader spins forever on RT.
To prevent this let the reader grab the spinlock, so it blocks and
eventually boosts the writer. This way the writer can proceed and
endless spinning is prevented.
For seqcount writers we disable preemption over the update code
path. Thanks to Al Viro for disentangling some VFS code to make that
possible.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
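The reader-side change can be modeled in a single function: instead of spinning on an odd sequence count forever, grab the writer's lock, which blocks and PI-boosts the writer, and retry once the write section is done. In this single-threaded sketch "acquiring the lock" is modeled as the writer completing; names are illustrative:

```c
struct seqm { unsigned seq; };   /* odd = write in progress */

unsigned read_seqbegin_model(struct seqm *s)
{
    unsigned ret = s->seq;
    if (ret & 1) {
        /* spin_lock(&s->lock) would block here, boosting the
         * preempted writer; once we own the lock, the write
         * section is guaranteed to have completed: */
        s->seq++;                /* model the writer finishing */
        /* spin_unlock(&s->lock) */
        ret = s->seq;
    }
    return ret;                  /* always even: safe to read  */
}
```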
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In fact, with migrate_disable() existing, one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use of course); this should be especially
easy now that we have a kmap_atomic stack.
Something like the below.. it wants replacing all the preempt_disable()
stuff with pagefault_disable() && migrate_disable() of course, but then
you can flip kmaps around like below.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
and the pte content right away in the task struct.
Shortens the context switch code. ]
|
|
On 07/27/2011 04:37 PM, Thomas Gleixner wrote:
> - KGDB (not yet disabled) is reportedly unusable on -rt right now due
> to missing hacks in the console locking which I dropped on purpose.
>
To work around this in the short term you can use this patch, in
addition to the clocksource watchdog patch that Thomas brewed up.
Comments are welcome of course. Ultimately the right solution is to
change the separation between the console and the HW to have a polled mode
+ work queue so as not to introduce any kind of latency.
Thanks,
Jason.
|
|
There are (probably rare) situations when a system has crashed and the system
console becomes unresponsive but the network icmp layer is still alive.
Wouldn't it be wonderful if we could then submit a sysrq command via ping?
This patch provides this facility. Please consult the updated documentation
Documentation/sysrq.txt for details.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable
to network-based denial-of-service attacks. This patch therefore
makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq()
is running in ksoftirqd context. A wrapper layer is interposed so that
other calls to __do_softirq() avoid invoking rcu_bh_qs(). The underlying
function __do_softirq_common() does the actual work.
The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is
that there might be a local_bh_enable() inside an RCU-preempt read-side
critical section. This local_bh_enable() can invoke __do_softirq()
directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just
calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be
an illegal RCU-preempt quiescent state in the middle of an RCU-preempt
read-side critical section. Therefore, quiescent states can only happen
in cases where __do_softirq() is invoked directly from ksoftirqd.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
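The wrapper layering described above can be sketched as two entry points over a common worker, with only the ksoftirqd path reporting the quiescent state. Names are illustrative, not the kernel's:

```c
struct rcu_model { int bh_qs_reports; };

/* The common worker: reports an RCU-bh quiescent state only when
 * entered from ksoftirqd. A direct call (e.g. from a nested
 * local_bh_enable() inside an RCU-preempt read section) must not,
 * or it would declare an illegal quiescent state mid-section. */
static void do_softirq_common(struct rcu_model *r, int from_ksoftirqd)
{
    if (from_ksoftirqd)
        r->bh_qs_reports++;   /* rcu_bh_qs(): safe only here */
    /* ... actual softirq processing ... */
}

void run_ksoftirqd_model(struct rcu_model *r)   { do_softirq_common(r, 1); }
void do_softirq_direct_model(struct rcu_model *r) { do_softirq_common(r, 0); }

int rcu_bh_demo(void)
{
    struct rcu_model r = { 0 };
    run_ksoftirqd_model(&r);      /* reports a quiescent state  */
    do_softirq_direct_model(&r);  /* must not report one        */
    return r.bh_qs_reports == 1;
}
```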
|
|
The Linux kernel has long RCU-bh read-side critical sections that
intolerably increase scheduling latency under mainline's RCU-bh rules,
which include RCU-bh read-side critical sections being non-preemptible.
This patch therefore arranges for RCU-bh to be implemented in terms of
RCU-preempt for CONFIG_PREEMPT_RT_FULL=y.
This has the downside of defeating the purpose of RCU-bh, namely,
handling the case where the system is subjected to a network-based
denial-of-service attack that keeps at least one CPU doing full-time
softirq processing. This issue will be fixed by a later commit.
The current commit will need some work to make it appropriate for
mainline use, for example, it needs to be extended to cover Tiny RCU.
[ paulmck: Added a useful changelog ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This fixes the following build error for the preempt-rt kernel.
make kernel/fork.o
CC kernel/fork.o
kernel/fork.c:90: error: section of 'tasklist_lock' conflicts with previous declaration
make[2]: *** [kernel/fork.o] Error 1
make[1]: *** [kernel/fork.o] Error 2
The rt kernel cache aligns the RWLOCK in DEFINE_RWLOCK by default.
The non-rt kernels explicitly cache align only the tasklist_lock in
kernel/fork.c
That can create a build conflict. This fixes the build problem by making the
non-rt kernels cache align RWLOCKs by default. The side effect is that
the other RWLOCKs are also cache aligned for non-rt.
This is a short term solution for rt only.
The longer term solution would be to push the cache aligned DEFINE_RWLOCK
to mainline. If there are objections, then we could create a
DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature.
Comments? Objections?
Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
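The fix amounts to making DEFINE_RWLOCK emit a cacheline-aligned object unconditionally, so the rt and non-rt definitions of tasklist_lock agree on section and alignment. A self-contained C11 sketch; the 64-byte cacheline and the lock layout are assumptions of this model:

```c
#include <stdalign.h>

/* Toy rwlock whose first member forces cacheline alignment of the
 * whole object, mirroring what the cache-aligned DEFINE_RWLOCK does. */
typedef struct { alignas(64) int raw_lock; } model_rwlock_t;

#define DEFINE_RWLOCK(name) model_rwlock_t name = { 0 }

/* With alignment baked into the type, this definition no longer
 * conflicts with a separately annotated one in kernel/fork.c. */
DEFINE_RWLOCK(tasklist_lock_model);
```

The side effect noted above falls out directly: every lock defined through the macro, not just tasklist_lock, now occupies its own cacheline.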
|
|
Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex
based locking functions for preempt-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|