Age | Commit message (Collapse) | Author |
|
When the kernel is running at EL2, it doesn't need init_hyp_mode() to
configure page tables for HYP. This function also registers the CPU
hotplug and lower power notifiers that cause HYP to be re-initialised
after the CPU has been reset.
To avoid losing the register state that controls stage2 translation, move
the registering of these notifiers into init_subsystems(), and add a
is_kernel_in_hyp_mode() path to each callback.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 1e947bad0b6 ("arm64: KVM: Skip HYP setup when already running in HYP")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
Preemption must be disabled when calling smp_call_function_many
Reported-by: bartosz.wawrzyniak@tieto.com
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Here are the main arm64 updates for 4.6. There are some relatively
intrusive changes to support KASLR, the reworking of the kernel
virtual memory layout and initial page table creation.
Summary:
- Initial page table creation reworked to avoid breaking large block
mappings (huge pages) into smaller ones. The ARM architecture
requires break-before-make in such cases to avoid TLB conflicts but
that's not always possible on live page tables
- Kernel virtual memory layout: the kernel image is no longer linked
to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom
of the vmalloc space, allowing the kernel to be loaded (nearly)
anywhere in physical RAM
- Kernel ASLR: position independent kernel Image and modules being
randomly mapped in the vmalloc space with the randomness is
provided by UEFI (efi_get_random_bytes() patches merged via the
arm64 tree, acked by Matt Fleming)
- Implement relative exception tables for arm64, required by KASLR
(initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c
but actual x86 conversion to deferred to 4.7 because of the merge
dependencies)
- Support for the User Access Override feature of ARMv8.2: this
allows uaccess functions (get_user etc.) to be implemented using
LDTR/STTR instructions. Such instructions, when run by the kernel,
perform unprivileged accesses adding an extra level of protection.
The set_fs() macro is used to "upgrade" such instruction to
privileged accesses via the UAO bit
- Half-precision floating point support (part of ARMv8.2)
- Optimisations for CPUs with or without a hardware prefetcher (using
run-time code patching)
- copy_page performance improvement to deal with 128 bytes at a time
- Sanity checks on the CPU capabilities (via CPUID) to prevent
incompatible secondary CPUs from being brought up (e.g. weird
big.LITTLE configurations)
- valid_user_regs() reworked for better sanity check of the
sigcontext information (restored pstate information)
- ACPI parking protocol implementation
- CONFIG_DEBUG_RODATA enabled by default
- VDSO code marked as read-only
- DEBUG_PAGEALLOC support
- ARCH_HAS_UBSAN_SANITIZE_ALL enabled
- Erratum workaround Cavium ThunderX SoC
- set_pte_at() fix for PROT_NONE mappings
- Code clean-ups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits)
arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
arm64: kasan: Use actual memory node when populating the kernel image shadow
arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
arm64: Fix misspellings in comments.
arm64: efi: add missing frame pointer assignment
arm64: make mrs_s prefixing implicit in read_cpuid
arm64: enable CONFIG_DEBUG_RODATA by default
arm64: Rework valid_user_regs
arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
arm64: KVM: Move kvm_call_hyp back to its original localtion
arm64: mm: treat memstart_addr as a signed quantity
arm64: mm: list kernel sections in order
arm64: lse: deal with clobbered IP registers after branch via PLT
arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR
arm64: kconfig: add submenu for 8.2 architectural features
arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot
arm64: Add support for Half precision floating point
arm64: Remove fixmap include fragility
arm64: Add workaround for Cavium erratum 27456
arm64: mm: Mark .rodata as RO
...
|
|
Pull KVM updates from Paolo Bonzini:
"One of the largest releases for KVM... Hardly any generic
changes, but lots of architecture-specific updates.
ARM:
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- various optimizations to the vgic save/restore code.
PPC:
- enabled KVM-VFIO integration ("VFIO device")
- optimizations to speed up IPIs between vcpus
- in-kernel handling of IOMMU hypercalls
- support for dynamic DMA windows (DDW).
s390:
- provide the floating point registers via sync regs;
- separated instruction vs. data accesses
- dirty log improvements for huge guests
- bugfixes and documentation improvements.
x86:
- Hyper-V VMBus hypercall userspace exit
- alternative implementation of lowest-priority interrupts using
vector hashing (for better VT-d posted interrupt support)
- fixed guest debugging with nested virtualizations
- improved interrupt tracking in the in-kernel IOAPIC
- generic infrastructure for tracking writes to guest
memory - currently its only use is to speedup the legacy shadow
paging (pre-EPT) case, but in the future it will be used for
virtual GPUs as well
- much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
KVM: x86: disable MPX if host did not enable MPX XSAVE features
arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
arm64: KVM: vgic-v3: Reset LRs at boot time
arm64: KVM: vgic-v3: Do not save an LR known to be empty
arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
arm64: KVM: vgic-v3: Avoid accessing ICH registers
KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
KVM: arm/arm64: vgic-v2: Reset LRs at boot time
KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
KVM: s390: allocate only one DMA page per VM
KVM: s390: enable STFLE interpretation only if enabled for the guest
KVM: s390: wake up when the VCPU cpu timer expires
KVM: s390: step the VCPU timer while in enabled wait
KVM: s390: protect VCPU cpu timer with a seqcount
KVM: s390: step VCPU cpu timer during kvm_run ioctl
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- Make schedstats a runtime tunable (disabled by default) and
optimize it via static keys.
As most distributions enable CONFIG_SCHEDSTATS=y due to its
instrumentation value, this is a nice performance enhancement.
(Mel Gorman)
- Implement 'simple waitqueues' (swait): these are just pure
waitqueues without any of the more complex features of full-blown
waitqueues (callbacks, wake flags, wake keys, etc.). Simple
waitqueues have less memory overhead and are faster.
Use simple waitqueues in the RCU code (in 4 different places) and
for handling KVM vCPU wakeups.
(Peter Zijlstra, Daniel Wagner, Thomas Gleixner, Paul Gortmaker,
Marcelo Tosatti)
- sched/numa enhancements (Rik van Riel)
- NOHZ performance enhancements (Rik van Riel)
- Various sched/deadline enhancements (Steven Rostedt)
- Various fixes (Peter Zijlstra)
- ... and a number of other fixes, cleanups and smaller enhancements"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
sched/cputime: Fix steal_account_process_tick() to always return jiffies
sched/deadline: Remove dl_new from struct sched_dl_entity
Revert "kbuild: Add option to turn incompatible pointer check into error"
sched/deadline: Remove superfluous call to switched_to_dl()
sched/debug: Fix preempt_disable_ip recording for preempt_disable()
sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
time, acct: Drop irq save & restore from __acct_update_integrals()
acct, time: Change indentation in __acct_update_integrals()
sched, time: Remove non-power-of-two divides from __acct_update_integrals()
sched/rt: Kick RT bandwidth timer immediately on start up
sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug
sched/debug: Move sched_domain_sysctl to debug.c
sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c
sched/rt: Fix PI handling vs. sched_setscheduler()
sched/core: Remove duplicated sched_group_set_shares() prototype
sched/fair: Consolidate nohz CPU load update code
sched/fair: Avoid using decay_load_missed() with a negative value
sched/deadline: Always calculate end of period on sched_yield()
sched/cgroup: Fix cgroup entity load tracking tear-down
rcu: Use simple wait queues where possible in rcutree
...
|
|
Programming the active state in the (re)distributor can be an
expensive operation so it makes some sense to try and reduce
the number of accesses as much as possible. So far, we
program the active state on each VM entry, but there is some
opportunity to do less.
An obvious solution is to cache the active state in memory,
and only program it in the HW when conditions change. But
because the HW can also change things under our feet (the active
state can transition from 1 to 0 when the guest does an EOI),
some precautions have to be taken, which amount to only caching
an "inactive" state, and always programing it otherwise.
With this in place, we observe a reduction of around 700 cycles
on a 2GHz GICv2 platform for a NULL hypercall.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Doing a linear search is a bit silly when we can do a binary search.
Not that we trap that so many things that it has become a burden yet,
but it makes sense to align it with the arm64 code.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
As we're going to play some tricks on the struct coproc_reg,
make sure its 64bit indicator field matches that of coproc_params.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Since we're obviously terrible at sorting the CP tables, make sure
we're going to do it properly (or fail to boot). arm64 has had the
same mechanism for a while, and nobody ever broke it...
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Not having the invariant table properly sorted is an oddity, and
may get in the way of future optimisations. Let's fix it.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
To configure the virtual PMUv3 overflow interrupt number, we use the
vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_PMU_V3_IRQ
attribute within the KVM_ARM_VCPU_PMU_V3_CTRL group.
After configuring the PMUv3, call the vcpu ioctl with attribute
KVM_ARM_VCPU_PMU_V3_INIT to initialize the PMUv3.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
In some cases it needs to get/set attributes specific to a vcpu and so
needs something else than ONE_REG.
Let's copy the KVM_DEVICE approach, and define the respective ioctls
for the vcpu file descriptor.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
When KVM frees VCPU, it needs to free the perf_event of PMU.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
When calling perf_event_create_kernel_counter to create perf_event,
assign a overflow handler. Then when the perf event overflows, set the
corresponding bit of guest PMOVSSET register. If this counter is enabled
and its interrupt is enabled as well, kick the vcpu to sync the
interrupt.
On VM entry, if there is counter overflowed and interrupt level is
changed, inject the interrupt with corresponding level. On VM exit, sync
the interrupt level as well if it has been changed.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Using the common HYP timer code is a bit more tricky, since we
use system register names. Nothing a set of macros cannot
work around...
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
No need to keep our own private version, the common one is
strictly identical.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
In order to be able to use the code located in virt/kvm/arm/hyp,
we need to make the global hyp.h file accessible from include/asm,
similar to what we did for arm64.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
With the kernel running at EL2, there is no point trying to
configure page tables for HYP, as the kernel is already mapped.
Take this opportunity to refactor the whole init a bit, allowing
the various parts of the hypervisor bringup to be split across
multiple functions.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
So far, our handling of cache maintenance by VA has been pretty
simple: Either the access is in the guest RAM and generates a S2
fault, which results in the page being mapped RW, or we go down
the io_mem_abort() path, and nuke the guest.
The first one is fine, but the second one is extremely weird.
Treating the CM as an I/O is wrong, and nothing in the ARM ARM
indicates that we should generate a fault for something that
cannot end-up in the cache anyway (even if the guest maps it,
it will keep on faulting at stage-2 for emulation).
So let's just skip this instruction, and let the guest get away
with it.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
These are now handled as a panic, so there is little point in
keeping them around.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
This field was never populated, and the panic code already
does something similar. Delete the related code.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Just like on arm64, having the CP15 registers expressed as a set
of #defines has been very conflict-prone. Let's turn it into an
enum, which should make it more manageable.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Now that the old code is long gone, we can remove all the weak
attributes, as there is only one version of the code.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
As we now have hooks to setup VTCR from C code, let's drop the
original VTCR setup and reimplement it as part of the HYP code.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
As we now have a full reimplementation of the world switch, it is
time to kiss the old stuff goodbye. I'm not sure we'll miss it.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Having u64 as the kvm_call_hyp return type is problematic, as
it forces all kind of tricks for the return values from HYP
to be promoted to 64bit (LE has the LSB in r0, and BE has them
in r1).
Since the only user of the return value is perfectly happy with
a 32bit value, let's make kvm_call_hyp return an unsigned long,
which is 32bit on ARM.
This solves yet another headache.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Instead of spinning forever, let's "properly" handle any unexpected
exception ("properly" meaning "print a spat on the console and die").
This has proved useful quite a few times...
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
This part is almost entierely borrowed from the existing code, just
slightly simplifying the HYP function call (as we now save SPSR_hyp
in the world switch).
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
On guest exit, we must take care of populating our fault data
structure so that the host code can handle it. This includes
resolving the IPA for permission faults, which can result in
restarting the guest.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
The new world switch implementation is modeled after the arm64 one,
calling the various save/restore functions in turn, and having as
little state as possible.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Similar to the arm64 version, add the code that deals with VFP traps,
re-enabling VFP, save/restoring the registers and resuming the guest.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Add the very minimal piece of code that is now required to jump
into the guest (and return from it). This code is only concerned
with save/restoring the USR registers (r0-r12+lr for the guest,
r4-r12+lr for the host), as everything else is dealt with in C
(VFP is another matter though).
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Banked registers are one of the many perks of the 32bit architecture,
and the world switch needs to cope with it.
This requires some "special" accessors, as these are not accessed
using a standard coprocessor instruction.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
This is almost a copy/paste of the existing version, with a couple
of subtle differences:
- Only write to FPEXC once on the save path
- Add an isb when enabling VFP access
The patch also defines a few sysreg accessors and a __vfp_enabled
predicate that test the VFP trapping state.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
This patch shouldn't exist, as we should be able to reuse the
arm64 version for free. I'll get there eventually, but in the
meantime I need an interrupt controller.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
This patch shouldn't exist, as we should be able to reuse the
arm64 version for free. I'll get there eventually, but in the
meantime I need a timer ticking.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Concert the CP15 save/restore code to C.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Convert the TLB invalidation code to C, hooking it into the
build system whilst we're at it.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
In order to move system register (CP15, mostly) access to C code,
add a few macros to facilitate this, and minimize the difference
between 32 and 64bit CP15 registers.
This will get heavily used in the following patches.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
In order to expose the various HYP services that are private to
the hypervisor, add a new hyp.h file.
So far, it only contains mundane things such as section annotation
and VA manipulation.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Continuing our rework of the CPU context, we now move the GP
registers into the CPU context structure.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Continuing our rework of the CPU context, we now move the CP15
array into the CPU context structure. As this causes quite a bit
of churn, we introduce the vcpu_cp15() macro that abstract the
location of the actual array. This will probably help next time
we have to revisit that code.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
In order to turn the WS code into something that looks a bit
more like the arm64 version, move the VFP registers into a
CPU context container for both the host and the guest.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Now that we've unified the way we refer to the HYP text between
arm and arm64, drop __kvm_hyp_code_start/end, and just use the
__hyp_text_start/end symbols.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
In order to be able to spread the HYP code into multiple compilation
units, adopt a layout similar to that of arm64:
- the HYP text is emited in its own section (.hyp.text)
- two linker generated symbols are use to identify the boundaries
of that section
No functionnal change.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
As we're about to move the stage2 init to C code, introduce some
C hooks that will later be populated with arch-specific implementations.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
Calling return copy_to_user(...) in an ioctl will not
do the right thing if there's a pagefault:
copy_to_user returns the number of bytes not copied
in this case.
Fix up kvm to do
return copy_to_user(...)) ? -EFAULT : 0;
everywhere.
Cc: stable@vger.kernel.org
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The problem:
On -rt, an emulated LAPIC timer instances has the following path:
1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled
This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.
The solution:
Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.
Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.
cyclictest command line:
This patch reduces the average latency in my tests from 14us to 11us.
Daniel writes:
Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
benchmark on mainline. The test was run 1000 times on
tip/sched/core 4.4.0-rc8-01134-g0905f04:
./x86-run x86/tscdeadline_latency.flat -cpu host
with idle=poll.
The test seems not to deliver really stable numbers though most of
them are smaller. Paolo write:
"Anything above ~10000 cycles means that the host went to C1 or
lower---the number means more or less nothing in that case.
The mean shows an improvement indeed."
Before:
min max mean std
count 1000.000000 1000.000000 1000.000000 1000.000000
mean 5162.596000 2019270.084000 5824.491541 20681.645558
std 75.431231 622607.723969 89.575700 6492.272062
min 4466.000000 23928.000000 5537.926500 585.864966
25% 5163.000000 1613252.750000 5790.132275 16683.745433
50% 5175.000000 2281919.000000 5834.654000 23151.990026
75% 5190.000000 2382865.750000 5861.412950 24148.206168
max 5228.000000 4175158.000000 6254.827300 46481.048691
After
min max mean std
count 1000.000000 1000.00000 1000.000000 1000.000000
mean 5143.511000 2076886.10300 5813.312474 21207.357565
std 77.668322 610413.09583 86.541500 6331.915127
min 4427.000000 25103.00000 5529.756600 559.187707
25% 5148.000000 1691272.75000 5784.889825 17473.518244
50% 5160.000000 2308328.50000 5832.025000 23464.837068
75% 5172.000000 2393037.75000 5853.177675 24223.969976
max 5222.000000 3922458.00000 6186.720500 42520.379830
[Patch was originaly based on the swait implementation found in the -rt
tree. Daniel ported it to mainline's version and gathered the
benchmark numbers for tscdeadline_latency test.]
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
On an MMIO access, we always copy the on-stack buffer info
the shared "run" structure, even if this is a read access.
This ends up leaking up to 8 bytes of uninitialized memory
into userspace, depending on the size of the access.
An obvious fix for this one is to only perform the copy if
this is an actual write.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|