Age | Commit message (Collapse) | Author |
|
It turns out that if we exit the guest due to a hcall instruction (sc 1),
and the loading of the instruction in the guest exit path fails for any
reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the
instruction after the hcall instruction rather than the hcall itself.
This in turn means that the instruction doesn't get recognized as an
hcall in kvmppc_handle_exit_pr() but gets passed to the guest kernel
as a sc instruction. That usually results in the guest kernel getting
a return code of 38 (ENOSYS) from an hcall, which often triggers a
BUG_ON() or other failure.
This fixes the problem by adding a new variant of kvmppc_get_last_inst()
called kvmppc_get_last_sc(), which fetches the instruction if necessary
from pc - 4 rather than pc.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Currently this is only being done on 64-bit. Rather than just move it
out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is
consistent with lazy ee state, and so that we don't track more host
code as interrupts-enabled than necessary.
Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect
that this function now has a role on 32-bit as well.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
This corrects the usage of the tlbie (TLB invalidate entry) instruction
in HV KVM. The tlbie instruction changed between PPC970 and POWER7.
On the PPC970, the bit to select large vs. small page is in the instruction,
not in the RB register value. This changes the code to use the correct
form on PPC970.
On POWER7 we were calculating the AVAL (Abbreviated Virtual Address, Lower)
field of the RB value incorrectly for 64k pages. This fixes it.
Since we now have several cases to handle for the tlbie instruction, this
factors out the code to do a sequence of tlbies into a new function,
do_tlbies(), and calls that from the various places where the code was
doing tlbie instructions inline. It also makes kvmppc_h_bulk_remove()
use the same global_invalidates() function for determining whether to do
local or global TLB invalidations as is used in other places, for
consistency, and also to make sure that kvm->arch.need_tlb_flush gets
updated properly.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Older version of power architecture use Real Mode Offset register and Real Mode Limit
Selector for mapping guest Real Mode Area. The guest RMA should be physically
contigous since we use the range when address translation is not enabled.
This patch switch RMA allocation code to use contigous memory allocator. The patch
also remove the the linear allocator which not used any more
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Powerpc architecture uses a hash based page table mechanism for mapping virtual
addresses to physical address. The architecture require this hash page table to
be physically contiguous. With KVM on Powerpc currently we use early reservation
mechanism for allocating guest hash page table. This implies that we need to
reserve a big memory region to ensure we can create large number of guest
simultaneously with KVM on Power. Another disadvantage is that the reserved memory
is not available to rest of the subsystems and and that implies we limit the total
available memory in the host.
This patch series switch the guest hash page table allocation to use
contiguous memory allocator.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Pull KVM fixes from Paolo Bonzini:
"On the x86 side, there are some optimizations and documentation
updates. The big ARM/KVM change for 3.11, support for AArch64, will
come through Catalin Marinas's tree. s390 and PPC have misc cleanups
and bugfixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits)
KVM: PPC: Ignore PIR writes
KVM: PPC: Book3S PR: Invalidate SLB entries properly
KVM: PPC: Book3S PR: Allow guest to use 1TB segments
KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
KVM: PPC: Book3S PR: Fix proto-VSID calculations
KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
KVM: Fix RTC interrupt coalescing tracking
kvm: Add a tracepoint write_tsc_offset
KVM: MMU: Inform users of mmio generation wraparound
KVM: MMU: document fast invalidate all mmio sptes
KVM: MMU: document fast invalidate all pages
KVM: MMU: document fast page fault
KVM: MMU: document mmio page fault
KVM: MMU: document write_flooding_count
KVM: MMU: document clear_spte_count
KVM: MMU: drop kvm_mmu_zap_mmio_sptes
KVM: MMU: init kvm generation close to mmio wrap-around value
KVM: MMU: add tracepoint for check_mmio_spte
KVM: MMU: fast invalidate all mmio sptes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull voluntary preemption fixes from Ingo Molnar:
"This tree contains a speedup which is achieved through better
might_sleep()/might_fault() preemption point annotations for uaccess
functions, by Michael S Tsirkin:
1. The only reason uaccess routines might sleep is if they fault.
Make this explicit for all architectures.
2. A voluntary preemption point in uaccess functions means compiler
can't inline them efficiently, this breaks assumptions that they
are very fast and small that e.g. net code seems to make. Remove
this preemption point so behaviour matches with what callers
assume.
3. Accesses (e.g through socket ops) to kernel memory with KERNEL_DS
like net/sunrpc does will never sleep. Remove an unconditinal
might_sleep() in the might_fault() inline in kernel.h (used when
PROVE_LOCKING is not set).
4. Accesses with pagefault_disable() return EFAULT but won't cause
caller to sleep. Check for that and thus avoid might_sleep() when
PROVE_LOCKING is set.
These changes offer a nice speedup for CONFIG_PREEMPT_VOLUNTARY=y
kernels, here's a network bandwidth measurement between a virtual
machine and the host:
before:
incoming: 7122.77 Mb/s
outgoing: 8480.37 Mb/s
after:
incoming: 8619.24 Mb/s [ +21.0% ]
outgoing: 9455.42 Mb/s [ +11.5% ]
I kept these changes in a separate tree, separate from scheduler
changes, because it's a mixed MM and scheduler topic"
* 'sched-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm, sched: Allow uaccess in atomic with pagefault_disable()
mm, sched: Drop voluntary schedule from might_fault()
x86: uaccess s/might_sleep/might_fault/
tile: uaccess s/might_sleep/might_fault/
powerpc: uaccess s/might_sleep/might_fault/
mn10300: uaccess s/might_sleep/might_fault/
microblaze: uaccess s/might_sleep/might_fault/
m32r: uaccess s/might_sleep/might_fault/
frv: uaccess s/might_sleep/might_fault/
arm64: uaccess s/might_sleep/might_fault/
asm-generic: uaccess s/might_sleep/might_fault/
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes:
- load-calculation cleanups and improvements, by Alex Shi
- various nohz related tidying up of statisics, by Frederic
Weisbecker
- factor out /proc functions to kernel/sched/proc.c, by Paul
Gortmaker
- simplify the RT policy scheduler, by Kirill Tkhai
- various fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
sched/debug: Remove CONFIG_FAIR_GROUP_SCHED mask
sched/debug: Fix formatting of /proc/<PID>/sched
sched: Fix typo in struct sched_avg member description
sched/fair: Fix typo describing flags in enqueue_entity
sched/debug: Add load-tracking statistics to task
sched: Change get_rq_runnable_load() to static and inline
sched/tg: Remove tg.load_weight
sched/cfs_rq: Change atomic64_t removed_load to atomic_long_t
sched/tg: Use 'unsigned long' for load variable in task group
sched: Change cfs_rq load avg to unsigned long
sched: Consider runnable load average in move_tasks()
sched: Compute runnable load avg in cpu_load and cpu_avg_load_per_task
sched: Update cpu load after task_tick
sched: Fix sleep time double accounting in enqueue entity
sched: Set an initial value of runnable avg for new forked task
sched: Move a few runnable tg variables into CONFIG_SMP
Revert "sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking"
sched: Don't mix use of typedef ctl_table and struct ctl_table
sched: Remove WARN_ON(!sd) from init_sched_groups_power()
sched: Fix memory leakage in build_sched_groups()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull WW mutex support from Ingo Molnar:
"This tree adds support for wound/wait style locks, which the graphics
guys would like to make use of in the TTM graphics subsystem.
Wound/wait mutexes are used when other multiple lock acquisitions of a
similar type can be done in an arbitrary order. The deadlock handling
used here is called wait/wound in the RDBMS literature: The older
tasks waits until it can acquire the contended lock. The younger
tasks needs to back off and drop all the locks it is currently
holding, ie the younger task is wounded.
See this LWN.net description of W/W mutexes:
https://lwn.net/Articles/548909/
The comments there outline specific usecases for this facility (which
have already been implemented for the DRM tree).
Also see Documentation/ww-mutex-design.txt for more details"
* 'core-mutexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking-selftests: Handle unexpected failures more strictly
mutex: Add more w/w tests to test EDEADLK path handling
mutex: Add more tests to lib/locking-selftest.c
mutex: Add w/w tests to lib/locking-selftest.c
mutex: Add w/w mutex slowpath debugging
mutex: Add support for wound/wait style locks
arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the big TTY / Serial driver merge for 3.11-rc1.
It's not all that big, nothing major changed in the tty api, which is
a nice change, just a number of serial driver fixes and updates and
new drivers, along with some n_tty fixes to help resolve some reported
issues.
All of these have been in the linux-next releases for a while, with
the exception of the last revert patch, which was reported this past
weekend by two different people as being needed."
* tag 'tty-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (51 commits)
Revert "serial: 8250_pci: add support for another kind of NetMos Technology PCI 9835 Multi-I/O Controller"
pch_uart: Add uart_clk selection for the MinnowBoard
tty: atmel_serial: prepare clk before calling enable
tty: Reset itty for other pty
n_tty: Buffer work should not reschedule itself
n_tty: Fix unsafe update of available buffer space
n_tty: Untangle read completion variables
n_tty: Encapsulate minimum_to_wake within N_TTY
serial: omap: Fix device tree based PM runtime
serial: imx: Fix serial clock unbalance
serial/mpc52xx_uart: fix kernel panic when system reboot
serial: mfd: Add sysrq support
serial: imx: enable the clocks for console
tty: serial: add Freescale lpuart driver support
serial: imx: Improve Kconfig text
serial: imx: Allow module build
serial: imx: Fix warning when !CONFIG_SERIAL_IMX_CONSOLE
tty/serial/sirf: fix error propagation in sirfsoc_uart_probe()
serial: omap: fix potential NULL pointer dereference in serial_omap_runtime_suspend()
tty: serial: Enable uartlite for ARM zynq
...
|
|
Merge in a recent upstream commit:
c2853c8df57f include/linux/math64.h: add div64_ul()
because:
72a4cf20cb71 sched: Change cfs_rq load avg to unsigned long
relies on it.
[ We don't rebase sched/core for this, because the handful of
followup commits after the broken commit are not behavioral
changes so are unlikely to be needed during bisection. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With this, the guest can use 1TB segments as well as 256MB segments.
Since we now have the situation where a single emulated guest segment
could correspond to multiple shadow segments (as the shadow segments
are still 256MB segments), this adds a new kvmppc_mmu_flush_segment()
to scan for all shadow segments that need to be removed.
This restructures the guest HPT (hashed page table) lookup code to
use the correct hashing and matching functions for HPTEs within a
1TB segment. We use the standard hpt_hash() function instead of
open-coding the hash calculation, and we use HPTE_V_COMPARE() with
an AVPN value that has the B (segment size) field included. The
calculation of avpn is done a little earlier since it doesn't change
in the loop starting at the do_second label.
The computation in kvmppc_mmu_book3s_64_esid_to_vsid() changes so that
it returns a 256MB VSID even if the guest SLB entry is a 1TB entry.
This is because the users of this function are creating 256MB SLB
entries. We set a new VSID_1T flag so that entries created from 1T
segments don't collide with entries from 256MB segments.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This will allow me to call functions that have multiple
arguments if fastpath fails. This is required to support ticket
mutexes, because they need to be able to pass an extra argument
to the fail function.
Originally I duplicated the functions, by adding
__mutex_fastpath_lock_retval_arg. This ended up being just a
duplication of the existing function, so a way to test if
fastpath was called ended up being better.
This also cleaned up the reservation mutex patch some by being
able to call an atomic_set instead of atomic_xchg, and making it
easier to detect if the wrong unlock function was previously
used.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: robclark@gmail.com
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patser
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Most of the stuff from kernel/sched.c was moved to kernel/sched/core.c long time
back and the comments/Documentation never got updated.
I figured it out when I was going through sched-domains.txt and so thought of
fixing it globally.
I haven't crossed check if the stuff that is referenced in sched/core.c by all
these files is still present and hasn't changed as that wasn't the motive behind
this patch.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/cdff76a265326ab8d71922a1db5be599f20aad45.1370329560.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We want the changes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Benjamin Herrenschmidt:
"So here are 3 fixes still for 3.10. Fixes are simple, bugs are nasty
(though not recent regressions, nasty enough) and all targeted at
stable"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix missing/delayed calls to irq_work
powerpc: Fix emulation of illegal instructions on PowerNV platform
powerpc: Fix stack overflow crash in resume_kernel when ftracing
|
|
It's possible for us to crash when running with ftrace enabled, eg:
Bad kernel stack pointer bffffd12 at c00000000000a454
cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
pc: c00000000000a454: resume_kernel+0x34/0x60
lr: c00000000000335c: performance_monitor_common+0x15c/0x180
sp: bffffd12
msr: 8000000000001032
dar: bffffd12
dsisr: 42000000
If we look at current's stack (paca->__current->stack) we see it is
equal to c0000002ecab0000. Our stack is 16K, and comparing to
paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
kernel stack. This leads to us writing over our struct thread_info, and
in this case we have corrupted thread_info->flags and set
_TIF_EMULATE_STACK_STORE.
Dumping the stack we see:
3:mon> t c0000002ecab0000
[c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
[c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
--- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
[c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
[c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
[c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
[c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
[c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
[c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
[c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
--- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
[c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
[c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
[c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
[c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
[c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
[c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
--- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
... and so on
__ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
path. At that point the irq state is not consistent, ie. interrupts are
hard disabled (by the exception entry), but the paca soft-enabled flag
may be out of sync.
This leads to the local_irq_restore() in trace_graph_entry() actually
enabling interrupts, which we do not want. Because we have not yet
reprogrammed the decrementer we immediately take another decrementer
exception, and recurse.
The fix is twofold. Firstly make sure we call DISABLE_INTS before
calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
the irq state in the paca with the hardware, making it safe again to
call local_irq_save/restore().
Although that should be sufficient to fix the bug, we also mark the
runlatch routines as notrace. They are called very early in the
exception entry and we are asking for trouble tracing them. They are
also fairly uninteresting and tracing them just adds unnecessary
overhead.
[ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873
"powerpc: Rework runlatch code" by myself --BenH
]
CC: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Pull kvm bugfixes from Gleb Natapov:
"There is one more fix for MIPS KVM ABI here, MIPS and PPC build
breakage fixes and a couple of PPC bug fixes"
* 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
kvm/ppc/booke: Hold srcu lock when calling gfn functions
kvm/ppc/booke64: Disable e6500 support
kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage
mips/kvm: Use KVM_REG_MIPS and proper size indicators for *_ONE_REG
kvm: Add definition of KVM_REG_MIPS
KVM: add kvm_para_available to asm-generic/kvm_para.h
|
|
Interrupt numbers defined for Book3E follows IVORs definition. Align
BOOKE_INTERRUPT_ALTIVEC_UNAVAIL and BOOKE_INTERRUPT_ALTIVEC_ASSIST to this
rule which also fixes the build breakage.
IVORs 32 and 33 are shared so reflect this in the interrupts naming.
This fixes a build break for 64-bit booke KVM.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
When introducing support for DABRX in 4474ef0, we broke older 32-bit CPUs
that don't have that register.
Some CPUs have a DABR but not DABRX. Configuration are:
- No 32bit CPUs have DABRX but some have DABR.
- POWER4+ and below have the DABR but no DABRX.
- 970 and POWER5 and above have DABR and DABRX.
- POWER8 has DAWR, hence no DABRX.
This introduces CPU_FTR_DABRX and sets it on appropriate CPUs. We use
the top 64 bits for CPU FTR bits since only 64 bit CPUs have this.
Processors that don't have the DABRX will still work as they will fall
back to software filtering these breakpoints via perf_exclude_event().
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: "Gorelik, Jacob (335F)" <jacob.gorelik@jpl.nasa.gov>
cc: stable@vger.kernel.org (v3.9 only)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
Add MPC5125 PSC register layout structure, MPC5125 specific
psc_ops function set and the compatible string.
Signed-off-by: Vladimir Ermakov <vooon341@gmail.com>
Signed-off-by: Matteo Facchinetti <matteo.facchinetti@sirius-es.it>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This adds the remaining two hypercalls defined by PAPR for manipulating
the XICS interrupt controller, H_IPOLL and H_XIRR_X. H_IPOLL returns
information about the priority and pending interrupts for a virtual
cpu, without changing any state. H_XIRR_X is like H_XIRR in that it
reads and acknowledges the highest-priority pending interrupt, but it
also returns the timestamp (timebase register value) from when the
interrupt was first received by the hypervisor. Currently we just
return the current time, since we don't do any software queueing of
virtual interrupts inside the XICS emulation code.
These hcalls are not currently used by Linux guests, but may be in
future.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
On context switch, we should have no prefetch streams leak from one
userspace process to another. This frees up prefetch resources for the
next process.
Based on patch from Milton Miller.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
transactions
When in an active transaction that takes a signal, we need to be careful with
the stack. It's possible that the stack has moved back up after the tbegin.
The obvious case here is when the tbegin is called inside a function that
returns before a tend. In this case, the stack is part of the checkpointed
transactional memory state. If we write over this non transactionally or in
suspend, we are in trouble because if we get a tm abort, the program counter
and stack pointer will be back at the tbegin but our in memory stack won't be
valid anymore.
To avoid this, when taking a signal in an active transaction, we need to use
the stack pointer from the checkpointed state, rather than the speculated
state. This ensures that the signal context (written tm suspended) will be
written below the stack required for the rollback. The transaction is aborted
becuase of the treclaim, so any memory written between the tbegin and the
signal will be rolled back anyway.
For signals taken in non-TM or suspended mode, we use the
normal/non-checkpointed stack pointer.
Tested with 64 and 32 bit signals
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
These cause codes are usable by userspace, so let's export to uapi.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
If we are emulating an instruction inside an active user transaction that
touches memory, the kernel can't emulate it as it operates in transactional
suspend context. We need to abort these transactions and send them back to
userspace for the hardware to rollback.
We can service these if the user transaction is in suspend mode, since the
kernel will operate in the same suspend context.
This adds a check to all alignment faults and to specific instruction
emulations (only string instructions for now). If the user process is in an
active (non-suspended) transaction, we abort the transaction go back to
userspace allowing the HW to roll back the transaction and tell the user of the
failure. This also adds new tm abort cause codes to report the reason of the
persistent error to the user.
Crappy test case here http://neuling.org/devel/junkcode/aligntm.c
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
PAPR carves out 0xff-0xe0 for hypervisor use of transactional memory software
abort cause codes. Unfortunately we don't respect this currently.
Below fixes this to move our cause codes to below this region.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # 3.9 only
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The only reason uaccess routines might sleep
is if they fault. Make this explicit.
Arnd Bergmann suggested that the following code
if (!is_kernel_addr((unsigned long)__pu_addr))
might_fault();
can be further simplified by adding a version of might_fault
that includes the kernel addr check.
Will be considered as a further optimization in future.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1369577426-26721-7-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This moves the quirk itself to pci_64.c as to get built on all ppc64
platforms (the only ones with a pci_dn), factors the two implementations
of get_pdn() into a single pci_get_dn() and use the quirk to do 32-bit
MSIs on IODA based powernv platforms.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
In commit 9353374 "Context switch the new EBB SPRs" we added support for
context switching some new EBB SPRs. However despite four of us signing
off on that patch we missed some. To be fair these are not actually new
SPRs, but they are now potentially user accessible so need to be context
switched.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch corresponds to
[PATCH] x86: Use the new schedule_user API on userspace preemption
commit 0430499ce9d78691f3985962021b16bf8f8a8048
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This is the syscall slow path hooks for context tracking subsystem,
corresponding to
[PATCH] x86: Syscall hooks for userspace RCU extended QS
commit bf5a3c13b939813d28ce26c01425054c740d6731
TIF_MEMDIE is moved to the second 16-bits (with value 17), as it seems there
is no asm code using it. TIF_NOHZ is added to _TIF_SYCALL_T_OR_A, so it is
better for it to be in the same 16 bits with others in the group, so in the
asm code, andi. with this group could work.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Future firmwares will support that new version
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch brings online all threads which are present but not online
prior to migration/hibernation. After migration/hibernation those
threads are taken back offline.
During migration/hibernation all online CPUs must call H_JOIN, this is
required by the hypervisor. Without this patch, threads that are offline
(H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be
deadlocked (all threads either JOIN'd or CEDE'd).
Cc: <stable@kernel.org>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Our pgtable are 2*sizeof(pte_t)*PTRS_PER_PTE which is PTE_FRAG_SIZE.
Instead of depending on frag size, mask with PMD_MASKED_BITS.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
lockdep.c has this:
/*
* So we're supposed to get called after you mask local IRQs,
* but for some reason the hardware doesn't quite think you did
* a proper job.
*/
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return;
Since irqs_disabled() is based on soft_enabled(), that (not just the
hard EE bit) needs to be 0 before we call trace_hardirqs_off.
Signed-off-by: Scott Wood <scottwood@freescale.com>
|
|
We add a machine_shutdown hook that frees the OPAL interrupts
(so they get masked at the source and don't fire while kexec'ing)
and which triggers an IODA reset on all the PCIe host bridges
which will have the effect of blocking all DMAs and subsequent
PCIs interrupts.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch adds a new udbg early debug console which utilises
statically defined input and output buffers stored within the kernel
BSS. It is primarily designed to assist with bring up of new hardware
which may not have a working console but which has a method of
reading/writing kernel memory.
This version incorporates comments made by Ben H (thanks!).
Changes from v1:
- Add memory barriers.
- Ensure updating of read/write positions is atomic.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
If hard_irq_disable() is called while interrupts are already soft-disabled
(which is the most common case) all is already well.
However you can (and in some cases want) to call it while everything is
enabled (to make sure you don't get a lazy even, for example before entry
into KVM guests) and in this case we need to inform the irq tracer that
the irqs are going off.
We have to change the inline into a macro to avoid an include circular
dependency hell hole.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The PCI core supports an offset per aperture nowadays but our arch
code still has a single offset per host bridge representing the
difference betwen CPU memory addresses and PCI MMIO addresses.
This is a problem as new machines and hypervisor versions are
coming out where the 64-bit windows will have a different offset
(basically mapped 1:1) from the 32-bit windows.
This fixes it by using separate offsets. In the long run, we probably
want to get rid of that intermediary struct pci_controller and have
those directly stored into the pci_host_bridge as they are parsed
but this will be a more invasive change.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Also, make HTM's presence dependent on the .config option.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
On pseries machines the detection for max_bus_speed should be done
through an OpenFirmware property. This patch adds a function to perform
this detection and a hook to perform dynamic adding of the function only
for pseries. This is done by overwriting the weak
pcibios_root_bridge_prepare function which is called by
pci_create_root_bus().
From: Lucas Kannebley Tavares <lucaskt@linux.vnet.ibm.com>
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The following patch implements a new PAPR change which allows
the OS to force the use of 32 bit MSIs, regardless of what
the PCI capabilities indicate. This is required for some
devices that advertise support for 64 bit MSIs but don't
actually support them.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
POWER8 allows read and write of the DSCR in userspace. We added
kernel emulation so applications could always use the instructions
regardless of the CPU type.
Unfortunately there are two SPRs for the DSCR and we only added
emulation for the privileged one. Add code to match the non
privileged one.
A simple test was created to verify the fix:
http://ozlabs.org/~anton/junkcode/user_dscr_test.c
Without the patch we get a SIGILL and it passes with the patch.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Pull kvm updates from Gleb Natapov:
"Highlights of the updates are:
general:
- new emulated device API
- legacy device assignment is now optional
- irqfd interface is more generic and can be shared between arches
x86:
- VMCS shadow support and other nested VMX improvements
- APIC virtualization and Posted Interrupt hardware support
- Optimize mmio spte zapping
ppc:
- BookE: in-kernel MPIC emulation with irqfd support
- Book3S: in-kernel XICS emulation (incomplete)
- Book3S: HV: migration fixes
- BookE: more debug support preparation
- BookE: e6500 support
ARM:
- reworking of Hyp idmaps
s390:
- ioeventfd for virtio-ccw
And many other bug fixes, cleanups and improvements"
* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
kvm: Add compat_ioctl for device control API
KVM: x86: Account for failing enable_irq_window for NMI window request
KVM: PPC: Book3S: Add API for in-kernel XICS emulation
kvm/ppc/mpic: fix missing unlock in set_base_addr()
kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
kvm/ppc/mpic: remove users
kvm/ppc/mpic: fix mmio region lists when multiple guests used
kvm/ppc/mpic: remove default routes from documentation
kvm: KVM_CAP_IOMMU only available with device assignment
ARM: KVM: iterate over all CPUs for CPU compatibility check
KVM: ARM: Fix spelling in error message
ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
KVM: ARM: Fix API documentation for ONE_REG encoding
ARM: KVM: promote vfp_host pointer to generic host cpu context
ARM: KVM: add architecture specific hook for capabilities
ARM: KVM: perform HYP initilization for hotplugged CPUs
ARM: KVM: switch to a dual-step HYP init code
ARM: KVM: rework HYP page table freeing
ARM: KVM: enforce maximum size for identity mapped code
ARM: KVM: move to a KVM provided HYP idmap
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
"The main highlights this time around are:
- A pile of addition POWER8 bits and nits, such as updated
performance counter support (Michael Ellerman), new branch history
buffer support (Anshuman Khandual), base support for the new PCI
host bridge when not using the hypervisor (Gavin Shan) and other
random related bits and fixes from various contributors.
- Some rework of our page table format by Aneesh Kumar which fixes a
thing or two and paves the way for THP support. THP itself will
not make it this time around however.
- More Freescale updates, including Altivec support on the new e6500
cores, new PCI controller support, and a pile of new boards support
and updates.
- The usual batch of trivial cleanups & fixes"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
powerpc: Fix build error for book3e
powerpc: Context switch the new EBB SPRs
powerpc: Turn on the EBB H/FSCR bits
powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
powerpc: Setup BHRB instructions facility in HFSCR for POWER8
powerpc: Fix interrupt range check on debug exception
powerpc: Update tlbie/tlbiel as per ISA doc
powerpc: Print page size info during boot
powerpc: print both base and actual page size on hash failure
powerpc: Fix hpte_decode to use the correct decoding for page sizes
powerpc: Decode the pte-lp-encoding bits correctly.
powerpc: Use encode avpn where we need only avpn values
powerpc: Reduce PTE table memory wastage
powerpc: Move the pte free routines from common header
powerpc: Reduce the PTE_INDEX_SIZE
powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
powerpc: New hugepage directory format
powerpc: Don't truncate pgd_index wrongly
powerpc: Don't hard code the size of pte page
powerpc: Save DAR and DSISR in pt_regs on MCE
...
|
|
This adds the API for userspace to instantiate an XICS device in a VM
and connect VCPUs to it. The API consists of a new device type for
the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
which is used to assert and deassert interrupt inputs of the XICS.
The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
Each attribute within this group corresponds to the state of one
interrupt source. The attribute number is the same as the interrupt
source number.
This does not support irq routing or irqfd yet.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
|