Age | Commit message (Collapse) | Author |
|
In commit 11f1a4b9755f ("x86: reorganize SMAP handling in user space
accesses") I changed how the stac/clac instructions were generated
around the user space accesses, which then made it possible to do
batched accesses efficiently for user string copies etc.
However, in doing so, I completely spaced out, and didn't even think
about the 32-bit case. And nobody really even seemed to notice, because
SMAP doesn't even exist until modern Skylake processors, and you'd have
to be crazy to run 32-bit kernels on a modern CPU.
Which brings us to Andy Lutomirski.
He actually tested the 32-bit kernel on new hardware, and noticed that
it doesn't work. My bad. The trivial fix is to add the required
uaccess begin/end markers around the raw accesses in <asm/uaccess_32.h>.
I feel a bit bad about this patch, just because that header file really
should be cleaned up to avoid all the duplicated code in it, and this
commit just expands on the problem. But this just fixes the bug without
any bigger cleanup surgery.
Reported-and-tested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- Two scsiback fixes (resource leak and spurious warning).
- Fix DMA mapping of compound pages on arm/arm64.
- Fix some pciback regressions in MSI-X handling.
- Fix a pcifront crash due to some uninitialize state.
* tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/pcifront: Fix mysterious crashes when NUMA locality information was extracted.
xen/pcifront: Report the errors better.
xen/pciback: Save the number of MSI-X entries to be copied later.
xen/pciback: Check PF instead of VF for PCI_COMMAND_MEMORY
xen: fix potential integer overflow in queue_reply
xen/arm: correctly handle DMA mapping of compound pages
xen/scsiback: avoid warnings when adding multiple LUNs to a domain
xen/scsiback: correct frontend counting
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"This is unusually large, partly due to the EFI fixes that prevent
accidental deletion of EFI variables through efivarfs that may brick
machines. These fixes are somewhat involved to maintain compatibility
with existing install methods and other usage modes, while trying to
turn off the 'rm -rf' bricking vector.
Other fixes are for large page ioremap()s and for non-temporal
user-memcpy()s"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix vmalloc_fault() to handle large pages properly
hpet: Drop stale URLs
x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
lib/ucs2_string: Correct ucs2 -> utf8 conversion
efi: Add pstore variables to the deletion whitelist
efi: Make efivarfs entries immutable by default
efi: Make our variable validation list include the guid
efi: Do variable name validation tests in utf8
efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
lib/ucs2_string: Add ucs2 -> utf8 helper functions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"A handful of CPU hotplug related fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Plug potential memory leak in CPU_UP_PREPARE
perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug state
perf/core: Remove bogus UP_CANCELED hotplug state
perf/x86/amd/uncore: Plug reference leak
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix build error on 32-bit with checkpoint restart from Aneesh Kumar
- Fix dedotify for binutils >= 2.26 from Andreas Schwab
- Don't trace hcalls on offline CPUs from Denis Kirjanov
- eeh: Fix stale cached primary bus from Gavin Shan
- eeh: Fix stale PE primary bus from Gavin Shan
- mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V
- ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy
* tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/ioda: Set "read" permission when "write" is set
powerpc/mm: Fix Multi hit ERAT cause by recent THP update
powerpc/powernv: Fix stale PE primary bus
powerpc/eeh: Fix stale cached primary bus
powerpc/pseries: Don't trace hcalls on offline CPUs
powerpc: Fix dedotify for binutils >= 2.26
powerpc/book3s_32: Fix build error with checkpoint restart
|
|
Merge fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: slab: free kmem_cache_node after destroy sysfs file
ipc/shm: handle removed segments gracefully in shm_mmap()
MAINTAINERS: update Kselftest Framework mailing list
devm_memremap_release(): fix memremap'd addr handling
mm/hugetlb.c: fix incorrect proc nr_hugepages value
mm, x86: fix pte_page() crash in gup_pte_range()
fsnotify: turn fsnotify reaper thread into a workqueue job
Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
mm: fix regression in remap_file_pages() emulation
thp, dax: do not try to withdraw pgtable from non-anon VMA
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here are some more arm64 fixes for 4.5. This has mostly come from
Yang Shi, who saw some issues under -rt that also affect mainline.
The rest of it is pretty small, but still worth having.
We've got an old issue outstanding with valid_user_regs which will
likely wait until 4.6 (since it would really benefit from some time in
-next) and another issue with kasan and idle which should be fixed
next week.
Apart from that, pretty quiet here (and still no sign of the THP issue
reported on s390...)
Summary:
- Allow EFI stub to use strnlen(), which is required by recent libfdt
- Avoid smp_processor_id() in preempt context during unwinding
- Avoid false Kasan warnings during unwinding
- Ensure early devices are picked up by the IOMMU DMA ops
- Avoid rebuilding the kernel for the 'install' target
- Run fixup handlers for alignment faults on userspace access"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: allow the kernel to handle alignment faults on user accesses
arm64: kbuild: make "make install" not depend on vmlinux
arm64: dma-mapping: fix handling of devices registered before arch_initcall
arm64/efi: Make strnlen() available to the EFI namespace
arm/arm64: crypto: assure that ECB modes don't require an IV
arm64: make irq_stack_ptr more robust
arm64: debug: re-enable irqs before sending breakpoint SIGTRAP
arm64: disable kasan when accessing frame->fp in unwind_frame
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Several bug fixes:
- There are four different stack tracers, and three of them have
bugs. For 4.5 the bugs are fixed and we prepare a cleanup patch
for the next merge window.
- Three bug fixes for the dasd driver in regard to parallel access
volumes and the new max_dev_sectors block device queue limit
- The irq restore optimization needs a fixup for memcpy_real
- The diagnose trace code has a conflict with lockdep"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/dasd: fix performance drop
s390/maccess: reduce stnsm instructions
s390/diag: avoid lockdep recursion
s390/dasd: fix refcount for PAV reassignment
s390/dasd: prevent incorrect length error under z/VM after PAV changes
s390: fix DAT off memory access, e.g. on kdump
s390/oprofile: fix address range for asynchronous stack
s390/perf_event: fix address range for asynchronous stack
s390/stacktrace: add save_stack_trace_regs()
s390/stacktrace: save full stack traces
s390/stacktrace: add missing end marker
s390/stacktrace: fix address ranges for asynchronous and panic stack
s390/stacktrace: fix save_stack_trace_tsk() for current task
|
|
Although we don't expect to take alignment faults on access to normal
memory, misbehaving (i.e. buggy) user code can pass MMIO pointers into
system calls, leading to things like get_user accessing device memory.
Rather than OOPS the kernel, allow any exception fixups to run and
return something like -EFAULT back to userspace. This makes the
behaviour more consistent with userspace, even though applications with
access to device mappings can easily cause other issues if they try
hard enough.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Eun Taik Lee <eun.taik.lee@samsung.com>
[will: dropped __kprobes annotation and rewrote commit mesage]
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
For the same reason as commit 19514fc665ff ("arm, kbuild: make "make
install" not depend on vmlinux"), the install targets should never
trigger the rebuild of the kernel.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fixes from Jiri Kosina:
- regression (from 4.4) fix for ordering issue, introduced by an
earlier ftrace change, that broke live patching of modules.
The fix replaces the ftrace module notifier by direct call in order
to make the ordering guaranteed and well-defined. The patch, from
Jessica Yu, has been acked both by Steven and Rusty
- error message fix from Miroslav Benes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
ftrace/module: remove ftrace module notifier
livepatch: change the error message in asm/livepatch.h header files
|
|
Commit 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") has
moved up the pte_page(pte) in x86's fast gup_pte_range(), for no
discernible reason: put it back where it belongs, after the pte_flags
check and the pfn_valid cross-check.
That may be the cause of the NULL pointer dereference in
gup_pte_range(), seen when vfio called vaddr_get_pfn() when starting a
qemu-kvm based VM.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Michael Long <Harn-Solo@gmx.de>
Tested-by: Michael Long <Harn-Solo@gmx.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A kernel page fault oops with the callstack below was observed
when a read syscall was made to a pmem device after a huge amount
(>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64
system:
BUG: unable to handle kernel paging request at ffff880840000ff8
IP: vmalloc_fault+0x1be/0x300
PGD c7f03a067 PUD 0
Oops: 0000 [#1] SM
Call Trace:
__do_page_fault+0x285/0x3e0
do_page_fault+0x2f/0x80
? put_prev_entity+0x35/0x7a0
page_fault+0x28/0x30
? memcpy_erms+0x6/0x10
? schedule+0x35/0x80
? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
? schedule_timeout+0x183/0x240
btt_log_read+0x63/0x140 [nd_btt]
:
? __symbol_put+0x60/0x60
? kernel_read+0x50/0x80
SyS_finit_module+0xb9/0xf0
entry_SYSCALL_64_fastpath+0x1a/0xa4
Since v4.1, ioremap() supports large page (pud/pmd) mappings in
x86_64 and PAE. vmalloc_fault() however assumes that the vmalloc
range is limited to pte mappings.
vmalloc faults do not normally happen in ioremap'd ranges since
ioremap() sets up the kernel page tables, which are shared by
user processes. pgd_ctor() sets the kernel's PGD entries to
user's during fork(). When allocation of the vmalloc ranges
crosses a 512GB boundary, ioremap() allocates a new pud table
and updates the kernel PGD entry to point it. If user process's
PGD entry does not have this update yet, a read/write syscall
to the range will cause a vmalloc fault, which hits the Oops
above as it does not handle a large page properly.
Following changes are made to vmalloc_fault().
64-bit:
- No change for the PGD sync operation as it handles large
pages already.
- Add pud_huge() and pmd_huge() to the validation code to
handle large pages.
- Change pud_page_vaddr() to pud_pfn() since an ioremap range
is not directly mapped (while the if-statement still works
with a bogus addr).
- Change pmd_page() to pmd_pfn() since an ioremap range is not
backed by struct page (while the if-statement still works
with a bogus addr).
32-bit:
- No change for the sync operation since the index3 PGD entry
covers the entire vmalloc range, which is always valid.
(A separate change to sync PGD entry is necessary if this
memory layout is changed regardless of the page size.)
- Add pmd_huge() to the validation code to handle large pages.
This is for completeness since vmalloc_fault() won't happen
in ioremap'd ranges as its PGD entry is always valid.
Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org> # 4.1+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Quite often drivers set only "write" permission assuming that this
includes "read" permission as well and this works on plenty of
platforms. However IODA2 is strict about this and produces an EEH when
"read" permission is not set and reading happens.
This adds a workaround in the IODA code to always add the "read" bit
when the "write" bit is set.
Fixes: 10b35b2b7485 ("powerpc/powernv: Do not set "read" flag if direction==DMA_NONE")
Cc: stable@vger.kernel.org # 4.2+
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Tested-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
This patch ensures that devices, which got registered before arch_initcall
will be handled correctly by IOMMU-based DMA-mapping code.
Cc: <stable@vger.kernel.org>
Fixes: 13b8629f6511 ("arm64: Add IOMMU dma_ops")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Looks like the HPET spec at intel.com got moved.
It isn't hard to find so drop the link, just mention
the revision assumed.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1455145462-3877-1-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
__copy_user_nocache()
Data corruption issues were observed in tests which initiated
a system crash/reset while accessing BTT devices. This problem
is reproducible.
The BTT driver calls pmem_rw_bytes() to update data in pmem
devices. This interface calls __copy_user_nocache(), which
uses non-temporal stores so that the stores to pmem are
persistent.
__copy_user_nocache() uses non-temporal stores when a request
size is 8 bytes or larger (and is aligned by 8 bytes). The
BTT driver updates the BTT map table, which entry size is
4 bytes. Therefore, updates to the map table entries remain
cached, and are not written to pmem after a crash.
Change __copy_user_nocache() to use non-temporal store when
a request size is 4 bytes. The change extends the current
byte-copy path for a less-than-8-bytes request, and does not
add any overhead to the regular path.
Reported-and-tested-by: Micah Parrish <micah.parrish@hpe.com>
Reported-and-tested-by: Brian Boylston <brian.boylston@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Add comments to __copy_user_nocache() to clarify its procedures
and alignment requirements.
Also change numeric branch target labels to named local labels.
No code changed:
arch/x86/lib/copy_user_64.o:
text data bss dec hex filename
1239 0 0 1239 4d7 copy_user_64.o.before
1239 0 0 1239 4d7 copy_user_64.o.after
md5:
58bed94c2db98c1ca9a2d46d0680aaae copy_user_64.o.before.asm
58bed94c2db98c1ca9a2d46d0680aaae copy_user_64.o.after.asm
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: brian.boylston@hpe.com
Cc: dan.j.williams@intel.com
Cc: linux-nvdimm@lists.01.org
Cc: micah.parrish@hpe.com
Cc: ross.zwisler@linux.intel.com
Cc: vishal.l.verma@intel.com
Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com
[ Small readability edits and added object file comparison. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
When fixing the DAT off bug ("s390: fix DAT off memory access, e.g.
on kdump") both Christian and I missed that we can save an additional
stnsm instruction.
This saves us a couple of cycles which could improve the speed of
memcpy_real.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
In the error path of amd_uncore_cpu_up_prepare() the newly allocated uncore
struct is freed, but the percpu pointer still references it. Set it to NULL.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1602162302170.19512@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull ARM KVM fixes from Paolo Bonzini:
- Fix for an unpleasant crash when the VM is created without a timer
- Allow HYP mode to access the full PA space, and not only 40bit
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
arm64: KVM: Configure TCR_EL2.PS at runtime
KVM: arm/arm64: Fix reference to uninitialised VGIC
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/ARM fixes for 4.5-rc4
- Fix for an unpleasant crash when the VM is created without a timer
- Allow HYP mode to access the full PA space, and not only 40bit
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k fixes from Geert Uytterhoeven:
"Summary:
- Wire up new copy_file_range syscall
- Update defconfigs"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k/defconfig: Update defconfigs for v4.5-rc1
m68k: Wire up copy_file_range
|
|
Changes introduced in the upstream version of libfdt pulled in by commit
91feabc2e224 ("scripts/dtc: Update to upstream commit b06e55c88b9b") use
the strnlen() function, which isn't currently available to the EFI name-
space. Add it to the EFI namespace to avoid a linker error.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Rob Herring <robh@kernel.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
ECB modes don't use an initialization vector. The kernel
/proc/crypto interface doesn't reflect this properly.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
The messages should be different depending on the type of error.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
|
With ppc64 we use the deposited pgtable_t to store the hash pte slot
information. We should not withdraw the deposited pgtable_t without
marking the pmd none. This ensure that low level hash fault handling
will skip this huge pte and we will handle them at upper levels.
Recent change to pmd splitting changed the above in order to handle the
race between pmd split and exit_mmap. The race is explained below.
Consider following race:
CPU0 CPU1
shrink_page_list()
add_to_swap()
split_huge_page_to_list()
__split_huge_pmd_locked()
pmdp_huge_clear_flush_notify()
// pmd_none() == true
exit_mmap()
unmap_vmas()
zap_pmd_range()
// no action on pmd since pmd_none() == true
pmd_populate()
As result the THP will not be freed. The leak is detected by check_mm():
BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
The above required us to not mark pmd none during a pmd split.
The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
level fault handling code skip this pte. At higher level we do take ptl
lock. That should serialze us against the pmd split. Once the lock is
acquired we do check the pmd again using pmd_same. That should always
return false for us and hence we should retry the access. We do the
pmd_same check in all case after taking plt with
THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
huge_pmd_set_accessed)
Also make sure we wait for irq disable section in other cpus to finish
before flipping a huge pte entry with a regular pmd entry. Code paths
like find_linux_pte_or_hugepte depend on irq disable to get
a stable pte_t pointer. A parallel thp split need to make sure we
don't convert a pmd pte to a regular pmd entry without waiting for the
irq disable section to finish.
Fixes: eef1b3ba053a ("thp: implement split_huge_pmd()")
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
When PCI bus is unplugged during full hotplug for EEH recovery,
the platform PE instance (struct pnv_ioda_pe) isn't released and
it dereferences the stale PCI bus that has been released. It leads
to kernel crash when referring to the stale PCI bus.
This fixes the issue by correcting the PE's primary bus when it's
oneline at plugging time, in pnv_pci_dma_bus_setup() which is to
be called by pcibios_fixup_bus().
Cc: stable@vger.kernel.org # v4.1+
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reported-by: Pradipta Ghosh <pradghos@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
When PE is created, its primary bus is cached to pe->bus. At later
point, the cached primary bus is returned from eeh_pe_bus_get().
However, we could get stale cached primary bus and run into kernel
crash in one case: full hotplug as part of fenced PHB error recovery
releases all PCI busses under the PHB at unplugging time and recreate
them at plugging time. pe->bus is still dereferencing the PCI bus
that was released.
This adds another PE flag (EEH_PE_PRI_BUS) to represent the validity
of pe->bus. pe->bus is updated when its first child EEH device is
online and the flag is set. Before unplugging in full hotplug for
error recovery, the flag is cleared.
Fixes: 8cdb2833 ("powerpc/eeh: Trace PCI bus from PE")
Cc: stable@vger.kernel.org #v3.11+
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reported-by: Pradipta Ghosh <pradghos@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
If a cpu is hotplugged while the hcall trace points are active, it's
possible to hit a warning from RCU due to the trace points calling into
RCU from an offline cpu, eg:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
Make the hypervisor tracepoints conditional by using
TRACE_EVENT_FN_COND.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irqchip fixes from Thomas Gleixner:
"Another set of ARM SoC related irqchip fixes:
- Plug a memory leak in gicv3-its
- Limit features to the root gic interrupt controller
- Add a missing barrier in the gic-v3 IAR access
- Another compile test fix for sun4i"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3: Make sure read from ICC_IAR1_EL1 is visible on redestributor
irqchip/gic: Only set the EOImodeNS bit for the root controller
irqchip/gic: Only populate set_affinity for the root controller
irqchip/gicv3-its: Fix memory leak in its_free_tables()
irqchip/sun4i: Fix compilation outside of arch/arm
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two small fixlets for x86:
- Prevent a KASAN false positive in thread_saved_pc()
- Fix a 32-bit truncation problem in the x86 numa code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/numa: Fix 32-bit memblock range truncation bug on 32-bit NUMA kernels
x86: Fix KASAN false positives in thread_saved_pc()
|
|
Pull MIPS fixes from Ralf Baechle:
"Here's the first round of MIPS fixes after the merge window:
- Detect Octeon III's PCI correctly.
- Fix return value of the MT7620 probing function.
- Wire up the copy_file_range syscall.
- Fix 64k page support on 32 bit kernels.
- Fix the early Coherency Manager probe.
- Allow only hardware-supported page sizes to be selected for R6000.
- Fix corner cases for the RDHWR nstruction emulation on old hardware.
- Fix FPU handling corner cases.
- Remove stale entry for BCM33xx from the MAINTAINERS file.
- 32 and 64 bit ELF headers are different, handle them correctly"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
mips: Differentiate between 32 and 64 bit ELF header
MIPS: Octeon: Update OCTEON_FEATURE_PCIE for Octeon III
MIPS: pci-mt7620: Fix return value check in mt7620_pci_probe()
MIPS: Fix early CM probing
MIPS: Wire up copy_file_range syscall.
MIPS: Fix 64k page support for 32 bit kernels.
MIPS: R6000: Don't allow 64k pages for R6000.
MIPS: traps.c: Correct microMIPS RDHWR emulation
MIPS: traps.c: Don't emulate RDHWR in the CpU #0 exception handler
MAINTAINERS: Remove stale entry for BCM33xx chips
MIPS: Fix FPU disable with preemption
MIPS: Properly disable FPU in start_thread()
MIPS: Fix buffer overflow in syscall_get_arguments()
|
|
Pull ARM fixes from Russell King:
"A couple of ARM fixes from Linus for the ICST clock generator code"
[ "Linus" here is Linus Walleij. Name-stealer.
Linus "there can be only one" Torvalds ]
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8519/1: ICST: try other dividends than 1
ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
"I've been sitting on some of these fixes for a while.
- Corner case of returning to delay slot from interrupt
- Changing default interrupt prioiry level
- Kconfig'ize support for super pages
- Other minor fixes"
* tag 'arc-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: mm: Introduce explicit super page size support
ARCv2: intc: Allow interruption by lowest priority interrupt
ARCv2: Check for LL-SC livelock only if LLSC is enabled
ARC: shrink cpuinfo by not saving full timer BCR
ARCv2: clocksource: Rename GRTC -> GFRC ...
ARCv2: STAR 9000950267: Handle return from intr to Delay Slot #2
|
|
Merge fixes from Andrew Morton:
"10 fixes"
The lockdep hlist conversion is in the locking tree too, waiting for the
next merge window. Andrew thought it should go in now. I'll take it,
since it fixes a real problem and looks trivially correct (famous last
words).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
arch/x86/Kconfig: CONFIG_X86_UV should depend on CONFIG_EFI
mm: fix pfn_t vs highmem
kernel/locking/lockdep.c: convert hash tables to hlists
mm,thp: fix spellos in describing __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
mm,thp: khugepaged: call pte flush at the time of collapse
mm/backing-dev.c: fix error path in wb_init()
mm, dax: check for pmd_none() after split_huge_pmd()
vsprintf: kptr_restrict is okay in IRQ when 2
mm: fix filemap.c kernel doc warning
ubsan: cosmetic fix to Kconfig text
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux
Pull fbdev fixes from Tomi Valkeinen:
- fix omap2plus_defconfig to enable omapfb as it was in v4.4
- ocfb: fix timings for margins
- s6e8ax0, da8xx-fb: fix compile warnings
- mmp: fix build failure caused by bad printk parameters
- imxfb: fix clock issue which kept the display off
* tag 'fbdev-fixes-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
video: fbdev: imxfb: Provide a reset mechanism
fbdev: mmp: print IRQ resource using %pR format string
fbdev: da8xx-fb: remove incorrect type cast
fbdev: s6e8ax0: avoid unused function warnings
ocfb: fix tgdel and tvdel timing parameters
ARM: omap2plus_defconfig: update display configs
|
|
Switching between stacks is only valid if we are tracing ourselves while on the
irq_stack, so it is only valid when in current and non-preemptible context,
otherwise is is just zeroed off.
Fixes: 132cd887b5c5 ("arm64: Modify stack trace and dump for use with irq_stack")
Acked-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
MMUv4 supports 2 concurrent page sizes: Normal and Super [4K to 16M]
So far Linux supported a single super page size for a given Normal page,
depending on the software page walking address split.
e.g. we had 11:8:13 address split for 8K page, which meant super page
was 2 ^(8+13) = 2M (given that THP size has to be PMD_SHIFT)
Now we turn this around, by allowing multiple Super Pages in Kconfig
(currently 2M and 16M only) and forcing page walker address split to
PGDIR_SHIFT and PAGE_SHIFT
For configs without Super page, things are same as before and
PGDIR_SHIFT can be hacked to get non default address split
The motivation for this change is a customer who needs 16M super page
and a 8K Normal page combo.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
|
|
arch/x86/built-in.o: In function `uv_bios_call':
(.text+0xeba00): undefined reference to `efi_call'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since the dawn of time the ICST code has only supported divide
by one or hang in an eternal loop. Luckily we were always dividing
by one because the reference frequency for the systems using
the ICSTs is 24MHz and the [min,max] values for the PLL input
if [10,320] MHz for ICST307 and [6,200] for ICST525, so the loop
will always terminate immediately without assigning any divisor
for the reference frequency.
But for the code to make sense, let's insert the missing i++
Reported-by: David Binderman <dcb314@hotmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
|
Setting TCR_EL2.PS to 40 bits is wrong on systems with less that
less than 40 bits of physical addresses. and breaks KVM on systems
where the RAM is above 40 bits.
This patch uses ID_AA64MMFR0_EL1.PARange to set TCR_EL2.PS dynamically,
just like we already do for VTCR_EL2.PS.
[Marc: rewrote commit message, patch tidy up]
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
The diagnose tracer will indirectly call back into the lockdep code
when lockdep does not expect it (arch_spinlock). This causes lockdep
to disable itself and therefore we don't have a working lock
dependency validator anymore.
This patch effectively disables tracing of diag 0x9c and 0x44 if
lockdep is enabled. If however lockdep is enabled spinlocks are
mainly implemented using a trylock variant, which will not issue any
diag 0x9c or 0x44. So this change has hardly any effect on tracing
except when arch_spinlock and friends are explicitly used.
Reported-and-Tested-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
commit 204ee2c56431 ("s390/irqflags: optimize irq restore") optimized
irqrestore to really only care about interrupts and adapted the
remaining low level users. One spot (memcpy_real) was not touched,
though - fix it. Otherwise a kdump kernel will fail while reading
the old kernel. As we re-enable irqs with a non-standard function
we have to tell lockdep about that.
Fixes: 204ee2c56431 ("s390/irqflags: optimize irq restore")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Depending on the configuration either the 32 or 64 bit version of
elf_check_arch() is defined. parse_crash_elf{32|64}_headers() does
some basic verification of the ELF header via
vmcore_elf{32|64}_check_arch() which happen to map to elf_check_arch().
Since the implementation 32 and 64 bit version of elf_check_arch()
differ, we use the wrong type:
In file included from include/linux/elf.h:4:0,
from fs/proc/vmcore.c:13:
fs/proc/vmcore.c: In function 'parse_crash_elf64_headers':
>> arch/mips/include/asm/elf.h:228:23: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
struct elfhdr *__h = (hdr); \
^
include/linux/crash_dump.h:41:37: note: in expansion of macro 'elf_check_arch'
#define vmcore_elf64_check_arch(x) (elf_check_arch(x) || vmcore_elf_check_arch_cross(x))
^
fs/proc/vmcore.c:1015:4: note: in expansion of macro 'vmcore_elf64_check_arch'
!vmcore_elf64_check_arch(&ehdr) ||
^
Therefore, we rather define vmcore_elf{32|64}_check_arch() as a
basic machine check and use it also in binfm_elf?32.c as well.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Suggested-by: Maciej W. Rozycki <macro@imgtec.com>
Reviewed-by: Maciej W. Rozycki <macro@imgtec.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/12529/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|
The ARM GICv3 specification mentions the need for dsb after a read
from the ICC_IAR1_EL1 register:
4.1.1 Physical CPU Interface:
The effects of reading ICC_IAR0_EL1 and ICC_IAR1_EL1
on the state of a returned INTID are not guaranteed
to be visible until after the execution of a DSB.
Not having this could result in missed interrupts, so let's add the
required barrier.
[Marc: fixed commit message]
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
|
|
force_sig_info can sleep under an -rt kernel, so attempting to send a
breakpoint SIGTRAP with interrupts disabled yields the following BUG:
BUG: sleeping function called from invalid context at
/kernel-source/kernel/locking/rtmutex.c:917
in_atomic(): 0, irqs_disabled(): 128, pid: 551, name: test.sh
CPU: 5 PID: 551 Comm: test.sh Not tainted 4.1.13-rt13 #7
Hardware name: Freescale Layerscape 2085a RDB Board (DT)
Call trace:
dump_backtrace+0x0/0x128
show_stack+0x24/0x30
dump_stack+0x80/0xa0
___might_sleep+0x128/0x1a0
rt_spin_lock+0x2c/0x40
force_sig_info+0xcc/0x210
brk_handler.part.2+0x6c/0x80
brk_handler+0xd8/0xe8
do_debug_exception+0x58/0xb8
This patch fixes the problem by ensuring that interrupts are enabled
prior to sending the SIGTRAP if they were already enabled in the user
context.
Reported-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
Currently the driver tries to probe the pci driver and oops.
Add CN7XXX to case so that driver probes the pcie driver.
Signed-off-by: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
Cc: david.daney@cavium.com
Cc: matt.redfearn@imgtec.com
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/12530/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|
In case of error, the function devm_ioremap_resource() returns
ERR_PTR() and never returns NULL. The NULL test in the return
value check should be replaced with IS_ERR().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: John Crispin <blogic@openwrt.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: linux-mediatek@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/12451/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|
git commit dc7ee00d4771 ("s390: lowcore stack pointer offsets")
introduced a regression in regard to s390_backtrace(). The stack
pointer for the asynchronous stack in the lowcore now has an
additional offset applied. This offset needs to be taken into account
in the calculation for the low and high address for the stack.
This bug was already partially fixed with commit 9cc5c206d9b4
("s390/dumpstack: fix address ranges for asynchronous and panic
stack"). This patch fixes it also for the oprofile code.
Fixes: dc7ee00d4771 ("s390: lowcore stack pointer offsets")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|