summaryrefslogtreecommitdiff
path: root/arch/x86/mm
AgeCommit message (Collapse)Author
2016-02-08x86/mm/numa: Check for failures in numa_clear_kernel_node_hotplug()Ingo Molnar
numa_clear_kernel_node_hotplug() uses memblock_set_node() without checking for failures. memblock_set_node() is a complex function that might extend the memblock array - which extension might fail - so check for this possibility. It's not supposed to happen (because realistically if we have so little memory that this fails then we likely won't be able to boot anyway), but do the check nevertheless. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Brad Spengler <spender@grsecurity.net> Cc: Chen Tang <imtangchen@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: PaX Team <pageexec@freemail.hu> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: y14sg1 <y14sg1@comcast.net> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-08x86/mm/numa: Clean up numa_clear_kernel_node_hotplug()Ingo Molnar
So we fixed an overflow bug in numa_clear_kernel_node_hotplug(): 2b54ab3c66d4 ("x86/mm/numa: Fix memory corruption on 32-bit NUMA kernels") ... and the bug was indirectly caused by poor coding style, such as using start/end local variables unnecessarily, which lost the physaddr_t type. So make the code more readable and try to fully comment all the thinking behind the logic. No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Brad Spengler <spender@grsecurity.net> Cc: Chen Tang <imtangchen@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: PaX Team <pageexec@freemail.hu> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: y14sg1 <y14sg1@comcast.net> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-08Merge branch 'x86/urgent' into x86/mm, to pick up dependent fixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-08x86/mm/numa: Fix 32-bit memblock range truncation bug on 32-bit NUMA kernelsIngo Molnar
The following commit: a0acda917284 ("acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable") Introduced numa_clear_kernel_node_hotplug(), which function is executed during early bootup, and which marks all currently reserved memblock regions as hot-memory-unswappable as well. y14sg1 <y14sg1@comcast.net> reported that when running 32-bit NUMA kernels, the grsecurity/PAX kernel patch flagged a size overflow in this function: PAX: size overflow detected in function x86_numa_init arch/x86/mm/numa.c:691 [...] ... the reason for the overflow is that memblock_clear_hotplug() takes physical addresses as arguments, while the start/end variables used by numa_clear_kernel_node_hotplug() are 'unsigned long', which is 32-bit on PAE kernels, but which has 64-bit physical addresses. So on 32-bit PAE kernels that have physical memory above the 4GB boundary, we truncate a 64-bit physical address range to 32 bits and pass it to memblock_clear_hotplug(), which at minimum prevents the original memory-hotplug bugfix from working, but might have other side effects as well. The fix is to use the proper type to handle physical addresses, phys_addr_t. Reported-by: y14sg1 <y14sg1@comcast.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Brad Spengler <spender@grsecurity.net> Cc: Chen Tang <imtangchen@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: PaX Team <pageexec@freemail.hu> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-06mm, hugetlb: don't require CMA for runtime gigantic pagesVlastimil Babka
Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime") has added the runtime gigantic page allocation via alloc_contig_range(), making this support available only when CONFIG_CMA is enabled. Because it doesn't depend on MIGRATE_CMA pageblocks and the associated infrastructure, it is possible with few simple adjustments to require only CONFIG_MEMORY_ISOLATION instead of full CONFIG_CMA. After this patch, alloc_contig_range() and related functions are available and used for gigantic pages with just CONFIG_MEMORY_ISOLATION enabled. Note CONFIG_CMA selects CONFIG_MEMORY_ISOLATION. This allows supporting runtime gigantic pages without the CMA-specific checks in page allocator fastpaths. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-03Merge branch 'linus' into efi/core, to refresh the branch and to pick up ↵Ingo Molnar
recent fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-01Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A bit on the largish side due to a series of fixes for a regression in the x86 vector management which was introduced in 4.3. This work was started in December already, but it took some time to fix all corner cases and a couple of older bugs in that area which were detected while at it Aside of that a few platform updates for intel-mid, quark and UV and two fixes for in the mm code: - Use proper types for pgprot values to avoid truncation - Prevent a size truncation in the pageattr code when setting page attributes for large mappings" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/mm/pat: Avoid truncation when converting cpa->numpages to address x86/mm: Fix types used in pgprot cacheability flags translations x86/platform/quark: Print boundaries correctly x86/platform/UV: Remove EFI memmap quirk for UV2+ x86/platform/intel-mid: Join string and fix SoC name x86/platform/intel-mid: Enable 64-bit build x86/irq: Plug vector cleanup race x86/irq: Call irq_force_move_complete with irq descriptor x86/irq: Remove outgoing CPU from vector cleanup mask x86/irq: Remove the cpumask allocation from send_cleanup_vector() x86/irq: Clear move_in_progress before sending cleanup IPI x86/irq: Remove offline cpus from vector cleanup x86/irq: Get rid of code duplication x86/irq: Copy vectormask instead of an AND operation x86/irq: Check vector allocation early x86/irq: Reorganize the search in assign_irq_vector x86/irq: Reorganize the return path in assign_irq_vector x86/irq: Do not use apic_chip_data.old_domain as temporary buffer x86/irq: Validate that irq descriptor is still active x86/irq: Fix a race in x86_vector_free_irqs() ...
2016-01-30x86/cpufeature: Carve out X86_FEATURE_*Borislav Petkov
Move them to a separate header and have the following dependency: x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h This makes it easier to use the header in asm code and not include the whole cpufeature.h and add guards for asm. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29x86/mm/pat: Avoid truncation when converting cpa->numpages to addressMatt Fleming
There are a couple of nasty truncation bugs lurking in the pageattr code that can be triggered when mapping EFI regions, e.g. when we pass a cpa->pgd pointer. Because cpa->numpages is a 32-bit value, shifting left by PAGE_SHIFT will truncate the resultant address to 32-bits. Viorel-Cătălin managed to trigger this bug on his Dell machine that provides a ~5GB EFI region which requires 1236992 pages to be mapped. When calling populate_pud() the end of the region gets calculated incorrectly in the following buggy expression, end = start + (cpa->numpages << PAGE_SHIFT); And only 188416 pages are mapped. Next, populate_pud() gets invoked for a second time because of the loop in __change_page_attr_set_clr(), only this time no pages get mapped because shifting the remaining number of pages (1048576) by PAGE_SHIFT is zero. At which point the loop in __change_page_attr_set_clr() spins forever because we fail to map progress. Hitting this bug depends very much on the virtual address we pick to map the large region at and how many pages we map on the initial run through the loop. This explains why this issue was only recently hit with the introduction of commit a5caa209ba9c ("x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down") It's interesting to note that safe uses of cpa->numpages do exist in the pageattr code. If instead of shifting ->numpages we multiply by PAGE_SIZE, no truncation occurs because PAGE_SIZE is a UL value, and so the result is unsigned long. To avoid surprises when users try to convert very large cpa->numpages values to addresses, change the data type from 'int' to 'unsigned long', thereby making it suitable for shifting by PAGE_SHIFT without any type casting. The alternative would be to make liberal use of casting, but that is far more likely to cause problems in the future when someone adds more code and fails to cast properly; this bug was difficult enough to track down in the first place. Reported-and-tested-by: Viorel-Cătălin Răpițeanu <rapiteanu.catalin@gmail.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Link: https://bugzilla.kernel.org/show_bug.cgi?id=110131 Link: http://lkml.kernel.org/r/1454067370-10374-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-20x86/mm: Make kmap_prot into a #defineAndy Lutomirski
The value (once we initialize it) is a foregone conclusion. Make it a #define to save a tiny amount of text and data size and to make it more comprehensible. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/0850eb0213de9da88544ff7fae72dc6d06d2b441.1453239349.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-20x86/mm/32: Set NX in __supported_pte_mask before enabling pagingAndy Lutomirski
There's a short window in which very early mappings can end up with NX clear because they are created before we've noticed that we have NX. It turns out that we detect NX very early, so there's no need to defer __supported_pte_mask setup. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2b544627345f7110160545a3f47031eb45c3ad4f.1453239349.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-19x86/mm: Streamline and restore probe_memory_block_size()Seth Jennings
The cumulative effect of the following two commits: bdee237c0343 ("x86: mm: Use 2GB memory block size on large-memory x86-64 systems") 982792c782ef ("x86, mm: probe memory block size for generic x86 64bit") ... is some pretty convoluted code. The first commit also removed code for the UV case without stated reason, which might lead to unexpected change in behavior. This commit has no other (intended) functional change; just seeks to simplify and make the code more understandable, beyond restoring the UV behavior. The whole section with the "tail size" doesn't seem to be reachable, since both the >= 64GB and < 64GB case return, so it was removed. Signed-off-by: Seth Jennings <sjennings@variantweb.net> Cc: Daniel J Blueman <daniel@numascale.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1448902063-18885-1-git-send-email-sjennings@variantweb.net [ Rewrote the title and changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-16mm, x86: get_user_pages() for dax mappingsDan Williams
A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver has established a devm_memremap_pages() mapping, i.e. when the pfn_t return from ->direct_access() has PFN_DEV and PFN_MAP set. Later, when encountering _PAGE_DEVMAP during a page table walk we lookup and pin a struct dev_pagemap instance to keep the result of pfn_to_page() valid until put_page(). Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Logan Gunthorpe <logang@deltatee.com> Cc: Dave Hansen <dave@sr71.net> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16mm, dax: convert vmf_insert_pfn_pmd() to pfn_tDan Williams
Similar to the conversion of vm_insert_mixed() use pfn_t in the vmf_insert_pfn_pmd() to tag the resulting pte with _PAGE_DEVICE when the pfn is backed by a devm_memremap_pages() mapping. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16x86, mm: introduce vmem_altmap to augment vmemmap_populate()Dan Williams
In support of providing struct page for large persistent memory capacities, use struct vmem_altmap to change the default policy for allocating memory for the memmap array. The default vmemmap_populate() allocates page table storage area from the page allocator. Given persistent memory capacities relative to DRAM it may not be feasible to store the memmap in 'System Memory'. Instead vmem_altmap represents pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf() requests. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: kbuild test robot <lkp@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16x86, thp: remove infrastructure for handling splitting PMDsKirill A. Shutemov
With new refcounting we don't need to mark PMDs splitting. Let's drop code to handle this. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16mm: drop tail page refcountingKirill A. Shutemov
Tail page refcounting is utterly complicated and painful to support. It uses ->_mapcount on tail pages to store how many times this page is pinned. get_page() bumps ->_mapcount on tail page in addition to ->_count on head. This information is required by split_huge_page() to be able to distribute pins from head of compound page to tails during the split. We will need ->_mapcount to account PTE mappings of subpages of the compound page. We eliminate need in current meaning of ->_mapcount in tail pages by forbidding split entirely if the page is pinned. The only user of tail page refcounting is THP which is marked BROKEN for now. Let's drop all this mess. It makes get_page() and put_page() much simpler. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge first patch-bomb from Andrew Morton: - A few hotfixes which missed 4.4 becasue I was asleep. cc'ed to -stable - A few misc fixes - OCFS2 updates - Part of MM. Including pretty large changes to page-flags handling and to thp management which have been buffered up for 2-3 cycles now. I have a lot of MM material this time. [ It turns out the THP part wasn't quite ready, so that got dropped from this series - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) zsmalloc: reorganize struct size_class to pack 4 bytes hole mm/zbud.c: use list_last_entry() instead of list_tail_entry() zram/zcomp: do not zero out zcomp private pages zram: pass gfp from zcomp frontend to backend zram: try vmalloc() after kmalloc() zram/zcomp: use GFP_NOIO to allocate streams mm: add tracepoint for scanning pages drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64 mm/page_isolation: use macro to judge the alignment mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE() mm: rework virtual memory accounting include/linux/memblock.h: fix ordering of 'flags' argument in comments mm: move lru_to_page to mm_inline.h Documentation/filesystems: describe the shared memory usage/accounting memory-hotplug: don't BUG() in register_memory_resource() hugetlb: make mm and fs code explicitly non-modular mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd() mm: make sure isolate_lru_page() is never called for tail page vmstat: make vmstat_updater deferrable again and shut down on idle ...
2016-01-15x86: mm: support ARCH_MMAP_RND_BITSDaniel Cashman
x86: arch_mmap_rnd() uses hard-coded values, 8 for 32-bit and 28 for 64-bit, to generate the random offset for the mmap base address. This value represents a compromise between increased ASLR effectiveness and avoiding address-space fragmentation. Replace it with a Kconfig option, which is sensibly bounded, so that platform developers may choose where to place this compromise. Keep default values as new minimums. Signed-off-by: Daniel Cashman <dcashman@google.com> Cc: Russell King <linux@arm.linux.org.uk> Acked-by: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Don Zickus <dzickus@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Rientjes <rientjes@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Nick Kralevich <nnk@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Borislav Petkov <bp@suse.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-12x86/mm: Use PAGE_ALIGNED instead of IS_ALIGNEDKefeng Wang
Use PAGE_ALIGEND macro in <linux/mm.h> to simplify code. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: <guohanjun@huawei.com> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452565170-11083-1-git-send-email-wangkefeng.wang@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12x86/mm/pat: Make split_page_count() check for empty levels to fix ↵Dave Jones
/proc/meminfo output In CONFIG_PAGEALLOC_DEBUG=y builds, we disable 2M pages. Unfortunatly when we split up mappings during boot, split_page_count() doesn't take this into account, and starts decrementing an empty direct_pages_count[] level. This results in /proc/meminfo showing crazy things like: DirectMap2M: 18446744073709543424 kB Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12Merge commit 'linus' into x86/urgent, to pick up recent x86 changesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main changes in this cycle were: - make the debugfs 'kernel_page_tables' file read-only, as it only has read ops. (Borislav Petkov) - micro-optimize clflush_cache_range() (Chris Wilson) - swiotlb enhancements, which fixes certain KVM emulated devices (Igor Mammedov) - fix an LDT related debug message (Jan Beulich) - modularize CONFIG_X86_PTDUMP (Kees Cook) - tone down an overly alarming warning (Laura Abbott) - Mark variable __initdata (Rasmus Villemoes) - PAT additions (Toshi Kani)" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Micro-optimise clflush_cache_range() x86/mm/pat: Change free_memtype() to support shrinking case x86/mm/pat: Add untrack_pfn_moved for mremap x86/mm: Drop WARN from multi-BAR check x86/LDT: Print the real LDT base address x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN x86/mm: Introduce max_possible_pfn x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() as __initdata x86/mm: Turn CONFIG_X86_PTDUMP into a module
2016-01-11x86/mm: Add barriers and document switch_mm()-vs-flush synchronizationAndy Lutomirski
When switch_mm() activates a new PGD, it also sets a bit that tells other CPUs that the PGD is in use so that TLB flush IPIs will be sent. In order for that to work correctly, the bit needs to be visible prior to loading the PGD and therefore starting to fill the local TLB. Document all the barriers that make this work correctly and add a couple that were missing. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-08x86/mm: Micro-optimise clflush_cache_range()Chris Wilson
Whilst inspecting the asm for clflush_cache_range() and some perf profiles that required extensive flushing of single cachelines (from part of the intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading boot_cpu_data.x86_clflush_size on every iteration of the loop. We can manually hoist that read which perf regarded as taking ~25% of the function time for a single cacheline flush. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Borislav Petkov <bp@suse.de> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Sai Praneeth <sai.praneeth.prakhya@intel.com> Link: http://lkml.kernel.org/r/1452246933-10890-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-05x86/mm/pat: Change free_memtype() to support shrinking caseToshi Kani
Using mremap() to shrink the map size of a VM_PFNMAP range causes the following error message, and leaves the pfn range allocated. x86/PAT: test:3493 freeing invalid memtype [mem 0x483200000-0x4863fffff] This is because rbt_memtype_erase(), called from free_memtype() with spin_lock held, only supports to free a whole memtype node in memtype_rbroot. Therefore, this patch changes rbt_memtype_erase() to support a request that shrinks the size of a memtype node for mremap(). memtype_rb_exact_match() is renamed to memtype_rb_match(), and is enhanced to support EXACT_MATCH and END_MATCH in @match_type. Since the memtype_rbroot tree allows overlapping ranges, rbt_memtype_erase() checks with EXACT_MATCH first, i.e. free a whole node for the munmap case. If no such entry is found, it then checks with END_MATCH, i.e. shrink the size of a node from the end for the mremap case. On the mremap case, rbt_memtype_erase() proceeds in two steps, 1) remove the node, and then 2) insert the updated node. This allows proper update of augmented values, subtree_max_end, in the tree. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: stsp@list.ru Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1450832064-10093-3-git-send-email-toshi.kani@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-05x86/mm/pat: Add untrack_pfn_moved for mremapToshi Kani
mremap() with MREMAP_FIXED on a VM_PFNMAP range causes the following WARN_ON_ONCE() message in untrack_pfn(). WARNING: CPU: 1 PID: 3493 at arch/x86/mm/pat.c:985 untrack_pfn+0xbd/0xd0() Call Trace: [<ffffffff817729ea>] dump_stack+0x45/0x57 [<ffffffff8109e4b6>] warn_slowpath_common+0x86/0xc0 [<ffffffff8109e5ea>] warn_slowpath_null+0x1a/0x20 [<ffffffff8106a88d>] untrack_pfn+0xbd/0xd0 [<ffffffff811d2d5e>] unmap_single_vma+0x80e/0x860 [<ffffffff811d3725>] unmap_vmas+0x55/0xb0 [<ffffffff811d916c>] unmap_region+0xac/0x120 [<ffffffff811db86a>] do_munmap+0x28a/0x460 [<ffffffff811dec33>] move_vma+0x1b3/0x2e0 [<ffffffff811df113>] SyS_mremap+0x3b3/0x510 [<ffffffff817793ee>] entry_SYSCALL_64_fastpath+0x12/0x71 MREMAP_FIXED moves a pfnmap from old vma to new vma. untrack_pfn() is called with the old vma after its pfnmap page table has been removed, which causes follow_phys() to fail. The new vma has a new pfnmap to the same pfn & cache type with VM_PAT set. Therefore, we only need to clear VM_PAT from the old vma in this case. Add untrack_pfn_moved(), which clears VM_PAT from a given old vma. move_vma() is changed to call this function with the old vma when VM_PFNMAP is set. move_vma() then calls do_munmap(), and untrack_pfn() is a no-op since VM_PAT is cleared. Reported-by: Stas Sergeev <stsp@list.ru> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1450832064-10093-2-git-send-email-toshi.kani@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-29x86/mm: Drop WARN from multi-BAR checkLaura Abbott
ioremapping multiple BARs produces a warning with a message "Your kernel is fine". This message mostly serves to comfort kernel developers. Users do not read the message, they only see the big scary warning which means something must be horribly broken with their system. Less dramatically, the warn also sets the taint flag which makes it difficult to differentiate problems. If the kernel is actually fine as the warning claims it doesn't make sense for it to be tainted. Change the WARN_ONCE to a pr_warn with the caller of the ioremap. Signed-off-by: Laura Abbott <labbott@fedoraproject.org> Link: http://lkml.kernel.org/r/1450728074-31029-1-git-send-email-labbott@fedoraproject.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19x86/cpufeature: Remove unused and seldomly used cpu_has_xx macrosBorislav Petkov
Those are stupid and code should use static_cpu_has_safe() or boot_cpu_has() instead. Kill the least used and unused ones. The remaining ones need more careful inspection before a conversion can happen. On the TODO. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de Cc: David Sterba <dsterba@suse.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Matt Mackall <mpm@selenic.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19Merge branch 'linus' into x86/cleanupsThomas Gleixner
Pull in upstream changes so we can apply depending patches.
2015-12-15Fix user-visible spelling errorLinus Torvalds
Pavel Machek reports a warning about W+X pages found in the "Persisent" kmap area. After grepping for it (using the correct spelling), and not finding it, I noticed how the debug printk was just misspelled. Fix it. The actual mapping bug that Pavel reported is still open. It's apparently a separate issue from the known EFI page tables, looks like it's related to the HIGHMEM mappings. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-06x86/mm: Introduce max_possible_pfnIgor Mammedov
max_possible_pfn will be used for tracking max possible PFN for memory that isn't present in E820 table and could be hotplugged later. By default max_possible_pfn is initialized with max_pfn, but later it could be updated with highest PFN of hotpluggable memory ranges declared in ACPI SRAT table if any present. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: fujita.tomonori@lab.ntt.co.jp Cc: konrad.wilk@oracle.com Cc: pbonzini@redhat.com Cc: revers@redhat.com Cc: riel@redhat.com Link: http://lkml.kernel.org/r/1449234426-273049-2-git-send-email-imammedo@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-05x86/mpx: Fix instruction decoder conditionDave Hansen
MPX decodes instructions in order to tell which bounds register was violated. Part of this decoding involves looking at the "REX prefix" which is a special instrucion prefix used to retrofit support for new registers in to old instructions. The X86_REX_*() macros are defined to return actual bit values: #define X86_REX_R(rex) ((rex) & 4) *not* boolean values. However, the MPX code was checking for them like they were booleans. This might have led to us mis-decoding the "REX prefix" and giving false information out to userspace about bounds violations. X86_REX_B() actually is bit 1, so this is really only broken for the X86_REX_X() case. Fix the conditionals up to tolerate the non-boolean values. Fixes: fcc7ffd67991 "x86, mpx: Decode MPX instruction to get bound violation information" Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20151201003113.D800C1E0@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-04x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-onlyBorislav Petkov
File should be created with S_IRUSR and not with S_IWUSR too because writing to it doesn't make any sense. I mean, we don't have a ->write method anyway but let's have the permissions correct too. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448885579-32506-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-29x86/mm/pat: Ensure cpa->pfn only contains page frame numbersMatt Fleming
The x86 pageattr code is confused about the data that is stored in cpa->pfn, sometimes it's treated as a page frame number, sometimes it's treated as an unshifted physical address, and in one place it's treated as a pte. The result of this is that the mapping functions do not map the intended physical address. This isn't a problem in practice because most of the addresses we're mapping in the EFI code paths are already mapped in 'trampoline_pgd' and so the pageattr mapping functions don't actually do anything in this case. But when we move to using a separate page table for the EFI runtime this will be an issue. Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Borislav Petkov <bp@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1448658575-17029-3-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-25x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_deferJuergen Gross
pte_update_defer can be removed as it is always set to the same function as pte_update. So any usage of pte_update_defer() can be replaced by pte_update(). pmd_update and pmd_update_defer are always set to paravirt_nop, so they can just be nuked. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: jeremy@goop.org Cc: chrisw@sous-sol.org Cc: akataria@vmware.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xen.org Cc: konrad.wilk@oracle.com Cc: david.vrabel@citrix.com Cc: boris.ostrovsky@oracle.com Link: http://lkml.kernel.org/r/1447771879-1806-1-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-23x86/mm: Turn CONFIG_X86_PTDUMP into a moduleKees Cook
Being able to examine page tables is handy, so make this a module that can be loaded as needed. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20151120010755.GA9060@www.outflux.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-22Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "This update contains: - MPX updates for handling 32bit processes - A fix for a long standing bug in 32bit signal frame handling related to FPU/XSAVE state - Handle get_xsave_addr() correctly in KVM - Fix SMAP check under paravirtualization - Add a comment to the static function trace entry to avoid further confusion about the difference to dynamic tracing" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Fix SMAP check in PVOPS environments x86/ftrace: Add comment on static function tracing x86/fpu: Fix get_xsave_addr() behavior under virtualization x86/fpu: Fix 32-bit signal frame handling x86/mpx: Fix 32-bit address space calculation x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels
2015-11-15Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A couple of fixes and updates related to x86: - Fix the W+X check regression on XEN - The real fix for the low identity map trainwreck - Probe legacy PIC early instead of unconditionally allocating legacy irqs - Add cpu verification to long mode entry - Adjust the cache topology to AMD Fam17H systems - Let Merrifield use the TSC across S3" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Call verify_cpu() after having entered long mode too x86/setup: Fix low identity map for >= 2GB kernel range x86/mm: Skip the hypervisor range when walking PGD x86/AMD: Fix last level cache topology for AMD Fam17h systems x86/irq: Probe for PIC presence before allocating descs for legacy IRQs x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield
2015-11-12x86/mpx: Fix 32-bit address space calculationDave Hansen
I received a bug report that running 32-bit MPX binaries on 64-bit kernels was broken. I traced it down to this little code snippet. We were switching our "number of bounds directory entries" calculation correctly. But, we didn't switch the other side of the calculation: the virtual space size. This meant that we were calculating an absurd size for bd_entry_virt_space() on 32-bit because we used the 64-bit virt_space. This was _also_ broken for 32-bit kernels running on 64-bit hardware since boot_cpu_data.x86_virt_bits=48 even when running in 32-bit mode. Correct that and properly handle all 3 possible cases: 1. 32-bit binary on 64-bit kernel 2. 64-bit binary on 64-bit kernel 3. 32-bit binary on 32-bit kernel This manifested in having bounds tables not properly unmapped. It "leaked" memory but had no functional impact otherwise. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151111181934.FA7FAC34@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-12x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernelsDave Hansen
When you call get_user(foo, bar), you effectively do a copy_from_user(&foo, bar, sizeof(*bar)); Note that the sizeof() is implicit. When we reach out to userspace to try to zap an entire "bounds table" we need to go read a "bounds directory entry" in order to locate the table's address. The size of a "directory entry" depends on the binary being run and is always the size of a pointer. But, when we have a 64-bit kernel and a 32-bit application, the directory entry is still only 32-bits long, but we fetch it with a 64-bit pointer which makes get_user() does a 64-bit fetch. Reading 4 extra bytes isn't harmful, unless we are at the end of and run off the table. It might also cause the zero page to get faulted in unnecessarily even if you are not at the end. Fix it up by doing a special 32-bit get_user() via a cast when we have 32-bit userspace. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave@sr71.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151111181931.3ACF6822@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-10Merge tag 'libnvdimm-for-4.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "Outside of the new ACPI-NFIT hot-add support this pull request is more notable for what it does not contain, than what it does. There were a handful of development topics this cycle, dax get_user_pages, dax fsync, and raw block dax, that need more more iteration and will wait for 4.5. The patches to make devm and the pmem driver NUMA aware have been in -next for several weeks. The hot-add support has not, but is contained to the NFIT driver and is passing unit tests. The coredump support is straightforward and was looked over by Jeff. All of it has received a 0day build success notification across 107 configs. Summary: - Add support for the ACPI 6.0 NFIT hot add mechanism to process updates of the NFIT at runtime. - Teach the coredump implementation how to filter out DAX mappings. - Introduce NUMA hints for allocations made by the pmem driver, and as a side effect all devm allocations now hint their NUMA node by default" * tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: coredump: add DAX filtering for FDPIC ELF coredumps coredump: add DAX filtering for ELF coredumps acpi: nfit: Add support for hot-add nfit: in acpi_nfit_init, break on a 0-length table pmem, memremap: convert to numa aware allocations devm_memremap_pages: use numa_mem_id devm: make allocations numa aware by default devm_memremap: convert to return ERR_PTR devm_memunmap: use devres_release() pmem: kill memremap_pmem() x86, mm: quiet arch_add_memory()
2015-11-09kmap_atomic_to_page() has no users, remove itNicolas Pitre
Removal started in commit 5bbeed12bdc3 ("sparc32: drop unused kmap_atomic_to_page"). Let's do it across the whole tree. Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-07x86/mm: Skip the hypervisor range when walking PGDBoris Ostrovsky
The range between 0xffff800000000000 and 0xffff87ffffffffff is reserved for hypervisor and therefore we should not try to follow PGD's indexes corresponding to those addresses. While this has always been a problem, with the new W+X warning mechanism ptdump_walk_pgd_level_core() can now be called during boot, causing a PV Xen guest to crash. [ tglx: Replaced the macro with a readable inline ] Fixes: e1a58320a38d "x86/mm: Warn on W^X mappings" Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel@lists.xen.org Link: http://lkml.kernel.org/r/1446749795-27764-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-06kasan: update log messagesAndrey Konovalov
We decided to use KASAN as the short name of the tool and KernelAddressSanitizer as the full one. Update log messages according to that. Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Konstantin Serebryany <kcc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-04Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "The main changes are: continued PAT work by Toshi Kani, plus a new boot time warning about insecure RWX kernel mappings, by Stephen Smalley. The new CONFIG_DEBUG_WX=y warning is marked default-y if CONFIG_DEBUG_RODATA=y is already eanbled, as a special exception, as these bugs are hard to notice and this check already found several live bugs" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Warn on W^X mappings x86/mm: Fix no-change case in try_preserve_large_page() x86/mm: Fix __split_large_page() to handle large PAT bit x86/mm: Fix try_preserve_large_page() to handle large PAT bit x86/mm: Fix gup_huge_p?d() to handle large PAT bit x86/mm: Fix slow_virt_to_phys() to handle large PAT bit x86/mm: Fix page table dump to show PAT bit x86/asm: Add pud_pgprot() and pmd_pgprot() x86/asm: Fix pud/pmd interfaces to handle large PAT bit x86/asm: Add pud/pmd mask interfaces to handle large PAT bit x86/asm: Move PUD_PAGE macros to page_types.h x86/vdso32: Define PGTABLE_LEVELS to 32bit VDSO
2015-11-04Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu changes from Ingo Molnar: "There are two main areas of changes: - Rework of the extended FPU state code to robustify the kernel's usage of cpuid provided xstate sizes - and related changes (Dave Hansen)" - math emulation enhancements: new modern FPU instructions support, with testcases, plus cleanups (Denys Vlasnko)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/fpu: Fixup uninitialized feature_name warning x86/fpu/math-emu: Add support for FISTTP instructions x86/fpu/math-emu, selftests: Add test for FISTTP instructions x86/fpu/math-emu: Add support for FCMOVcc insns x86/fpu/math-emu: Add support for F[U]COMI[P] insns x86/fpu/math-emu: Remove define layer for undocumented opcodes x86/fpu/math-emu, selftests: Add tests for FCMOV and FCOMI insns x86/fpu/math-emu: Remove !NO_UNDOC_CODE x86/fpu: Check CPU-provided sizes against struct declarations x86/fpu: Check to ensure increasing-offset xstate offsets x86/fpu: Correct and check XSAVE xstate size calculations x86/fpu: Add C structures for AVX-512 state components x86/fpu: Rework YMM definition x86/fpu/mpx: Rework MPX 'xstate' types x86/fpu: Add xfeature_enabled() helper instead of test_bit() x86/fpu: Remove 'xfeature_nr' x86/fpu: Rework XSTATE_* macros to remove magic '2' x86/fpu: Rename XFEATURES_NR_MAX x86/fpu: Rename XSAVE macros x86/fpu: Remove partial LWP support definitions ...
2015-11-04Merge branch 'ras-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS changes from Ingo Molnar: "The main system reliability related changes were from x86, but also some generic RAS changes: - AMD MCE error injection subsystem enhancements. (Aravind Gopalakrishnan) - Fix MCE and CPU hotplug interaction bug. (Ashok Raj) - kcrash bootup robustness fix. (Baoquan He) - kcrash cleanups. (Borislav Petkov) - x86 microcode driver rework: simplify it by unmodularizing it and other cleanups. (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init() x86/mce: Add a Scalable MCA vendor flags bit MAINTAINERS: Unify the microcode driver section x86/microcode/intel: Move #ifdef DEBUG inside the function x86/microcode/amd: Remove maintainers from comments x86/microcode: Remove modularization leftovers x86/microcode: Merge the early microcode loader x86/microcode: Unmodularize the microcode driver x86/mce: Fix thermal throttling reporting after kexec kexec/crash: Say which char is the unrecognized x86/setup/crash: Check memblock_reserve() retval x86/setup/crash: Cleanup some more x86/setup/crash: Remove alignment variable x86/setup: Cleanup crashkernel reservation functions x86/amd_nb, EDAC: Rename amd_get_node_id() x86/setup: Do not reserve crashkernel high memory if low reservation failed x86/microcode/amd: Do not overwrite final patch levels x86/microcode/amd: Extract current patch level read to a function x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts ...
2015-10-27Merge tag 'efi-next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi Pull EFI fix from Matt Fleming: - Fix a kernel panic by not passing EFI virtual mapping addresses to __pa() in the x86 pageattr code. Since these virtual addreses are not part of the direct mapping or kernel text mapping, passing them to __pa() will trigger a BUG_ON() when CONFIG_DEBUG_VIRTUAL is enabled. (Sai Praneeth Prakhya) Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-25x86/efi: Fix kernel panic when CONFIG_DEBUG_VIRTUAL is enabledSai Praneeth
When CONFIG_DEBUG_VIRTUAL is enabled, all accesses to __pa(address) are monitored to see whether address falls in direct mapping or kernel text mapping (see Documentation/x86/x86_64/mm.txt for details), if it does not, the kernel panics. During 1:1 mapping of EFI runtime services we access virtual addresses which are == physical addresses, thus the 1:1 mapping and these addresses do not fall in either of the above two regions and hence when passed as arguments to __pa() kernel panics as reported by Dave Hansen here https://lkml.kernel.org/r/5462999A.7090706@intel.com. So, before calling __pa() virtual addresses should be validated which results in skipping call to split_page_count() and that should be fine because it is used to keep track of everything *but* 1:1 mappings. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Reported-by: Dave Hansen <dave.hansen@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Glenn P Williamson <glenn.p.williamson@intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>