From 13bd817bb88499ce1dc1dfdaffcde17fa492aca5 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V"
Date: Mon, 11 May 2015 11:56:01 +0530
Subject: powerpc/thp: Serialize pmd clear against a linux page table walk.

Serialize against find_linux_pte_or_hugepte(), which does a lock-less
lookup in page tables with local interrupts disabled. For huge pages it
casts pmd_t to pte_t. Since the format of pte_t is different from pmd_t,
we want to prevent a transit from a pmd pointing to a page table to a
pmd pointing to a huge page (and back) while interrupts are disabled.
We clear the pmd in order to possibly replace it with a page table
pointer in different code paths, so make sure we wait for any parallel
find_linux_pte_or_hugepte() to finish first.

Without this patch, a find_linux_pte_or_hugepte() running in parallel
to __split_huge_zero_page_pmd(), do_huge_pmd_wp_page_fallback() or
zap_huge_pmd() can run into the above issue. With
__split_huge_zero_page_pmd() and do_huge_pmd_wp_page_fallback() we
clear the hugepage pte before inserting the pmd entry with a regular
pgtable address. Such a clear needs to wait for the parallel
find_linux_pte_or_hugepte() to finish.

With zap_huge_pmd(), we can run into a hugepage pte getting zapped due
to MADV_DONTNEED while another CPU faults it in as small pages.

Reported-by: Kirill A. Shutemov
Signed-off-by: Aneesh Kumar K.V
Reviewed-by: Kirill A. Shutemov
Signed-off-by: Michael Ellerman

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 59daa5e..6bfadf1 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -839,6 +839,17 @@ pmd_t pmdp_get_and_clear(struct mm_struct *mm,
 	 * hash fault look at them.
 	 */
 	memset(pgtable, 0, PTE_FRAG_SIZE);
+	/*
+	 * Serialize against find_linux_pte_or_hugepte which does lock-less
+	 * lookup in page tables with local interrupts disabled. For huge pages
+	 * it casts pmd_t to pte_t. Since format of pte_t is different from
+	 * pmd_t we want to prevent transit from pmd pointing to page table
+	 * to pmd pointing to huge page (and back) while interrupts are disabled.
+	 * We clear pmd to possibly replace it with page table pointer in
+	 * different code paths. So make sure we wait for the parallel
+	 * find_linux_pte_or_hugepte to finish.
+	 */
+	kick_all_cpus_sync();
 	return old_pmd;
 }

--
cgit v0.10.2
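A minimal C sketch of the protocol the new comment describes may help here; it is an illustration under the commit's stated semantics, not kernel source. local_irq_save()/local_irq_restore(), READ_ONCE(), pmd_populate() and kick_all_cpus_sync() are the real kernel primitives; walk_page_tables() and clear_pmd() are hypothetical placeholders for the lock-less walk and the pmd clear.

	/*
	 * Sketch only, assuming the semantics in the commit message above.
	 * walk_page_tables() and clear_pmd() are hypothetical placeholders.
	 */

	/* Reader side, shaped like find_linux_pte_or_hugepte(): */
	static pte_t lockless_read_pte(struct mm_struct *mm, unsigned long addr)
	{
		unsigned long flags;
		unsigned shift;
		pte_t *ptep, pte = __pte(0);

		local_irq_save(flags);		/* writer's sync IPI must wait for us  */
		ptep = walk_page_tables(mm->pgd, addr, &shift);	/* placeholder */
		if (ptep)
			pte = READ_ONCE(*ptep);	/* pmd cannot change shape in here     */
		local_irq_restore(flags);	/* a pending sync IPI is delivered now */
		return pte;
	}

	/* Writer side, shaped like pmdp_get_and_clear(): */
	static pmd_t clear_and_repopulate(struct mm_struct *mm, unsigned long addr,
					  pmd_t *pmdp, pgtable_t pgtable)
	{
		pmd_t old_pmd = clear_pmd(mm, addr, pmdp);	/* placeholder */

		/*
		 * Broadcast an IPI and wait for every CPU to take it. A CPU
		 * still inside the reader's IRQ-off section cannot take it,
		 * so every lock-less walk that saw the old pmd is finished
		 * before we repopulate below.
		 */
		kick_all_cpus_sync();
		pmd_populate(mm, pmdp, pgtable);
		return old_pmd;
	}

The design point is that disabling interrupts on the reader doubles as a lock: the writer never takes a lock either, it simply refuses to install the new pmd until the IPI broadcast proves no reader is mid-walk.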
From 7b868e81be38d5ad4f4aa4be819a5fa543cc5ee8 Mon Sep 17 00:00:00 2001
From: "Aneesh Kumar K.V"
Date: Mon, 11 May 2015 11:58:29 +0530
Subject: powerpc/mm: Return NULL for not present hugetlb page

We need to check whether the pte is present in follow_huge_addr() and
properly return NULL if the mapping is not present. Also use READ_ONCE()
when dereferencing the pte_t address.

Without this patch, we may wrongly return a zero pfn page in
follow_huge_addr().

Reviewed-by: David Gibson
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Michael Ellerman

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 0ce968b..3385e3d 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -689,27 +689,34 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 struct page *
 follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
 {
-	pte_t *ptep;
-	struct page *page;
+	pte_t *ptep, pte;
 	unsigned shift;
 	unsigned long mask, flags;
+	struct page *page = ERR_PTR(-EINVAL);
+
+	local_irq_save(flags);
+	ptep = find_linux_pte_or_hugepte(mm->pgd, address, &shift);
+	if (!ptep)
+		goto no_page;
+	pte = READ_ONCE(*ptep);
 	/*
+	 * Verify it is a huge page else bail.
 	 * Transparent hugepages are handled by generic code. We can skip them
 	 * here.
 	 */
-	local_irq_save(flags);
-	ptep = find_linux_pte_or_hugepte(mm->pgd, address, &shift);
+	if (!shift || pmd_trans_huge(__pmd(pte_val(pte))))
+		goto no_page;

-	/* Verify it is a huge page else bail. */
-	if (!ptep || !shift || pmd_trans_huge(*(pmd_t *)ptep)) {
-		local_irq_restore(flags);
-		return ERR_PTR(-EINVAL);
+	if (!pte_present(pte)) {
+		page = NULL;
+		goto no_page;
 	}
 	mask = (1UL << shift) - 1;
-	page = pte_page(*ptep);
+	page = pte_page(pte);
 	if (page)
 		page += (address & mask) / PAGE_SIZE;
+no_page:
 	local_irq_restore(flags);
 	return page;
 }

--
cgit v0.10.2

From ffb2d78eca08a1451137583d4e435aecfd6af809 Mon Sep 17 00:00:00 2001
From: Daniel Axtens
Date: Tue, 12 May 2015 13:23:59 +1000
Subject: powerpc/mce: fix off by one errors in mce event handling

Before 69111bac42f5 ("powerpc: Replace __get_cpu_var uses"), in
save_mce_event, index got the value of mce_nest_count, and
mce_nest_count was incremented *after* index was set. However, that
patch changed the behaviour so that mce_nest_count was incremented
*before* setting index.

This causes an off-by-one error, as get_mce_event sets index as
mce_nest_count - 1 before reading mce_event. Thus get_mce_event reads
bogus data, causing warnings like "Machine Check Exception, Unknown
event version 0 !" and breaking MCE handling.

Restore the old behaviour and unbreak MCE handling by subtracting one
from the newly incremented value.

The same broken change occurred in machine_check_queue_event (which
sets a queue read by machine_check_process_queued_event). Fix that
too, unbreaking the printing of MCE information.

Fixes: 69111bac42f5 ("powerpc: Replace __get_cpu_var uses")
CC: stable@vger.kernel.org
CC: Mahesh Salgaonkar
CC: Christoph Lameter
Signed-off-by: Daniel Axtens
Signed-off-by: Michael Ellerman

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 15c99b6..b2eb468 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -73,7 +73,7 @@ void save_mce_event(struct pt_regs *regs, long handled,
 			 uint64_t nip, uint64_t addr)
 {
 	uint64_t srr1;
-	int index = __this_cpu_inc_return(mce_nest_count);
+	int index = __this_cpu_inc_return(mce_nest_count) - 1;
 	struct machine_check_event *mce = this_cpu_ptr(&mce_event[index]);

 	/*
@@ -184,7 +184,7 @@ void machine_check_queue_event(void)
 	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
 		return;

-	index = __this_cpu_inc_return(mce_queue_count);
+	index = __this_cpu_inc_return(mce_queue_count) - 1;
 	/* If queue is full, just return for now. */
 	if (index >= MAX_MC_EVT) {
 		__this_cpu_dec(mce_queue_count);

--
cgit v0.10.2

From 5e95235ccd5442d4a4fe11ec4eb99ba1b7959368 Mon Sep 17 00:00:00 2001
From: Anton Blanchard
Date: Thu, 14 May 2015 14:45:40 +1000
Subject: powerpc: Align TOC to 256 bytes

Recent toolchains force the TOC to be 256-byte aligned. We need to
enforce this alignment in our linker script, otherwise pointers to our
TOC variables (__toc_start, __prom_init_toc_start) could be incorrect.

If they are bad, we die a few hundred instructions into boot.

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard
Signed-off-by: Michael Ellerman

diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index f096e72..1db6851 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -213,6 +213,7 @@ SECTIONS
 		*(.opd)
 	}

+	. = ALIGN(256);
 	.got : AT(ADDR(.got) - LOAD_OFFSET) {
 		__toc_start = .;
 #ifndef CONFIG_RELOCATABLE

--
cgit v0.10.2
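Stepping back to the mce.c fix above: the off-by-one is easy to model in plain C. The program below is an illustration, not kernel code; a single global stands in for the per-cpu counter, and __this_cpu_inc_return() is modelled by pre-increment, which likewise returns the new value.

	#include <stdio.h>

	/*
	 * Userspace model of the mce_nest_count off-by-one (illustration
	 * only, not kernel code). __this_cpu_inc_return(x) increments the
	 * per-cpu variable and returns the *new* value, like pre-increment.
	 */
	static int mce_nest_count;

	static int index_original(void)		/* before 69111bac42f5 */
	{
		int index = mce_nest_count;	/* take the old value...          */
		mce_nest_count++;		/* ...then bump: first event -> 0 */
		return index;
	}

	static int index_broken(void)		/* after 69111bac42f5 */
	{
		return ++mce_nest_count;	/* inc_return: first event -> 1   */
	}

	static int index_fixed(void)		/* with the patch above */
	{
		return ++mce_nest_count - 1;	/* first event -> slot 0 again    */
	}

	int main(void)
	{
		printf("original: slot %d\n", index_original());	/* 0 */
		mce_nest_count = 0;
		printf("broken:   slot %d\n", index_broken());		/* 1 */
		mce_nest_count = 0;
		printf("fixed:    slot %d\n", index_fixed());		/* 0 */
		return 0;
	}

With the broken variant the writer stores the first event in mce_event[1], while get_mce_event(), reading index count - 1 = 0, sees a slot that was never written; hence the "Unknown event version 0 !" warnings.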