From 831d52bc153971b70e64eccfbed2b232394f22f8 Mon Sep 17 00:00:00 2001 From: Suresh Siddha Date: Thu, 3 Feb 2011 12:20:04 -0800 Subject: x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm Clearing the cpu in prev's mm_cpumask early will avoid the flush tlb IPI's while the cr3 is still pointing to the prev mm. And this window can lead to the possibility of bogus TLB fills resulting in strange failures. One such problematic scenario is mentioned below. T1. CPU-1 is context switching from mm1 to mm2 context and got a NMI etc between the point of clearing the cpu from the mm_cpumask(mm1) and before reloading the cr3 with the new mm2. T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with flushing the TLB for mm1. It doesn't send the flush TLB to CPU-1 as it doesn't see that cpu listed in the mm_cpumask(mm1). T3. After the TLB flush is complete, CPU-2 goes ahead and frees the page-table pages associated with the removed vma mapping. T4. CPU-2 now allocates those freed page-table pages for something else. T5. As the CR3 and TLB caches for mm1 is still active on CPU-1, CPU-1 can potentially speculate and walk through the page-table caches and can insert new TLB entries. As the page-table pages are already freed and being used on CPU-2, this page walk can potentially insert a bogus global TLB entry depending on the (random) contents of the page that is being used on CPU-2. T6. This bogus TLB entry being global will be active across future CR3 changes and can result in weird memory corruption etc. To avoid this issue, for the prev mm that is handing over the cpu to another mm, clear the cpu from the mm_cpumask(prev) after the cr3 is changed. Marking it for -stable, though we haven't seen any reported failure that can be attributed to this. Signed-off-by: Suresh Siddha Acked-by: Ingo Molnar Cc: stable@kernel.org [v2.6.32+] Signed-off-by: Linus Torvalds diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 4a2d4e0..8b5393e 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -36,8 +36,6 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, unsigned cpu = smp_processor_id(); if (likely(prev != next)) { - /* stop flush ipis for the previous mm */ - cpumask_clear_cpu(cpu, mm_cpumask(prev)); #ifdef CONFIG_SMP percpu_write(cpu_tlbstate.state, TLBSTATE_OK); percpu_write(cpu_tlbstate.active_mm, next); @@ -47,6 +45,9 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, /* Re-load page tables */ load_cr3(next->pgd); + /* stop flush ipis for the previous mm */ + cpumask_clear_cpu(cpu, mm_cpumask(prev)); + /* * load the LDT, if the LDT is different: */ -- cgit v0.10.2 From f266a5110d453b7987194460ac7edd31f1a5426c Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 3 Feb 2011 15:09:41 +0100 Subject: lockdep, timer: Fix del_timer_sync() annotation Calling local_bh_enable() will want to actually start processing softirqs, which isn't a good idea since this can get called with IRQs disabled. Cure this by using _local_bh_enable() which doesn't start processing softirqs, and use raw_local_irq_save() to avoid any softirqs from happening without letting lockdep think IRQs are in fact disabled. Reported-by: Nick Bowler Signed-off-by: Peter Zijlstra Reviewed-by: Yong Zhang LKML-Reference: <20110203141548.039540914@chello.nl> Signed-off-by: Thomas Gleixner diff --git a/kernel/timer.c b/kernel/timer.c index 43ca993..d53ce66 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -969,10 +969,14 @@ EXPORT_SYMBOL(try_to_del_timer_sync); int del_timer_sync(struct timer_list *timer) { #ifdef CONFIG_LOCKDEP + unsigned long flags; + + raw_local_irq_save(flags); local_bh_disable(); lock_map_acquire(&timer->lockdep_map); lock_map_release(&timer->lockdep_map); - local_bh_enable(); + _local_bh_enable(); + raw_local_irq_restore(flags); #endif /* * don't use it in hardirq context, because it -- cgit v0.10.2