From 466318a87f28cb3ba0d08a3b7ef1a37ae73d5aa7 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Mon, 3 Jun 2013 10:33:55 -0400 Subject: xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU. The xen_play_dead is an undead function. When the vCPU is told to offline it ends up calling xen_play_dead wherin it calls the VCPUOP_down hypercall which offlines the vCPU. However, when the vCPU is onlined back, it resumes execution right after VCPUOP_down hypercall. That was OK (albeit the API for play_dead assumes that the CPU stays dead and never returns) but with commit 4b0c0f294 (tick: Cleanup NOHZ per cpu data on cpu down) that is no longer safe as said commit resets the ts->inidle which at the start of the cpu_idle loop was set. The net effect is that we get this warn: Broke affinity for irq 16 installing Xen timer for CPU 1 cpu 1 spinlock event irq 48 ------------[ cut here ]------------ WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0() Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33 #1 Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009 ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0 ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010 0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0 Call Trace: [] dump_stack+0x19/0x1b [] warn_slowpath_common+0x6b/0xa0 [] warn_slowpath_null+0x15/0x20 [] tick_nohz_idle_exit+0x195/0x1b0 [] cpu_startup_entry+0x205/0x250 [] cpu_bringup_and_idle+0x13/0x15 ---[ end trace 915c8c486004dda1 ]--- b/c ts_inidle is set to zero. Thomas suggested that we just add a workaround to call tick_nohz_idle_enter before returning from xen_play_dead() - and that is what this patch does and fixes the issue. We also add the stable part b/c git commit 4b0c0f294 is on the stable tree. CC: stable@vger.kernel.org Suggested-by: Thomas Gleixner Signed-off-by: Konrad Rzeszutek Wilk diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c index fb44426..d99cae8 100644 --- a/arch/x86/xen/smp.c +++ b/arch/x86/xen/smp.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -447,6 +448,13 @@ static void __cpuinit xen_play_dead(void) /* used only with HOTPLUG_CPU */ play_dead_common(); HYPERVISOR_vcpu_op(VCPUOP_down, smp_processor_id(), NULL); cpu_bringup(); + /* + * commit 4b0c0f294 (tick: Cleanup NOHZ per cpu data on cpu down) + * clears certain data that the cpu_idle loop (which called us + * and that we return from) expects. The only way to get that + * data back is to call: + */ + tick_nohz_idle_enter(); } #else /* !CONFIG_HOTPLUG_CPU */ -- cgit v0.10.2 From b2c75c446ae72387916e2caf6e6dca815b6e5e6e Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Fri, 7 Jun 2013 15:26:03 -0400 Subject: xen/tmem: Don't over-write tmem_frontswap_poolid after tmem_frontswap_init set it. Commit 10a7a0771399a57a297fca9615450dbb3f88081a ("xen: tmem: enable Xen tmem shim to be built/loaded as a module") allows the tmem module to be loaded any time. For this work the frontswap API had to be able to asynchronously to call tmem_frontswap_init before or after the swap image had been set. That was added in git commit 905cd0e1bf9ffe82d6906a01fd974ea0f70be97a ("mm: frontswap: lazy initialization to allow tmem backends to build/run as modules"). Which means we could do this (The common case): modprobe tmem [so calls frontswap_register_ops, no ->init] modifies tmem_frontswap_poolid = -1 swapon /dev/xvda1 [__frontswap_init, calls -> init, tmem_frontswap_poolid is < 0 so tmem hypercall done] Or the failing one: swapon /dev/xvda1 [calls __frontswap_init, sets the need_init bitmap] modprobe tmem [calls frontswap_register_ops, -->init calls, finds out tmem_frontswap_poolid is 0, does not make a hypercall. Later in the module_init, sets tmem_frontswap_poolid=-1] Which meant that in the failing case we would not call the hypercall to initialize the pool and never be able to make any frontswap backend calls. Moving the frontswap_register_ops after setting the tmem_frontswap_poolid fixes it. Signed-off-by: Konrad Rzeszutek Wilk Reviewed-by: Bob Liu diff --git a/drivers/xen/tmem.c b/drivers/xen/tmem.c index cc072c6..0f0493c 100644 --- a/drivers/xen/tmem.c +++ b/drivers/xen/tmem.c @@ -379,10 +379,10 @@ static int xen_tmem_init(void) #ifdef CONFIG_FRONTSWAP if (tmem_enabled && frontswap) { char *s = ""; - struct frontswap_ops *old_ops = - frontswap_register_ops(&tmem_frontswap_ops); + struct frontswap_ops *old_ops; tmem_frontswap_poolid = -1; + old_ops = frontswap_register_ops(&tmem_frontswap_ops); if (IS_ERR(old_ops) || old_ops) { if (IS_ERR(old_ops)) return PTR_ERR(old_ops); -- cgit v0.10.2