summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-04-10signal/x86: Delay calling signals in atomicOleg Nesterov
On x86_64 we must disable preemption before we enable interrupts for stack faults, int3 and debugging, because the current task is using a per CPU debug stack defined by the IST. If we schedule out, another task can come in and use the same stack and cause the stack to be corrupted and crash the kernel on return. When CONFIG_PREEMPT_RT_FULL is enabled, spin_locks become mutexes, and one of these is the spin lock used in signal handling. Some of the debug code (int3) causes do_trap() to send a signal. This function calls a spin lock that has been converted to a mutex and has the possibility to sleep. If this happens, the above issues with the corrupted stack is possible. Instead of calling the signal right away, for PREEMPT_RT and x86_64, the signal information is stored on the stacks task_struct and TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume code will send the signal when preemption is enabled. [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT_FULL to ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ] Cc: stable-rt@vger.kernel.org Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10signals: Allow rt tasks to cache one sigqueue structThomas Gleixner
To avoid allocation allow rt tasks to cache one sigqueue struct in task struct. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10posix-timers: Prevent broadcast signalsThomas Gleixner
Posix timers should not send broadcast signals and kernel only signals. Prevent it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10preempt-rt: Convert arm boot_lock to rawFrank Rowand
The arm boot_lock is used by the secondary processor startup code. The locking task is the idle thread, which has idle->sched_class == &idle_sched_class. idle_sched_class->enqueue_task == NULL, so if the idle task blocks on the lock, the attempt to wake it when the lock becomes available will fail: try_to_wake_up() ... activate_task() enqueue_task() p->sched_class->enqueue_task(rq, p, flags) Fix by converting boot_lock to a raw spin lock. Signed-off-by: Frank Rowand <frank.rowand@am.sony.com> Link: http://lkml.kernel.org/r/4E77B952.3010606@am.sony.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10ptrace: fix ptrace vs tasklist_lock raceSebastian Andrzej Siewior
As explained by Alexander Fyodorov <halcy@yandex.ru>: |read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel, |and it can remove __TASK_TRACED from task->state (by moving it to |task->saved_state). If parent does wait() on child followed by a sys_ptrace |call, the following race can happen: | |- child sets __TASK_TRACED in ptrace_stop() |- parent does wait() which eventually calls wait_task_stopped() and returns | child's pid |- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves | __TASK_TRACED flag to saved_state |- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive() The patch is based on his initial patch where an additional check is added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is taken in case the caller is interrupted between looking into ->state and ->saved_state. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10signal-revert-ptrace-preempt-magic.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10tracing: Account for preempt off in preempt_schedule()Steven Rostedt
The preempt_schedule() uses the preempt_disable_notrace() version because it can cause infinite recursion by the function tracer as the function tracer uses preempt_enable_notrace() which may call back into the preempt_schedule() code as the NEED_RESCHED is still set and the PREEMPT_ACTIVE has not been set yet. See commit: d1f74e20b5b064a130cd0743a256c2d3cfe84010 that made this change. The preemptoff and preemptirqsoff latency tracers require the first and last preempt count modifiers to enable tracing. But this skips the checks. Since we can not convert them back to the non notrace version, we can use the idle() hooks for the latency tracers here. That is, the start/stop_critical_timings() works well to manually start and stop the latency tracer for preempt off timings. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Clark Williams <williams@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10mips-enable-interrupts-in-signal.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10vtime-split-lock-and-seqcount.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10timekeeping-split-jiffies-lock.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10block: Shorten interrupt disabled regionsThomas Gleixner
Moving the blk_sched_flush_plug() call out of the interrupt/preempt disabled region in the scheduler allows us to replace local_irq_save/restore(flags) by local_irq_disable/enable() in blk_flush_plug(). Now instead of doing this we disable interrupts explicitely when we lock the request_queue and reenable them when we drop the lock. That allows interrupts to be handled when the plug list contains requests for more than one queue. Aside of that this change makes the scope of the irq disabled region more obvious. The current code confused the hell out of me when looking at: local_irq_save(flags); spin_lock(q->queue_lock); ... queue_unplugged(q...); scsi_request_fn(); spin_unlock(q->queue_lock); spin_lock(shost->host_lock); spin_unlock_irq(shost->host_lock); -------------------^^^ ???? spin_lock_irq(q->queue_lock); spin_unlock(q->lock); local_irq_restore(flags); Also add a comment to __blk_run_queue() documenting that q->request_fn() can drop q->queue_lock and reenable interrupts, but must return with q->queue_lock held and interrupts disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20110622174919.025446432@linutronix.de
2014-04-10sched: Consider pi boosting in setschedulerThomas Gleixner
If a PI boosted task policy/priority is modified by a setscheduler() call we unconditionally dequeue and requeue the task if it is on the runqueue even if the new priority is lower than the current effective boosted priority. This can result in undesired reordering of the priority bucket list. If the new priority is less or equal than the current effective we just store the new parameters in the task struct and leave the scheduler class and the runqueue untouched. This is handled when the task deboosts itself. Only if the new priority is higher than the effective boosted priority we apply the change immediately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: stable-rt@vger.kernel.org
2014-04-10sched: Queue RT tasks to head when prio dropsThomas Gleixner
The following scenario does not work correctly: Runqueue of CPUx contains two runnable and pinned tasks: T1: SCHED_FIFO, prio 80 T2: SCHED_FIFO, prio 80 T1 is on the cpu and executes the following syscalls (classic priority ceiling scenario): sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90); ... sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80); ... Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back to sleep the scheduler picks T2. Surprise! The same happens w/o actual preemption when T1 is forced into the scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes pick_next_task() which returns T2. So T1 gets preempted and scheduled out. This happens because sched_setscheduler() dequeues T1 from the prio 90 list and then enqueues it on the tail of the prio 80 list behind T2. This violates the POSIX spec and surprises user space which relies on the guarantee that SCHED_FIFO tasks are not scheduled out unless they give the CPU up voluntarily or are preempted by a higher priority task. In the latter case the preempted task must get back on the CPU after the preempting task schedules out again. We fixed a similar issue already in commit 60db48c (sched: Queue a deboosted task to the head of the RT prio queue). The same treatment is necessary for sched_setscheduler(). So enqueue to head of the prio bucket list if the priority of the task is lowered. It might be possible that existing user space relies on the current behaviour, but it can be considered highly unlikely due to the corner case nature of the application scenario. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: stable-rt@vger.kernel.org
2014-04-10sched: Adjust sched_reset_on_fork when nothing else changesThomas Gleixner
If the policy and priority remain unchanged a possible modification of sched_reset_on_fork gets lost in the early exit path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: stable-rt@vger.kernel.org
2014-04-10sched: Better debug output for might sleepThomas Gleixner
might sleep can tell us where interrupts have been disabled, but we have no idea what disabled preemption. Add some debug infrastructure. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10sched: Check for idle task in might_sleep()Thomas Gleixner
Idle is not allowed to call sleeping functions ever! Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10sched: Init idle->on_rq in init_idle()Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10Kind of revert "powerpc: 52xx: provide a default in mpc52xx_irqhost_map()"Wolfram Sang
This more or less reverts commit 6391f697d4892a6f233501beea553e13f7745a23. Instead of adding an unneeded 'default', mark the variable to prevent the false positive 'uninitialized var'. The other change (fixing the printout) needs revert, too. We want to know WHICH critical irq failed, not which level it had. Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Anatolij Gustschin <agust@denx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10kernel/SRCU: provide a static initializerSebastian Andrzej Siewior
There are macros for static initializer for the three out of four possible notifier types, that are: ATOMIC_NOTIFIER_HEAD() BLOCKING_NOTIFIER_HEAD() RAW_NOTIFIER_HEAD() This patch provides a static initilizer for the forth type to make it complete. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10sparc64: convert ctx_alloc_lock raw_spinlock_tAllen Pais
Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10sparc64: convert spinlock_t to raw_spinlock_t in mmu_context_tAllen Pais
Issue debugged by Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10sparc64: use generic rwsem spinlocks rtAllen Pais
Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10sparc: provide EARLY_PRINTK for SPARCKirill Tkhai
sparc does not have CONFIG_EARLY_PRINTK option. So early-printk-consolidate.patch breaks compilation: arch/sparc/built-in.o: In function `setup_arch': (.init.text+0x15e4): undefined reference to `early_console' arch/sparc/built-in.o: In function `setup_arch': (.init.text+0x15ec): undefined reference to `early_console' The below addition fixes that. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10early-printk-consolidate.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-04-10rcu: Don't activate RCU core on NO_HZ_FULL CPUsPaul E. McKenney
Whenever a CPU receives a scheduling-clock interrupt, RCU checks to see if the RCU core needs anything from this CPU. If so, RCU raises RCU_SOFTIRQ to carry out any needed processing. This approach has worked well historically, but it is undesirable on NO_HZ_FULL CPUs. Such CPUs are expected to spend almost all of their time in userspace, so that scheduling-clock interrupts can be disabled while there is only one runnable task on the CPU in question. Unfortunately, raising any softirq has the potential to wake up ksoftirqd, which would provide the second runnable task on that CPU, preventing disabling of scheduling-clock interrupts. What is needed instead is for RCU to leave NO_HZ_FULL CPUs alone, relying on the grace-period kthreads' quiescent-state forcing to do any needed RCU work on behalf of those CPUs. This commit therefore refrains from raising RCU_SOFTIRQ on any NO_HZ_FULL CPUs during any grace periods that have been in effect for less than one second. The one-second limit handles the case where an inappropriate workload is running on a NO_HZ_FULL CPU that features lots of scheduling-clock interrupts, but no idle or userspace time. Reported-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <bitbucket@online.de> Tested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-04-10net: make neigh_priv_len in struct net_device 16bit instead of 8bitSebastian Siewior
neigh_priv_len is defined as u8. With all debug enabled struct ipoib_neigh has 200 bytes. The largest part is sk_buff_head with 96 bytes and here the spinlock with 72 bytes. The size value still fits in this u8 leaving some room for more. On -RT struct ipoib_neigh put on weight and has 392 bytes. The main reason is sk_buff_head with 288 and the fatty here is spinlock with 192 bytes. This does no longer fit into into neigh_priv_len and gcc complains. This patch changes neigh_priv_len from being 8bit to 16bit. Since the following element (dev_id) is 16bit followed by a spinlock which is aligned, the struct remains with a total size of 3200 (allmodconfig) / 2048 (with as much debug off as possible) bytes on x86-64. On x86-32 the struct is 1856 (allmodconfig) / 1216 (with as much debug off as possible) bytes long. The numbers were gained with and without the patch to prove that this change does not increase the size of the struct. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-10lockdep: Correctly annotate hardirq context in irq_exit()Peter Zijlstra
There was a reported deadlock on -rt which lockdep didn't report. It turns out that in irq_exit() we tell lockdep that the hardirq context ends and then do all kinds of locking afterwards. To fix it, move trace_hardirq_exit() to the very end of irq_exit(), this ensures all locking in tick_irq_exit() and rcu_irq_exit() are properly recorded as happening from hardirq context. This however leads to the 'fun' little problem of running softirqs while in hardirq context. To cure this make the softirq code a little more complex (in the CONFIG_TRACE_IRQFLAGS case). Due to stack swizzling arch dependent trickery we cannot pass an argument to __do_softirq() to tell it if it was done from hardirq context or not; so use a side-band argument. When we do __do_softirq() from hardirq context, 'atomically' flip to softirq context and back, so that no locking goes without being in either hard- or soft-irq context. I didn't find any new problems in mainline using this patch, but it did show the -rt problem. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-dgwc5cdksbn0jk09vbmcc9sa@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-03-24Linux 3.12.15Jiri Slaby
2014-03-24PNP / ACPI: proper handling of ACPI IO/Memory resource parsing failuresZhang Rui
commit 89935315f192abf7068d0044cefc84f162c3c81f upstream. Before commit b355cee88e3b (ACPI / resources: ignore invalid ACPI device resources), if acpi_dev_resource_memory()/acpi_dev_resource_io() returns false, it means the the resource is not a memeory/IO resource. But after commit b355cee88e3b, those functions return false if the given memory/IO resource entry is invalid (the length of the resource is zero). This breaks pnpacpi_allocated_resource(), because it now recognizes the invalid memory/io resources as resources of unknown type. Thus users see confusing warning messages on machines with zero length ACPI memory/IO resources. Fix the problem by rearranging pnpacpi_allocated_resource() so that it calls acpi_dev_resource_memory() for memory type and IO type resources only, respectively. Fixes: b355cee88e3b (ACPI / resources: ignore invalid ACPI device resources) Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-and-tested-by: Julian Wollrath <jwollrath@web.de> Reported-and-tested-by: Paul Bolle <pebolle@tiscali.nl> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24arm64: mm: Add double logical invert to pte accessorsSteve Capper
commit 84fe6826c28f69d8708bd575faed7f75e6b6f57f upstream. Page table entries on ARM64 are 64 bits, and some pte functions such as pte_dirty return a bitwise-and of a flag with the pte value. If the flag to be tested resides in the upper 32 bits of the pte, then we run into the danger of the result being dropped if downcast. For example: gather_stats(page, md, pte_dirty(*pte), 1); where pte_dirty(*pte) is downcast to an int. This patch adds a double logical invert to all the pte_ accessors to ensure predictable downcasting. Signed-off-by: Steve Capper <steve.capper@linaro.org> [steve.capper@linaro.org: rebased patch to leave pte_write alone to allow for merge with 3.13 stable] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24bio-integrity: Fix bio_integrity_verify segment start bugNicholas Bellinger
commit 5837c80e870bc3b12ac6a98cdc9ce7a9522a8fb6 upstream. This patch addresses a bug in bio_integrity_verify() code that has been causing DIF READ verify operations to be silently skipped. The issue is that bio->bi_idx will have been incremented within bio_advance() code in the normal blk_update_request() -> req_bio_endio() completion path, and bio_integrity_verify() is using bio_for_each_segment() which starts the bio segment walk at the current bio->bi_idx. So instead use bio_for_each_segment_all() to always start the bio segment walk from zero, regardless of the current bio->bi_idx value after bio_advance() has been called. (Context change for v3.10.y -> v3.13.y code - nab) Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24MIPS: include linux/types.hQais Yousef
commit 87c99203fea897fbdd84b681ad9fced2517dcf98 upstream. The file uses u16 type but doesn't include its definition explicitly I was getting this error when including this header in my driver: arch/mips/include/asm/mipsregs.h:644:33: error: unknown type name ‘u16’ Signed-off-by: Qais Yousef <qais.yousef@imgtec.com> Reviewed-by: Steven J. Hill <Steven.Hill@imgtec.com> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: John Crispin <blogic@openwrt.org> Patchwork: http://patchwork.linux-mips.org/patch/6212/ Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24ALSA: oxygen: modify adjust_dg_dac_routing functionRoman Volkov
commit 1f91ecc14deea9461aca93273d78871ec4d98fcd upstream. When selecting the audio output destinations (headphones, FP headphones, multichannel output), the channel routing should be changed depending on what destination selected. Also unnecessary I2S channels are digitally muted. This function called when the user selects the destination in the ALSA mixer. Signed-off-by: Roman Volkov <v1ron@mail.ru> Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPUSuresh Siddha
commit 731bd6a93a6e9172094a2322bd0ee964bb1f4d63 upstream. For non-eager fpu mode, thread's fpu state is allocated during the first fpu usage (in the context of device not available exception). This (math_state_restore()) can be a blocking call and hence we enable interrupts (which were originally disabled when the exception happened), allocate memory and disable interrupts etc. But the eager-fpu mode, call's the same math_state_restore() from kernel_fpu_end(). The assumption being that tsk_used_math() is always set for the eager-fpu mode and thus avoid the code path of enabling interrupts, allocating fpu state using blocking call and disable interrupts etc. But the below issue was noticed by Maarten Baert, Nate Eldredge and few others: If a user process dumps core on an ecrypt fs while aesni-intel is loaded, we get a BUG() in __find_get_block() complaining that it was called with interrupts disabled; then all further accesses to our ecrypt fs hang and we have to reboot. The aesni-intel code (encrypting the core file that we are writing) needs the FPU and quite properly wraps its code in kernel_fpu_{begin,end}(), the latter of which calls math_state_restore(). So after kernel_fpu_end(), interrupts may be disabled, which nobody seems to expect, and they stay that way until we eventually get to __find_get_block() which barfs. For eager fpu, most the time, tsk_used_math() is true. At few instances during thread exit, signal return handling etc, tsk_used_math() might be false. In kernel_fpu_end(), for eager-fpu, call math_state_restore() only if tsk_used_math() is set. Otherwise, don't bother. Kernel code path which cleared tsk_used_math() knows what needs to be done with the fpu state. Reported-by: Maarten Baert <maarten-baert@hotmail.com> Reported-by: Nate Eldredge <nate@thatsmathematics.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Suresh Siddha <sbsiddha@gmail.com> Link: http://lkml.kernel.org/r/1391410583.3801.6.camel@europa Cc: George Spelvin <linux@horizon.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24SCSI: storvsc: NULL pointer dereference fixAles Novak
commit b12bb60d6c350b348a4e1460cd68f97ccae9822e upstream. If the initialization of storvsc fails, the storvsc_device_destroy() causes NULL pointer dereference. storvsc_bus_scan() scsi_scan_target() __scsi_scan_target() scsi_probe_and_add_lun(hostdata=NULL) scsi_alloc_sdev(hostdata=NULL) sdev->hostdata = hostdata now the host allocation fails __scsi_remove_device(sdev) calls sdev->host->hostt->slave_destroy() == storvsc_device_destroy(sdev) access of sdev->hostdata->request_mempool Signed-off-by: Ales Novak <alnovak@suse.cz> Signed-off-by: Thomas Abraham <tabraham@suse.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24SCSI: qla2xxx: Fix multiqueue MSI-X registration.Chad Dupuis
commit f324777ea88bab2522602671e46fc0851d7d5e35 upstream. This fixes requesting of the MSI-X vectors for the base response queue. The iteration in the for loop in qla24xx_enable_msix() was incorrect. We should only iterate of the first two MSI-X vectors and not the total number of MSI-X vectors that have given to the driver for this device from pci_enable_msix() in this function. Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24SCSI: qla2xxx: Poll during initialization for ISP25xx and ISP83xxGiridhar Malavali
commit b77ed25c9f8402e8b3e49e220edb4ef09ecfbb53 upstream. Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24SCSI: isci: correct erroneous for_each_isci_host macroLukasz Dorau
commit c59053a23d586675c25d789a7494adfdc02fba57 upstream. In the first place, the loop 'for' in the macro 'for_each_isci_host' (drivers/scsi/isci/host.h:314) is incorrect, because it accesses the 3rd element of 2 element array. After the 2nd iteration it executes the instruction: ihost = to_pci_info(pdev)->hosts[2] (while the size of the 'hosts' array equals 2) and reads an out of range element. In the second place, this loop is incorrectly optimized by GCC v4.8 (see http://marc.info/?l=linux-kernel&m=138998871911336&w=2). As a result, on platforms with two SCU controllers, the loop is executed more times than it can be (for i=0,1 and 2). It causes kernel panic during entering the S3 state and the following oops after 'rmmod isci': BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8131360b>] __list_add+0x1b/0xc0 Oops: 0000 [#1] SMP RIP: 0010:[<ffffffff8131360b>] [<ffffffff8131360b>] __list_add+0x1b/0xc0 Call Trace: [<ffffffff81661b84>] __mutex_lock_slowpath+0x114/0x1b0 [<ffffffff81661c3f>] mutex_lock+0x1f/0x30 [<ffffffffa03e97cb>] sas_disable_events+0x1b/0x50 [libsas] [<ffffffffa03e9818>] sas_unregister_ha+0x18/0x60 [libsas] [<ffffffffa040316e>] isci_unregister+0x1e/0x40 [isci] [<ffffffffa0403efd>] isci_pci_remove+0x5d/0x100 [isci] [<ffffffff813391cb>] pci_device_remove+0x3b/0xb0 [<ffffffff813fbf7f>] __device_release_driver+0x7f/0xf0 [<ffffffff813fc8f8>] driver_detach+0xa8/0xb0 [<ffffffff813fbb8b>] bus_remove_driver+0x9b/0x120 [<ffffffff813fcf2c>] driver_unregister+0x2c/0x50 [<ffffffff813381f3>] pci_unregister_driver+0x23/0x80 [<ffffffffa04152f8>] isci_exit+0x10/0x1e [isci] [<ffffffff810d199b>] SyS_delete_module+0x16b/0x2d0 [<ffffffff81012a21>] ? do_notify_resume+0x61/0xa0 [<ffffffff8166ce29>] system_call_fastpath+0x16/0x1b The loop has been corrected. This patch fixes kernel panic during entering the S3 state and the above oops. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com> Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Tested-by: Lukasz Dorau <lukasz.dorau@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24SCSI: isci: fix reset timeout handlingDan Williams
commit ddfadd7736b677de2d4ca2cd5b4b655368c85a7a upstream. Remove an erroneous BUG_ON() in the case of a hard reset timeout. The reset timeout handler puts the port into the "awaiting link-up" state. The timeout causes the device to be disconnected and we need to be in the awaiting link-up state to re-connect the port. The BUG_ON() made the incorrect assumption that resets never timeout and we always complete the reset in the "resetting" state. Testing this patch also uncovered that libata continues to attempt to reset the port long after the driver has torn down the context. Once the driver has committed to abandoning the link it must indicate to libata that recovery ends by returning -ENODEV from ->lldd_I_T_nexus_reset(). Acked-by: Lukasz Dorau <lukasz.dorau@intel.com> Reported-by: David Milburn <dmilburn@redhat.com> Reported-by: Xun Ni <xun.ni@intel.com> Tested-by: Xun Ni <xun.ni@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24can: flexcan: flexcan_remove(): add missing netif_napi_del()Marc Kleine-Budde
commit d96e43e8fce28cf97df576a07af9d65657a41a6f upstream. This patch adds the missing netif_napi_del() to the flexcan_remove() function. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24can: flexcan: factor out transceiver {en,dis}able into seperate functionsMarc Kleine-Budde
commit f003698e23f6f56a791774f14d0ac35d04872490 upstream. This patch moves the transceiver enable and disable into seperate functions, where the NULL pointer check is hidden. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24can: flexcan: fix transition from and to low power mode in chip_{en,dis}ableMarc Kleine-Budde
commit 9b00b300e7bce032c467c36ca47fe2a776887fc2 upstream. In flexcan_chip_enable() and flexcan_chip_disable() fixed delays are used. Experiments have shown that the transition from and to low power mode may take several microseconds. This patch adds a while loop which polls the Low Power Mode ACK bit (LPM_ACK) that indicates a successfull mode change. If the function runs into a timeout a error value is returned. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24can: flexcan: flexcan_open(): fix error path if flexcan_chip_start() failsMarc Kleine-Budde
commit 7e9e148af01ef388efb6e2490805970be4622792 upstream. If flexcan_chip_start() in flexcan_open() fails, the interrupt is not freed, this patch adds the missing cleanup. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24can: flexcan: fix shutdown: first disable chip, then all interruptsMarc Kleine-Budde
commit 5be93bdda64e85450598c6e97f79fb8f6acf30e0 upstream. When shutting down the CAN interface (ifconfig canX down) during high CAN bus loads, the CAN core might hang and freeze the whole CPU. This patch fixes the shutdown sequence by first disabling the CAN core then disabling all interrupts. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24net: unix socket code abuses csum_partialAnton Blanchard
commit 0a13404dd3bf4ea870e3d96270b5a382edca85c0 upstream. The unix socket code is using the result of csum_partial to hash into a lookup table: unix_hash_fold(csum_partial(sunaddr, len, 0)); csum_partial is only guaranteed to produce something that can be folded into a checksum, as its prototype explains: * returns a 32-bit number suitable for feeding into itself * or csum_tcpudp_magic The 32bit value should not be used directly. Depending on the alignment, the ppc64 csum_partial will return different 32bit partial checksums that will fold into the same 16bit checksum. This difference causes the following testcase (courtesy of Gustavo) to sometimes fail: #include <sys/socket.h> #include <stdio.h> int main() { int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0); int i = 1; setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4); struct sockaddr addr; addr.sa_family = AF_LOCAL; bind(fd, &addr, 2); listen(fd, 128); struct sockaddr_storage ss; socklen_t sslen = (socklen_t)sizeof(ss); getsockname(fd, (struct sockaddr*)&ss, &sslen); fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0); if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){ perror(NULL); return 1; } printf("OK\n"); return 0; } As suggested by davem, fix this by using csum_fold to fold the partial 32bit checksum into a 16bit checksum before using it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24dm cache: fix access beyond end of origin deviceHeinz Mauelshagen
commit e893fba90c09f9b57fb97daae204ea9cc2c52fa5 upstream. In order to avoid wasting cache space a partial block at the end of the origin device is not cached. Unfortunately, the check for such a partial block at the end of the origin device was flawed. Fix accesses beyond the end of the origin device that occured due to attempted promotion of an undetected partial block by: - initializing the per bio data struct to allow cache_end_io to work properly - recognizing access to the partial block at the end of the origin device - avoiding out of bounds access to the discard bitset Otherwise, users can experience errors like the following: attempt to access beyond end of device dm-5: rw=0, want=20971520, limit=20971456 ... device-mapper: cache: promotion failed; couldn't copy block Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24dm cache: fix truncation bug when copying a block to/from >2TB fast deviceHeinz Mauelshagen
commit 8b9d96666529a979acf4825391efcc7c8a3e9f12 upstream. During demotion or promotion to a cache's >2TB fast device we must not truncate the cache block's associated sector to 32bits. The 32bit temporary result of from_cblock() caused a 32bit multiplication when calculating the sector of the fast device in issue_copy_real(). Use an intermediate 64bit type to store the 32bit from_cblock() to allow for proper 64bit multiplication. Here is an example of how this bug manifests on an ext4 filesystem: EXT4-fs error (device dm-0): ext4_mb_generate_buddy:756: group 17136, 32768 clusters in bitmap, 30688 in gd; block bitmap corrupt. JBD2: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24dm space map metadata: fix refcount decrement below 0 which caused corruptionJoe Thornber
commit cebc2de44d3bce53e46476e774126c298ca2c8a9 upstream. This has been a relatively long-standing issue that wasn't nailed down until Teng-Feng Yang's meticulous bug report to dm-devel on 3/7/2014, see: http://www.redhat.com/archives/dm-devel/2014-March/msg00021.html From that report: "When decreasing the reference count of a metadata block with its reference count equals 3, we will call dm_btree_remove() to remove this enrty from the B+tree which keeps the reference count info in metadata device. The B+tree will try to rebalance the entry of the child nodes in each node it traversed, and the rebalance process contains the following steps. (1) Finding the corresponding children in current node (shadow_current(s)) (2) Shadow the children block (issue BOP_INC) (3) redistribute keys among children, and free children if necessary (issue BOP_DEC) Since the update of a metadata block's reference count could be recursive, we will stash these reference count update operations in smm->uncommitted and then process them in a FILO fashion. The problem is that step(3) could free the children which is created in step(2), so the BOP_DEC issued in step(3) will be carried out before the BOP_INC issued in step(2) since these BOPs will be processed in FILO fashion. Once the BOP_DEC from step(3) tries to decrease the reference count of newly shadow block, it will report failure for its reference equals 0 before decreasing. It looks like we can solve this issue by processing these BOPs in a FIFO fashion instead of FILO." Commit 5b564d80 ("dm space map: disallow decrementing a reference count below zero") changed the code to report an error for this temporary refcount decrement below zero. So what was previously a harmless invalid refcount became a hard failure due to the new error path: device-mapper: space map common: unable to decrement a reference count below 0 device-mapper: thin: 253:6: dm_thin_insert_block() failed: error = -22 device-mapper: thin: 253:6: switching pool to read-only mode This bug is in dm persistent-data code that is common to the DM thin and cache targets. So any users of those targets should apply this fix. Fix this by applying recursive space map operations in FIFO order rather than FILO. Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=68801 Reported-by: Apollon Oikonomopoulos <apoikos@debian.org> Reported-by: edwillam1007@gmail.com Reported-by: Teng-Feng Yang <shinrairis@gmail.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24mm/compaction: break out of loop on !PageBuddy in isolate_freepages_blockLaura Abbott
commit 2af120bc040c5ebcda156df6be6a66610ab6957f upstream. We received several reports of bad page state when freeing CMA pages previously allocated with alloc_contig_range: BUG: Bad page state in process Binder_A pfn:63202 page:d21130b0 count:0 mapcount:1 mapping: (null) index:0x7dfbf page flags: 0x40080068(uptodate|lru|active|swapbacked) Based on the page state, it looks like the page was still in use. The page flags do not make sense for the use case though. Further debugging showed that despite alloc_contig_range returning success, at least one page in the range still remained in the buddy allocator. There is an issue with isolate_freepages_block. In strict mode (which CMA uses), if any pages in the range cannot be isolated, isolate_freepages_block should return failure 0. The current check keeps track of the total number of isolated pages and compares against the size of the range: if (strict && nr_strict_required > total_isolated) total_isolated = 0; After taking the zone lock, if one of the pages in the range is not in the buddy allocator, we continue through the loop and do not increment total_isolated. If in the last iteration of the loop we isolate more than one page (e.g. last page needed is a higher order page), the check for total_isolated may pass and we fail to detect that a page was skipped. The fix is to bail out if the loop immediately if we are in strict mode. There's no benfit to continuing anyway since we need all pages to be isolated. Additionally, drop the error checking based on nr_strict_required and just check the pfn ranges. This matches with what isolate_freepages_range does. Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-24vmxnet3: fix building without CONFIG_PCI_MSIArnd Bergmann
commit 0a8d8c446b5429d15ff2d48f46e00d8a08552303 upstream. Since commit d25f06ea466e "vmxnet3: fix netpoll race condition", the vmxnet3 driver fails to build when CONFIG_PCI_MSI is disabled, because it unconditionally references the vmxnet3_msix_rx() function. To fix this, use the same #ifdef in the caller that exists around the function definition. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Shreyas Bhatewara <sbhatewara@vmware.com> Cc: "VMware, Inc." <pv-drivers@vmware.com> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>