summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-06-20powerpc: Update currituck pci/usb fixup for new board revisionAlistair Popple
The currituck board uses a different IRQ for the pci usb host controller depending on the board revision. This patch adds support for newer board revisions by retrieving the board revision from the FPGA and mapping the appropriate IRQ. Signed-off-by: Alistair Popple <alistair@popple.id.au> Acked-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc: Fix single step emulation of 32bit overflowed branchesMichael Neuling
Check truncate_if_32bit() on final write to nip. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc: Update default configurationsAlistair Popple
Update default configurations for systems with CONFIG_BOOTX_TEXT selected so that they continue to print early debug messages as is currently the case. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc: Add a configuration option for early BootX/OpenFirmware debugAlistair Popple
Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc/prom: Scan reserved-ranges node for memory reservationsJeremy Kerr
Based on benh's proposal at https://lists.ozlabs.org/pipermail/linuxppc-dev/2012-September/101237.html, this change provides support for reserving memory from the reserved-ranges node at the root of the device tree. We just call memblock_reserve on these ranges for now. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc/mm: Make mmap_64.c compile on 32bit powerpcDaniel Walker
There appears to be no good reason to keep this as 64bit only. It works on 32bit also, and has checks so that it can work correctly with 32bit binaries on 64bit hardware which is why I think this works. I tested this on qemu using the virtex-ml507 machine type. Before, /bin2 # ./test & cat /proc/${!}/maps 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test 48000000-48020000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so 48021000-48023000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bfd03000-bfd24000 rw-p 00000000 00:00 0 [stack] /bin2 # ./test & cat /proc/${!}/maps 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 0fe6e000-0ffd8000 r-xp 00000000 00:01 214 /lib/libc-2.11.3.so 0ffd8000-0ffe8000 ---p 0016a000 00:01 214 /lib/libc-2.11.3.so 0ffe8000-0ffed000 rw-p 0016a000 00:01 214 /lib/libc-2.11.3.so 0ffed000-0fff0000 rw-p 00000000 00:00 0 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test 48000000-48020000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so 48020000-48021000 rw-p 00000000 00:00 0 48021000-48023000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bf98a000-bf9ab000 rw-p 00000000 00:00 0 [stack] /bin2 # ./test & cat /proc/${!}/maps 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 0fe6e000-0ffd8000 r-xp 00000000 00:01 214 /lib/libc-2.11.3.so 0ffd8000-0ffe8000 ---p 0016a000 00:01 214 /lib/libc-2.11.3.so 0ffe8000-0ffed000 rw-p 0016a000 00:01 214 /lib/libc-2.11.3.so 0ffed000-0fff0000 rw-p 00000000 00:00 0 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test 48000000-48020000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so 48020000-48021000 rw-p 00000000 00:00 0 48021000-48023000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bfa54000-bfa75000 rw-p 00000000 00:00 0 [stack] After, bash-4.1# ./test & cat /proc/${!}/maps [7] 803 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test b7eb0000-b7ed0000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so b7ed1000-b7ed3000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bfbc0000-bfbe1000 rw-p 00000000 00:00 0 [stack] bash-4.1# ./test & cat /proc/${!}/maps [8] 805 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test b7b03000-b7b23000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so b7b24000-b7b26000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bfc27000-bfc48000 rw-p 00000000 00:00 0 [stack] bash-4.1# ./test & cat /proc/${!}/maps [9] 807 00100000-00103000 r-xp 00000000 00:00 0 [vdso] 10000000-10007000 r-xp 00000000 00:01 454 /bin2/test 10017000-10018000 rw-p 00007000 00:01 454 /bin2/test b7f37000-b7f57000 r-xp 00000000 00:01 224 /lib/ld-2.11.3.so b7f58000-b7f5a000 rw-p 00021000 00:01 224 /lib/ld-2.11.3.so bff96000-bffb7000 rw-p 00000000 00:00 0 [stack] Signed-off-by: Daniel Walker <dwalker@fifo90.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc: Remove the unneeded trigger of decrementer interrupt in ↵Kevin Hao
decrementer_check_overflow Previously in order to handle the edge sensitive decrementers, we choose to set the decrementer to 1 to trigger a decrementer interrupt when re-enabling interrupts. But with the rework of the lazy EE, we would replay the decrementer interrupt when re-enabling interrupts if a decrementer interrupt occurs with irq soft-disabled. So there is no need to trigger a decrementer interrupt in this case any more. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc/mm/nohash: Ignore NULL stale_map entriesScott Wood
This happens with threads that are offline due to CPU hotplug (including threads that were never "plugged in" to begin with because SMT is disabled). Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc: Move the single step enable code to a generic pathSuzuki K. Poulose
This patch moves the single step enable code used by kprobe to a generic routine header so that, it can be re-used by other code, in this case, uprobes. No functional changes. Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com> Cc: Ananth N Mavinakaynahalli <ananth@in.ibm.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: linuxppc-dev@ozlabs.org Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc/kprobes: Do not disable External interrupts during single stepSuzuki K. Poulose
External/Decrement exceptions have lower priority than the Debug Exception. So, we don't have to disable the External interrupts before a single step. However, on BookE, Critical Input Exception(CE) has higher priority than a Debug Exception. Hence we mask them. Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Ananth N Mavinakaynahalli <ananth@in.ibm.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: linuxppc-dev@ozlabs.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc: Mark low level irq handlers NO_THREADThomas Gleixner
These low level handlers cannot be threaded. Mark them NO_THREAD Reported-by: leroy christophe <christophe.leroy@c-s.fr> Tested-by: leroy christophe <christophe.leroy@c-s.fr> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20mm/THP: deposit the transpare huge pgtable before set_pmdAneesh Kumar K.V
Architectures like powerpc use the deposited pgtable to store hash index values. We need to make the deposted pgtable is visible to other cpus before we are ready to take a hash fault. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20mm/THP: don't use HPAGE_SHIFT in transparent hugepage codeAneesh Kumar K.V
For architectures like powerpc that support multiple explicit hugepage sizes, HPAGE_SHIFT indicate the default explicit hugepage shift. For THP to work the hugepage size should be same as PMD_SIZE. So use PMD_SHIFT directly. So move the define outside CONFIG_TRANSPARENT_HUGEPAGE #ifdef because we want to use these defines in generic code with if (pmd_trans_huge()) conditional. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20mm/THP: withdraw the pgtable after pmdp related operationsAneesh Kumar K.V
For architectures like ppc64 we look at deposited pgtable when calling pmdp_get_and_clear. So do the pgtable_trans_huge_withdraw after finishing pmdp related operations. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20mm/THP: add pmd args to pgtable deposit and withdraw APIsAneesh Kumar K.V
This will be later used by powerpc THP support. In powerpc we want to use pgtable for storing the hash index values. So instead of adding them to mm_context list, we would like to store them in the second half of pmd Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20mm/thp: use the correct function when updating access flagsAneesh Kumar K.V
We should use pmdp_set_access_flags to update access flags. Archs like powerpc use extra checks(_PAGE_BUSY) when updating a hugepage PTE. A set_pmd_at doesn't do those checks. We should use set_pmd_at only when updating a none hugepage PTE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com>a Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-13Merge tag 'acpi-3.10-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "This is an alternative fix for the regression introduced in 3.9 whose previous fix had to be reverted right before 3.10-rc5, because it broke one of the Tony's machines. In this one the check is confined to the ACPI video driver (which is the only one causing the problem to happen in the first place) and the Tony's box shouldn't even notice it. - ACPI fix for an issue causing ACPI video driver to attempt to bind to devices it shouldn't touch from Rafael J Wysocki." * tag 'acpi-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / video: Do not bind to device objects with a scan handler
2013-06-13Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "Another set of fixes, the biggest bit of this is yet another tweak to the UEFI anti-bricking code; apparently we finally got some feedback from Samsung as to what makes at least their systems fail. This set should actually fix the boot regressions that some other systems (e.g. SGI) have exhibited. Other than that, there is a patch to avoid a panic with particularly unhappy memory layouts and two minor protocol fixes which may or may not be manifest bugs" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix typo in kexec register clearing x86, relocs: Move __vvar_page from S_ABS to S_REL Modify UEFI anti-bricking code x86: Fix adjust_range_size_mask calling position
2013-06-13Merge branch 'rcu/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fixes from Paul McKenney: "I must confess that this past merge window was not RCU's best showing. This series contains three more fixes for RCU regressions: 1. A fix to __DECLARE_TRACE_RCU() that causes it to act as an interrupt from idle rather than as a task switch from idle. This change is needed due to the recent use of _rcuidle() tracepoints that can be invoked from interrupt handlers as well as from idle. Without this fix, invoking _rcuidle() tracepoints from interrupt handlers results in splats and (more seriously) confusion on RCU's part as to whether a given CPU is idle or not. This confusion can in turn result in too-short grace periods and therefore random memory corruption. 2. A fix to a subtle deadlock that could result due to RCU doing a wakeup while holding one of its rcu_node structure's locks. Although the probability of occurrence is low, it really does happen. The fix, courtesy of Steven Rostedt, uses irq_work_queue() to avoid the deadlock. 3. A fix to a silent deadlock (invisible to lockdep) due to the interaction of timeouts posted by RCU debug code enabled by CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU hotplug operations. This will not occur in production kernels, but really does occur in randconfig testing. Diagnosis courtesy of Steven Rostedt" * 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration rcu: Don't call wakeup() with rcu_node structure ->lock held trace: Allow idle-safe tracepoints to be called from irq
2013-06-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "Three kvm related memory management fixes, a fix for show_trace, a fix for early console output and a patch from Ben to help prevent compile errors in regard to irq functions (or our lack thereof)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pci: Implement IRQ functions if !PCI s390/sclp: fix new line detection s390/pgtable: make pgste lock an explicit barrier s390/pgtable: Save pgste during modify_prot_start/commit s390/dumpstack: fix address ranges for asynchronous and panic stack s390/pgtable: Fix guest overindication for change bit
2013-06-13Merge tag 'asoc-v3.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound Pull ASoC sound updates from Mark Brown: "Takashi is travelling at the minute and it'd be good to get the MAINTAINERS update in here merged so sending directly. As well as the usual driver specifics we've got a couple of core fixes here, one fixing capabilities for unidirectional streams and the other fixing suspend while audio streams are active. The suspend fix is a little involved but mostly as a result of removing some special casing that was doing the wrong thing." * tag 'asoc-v3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound: ASoC: tlv320aic3x: Remove deadlock from snd_soc_dapm_put_volsw_aic3x() ASoC: dapm: Treat DAI widgets like AIF widgets for power ASoC: arizona: Correct AEC loopback enable ASoC: pcm: Require both CODEC and CPU support when declaring stream caps MAINTAINERS: Remove myself from Wolfson maintainers ASoC: wm8994: Ensure microphone detection state is reset on removal ASoC: wm8994: Avoid leaking pm_runtime reference on removed jack race ASoC: cs42l52: fix hp_gain_enum shift value. ASoC: cs42l52: use correct PCM mixer TLV dB scale to match datasheet.
2013-06-13Merge tag 'md-3.10-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull md bugfixes from Neil Brown: "A few bugfixes for md Some tagged for -stable" * tag 'md-3.10-fixes' of git://neil.brown.name/md: md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place md/raid1,raid10: use freeze_array in place of raise_barrier in various places. md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it. md: md_stop_writes() should always freeze recovery.
2013-06-13turbostat: Increase output buffer size to accommodate C8-C10Josh Triplett
On platforms with C8-C10 support, the additional C-states cause turbostat to overrun its output buffer of 128 bytes per CPU. Increase this to 256 bytes per CPU. [ As a bugfix, this should go into 3.10; however, since the C8-C10 support didn't go in until after 3.9, this need not go into any stable kernel. ] Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <len.brown@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-13Merge tag 'efi-urgent' into x86/urgentH. Peter Anvin
* More tweaking to the EFI variable anti-bricking algorithm. Quite a few users were reporting boot regressions in v3.9. This has now been fixed with a more accurate "minimum storage requirement to avoid bricking" value from Samsung (5K instead of 50%) and code to trigger garbage collection when we near our limit - Matthew Garrett. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-13md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in placeH. Peter Anvin
There are cases where the kernel will believe that the WRITE SAME command is supported by a block device which does not, in fact, support WRITE SAME. This currently happens for SATA drivers behind a SAS controller, but there are probably a hundred other ways that can happen, including drive firmware bugs. After receiving an error for WRITE SAME the block layer will retry the request as a plain write of zeroes, but mdraid will consider the failure as fatal and consider the drive failed. This has the effect that all the mirrors containing a specific set of data are each offlined in very rapid succession resulting in data loss. However, just bouncing the request back up to the block layer isn't ideal either, because the whole initial request-retry sequence should be inside the write bitmap fence, which probably means that md needs to do its own conversion of WRITE SAME to write zero. Until the failure scenario has been sorted out, disable WRITE SAME for raid1, raid5, and raid10. [neilb: added raid5] This patch is appropriate for any -stable since 3.7 when write_same support was added. Cc: stable@vger.kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-13md/raid1,raid10: use freeze_array in place of raise_barrier in various places.NeilBrown
Various places in raid1 and raid10 are calling raise_barrier when they really should call freeze_array. The former is only intended to be called from "make_request". The later has extra checks for 'nr_queued' and makes a call to flush_pending_writes(), so it is safe to call it from within the management thread. Using raise_barrier will sometimes deadlock. Using freeze_array should not. As 'freeze_array' currently expects one request to be pending (in handle_read_error - the only previous caller), we need to pass it the number of pending requests (extra) to ignore. The deadlock was made particularly noticeable by commits 050b66152f87c7 (raid10) and 6b740b8d79252f13 (raid1) which appeared in 3.4, so the fix is appropriate for any -stable kernel since then. This patch probably won't apply directly to some early kernels and will need to be applied by hand. Cc: stable@vger.kernel.org Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-13md/raid1: consider WRITE as successful only if at least one non-Faulty and ↵Alex Lyakas
non-rebuilding drive completed it. Without that fix, the following scenario could happen: - RAID1 with drives A and B; drive B was freshly-added and is rebuilding - Drive A fails - WRITE request arrives to the array. It is failed by drive A, so r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B succeeds in writing it, so the same r1_bio is marked as R1BIO_Uptodate. - r1_bio arrives to handle_write_finished, badblocks are disabled, md_error()->error() does nothing because we don't fail the last drive of raid1 - raid_end_bio_io() calls call_bio_endio() - As a result, in call_bio_endio(): if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) clear_bit(BIO_UPTODATE, &bio->bi_flags); this code doesn't clear the BIO_UPTODATE flag, and the whole master WRITE succeeds, back to the upper layer. So we returned success to the upper layer, even though we had written the data onto the rebuilding drive only. But when we want to read the data back, we would not read from the rebuilding drive, so this data is lost. [neilb - applied identical change to raid10 as well] This bug can result in lost data, so it is suitable for any -stable kernel. Cc: stable@vger.kernel.org Signed-off-by: Alex Lyakas <alex@zadarastorage.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-13md: md_stop_writes() should always freeze recovery.NeilBrown
__md_stop_writes() will currently sometimes freeze recovery. So any caller must be ready for that to happen, and indeed they are. However if __md_stop_writes() doesn't freeze_recovery, then a recovery could start before mddev_suspend() is called, which could be awkward. This can particularly cause problems or dm-raid. So change __md_stop_writes() to always freeze recovery. This is safe and more predicatable. Reported-by: Brassow Jonathan <jbrassow@redhat.com> Tested-by: Brassow Jonathan <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking update from David Miller: 1) Fix dump iterator in nfnl_acct_dump() and ctnl_timeout_dump() to dump all objects properly, from Pablo Neira Ayuso. 2) xt_TCPMSS must use the default MSS of 536 when no MSS TCP option is present. Fix from Phil Oester. 3) qdisc_get_rtab() looks for an existing matching rate table and uses that instead of creating a new one. However, it's key matching is incomplete, it fails to check to make sure the ->data[] array is identical too. Fix from Eric Dumazet. 4) ip_vs_dest_entry isn't fully initialized before copying back to userspace, fix from Dan Carpenter. 5) Fix ubuf reference counting regression in vhost_net, from Jason Wang. 6) When sock_diag dumps a socket filter back to userspace, we have to translate it out of the kernel's internal representation first. From Nicolas Dichtel. 7) davinci_mdio holds a spinlock while calling pm_runtime, which sleeps. Fix from Sebastian Siewior. 8) Timeout check in sh_eth_check_reset is off by one, from Sergei Shtylyov. 9) If sctp socket init fails, we can NULL deref during cleanup. Fix from Daniel Borkmann. 10) netlink_mmap() does not propagate errors properly, from Patrick McHardy. 11) Disable powersave and use minstrel by default in ath9k. From Sujith Manoharan. 12) Fix a regression in that SOCK_ZEROCOPY is not set on tuntap sockets which prevents vhost from being able to use zerocopy. From Jason Wang. 13) Fix race between port lookup and TX path in team driver, from Jiri Pirko. 14) Missing length checks in bluetooth L2CAP packet parsing, from Johan Hedberg. 15) rtlwifi fails to connect to networking using any encryption method other than WPA2. Fix from Larry Finger. 16) Fix iwlegacy build due to incorrect CONFIG_* ifdeffing for power management stuff. From Yijing Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits) b43: stop format string leaking into error msgs ath9k: Use minstrel rate control by default Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity" ath9k: Disable PowerSave by default net: wireless: iwlegacy: fix build error for il_pm_ops rtlwifi: Fix a false leak indication for PCI devices wl12xx/wl18xx: scan all 5ghz channels wl12xx: increase minimum singlerole firmware version required wl12xx: fix minimum required firmware version for wl127x multirole rtlwifi: rtl8192cu: Fix problem in connecting to WEP or WPA(1) networks mwifiex: debugfs: Fix out of bounds array access Bluetooth: Fix mgmt handling of power on failures Bluetooth: Fix missing length checks for L2CAP signalling PDUs Bluetooth: btmrvl: support Marvell Bluetooth device SD8897 Bluetooth: Fix checks for LE support on LE-only controllers team: fix checks in team_get_first_port_txable_rcu() team: move add to port list before port enablement team: check return value of team_get_port_by_index_rcu() for NULL tuntap: set SOCK_ZEROCOPY flag during open netlink: fix error propagation in netlink_mmap() ...
2013-06-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull input layer bugfix from Jiri Kosina: "Memory leak regression fix from Benjamin Tissoires" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: multitouch: prevent memleak with the allocated name
2013-06-12Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "Outside of bcache (which really isn't super big), these are all few-liners. There are a few important fixes in here: - Fix blk pm sleeping when holding the queue lock - A small collection of bcache fixes that have been done and tested since bcache was included in this merge window. - A fix for a raid5 regression introduced with the bio changes. - Two important fixes for mtip32xx, fixing an oops and potential data corruption (or hang) due to wrong bio iteration on stacked devices." * 'for-linus' of git://git.kernel.dk/linux-block: scatterlist: sg_set_buf() argument must be in linear mapping raid5: Initialize bi_vcnt pktcdvd: silence static checker warning block: remove refs to XD disks from documentation blkpm: avoid sleep when holding queue lock mtip32xx: Correctly handle bio->bi_idx != 0 conditions mtip32xx: Fix NULL pointer dereference during module unload bcache: Fix error handling in init code bcache: clarify free/available/unused space bcache: drop "select CLOSURES" bcache: Fix incompatible pointer type warning
2013-06-12Merge branch 'akpm' (updates from Andrew Morton)Linus Torvalds
Merge misc fixes from Andrew Morton: "Bunch of fixes and one little addition to math64.h" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits) include/linux/math64.h: add div64_ul() mm: memcontrol: fix lockless reclaim hierarchy iterator frontswap: fix incorrect zeroing and allocation size for frontswap_map kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() mm: migration: add migrate_entry_wait_huge() ocfs2: add missing lockres put in dlm_mig_lockres_handler mm/page_alloc.c: fix watermark check in __zone_watermark_ok() drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info() aio: fix io_destroy() regression by using call_rcu() rtc-at91rm9200: use shadow IMR on at91sam9x5 rtc-at91rm9200: add shadow interrupt mask rtc-at91rm9200: refactor interrupt-register handling rtc-at91rm9200: add configuration support rtc-at91rm9200: add match-table compile guard fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree cciss: fix broken mutex usage in ioctl audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel ...
2013-06-12include/linux/math64.h: add div64_ul()Alex Shi
There is div64_long() to handle the s64/long division, but no mocro do u64/ul division. It is necessary in some scenarios, so add this function. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alex Shi <alex.shi@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12mm: memcontrol: fix lockless reclaim hierarchy iteratorJohannes Weiner
The lockless reclaim hierarchy iterator currently has a misplaced barrier that can lead to use-after-free crashes. The reclaim hierarchy iterator consist of a sequence count and a position pointer that are read and written locklessly, with memory barriers enforcing ordering. The write side sets the position pointer first, then updates the sequence count to "publish" the new position. Likewise, the read side must read the sequence count first, then the position. If the sequence count is up to date, it's guaranteed that the position is up to date as well: writer: reader: iter->position = position if iter->sequence == expected: smp_wmb() smp_rmb() iter->sequence = sequence position = iter->position However, the read side barrier is currently misplaced, which can lead to dereferencing stale position pointers that no longer point to valid memory. Fix this. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Tejun Heo <tj@kernel.org> Reviewed-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Glauber Costa <glommer@parallels.com> Cc: <stable@kernel.org> [3.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12frontswap: fix incorrect zeroing and allocation size for frontswap_mapAkinobu Mita
The bitmap accessed by bitops must have enough size to hold the required numbers of bits rounded up to a multiple of BITS_PER_LONG. And the bitmap must not be zeroed by memset() if the number of bits cleared is not a multiple of BITS_PER_LONG. This fixes incorrect zeroing and allocation size for frontswap_map. The incorrect zeroing part doesn't cause any problem because frontswap_map is freed just after zeroing. But the wrongly calculated allocation size may cause the problem. For 32bit systems, the allocation size of frontswap_map is about twice as large as required size. For 64bit systems, the allocation size is smaller than requeired if the number of bits is not a multiple of BITS_PER_LONG. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()Chen Gang
audit_add_tree_rule() must set 'rule->tree = NULL;' firstly, to protect the rule itself freed in kill_rules(). The reason is when it is killed, the 'rule' itself may have already released, we should not access it. one example: we add a rule to an inode, just at the same time the other task is deleting this inode. The work flow for adding a rule: audit_receive() -> (need audit_cmd_mutex lock) audit_receive_skb() -> audit_receive_msg() -> audit_receive_filter() -> audit_add_rule() -> audit_add_tree_rule() -> (need audit_filter_mutex lock) ... unlock audit_filter_mutex get_tree() ... iterate_mounts() -> (iterate all related inodes) tag_mount() -> tag_trunk() -> create_trunk() -> (assume it is 1st rule) fsnotify_add_mark() -> fsnotify_add_inode_mark() -> (add mark to inode->i_fsnotify_marks) ... get_tree(); (each inode will get one) ... lock audit_filter_mutex The work flow for deleting an inode: __destroy_inode() -> fsnotify_inode_delete() -> __fsnotify_inode_delete() -> fsnotify_clear_marks_by_inode() -> (get mark from inode->i_fsnotify_marks) fsnotify_destroy_mark() -> fsnotify_destroy_mark_locked() -> audit_tree_freeing_mark() -> evict_chunk() -> ... tree->goner = 1 ... kill_rules() -> (assume current->audit_context == NULL) call_rcu() -> (rule->tree != NULL) audit_free_rule_rcu() -> audit_free_rule() ... audit_schedule_prune() -> (assume current->audit_context == NULL) kthread_run() -> (need audit_cmd_mutex and audit_filter_mutex lock) prune_one() -> (delete it from prue_list) put_tree(); (match the original get_tree above) Signed-off-by: Chen Gang <gang.chen@asianux.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12mm: migration: add migrate_entry_wait_huge()Naoya Horiguchi
When we have a page fault for the address which is backed by a hugepage under migration, the kernel can't wait correctly and do busy looping on hugepage fault until the migration finishes. As a result, users who try to kick hugepage migration (via soft offlining, for example) occasionally experience long delay or soft lockup. This is because pte_offset_map_lock() can't get a correct migration entry or a correct page table lock for hugepage. This patch introduces migration_entry_wait_huge() to solve this. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [2.6.35+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12ocfs2: add missing lockres put in dlm_mig_lockres_handlerXue jiufei
dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path. Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: shencanquan <shencanquan@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12mm/page_alloc.c: fix watermark check in __zone_watermark_ok()Tomasz Stanislawski
The watermark check consists of two sub-checks. The first one is: if (free_pages <= min + lowmem_reserve) return false; The check assures that there is minimal amount of RAM in the zone. If CMA is used then the free_pages is reduced by the number of free pages in CMA prior to the over-mentioned check. if (!(alloc_flags & ALLOC_CMA)) free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES); This prevents the zone from being drained from pages available for non-movable allocations. The second check prevents the zone from getting too fragmented. for (o = 0; o < order; o++) { free_pages -= z->free_area[o].nr_free << o; min >>= 1; if (free_pages <= min) return false; } The field z->free_area[o].nr_free is equal to the number of free pages including free CMA pages. Therefore the CMA pages are subtracted twice. This may cause a false positive fail of __zone_watermark_ok() if the CMA area gets strongly fragmented. In such a case there are many 0-order free pages located in CMA. Those pages are subtracted twice therefore they will quickly drain free_pages during the check against fragmentation. The test fails even though there are many free non-cma pages in the zone. This patch fixes this issue by subtracting CMA pages only for a purpose of (free_pages <= min + lowmem_reserve) check. Laura said: We were observing allocation failures of higher order pages (order 5 = 128K typically) under tight memory conditions resulting in driver failure. The output from the page allocation failure showed plenty of free pages of the appropriate order/type/zone and mostly CMA pages in the lower orders. For full disclosure, we still observed some page allocation failures even after applying the patch but the number was drastically reduced and those failures were attributed to fragmentation/other system issues. Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Tested-by: Laura Abbott <lauraa@codeaurora.org> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: <stable@vger.kernel.org> [3.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info()Dan Carpenter
The "info.fill" array isn't initialized so it can leak uninitialized stack information to user space. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Robin Holt <holt@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12aio: fix io_destroy() regression by using call_rcu()Kent Overstreet
There was a regression introduced by 36f5588905c1 ("aio: refcounting cleanup"), reported by Jens Axboe - the refcounting cleanup switched to using RCU in the shutdown path, but the synchronize_rcu() was done in the context of the io_destroy() syscall greatly increasing the time it could block. This patch switches it to call_rcu() and makes shutdown asynchronous (more asynchronous than it was originally; before the refcount changes io_destroy() would still wait on pending kiocbs). Note that there's a global quota on the max outstanding kiocbs, and that quota must be manipulated synchronously; otherwise io_setup() could return -EAGAIN when there isn't quota available, and userspace won't have any way of waiting until shutdown of the old kioctxs has finished (besides busy looping). So we release our quota before kioctx shutdown has finished, which should be fine since the quota never corresponded to anything real anyways. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Reported-by: Jens Axboe <axboe@kernel.dk> Tested-by: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Tested-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12rtc-at91rm9200: use shadow IMR on at91sam9x5Johan Hovold
Add support for the at91sam9x5-family which must use the shadow interrupt mask due to a hardware issue (causing RTC_IMR to always be zero). Signed-off-by: Johan Hovold <jhovold@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Cc: Ludovic Desroches <ludovic.desroches@atmel.com> Cc: Robert Nelson <Robert.Nelson@digikey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12rtc-at91rm9200: add shadow interrupt maskJohan Hovold
Add shadow interrupt-mask register which can be used on SoCs where the actual hardware register is broken. Note that some care needs to be taken to make sure the shadow mask corresponds to the actual hardware state. The added overhead is not an issue for the non-broken SoCs due to the relatively infrequent interrupt-mask updates. We do, however, only use the shadow mask value as a fall-back when it actually needed as there is still a theoretical possibility that the mask is incorrect (see the code for details). Signed-off-by: Johan Hovold <jhovold@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Cc: Ludovic Desroches <ludovic.desroches@atmel.com> Cc: Robert Nelson <Robert.Nelson@digikey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12rtc-at91rm9200: refactor interrupt-register handlingJohan Hovold
Add accessors for the interrupt register. This will allow us to easily add a shadow interrupt-mask register to use on SoCs where the interrupt-mask register cannot be used. Signed-off-by: Johan Hovold <jhovold@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Cc: Ludovic Desroches <ludovic.desroches@atmel.com> Cc: Robert Nelson <Robert.Nelson@digikey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12rtc-at91rm9200: add configuration supportJohan Hovold
Add configuration support which can be used to implement SoC-specific workarounds for broken hardware. Signed-off-by: Johan Hovold <jhovold@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Cc: Ludovic Desroches <ludovic.desroches@atmel.com> Cc: Robert Nelson <Robert.Nelson@digikey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12rtc-at91rm9200: add match-table compile guardJohan Hovold
The members of Atmel's at91sam9x5 family (9x5) have a broken RTC interrupt mask register (AT91_RTC_IMR). It does not reflect enabled interrupts but instead always returns zero. The kernel's rtc-at91rm9200 driver handles the RTC for the 9x5 family. Currently when the date/time is set, an interrupt is generated and this driver neglects to handle the interrupt. The kernel complains about the un-handled interrupt and disables it henceforth. This not only breaks the RTC function, but since that interrupt is shared (Atmel's SYS interrupt) then other things break as well (e.g. the debug port no longer accepts characters). Tested on the at91sam9g25. Bug confirmed by Atmel. This patch (of 5): Add missing match-table compile guard. Signed-off-by: Johan Hovold <jhovold@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Cc: Ludovic Desroches <ludovic.desroches@atmel.com> Cc: Robert Nelson <Robert.Nelson@digikey.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directoryGoldwyn Rodrigues
While removing a non-empty directory, the kernel dumps a message: (rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39 Suppress the error message from being printed in the dmesg so users don't panic. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12swap: avoid read_swap_cache_async() race to deadlock while waiting on ↵Rafael Aquini
discard I/O completion read_swap_cache_async() can race against get_swap_page(), and stumble across a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought into the swapcache yet. This transient swap_map state is expected to be transitory, but the actual placement of discard at scan_swap_map() inserts a wait for I/O completion thus making the thread at read_swap_cache_async() to loop around its -EEXIST case, while the other end at get_swap_page() is scheduled away at scan_swap_map(). This can leave the system deadlocked if the I/O completion happens to be waiting on the CPU waitqueue where read_swap_cache_async() is busy looping and !CONFIG_PREEMPT. This patch introduces a cond_resched() call to make the aforementioned read_swap_cache_async() busy loop condition to bail out when necessary, thus avoiding the subtle race window. Signed-off-by: Rafael Aquini <aquini@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Shaohua Li <shli@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with ↵Tony Lindgren
device tree When booted in legacy mode device_init_wakeup() gets called by drivers/mfd/twl-core.c when the children are initialized. However, when booted using device tree, the children are created with of_platform_populate() instead add_children(). This means that the RTC driver will not have device_init_wakeup() set, and we need to call it from the driver probe like RTC drivers typically do. Without this we cannot test PM wake-up events on omaps for cases where there may not be any physical wake-up event. Signed-off-by: Tony Lindgren <tony@atomide.com> Reported-by: Kevin Hilman <khilman@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Jingoo Han <jg1.han@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12cciss: fix broken mutex usage in ioctlStephen M. Cameron
If a new logical drive is added and the CCISS_REGNEWD ioctl is invoked (as is normal with the Array Configuration Utility) the process will hang as below. It attempts to acquire the same mutex twice, once in do_ioctl() and once in cciss_unlocked_open(). The BKL was recursive, the mutex isn't. Linux version 3.10.0-rc2 (scameron@localhost.localdomain) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Fri May 24 14:32:12 CDT 2013 [...] acu D 0000000000000001 0 3246 3191 0x00000080 Call Trace: schedule+0x29/0x70 schedule_preempt_disabled+0xe/0x10 __mutex_lock_slowpath+0x17b/0x220 mutex_lock+0x2b/0x50 cciss_unlocked_open+0x2f/0x110 [cciss] __blkdev_get+0xd3/0x470 blkdev_get+0x5c/0x1e0 register_disk+0x182/0x1a0 add_disk+0x17c/0x310 cciss_add_disk+0x13a/0x170 [cciss] cciss_update_drive_info+0x39b/0x480 [cciss] rebuild_lun_table+0x258/0x370 [cciss] cciss_ioctl+0x34f/0x470 [cciss] do_ioctl+0x49/0x70 [cciss] __blkdev_driver_ioctl+0x28/0x30 blkdev_ioctl+0x200/0x7b0 block_ioctl+0x3c/0x40 do_vfs_ioctl+0x89/0x350 SyS_ioctl+0xa1/0xb0 system_call_fastpath+0x16/0x1b This mutex usage was added into the ioctl path when the big kernel lock was removed. As it turns out, these paths are all thread safe anyway (or can easily be made so) and we don't want ioctl() to be single threaded in any case. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mike Miller <mike.miller@hp.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>