summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-04-19Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "Only one remaining fix for arm-soc platforms at this time, a small bugfix for cpu hotplug on highbank platforms that has become much easier to hit as of late. Details in the patch description, but it's small and well-contained and definitely impacts users of the platform, so 3.9 seems appropriate." * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: highbank: fix cache flush ordering for cpu hotplug
2013-04-19Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== If time allows, please consider pulling the following patchset contains two late Netfilter fixes, they are: * Skip broadcast/multicast locally generated traffic in the rpfilter, (closes netfilter bugzilla #814), from Florian Westphal. * Fix missing elements in the listing of ipset bitmap ip,mac set type with timeout support enabled, from Jozsef Kadlecsik. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== A few stragglers hoping for 3.9, somewhat delayed due to my travels... On the mac80211 bits, Johannes says: "Sadly, I have another pull request -- the idle handling fix broke LED handling in some cases." and: "Yet one more! This fixes a fairly important/annoying bug -- when roaming between multiple APs of the same network, the system could get stuck thinking it was connected to the old one while it really wasn't." On top of that... Arend sends a brcmfmac patch that removes advertising a feature that isn't actually fully supported, and a brcmsmac patch that rearranges code to request firmware at IFF_UP to play more nicely with being built into the kernel. Felix gives us a minor ath9k_htc fix to support the newly released open source firmware, and an ath9k_hw initvals fix to improve device stability. Rafał Miłecki provides a fix for an ssb regression that caused a serious performance problem with b43. Zefir Kurtisi offers an ath9k fix to change some kmalloc flags to allow the DFS detector to be called in softirq context. Please let me know if there are problems. If these don't make 3.9, I'll just pull them into wireless-next -- just let me know if you want to do it that way! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19tcp: call tcp_replace_ts_recent() from tcp_ack()Eric Dumazet
commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming()) introduced a TS ecr bug in slow path processing. 1 A > B P. 1:10001(10000) ack 1 <nop,nop,TS val 1001 ecr 200> 2 B < A . 1:1(0) ack 1 win 257 <sack 9001:10001,TS val 300 ecr 1001> 3 A > B . 1:1001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200> 4 A > B . 1001:2001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200> (ecr 200 should be ecr 300 in packets 3 & 4) Problem is tcp_ack() can trigger send of new packets (retransmits), reflecting the prior TSval, instead of the TSval contained in the currently processed incoming packet. Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the checks, but before the actions. Reported-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19mtdchar: remove no-longer-used vma helpersLinus Torvalds
With the conversion to vm_iomap_memory(), these vma helpers are no longer used. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-19vm: convert snd_pcm_lib_mmap_iomem() to vm_iomap_memory() helperLinus Torvalds
This is my example conversion of a few existing mmap users. The pcm mmap case is one of the more straightforward ones. Acked-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-19vm: convert fb_mmap to vm_iomap_memory() helperLinus Torvalds
This is my example conversion of a few existing mmap users. The fb_mmap() case is a good example because it is a bit more complicated than some: fb_mmap() mmaps one of two different memory areas depending on the page offset of the mmap (but happily there is never any mixing of the two, so the helper function still works). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-19vm: convert mtdchar mmap to vm_iomap_memory() helperLinus Torvalds
This is my example conversion of a few existing mmap users. The mtdchar case is actually disabled right now (and stays disabled), but I did it because it showed up on my "git grep", and I was familiar with the code due to fixing an overflow problem in the code in commit 9c603e53d380 ("mtdchar: fix offset overflow detection"). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-19vm: convert HPET mmap to vm_iomap_memory() helperLinus Torvalds
This is my example conversion of a few existing mmap users. The HPET case is simple, widely available, and easy to test (Clemens Ladisch sent a trivial test-program for it). Test-program-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Two more small fixups to the wacom driver" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: wacom - fix "can not retrieve extra class descriptor" for DTH2242 Input: wacom - DTH2242 Grip Pen id was off by one bit
2013-04-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse build fix from Miklos Szeredi: "This fixes android builds. The patch appears large, but is just search & replace." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix type definitions in uapi header
2013-04-19Input: wacom - fix "can not retrieve extra class descriptor" for DTH2242Ping Cheng
Same as Cintiq 24HDT, DTH2242 has two interfaces sharing one configuration. This patch ignores the second interface. Signed-off-by: Ping Cheng <pingc@wacom.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2013-04-19Input: wacom - DTH2242 Grip Pen id was off by one bitPing Cheng
Signed-off-by: Ping Cheng <pingc@wacom.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2013-04-19xen: resolve section mismatch warnings in xen-acpi-processorBen Guthro
The following resolves a section mismatch warning below in xen-acpi-processor introduced by 3fac10145b766a2244422788f62dc35978613fd8 [13/13] xen: Re-upload processor PM data to hypervisor after S3 resume (v2) Warning: WARNING: drivers/xen/built-in.o(.text+0x2056a): Section mismatch in reference from the function xen_upload_processor_pm_data() to the function .init.text:read_acpi_id() The function xen_upload_processor_pm_data() references the function __init read_acpi_id(). This is often because xen_upload_processor_pm_data lacks a __init annotation or the annotation of read_acpi_id is wrong. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Ben Guthro <benjamin.guthro@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-04-19mutex: Back out architecture specific check for negative mutex countWaiman Long
Linus suggested that probably all the supported architectures can allow a negative mutex count without incorrect behavior, so we can then back out the architecture specific change and allow the mutex count to go to any negative number. That should further reduce contention for non-x86 architecture. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Waiman Long <Waiman.Long@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chandramouleeswaran Aswin <aswin@hp.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Norton Scott J <scott.norton@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366226594-5506-5-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-19mutex: Queue mutex spinners with MCS lock to reduce cacheline contentionWaiman Long
The current mutex spinning code (with MUTEX_SPIN_ON_OWNER option turned on) allow multiple tasks to spin on a single mutex concurrently. A potential problem with the current approach is that when the mutex becomes available, all the spinning tasks will try to acquire the mutex more or less simultaneously. As a result, there will be a lot of cacheline bouncing especially on systems with a large number of CPUs. This patch tries to reduce this kind of contention by putting the mutex spinners into a queue so that only the first one in the queue will try to acquire the mutex. This will reduce contention and allow all the tasks to move forward faster. The queuing of mutex spinners is done using an MCS lock based implementation which will further reduce contention on the mutex cacheline than a similar ticket spinlock based implementation. This patch will add a new field into the mutex data structure for holding the MCS lock. This expands the mutex size by 8 bytes for 64-bit system and 4 bytes for 32-bit system. This overhead will be avoid if the MUTEX_SPIN_ON_OWNER option is turned off. The following table shows the jobs per minute (JPM) scalability data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The numactl command is used to restrict the running of the fserver workloads to 1/2/4/8 nodes with hyperthreading off. +-----------------+-----------+-----------+-------------+----------+ | Configuration | Mean JPM | Mean JPM | Mean JPM | % Change | | | w/o patch | patch 1 | patches 1&2 | 1->1&2 | +-----------------+------------------------------------------------+ | | User Range 1100 - 2000 | +-----------------+------------------------------------------------+ | 8 nodes, HT off | 227972 | 227237 | 305043 | +34.2% | | 4 nodes, HT off | 393503 | 381558 | 394650 | +3.4% | | 2 nodes, HT off | 334957 | 325240 | 338853 | +4.2% | | 1 node , HT off | 198141 | 197972 | 198075 | +0.1% | +-----------------+------------------------------------------------+ | | User Range 200 - 1000 | +-----------------+------------------------------------------------+ | 8 nodes, HT off | 282325 | 312870 | 332185 | +6.2% | | 4 nodes, HT off | 390698 | 378279 | 393419 | +4.0% | | 2 nodes, HT off | 336986 | 326543 | 340260 | +4.2% | | 1 node , HT off | 197588 | 197622 | 197582 | 0.0% | +-----------------+-----------+-----------+-------------+----------+ At low user range 10-100, the JPM differences were within +/-1%. So they are not that interesting. The fserver workload uses mutex spinning extensively. With just the mutex change in the first patch, there is no noticeable change in performance. Rather, there is a slight drop in performance. This mutex spinning patch more than recovers the lost performance and show a significant increase of +30% at high user load with the full 8 nodes. Similar improvements were also seen in a 3.8 kernel. The table below shows the %time spent by different kernel functions as reported by perf when running the fserver workload at 1500 users with all 8 nodes. +-----------------------+-----------+---------+-------------+ | Function | % time | % time | % time | | | w/o patch | patch 1 | patches 1&2 | +-----------------------+-----------+---------+-------------+ | __read_lock_failed | 34.96% | 34.91% | 29.14% | | __write_lock_failed | 10.14% | 10.68% | 7.51% | | mutex_spin_on_owner | 3.62% | 3.42% | 2.33% | | mspin_lock | N/A | N/A | 9.90% | | __mutex_lock_slowpath | 1.46% | 0.81% | 0.14% | | _raw_spin_lock | 2.25% | 2.50% | 1.10% | +-----------------------+-----------+---------+-------------+ The fserver workload for an 8-node system is dominated by the contention in the read/write lock. Mutex contention also plays a role. With the first patch only, mutex contention is down (as shown by the __mutex_lock_slowpath figure) which help a little bit. We saw only a few percents improvement with that. By applying patch 2 as well, the single mutex_spin_on_owner figure is now split out into an additional mspin_lock figure. The time increases from 3.42% to 11.23%. It shows a great reduction in contention among the spinners leading to a 30% improvement. The time ratio 9.9/2.33=4.3 indicates that there are on average 4+ spinners waiting in the spin_lock loop for each spinner in the mutex_spin_on_owner loop. Contention in other locking functions also go down by quite a lot. The table below shows the performance change of both patches 1 & 2 over patch 1 alone in other AIM7 workloads (at 8 nodes, hyperthreading off). +--------------+---------------+----------------+-----------------+ | Workload | mean % change | mean % change | mean % change | | | 10-100 users | 200-1000 users | 1100-2000 users | +--------------+---------------+----------------+-----------------+ | alltests | 0.0% | -0.8% | +0.6% | | five_sec | -0.3% | +0.8% | +0.8% | | high_systime | +0.4% | +2.4% | +2.1% | | new_fserver | +0.1% | +14.1% | +34.2% | | shared | -0.5% | -0.3% | -0.4% | | short | -1.7% | -9.8% | -8.3% | +--------------+---------------+----------------+-----------------+ The short workload is the only one that shows a decline in performance probably due to the spinner locking and queuing overhead. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chandramouleeswaran Aswin <aswin@hp.com> Cc: Norton Scott J <scott.norton@hp.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366226594-5506-4-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-19mutex: Make more scalable by doing less atomic operationsWaiman Long
In the __mutex_lock_common() function, an initial entry into the lock slow path will cause two atomic_xchg instructions to be issued. Together with the atomic decrement in the fast path, a total of three atomic read-modify-write instructions will be issued in rapid succession. This can cause a lot of cache bouncing when many tasks are trying to acquire the mutex at the same time. This patch will reduce the number of atomic_xchg instructions used by checking the counter value first before issuing the instruction. The atomic_read() function is just a simple memory read. The atomic_xchg() function, on the other hand, can be up to 2 order of magnitude or even more in cost when compared with atomic_read(). By using atomic_read() to check the value first before calling atomic_xchg(), we can avoid a lot of unnecessary cache coherency traffic. The only downside with this change is that a task on the slow path will have a tiny bit less chance of getting the mutex when competing with another task in the fast path. The same is true for the atomic_cmpxchg() function in the mutex-spin-on-owner loop. So an atomic_read() is also performed before calling atomic_cmpxchg(). The mutex locking and unlocking code for the x86 architecture can allow any negative number to be used in the mutex count to indicate that some tasks are waiting for the mutex. I am not so sure if that is the case for the other architectures. So the default is to avoid atomic_xchg() if the count has already been set to -1. For x86, the check is modified to include all negative numbers to cover a larger case. The following table shows the jobs per minutes (JPM) scalability data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The numactl command is used to restrict the running of the high_systime workloads to 1/2/4/8 nodes with hyperthreading on and off. +-----------------+-----------+------------+----------+ | Configuration | Mean JPM | Mean JPM | % Change | | | w/o patch | with patch | | +-----------------+-----------------------------------+ | | User Range 1100 - 2000 | +-----------------+-----------------------------------+ | 8 nodes, HT on | 36980 | 148590 | +301.8% | | 8 nodes, HT off | 42799 | 145011 | +238.8% | | 4 nodes, HT on | 61318 | 118445 | +51.1% | | 4 nodes, HT off | 158481 | 158592 | +0.1% | | 2 nodes, HT on | 180602 | 173967 | -3.7% | | 2 nodes, HT off | 198409 | 198073 | -0.2% | | 1 node , HT on | 149042 | 147671 | -0.9% | | 1 node , HT off | 126036 | 126533 | +0.4% | +-----------------+-----------------------------------+ | | User Range 200 - 1000 | +-----------------+-----------------------------------+ | 8 nodes, HT on | 41525 | 122349 | +194.6% | | 8 nodes, HT off | 49866 | 124032 | +148.7% | | 4 nodes, HT on | 66409 | 106984 | +61.1% | | 4 nodes, HT off | 119880 | 130508 | +8.9% | | 2 nodes, HT on | 138003 | 133948 | -2.9% | | 2 nodes, HT off | 132792 | 131997 | -0.6% | | 1 node , HT on | 116593 | 115859 | -0.6% | | 1 node , HT off | 104499 | 104597 | +0.1% | +-----------------+------------+-----------+----------+ At low user range 10-100, the JPM differences were within +/-1%. So they are not that interesting. AIM7 benchmark run has a pretty large run-to-run variance due to random nature of the subtests executed. So a difference of less than +-5% may not be really significant. This patch improves high_systime workload performance at 4 nodes and up by maintaining transaction rates without significant drop-off at high node count. The patch has practically no impact on 1 and 2 nodes system. The table below shows the percentage time (as reported by perf record -a -s -g) spent on the __mutex_lock_slowpath() function by the high_systime workload at 1500 users for 2/4/8-node configurations with hyperthreading off. +---------------+-----------------+------------------+---------+ | Configuration | %Time w/o patch | %Time with patch | %Change | +---------------+-----------------+------------------+---------+ | 8 nodes | 65.34% | 0.69% | -99% | | 4 nodes | 8.70% | 1.02% | -88% | | 2 nodes | 0.41% | 0.32% | -22% | +---------------+-----------------+------------------+---------+ It is obvious that the dramatic performance improvement at 8 nodes was due to the drastic cut in the time spent within the __mutex_lock_slowpath() function. The table below show the improvements in other AIM7 workloads (at 8 nodes, hyperthreading off). +--------------+---------------+----------------+-----------------+ | Workload | mean % change | mean % change | mean % change | | | 10-100 users | 200-1000 users | 1100-2000 users | +--------------+---------------+----------------+-----------------+ | alltests | +0.6% | +104.2% | +185.9% | | five_sec | +1.9% | +0.9% | +0.9% | | fserver | +1.4% | -7.7% | +5.1% | | new_fserver | -0.5% | +3.2% | +3.1% | | shared | +13.1% | +146.1% | +181.5% | | short | +7.4% | +5.0% | +4.2% | +--------------+---------------+----------------+-----------------+ Signed-off-by: Waiman Long <Waiman.Long@hp.com> Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chandramouleeswaran Aswin <aswin@hp.com> Cc: Norton: Scott J <scott.norton@hp.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366226594-5506-3-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-19mutex: Move mutex spinning code from sched/core.c back to mutex.cWaiman Long
As mentioned by Ingo, the SCHED_FEAT_OWNER_SPIN scheduler feature bit was really just an early hack to make with/without mutex-spinning testable. So it is no longer necessary. This patch removes the SCHED_FEAT_OWNER_SPIN feature bit and move the mutex spinning code from kernel/sched/core.c back to kernel/mutex.c which is where they should belong. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chandramouleeswaran Aswin <aswin@hp.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Norton Scott J <scott.norton@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1366226594-5506-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-19Merge branch 'userns-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux Pull user-namespace fixes from Andy Lutomirski. * 'userns-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux: userns: Changing any namespace id mappings should require privileges userns: Check uid_map's opener's fsuid, not the current fsuid userns: Don't let unprivileged users trick privileged users into setting the id_map
2013-04-18Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86Linus Torvalds
Pull x86 platform driver revert from Matthew Garrett: "It turns out that one of the hp-wmi patches this cycle breaks some other HP laptops. I think we have a good idea how to work on it for 3.10, but it's safer to just revert it for now." * 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86: Revert "hp-wmi: Add support for SMBus hotkeys"
2013-04-18netfilter: xt_rpfilter: skip locally generated broadcast/multicast, tooFlorian Westphal
Alex Efros reported rpfilter module doesn't match following packets: IN=br.qemu SRC=192.168.2.1 DST=192.168.2.255 [ .. ] (netfilter bugzilla #814). Problem is that network stack arranges for the locally generated broadcasts to appear on the interface they were sent out, so the IFF_LOOPBACK check doesn't trigger. As -m rpfilter is restricted to PREROUTING, we can check for existing rtable instead, it catches locally-generated broad/multicast case, too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-18Revert "hp-wmi: Add support for SMBus hotkeys"Matthew Garrett
This reverts commit fabf85e3ca15d5b94058f391dac8df870cdd427a which breaks hotkey support on some other HP laptops. We'll try doing this differently in 3.10. Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
2013-04-18netfilter: ipset: bitmap:ip,mac: fix listing with timeoutJozsef Kadlecsik
The type when timeout support was enabled, could not list all elements, just the first ones which could fit into one netlink message: it just did not continue listing after the first message. Reported-by: Yoann JUET <yoann.juet@univ-nantes.fr> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Tested-by: Yoann JUET <yoann.juet@univ-nantes.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-18bonding: fix l23 and l34 load balancing in forwarding pathEric Dumazet
Since commit 6b923cb7188d46 (bonding: support for IPv6 transmit hashing) bonding doesn't properly hash traffic in forwarding setups. Vitaly V. Bursov diagnosed that skb_network_header_len() returned 0 in this case. More generally, the transport header might not be in the skb head. Use pskb_may_pull() & skb_header_pointer() to get it right, and use proto_ports_offset() in bond_xmit_hash_policy_l34() to get support for more protocols than TCP and UDP. Reported-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: John Eaglesham <linux@8192.net> Tested-by: Vitaly V. Bursov <vitalyb@telenet.dn.ua> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18bnx2x: Fix status blocks configurationAriel Elior
This fixes 2 issues regarding bnx2x's status blocks: 1. ethtool -c caused corruption of status blocks in FW RAM. 2. when using multi-CoS, the configuration of the timeout values of status blocks is incorrect, harming the coalescing of interrupts for such CoSs. Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18bnx2x: Prevent UNDI FW illegal host accessDmitry Kravkov
When loading after UNDI (e.g., Boot from SAN) the UNDI does not gracefully yield its resources; The bnx2x driver handles that release itself. During the manipulation required to release those resources, it's possible for the UNDI to try and write to memory regions which are no longer accessible, causing the PCI bus to prevent further writes from the chip. This would in turn cause DMAE timeouts later on in the driver, as the driver will be unable to use the chip's DMA engines. This patch prevents the chip from actually writing through the PCI bus in said scenario, thus allowing the release without the unfortunate by-product. Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-04-18Merge branch 'qlogic'David S. Miller
Shahed Shaikh says: ==================== This patch series contains bug fixes for - * Loopback test failure while traffic is running. * Tx timeout and subsequent firmware reset by removing check for '(adapter->netdev->features & (NETIF_F_TSO | NETIF_F_TSO6)' from tx fast path, as per Eric's suggestion. * Typo in logs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18qlcnic: Fix typo in logsShahed Shaikh
o Debug logs were not matching with code functionality. o Changed dev_info to netdev_err Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18qlcnic: fix TSO race conditionSritej Velaga
When driver receives a packet with gso size > 0 and when TSO is disabled, it should be transmitted as a TSO packet to prevent Tx timeout and subsequent firmware reset. Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18qlcnic: Stop traffic before performing loopback testJitendra Kalsaria
Before conducting loopback test by sending packets, driver should stop transmit queue and turn off carrier. Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18ixgbe: Fix a bug in setting VF VLAN via PFGreg Rose
The PF driver does not check if the administrator has already set a VF VLAN via the PF driver before setting the new VLAN. This results in the following scenario: A) Administrator sets VF <n> to VLAN 100 B) Administrator sets VF <x> to VLAN 100 C) Administrator sets VF <n> to VLAN 200 D) The VF <n> driver continues to be able to receive traffic on VLAN 100 because the VLVFB pool enable bit for that VF was left set instead of being cleared as it should be. This fix ensures that the old VLAN filter for VF <n> is first removed and the pool bit enable for VF <n> is cleared so that it no longer receives traffic on VLAN 100. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18igb: Revert support for build_skb in igbAlexander Duyck
This patch actually reverts: igb: Support using build_skb in the case that jumbo frames are disabled The reason for reverting this patch is that it can lead to data corruption. The following flow was pointed out by Ben Hutchings: 1. skb is forwarded to another device 2. Packet headers are modified and it's put into a queue 3. Second packet is received into the other half of this page 4. Page cannot be reused, so is DMA-unmapped 5. The DMA mapping was non-coherent, so unmap copies or invalidates cache The headers added in step 2 get trashed in step 5. Reported-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Included changes: - fix MAC address check in case of multiple mesh interfaces Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18[SCSI] megaraid_sas: Use correct #define for MSI-X capabilityBjorn Helgaas
Previously we used PCI_MSI_FLAGS to locate a register in the MSI-X capability. This did work because the MSI and MSI-X flags happen to be at the same offsets, but was confusing. PCI_MSIX_FLAGS_ENABLE is already defined in include/uapi/linux/pci_regs.h, so no need to define it again. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Adam Radford <aradford@gmail.com>
2013-04-18ARM: highbank: fix cache flush ordering for cpu hotplugRob Herring
The L1 data cache flush needs to be after highbank_set_cpu_jump call which pollutes the cache with the l2x0_lock. This causes other cores to deadlock waiting for the l2x0_lock. Moving the flush of the entire data cache after highbank_set_cpu_jump fixes the problem. Use flush_cache_louis instead of flush_cache_all are that is sufficient to flush only the L1 data cache. flush_cache_louis did not exist when highbank_cpu_die was originally written. With PL310 errata 769419 enabled, a wmb is inserted into idle which takes the l2x0_lock. This makes the problem much more easily hit and causes reset to hang. Reported-by: Paolo Pisati <p.pisati@gmail.com> Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2013-04-18Revert "block: add missing block_bio_complete() tracepoint"Linus Torvalds
This reverts commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f. Wanlong Gao reports that it causes a kernel panic on his machine several minutes after boot. Reverting it removes the panic. Jens says: "It's not quite clear why that is yet, so I think we should just revert the commit for 3.9 final (which I'm assuming is pretty close). The wifi is crap at the LSF hotel, so sending this email instead of queueing up a revert and pull request." Reported-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Requested-by: Jens Axboe <axboe@kernel.dk> Cc: Tejun Heo <tj@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-18x86, hyperv: Handle Xen emulation of Hyper-V more gracefullyK. Y. Srinivasan
Install the Hyper-V specific interrupt handler only when needed. This would permit us to get rid of the Xen check. Note that when the vmbus drivers invokes the call to register its handler, we are sure to be running on Hyper-V. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Link: http://lkml.kernel.org/r/1366299886-6399-1-git-send-email-kys@microsoft.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-18kprobes: Fix a double lock bug of kprobe_mutexMasami Hiramatsu
Fix a double locking bug caused when debug.kprobe-optimization=0. While the proc_kprobes_optimization_handler locks kprobe_mutex, wait_for_kprobe_optimizer locks it again and that causes a double lock. To fix the bug, this introduces different mutex for protecting sysctl parameter and locks it in proc_kprobes_optimization_handler. Of course, since we need to lock kprobe_mutex when touching kprobes resources, that is done in *optimize_all_kprobes(). This bug was introduced by commit ad72b3bea744 ("kprobes: fix wait_for_kprobe_optimizer()") Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-18dmaengine: at_hdmac: fix race condition in atc_advance_work()Ludovic Desroches
The BUG_ON() directive is triggered probably due to a latency modification following inclusion of commit c10d73671ad3 ("softirq: reduce latencies"). This condition has not been met before 3.9-rc1 and doesn't trigger without this patch. We now make sure that DMA channel is idle before calling atc_complete_all() which makes the BUG_ON() "protection" useless. Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-18Merge branch 'fixes' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== Two small bug fixes for net/3.9 including the issue previously discussed where allocation of netlink notifications can fail after changes have been committed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull idle patches from Len Brown: "A pair of small patches for 3.9-rc7. This CPU-id should have been included in the ones that we updated earlier in 3.9. This pair of patches will allow this flavor of Haswell to behave like the other flavors." * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: additional Haswell CPU-id intel_idle: additional Haswell CPU-id
2013-04-17Merge branch 'akpm' (fixes from Andrew)Linus Torvalds
Pull misc fixes from Andrew Morton: "Ten fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kernel/signal.c: stop info leak via the tkill and the tgkill syscalls mm/vmscan: fix error return in kswapd_run() hfsplus: fix potential overflow in hfsplus_file_truncate() avr32: fix build error in atstk1006_defconfig hugetlbfs: add swap entry check in follow_hugetlb_page() fs/binfmt_elf.c: fix hugetlb memory check in vma_dump_size() hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB) checkpatch: fix stringification macro defect drivers/video/mmp/core.c: fix use-after-free bug thinkpad-acpi: kill hotkey_thread_mutex
2013-04-17kernel/signal.c: stop info leak via the tkill and the tgkill syscallsEmese Revfy
This fixes a kernel memory contents leak via the tkill and tgkill syscalls for compat processes. This is visible in the siginfo_t->_sifields._rt.si_sigval.sival_ptr field when handling signals delivered from tkill. The place of the infoleak: int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from) { ... put_user_ex(ptr_to_compat(from->si_ptr), &to->si_ptr); ... } Signed-off-by: Emese Revfy <re.emese@gmail.com> Reviewed-by: PaX Team <pageexec@freemail.hu> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-17mm/vmscan: fix error return in kswapd_run()Xishi Qiu
Fix the error return value in kswapd_run(). The bug was introduced by commit d5dc0ad928fb ("mm/vmscan: fix error number for failed kthread"). Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-17hfsplus: fix potential overflow in hfsplus_file_truncate()Vyacheslav Dubeyko
Change a u32 to loff_t hfsplus_file_truncate(). Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-17avr32: fix build error in atstk1006_defconfigJosh Wu
fixed the following compile error when use avr32 atstk1006_defconfig: drivers/mtd/nand/atmel_nand.c: In function 'pmecc_err_location': drivers/mtd/nand/atmel_nand.c:639: error: implicit declaration of function 'writel_relaxed' which was introduced by commit 1c7b874d33b4 ("mtd: at91: atmel_nand: add Programmable Multibit ECC controller support"). The PMECC for nand flash code uses writel_relaxed(). But in avr32, there is no macro "writel_relaxed" defined. This patch add writex_relaxed macro definitions. Signed-off-by: Josh Wu <josh.wu@atmel.com> Acked-by: Havard Skinnemoen <havard@skinnemoen.net> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-17hugetlbfs: add swap entry check in follow_hugetlb_page()Naoya Horiguchi
With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory error happens on a hugepage and the affected processes try to access the error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0) in get_page(). The reason for this bug is that coredump-related code doesn't recognise "hugepage hwpoison entry" with which a pmd entry is replaced when a memory error occurs on a hugepage. In other words, physical address information is stored in different bit layout between hugepage hwpoison entry and pmd entry, so follow_hugetlb_page() which is called in get_dump_page() returns a wrong page from a given address. The expected behavior is like this: absent is_swap_pte FOLL_DUMP Expected behavior ------------------------------------------------------------------- true false false hugetlb_fault false true false hugetlb_fault false false false return page true false true skip page (to avoid allocation) false true true hugetlb_fault false false true return page With this patch, we can call hugetlb_fault() and take proper actions (we wait for migration entries, fail with VM_FAULT_HWPOISON_LARGE for hwpoisoned entries,) and as the result we can dump all hugepages except for hwpoisoned ones. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [2.6.34+?] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-17fs/binfmt_elf.c: fix hugetlb memory check in vma_dump_size()Naoya Horiguchi
Documentation/filesystems/proc.txt says about coredump_filter bitmask, Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only effected by bit 5-6. However current code can go into the subsequent flag checks of bit 0-4 for vma(VM_HUGETLB). So this patch inserts 'return' and makes it work as written in the document. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-17hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)Naoya Horiguchi
Currently we fail to include any data on hugepages into coredump, because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently introduced by commit 314e51b9851b ("mm: kill vma flag VM_RESERVED and mm->reserved_vm counter"). This looks to me a serious regression, so let's fix it. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>