summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-12-02tcp: fix crashes in do_tcp_sendpages()Eric Dumazet
Recent network changes allowed high order pages being used for skb fragments. This uncovered a bug in do_tcp_sendpages() which was assuming its caller provided an array of order-0 page pointers. We only have to deal with a single page in this function, and its order is irrelevant. Reported-by: Willy Tarreau <w@1wt.eu> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-11-29bonding: fix race condition in bonding_store_slaves_activenikolay@redhat.com
Race between bonding_store_slaves_active() and slave manipulation functions. The bond_for_each_slave use in bonding_store_slaves_active() is not protected by any synchronization mechanism. NULL pointer dereference is easy to reach. Fixed by acquiring the bond->lock for the slave walk. v2: Make description text < 75 columns Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29bonding: make arp_ip_target parameter checks consistent with sysfsnikolay@redhat.com
The module can be loaded with arp_ip_target="255.255.255.255" which makes it impossible to remove as the function in sysfs checks for that value, so we make the parameter checks consistent with sysfs. v2: Fix formatting v3: Make description text < 75 columns Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29bonding: fix miimon and arp_interval delayed work race conditionsnikolay@redhat.com
First I would give three observations which will be used later. Observation 1: if (delayed_work_pending(wq)) cancel_delayed_work(wq) This usage is wrong because the pending bit is cleared just before the work's fn is executed and if the function re-arms itself we might end up with the work still running. It's safe to call cancel_delayed_work_sync() even if the work is not queued at all. Observation 2: Use of INIT_DELAYED_WORK() Work needs to be initialized only once prior to (de/en)queueing. Observation 3: IFF_UP is set only after ndo_open is called Related race conditions: 1. Race between bonding_store_miimon() and bonding_store_arp_interval() Because of Obs.1 we can end up having both works enqueued. 2. Multiple races with INIT_DELAYED_WORK() Since the works are not protected by anything between INIT_DELAYED_WORK() and calls to (en/de)queue it is possible for races between the following functions: (races are also possible between the calls to INIT_DELAYED_WORK() and workqueue code) bonding_store_miimon() - bonding_store_arp_interval(), bond_close(), bond_open(), enqueued functions bonding_store_arp_interval() - bonding_store_miimon(), bond_close(), bond_open(), enqueued functions 3. By Obs.1 we need to change bond_cancel_all() Bugs 1 and 2 are fixed by moving all work initializations in bond_open which by Obs. 2 and Obs. 3 and the fact that we make sure that all works are cancelled in bond_close(), is guaranteed not to have any work enqueued. Also RTNL lock is now acquired in bonding_store_miimon/arp_interval so they can't race with bond_close and bond_open. The opposing work is cancelled only if the IFF_UP flag is set and it is cancelled unconditionally. The opposing work is already cancelled if the interface is down so no need to cancel it again. This way we don't need new synchronizations for the bonding workqueue. These bugs (and fixes) are tied together and belong in the same patch. Note: I have left 1 line intentionally over 80 characters (84) because I didn't like how it looks broken down. If you'd prefer it otherwise, then simply break it. v2: Make description text < 75 columns Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Some more fixes trickled in over the past few days: 1) PIM device names can overflow the IFNAMSIZ buffer unless we properly limit the allowed indexes, fix from Eric Dumazet. 2) Under heavy load we can OOPS in icmp reply processing due to an unchecked inet_putpeer() call. Fix from Neal Cardwell. 3) SCTP round trip calculations need to use 64-bit math to avoid overflows, fix from Schoch Christian. 4) Fix a memory leak and an error return flub in SCTP and IRDA triggerable by userspace. Fix from Tommi Rantala and found by the syscall fuzzer (trinity). 5) MLX4 driver gives bogus size to memcpy() call, fix from Amir Vadai. 6) Fix length calculation in VHOST descriptor translation, from Michael S Tsirkin. 7) Ambassador ATM driver loops forever while loading firmware, fix from Dan Carpenter. 8) Over MTU packets in openvswitch warn about wrong device, fix from Jesse Gross. 9) Netfilter IPSET's netlink code can overrun a string buffer because it's not properly limited to IFNAMSIZ. Fix from Florian Westphal. 10) PCAN USB driver sets wrong timestamp in SKB, from Oliver Hartkopp. 11) Make sure the RX ifindex always has a valid value in the CAN BCM driver, even if we haven't received a frame yet. Fix also from Oliver Hartkopp." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: team: fix hw_features setup atm: forever loop loading ambassador firmware vhost: fix length for cross region descriptor irda: irttp: fix memory leak in irttp_open_tsap() error path net: qmi_wwan: add Huawei E173 net/mlx4_en: Can set maxrate only for TC0 sctp: Error in calculation of RTTvar sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscall sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails net: ipmr: limit MRT_TABLE identifiers ipv4: avoid passing NULL to inet_putpeer() in icmpv4_xrlim_allow() can: bcm: initialize ifindex for timeouts without previous frame reception can: peak_usb: fix hwtstamp assignment netfilter: ipset: fix netiface set name overflow openvswitch: Store flow key len if ARP opcode is not request or reply. openvswitch: Print device when warning about over MTU packets.
2012-11-28Merge branch 'fixes' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Two small openswitch fixes from Jesse Gross. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28team: fix hw_features setupJiri Pirko
Do this in the same way bonding does. This fixed setup resolves performance issues when using some cards with certain offloading. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28atm: forever loop loading ambassador firmwareDan Carpenter
There was a forever loop introduced here when we converted this to request_firmware() back in 2008. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28Merge branch 'master' of git://1984.lsi.us.es/nfDavid S. Miller
An interface name overflow fix in netfilter via Pablo Neira Ayuso. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28vhost: fix length for cross region descriptorMichael S. Tsirkin
If a single descriptor crosses a region, the second chunk length should be decremented by size translated so far, instead it includes the full descriptor length. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28irda: irttp: fix memory leak in irttp_open_tsap() error pathTommi Rantala
Cleanup the memory we allocated earlier in irttp_open_tsap() when we hit this error path. The leak goes back to at least 1da177e4 ("Linux-2.6.12-rc2"). Discovered with Trinity (the syscall fuzzer). Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28net: qmi_wwan: add Huawei E173Bjørn Mork
The Huawei E173 is a QMI/wwan device which normally appear as 12d1:1436 in Linux. The descriptors displayed in that mode will be picked up by cdc_ether. But the modem has another mode with a different device ID and a slightly different set of descriptors. This is the mode used by Windows like this: 3Modem: USB\VID_12D1&PID_140C&MI_00\6&3A1D2012&0&0000 Networkcard: USB\VID_12D1&PID_140C&MI_01\6&3A1D2012&0&0001 Appli.Inter: USB\VID_12D1&PID_140C&MI_02\6&3A1D2012&0&0002 PC UI Inter: USB\VID_12D1&PID_140C&MI_03\6&3A1D2012&0&0003 Reported-by: Thomas Schäfer <tschaefer@t-online.de> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28net/mlx4_en: Can set maxrate only for TC0Amir Vadai
Had a typo in memcpy. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28sctp: Error in calculation of RTTvarSchoch Christian
The calculation of RTTVAR involves the subtraction of two unsigned numbers which may causes rollover and results in very high values of RTTVAR when RTT > SRTT. With this patch it is possible to set RTOmin = 1 to get the minimum of RTO at 4 times the clock granularity. Change Notes: v2) *Replaced abs() by abs64() and long by __s64, changed patch description. Signed-off-by: Christian Schoch <e0326715@student.tuwien.ac.at> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: Neil Horman <nhorman@tuxdriver.com> CC: linux-sctp@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscallTommi Rantala
Consider the following program, that sets the second argument to the sendto() syscall incorrectly: #include <string.h> #include <arpa/inet.h> #include <sys/socket.h> int main(void) { int fd; struct sockaddr_in sa; fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/); if (fd < 0) return 1; memset(&sa, 0, sizeof(sa)); sa.sin_family = AF_INET; sa.sin_addr.s_addr = inet_addr("127.0.0.1"); sa.sin_port = htons(11111); sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa)); return 0; } We get -ENOMEM: $ strace -e sendto ./demo sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory) Propagate the error code from sctp_user_addto_chunk(), so that we will tell user space what actually went wrong: $ strace -e sendto ./demo sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address) Noticed while running Trinity (the syscall fuzzer). Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space ↵Tommi Rantala
fails Trinity (the syscall fuzzer) discovered a memory leak in SCTP, reproducible e.g. with the sendto() syscall by passing invalid user space pointer in the second argument: #include <string.h> #include <arpa/inet.h> #include <sys/socket.h> int main(void) { int fd; struct sockaddr_in sa; fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/); if (fd < 0) return 1; memset(&sa, 0, sizeof(sa)); sa.sin_family = AF_INET; sa.sin_addr.s_addr = inet_addr("127.0.0.1"); sa.sin_port = htons(11111); sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa)); return 0; } As far as I can tell, the leak has been around since ~2003. Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28percpu-rwsem: use synchronize_sched_expeditedMikulas Patocka
Use synchronize_sched_expedited() instead of synchronize_sched() to improve mount speed. This patch improves mount time from 0.500s to 0.013s for Jeff's test-case. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-27Merge branch 'v4l_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "For some media fixes: - dvb_usb_v2: some fixes at the core - Some fixes on some embedded drivers: soc_camera, adv7604, omap3isp, exynos/s5p - Several Exynos4/5 camera fixes - a fix at stv0900 driver - a few USB ID additions to detect more variants of rtl28xxu-based sticks" * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (25 commits) [media] rtl28xxu: 0ccd:00d7 TerraTec Cinergy T Stick+ [media] rtl28xxu: 1d19:1102 Dexatek DK mini DVB-T Dongle [media] mt9v022: fix the V4L2_CID_EXPOSURE control [media] mx2_camera: fix missing unlock on error in mx2_start_streaming() [media] media: omap1_camera: fix const cropping related warnings [media] media: mx1_camera: use the default .set_crop() implementation [media] media: mx2_camera: fix const cropping related warnings [media] media: mx3_camera: fix const cropping related warnings [media] media: pxa_camera: fix const cropping related warnings [media] media: sh_mobile_ceu_camera: fix const cropping related warnings [media] media: sh_vou: fix const cropping related warnings [media] adv7604: restart STDI once if format is not found [media] adv7604: use presets where possible [media] adv7604: Replace prim_mode by mode [media] adv7604: cleanup references [media] dvb_usb_v2: switch interruptible mutex to normal [media] dvb_usb_v2: fix pid_filter callback error logging [media] exynos-gsc: change driver compatible string [media] omap3isp: Fix warning caused by bad subdev events operations prototypes [media] omap3isp: video: Fix warning caused by bad vidioc_s_crop prototype ...
2012-11-27Merge branch 'akpm' (Fixes from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "8 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (8 patches) futex: avoid wake_futex() for a PI futex_q watchdog: using u64 in get_sample_period() writeback: put unused inodes to LRU after writeback completion mm: vmscan: check for fatal signals iff the process was throttled Revert "mm: remove __GFP_NO_KSWAPD" proc: check vma->vm_file before dereferencing UAPI: strip the _UAPI prefix from header guards during header installation include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALID
2012-11-27Merge tag 'tty-3.7-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull TTY fix from Greg Kroah-Hartman: "Here is a single fix for a reported regression in 3.7-rc5 for the tty layer. This fix has been in the linux-next tree and solves the reported problem. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'tty-3.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: tty vt: Fix a regression in command line edition
2012-11-27Merge tag 'mfd-for-linus-3.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 Pull MFD fixes from Samuel Ortiz: - A twl fix preventing a buffer overflow. - A wm5102 register patch fix. - A wm5110 error misreport fix. - Arizona fixes: Use the right array size when adding subdevices, correctly report underclocked events, synchronize register cache after reset. - A twl4030 fix for preventing the system to hang from an interrupt flood. * tag 'mfd-for-linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: mfd: twl4030: Fix chained irq handling on resume from suspend mfd: arizona: Sync regcache after reset mfd: arizona: Correctly report when AIF2/AIF1 is underclocked mfd: arizona: Use correct array for ARRAY_SIZE in mfd_add_devices call mfd: wm5110: Disable control interface error report for WM5110 rev B mfd: wm5102: Update register patch for latest evaluation mfd: twl-core: Fix chip ID for the twl6030-pwm module
2012-11-27Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-armLinus Torvalds
Pull ARM fixes from Russell King: "Not much here, just a couple minor/cosmetic fixes and a patch for the decompressor which fixes problems with modern GCC and CPUs." * 'fixes' of git://git.linaro.org/people/rmk/linux-arm: ARM: 7583/1: decompressor: Enable unaligned memory access for v6 and above ARM: 7572/1: proc-v6.S: fix comment ARM: 7570/1: quiet down the non make -s output
2012-11-27Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext3 regression fix from Jan Kara: "Fix an ext3 regression introduced during 3.7 merge window. It leads to deadlock if you stress the filesystem in the right way (luckily only if blocksize < pagesize)." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: jbd: Fix lock ordering bug in journal_unmap_buffer()
2012-11-27futex: avoid wake_futex() for a PI futex_qDarren Hart
Dave Jones reported a bug with futex_lock_pi() that his trinity test exposed. Sometime between queue_me() and taking the q.lock_ptr, the lock_ptr became NULL, resulting in a crash. While futex_wake() is careful to not call wake_futex() on futex_q's with a pi_state or an rt_waiter (which are either waiting for a futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and futex_requeue() do not perform the same test. Update futex_wake_op() and futex_requeue() to test for q.pi_state and q.rt_waiter and abort with -EINVAL if detected. To ensure any future breakage is caught, add a WARN() to wake_futex() if the same condition is true. This fix has seen 3 hours of testing with "trinity -c futex" on an x86_64 VM with 4 CPUS. [akpm@linux-foundation.org: tidy up the WARN()] Signed-off-by: Darren Hart <dvhart@linux.intel.com> Reported-by: Dave Jones <davej@redat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: John Kacur <jkacur@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-27watchdog: using u64 in get_sample_period()Chuansheng Liu
In get_sample_period(), unsigned long is not enough: watchdog_thresh * 2 * (NSEC_PER_SEC / 5) case1: watchdog_thresh is 10 by default, the sample value will be: 0xEE6B2800 case2: set watchdog_thresh is 20, the sample value will be: 0x1 DCD6 5000 In case2, we need use u64 to express the sample period. Otherwise, changing the threshold thru proc often can not be successful. Signed-off-by: liu chuansheng <chuansheng.liu@intel.com> Acked-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-27writeback: put unused inodes to LRU after writeback completionJan Kara
Commit 169ebd90131b ("writeback: Avoid iput() from flusher thread") removed iget-iput pair from inode writeback. As a side effect, inodes that are dirty during iput_final() call won't be ever added to inode LRU (iput_final() doesn't add dirty inodes to LRU and later when the inode is cleaned there's noone to add the inode there). Thus inodes are effectively unreclaimable until someone looks them up again. The practical effect of this bug is limited by the fact that inodes are pinned by a dentry for long enough that the inode gets cleaned. But still the bug can have nasty consequences leading up to OOM conditions under certain circumstances. Following can easily reproduce the problem: for (( i = 0; i < 1000; i++ )); do mkdir $i for (( j = 0; j < 1000; j++ )); do touch $i/$j echo 2 > /proc/sys/vm/drop_caches done done then one needs to run 'sync; ls -lR' to make inodes reclaimable again. We fix the issue by inserting unused clean inodes into the LRU after writeback finishes in inode_sync_complete(). Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: <stable@vger.kernel.org> [3.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-27mm: vmscan: check for fatal signals iff the process was throttledMel Gorman
Commit 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage") introduced a check for fatal signals after a process gets throttled for network storage. The intention was that if a process was throttled and got killed that it should not trigger the OOM killer. As pointed out by Minchan Kim and David Rientjes, this check is in the wrong place and too broad. If a system is in am OOM situation and a process is exiting, it can loop in __alloc_pages_slowpath() and calling direct reclaim in a loop. As the fatal signal is pending it returns 1 as if it is making forward progress and can effectively deadlock. This patch moves the fatal_signal_pending() check after throttling to throttle_direct_reclaim() where it belongs. If the process is killed while throttled, it will return immediately without direct reclaim except now it will have TIF_MEMDIE set and will use the PFMEMALLOC reserves. Minchan pointed out that it may be better to direct reclaim before returning to avoid using the reserves because there may be pages that can easily reclaim that would avoid using the reserves. However, we do no such targetted reclaim and there is no guarantee that suitable pages are available. As it is expected that this throttling happens when swap-over-NFS is used there is a possibility that the process will instead swap which may allocate network buffers from the PFMEMALLOC reserves. Hence, in the swap-over-nfs case where a process can be throtted and be killed it can use the reserves to exit or it can potentially use reserves to swap a few pages and then exit. This patch takes the option of using the reserves if necessary to allow the process exit quickly. If this patch passes review it should be considered a -stable candidate for 3.6. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-27Revert "mm: remove __GFP_NO_KSWAPD"Mel Gorman
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures" reverted, Zdenek Kabelac reported the following Hmm, so it's just took longer to hit the problem and observe kswapd0 spinning on my CPU again - it's not as endless like before - but still it easily eats minutes - it helps to turn off Firefox or TB (memory hungry apps) so kswapd0 stops soon - and restart those apps again. (And I still have like >1GB of cached memory) kswapd0 R running task 0 30 2 0x00000000 Call Trace: preempt_schedule+0x42/0x60 _raw_spin_unlock+0x55/0x60 put_super+0x31/0x40 drop_super+0x22/0x30 prune_super+0x149/0x1b0 shrink_slab+0xba/0x510 The sysrq+m indicates the system has no swap so it'll never reclaim anonymous pages as part of reclaim/compaction. That is one part of the problem but not the root cause as file-backed pages could also be reclaimed. The likely underlying problem is that kswapd is woken up or kept awake for each THP allocation request in the page allocator slow path. If compaction fails for the requesting process then compaction will be deferred for a time and direct reclaim is avoided. However, if there are a storm of THP requests that are simply rejected, it will still be the the case that kswapd is awake for a prolonged period of time as pgdat->kswapd_max_order is updated each time. This is noticed by the main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead it will loopp, shrinking a small number of pages and calling shrink_slab() on each iteration. The temptation is to supply a patch that checks if kswapd was woken for THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not backed up by proper testing. As 3.7 is very close to release and this is not a bug we should release with, a safer path is to revert "mm: remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing out the balance_pgdat() logic in general. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Zdenek Kabelac <zkabelac@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-27proc: check vma->vm_file before dereferencingStanislav Kinsbursky
Commit 7b540d0646ce ("proc_map_files_readdir(): don't bother with grabbing files") switched proc_map_files_readdir() to use @f_mode directly instead of grabbing @file reference, but same time the test for @vm_file presence was lost leading to nil dereference. The patch brings the test back. The all proc_map_files feature is CONFIG_CHECKPOINT_RESTORE wrapped (which is set to 'n' by default) so the bug doesn't affect regular kernels. The regression is 3.7-rc1 only as far as I can tell. [gorcunov@openvz.org: provided changelog] Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-27UAPI: strip the _UAPI prefix from header guards during header installationDavid Howells
Strip the _UAPI prefix from header guards during header installation so that any userspace dependencies aren't affected. glibc, for example, checks for linux/types.h, linux/kernel.h, linux/compiler.h and linux/list.h by their guards - though the last two aren't actually exported. libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -Wall -Werror -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fno-delete-null-pointer-checks -fstack-protector -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -c child.c -fPIC -DPIC -o .libs/child.o In file included from cli.c:20:0: common.h:152:8: error: redefinition of 'struct sysinfo' In file included from /usr/include/linux/kernel.h:4:0, from /usr/include/linux/sysctl.h:25, from /usr/include/sys/sysctl.h:43, from common.h:50, from cli.c:20: /usr/include/linux/sysinfo.h:7:8: note: originally defined here Reported-by: Tomasz Torcz <tomek@pipebreaker.pl> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-27include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALIDTushar Behera
Commit baf05aa9271b ("bug: introduce BUILD_BUG_ON_INVALID() macro") introduces this macro only when _CHECKER_ is not defined. Define a silent macro in the else condition to fix following sparse warning: mm/filemap.c:395:9: error: undefined identifier 'BUILD_BUG_ON_INVALID' mm/filemap.c:396:9: error: undefined identifier 'BUILD_BUG_ON_INVALID' mm/filemap.c:397:9: error: undefined identifier 'BUILD_BUG_ON_INVALID' include/linux/mm.h:419:9: error: undefined identifier 'BUILD_BUG_ON_INVALID' include/linux/mm.h:419:9: error: not a function <noident> Signed-off-by: Tushar Behera <tushar.behera@linaro.org> Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26net: ipmr: limit MRT_TABLE identifiersEric Dumazet
Name of pimreg devices are built from following format : char name[IFNAMSIZ]; // IFNAMSIZ == 16 sprintf(name, "pimreg%u", mrt->id); We must therefore limit mrt->id to 9 decimal digits or risk a buffer overflow and a crash. Restrict table identifiers in [0 ... 999999999] interval. Reported-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26ipv4: avoid passing NULL to inet_putpeer() in icmpv4_xrlim_allow()Neal Cardwell
inet_getpeer_v4() can return NULL under OOM conditions, and while inet_peer_xrlim_allow() is OK with a NULL peer, inet_putpeer() will crash. This code path now uses the same idiom as the others from: 1d861aa4b3fb08822055345f480850205ffe6170 ("inet: Minimize use of cached route inetpeer."). Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-26can: bcm: initialize ifindex for timeouts without previous frame receptionOliver Hartkopp
Set in the rx_ifindex to pass the correct interface index in the case of a message timeout detection. Usually the rx_ifindex value is set at receive time. But when no CAN frame has been received the RX_TIMEOUT notification did not contain a valid value. Cc: linux-stable <stable@vger.kernel.org> Reported-by: Andre Naujoks <nautsch2@googlemail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-11-26can: peak_usb: fix hwtstamp assignmentOliver Hartkopp
The skb->tstamp is set to the hardware timestamp when available in the USB urb message. This leads to user visible timestamps which contain the 'uptime' of the USB adapter - and not the usual system generated timestamp. Fix this wrong assignment by applying the available hardware timestamp to the skb_shared_hwtstamps data structure - which is intended for this purpose. Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-11-26mac80211: fix remain-on-channel (non-)cancellingJohannes Berg
Felix Liao reported that when an interface is set DOWN while another interface is executing a ROC, the warning in ieee80211_start_next_roc() (about the first item on the list having started already) triggers. This is because ieee80211_roc_purge() calls it even if it never actually changed the list of ROC items. To fix this, simply remove the function call. If it is needed then it will be done by the ieee80211_sw_roc_work() function when the ROC item that is being removed while active is cleaned up. Cc: stable@vger.kernel.org Reported-by: Felix Liao <Felix.Liao@watchguard.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-11-26Merge branch 'for-john' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes
2012-11-26Linux 3.7-rc7Linus Torvalds
2012-11-26Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc EEH bugfixes from Benjamin Herrenschmidt. Two one-liner fixes for the new EEH code. * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/eeh: Do not invalidate PE properly powerpc/pseries: Fix oops with MSIs when missing EEH PEs
2012-11-26Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds
Pull MIPS fixes from Ralf Baechle: "Three issues fixed accross the field: - Some functions that were recently outlined as part of a preemption fix were causing problems with function tracing. - The recently merged in-kernel MPI library uses very outdated headers that contain MIPS-specific code which won't build on with gcc 4.4 or newer. - The MIPS non-NUMA memory initialization was making only a very half-baked attempt at merging adjacent memory ranges. This kept the code simple enough but is now causing issues with kexec." * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MPI: Fix compilation on MIPS with GCC 4.4 and newer MIPS: Fix crash that occurs when function tracing is enabled MIPS: Merge overlapping bootmem ranges
2012-11-25powerpc/eeh: Do not invalidate PE properlyGavin Shan
While the EEH does recovery on the specific PE that has PCI errors, the PCI devices belonging to the PE will be removed and the PE will be marked as invalid since we still need the information stored in the PE. We only invalidate the PE when it doesn't have associated EEH devices and valid child PEs. However, the code used to check that is wrong. The patch fixes that. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-11-24netfilter: ipset: fix netiface set name overflowFlorian Westphal
attribute is copied to IFNAMSIZ-size stack variable, but IFNAMSIZ is smaller than IPSET_MAXNAMELEN. Fortunately nfnetlink needs CAP_NET_ADMIN. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-11-24Merge tag 'sound-3.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound build error fix from Takashi Iwai: "Only a single commit for fixing the build error without CONFIG_PM in hda driver." * tag 'sound-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Fix build without CONFIG_PM
2012-11-24ALSA: hda - Fix build without CONFIG_PMTakashi Iwai
I forgot this again... codec->in_pm is in #ifdef CONFIG_PM Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2012-11-24Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 arch fixes from Peter Anvin: "Here is a collection of fixes for 3.7-rc7. This is a superset of tglx' earlier pull request." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64: Fix ordering of CFI directives and recent ASM_CLAC additions x86, microcode, AMD: Add support for family 16h processors x86-32: Export kernel_stack_pointer() for modules x86-32: Fix invalid stack address while in softirq x86, efi: Fix processor-specific memcpy() build error x86: remove dummy long from EFI stub x86, mm: Correct vmflag test for checking VM_HUGETLB x86, amd: Disable way access filter on Piledriver CPUs x86/mce: Do not change worker's running cpu in cmci_rediscover(). x86/ce4100: Fix PCI configuration register access for devices without interrupts x86/ce4100: Fix reboot by forcing the reboot method to be KBD x86/ce4100: Fix pm_poweroff MAINTAINERS: Update email address for Robert Richter x86, microcode_amd: Change email addresses, MAINTAINERS entry MAINTAINERS: Change Boris' email address EDAC: Change Boris' email address x86, AMD: Change Boris' email address
2012-11-24Merge tag 'for-linus-20121123' of git://git.infradead.org/mtd-2.6Linus Torvalds
Pull MTD fixes from David Woodhouse: "The most important part of this is that it fixes a regression in Samsung NAND chip detection, introduced by some rework which went into 3.7. The initial fix wasn't quite complete, so it's in two parts. In fact the first part is committed twice (Artem committed his own copy of the same patch) and I've merged Artem's tree into mine which already had that fix. I'd have recommitted that to make it somewhat cleaner, but figured by this point in the release cycle it was better to merge *exactly* the commits which have been in linux-next. If I'd recommitted, I'd also omit the sparse warning fix. But it's there, and it's harmless — just marking one function as 'static' in onenand code. This also includes a couple more fixes for stable: an AB-BA deadlock in JFFS2, and an invalid range check in slram." * tag 'for-linus-20121123' of git://git.infradead.org/mtd-2.6: mtd: nand: fix Samsung SLC detection regression mtd: nand: fix Samsung SLC NAND identification regression jffs2: Fix lock acquisition order bug in jffs2_write_begin mtd: onenand: Make flexonenand_set_boundary static mtd: slram: invalid checking of absolute end address mtd: ofpart: Fix incorrect NULL check in parse_ofoldpart_partitions() mtd: nand: fix Samsung SLC NAND identification regression
2012-11-23Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds
Pull device tree regression fix from Grant Likely: "Simple build regression fix for DT device drivers on Sparc. An earlier change had masked out the of_iomap() helper on SPARC." * tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux-2.6: of/address: sparc: Declare of_iomap as an extern function for sparc again
2012-11-23Merge tag 'pm-for-3.7-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management update from Rafael Wysocki: "Fix for an incorrect error condition check in device PM QoS code that may lead to an Oops from Guennadi Liakhovetski." * tag 'pm-for-3.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / QoS: fix wrong error-checking condition
2012-11-23Merge tag 'md-3.7-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull md fixes from NeilBrown: "Several bug fixes for md in 3.7: - raid5 discard has problems - raid10 replacement devices have problems - bad block lock seqlock usage has problems - dm-raid doesn't free everything" * tag 'md-3.7-fixes' of git://neil.brown.name/md: md/raid10: decrement correct pending counter when writing to replacement. md/raid10: close race that lose writes lost when replacement completes. md/raid5: Make sure we clear R5_Discard when discard is finished. md/raid5: move resolving of reconstruct_state earlier in stripe_handle. md/raid5: round discard alignment up to power of 2. md: make sure everything is freed when dm-raid stops an array. md: Avoid write invalid address if read_seqretry returned true. md: Reassigned the parameters if read_seqretry returned true in func md_is_badblock.