summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2011-03-23cgroups: if you list_empty() a head then don't list_del() itPhil Carmody
list_del() leaves poison in the prev and next pointers. The next list_empty() will compare those poisons, and say the list isn't empty. Any list operations that assume the node is on a list because of such a check will be fooled into dereferencing poison. One needs to INIT the node after the del, and fortunately there's already a wrapper for that - list_del_init(). Some of the dels are followed by deallocations, so can be ignored, and one can be merged with an add to make a move. Apart from that, I erred on the side of caution in making nodes list_empty()-queriable. Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Reviewed-by: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23oom: avoid deferring oom killer if exiting task is being tracedDavid Rientjes
The oom killer naturally defers killing anything if it finds an eligible task that is already exiting and has yet to detach its ->mm. This avoids unnecessarily killing tasks when one is already in the exit path and may free enough memory that the oom killer is no longer needed. This is detected by PF_EXITING since threads that have already detached its ->mm are no longer considered at all. The problem with always deferring when a thread is PF_EXITING, however, is that it may never actually exit when being traced, specifically if another task is tracing it with PTRACE_O_TRACEEXIT. The oom killer does not want to defer in this case since there is no guarantee that thread will ever exit without intervention. This patch will now only defer the oom killer when a thread is PF_EXITING and no ptracer has stopped its progress in the exit path. It also ensures that a child is sacrificed for the chosen parent only if it has a different ->mm as the comment implies: this ensures that the thread group leader is always targeted appropriately. Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: <stable@kernel.org> [2.6.38.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23oom: skip zombies when iterating tasklistAndrey Vagin
We shouldn't defer oom killing if a thread has already detached its ->mm and still has TIF_MEMDIE set. Memory needs to be freed, so find kill other threads that pin the same ->mm or find another task to kill. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> [2.6.38.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23oom: prevent unnecessary oom kills or kernel panicsDavid Rientjes
This patch prevents unnecessary oom kills or kernel panics by reverting two commits: 495789a5 (oom: make oom_score to per-process value) cef1d352 (oom: multi threaded process coredump don't make deadlock) First, 495789a5 (oom: make oom_score to per-process value) ignores the fact that all threads in a thread group do not necessarily exit at the same time. It is imperative that select_bad_process() detect threads that are in the exit path, specifically those with PF_EXITING set, to prevent needlessly killing additional tasks. If a process is oom killed and the thread group leader exits, select_bad_process() cannot detect the other threads that are PF_EXITING by iterating over only processes. Thus, it currently chooses another task unnecessarily for oom kill or panics the machine when nothing else is eligible. By iterating over threads instead, it is possible to detect threads that are exiting and nominate them for oom kill so they get access to memory reserves. Second, cef1d352 (oom: multi threaded process coredump don't make deadlock) erroneously avoids making the oom killer a no-op when an eligible thread other than current isfound to be exiting. We want to detect this situation so that we may allow that exiting thread time to exit and free its memory; if it is able to exit on its own, that should free memory so current is no loner oom. If it is not able to exit on its own, the oom killer will nominate it for oom kill which, in this case, only means it will get access to memory reserves. Without this change, it is easy for the oom killer to unnecessarily target tasks when all threads of a victim don't exit before the thread group leader or, in the worst case, panic the machine. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: <stable@kernel.org> [2.6.38.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23mm: swap: unlock swapfile inode mutex before closing file on bad swapfilesMel Gorman
If an administrator tries to swapon a file backed by NFS, the inode mutex is taken (as it is for any swapfile) but later identified to be a bad swapfile due to the lack of bmap and tries to cleanup. During cleanup, an attempt is made to close the file but with inode->i_mutex still held. Closing an NFS file syncs it which tries to acquire the inode mutex leading to deadlock. If lockdep is enabled the following appears on the console; ============================================= [ INFO: possible recursive locking detected ] 2.6.38-rc8-autobuild #1 --------------------------------------------- swapon/2192 is trying to acquire lock: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: vfs_fsync_range+0x47/0x7c but task is already holding lock: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7 other info that might help us debug this: 1 lock held by swapon/2192: #0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7 stack backtrace: Pid: 2192, comm: swapon Not tainted 2.6.38-rc8-autobuild #1 Call Trace: __lock_acquire+0x2eb/0x1623 find_get_pages_tag+0x14a/0x174 pagevec_lookup_tag+0x25/0x2e vfs_fsync_range+0x47/0x7c lock_acquire+0xd3/0x100 vfs_fsync_range+0x47/0x7c nfs_flush_one+0x0/0xdf [nfs] mutex_lock_nested+0x40/0x2b1 vfs_fsync_range+0x47/0x7c vfs_fsync_range+0x47/0x7c vfs_fsync+0x1c/0x1e nfs_file_flush+0x64/0x69 [nfs] filp_close+0x43/0x72 sys_swapon+0xa39/0xae7 sysret_check+0x2e/0x69 system_call_fastpath+0x16/0x1b This patch releases the mutex if its held before calling filep_close() so swapon fails as expected without deadlock when the swapfile is backed by NFS. If accepted for 2.6.39, it should also be considered a -stable candidate for 2.6.38 and 2.6.37. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Hugh Dickins <hughd@google.com> Cc: <stable@kernel.org> [2.6.37+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23include/asm-generic/unistd.h: fix syncfs syscall numberAndrew Morton
syncfs() is duplicating name_to_handle_at() due to a merging mistake. Cc: Sage Weil <sage@newdream.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22Merge branch 'slab/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: Add statistics for this_cmpxchg_double failures slub: Add missing irq restore for the OOM path
2011-03-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: [net/9p]: Introduce basic flow-control for VirtIO transport. 9p: use the updated offset given by generic_write_checks [net/9p] Don't re-pin pages on retrying virtqueue_add_buf(). [net/9p] Set the condition just before waking up. [net/9p] unconditional wake_up to proc waiting for space on VirtIO ring fs/9p: Add v9fs_dentry2v9ses fs/9p: Attach writeback_fid on first open with WR flag fs/9p: Open writeback fid in O_SYNC mode fs/9p: Use truncate_setsize instead of vmtruncate net/9p: Fix compile warning net/9p: Convert the in the 9p rpc call path to GFP_NOFS fs/9p: Fix race in initializing writeback fid
2011-03-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: use watch/notify for changes in rbd header libceph: add lingering request and watch/notify event framework rbd: update email address in Documentation ceph: rename dentry_release -> d_release, fix comment ceph: add request to the tail of unsafe write list ceph: remove request from unsafe list if it is canceled/timed out ceph: move readahead default to fs/ceph from libceph ceph: add ino32 mount option ceph: update common header files ceph: remove debugfs debug cruft libceph: fix osd request queuing on osdmap updates ceph: preserve I_COMPLETE across rename libceph: Fix base64-decoding when input ends in newline.
2011-03-22tty: stop using "delayed_work" in the tty layerLinus Torvalds
Using delayed-work for tty flip buffers ends up causing us to wait for the next tick to complete some actions. That's usually not all that noticeable, but for certain latency-critical workloads it ends up being totally unacceptable. As an extreme case of this, passing a token back-and-forth over a pty will take two ticks per iteration, so even just a thousand iterations will take 8 seconds assuming a common 250Hz configuration. Avoiding the whole delayed work issue brings that ping-pong test-case down to 0.009s on my machine. In more practical terms, this latency has been a performance problem for things like dive computer simulators (simulating the serial interface using the ptys) and for other environments (Alan mentions a CP/M emulator). Reported-by: Jef Driesen <jefdriesen@telenet.be> Acked-by: Greg KH <gregkh@suse.de> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22[net/9p]: Introduce basic flow-control for VirtIO transport.Venkateswararao Jujjuri (JV)
Recent zerocopy work in the 9P VirtIO transport maps and pins user buffers into kernel memory for the server to work on them. Since the user process can initiate this kind of pinning with a simple read/write call, thousands of IO threads initiated by the user process can hog the system resources and could result into denial of service. This patch introduces flow control to avoid that extreme scenario. The ceiling limit to avoid denial of service attacks is set to relatively high (nr_free_pagecache_pages()/4) so that it won't interfere with regular usage, but can step in extreme cases to limit the total system hang. Since we don't have a global structure to accommodate this variable, I choose the virtio_chan as the home for this. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-229p: use the updated offset given by generic_write_checksM. Mohan Kumar
Without this fix, even if a file is opened in O_APPEND mode, data will be written at current file position instead of end of file. Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22[net/9p] Don't re-pin pages on retrying virtqueue_add_buf().Venkateswararao Jujjuri (JV)
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22[net/9p] Set the condition just before waking up.Venkateswararao Jujjuri (JV)
Given that the sprious wake-ups are common, we need to move the condition setting right next to the wake_up(). After setting the condition to req->status = REQ_STATUS_RCVD, sprious wakeups may cause the virtqueue back on the free list for someone else to use. This may result in kernel panic while relasing the pinned pages in p9_release_req_pages(). Also rearranged the while loop in req_done() for better redability. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22[net/9p] unconditional wake_up to proc waiting for space on VirtIO ringVenkateswararao Jujjuri (JV)
Process may wait to get space on VirtIO ring to send a transaction to VirtFS server. Current code just does a conditional wake_up() which means only one process will be woken up even if multiple processes are waiting. This fix makes the wake_up unconditional. Hence we won't have any processes waiting for-ever. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22fs/9p: Add v9fs_dentry2v9sesAneesh Kumar K.V
Add the new static inline and use the same Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22fs/9p: Attach writeback_fid on first open with WR flagAneesh Kumar K.V
We don't need writeback fid if we are only doing O_RDONLY open Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22fs/9p: Open writeback fid in O_SYNC modeAneesh Kumar K.V
Older version of protocol don't support tsyncfs operation. So for them force a O_SYNC flag on the server Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22fs/9p: Use truncate_setsize instead of vmtruncateAneesh Kumar K.V
convert vmtruncate usage to truncate_setsize. We also writeback all dirty pages before doing 9p operations and on success call truncate_setsize. This ensure that we continue sanely on failed truncate on the server. The disadvantage is that we are now going to write back the content that get thrown away later as a part of truncate. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22net/9p: Fix compile warningAneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22net/9p: Convert the in the 9p rpc call path to GFP_NOFSAneesh Kumar K.V
Without this we can cause reclaim allocation in writepage. [ 3433.448430] ================================= [ 3433.449117] [ INFO: inconsistent lock state ] [ 3433.449117] 2.6.38-rc5+ #84 [ 3433.449117] --------------------------------- [ 3433.449117] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage. [ 3433.449117] kswapd0/505 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 3433.449117] (iprune_sem){+++++-}, at: [<ffffffff810ebbab>] shrink_icache_memory+0x45/0x2b1 [ 3433.449117] {RECLAIM_FS-ON-W} state was registered at: [ 3433.449117] [<ffffffff8107fe5f>] mark_held_locks+0x52/0x70 [ 3433.449117] [<ffffffff8107ff02>] lockdep_trace_alloc+0x85/0x9f [ 3433.449117] [<ffffffff810d353d>] slab_pre_alloc_hook+0x18/0x3c [ 3433.449117] [<ffffffff810d3fd5>] kmem_cache_alloc+0x23/0xa2 [ 3433.449117] [<ffffffff8127be77>] idr_pre_get+0x2d/0x6f [ 3433.449117] [<ffffffff815434eb>] p9_idpool_get+0x30/0xae [ 3433.449117] [<ffffffff81540123>] p9_client_rpc+0xd7/0x9b0 [ 3433.449117] [<ffffffff815427b0>] p9_client_clunk+0x88/0xdb [ 3433.449117] [<ffffffff811d56e5>] v9fs_evict_inode+0x3c/0x48 [ 3433.449117] [<ffffffff810eb511>] evict+0x1f/0x87 [ 3433.449117] [<ffffffff810eb5c0>] dispose_list+0x47/0xe3 [ 3433.449117] [<ffffffff810eb8da>] evict_inodes+0x138/0x14f [ 3433.449117] [<ffffffff810d90e2>] generic_shutdown_super+0x57/0xe8 [ 3433.449117] [<ffffffff810d91e8>] kill_anon_super+0x11/0x50 [ 3433.449117] [<ffffffff811d4951>] v9fs_kill_super+0x49/0xab [ 3433.449117] [<ffffffff810d926e>] deactivate_locked_super+0x21/0x46 [ 3433.449117] [<ffffffff810d9e84>] deactivate_super+0x40/0x44 [ 3433.449117] [<ffffffff810ef848>] mntput_no_expire+0x100/0x109 [ 3433.449117] [<ffffffff810f0aeb>] sys_umount+0x2f1/0x31c [ 3433.449117] [<ffffffff8102c87b>] system_call_fastpath+0x16/0x1b [ 3433.449117] irq event stamp: 192941 [ 3433.449117] hardirqs last enabled at (192941): [<ffffffff81568dcf>] _raw_spin_unlock_irq+0x2b/0x30 [ 3433.449117] hardirqs last disabled at (192940): [<ffffffff810b5f97>] shrink_inactive_list+0x290/0x2f5 [ 3433.449117] softirqs last enabled at (188470): [<ffffffff8105fd65>] __do_softirq+0x133/0x152 [ 3433.449117] softirqs last disabled at (188455): [<ffffffff8102d7cc>] call_softirq+0x1c/0x28 [ 3433.449117] [ 3433.449117] other info that might help us debug this: [ 3433.449117] 1 lock held by kswapd0/505: [ 3433.449117] #0: (shrinker_rwsem){++++..}, at: [<ffffffff810b52e2>] shrink_slab+0x38/0x15f [ 3433.449117] [ 3433.449117] stack backtrace: [ 3433.449117] Pid: 505, comm: kswapd0 Not tainted 2.6.38-rc5+ #84 [ 3433.449117] Call Trace: [ 3433.449117] [<ffffffff8107fbce>] ? valid_state+0x17e/0x191 [ 3433.449117] [<ffffffff81036896>] ? save_stack_trace+0x28/0x45 [ 3433.449117] [<ffffffff81080426>] ? check_usage_forwards+0x0/0x87 [ 3433.449117] [<ffffffff8107fcf4>] ? mark_lock+0x113/0x22c [ 3433.449117] [<ffffffff8108105f>] ? __lock_acquire+0x37a/0xcf7 [ 3433.449117] [<ffffffff8107fc0e>] ? mark_lock+0x2d/0x22c [ 3433.449117] [<ffffffff81081077>] ? __lock_acquire+0x392/0xcf7 [ 3433.449117] [<ffffffff810b14d2>] ? determine_dirtyable_memory+0x15/0x28 [ 3433.449117] [<ffffffff81081a33>] ? lock_acquire+0x57/0x6d [ 3433.449117] [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1 [ 3433.449117] [<ffffffff81567d85>] ? down_read+0x47/0x5c [ 3433.449117] [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1 [ 3433.449117] [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1 [ 3433.449117] [<ffffffff810b5385>] ? shrink_slab+0xdb/0x15f [ 3433.449117] [<ffffffff810b69bc>] ? kswapd+0x574/0x96a [ 3433.449117] [<ffffffff810b6448>] ? kswapd+0x0/0x96a [ 3433.449117] [<ffffffff810714e2>] ? kthread+0x7d/0x85 [ 3433.449117] [<ffffffff8102d6d4>] ? kernel_thread_helper+0x4/0x10 [ 3433.449117] [<ffffffff81569200>] ? restore_args+0x0/0x30 [ 3433.449117] [<ffffffff81071465>] ? kthread+0x0/0x85 [ 3433.449117] [<ffffffff8102d6d0>] ? kernel_thread_helper+0x0/0x10 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22fs/9p: Fix race in initializing writeback fidAneesh Kumar K.V
When two process open the same file we can end up with both of them allocating the writeback_fid. Add a new mutex which can be used for synchronizing v9fs_inode member values. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-03-22slub: Add statistics for this_cmpxchg_double failuresChristoph Lameter
Add some statistics for debugging. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-03-22slub: Add missing irq restore for the OOM pathChristoph Lameter
OOM path is missing the irq restore in the CONFIG_CMPXCHG_LOCAL case. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-03-22rbd: use watch/notify for changes in rbd headerYehuda Sadeh
Send notifications when we change the rbd header (e.g. create a snapshot) and wait for such notifications. This allows synchronizing the snapshot creation between different rbd clients/rools. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-22libceph: add lingering request and watch/notify event frameworkYehuda Sadeh
Lingering requests are requests that are sent to the OSD normally but tracked also after we get a successful request. This keeps the OSD connection open and resends the original request if the object moves to another OSD. The OSD can then send notification messages back to us if another client initiates a notify. This framework will be used by RBD so that the client gets notification when a snapshot is created by another node or tool. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: make fuse_dentry_revalidate() RCU aware fuse: make fuse_permission() RCU aware fuse: wakeup pollers on connection release/abort fuse: reduce size of struct fuse_request
2011-03-22Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: xen: update mask_rw_pte after kernel page tables init changes xen: set max_pfn_mapped to the last pfn mapped x86: Cleanup highmap after brk is concluded Fix up trivial onflict (added header file includes) in arch/x86/mm/init_64.c
2011-03-22Merge branch 'next-samsung' of git://git.fluff.org/bjdooks/linuxLinus Torvalds
* 'next-samsung' of git://git.fluff.org/bjdooks/linux: ARM: H1940/RX1950: Change default LED triggers ARM: S3C2442: RX1950: Add support for LED blinking ARM: S3C2442: RX1950: Retain LEDs state in suspend ARM: S3C2410: H1940: Fix lcd_power_set function ARM: S3C2410: H1940: Add battery support ARM: S3C2410: H1940: Use leds-gpio driver for LEDs managing ARM: S3C2410: H1940: Make h1940-bluetooth.c compile again ARM: S3C2410: H1940: Add keys device
2011-03-22Merge branch 'for-linus/2639/i2c-2' of git://git.fluff.org/bjdooks/linuxLinus Torvalds
* 'for-linus/2639/i2c-2' of git://git.fluff.org/bjdooks/linux: i2c-pxa2xx: Don't clear isr bits too early i2c-pxa2xx: Fix register offsets i2c-pxa2xx: pass of_node from platform driver to adapter and publish i2c-pxa2xx: check timeout correctly i2c-pxa2xx: add support for shared IRQ handler i2c-pxa2xx: Add PCI support for PXA I2C controller ARM: pxa2xx: reorganize I2C files i2c-pxa2xx: use dynamic register layout i2c-mxs: set controller to pio queue mode after reset i2c-eg20t: support new device OKI SEMICONDUCTOR ML7213 IOH i2c/busses: Add support for Diolan U2C-12 USB-I2C adapter
2011-03-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: Dont define useless label in the !CONFIG_CMPXCHG_LOCAL case slab,rcu: don't assume the size of struct rcu_head slub,rcu: don't assume the size of struct rcu_head slub: automatically reserve bytes at the end of slab Lockless (and preemptless) fastpaths for slub slub: Get rid of slab_free_hook_irq() slub: min_partial needs to be in first cacheline slub: fix ksize() build error slub: fix kmemcheck calls to match ksize() hints Revert "slab: Fix missing DEBUG_SLAB last user" mm: Remove support for kmem_cache_name()
2011-03-22sd: Fail discard requests when logical block provisioning has been disabledMartin K. Petersen
Ensure that we kill discard requests after logical block provisioning has been disabled in sysfs. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits) IPVS: Use global mutex in ip_vs_app.c ipvs: fix a typo in __ip_vs_control_init() veth: Fix the byte counters net ipv6: Fix duplicate /proc/sys/net/ipv6/neigh directory entries. macvlan: Fix use after free of struct macvlan_port. net: fix incorrect spelling in drop monitor protocol can: c_can: Do basic c_can configuration _before_ enabling the interrupts net/appletalk: fix atalk_release use after free ipx: fix ipx_release() snmp: SNMP_UPD_PO_STATS_BH() always called from softirq l2tp: fix possible oops on l2tp_eth module unload xfrm: Fix initialize repl field of struct xfrm_state netfilter: ipt_CLUSTERIP: fix buffer overflow netfilter: xtables: fix reentrancy netfilter: ipset: fix checking the type revision at create command netfilter: ipset: fix address ranges at hash:*port* types niu: Rename NIU parent platform device name to fix conflict. r8169: fix a bug in rtl8169_init_phy() bonding: fix a typo in a comment ftmac100: use resource_size() ...
2011-03-22IPVS: Use global mutex in ip_vs_app.cSimon Horman
As part of the work to make IPVS network namespace aware __ip_vs_app_mutex was replaced by a per-namespace lock, ipvs->app_mutex. ipvs->app_key is also supplied for debugging purposes. Unfortunately this implementation results in ipvs->app_key residing in non-static storage which at the very least causes a lockdep warning. This patch takes the rather heavy-handed approach of reinstating __ip_vs_app_mutex which will cover access to the ipvs->list_head of all network namespaces. [ 12.610000] IPVS: Creating netns size=2456 id=0 [ 12.630000] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP) [ 12.640000] BUG: key ffff880003bbf1a0 not in .data! [ 12.640000] ------------[ cut here ]------------ [ 12.640000] WARNING: at kernel/lockdep.c:2701 lockdep_init_map+0x37b/0x570() [ 12.640000] Hardware name: Bochs [ 12.640000] Pid: 1, comm: swapper Tainted: G W 2.6.38-kexec-06330-g69b7efe-dirty #122 [ 12.650000] Call Trace: [ 12.650000] [<ffffffff8102e685>] warn_slowpath_common+0x75/0xb0 [ 12.650000] [<ffffffff8102e6d5>] warn_slowpath_null+0x15/0x20 [ 12.650000] [<ffffffff8105967b>] lockdep_init_map+0x37b/0x570 [ 12.650000] [<ffffffff8105829d>] ? trace_hardirqs_on+0xd/0x10 [ 12.650000] [<ffffffff81055ad8>] debug_mutex_init+0x38/0x50 [ 12.650000] [<ffffffff8104bc4c>] __mutex_init+0x5c/0x70 [ 12.650000] [<ffffffff81685ee7>] __ip_vs_app_init+0x64/0x86 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff811b1c33>] T.620+0x43/0x170 [ 12.660000] [<ffffffff811b1e9a>] ? register_pernet_subsys+0x1a/0x40 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff811b1db7>] register_pernet_operations+0x57/0xb0 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.670000] [<ffffffff811b1ea9>] register_pernet_subsys+0x29/0x40 [ 12.670000] [<ffffffff81685f19>] ip_vs_app_init+0x10/0x12 [ 12.670000] [<ffffffff81685a87>] ip_vs_init+0x4c/0xff [ 12.670000] [<ffffffff8166562c>] do_one_initcall+0x7a/0x12e [ 12.670000] [<ffffffff8166583e>] kernel_init+0x13e/0x1c2 [ 12.670000] [<ffffffff8128c134>] kernel_thread_helper+0x4/0x10 [ 12.670000] [<ffffffff8128ad40>] ? restore_args+0x0/0x30 [ 12.680000] [<ffffffff81665700>] ? kernel_init+0x0/0x1c2 [ 12.680000] [<ffffffff8128c130>] ? kernel_thread_helper+0x0/0x1global0 Signed-off-by: Simon Horman <horms@verge.net.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22ipvs: fix a typo in __ip_vs_control_init()Eric Dumazet
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22veth: Fix the byte countersEric W. Biederman
Commit 44540960 "veth: move loopback logic to common location" introduced a bug in the packet counters. I don't understand why that happened as it is not explained in the comments and the mut check in dev_forward_skb retains the assumption that skb->len is the total length of the packet. I just measured this emperically by setting up a veth pair between two noop network namespaces setting and attempting a telnet connection between the two. I saw three packets in each direction and the byte counters were exactly 14*3 = 42 bytes high in each direction. I got the actual packet lengths with tcpdump. So remove the extra ETH_HLEN from the veth byte count totals. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22net ipv6: Fix duplicate /proc/sys/net/ipv6/neigh directory entries.Eric W. Biederman
When I was fixing issues with unregisgtering tables under /proc/sys/net/ipv6/neigh by adding a mount point it appears I missed a critical ordering issue, in the ipv6 initialization. I had not realized that ipv6_sysctl_register is called at the very end of the ipv6 initialization and in particular after we call neigh_sysctl_register from ndisc_init. "neigh" needs to be initialized in ipv6_static_sysctl_register which is the first ipv6 table to initialized, and definitely before ndisc_init. This removes the weirdness of duplicate tables while still providing a "neigh" mount point which prevents races in sysctl unregistering. This was initially reported at https://bugzilla.kernel.org/show_bug.cgi?id=31232 Reported-by: sunkan@zappa.cx Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22macvlan: Fix use after free of struct macvlan_port.Eric W. Biederman
When the macvlan driver was extended to call unregisgter_netdevice_queue in 23289a37e2b127dfc4de1313fba15bb4c9f0cd5b, a use after free of struct macvlan_port was introduced. The code in dellink relied on unregister_netdevice actually unregistering the net device so it would be safe to free macvlan_port. Since unregister_netdevice_queue can just queue up the unregister instead of performing the unregiser immediately we free the macvlan_port too soon and then the code in macvlan_stop removes the macaddress for the set of macaddress to listen for and uses memory that has already been freed. To fix this add a reference count to track when it is safe to free the macvlan_port and move the call of macvlan_port_destroy into macvlan_uninit which is guaranteed to be called after the final macvlan_port_close. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22net: fix incorrect spelling in drop monitor protocolNeil Horman
It was pointed out to me recently that my spelling could be better :) Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22can: c_can: Do basic c_can configuration _before_ enabling the interruptsJan Altenberg
I ran into some trouble while testing the SocketCAN driver for the BOSCH C_CAN controller. The interface is not correctly initialized, if I put some CAN traffic on the line, _while_ the interface is being started (which means: the interface doesn't come up correcty, if there's some RX traffic while doing 'ifconfig can0 up'). The current implementation enables the controller interrupts _before_ doing the basic c_can configuration. I think, this should be done the other way round. The patch below fixes things for me. Signed-off-by: Jan Altenberg <jan@linutronix.de> Acked-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Acked-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22net/appletalk: fix atalk_release use after freeArnd Bergmann
The BKL removal in appletalk introduced a use-after-free problem, where atalk_destroy_socket frees a sock, but we still release the socket lock on it. An easy fix is to take an extra reference on the sock and sock_put it when returning from atalk_release. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22ipx: fix ipx_release()Eric Dumazet
Commit b0d0d915d1d1a0 (remove the BKL) added a regression, because sock_put() can free memory while we are going to use it later. Fix is to delay sock_put() _after_ release_sock(). Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22snmp: SNMP_UPD_PO_STATS_BH() always called from softirqEric Dumazet
We dont need to test if we run from softirq context, we definitely are. This saves few instructions in ip_rcv() & ip_rcv_finish() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22l2tp: fix possible oops on l2tp_eth module unloadJames Chapman
A struct used in the l2tp_eth driver for registering network namespace ops was incorrectly marked as __net_initdata, leading to oops when module unloaded. BUG: unable to handle kernel paging request at ffffffffa00ec098 IP: [<ffffffff8123dbd8>] ops_exit_list+0x7/0x4b PGD 142d067 PUD 1431063 PMD 195da8067 PTE 0 Oops: 0000 [#1] SMP last sysfs file: /sys/module/l2tp_eth/refcnt Call Trace: [<ffffffff8123dc94>] ? unregister_pernet_operations+0x32/0x93 [<ffffffff8123dd20>] ? unregister_pernet_device+0x2b/0x38 [<ffffffff81068b6e>] ? sys_delete_module+0x1b8/0x222 [<ffffffff810c7300>] ? do_munmap+0x254/0x318 [<ffffffff812c64e5>] ? page_fault+0x25/0x30 [<ffffffff812c6952>] ? system_call_fastpath+0x16/0x1b Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-22xfrm: Fix initialize repl field of struct xfrm_stateWei Yongjun
Commit 'xfrm: Move IPsec replay detection functions to a separate file' (9fdc4883d92d20842c5acea77a4a21bb1574b495) introduce repl field to struct xfrm_state, and only initialize it under SA's netlink create path, the other path, such as pf_key, ipcomp/ipcomp6 etc, the repl field remaining uninitialize. So if the SA is created by pf_key, any input packet with SA's encryption algorithm will cause panic. int xfrm_input() { ... x->repl->advance(x, seq); ... } This patch fixed it by introduce new function __xfrm_init_state(). Pid: 0, comm: swapper Not tainted 2.6.38-next+ #14 Bochs Bochs EIP: 0060:[<c078e5d5>] EFLAGS: 00010206 CPU: 0 EIP is at xfrm_input+0x31c/0x4cc EAX: dd839c00 EBX: 00000084 ECX: 00000000 EDX: 01000000 ESI: dd839c00 EDI: de3a0780 EBP: dec1de88 ESP: dec1de64 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process swapper (pid: 0, ti=dec1c000 task=c09c0f20 task.ti=c0992000) Stack: 00000000 00000000 00000002 c0ba27c0 00100000 01000000 de3a0798 c0ba27c0 00000033 dec1de98 c0786848 00000000 de3a0780 dec1dea4 c0786868 00000000 dec1debc c074ee56 e1da6b8c de3a0780 c074ed44 de3a07a8 dec1decc c074ef32 Call Trace: [<c0786848>] xfrm4_rcv_encap+0x22/0x27 [<c0786868>] xfrm4_rcv+0x1b/0x1d [<c074ee56>] ip_local_deliver_finish+0x112/0x1b1 [<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1 [<c074ef32>] NF_HOOK.clone.1+0x3d/0x44 [<c074ef77>] ip_local_deliver+0x3e/0x44 [<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1 [<c074ec03>] ip_rcv_finish+0x30a/0x332 [<c074e8f9>] ? ip_rcv_finish+0x0/0x332 [<c074ef32>] NF_HOOK.clone.1+0x3d/0x44 [<c074f188>] ip_rcv+0x20b/0x247 [<c074e8f9>] ? ip_rcv_finish+0x0/0x332 [<c072797d>] __netif_receive_skb+0x373/0x399 [<c0727bc1>] netif_receive_skb+0x4b/0x51 [<e0817e2a>] cp_rx_poll+0x210/0x2c4 [8139cp] [<c072818f>] net_rx_action+0x9a/0x17d [<c0445b5c>] __do_softirq+0xa1/0x149 [<c0445abb>] ? __do_softirq+0x0/0x149 Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21ARM: H1940/RX1950: Change default LED triggersVasily Khoruzhick
Change LED triggers to mimic WinMobile behavior: red blinking when battery is charging, orange solid when battery is charged. Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2011-03-21i2c-pxa2xx: Don't clear isr bits too earlyVasily Khoruzhick
isr is passed later into i2c_pxa_irq_txempty and i2c_pxa_irq_rxfull and they may use some other bits than irq sources. Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
2011-03-21Merge branches 'for-2639/i2c/i2c-ce4100-v6', 'for-2639/i2c/i2c-eg20t-v3' and ↵Ben Dooks
'for-2639/i2c/i2c-imx' into for-linus/2639/i2c-2
2011-03-21Merge branch 'kbuild' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: kbuild: Make DEBUG_SECTION_MISMATCH selectable, but not on by default genksyms: Regenerate lexer and parser genksyms: Track changes to enum constants genksyms: simplify usage of find_symbol() genksyms: Add helpers for building string lists genksyms: Simplify printing of symbol types genksyms: Simplify lexer genksyms: Do not paste the bison header file to lex.c modpost: fix trailing comma KBuild: silence "'scripts/unifdef' is up to date." kbuild: Add extra gcc checks kbuild: reenable section mismatch analysis unifdef: update to upstream version 2.5
2011-03-21Reduce sequential pointer derefs in scsi_error.c and reduce size as wellJesper Juhl
This patch reduces the number of sequential pointer derefs in drivers/scsi/scsi_error.c This has been submitted a number of times over a couple of years. I believe this version adresses all comments it has gathered over time. Please apply or reject with a reason. The benefits are: - makes the code easier to read. Lots of sequential derefs of the same pointers is not easy on the eye. - theoretically at least, just dereferencing the pointers once can allow the compiler to generally slightly faster code, so in theory this could also be a micro speed optimization. - reduces size of object file (tiny effect: on x86-64, in at least one configuration, the text size decreased from 9439 bytes to 9400) - removes some pointless (mostly trailing) whitespace. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>