summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-05-08Merge tag 'f2fs-for-v3.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set includes the following major enhancement patches. - introduce a new gloabl lock scheme - add tracepoints on several major functions - fix the overall cleaning process focused on victim selection - apply the block plugging to merge IOs as much as possible - enhance management of free nids and its list - enhance the readahead mode for node pages - address several cretical deadlock conditions - reduce lock_page calls The other minor bug fixes and enhancements are as follows. - calculation mistakes: overflow - bio types: READ, READA, and READ_SYNC - fix the recovery flow, data races, and null pointer errors" * tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits) f2fs: cover free_nid management with spin_lock f2fs: optimize scan_nat_page() f2fs: code cleanup for scan_nat_page() and build_free_nids() f2fs: bugfix for alloc_nid_failed() f2fs: recover when journal contains deleted files f2fs: continue to mount after failing recovery f2fs: avoid deadlock during evict after f2fs_gc f2fs: modify the number of issued pages to merge IOs f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry. f2fs: fix inconsistent using of NM_WOUT_THRESHOLD f2fs: check truncation of mapping after lock_page f2fs: enhance alloc_nid and build_free_nids flows f2fs: add a tracepoint on f2fs_new_inode f2fs: check nid == 0 in add_free_nid f2fs: add REQ_META about metadata requests for submit f2fs: give a chance to merge IOs by IO scheduler f2fs: avoid frequent background GC f2fs: add tracepoints to debug checkpoint request f2fs: add tracepoints for write page operations f2fs: add tracepoints to debug the block allocation ...
2013-05-08Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block core updates from Jens Axboe: - Major bit is Kents prep work for immutable bio vecs. - Stable candidate fix for a scheduling-while-atomic in the queue bypass operation. - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging discard bios. - Tejuns changes to convert the writeback thread pool to the generic workqueue mechanism. - Runtime PM framework, SCSI patches exists on top of these in James' tree. - A few random fixes. * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits) relay: move remove_buf_file inside relay_close_buf partitions/efi.c: replace useless kzalloc's by kmalloc's fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read() block: fix max discard sectors limit blkcg: fix "scheduling while atomic" in blk_queue_bypass_start Documentation: cfq-iosched: update documentation help for cfq tunables writeback: expose the bdi_wq workqueue writeback: replace custom worker pool implementation with unbound workqueue writeback: remove unused bdi_pending_list aoe: Fix unitialized var usage bio-integrity: Add explicit field for owner of bip_buf block: Add an explicit bio flag for bios that own their bvec block: Add bio_alloc_pages() block: Convert some code to bio_for_each_segment_all() block: Add bio_for_each_segment_all() bounce: Refactor __blk_queue_bounce to not use bi_io_vec raid1: use bio_copy_data() pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage pktcdvd: use bio_copy_data() block: Add bio_copy_data() ...
2013-05-08f2fs: cover free_nid management with spin_lockJaegeuk Kim
After build_free_nids() searches free nid candidates from nat pages and current journal blocks, it checks all the candidates if they are allocated so that the nat cache has its nid with an allocated block address. In this procedure, previously we used list_for_each_entry_safe(fnid, next_fnid, &nm_i->free_nid_list, list). But, this is not covered by free_nid_list_lock, resulting in null pointer bug. This patch moves this checking routine inside add_free_nid() in order not to use the spin_lock. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-08f2fs: optimize scan_nat_page()Haicheng Li
When nm_i->fcnt > 2 * MAX_FREE_NIDS, stop scanning other NAT entries. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> [Jaegeuk Kim: fix handling the return value of add_free_nid()] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-08f2fs: code cleanup for scan_nat_page() and build_free_nids()Haicheng Li
This patch does two cleanups: 1. remove unused variable "fcnt" in build_free_nids(). 2. make scan_nat_page() as void type and remove useless variable "fcnt". Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-08f2fs: bugfix for alloc_nid_failed()Haicheng Li
Directly drop the free_nid cache when nm_i->fcnt > 2 * MAX_FREE_NIDS Since there is NOT nmi->free_nid_list_lock spinlock protection between a sequential calling of alloc_nid() and alloc_nid_failed(), some other threads may already add new free_nid to the free_nid_list during this period. We need to make sure nmi->fcnt is never > 2 * MAX_FREE_NIDS. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> [Jaegeuk Kim: fit the coding style] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-08f2fs: recover when journal contains deleted filesChris Fries
When recovering a journal file with fsync data for files that have been deleted, don't bail out on recovery. Signed-off-by: Chris Fries <C.Fries@motorola.com> Reviewed-by: Russell Knize <rknize2@motorola.com> Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com> [Jaegeuk Kim: fit the coding style] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-08f2fs: continue to mount after failing recoveryChris Fries
When unable to roll forward the journal, we shouldn't bail out and not mount, we should continue to attempt the mount. Bad recovery data is likely unrecoverable at this point, and requiring the user to try to mount again doesn't solve any issues. Signed-off-by: Chris Fries <C.Fries@motorola.com> Reviewed-by: Russell Knize <rknize2@motorola.com> Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-08f2fs: avoid deadlock during evict after f2fs_gcJaegeuk Kim
o Deadlock case #1 Thread 1: - writeback_sb_inodes - do_writepages - f2fs_write_data_pages - write_cache_pages - f2fs_write_data_page - f2fs_balance_fs - wait mutex_lock(gc_mutex) Thread 2: - f2fs_balance_fs - mutex_lock(gc_mutex) - f2fs_gc - f2fs_iget - wait iget_locked(inode->i_lock) Thread 3: - do_unlinkat - iput - lock(inode->i_lock) - evict - inode_wait_for_writeback o Deadlock case #2 Thread 1: - __writeback_single_inode : set I_SYNC - do_writepages - f2fs_write_data_page - f2fs_balance_fs - f2fs_gc - iput - evict - inode_wait_for_writeback(I_SYNC) In order to avoid this, even though iput is called with the zero-reference count, we need to stop the eviction procedure if the inode is on writeback. So this patch links f2fs_drop_inode which checks the I_SYNC flag. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-08Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge more incoming from Andrew Morton: - Various fixes which were stalled or which I picked up recently - A large rotorooting of the AIO code. Allegedly to improve performance but I don't really have good performance numbers (I might have lost the email) and I can't raise Kent today. I held this out of 3.9 and we could give it another cycle if it's all too late/scary. I ended up taking only the first two thirds of the AIO rotorooting. I left the percpu parts and the batch completion for later. - Linus * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (33 commits) aio: don't include aio.h in sched.h aio: kill ki_retry aio: kill ki_key aio: give shared kioctx fields their own cachelines aio: kill struct aio_ring_info aio: kill batch allocation aio: change reqs_active to include unreaped completions aio: use cancellation list lazily aio: use flush_dcache_page() aio: make aio_read_evt() more efficient, convert to hrtimers wait: add wait_event_hrtimeout() aio: refcounting cleanup aio: make aio_put_req() lockless aio: do fget() after aio_get_req() aio: dprintk() -> pr_debug() aio: move private stuff out of aio.h aio: add kiocb_cancel() aio: kill return value of aio_complete() char: add aio_{read,write} to /dev/{null,zero} aio: remove retry-based AIO ...
2013-05-08aio: don't include aio.h in sched.hKent Overstreet
Faster kernel compiles by way of fewer unnecessary includes. [akpm@linux-foundation.org: fix fallout] [akpm@linux-foundation.org: fix build] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: kill ki_retryKent Overstreet
Thanks to Zach Brown's work to rip out the retry infrastructure, we don't need this anymore - ki_retry was only called right after the kiocb was initialized. This also refactors and trims some duplicated code, as well as cleaning up the refcounting/error handling a bit. [akpm@linux-foundation.org: use fmode_t in aio_run_iocb()] [akpm@linux-foundation.org: fix file_start_write/file_end_write tests] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: kill ki_keyKent Overstreet
ki_key wasn't actually used for anything previously - it was always 0. Drop it to trim struct kiocb a bit. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: give shared kioctx fields their own cachelinesKent Overstreet
[akpm@linux-foundation.org: make reqs_active __cacheline_aligned_in_smp] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: kill struct aio_ring_infoKent Overstreet
struct aio_ring_info was kind of odd, the only place it's used is where it's embedded in struct kioctx - there's no real need for it. The next patch rearranges struct kioctx and puts various things on their own cachelines - getting rid of struct aio_ring_info now makes that reordering a bit clearer. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: kill batch allocationKent Overstreet
Previously, allocating a kiocb required touching quite a few global (well, per kioctx) cachelines... so batching up allocation to amortize those was worthwhile. But we've gotten rid of some of those, and in another couple of patches kiocb allocation won't require writing to any shared cachelines, so that means we can just rip this code out. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: change reqs_active to include unreaped completionsKent Overstreet
The aio code tries really hard to avoid having to deal with the completion ringbuffer overflowing. To do that, it has to keep track of the number of outstanding kiocbs, and the number of completions currently in the ringbuffer - and it's got to check that every time we allocate a kiocb. Ouch. But - we can improve this quite a bit if we just change reqs_active to mean "number of outstanding requests and unreaped completions" - that means kiocb allocation doesn't have to look at the ringbuffer, which is a fairly significant win. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: use cancellation list lazilyKent Overstreet
Cancelling kiocbs requires adding them to a per kioctx linked list, which is one of the few things we need to take the kioctx lock for in the fast path. But most kiocbs can't be cancelled - so if we just do this lazily, we can avoid quite a bit of locking overhead. While we're at it, instead of using a flag bit switch to using ki_cancel itself to indicate that a kiocb has been cancelled/completed. This lets us get rid of ki_flags entirely. [akpm@linux-foundation.org: remove buggy BUG()] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: use flush_dcache_page()Kent Overstreet
This wasn't causing problems before because it's not needed on x86, but it is needed on other architectures. Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: make aio_read_evt() more efficient, convert to hrtimersKent Overstreet
Previously, aio_read_event() pulled a single completion off the ringbuffer at a time, locking and unlocking each time. Change it to pull off as many events as it can at a time, and copy them directly to userspace. This also fixes a bug where if copying the event to userspace failed, we'd lose the event. Also convert it to wait_event_interruptible_hrtimeout(), which simplifies it quite a bit. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: refcounting cleanupKent Overstreet
The usage of ctx->dead was fubar - it makes no sense to explicitly check it all over the place, especially when we're already using RCU. Now, ctx->dead only indicates whether we've dropped the initial refcount. The new teardown sequence is: set ctx->dead hlist_del_rcu(); synchronize_rcu(); Now we know no system calls can take a new ref, and it's safe to drop the initial ref: put_ioctx(); We also need to ensure there are no more outstanding kiocbs. This was done incorrectly - it was being done in kill_ctx(), and before dropping the initial refcount. At this point, other syscalls may still be submitting kiocbs! Now, we cancel and wait for outstanding kiocbs in free_ioctx(), after kioctx->users has dropped to 0 and we know no more iocbs could be submitted. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: make aio_put_req() locklessKent Overstreet
Freeing a kiocb needed to touch the kioctx for three things: * Pull it off the reqs_active list * Decrementing reqs_active * Issuing a wakeup, if the kioctx was in the process of being freed. This patch moves these to aio_complete(), for a couple reasons: * aio_complete() already has to issue the wakeup, so if we drop the kioctx refcount before aio_complete does its wakeup we don't have to do it twice. * aio_complete currently has to take the kioctx lock, so it makes sense for it to pull the kiocb off the reqs_active list too. * A later patch is going to change reqs_active to include unreaped completions - this will mean allocating a kiocb doesn't have to look at the ringbuffer. So taking the decrement of reqs_active out of kiocb_free() is useful prep work for that patch. This doesn't really affect cancellation, since existing (usb) code that implements a cancel function still calls aio_complete() - we just have to make sure that aio_complete does the necessary teardown for cancelled kiocbs. It does affect code paths where we free kiocbs that were never submitted; they need to decrement reqs_active and pull the kiocb off the reqs_active list. This occurs in two places: kiocb_batch_free(), which is going away in a later patch, and the error path in io_submit_one. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: do fget() after aio_get_req()Kent Overstreet
aio_get_req() will fail if we have the maximum number of requests outstanding, which depending on the application may not be uncommon. So avoid doing an unnecessary fget(). Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: dprintk() -> pr_debug()Kent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: move private stuff out of aio.hKent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: add kiocb_cancel()Kent Overstreet
Minor refactoring, to get rid of some duplicated code [akpm@linux-foundation.org: fix warning] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: kill return value of aio_complete()Kent Overstreet
Nothing used the return value, and it probably wasn't possible to use it safely for the locked versions (aio_complete(), aio_put_req()). Just kill it. Signed-off-by: Kent Overstreet <koverstreet@google.com> Acked-by: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08aio: remove retry-based AIOZach Brown
This removes the retry-based AIO infrastructure now that nothing in tree is using it. We want to remove retry-based AIO because it is fundemantally unsafe. It retries IO submission from a kernel thread that has only assumed the mm of the submitting task. All other task_struct references in the IO submission path will see the kernel thread, not the submitting task. This design flaw means that nothing of any meaningful complexity can use retry-based AIO. This removes all the code and data associated with the retry machinery. The most significant benefit of this is the removal of the locking around the unused run list in the submission path. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Zach Brown <zab@redhat.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-08hugetlbfs: fix mmap failure in unaligned size requestNaoya Horiguchi
The current kernel returns -EINVAL unless a given mmap length is "almost" hugepage aligned. This is because in sys_mmap_pgoff() the given length is passed to vm_mmap_pgoff() as it is without being aligned with hugepage boundary. This is a regression introduced in commit 40716e29243d ("hugetlbfs: fix alignment of huge page requests"), where alignment code is pushed into hugetlb_file_setup() and the variable len in caller side is not changed. To fix this, this patch partially reverts that commit, and adds alignment code in caller side. And it also introduces hstate_sizelog() in order to get proper hstate to specified hugepage size. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881 [akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: <iceman_dvd@yahoo.com> Cc: Steven Truelove <steven.truelove@utoronto.ca> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "A couple of fixes + getting rid of __blkdev_put() return value" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: proc: Use PDE attribute setting accessor functions make blkdev_put() return void block_device_operations->release() should return void mtd_blktrans_ops->release() should return void hfs: SMP race on directory close()
2013-05-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "This contains two patchsets from Maxim Patlasov. The first reworks the request throttling so that only async requests are throttled. Wakeup of waiting async requests is also optimized. The second series adds support for async processing of direct IO which optimizes direct IO and enables the use of the AIO userspace interface." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: add flag to turn on async direct IO fuse: truncate file if async dio failed fuse: optimize short direct reads fuse: enable asynchronous processing direct IO fuse: make fuse_direct_io() aware about AIO fuse: add support of async IO fuse: move fuse_release_user_pages() up fuse: optimize wake_up fuse: implement exclusive wakeup for blocked_waitq fuse: skip blocking on allocations of synchronous requests fuse: add flag fc->initialized fuse: make request allocations for background processing explicit
2013-05-07Merge branch 'slab/for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull slab changes from Pekka Enberg: "The bulk of the changes are more slab unification from Christoph. There's also few fixes from Aaron, Glauber, and Joonsoo thrown into the mix." * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (24 commits) mm, slab_common: Fix bootstrap creation of kmalloc caches slab: Return NULL for oversized allocations mm: slab: Verify the nodeid passed to ____cache_alloc_node slub: tid must be retrieved from the percpu area of the current processor slub: Do not dereference NULL pointer in node_match slub: add 'likely' macro to inc_slabs_node() slub: correct to calculate num of acquired objects in get_partial_node() slub: correctly bootstrap boot caches mm/sl[au]b: correct allocation type check in kmalloc_slab() slab: Fixup CONFIG_PAGE_ALLOC/DEBUG_SLAB_LEAK sections slab: Handle ARCH_DMA_MINALIGN correctly slab: Common definition for kmem_cache_node slab: Rename list3/l3 to node slab: Common Kmalloc cache determination stat: Use size_t for sizes instead of unsigned slab: Common function to create the kmalloc array slab: Common definition for the array of kmalloc caches slab: Common constants for kmalloc boundaries slab: Rename nodelists to node slab: Common name for the per node structures ...
2013-05-07Merge branch 'slab/next' into slab/for-linusPekka Enberg
2013-05-07make blkdev_put() return voidAl Viro
same story as with the previous patches - note that return value of blkdev_close() is lost, since there's nowhere the caller (__fput()) could return it to. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-07block_device_operations->release() should return voidAl Viro
The value passed is 0 in all but "it can never happen" cases (and those only in a couple of drivers) *and* it would've been lost on the way out anyway, even if something tried to pass something meaningful. Just don't bother. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Alex Elder: "This is a big pull. Most of it is culmination of Alex's work to implement RBD image layering, which is now complete (yay!). There is also some work from Yan to fix i_mutex behavior surrounding writes in cephfs, a sync write fix, a fix for RBD images that get resized while they are mapped, and a few patches from me that resolve annoying auth warnings and fix several bugs in the ceph auth code." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (254 commits) rbd: fix image request leak on parent read libceph: use slab cache for osd client requests libceph: allocate ceph message data with a slab allocator libceph: allocate ceph messages with a slab allocator rbd: allocate image object names with a slab allocator rbd: allocate object requests with a slab allocator rbd: allocate name separate from obj_request rbd: allocate image requests with a slab allocator rbd: use binary search for snapshot lookup rbd: clear EXISTS flag if mapped snapshot disappears rbd: kill off the snapshot list rbd: define rbd_snap_size() and rbd_snap_features() rbd: use snap_id not index to look up snap info rbd: look up snapshot name in names buffer rbd: drop obj_request->version rbd: drop rbd_obj_method_sync() version parameter rbd: more version parameter removal rbd: get rid of some version parameters rbd: stop tracking header object version rbd: snap names are pointer to constant data ...
2013-05-06Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "A set of cifs cleanup fixes. The only big one of this set optimizes the cifs error logging, renaming cFYI and cERROR macros to cifs_dbg, and in the process makes it clearer and reduces module size." * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: small variable name cleanup CIFS: fix error return code in cifs_atomic_open() cifs: store the real expected sequence number in the mid cifs: on send failure, readjust server sequence number downward cifs: remove ENOSPC handling in smb_sendv [CIFS] cifs: Rename cERROR and cFYI to cifs_dbg fs: cifs: use kmemdup instead of kmalloc + memcpy cifs: replaced kmalloc + memset with kzalloc cifs: ignore the unc= and prefixpath= mount options
2013-05-06autofs - remove autofs dentry mount checkDavid Jeffery
When checking if an autofs mount point is busy it isn't sufficient to only check if it's a mount point. For example, if the mount of an offset mountpoint in a tree is denied for this host by its export and the dentry becomes a process working directory the check incorrectly returns the mount as not in use at expire. This can happen since the default when mounting within a tree is nostrict, which means ingnore mount fails on mounts within the tree and continue. The nostrict option is meant to allow mounting in this case. Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-06autofs - fix sparse warning for autofs4_d_manage()Claudiu Ghioc
Fixed the sparse warning: fs/autofs4/root.c:411:5: warning: symbol 'autofs4_d_manage' was not declared. Should it be static?" [ Clearly it should be static as the function is declared static at the top of root.c. - imk ] Signed-off-by: Claudiu Ghioc <claudiu.ghioc@gmail.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-06hfs: SMP race on directory close()Al Viro
->open_dir_list needs protection... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-05cifs: small variable name cleanupDan Carpenter
server and ses->server are the same, but it's a little bit ugly that we lock &ses->server->srv_mutex and unlock &server->srv_mutex. It causes a false positive in Smatch about inconsistent locking. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Steve French <smfrench@gmail.com>
2013-05-05CIFS: fix error return code in cifs_atomic_open()Wei Yongjun
Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Steve French <smfrench@gmail.com>
2013-05-05cifs: store the real expected sequence number in the midJeff Layton
Currently, the signing routines take a pointer to a place to store the expected sequence number for the mid response. It then stores a value that's one below what that sequence number should be, and then adds one to it when verifying the signature on the response. Increment the sequence number before storing the value in the mid, and eliminate the "+1" when checking the signature. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Steve French <smfrench@gmail.com>
2013-05-05cifs: on send failure, readjust server sequence number downwardJeff Layton
If sending a call to the server fails for some reason (for instance, the sending thread caught a signal), then we must readjust the sequence number downward again or the next send will have it too high. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Steve French <smfrench@gmail.com>
2013-05-05cifs: remove ENOSPC handling in smb_sendvJeff Layton
To my knowledge, no one ever reported seeing this pop. Acked-by: Suresh Jayaraman <sjayaraman@novell.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Steve French <smfrench@gmail.com>
2013-05-05[CIFS] cifs: Rename cERROR and cFYI to cifs_dbgJoe Perches
It's not obvious from reading the macro names that these macros are for debugging. Convert the names to a single more typical kernel style cifs_dbg macro. cERROR(1, ...) -> cifs_dbg(VFS, ...) cFYI(1, ...) -> cifs_dbg(FYI, ...) cFYI(DBG2, ...) -> cifs_dbg(NOISY, ...) Move the terminating format newline from the macro to the call site. Add CONFIG_CIFS_DEBUG function cifs_vfs_err to emit the "CIFS VFS: " prefix for VFS messages. Size is reduced ~ 1% when CONFIG_CIFS_DEBUG is set (default y) $ size fs/cifs/cifs.ko* text data bss dec hex filename 265245 2525 132 267902 4167e fs/cifs/cifs.ko.new 268359 2525 132 271016 422a8 fs/cifs/cifs.ko.old Other miscellaneous changes around these conversions: o Miscellaneous typo fixes o Add terminating \n's to almost all formats and remove them from the macros to be more kernel style like. A few formats previously had defective \n's o Remove unnecessary OOM messages as kmalloc() calls dump_stack o Coalesce formats to make grep easier, added missing spaces when coalescing formats o Use %s, __func__ instead of embedded function name o Removed unnecessary "cifs: " prefixes o Convert kzalloc with multiply to kcalloc o Remove unused cifswarn macro Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2013-05-05fs: cifs: use kmemdup instead of kmalloc + memcpySilviu-Mihai Popescu
This replaces calls to kmalloc followed by memcpy with a single call to kmemdup. This was found via make coccicheck. Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Steve French <smfrench@gmail.com>
2013-05-05cifs: replaced kmalloc + memset with kzallocDia Vasile
Signed-off-by: Diana Vasile <kill.elohim@hotmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Steve French <smfrench@gmail.com>
2013-05-05cifs: ignore the unc= and prefixpath= mount optionsJeff Layton
...as advertised for 3.10. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Steve French <smfrench@gmail.com>
2013-05-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second round of VFS updates from Al Viro: "Assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: xtensa simdisk: fix braino in "xtensa simdisk: switch to proc_create_data()" hostfs: use kmalloc instead of kzalloc hostfs: move HOSTFS_SUPER_MAGIC to <linux/magic.h> hostfs: remove "will unlock" comment vfs: use list_move instead of list_del/list_add proc_devtree: Replace include linux/module.h with linux/export.h create_mnt_ns: unidiomatic use of list_add() fs: remove dentry_lru_prune() Removed unused typedef to avoid "unused local typedef" warnings. kill fs/read_write.h fs: Fix hang with BSD accounting on frozen filesystem sun3_scsi: add ->show_info() nubus: Kill nubus_proc_detach_device() more mode_t whack-a-mole... do_coredump(): don't wait for thaw if coredump has already been interrupted do_mount(): fix a leak introduced in 3.9 ("mount: consolidate permission checks")