summaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)Author
2013-06-04ext4: protect extent conversion after DIO with i_dio_countJan Kara
Make sure extent conversion after DIO happens while i_dio_count is still elevated so that inode_dio_wait() waits until extent conversion is done. This removes the need for explicit waiting for extent conversion in some cases. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04ext4: defer clearing of PageWriteback after extent conversionJan Kara
Currently PageWriteback bit gets cleared from put_io_page() called from ext4_end_bio(). This is somewhat inconvenient as extent tree is not fully updated at that time (unwritten extents are not marked as written) so we cannot read the data back yet. This design was dictated by lock ordering as we cannot start a transaction while PageWriteback bit is set (we could easily deadlock with ext4_da_writepages()). But now that we use transaction reservation for extent conversion, locking issues are solved and we can move PageWriteback bit clearing after extent conversion is done. As a result we can remove wait for unwritten extent conversion from ext4_sync_file() because it already implicitely happens through wait_on_page_writeback(). We implement deferring of PageWriteback clearing by queueing completed bios to appropriate io_end and processing all the pages when io_end is going to be freed instead of at the moment ext4_io_end() is called. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04ext4: split extent conversion lists to reserved & unreserved partsJan Kara
Now that we have extent conversions with reserved transaction, we have to prevent extent conversions without reserved transaction (from DIO code) to block these (as that would effectively void any transaction reservation we did). So split lists, work items, and work queues to reserved and unreserved parts. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04ext4: use transaction reservation for extent conversion in ext4_end_ioJan Kara
Later we would like to clear PageWriteback bit only after extent conversion from unwritten to written extents is performed. However it is not possible to start a transaction after PageWriteback is set because that violates lock ordering (and is easy to deadlock). So we have to reserve a transaction before locking pages and sending them for IO and later we use the transaction for extent conversion from ext4_end_io(). Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04ext4: remove buffer_uninit handlingJan Kara
There isn't any need for setting BH_Uninit on buffers anymore. It was only used to signal we need to mark io_end as needing extent conversion in add_bh_to_extent() but now we can mark the io_end directly when mapping extent. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04ext4: restructure writeback pathJan Kara
There are two issues with current writeback path in ext4. For one we don't necessarily map complete pages when blocksize < pagesize and thus needn't do any writeback in one iteration. We always map some blocks though so we will eventually finish mapping the page. Just if writeback races with other operations on the file, forward progress is not really guaranteed. The second problem is that current code structure makes it hard to associate all the bios to some range of pages with one io_end structure so that unwritten extents can be converted after all the bios are finished. This will be especially difficult later when io_end will be associated with reserved transaction handle. We restructure the writeback path to a relatively simple loop which first prepares extent of pages, then maps one or more extents so that no page is partially mapped, and once page is fully mapped it is submitted for IO. We keep all the mapping and IO submission information in mpage_da_data structure to somewhat reduce stack usage. Resulting code is somewhat shorter than the old one and hopefully also easier to read. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04ext4: better estimate credits needed for ext4_da_writepages()Jan Kara
We limit the number of blocks written in a single loop of ext4_da_writepages() to 64 when inode uses indirect blocks. That is unnecessary as credit estimates for mapping logically continguous run of blocks is rather low even for inode with indirect blocks. So just lift this limitation and properly calculate the number of necessary credits. This better credit estimate will also later allow us to always write at least a single page in one iteration. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04ext4: improve writepage credit estimate for files with indirect blocksJan Kara
ext4_ind_trans_blocks() wrongly used 'chunk' argument to decide whether blocks mapped are logically contiguous. That is wrong since the argument informs whether the blocks are physically contiguous. As the blocks mapped are always logically contiguous and that's all ext4_ind_trans_blocks() cares about, just remove the 'chunk' argument. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04ext4: deprecate max_writeback_mb_bump sysfs attributeJan Kara
This attribute is now unused so deprecate it. We still show the old default value to keep some compatibility but we don't allow writing to that attribute anymore. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04ext4: stop messing with nr_to_write in ext4_da_writepages()Jan Kara
Writeback code got better in how it submits IO and now the number of pages requested to be written is usually higher than original 1024. The number is now dynamically computed based on observed throughput and is set to be about 0.5 s worth of writeback. E.g. on ordinary SATA drive this ends up somewhere around 10000 as my testing shows. So remove the unnecessary smarts from ext4_da_writepages(). Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04ext4: provide wrappers for transaction reservation callsJan Kara
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04jbd2: transaction reservation supportJan Kara
In some cases we cannot start a transaction because of locking constraints and passing started transaction into those places is not handy either because we could block transaction commit for too long. Transaction reservation is designed to solve these issues. It reserves a handle with given number of credits in the journal and the handle can be later attached to the running transaction without blocking on commit or checkpointing. Reserved handles do not block transaction commit in any way, they only reduce maximum size of the running transaction (because we have to always be prepared to accomodate request for attaching reserved handle). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04ext4: use io_end for multiple biosJan Kara
Change writeback path to create just one io_end structure for the extent to which we submit IO and share it among bios writing that extent. This prevents needless splitting and joining of unwritten extents when they cannot be submitted as a single bio. Bugs in ENOMEM handling found by Linux File System Verification project (linuxtesting.org) and fixed by Alexey Khoroshilov <khoroshilov@ispras.ru>. CC: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-05-31ext4: fix overflow when counting used blocks on 32-bit architecturesJan Kara
The arithmetics adding delalloc blocks to the number of used blocks in ext4_getattr() can easily overflow on 32-bit archs as we first multiply number of blocks by blocksize and then divide back by 512. Make the arithmetics more clever and also use proper type (unsigned long long instead of unsigned long). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-31ext4: fix data offset overflow in ext4_xattr_fiemap() on 32-bit archsJan Kara
On 32-bit architectures with 32-bit sector_t computation of data offset in ext4_xattr_fiemap() can overflow resulting in reporting bogus data location. Fix the problem by typing block number to proper type before shifting. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-31ext4: fix overflows in SEEK_HOLE, SEEK_DATA implementationsJan Kara
ext4_lblk_t is just u32 so multiplying it by blocksize can easily overflow for files larger than 4 GB. Fix that by properly typing the block offsets before shifting. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2013-05-31ext4: fix data offset overflow on 32-bit archs in ext4_inline_data_fiemap()Jan Kara
On 32-bit archs when sector_t is defined as 32-bit the logic computing data offset in ext4_inline_data_fiemap(). Fix that by properly typing the shifted value. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-28ext4: suppress ext4 orphan messages on mountPaul Taysom
Suppress the messages releating to processing the ext4 orphan list ("truncating inode" and "deleting unreferenced inode") unless the debug option is on, since otherwise they end up taking up space in the log that could be used for more useful information. Tested by opening several files, unlinking them, then crashing the system, rebooting the system and examining /var/log/messages. Addresses the problem described in http://crbug.com/220976 Signed-off-by: Paul Taysom <taysom@chromium.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-05-28ext4: make punch hole code path work with bigallocLukas Czerner
Currently punch hole is disabled in file systems with bigalloc feature enabled. However the recent changes in punch hole patch should make it easier to support punching holes on bigalloc enabled file systems. This commit changes partial_cluster handling in ext4_remove_blocks(), ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently partial_cluster is unsigned long long type and it makes sure that we will free the partial cluster if all extents has been released from that cluster. However it has been specifically designed only for truncate. With punch hole we can be freeing just some extents in the cluster leaving the rest untouched. So we have to make sure that we will notice cluster which still has some extents. To do this I've changed partial_cluster to be signed long long type. The only scenario where this could be a problem is when cluster_size == block size, however in that case there would not be any partial clusters so we're safe. For bigger clusters the signed type is enough. Now we use the negative value in partial_cluster to mark such cluster used, hence we know that we must not free it even if all other extents has been freed from such cluster. This scenario can be described in simple diagram: |FFF...FF..FF.UUU| ^----------^ punch hole . - free space | - cluster boundary F - freed extent U - used extent Also update respective tracepoints to use signed long long type for partial_cluster. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-28ext4: update ext4_ext_remove_space trace pointLukas Czerner
Add "end" variable. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-28ext4: remove unused code from ext4_remove_blocks()Lukas Czerner
The "head removal" branch in the condition is never used in any code path in ext4 since the function only caller ext4_ext_rm_leaf() will make sure that the extent is properly split before removing blocks. Note that there is a bug in this branch anyway. This commit removes the unused code completely and makes use of ext4_error() instead of printk if dubious range is provided. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-28ext4: remove unused discard_partial_page_buffersLukas Czerner
The discard_partial_page_buffers is no longer used anywhere so we can simply remove it including the *_no_lock variant and EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-28ext4: use ext4_zero_partial_blocks in punch_holeLukas Czerner
We're doing to get rid of ext4_discard_partial_page_buffers() since it is duplicating some code and also partially duplicating work of truncate_pagecache_range(), moreover the old implementation was much clearer. Now when the truncate_inode_pages_range() can handle truncating non page aligned regions we can use this to invalidate and zero out block aligned region of the punched out range and then use ext4_block_truncate_page() to zero the unaligned blocks on the start and end of the range. This will greatly simplify the punch hole code. Moreover after this commit we can get rid of the ext4_discard_partial_page_buffers() completely. We also introduce function ext4_prepare_punch_hole() to do come common operations before we attempt to do the actual punch hole on indirect or extent file which saves us some code duplication. This has been tested on ppc64 with 1k block size with fsx and xfstests without any problems. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-28ext4: truncate_inode_pages() in orphan cleanup pathLukas Czerner
Currently we do not tell mm to zero out tail of the page before truncate in orphan_cleanup(). This is ok, because the page should not be uptodate, however this may eventually change and I might cause problems. Call truncate_inode_pages() as precautionary measure. Thanks Jan Kara for pointing this out. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-28Revert "ext4: fix fsx truncate failure"Lukas Czerner
This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9. This commit reintroduces the use of ext4_block_truncate_page() in ext4 truncate operation instead of ext4_discard_partial_page_buffers(). The statement in the commit description that the truncate operation only zero block unaligned portion of the last page is not exactly right, since truncate_pagecache_range() also zeroes and invalidate the unaligned portion of the page. Then there is no need to zero and unmap it once more and ext4_block_truncate_page() was doing the right job, although we still need to update the buffer head containing the last block, which is exactly what ext4_block_truncate_page() is doing. Moreover the problem described in the commit is fixed more properly with commit 15291164b22a357cb211b618adfef4fa82fc0de3 jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer This was tested on ppc64 machine with block size of 1024 bytes without any problems. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-28ext4: Call ext4_jbd2_file_inode() after zeroing blockLukas Czerner
In data=ordered mode we should call ext4_jbd2_file_inode() so that crash after the truncate transaction has committed does not expose stall data in the tail of the block. Thanks Jan Kara for pointing that out. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-28Revert "ext4: remove no longer used functions in inode.c"Lukas Czerner
This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6. This commit reintroduces functions ext4_block_truncate_page() and ext4_block_zero_page_range() which has been previously removed in favour of ext4_discard_partial_page_buffers(). In future commits we want to reintroduce those function and remove ext4_discard_partial_page_buffers() since it is duplicating some code and also partially duplicating work of truncate_pagecache_range(), moreover the old implementation was much clearer. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2013-05-22ext4: use ->invalidatepage() length argumentLukas Czerner
->invalidatepage() aop now accepts range to invalidate so we can make use of it in all ext4 invalidatepage routines. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
2013-05-22jbd2: change jbd2_journal_invalidatepage to accept lengthLukas Czerner
invalidatepage now accepts range to invalidate and there are two file system using jbd2 also implementing punch hole feature which can benefit from this. We need to implement the same thing for jbd2 layer in order to allow those file system take benefit of this functionality. This commit adds length argument to the jbd2_journal_invalidatepage() and updates all instances in ext4 and ocfs2. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
2013-05-22mm: change invalidatepage prototype to accept lengthLukas Czerner
Currently there is no way to truncate partial page where the end truncate point is not at the end of the page. This is because it was not needed and the functionality was enough for file system truncate operation to work properly. However more file systems now support punch hole feature and it can benefit from mm supporting truncating page just up to the certain point. Specifically, with this functionality truncate_inode_pages_range() can be changed so it supports truncating partial page at the end of the range (currently it will BUG_ON() if 'end' is not at the end of the page). This commit changes the invalidatepage() address space operation prototype to accept range to be invalidated and update all the instances for it. We also change the block_invalidatepage() in the same way and actually make a use of the new length argument implementing range invalidation. Actual file system implementations will follow except the file systems where the changes are really simple and should not change the behaviour in any way .Implementation for truncate_page_range() which will be able to accept page unaligned ranges will follow as well. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com>
2013-05-14Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 update from Ted Ts'o: "Fixed regressions (two stability regressions and a performance regression) introduced during the 3.10-rc1 merge window. Also included is a bug fix relating to allocating blocks after resizing an ext3 file system when using the ext4 file system driver" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd,jbd2: fix oops in jbd2_journal_put_journal_head() ext4: revert "ext4: use io_end for multiple bios" ext4: limit group search loop for non-extent files ext4: fix fio regression
2013-05-11ext4: revert "ext4: use io_end for multiple bios"Theodore Ts'o
This reverts commit 4eec708d263f0ee10861d69251708a225b64cac7. Multiple users have reported crashes which is apparently caused by this commit. Thanks to Dmitry Monakhov for bisecting it. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Jan Kara <jack@suse.cz>
2013-05-08Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge more incoming from Andrew Morton: - Various fixes which were stalled or which I picked up recently - A large rotorooting of the AIO code. Allegedly to improve performance but I don't really have good performance numbers (I might have lost the email) and I can't raise Kent today. I held this out of 3.9 and we could give it another cycle if it's all too late/scary. I ended up taking only the first two thirds of the AIO rotorooting. I left the percpu parts and the batch completion for later. - Linus * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (33 commits) aio: don't include aio.h in sched.h aio: kill ki_retry aio: kill ki_key aio: give shared kioctx fields their own cachelines aio: kill struct aio_ring_info aio: kill batch allocation aio: change reqs_active to include unreaped completions aio: use cancellation list lazily aio: use flush_dcache_page() aio: make aio_read_evt() more efficient, convert to hrtimers wait: add wait_event_hrtimeout() aio: refcounting cleanup aio: make aio_put_req() lockless aio: do fget() after aio_get_req() aio: dprintk() -> pr_debug() aio: move private stuff out of aio.h aio: add kiocb_cancel() aio: kill return value of aio_complete() char: add aio_{read,write} to /dev/{null,zero} aio: remove retry-based AIO ...
2013-05-08aio: don't include aio.h in sched.hKent Overstreet
Faster kernel compiles by way of fewer unnecessary includes. [akpm@linux-foundation.org: fix fallout] [akpm@linux-foundation.org: fix build] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07make blkdev_put() return voidAl Viro
same story as with the previous patches - note that return value of blkdev_close() is lost, since there's nowhere the caller (__fput()) could return it to. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-06ext4: limit group search loop for non-extent filesLachlan McIlroy
In the case where we are allocating for a non-extent file, we must limit the groups we allocate from to those below 2^32 blocks, and ext4_mb_regular_allocator() attempts to do this initially by putting a cap on ngroups for the subsequent search loop. However, the initial target group comes in from the allocation context (ac), and it may already be beyond the artificially limited ngroups. In this case, the limit if (group == ngroups) group = 0; at the top of the loop is never true, and the loop will run away. Catch this case inside the loop and reset the search to start at group 0. [sandeen@redhat.com: add commit msg & comments] Signed-off-by: Lachlan McIlroy <lmcilroy@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-05-03ext4: fix fio regressionYan, Zheng
We (Linux Kernel Performance project) found a regression introduced by commit: f7fec032aa ext4: track all extent status in extent status tree The commit causes about 20% performance decrease in fio random write test. Profiler shows that rb_next() uses a lot of CPU time. The call stack is: rb_next ext4_es_find_delayed_extent ext4_map_blocks _ext4_get_block ext4_get_block_write __blockdev_direct_IO ext4_direct_IO generic_file_direct_write __generic_file_aio_write ext4_file_write aio_rw_vect_retry aio_run_iocb do_io_submit sys_io_submit system_call_fastpath io_submit td_io_getevents io_u_queued_complete thread_main main __libc_start_main The cause is that ext4_es_find_delayed_extent() doesn't have an upper bound, it keeps searching until a delayed extent is found. When there are a lots of non-delayed entries in the extent state tree, ext4_es_find_delayed_extent() may uses a lot of CPU time. Reported-by: LKP project <lkp@linux.intel.com> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Cc: "Theodore Ts'o" <tytso@mit.edu>
2013-05-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...
2013-04-23ext4: fix type-widening bug in inode table readahead codeTheodore Ts'o
Due to a missing cast, the high 32-bits of a 64-bit block number used when calculating the readahead block for inode tables can get lost. This means we can end up fetching the wrong blocks for readahead for file systems > 16TB. Linus found this when experimenting with an enhacement to the sparse static code checker which checks for missing widening casts before binary "not" operators. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-22ext4: add check for inodes_count overflow in new resize ioctlTheodore Ts'o
Addresses-Red-Hat-Bugzilla: #913245 Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Cc: stable@vger.kernel.org
2013-04-22ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUGTheodore Ts'o
Fox the Kconfig documentation for CONFIG_EXT4_DEBUG to match the change made by commit a0b30c1229: ext4: use module parameters instead of debugfs for mballoc_debug Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-04-22ext4: fix online resizing for ext3-compat file systemsTheodore Ts'o
Commit fb0a387dcdc restricts block allocations for indirect-mapped files to block groups less than s_blockfile_groups. However, the online resizing code wasn't setting s_blockfile_groups, so the newly added block groups were not available for non-extent mapped files. Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-04-21ext4: mark metadata blocks using bh flagsTheodore Ts'o
This allows metadata writebacks which are issued via block device writeback to be sent with the current write request flags. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-20ext4: mark all metadata I/O with REQ_METATheodore Ts'o
As Dave Chinner pointed out at the 2013 LSF/MM workshop, it's important that metadata I/O requests are marked as such to avoid priority inversions caused by I/O bandwidth throttling. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-19ext4: fix readdir error in case inline_data+^dir_index.Tao Ma
Zach reported a problem that if inline data is enabled, we don't tell the difference between the offset of '.' and '..'. And a getdents will fail if the user only want to get '.'. And what's worse, we may meet with duplicate dir entries as the offset for inline dir and non-inline one is quite different. This patch just try to resolve this problem if dir_index is disabled. In this case, f_pos is the real offset with the dir block, so for inline dir, we just pretend as if we are a dir block and returns the offset like a norml dir block does. Reported-by: Zach Brown <zab@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-19ext4: fix readdir error in the case of inline_data+dir_indexTao Ma
Zach reported a problem that if inline data is enabled, we don't tell the difference between the offset of '.' and '..'. And a getdents will fail if the user only want to get '.' and what's worse, if there is a conversion happens when the user calls getdents many times, he/she may get the same entry twice. In theory, a dir block would also fail if it is converted to a hashed-index based dir since f_pos will become a hash value, not the real one, but it doesn't happen. And a deep investigation shows that we uses a hash based solution even for a normal dir if the dir_index feature is enabled. So this patch just adds a new htree_inlinedir_to_tree for inline dir, and if we find that the hash index is supported, we will do like what we do for a dir block. Reported-by: Zach Brown <zab@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-19ext4: mext_insert_extents should update extent block checksumDarrick J. Wong
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-19ext4: move quota initialization out of inode allocation transactionJan Kara
Inode allocation transaction is pretty heavy (246 credits with quotas and extents before previous patch, still around 200 after it). This is mostly due to credits required for allocation of quota structures (credits there are heavily overestimated but it's difficult to make better estimates if we don't want to wire non-trivial assumptions about quota format into filesystem). So move quota initialization out of allocation transaction. That way transaction for quota structure allocation will be started only if we need to look up quota structure on disk (rare) and furthermore it will be started for each quota type separately, not for all of them at once. This reduces maximum transaction size to 34 is most cases and to 73 in the worst case. [ Modified by tytso to clean up the cleanup paths for error handling. Also use a separate call to ext4_std_error() for each failure so it is easier for someone who is debugging a problem in this function to determine which function call failed. ] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-18ext4: reserve xattr index for Rich ACL supportTheodore Ts'o
Jan Kara <jack@suse.cz> SUSE is carrying out of tree patches for Rich ACL support for ext4 as they didn't get upstream due to opposition of some VFS maintainers. Reserve xattr index for Rich ACLs so that it cannot be taken by anything else which would force users to backup and reset their Rich ACLs on files. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-12ext4: clear buffer_uninit flag when submitting IOJan Kara
Currently noone cleared buffer_uninit flag. This results in writeback needlessly marking io_end as needing extent conversion scanning extent tree for extents to convert. So clear the buffer_uninit flag once the buffer is submitted for IO and the flag is transformed into EXT4_IO_END_UNWRITTEN flag. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>