summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2011-10-31ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()Yongqiang Yang
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-31ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwrittenTao Ma
EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten should be done simultaneously since ext4_end_io_nolock always clear the flag and decrease the counter in the same time. We have found some bugs that the flag is set while leaving i_aiodio_unwritten unchanged(commit 32c80b32c053d). So this patch just tries to create a helper function to wrap them to avoid any future bug. The idea is inspired by Eric. Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-31ext4: optimize locking for end_io extent conversionTheodore Ts'o
Now that we are doing the locking correctly, we need to grab the i_completed_io_lock() twice per end_io. We can clean this up by removing the structure from the i_complted_io_list, and use this as the locking mechanism to prevent ext4_flush_completed_IO() racing against ext4_end_io_work(), instead of clearing the EXT4_IO_END_UNWRITTEN in io->flag. In addition, if the ext4_convert_unwritten_extents() returns an error, we no longer keep the end_io structure on the linked list. This doesn't help, because it tends to lock up the file system and wedges the system. That's one way to call attention to the problem, but it doesn't help the overall robustness of the system. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-30ext4: remove unnecessary call to waitqueue_active()Theodore Ts'o
The usage of waitqueue_active() is not necessary, and introduces (I believe) a hard-to-hit race. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-30ext4: Use correct locking for ext4_end_io_nolock()Tao Ma
We must hold i_completed_io_lock when manipulating anything on the i_completed_io_list linked list. This includes io->lock, which we were checking in ext4_end_io_nolock(). So move this check to ext4_end_io_work(). This also has the bonus of avoiding extra work if it is already done without needing to take the mutex. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29ext4: fix race in xattr block allocation pathEric Sandeen
Ceph users reported that when using Ceph on ext4, the filesystem would often become corrupted, containing inodes with incorrect i_blocks counters. I managed to reproduce this with a very hacked-up "streamtest" binary from the Ceph tree. Ceph is doing a lot of xattr writes, to out-of-inode blocks. There is also another thread which does sync_file_range and close, of the same files. The problem appears to happen due to this race: sync/flush thread xattr-set thread ----------------- ---------------- do_writepages ext4_xattr_set ext4_da_writepages ext4_xattr_set_handle mpage_da_map_blocks ext4_xattr_block_set set DELALLOC_RESERVE ext4_new_meta_blocks ext4_mb_new_blocks if (!i_delalloc_reserved_flag) vfs_dq_alloc_block ext4_get_blocks down_write(i_data_sem) set i_delalloc_reserved_flag ... up_write(i_data_sem) if (i_delalloc_reserved_flag) vfs_dq_alloc_block_nofail In other words, the sync/flush thread pops in and sets i_delalloc_reserved_flag on the inode, which makes the xattr thread think that it's in a delalloc path in ext4_new_meta_blocks(), and add the block for a second time, after already having added it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks The real problem is that we shouldn't be using the DELALLOC_RESERVED state flag, and instead we should be passing EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks() instead of using an inode state flag. We'll fix this for now with using i_data_sem to prevent this race, but this is really not the right way to fix things. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-10-29ext4: trace punch_hole correctly in ext4_ext_map_blocksYongqiang Yang
When ext4_ext_map_blocks() is called by punch_hole, trace should trace blocks punched out. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29ext4: clean up AGGRESSIVE_TEST codeYongqiang Yang
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29ext4: move variables to their scopeYongqiang Yang
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29ext4: fix quota accounting during migrationDmitry Monakhov
The tmp_inode should have same uid/gid as the original inode. Otherwise new metadata blocks will be accounted to wrong quota-id, which will result in a quota leak after the inode migration is completed. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-29ext4: migrate cleanupDmitry Monakhov
This patch cleanup code a bit, actual logic not changed - Move current block pointer to migrate_structure, let's all walk info will be in one structure. - Get rid of usless null ind-block ptr checks, caller already does that check. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-27ext4: optimize memmmove lengths in extent/index insertionsEric Gouriou
ext4_ext_insert_extent() (respectively ext4_ext_insert_index()) was using EXT_MAX_EXTENT() (resp. EXT_MAX_INDEX()) to determine how many entries needed to be moved beyond the insertion point. In practice this means that (320 - I) * 24 bytes were memmove()'d when I is the insertion point, rather than (#entries - I) * 24 bytes. This patch uses EXT_LAST_EXTENT() (resp. EXT_LAST_INDEX()) instead to only move existing entries. The code flow is also simplified slightly to highlight similarities and reduce code duplication in the insertion logic. This patch reduces system CPU consumption by over 25% on a 4kB synchronous append DIO write workload when used with the pre-2.6.39 x86_64 memmove() implementation. With the much faster 2.6.39 memmove() implementation we still see a decrease in system CPU usage between 2% and 7%. Note that the ext_debug() output changes with this patch, splitting some log information between entries. Users of the ext_debug() output should note that the "move %d" units changed from reporting the number of bytes moved to reporting the number of entries moved. Signed-off-by: Eric Gouriou <egouriou@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-27ext4: optimize ext4_ext_convert_to_initialized()Eric Gouriou
This patch introduces a fast path in ext4_ext_convert_to_initialized() for the case when the conversion can be performed by transferring the newly initialized blocks from the uninitialized extent into an adjacent initialized extent. Doing so removes the expensive invocations of memmove() which occur during extent insertion and the subsequent merge. In practice this should be the common case for clients performing append writes into files pre-allocated via fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via direct IO and when using a suboptimal implementation of memmove() (x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU consumption by 32%. Two new trace points are added to ext4_ext_convert_to_initialized() to offer visibility into its operations. No exit trace point has been added due to the multiplicity of return points. This can be revisited once the upstream cleanup is backported. Signed-off-by: Eric Gouriou <egouriou@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-27jbd2: fix build when CONFIG_BUG is not enabledRandy Dunlap
Fix build error when CONFIG_BUG is not enabled: fs/jbd2/transaction.c:1175:3: error: implicit declaration of function '__WARN' by changing __WARN() to WARN_ON(), as suggested by Arnaud Lacombe <lacombar@gmail.com>. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arnaud Lacombe <lacombar@gmail.com>
2011-10-26ext4: don't check io->flag when setting EXT4_STATE_DIO_UNWRITTEN inode stateTao Ma
When we want to convert the unitialized extent in direct write, we can either do it in ext4_end_io_nolock(AIO case) or in ext4_ext_direct_IO(non AIO case) and EXT4_I(inode)->cur_aio_dio is a guard for ext4_ext_map_blocks to find the right case. In e9e3bcecf, we mistakenly change it by: - if (io) + if (io && !(io->flag & EXT4_IO_END_UNWRITTEN)) { io->flag = EXT4_IO_END_UNWRITTEN; - else + atomic_inc(&EXT4_I(inode)->i_aiodio_unwritten); + } else ext4_set_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN); So now if we map 2 blocks, and the first one set the EXT_IO_END_UNWRITTEN, the 2nd mapping will set inode state because of the check for the flag. This is wrong. Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26ext4: fix a wrong comment in __mb_check_buddy()Robin Dong
The comment says the bit should be 0, but the after code assert the bit to be 1. This makes people confused, so fix it. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26ext4: remove unused variable in mb_find_extent()Robin Dong
The variable 'ord' in function mb_find_extent() is redundant, so remove it. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26ext4: remove unused variable in ext4_mb_generate_from_pa()Robin Dong
The variable 'count' in function ext4_mb_generate_from_pa() looks useless, so remove it. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26ext4: use stream-alloc when mb_group_prealloc set to zeroRobin Dong
The kernel will crash on ext4_mb_mark_diskspace_used: BUG_ON(ac->ac_b_ex.fe_len <= 0); after we set /sys/fs/ext4/sda/mb_group_prealloc to zero and create new files in an ext4 filesystem. The reason is: ac_b_ex.fe_len also set to zero(mb_group_prealloc) in ext4_mb_normalize_group_request because the ac_flags contains EXT4_MB_HINT_GROUP_ALLOC. I think when someone set mb_group_prealloc to zero, it means DO NOT USE GROUP PREALLOCATION, so we should set alloc-strategy to STREAM in this case. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26ext4: let ext4_page_mkwrite stop started handle in failureYongqiang Yang
The started journal handle should be stopped in failure case. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
2011-10-26ext4: handle NULL p_ext in ext4_ext_next_allocated_block()Curt Wohlgemuth
In ext4_ext_next_allocated_block(), the path[depth] might have a p_ext that is NULL -- see ext4_ext_binsearch(). In such a case, dereferencing it will crash the machine. This patch checks for p_ext == NULL in ext4_ext_next_allocated_block() before dereferencinging it. Tested using a hand-crafted an inode with eh_entries == 0 in an extent block, verified that running FIEMAP on it crashes without this patch, works fine with it. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26ext4: error handling fix in ext4_ext_convert_to_initialized()Dan Carpenter
When allocated is unsigned it breaks the error handling at the end of the function when we call: allocated = ext4_split_extent(...); if (allocated < 0) err = allocated; I've made it a signed int instead of unsigned. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26ext4: use ext4_reserve_inode_write in ext4_xattr_set_handleEric Sandeen
ext4_mark_iloc_dirty() says: * The caller must have previously called ext4_reserve_inode_write(). * Give this, we know that the caller already has write access to iloc->bh. ext4_xattr_set_handle, however, just open-codes it. May as well use the helper function for consistency. No bug here, just tidiness. (Note: on cleanup path, ext4_reserve_inode_write sets the bh to NULL if it returns an error, and brelse() of a null bh is handled gracefully). Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26ext4: avoid setting directory i_nlink to zeroAndreas Dilger
If a directory with more than EXT4_LINK_MAX subdirectories, the nlink count is set to 1. Subsequently, if any subdirectories are deleted, ext4_dec_count() decrements the i_nlink count, which may go to 0 temporarily before being incremented back to 1. While this is done under i_mutex, which prevents races for directory and inode operations that check i_nlink, the temporary i_nlink == 0 case is exposed to userspace via stat() and similar calls that do not hold i_mutex. Instead, change the code to not decrement i_nlink count for any directories that do not already have i_nlink larger than 2. Reported-by: Cliff White <cliffw@whamcloud.com> Reviewed-by: Johann Lombardi <johann@whamcloud.com> Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-25ext4: prevent stack overrun in ext4_file_openDarrick J. Wong
In ext4_file_open, the filesystem records the mountpoint of the first file that is opened after mounting the filesystem. It does this by allocating a 64-byte stack buffer, calling d_path() to grab the mount point through which this file was accessed, and then memcpy()ing 64 bytes into the superblock's s_last_mounted field, starting from the return value of d_path(), which is stored as "cp". However, if cp > buf (which it frequently is since path components are prepended starting at the end of buf) then we can end up copying stack data into the superblock. Writing stack variables into the superblock doesn't sound like a great idea, so use strlcpy instead. Andi Kleen suggested using strlcpy instead of strncpy. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-25ext4: update EOFBLOCKS flag on fallocate properlyDmitry Monakhov
EOFBLOCK_FL should be updated if called w/o FALLOCATE_FL_KEEP_SIZE Currently it happens only if new extent was allocated. TESTCASE: fallocate test_file -n -l4096 fallocate test_file -l4096 Last fallocate cmd has updated size, but keept EOFBLOCK_FL set. And fsck will complain about that. Also remove ping pong in ext4_fallocate() in case of new extents, where ext4_ext_map_blocks() clear EOFBLOCKS bit, and later ext4_falloc_update_inode() restore it again. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-25ext4: remove messy logic from ext4_ext_rm_leafDmitry Monakhov
- Both callers(truncate and punch_hole) already aligned left end point so we no longer need split logic here. - Remove dead duplicated code. - Call ext4_ext_dirty only after we have updated eh_entries, otherwise we'll loose entries update. Regression caused by d583fb87a3ff0 266'th testcase in xfstests (http://patchwork.ozlabs.org/patch/120872) Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-22ext4: cleanup ext4_ext_grow_indepth codeDmitry Monakhov
Currently code make an impression what grow procedure is very complicated and some mythical paths, blocks are involved. But in fact grow in depth it relatively simple procedure: 1) Just create new meta block and copy root data to that block. 2) Convert root from extent to index if old depth == 0 3) Update root block pointer This patch does: - Reorganize code to make it more self explanatory - Do not pass path parameter to new_meta_block() in order to provoke allocation from inode's group because top-level block should site closer to it's inode, but not to leaf data block. [ This happens anyway, due to logic in mballoc; we should drop the path parameter from new_meta_block() entirely. -- tytso ] Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-21ext4: Allow quota file use root reservationDmitry Monakhov
Quota file is fs's metadata, so it is reasonable to permit use root resevation if necessary. This patch fix 265'th xfstest failure Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-20ext4: fix the deadlock in mpage_da_map_and_submit()Kazuya Mio
If ext4_jbd2_file_inode() in mpage_da_map_and_submit() fails due to journal abort, this function returns to caller without unlocking the page. It leads to the deadlock, and the patch fixes this issue by calling mpage_da_submit_io(). Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-20ext4: fix deadlock in ext4_ordered_write_end()Akira Fujita
If ext4_jbd2_file_inode() in ext4_ordered_write_end() fails for some reasons, this function returns to caller without unlocking the page. It leads to the deadlock, and the patch fixes this issue. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18ext4: quiet sparse noise about plain integer as NULL pointerH Hartley Sweeten
The third parameter to ext4_free_blocks is a struct buffer_head *. This parameter should be NULL not 0. This quiets the sparse noise: warning: Using plain integer as NULL pointer Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18ext4: add __user decoration to calls of copy_{from,to}_user()H Hartley Sweeten
This quiets the sparse noise: warning: incorrect type in argument 2 (different address spaces) expected void const [noderef] <asn:1>*from got struct fstrim_range *<noident> warning: incorrect type in argument 1 (different address spaces) expected void [noderef] <asn:1>*to got struct fstrim_range *<noident> Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18ext4: functions should not be declared externH Hartley Sweeten
The function declarations in ext4.h are already marked extern, so it's not necessary to do so in the .c files. This quiets the sparse noise: warning: function 'ext4_flush_completed_IO' with external linkage has definition warning: function 'ext4_init_inode_table' with external linkage has definition Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18ext4: add block plug for .writepagesShaohua Li
Add block plug for ext4 .writepages. Though ext4 .writepages already handles request merge very well, block plug is still helpful to reduce block lock contention. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18ext4: Fix comparison endianness problem in MMP initializationDarrick J. Wong
As part of startup, the MMP initialization code does this: mmp->mmp_seq = seq = cpu_to_le32(mmp_new_seq()); Next, mmp->mmp_seq is written out to disk, a delay happens, and then the MMP block is read back in and the sequence value is tested: if (seq != le32_to_cpu(mmp->mmp_seq)) { /* fail the mount */ On a LE system such as x86, the *le32* functions do nothing and this works. Unfortunately, on a BE system such as ppc64, this comparison becomes: if (cpu_to_le32(new_seq) != le32_to_cpu(cpu_to_le32(new_seq)) { /* fail the mount */ Except for a few palindromic sequence numbers, this test always causes the mount to fail, which makes MMP filesystems generally unmountable on ppc64. The attached patch fixes this situation. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-18ext4: MMP: fix error message rate-limiting logic in kmmpdNikitas Angelinas
Current logic would print an error message only once, and then 'failed_writes' would stay at 1. Rework the loop to increment 'failed_writes' and print the error message every s_mmp_update_interval * 60 seconds, as intended according to the comment. Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com> Signed-off-by: Andrew Perepechko <andrew_perepechko@xyratex.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Andreas Dilger <adilger@dilger.ca>
2011-10-18ext4: MMP: kmmpd should use nodename from init_uts_ns.name, not sysnameNikitas Angelinas
sysname holds "Linux" by default, i.e. what appears when doing a "uname -s"; nodename should be used to print the machine's hostname, i.e. what is returned when doing a "uname -n" or "hostname", and what gethostname(2)/sethostname(2) manipulate, in order to notify the administrator of the node which is contending to mount the filesystem. Acked-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com> Signed-off-by: Andrew Perepechko <andrew_perepechko@xyratex.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-17ext4: avoid stamping on other memories in ext4_ext_insert_index()Tao Ma
Add a sanity check to make sure ix hasn't gone beyond the valid bounds of the extent block. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08ext4: fix ext4 so it works without CONFIG_PROC_FSFabrice Jouhaud
This fixes a bug which was introduced in dd68314ccf3fb. The problem came from the test of the return value of proc_mkdir which is always false without procfs, and this would initialization of ext4. Signed-off-by: Fabrice Jouhaud <yargil@free.fr> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08ext4: use le32_to_cpu for ext4_extent_idx.ei_block in ext4_ext_search_left()Tao Ma
ext4_extent_idx.e_block is __le32, so use le32_to_cpu() in ext4_ext_search_left(). Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08ext4: remove the obsolete/broken EXT4_IOC_WAIT_FOR_READONLY ioctlTao Ma
There are no users of the EXT4_IOC_WAIT_FOR_READONLY ioctl, and it is also broken. No one sets the set_ro_timer, no one wakes up us and our state is set to TASK_INTERRUPTIBLE not RUNNING. So remove it. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08ext4: fix the comment describing ext4_ext_search_right()Tao Ma
The comment describing what ext4_ext_search_right() does is incorrect. We return 0 in *phys when *logical is the 'largest' allocated block, not smallest. Fix a few other typos while we're at it. Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-10-08ext4: remove deprecated oldallocLukas Czerner
For a long time now orlov is the default block allocator in the ext4. It performs better than the old one and no one seems to claim otherwise so we can safely drop it and make oldalloc and orlov mount option deprecated. This is a part of the effort to reduce number of ext4 options hence the test matrix. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-06ext4: Free resources in some error path in ext4_fill_superTao Ma
Some of the error path in ext4_fill_super don't release the resouces properly. So this patch just try to release them in the right way. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-06ext4: Free resources in ext4_mb_init()'s error pathsTao Ma
In commit 79a77c5ac, we move ext4_mb_init_backend after the allocation of s_locality_group to avoid memory leak in error path, but there are still some other error paths in ext4_mb_init that need to do the same work. So this patch adds all the error patch for ext4_mb_init. And all the pointers are reset to NULL in case the caller may double free them. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09ext4: attempt to fix race in bigalloc code pathAditya Kali
Currently, there exists a race between delayed allocated writes and the writeback when bigalloc feature is in use. The race was because we wanted to determine what blocks in a cluster are under delayed allocation and we were using buffer_delayed(bh) check for it. But, the writeback codepath clears this bit without any synchronization which resulted in a race and an ext4 warning similar to: EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0 reserved data blocks The race existed in two places. (1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from writeback code path. (2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where buffer_delayed(bh) is set. To fix (1), this patch introduces a new buffer_head state bit - BH_Da_Mapped. This bit is set under the protection of EXT4_I(inode)->i_data_sem when we have actually mapped the delayed allocated blocks during the writeout time. We can now reliably check for this bit inside ext4_find_delalloc_range() to determine whether the reservation for the blocks have already been claimed or not. To fix (2), it was necessary to set buffer_delay(bh) under the protection of i_data_sem. So, I extracted the very beginning of ext4_map_blocks into a new function - ext4_da_map_blocks() - and performed the required setting of bh_delay bit and the quota reservation under the protection of i_data_sem. These two fixes makes the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent, thus removing the race. Tested: I was able to reproduce the problem by running 'dd' and 'fsync' in parallel. Also, xfstests sometimes used to reproduce this race. After the fix both my test and xfstests were successful and no race (warning message) was observed. Google-Bug-Id: 4997027 Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09ext4: add some tracepoints in ext4/extents.cAditya Kali
This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in ext4/inode.c. Tested: Built and ran the kernel and verified that these tracepoints work. Also ran xfstests. Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09ext4: rename ext4_has_free_blocks() to ext4_has_free_clusters()Theodore Ts'o
Rename the function so it is more clear what is going on. Also rename the various variables so it's clearer what's happening. Also fix a missing blocks to cluster conversion when reading the number of reserved blocks for root. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-09ext4: rename ext4_claim_free_blocks() to ext4_claim_free_clusters()Theodore Ts'o
This function really claims a number of free clusters, not blocks, so rename it so it's clearer what's going on. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>