summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-03-19Merge tag 'for-linus-v3.9-rc4' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull XFS fixes from Ben Myers: - Fix for a potential infinite loop which was introduced in commit 4d559a3bcb73 ("xfs: limit speculative prealloc near ENOSPC thresholds") - Fix for the return type of xfs_iomap_eof_prealloc_initial_size from commit a1e16c26660b ("xfs: limit speculative prealloc size on sparse files") - Fix for a failed buffer readahead causing subsequent callers to fail incorrectly * tag 'for-linus-v3.9-rc4' of git://oss.sgi.com/xfs/xfs: xfs: ensure we capture IO errors correctly xfs: fix xfs_iomap_eof_prealloc_initial_size type xfs: fix potential infinite loop in xfs_iomap_prealloc_size()
2013-03-18nfsd: fix startup order in nfsd_reply_cache_initJeff Layton
If we end up doing "goto out_nomem" in this function, we'll call nfsd_reply_cache_shutdown. That will attempt to walk the LRU list and free entries, but that list may not be initialized yet if the server is starting up for the first time. It's also possible for the shrinker to kick in before we've initialized the LRU list. Rearrange the initialization so that the LRU list_head and cache size are initialized before doing any of the allocations that might fail. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-18nfsd: only unhash DRC entries that are in the hashtableJeff Layton
It's not safe to call hlist_del() on a newly initialized hlist_node. That leads to a NULL pointer dereference. Only do that if the entry is hashed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-18xfs: ensure we capture IO errors correctlyDave Chinner
Failed buffer readahead can leave the buffer in the cache marked with an error. Most callers that then issue a subsequent read on the buffer do not zero the b_error field out, and so we may incorectly detect an error during IO completion due to the stale error value left on the buffer. Avoid this problem by zeroing the error before IO submission. This ensures that the only IO errors that are detected those captured from are those captured from bio submission or completion. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit c163f9a1760229a95d04e37b332de7d5c1c225cd)
2013-03-18xfs: fix xfs_iomap_eof_prealloc_initial_size typeMark Tinguely
Fix the return type of xfs_iomap_eof_prealloc_initial_size() to xfs_fsblock_t to reflect the fact that the return value may be an unsigned 64 bits if XFS_BIG_BLKNOS is defined. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit e8108cedb1c5d1dc359690d18ca997e97a0061d2)
2013-03-18xfs: fix potential infinite loop in xfs_iomap_prealloc_size()Brian Foster
If freesp == 0, we could end up in an infinite loop while squashing the preallocation. Break the loop when we've killed the prealloc entirely. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit e78c420bfc2608bb5f9a0b9165b1071c1e31166a)
2013-03-18ext4: fix memory leakage in mext_check_coverageDmitry Monakhov
Regression was introduced by following commit 8c854473 TESTCASE (git://oss.sgi.com/xfs/cmds/xfstests.git): #while true;do ./check 301 || break ;done Also fix potential memory leakage in get_ext_path() once ext4_ext_find_extent() have failed. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Eric's rcu barrier patch fixes a long standing problem with our unmount code hanging on to devices in workqueue helpers. Liu Bo nailed down a difficult assertion for in-memory extent mappings." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix warning of free_extent_map Btrfs: fix warning when creating snapshots Btrfs: return as soon as possible when edquot happens Btrfs: return EIO if we have extent tree corruption btrfs: use rcu_barrier() to wait for bdev puts at unmount Btrfs: remove btrfs_try_spin_lock Btrfs: get better concurrency for snapshot-aware defrag work
2013-03-16Btrfs: fix warning of free_extent_mapLiu Bo
Users report that an extent map's list is still linked when it's actually going to be freed from cache. The story is that a) when we're going to drop an extent map and may split this large one into smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means that it's on the list to be logged, then the smaller ems split from it will also be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected. b) we'll keep ems from unlinking the list and freeing when they are flagged with EXTENT_FLAG_LOGGING, because the log code holds one reference. The end result is the warning, but the truth is that we set the flag EXTENT_FLAG_LOGGING only during fsync. So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one. Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Reported-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, ext3, reiserfs, quota fixes from Jan Kara: "A fix for regression in ext2, and a format string issue in ext3. The rest isn't too serious." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Fix BUG_ON in evict() on inode deletion reiserfs: Use kstrdup instead of kmalloc/strcpy ext3: Fix format string issues quota: add missing use of dq_data_lock in __dquot_initialize
2013-03-14Btrfs: fix warning when creating snapshotsLiu Bo
Creating snapshot passes extent_root to commit its transaction, but it can lead to the warning of checking root for quota in the __btrfs_end_transaction() when someone else is committing the current transaction. Since we've recorded the needed root in trans_handle, just use it to get rid of the warning. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14Btrfs: return as soon as possible when edquot happensWang Shilong
If one of qgroup fails to reserve firstly, we should return immediately, it is unnecessary to continue check. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14Btrfs: return EIO if we have extent tree corruptionJosef Bacik
The callers of lookup_inline_extent_info all handle getting an error back properly, so return an error if we have corruption instead of being a jerk and panicing. Still WARN_ON() since this is kind of crucial and I've been seeing it a bit too much recently for my taste, I think we're doing something wrong somewhere. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14btrfs: use rcu_barrier() to wait for bdev puts at unmountEric Sandeen
Doing this would reliably fail with -EBUSY for me: # mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2 ... unable to open /dev/sdb2: Device or resource busy because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it. Using systemtap to track bdev gets & puts shows a kworker thread doing a blkdev put after mkfs attempts a get; this is left over from the unmount path: btrfs_close_devices __btrfs_close_devices call_rcu(&device->rcu, free_device); free_device INIT_WORK(&device->rcu_work, __free_device); schedule_work(&device->rcu_work); so unmount might complete before __free_device fires & does its blkdev_put. Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait until all blkdev_put()s are done, and the device is truly free once unmount completes. Cc: stable@vger.kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14Btrfs: remove btrfs_try_spin_lockLiu Bo
Remove a useless function declaration Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14Btrfs: get better concurrency for snapshot-aware defrag workLiu Bo
Using spinning case instead of blocking will result in better concurrency overall. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14UBIFS: make space fixup work in the remount caseArtem Bityutskiy
The UBIFS space fixup is a useful feature which allows to fixup the "broken" flash space at the time of the first mount. The "broken" space is usually the result of using a "dumb" industrial flasher which is not able to skip empty NAND pages and just writes all 0xFFs to the empty space, which has grave side-effects for UBIFS when UBIFS trise to write useful data to those empty pages. The fix-up feature works roughly like this: 1. mkfs.ubifs sets the fixup flag in UBIFS superblock when creating the image (see -F option) 2. when the file-system is mounted for the first time, UBIFS notices the fixup flag and re-writes the entire media atomically, which may take really a lot of time. 3. UBIFS clears the fixup flag in the superblock. This works fine when the file system is mounted R/W for the very first time. But it did not really work in the case when we first mount the file-system R/O, and then re-mount R/W. The reason was that we started the fixup procedure too late, which we cannot really do because we have to fixup the space before it starts being used. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Reported-by: Mark Jackson <mpfj-list@mimc.co.uk> Cc: stable@vger.kernel.org # 3.0+
2013-03-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace bugfixes from Eric Biederman: "This tree includes a partial revert for "fs: Limit sys_mount to only request filesystem modules." When I added the new style module aliases to the filesystems I deleted the old ones. A bad move. It turns out that distributions like Arch linux use module aliases when constructing ramdisks. Which meant ultimately that an ext3 filesystem mounted with ext4 would not result in the ext4 module being put into the ramdisk. The other change in this tree adds a handful of filesystem module alias I simply failed to add the first time. Which inconvinienced a few folks using cifs. I don't want to inconvinience folks any longer than I have to so here are these trivial fixes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: fs: Readd the fs module aliases. fs: Limit sys_mount to only request filesystem modules. (Part 3)
2013-03-13nfsd: convert to idr_alloc()Tejun Heo
idr_get_new*() and friends are about to be deprecated. Convert to the new idr_alloc() interface. Only compile-tested. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: J. Bruce Fields <bfields@redhat.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-13nfsd: remove unused get_new_stid()Tejun Heo
get_new_stid() is no longer used since commit 3abdb607125 ("nfsd4: simplify idr allocation"). Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-13cifs: delay super block destruction until all cifsFileInfo objects are goneMateusz Guzik
cifsFileInfo objects hold references to dentries and it is possible that these will still be around in workqueues when VFS decides to kill super block during unmount. This results in panics like this one: BUG: Dentry ffff88001f5e76c0{i=66b4a,n=1M-2} still in use (1) [unmount of cifs cifs] ------------[ cut here ]------------ kernel BUG at fs/dcache.c:943! [..] Process umount (pid: 1781, threadinfo ffff88003d6e8000, task ffff880035eeaec0) [..] Call Trace: [<ffffffff811b44f3>] shrink_dcache_for_umount+0x33/0x60 [<ffffffff8119f7fc>] generic_shutdown_super+0x2c/0xe0 [<ffffffff8119f946>] kill_anon_super+0x16/0x30 [<ffffffffa036623a>] cifs_kill_sb+0x1a/0x30 [cifs] [<ffffffff8119fcc7>] deactivate_locked_super+0x57/0x80 [<ffffffff811a085e>] deactivate_super+0x4e/0x70 [<ffffffff811bb417>] mntput_no_expire+0xd7/0x130 [<ffffffff811bc30c>] sys_umount+0x9c/0x3c0 [<ffffffff81657c19>] system_call_fastpath+0x16/0x1b Fix this by making each cifsFileInfo object hold a reference to cifs super block, which implicitly keeps VFS super block around as well. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Cc: <stable@vger.kernel.org> Reported-and-Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-13cifs: map NT_STATUS_SHARING_VIOLATION to EBUSY instead of ETXTBSYSachin Prabhu
NT_SHARING_VIOLATION errors are mapped to ETXTBSY which is unexpected for operations such as unlink where we can hit these errors. The patch maps the error NT_SHARING_VIOLATION to EBUSY instead. The patch also replaces all instances of ETXTBSY in cifs_rename_pending_delete() with EBUSY. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-13ext2: Fix BUG_ON in evict() on inode deletionJan Kara
Commit 8e3dffc6 introduced a regression where deleting inode with large extended attributes leads to triggering BUG_ON(inode->i_state != (I_FREEING | I_CLEAR)) in fs/inode.c:evict(). That happens because freeing of xattr block dirtied the inode and it happened after clear_inode() has been called. Fix the issue by moving removal of xattr block into ext2_evict_inode() before clear_inode() call close to a place where data blocks are truncated. That is also more logical place and removes surprising requirement that ext2_free_blocks() mustn't dirty the inode. Reported-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-13fs: Readd the fs module aliases.Eric W. Biederman
I had assumed that the only use of module aliases for filesystems prior to "fs: Limit sys_mount to only request filesystem modules." was in request_module. It turns out I was wrong. At least mkinitcpio in Arch linux uses these aliases. So readd the preexising aliases, to keep from breaking userspace. Userspace eventually will have to follow and use the same aliases the kernel does. So at some point we may be delete these aliases without problems. However that day is not today. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-12Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and ↵Mathieu Desnoyers
security keys Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to compat_process_vm_rw() shows that the compatibility code requires an explicit "access_ok()" check before calling compat_rw_copy_check_uvector(). The same difference seems to appear when we compare fs/read_write.c:do_readv_writev() to fs/compat.c:compat_do_readv_writev(). This subtle difference between the compat and non-compat requirements should probably be debated, as it seems to be error-prone. In fact, there are two others sites that use this function in the Linux kernel, and they both seem to get it wrong: Now shifting our attention to fs/aio.c, we see that aio_setup_iocb() also ends up calling compat_rw_copy_check_uvector() through aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to be missing. Same situation for security/keys/compat.c:compat_keyctl_instantiate_key_iov(). I propose that we add the access_ok() check directly into compat_rw_copy_check_uvector(), so callers don't have to worry about it, and it therefore makes the compat call code similar to its non-compat counterpart. Place the access_ok() check in the same location where copy_from_user() can trigger a -EFAULT error in the non-compat code, so the ABI behaviors are alike on both compat and non-compat. While we are here, fix compat_do_readv_writev() so it checks for compat_rw_copy_check_uvector() negative return values. And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error handling. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-12ext4: use s_extent_max_zeroout_kb value as number of kbLukas Czerner
Currently when converting extent to initialized, we have to decide whether to zeroout part/all of the uninitialized extent in order to avoid extent tree growing rapidly. The decision is made by comparing the size of the extent with the configurable value s_extent_max_zeroout_kb which is in kibibytes units. However when converting it to number of blocks we currently use it as it was in bytes. This is obviously bug and it will result in ext4 _never_ zeroout extents, but rather always split and convert parts to initialized while leaving the rest uninitialized in default setting. Fix this by using s_extent_max_zeroout_kb as kibibytes. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-03-12vfs: fix pipe counter breakageAl Viro
If you open a pipe for neither read nor write, the pipe code will not add any usage counters to the pipe, causing the 'struct pipe_inode_info" to be potentially released early. That doesn't normally matter, since you cannot actually use the pipe, but the pipe release code - particularly fasync handling - still expects the actual pipe infrastructure to all be there. And rather than adding NULL pointer checks, let's just disallow this case, the same way we already do for the named pipe ("fifo") case. This is ancient going back to pre-2.4 days, and until trinity, nobody naver noticed. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-12ext4: use atomic64_t for the per-flexbg free_clusters countTheodore Ts'o
A user who was using a 8TB+ file system and with a very large flexbg size (> 65536) could cause the atomic_t used in the struct flex_groups to overflow. This was detected by PaX security patchset: http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551 This bug was introduced in commit 9f24e4208f7e, so it's been around since 2.6.30. :-( Fix this by using an atomic64_t for struct orlav_stats's free_clusters. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org
2013-03-11reiserfs: Use kstrdup instead of kmalloc/strcpyIonut-Gabriel Radu
Signed-off-by: Ionut-Gabriel Radu <ihonius@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11ext3: Fix format string issuesLars-Peter Clausen
ext3_msg() takes the printk prefix as the second parameter and the format string as the third parameter. Two callers of ext3_msg omit the prefix and pass the format string as the second parameter and the first parameter to the format string as the third parameter. In both cases this string comes from an arbitrary source. Which means the string may contain format string characters, which will lead to undefined and potentially harmful behavior. The issue was introduced in commit 4cf46b67eb("ext3: Unify log messages in ext3") and is fixed by this patch. CC: stable@vger.kernel.org Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11quota: add missing use of dq_data_lock in __dquot_initializeJeff Mahoney
The bulk of __dquot_initialize runs under the dqptr_sem which protects the inode->i_dquot pointers. It doesn't protect the dereferenced contents, though. Those are protected by the dq_data_lock, which is missing around the dquot_resv_space call. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11jbd2: fix use after free in jbd2_journal_dirty_metadata()Jan Kara
jbd2_journal_dirty_metadata() didn't get a reference to journal_head it was working with. This is OK in most of the cases since the journal head should be attached to a transaction but in rare occasions when we are journalling data, __ext4_journalled_writepage() can race with jbd2_journal_invalidatepage() stripping buffers from a page and thus journal head can be freed under hands of jbd2_journal_dirty_metadata(). Fix the problem by getting own journal head reference in jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers() which can possibly have the same issue). Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-03-11fs: Limit sys_mount to only request filesystem modules. (Part 3)Eric W. Biederman
Somehow I failed to add the MODULE_ALIAS_FS for cifs, hostfs, hpfs, squashfs, and udf despite what I thought were my careful checks :( Add them now. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-11hostfs: fix a not needed double checkMarco Stornelli
With the commit 3be2be0a32c18b0fd6d623cda63174a332ca0de1 we removed vmtruncate, but actaully there is no need to call inode_newsize_ok() because the checks are already done in inode_change_ok() at the begin of the function. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2013-03-11ext4: reserve metadata block for every delayed writeLukas Czerner
Currently we only reserve space (data+metadata) in delayed allocation if we're allocating from new cluster (which is always in non-bigalloc file system) which is ok for data blocks, because we reserve the whole cluster. However we have to reserve metadata for every delayed block we're going to write because every block could potentially require metedata block when we need to grow the extent tree. Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2013-03-11ext4: update reserved space after the 'correction'Lukas Czerner
Currently in ext4_ext_map_blocks() in delayed allocation writeback we would update the reservation and after that check whether we claimed cluster outside of the range of the allocation and if so, we'll give the block back to the reservation pool. However this also means that if the number of reserved data block dropped to zero before the correction, we would release all the metadata reservation as well, however we might still need it because the we're not done with the delayed allocation and there might be more blocks to come. This will result in error messages such as: EXT4-fs warning (device sdb): ext4_da_update_reserve_space:361: ino 12, allocated 1 with only 0 reserved metadata blocks (releasing 1 blocks with reserved 1 data blocks) This will only happen on bigalloc file system and it can be easily reproduced using fiemap-tester from xfstests like this: ./src/fiemap-tester -m DHDHDHDHD -S -p0 /mnt/test/file Or using xfstests such as 225. Fix this by doing the correction first and updating the reservation after that so that we do not accidentally decrease i_reserved_data_blocks to zero. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-11ext4: do not use yield()Lukas Czerner
Using yield() is strongly discouraged (see sched/core.c) especially since we can just use cond_resched(). Replace all use of yield() with cond_resched(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-11ext4: remove unused variable in ext4_free_blocks()Lukas Czerner
Remove unused variable 'freed' in ext4_free_blocks(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-11ext4: fix WARN_ON from ext4_releasepage()Jan Kara
ext4_releasepage() warns when it is passed a page with PageChecked set. However this can correctly happen when invalidate_inode_pages2_range() invalidates pages - and we should fail the release in that case. Since the page was dirty anyway, it won't be discarded and no harm has happened but it's good to be safe. Also remove bogus page_has_buffers() check - we are guaranteed page has buffers in this function. Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Tested-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11ext4: fix the wrong number of the allocated blocks in ext4_split_extent()Zheng Liu
This commit fixes a wrong return value of the number of the allocated blocks in ext4_split_extent. When the length of blocks we want to allocate is greater than the length of the current extent, we return a wrong number. Let's see what happens in the following case when we call ext4_split_extent(). map: [48, 72] ex: [32, 64, u] 'ex' will be split into two parts: ex1: [32, 47, u] ex2: [48, 64, w] 'map->m_len' is returned from this function, and the value is 24. But the real length is 16. So it should be fixed. Meanwhile in this commit we use right length of the allocated blocks when get_reserved_cluster_alloc in ext4_ext_handle_uninitialized_extents is called. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: stable@vger.kernel.org
2013-03-11ext4: update extent status tree after an extent is zeroed outZheng Liu
When we try to split an extent, this extent could be zeroed out and mark as initialized. But we don't know this in ext4_map_blocks because it only returns a length of allocated extent. Meanwhile we will mark this extent as uninitialized because we only check m_flags. This commit update extent status tree when we try to split an unwritten extent. We don't need to worry about the status of this extent because we always mark it as initialized. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-03-11ext4: fix wrong m_len value after unwritten extent conversionZheng Liu
The ext4_ext_handle_uninitialized_extents() function was assuming the return value of ext4_ext_map_blocks() is equal to map->m_len. This incorrect assumption was harmless until we started use status tree as a extent cache because we need to update status tree according to 'm_len' value. Meanwhile this commit marks EXT4_MAP_MAPPED flag after unwritten extent conversion. It shouldn't cause a bug because we update status tree according to checking EXT4_MAP_UNWRITTEN flag. But it should be fixed. After applied this commit, the following error message from self-testing infrastructure disappears. ... kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len 3 in ext4_map_blocks (allocation) ... Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-03-11ext4: add self-testing infrastructure to do a sanity checkDmitry Monakhov
This commit adds a self-testing infrastructure like extent tree does to do a sanity check for extent status tree. After status tree is as a extent cache, we'd better to make sure that it caches right result. After applied this commit, we will get a lot of messages when we run xfstests as below. ... kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len 3 in ext4_map_blocks (allocation) ... kernel: ES cache assertation failed for inode: 230 es_cached ex [974/2/4781/20] != found ex [974/1/4781/1000] ... kernel: ES insert assertation failed for inode: 635 ex_status [0/45/21388/w] != es_status [44/1/21432/u] ... Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-11ext4: avoid a potential overflow in ext4_es_can_be_merged()Zheng Liu
Check the length of an extent to avoid a potential overflow in ext4_es_can_be_merged(). Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-03-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace bugfixes from Eric Biederman: "This is three simple fixes against 3.9-rc1. I have tested each of these fixes and verified they work correctly. The userns oops in key_change_session_keyring and the BUG_ON triggered by proc_ns_follow_link were found by Dave Jones. I am including the enhancement for mount to only trigger requests of filesystem modules here instead of delaying this for the 3.10 merge window because it is both trivial and the kind of change that tends to bit-rot if left untouched for two months." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Use nd_jump_link in proc_ns_follow_link fs: Limit sys_mount to only request filesystem modules (Part 2). fs: Limit sys_mount to only request filesystem modules. userns: Stop oopsing in key_change_session_keyring
2013-03-09proc: Use nd_jump_link in proc_ns_follow_linkEric W. Biederman
Update proc_ns_follow_link to use nd_jump_link instead of just manually updating nd.path.dentry. This fixes the BUG_ON(nd->inode != parent->d_inode) reported by Dave Jones and reproduced trivially with mkdir /proc/self/ns/uts/a. Sigh it looks like the VFS change to require use of nd_jump_link happend while proc_ns_follow_link was baking and since the common case of proc_ns_follow_link continued to work without problems the need for making this change was overlooked. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "These are scattered fixes and one performance improvement. The biggest functional change is in how we throttle metadata changes. The new code bumps our average file creation rate up by ~13% in fs_mark, and lowers CPU usage. Stefan bisected out a regression in our allocation code that made balance loop on extents larger than 256MB." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: improve the delayed inode throttling Btrfs: fix a mismerge in btrfs_balance() Btrfs: enforce min_bytes parameter during extent allocation Btrfs: allow running defrag in parallel to administrative tasks Btrfs: avoid deadlock on transaction waiting list Btrfs: do not BUG_ON on aborted situation Btrfs: do not BUG_ON in prepare_to_reloc Btrfs: free all recorded tree blocks on error Btrfs: build up error handling for merge_reloc_roots Btrfs: check for NULL pointer in updating reloc roots Btrfs: fix unclosed transaction handler when the async transaction commitment fails Btrfs: fix wrong handle at error path of create_snapshot() when the commit fails Btrfs: use set_nlink if our i_nlink is 0
2013-03-08Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS fixes from Steve French: "A small set of cifs fixes which includes one for a recent regression in the write path (pointed out by Anton), some fixes for rename problems and as promised for 3.9 removing the obsolete sockopt mount option (and the accompanying deprecation warning)." * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix missing of oplock_read value in smb30_values structure cifs: don't try to unlock pagecache page after releasing it cifs: remove the sockopt= mount option cifs: Check server capability before attempting silly rename cifs: Fix bug when checking error condition in cifs_rename_pending_delete()
2013-03-08vfs: don't BUG_ON() if following a /proc fd pseudo-symlink results in a symlinkLinus Torvalds
It's "normal" - it can happen if the file descriptor you followed was opened with O_NOFOLLOW. Reported-by: Dave Jones <davej@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-07Merge tag 'ecryptfs-3.9-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fixes from Tyler Hicks: "Minor code cleanups and new Kconfig option to disable /dev/ecryptfs The code cleanups fix up W=1 compiler warnings and some unnecessary checks. The new Kconfig option, defaulting to N, allows the rarely used eCryptfs kernel to userspace communication channel to be compiled out. This may be the first step in it being eventually removed." Hmm. I'm not sure whether these should be called "fixes", and it probably should have gone in the merge window. But I'll let it slide. * tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: allow userspace messaging to be disabled eCryptfs: Fix redundant error check on ecryptfs_find_daemon_by_euid() ecryptfs: ecryptfs_msg_ctx_alloc_to_free(): remove kfree() redundant null check eCryptfs: decrypt_pki_encrypted_session_key(): remove kfree() redundant null check eCryptfs: remove unneeded checks in virt_to_scatterlist() eCryptfs: Fix -Wmissing-prototypes warnings eCryptfs: Fix -Wunused-but-set-variable warnings eCryptfs: initialize payload_len in keystore.c