path: root/fs
Age | Commit message | Author
2013-03-16 | Btrfs: fix warning of free_extent_map | Liu Bo
Users report that an extent map's list is still linked when it's actually going to be freed from cache. The story is that a) when we're going to drop an extent map and may split this large one into smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means that it's on the list to be logged, then the smaller ems split from it will also be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected. b) we'll keep ems from unlinking the list and freeing when they are flagged with EXTENT_FLAG_LOGGING, because the log code holds one reference. The end result is the warning, but the truth is that we set the flag EXTENT_FLAG_LOGGING only during fsync. So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one. Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Reported-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
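The flag handling described above can be sketched as a small userspace model. This is only an illustration, not the btrfs code: the bit values and the `split_flags` helper are hypothetical (the kernel keeps these as bit numbers manipulated with set_bit()/clear_bit()).

```python
# Hypothetical bit values chosen for illustration only.
EXTENT_FLAG_PINNED = 1 << 0
EXTENT_FLAG_LOGGING = 1 << 4


def split_flags(parent_flags: int) -> int:
    """Return the flags a smaller map inherits when split from a larger one.

    Everything is inherited except EXTENT_FLAG_LOGGING, which (per the
    commit message) should only ever be set during fsync.
    """
    return parent_flags & ~EXTENT_FLAG_LOGGING


parent = EXTENT_FLAG_PINNED | EXTENT_FLAG_LOGGING
child = split_flags(parent)  # keeps PINNED, drops LOGGING
```

With the logging bit cleared, the split-off maps are no longer treated as being on the to-be-logged list, so they can be unlinked and freed normally.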
2013-03-14 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs | Linus Torvalds
Pull ext2, ext3, reiserfs, quota fixes from Jan Kara:
"A fix for regression in ext2, and a format string issue in ext3. The rest isn't too serious."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Fix BUG_ON in evict() on inode deletion
  reiserfs: Use kstrdup instead of kmalloc/strcpy
  ext3: Fix format string issues
  quota: add missing use of dq_data_lock in __dquot_initialize
2013-03-14 | Btrfs: fix warning when creating snapshots | Liu Bo
Creating snapshot passes extent_root to commit its transaction, but it can lead to the warning of checking root for quota in the __btrfs_end_transaction() when someone else is committing the current transaction. Since we've recorded the needed root in trans_handle, just use it to get rid of the warning. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 | Btrfs: return as soon as possible when edquot happens | Wang Shilong
If one of the qgroups fails to reserve space, we should return immediately; there is no need to continue checking the rest.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 | Btrfs: return EIO if we have extent tree corruption | Josef Bacik
The callers of lookup_inline_extent_info all handle getting an error back properly, so return an error if we have corruption instead of being a jerk and panicking. Still WARN_ON() since this is kind of crucial and I've been seeing it a bit too much recently for my taste; I think we're doing something wrong somewhere. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 | btrfs: use rcu_barrier() to wait for bdev puts at unmount | Eric Sandeen
Doing this would reliably fail with -EBUSY for me:

# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy

because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.

Using systemtap to track bdev gets & puts shows a kworker thread doing a blkdev put after mkfs attempts a get; this is left over from the unmount path:

btrfs_close_devices
  __btrfs_close_devices
    call_rcu(&device->rcu, free_device);
      free_device
        INIT_WORK(&device->rcu_work, __free_device);
        schedule_work(&device->rcu_work);

so unmount might complete before __free_device fires & does its blkdev_put.

Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait until all blkdev_put()s are done, and the device is truly free once unmount completes.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
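The race described above (a deferred put that may run after unmount has returned) can be modelled loosely in userspace. The sketch below is an analogy only: a thread pool stands in for call_rcu()/schedule_work(), and waiting on the futures stands in for rcu_barrier(). The `Device` class, the `close_devices` helper and the device names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class Device:
    """Toy stand-in for a block device that is still held open."""

    def __init__(self, name):
        self.name = name
        self.held = True

    def put(self):
        # Models the deferred blkdev_put() releasing the device.
        self.held = False


def close_devices(devices, pool, barrier):
    # Each put is handed to a worker instead of running inline.
    pending = [pool.submit(dev.put) for dev in devices]
    if barrier:
        # Stand-in for the barrier: block until every deferred put has run.
        wait(pending)
    return pending


with ThreadPoolExecutor(max_workers=2) as pool:
    devs = [Device("sdb2"), Device("sdc1")]
    close_devices(devs, pool, barrier=True)
    # Sampled right after close_devices() returns, i.e. "after unmount".
    all_released = not any(d.held for d in devs)
```

Without the barrier, `close_devices()` could return while a put is still queued, which is the window in which an exclusive open of the device would fail with "Device or resource busy".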
2013-03-14 | Btrfs: remove btrfs_try_spin_lock | Liu Bo
Remove a useless function declaration Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 | Btrfs: get better concurrency for snapshot-aware defrag work | Liu Bo
Using the spinning case instead of the blocking one results in better concurrency overall.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-13 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace | Linus Torvalds
Pull namespace bugfixes from Eric Biederman:
"This tree includes a partial revert for "fs: Limit sys_mount to only request filesystem modules."

When I added the new style module aliases to the filesystems I deleted the old ones. A bad move. It turns out that distributions like Arch linux use module aliases when constructing ramdisks. Which meant ultimately that an ext3 filesystem mounted with ext4 would not result in the ext4 module being put into the ramdisk.

The other change in this tree adds a handful of filesystem module aliases I simply failed to add the first time. Which inconvenienced a few folks using cifs.

I don't want to inconvenience folks any longer than I have to so here are these trivial fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  fs: Readd the fs module aliases.
  fs: Limit sys_mount to only request filesystem modules. (Part 3)
2013-03-13 | nfsd: convert to idr_alloc() | Tejun Heo
idr_get_new*() and friends are about to be deprecated. Convert to the new idr_alloc() interface. Only compile-tested. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: J. Bruce Fields <bfields@redhat.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-13 | nfsd: remove unused get_new_stid() | Tejun Heo
get_new_stid() is no longer used since commit 3abdb607125 ("nfsd4: simplify idr allocation"). Remove it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-13 | cifs: delay super block destruction until all cifsFileInfo objects are gone | Mateusz Guzik
cifsFileInfo objects hold references to dentries and it is possible that these will still be around in workqueues when VFS decides to kill super block during unmount.

This results in panics like this one:

BUG: Dentry ffff88001f5e76c0{i=66b4a,n=1M-2} still in use (1) [unmount of cifs cifs]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:943!
[..]
Process umount (pid: 1781, threadinfo ffff88003d6e8000, task ffff880035eeaec0)
[..]
Call Trace:
 [<ffffffff811b44f3>] shrink_dcache_for_umount+0x33/0x60
 [<ffffffff8119f7fc>] generic_shutdown_super+0x2c/0xe0
 [<ffffffff8119f946>] kill_anon_super+0x16/0x30
 [<ffffffffa036623a>] cifs_kill_sb+0x1a/0x30 [cifs]
 [<ffffffff8119fcc7>] deactivate_locked_super+0x57/0x80
 [<ffffffff811a085e>] deactivate_super+0x4e/0x70
 [<ffffffff811bb417>] mntput_no_expire+0xd7/0x130
 [<ffffffff811bc30c>] sys_umount+0x9c/0x3c0
 [<ffffffff81657c19>] system_call_fastpath+0x16/0x1b

Fix this by making each cifsFileInfo object hold a reference to cifs super block, which implicitly keeps VFS super block around as well.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Reported-and-Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-13 | cifs: map NT_STATUS_SHARING_VIOLATION to EBUSY instead of ETXTBSY | Sachin Prabhu
NT_SHARING_VIOLATION errors are mapped to ETXTBSY which is unexpected for operations such as unlink where we can hit these errors. The patch maps the error NT_SHARING_VIOLATION to EBUSY instead. The patch also replaces all instances of ETXTBSY in cifs_rename_pending_delete() with EBUSY. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
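As a rough illustration of the remapping, the sketch below models a status-to-errno lookup in userspace Python. It is not the cifs mapping table: the `map_nt_status` helper and the EIO fallback are assumptions made for the example, and only the one status code named in the commit is listed.

```python
import errno

# NT status code for a sharing violation.
NT_STATUS_SHARING_VIOLATION = 0xC0000043

# After the change described above this maps to EBUSY; it previously
# mapped to ETXTBSY, which is surprising for operations such as unlink.
_NT_TO_ERRNO = {
    NT_STATUS_SHARING_VIOLATION: errno.EBUSY,
}


def map_nt_status(status, default=errno.EIO):
    """Return a negative errno for an NT status (hypothetical helper).

    Unknown statuses fall back to `default`; that fallback is an
    assumption of this sketch.
    """
    return -_NT_TO_ERRNO.get(status, default)
```

A caller that unlinks a file the server still has open elsewhere would then see "Device or resource busy" rather than "Text file busy".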
2013-03-13 | ext2: Fix BUG_ON in evict() on inode deletion | Jan Kara
Commit 8e3dffc6 introduced a regression where deleting inode with large extended attributes leads to triggering BUG_ON(inode->i_state != (I_FREEING | I_CLEAR)) in fs/inode.c:evict(). That happens because freeing of xattr block dirtied the inode and it happened after clear_inode() has been called. Fix the issue by moving removal of xattr block into ext2_evict_inode() before clear_inode() call close to a place where data blocks are truncated. That is also more logical place and removes surprising requirement that ext2_free_blocks() mustn't dirty the inode. Reported-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-13 | fs: Readd the fs module aliases. | Eric W. Biederman
I had assumed that the only use of module aliases for filesystems prior to "fs: Limit sys_mount to only request filesystem modules." was in request_module. It turns out I was wrong. At least mkinitcpio in Arch linux uses these aliases. So readd the preexisting aliases, to keep from breaking userspace. Userspace eventually will have to follow and use the same aliases the kernel does. So at some point we may be able to delete these aliases without problems. However that day is not today.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-12 | Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys | Mathieu Desnoyers
Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to compat_process_vm_rw() shows that the compatibility code requires an explicit "access_ok()" check before calling compat_rw_copy_check_uvector(). The same difference seems to appear when we compare fs/read_write.c:do_readv_writev() to fs/compat.c:compat_do_readv_writev().

This subtle difference between the compat and non-compat requirements should probably be debated, as it seems to be error-prone. In fact, there are two other sites that use this function in the Linux kernel, and they both seem to get it wrong:

Now shifting our attention to fs/aio.c, we see that aio_setup_iocb() also ends up calling compat_rw_copy_check_uvector() through aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to be missing. Same situation for security/keys/compat.c:compat_keyctl_instantiate_key_iov().

I propose that we add the access_ok() check directly into compat_rw_copy_check_uvector(), so callers don't have to worry about it, and it therefore makes the compat call code similar to its non-compat counterpart. Place the access_ok() check in the same location where copy_from_user() can trigger a -EFAULT error in the non-compat code, so the ABI behaviors are alike on both compat and non-compat.

While we are here, fix compat_do_readv_writev() so it checks for compat_rw_copy_check_uvector() negative return values.

And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error handling.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-12 | ext4: use s_extent_max_zeroout_kb value as number of kb | Lukas Czerner
Currently when converting an extent to initialized, we have to decide whether to zero out part/all of the uninitialized extent in order to avoid the extent tree growing rapidly. The decision is made by comparing the size of the extent with the configurable value s_extent_max_zeroout_kb, which is in kibibytes. However, when converting it to a number of blocks we currently use it as if it were in bytes. This is obviously a bug, and it results in ext4 _never_ zeroing out extents, but rather always splitting and converting parts to initialized while leaving the rest uninitialized in the default setting. Fix this by using s_extent_max_zeroout_kb as kibibytes.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
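The unit mix-up is easy to see with a little arithmetic. The sketch below is a userspace model, not the ext4 code: the function names are hypothetical, and the 32 KiB limit and 4 KiB block size are assumed values picked for the example.

```python
def zeroout_limit_buggy(max_zeroout_kb: int, blocksize_bits: int) -> int:
    """Treats the KiB value as if it were bytes (the behaviour being fixed)."""
    return max_zeroout_kb >> blocksize_bits


def zeroout_limit_fixed(max_zeroout_kb: int, blocksize_bits: int) -> int:
    """Converts KiB to blocks: 1 KiB is 2**10 bytes, so shift by (bits - 10)."""
    return max_zeroout_kb >> (blocksize_bits - 10)


# Assumed example values: a 32 KiB limit and 4 KiB blocks (2**12 bytes).
limit_kb = 32
bits = 12

buggy_blocks = zeroout_limit_buggy(limit_kb, bits)  # 32 >> 12
fixed_blocks = zeroout_limit_fixed(limit_kb, bits)  # 32 >> 2
```

Under these assumptions the buggy conversion yields a limit of 0 blocks, so no extent is ever small enough to be zeroed out, while the corrected conversion yields 8 blocks.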
2013-03-12 | vfs: fix pipe counter breakage | Al Viro
If you open a pipe for neither read nor write, the pipe code will not add any usage counters to the pipe, causing the 'struct pipe_inode_info' to be potentially released early. That doesn't normally matter, since you cannot actually use the pipe, but the pipe release code - particularly fasync handling - still expects the actual pipe infrastructure to all be there. And rather than adding NULL pointer checks, let's just disallow this case, the same way we already do for the named pipe ("fifo") case. This is ancient, going back to pre-2.4 days, and until trinity, nobody ever noticed.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
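"Neither read nor write" refers to the access-mode bits of the open flags. A minimal sketch of such a check, written as a userspace model with the Linux flag values spelled out, might look like the following. The `pipe_open_check` name and the choice of -EINVAL as the error are assumptions of this example, not taken from the commit text.

```python
# Linux access-mode encodings; O_ACCMODE masks the two low bits.
O_RDONLY, O_WRONLY, O_RDWR = 0, 1, 2
O_ACCMODE = 3
EINVAL = 22


def pipe_open_check(flags: int) -> int:
    """Return 0 if the access mode is usable for a pipe, else -EINVAL.

    Access mode 3 means "neither read nor write", which is the case the
    commit message says should be disallowed.
    """
    mode = flags & O_ACCMODE
    if mode in (O_RDONLY, O_WRONLY, O_RDWR):
        return 0
    return -EINVAL
```

Rejecting the open up front means the release path never sees a pipe that was opened without taking any reader or writer count.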
2013-03-12 | ext4: use atomic64_t for the per-flexbg free_clusters count | Theodore Ts'o
A user who was using an 8TB+ file system with a very large flexbg size (> 65536) could cause the atomic_t used in the struct flex_groups to overflow. This was detected by the PaX security patchset: http://forums.grsecurity.net/viewtopic.php?f=3&t=3289&p=12551#p12551 This bug was introduced in commit 9f24e4208f7e, so it's been around since 2.6.30. :-( Fix this by using an atomic64_t for struct orlov_stats's free_clusters.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org
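To see why a 32-bit counter is not enough here, the sketch below models signed 32-bit wraparound in Python. It is only an arithmetic illustration: the `as_int32` helper is hypothetical, and the figure of 32768 clusters per group assumes 4 KiB blocks with one bitmap block per group.

```python
def as_int32(value: int) -> int:
    """Wrap an integer the way a signed 32-bit counter would."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Assumption: 4 KiB blocks, so one bitmap block covers 8 * 4096 clusters.
CLUSTERS_PER_GROUP = 32768
flex_size = 65536  # groups per flex group, at the threshold from the commit

total_free = flex_size * CLUSTERS_PER_GROUP  # what a 64-bit counter holds
wrapped = as_int32(total_free)               # what a 32-bit counter holds
```

At this size a fully free flex group holds 2**31 clusters, one more than the largest positive value a signed 32-bit counter can represent, so the 32-bit model wraps to a negative number while a 64-bit counter does not.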
2013-03-11 | reiserfs: Use kstrdup instead of kmalloc/strcpy | Ionut-Gabriel Radu
Signed-off-by: Ionut-Gabriel Radu <ihonius@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11 | ext3: Fix format string issues | Lars-Peter Clausen
ext3_msg() takes the printk prefix as the second parameter and the format string as the third parameter. Two callers of ext3_msg omit the prefix and pass the format string as the second parameter and the first parameter to the format string as the third parameter. In both cases this string comes from an arbitrary source. Which means the string may contain format string characters, which will lead to undefined and potentially harmful behavior. The issue was introduced in commit 4cf46b67eb("ext3: Unify log messages in ext3") and is fixed by this patch. CC: stable@vger.kernel.org Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jan Kara <jack@suse.cz>
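The argument shift can be mimicked in Python with %-formatting. This is an analogy rather than a reproduction: in C the result is undefined behaviour, whereas Python raises an exception. `ext3_msg_model`, the "ERR " prefix and the sample strings are all hypothetical.

```python
def ext3_msg_model(prefix, fmt, *args):
    """Toy model of a logger taking (prefix, format, args...)."""
    return prefix + "EXT3-fs: " + (fmt % args)


# Hypothetical string from an arbitrary source, containing a conversion.
untrusted = "journal=%d"

# Buggy call shape: the prefix is omitted, so the intended format string
# lands in `prefix` and the untrusted string is interpreted as the format.
try:
    ext3_msg_model("error: bad option %s", untrusted)
    buggy_outcome = "ok"
except TypeError:
    buggy_outcome = "format error"

# Fixed call shape: prefix first, then the constant format, then the data.
fixed = ext3_msg_model("ERR ", "error: bad option %s", untrusted)
```

In the fixed call the untrusted text is only ever substituted as data, so any conversion characters inside it are printed literally.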
2013-03-11 | quota: add missing use of dq_data_lock in __dquot_initialize | Jeff Mahoney
The bulk of __dquot_initialize runs under the dqptr_sem which protects the inode->i_dquot pointers. It doesn't protect the dereferenced contents, though. Those are protected by the dq_data_lock, which is missing around the dquot_resv_space call. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11 | jbd2: fix use after free in jbd2_journal_dirty_metadata() | Jan Kara
jbd2_journal_dirty_metadata() didn't get a reference to journal_head it was working with. This is OK in most of the cases since the journal head should be attached to a transaction but in rare occasions when we are journalling data, __ext4_journalled_writepage() can race with jbd2_journal_invalidatepage() stripping buffers from a page and thus journal head can be freed under hands of jbd2_journal_dirty_metadata(). Fix the problem by getting own journal head reference in jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers() which can possibly have the same issue). Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2013-03-11 | fs: Limit sys_mount to only request filesystem modules. (Part 3) | Eric W. Biederman
Somehow I failed to add the MODULE_ALIAS_FS for cifs, hostfs, hpfs, squashfs, and udf despite what I thought were my careful checks :( Add them now. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-11 | hostfs: fix a not needed double check | Marco Stornelli
With the commit 3be2be0a32c18b0fd6d623cda63174a332ca0de1 we removed vmtruncate, but actually there is no need to call inode_newsize_ok() because the checks are already done in inode_change_ok() at the beginning of the function.
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2013-03-11 | ext4: reserve metadata block for every delayed write | Lukas Czerner
Currently we only reserve space (data+metadata) in delayed allocation if we're allocating from a new cluster (which is always the case in a non-bigalloc file system), which is ok for data blocks because we reserve the whole cluster. However, we have to reserve metadata for every delayed block we're going to write, because every block could potentially require a metadata block when we need to grow the extent tree.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2013-03-11 | ext4: update reserved space after the 'correction' | Lukas Czerner
Currently in ext4_ext_map_blocks() in delayed allocation writeback we would update the reservation and after that check whether we claimed a cluster outside of the range of the allocation, and if so, we'll give the block back to the reservation pool. However, this also means that if the number of reserved data blocks dropped to zero before the correction, we would release all the metadata reservation as well, even though we might still need it because we're not done with the delayed allocation and there might be more blocks to come. This will result in error messages such as:

EXT4-fs warning (device sdb): ext4_da_update_reserve_space:361: ino 12, allocated 1 with only 0 reserved metadata blocks (releasing 1 blocks with reserved 1 data blocks)

This will only happen on a bigalloc file system and it can be easily reproduced using fiemap-tester from xfstests like this:

./src/fiemap-tester -m DHDHDHDHD -S -p0 /mnt/test/file

Or using xfstests such as 225. Fix this by doing the correction first and updating the reservation after that so that we do not accidentally decrease i_reserved_data_blocks to zero.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-11 | ext4: do not use yield() | Lukas Czerner
Using yield() is strongly discouraged (see sched/core.c) especially since we can just use cond_resched(). Replace all use of yield() with cond_resched(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-11 | ext4: remove unused variable in ext4_free_blocks() | Lukas Czerner
Remove unused variable 'freed' in ext4_free_blocks(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-11 | ext4: fix WARN_ON from ext4_releasepage() | Jan Kara
ext4_releasepage() warns when it is passed a page with PageChecked set. However this can correctly happen when invalidate_inode_pages2_range() invalidates pages - and we should fail the release in that case. Since the page was dirty anyway, it won't be discarded and no harm has happened but it's good to be safe. Also remove bogus page_has_buffers() check - we are guaranteed page has buffers in this function. Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Tested-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11 | ext4: fix the wrong number of the allocated blocks in ext4_split_extent() | Zheng Liu
This commit fixes a wrong return value of the number of the allocated blocks in ext4_split_extent. When the length of blocks we want to allocate is greater than the length of the current extent, we return a wrong number. Let's see what happens in the following case when we call ext4_split_extent(). map: [48, 72] ex: [32, 64, u] 'ex' will be split into two parts: ex1: [32, 47, u] ex2: [48, 64, w] 'map->m_len' is returned from this function, and the value is 24. But the real length is 16. So it should be fixed. Meanwhile in this commit we use right length of the allocated blocks when get_reserved_cluster_alloc in ext4_ext_handle_uninitialized_extents is called. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: stable@vger.kernel.org
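The numbers in this example can be checked with a few lines of arithmetic. The sketch below is a userspace model, not the ext4 function: `allocated_len` is a hypothetical helper, and the ranges are read as start/end block pairs with an exclusive end, which is the interpretation under which the stated lengths of 24 and 16 come out.

```python
def allocated_len(map_start, map_end, ex_start, ex_end):
    """Blocks of the request [map_start, map_end) that the extent
    [ex_start, ex_end) can actually satisfy.

    Assumes the request starts inside the extent.
    """
    if not (ex_start <= map_start < ex_end):
        raise ValueError("request must start inside the extent")
    return min(map_end, ex_end) - map_start


# Values from the commit message: map [48, 72], ex [32, 64].
requested = 72 - 48                      # what map->m_len asks for
actual = allocated_len(48, 72, 32, 64)   # what the split can provide
```

When the request runs past the end of the extent, the length that should be reported is the clamped one (16 here) rather than the requested one (24).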
2013-03-11 | ext4: update extent status tree after an extent is zeroed out | Zheng Liu
When we try to split an extent, this extent could be zeroed out and marked as initialized. But we don't know this in ext4_map_blocks because it only returns the length of the allocated extent. Meanwhile we will mark this extent as uninitialized because we only check m_flags. This commit updates the extent status tree when we try to split an unwritten extent. We don't need to worry about the status of this extent because we always mark it as initialized.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-03-11 | ext4: fix wrong m_len value after unwritten extent conversion | Zheng Liu
The ext4_ext_handle_uninitialized_extents() function was assuming the return value of ext4_ext_map_blocks() is equal to map->m_len. This incorrect assumption was harmless until we started using the status tree as an extent cache, because we need to update the status tree according to the 'm_len' value. Meanwhile this commit sets the EXT4_MAP_MAPPED flag after unwritten extent conversion. It shouldn't cause a bug because we update the status tree according to the EXT4_MAP_UNWRITTEN flag, but it should be fixed. After applying this commit, the following error message from the self-testing infrastructure disappears:

... kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len 3 in ext4_map_blocks (allocation) ...

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-03-11 | ext4: add self-testing infrastructure to do a sanity check | Dmitry Monakhov
This commit adds a self-testing infrastructure, like the extent tree has, to do a sanity check for the extent status tree. Now that the status tree is used as an extent cache, we had better make sure that it caches the right result. After applying this commit, we will get a lot of messages like the ones below when we run xfstests.

... kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len 3 in ext4_map_blocks (allocation)
... kernel: ES cache assertation failed for inode: 230 es_cached ex [974/2/4781/20] != found ex [974/1/4781/1000]
... kernel: ES insert assertation failed for inode: 635 ex_status [0/45/21388/w] != es_status [44/1/21432/u]
...

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-03-11 | ext4: avoid a potential overflow in ext4_es_can_be_merged() | Zheng Liu
Check the length of an extent to avoid a potential overflow in ext4_es_can_be_merged(). Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dmitry Monakhov <dmonakhov@openvz.org>
2013-03-10 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace | Linus Torvalds
Pull namespace bugfixes from Eric Biederman:
"This is three simple fixes against 3.9-rc1. I have tested each of these fixes and verified they work correctly.

The userns oops in key_change_session_keyring and the BUG_ON triggered by proc_ns_follow_link were found by Dave Jones.

I am including the enhancement for mount to only trigger requests of filesystem modules here instead of delaying this for the 3.10 merge window because it is both trivial and the kind of change that tends to bit-rot if left untouched for two months."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Use nd_jump_link in proc_ns_follow_link
  fs: Limit sys_mount to only request filesystem modules (Part 2).
  fs: Limit sys_mount to only request filesystem modules.
  userns: Stop oopsing in key_change_session_keyring
2013-03-09 | proc: Use nd_jump_link in proc_ns_follow_link | Eric W. Biederman
Update proc_ns_follow_link to use nd_jump_link instead of just manually updating nd.path.dentry. This fixes the BUG_ON(nd->inode != parent->d_inode) reported by Dave Jones and reproduced trivially with mkdir /proc/self/ns/uts/a. Sigh, it looks like the VFS change to require use of nd_jump_link happened while proc_ns_follow_link was baking, and since the common case of proc_ns_follow_link continued to work without problems the need for making this change was overlooked.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-09 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds
Pull btrfs fixes from Chris Mason:
"These are scattered fixes and one performance improvement. The biggest functional change is in how we throttle metadata changes. The new code bumps our average file creation rate up by ~13% in fs_mark, and lowers CPU usage.

Stefan bisected out a regression in our allocation code that made balance loop on extents larger than 256MB."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: improve the delayed inode throttling
  Btrfs: fix a mismerge in btrfs_balance()
  Btrfs: enforce min_bytes parameter during extent allocation
  Btrfs: allow running defrag in parallel to administrative tasks
  Btrfs: avoid deadlock on transaction waiting list
  Btrfs: do not BUG_ON on aborted situation
  Btrfs: do not BUG_ON in prepare_to_reloc
  Btrfs: free all recorded tree blocks on error
  Btrfs: build up error handling for merge_reloc_roots
  Btrfs: check for NULL pointer in updating reloc roots
  Btrfs: fix unclosed transaction handler when the async transaction commitment fails
  Btrfs: fix wrong handle at error path of create_snapshot() when the commit fails
  Btrfs: use set_nlink if our i_nlink is 0
2013-03-08 | Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 | Linus Torvalds
Pull CIFS fixes from Steve French:
"A small set of cifs fixes which includes one for a recent regression in the write path (pointed out by Anton), some fixes for rename problems and as promised for 3.9 removing the obsolete sockopt mount option (and the accompanying deprecation warning)."

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix missing of oplock_read value in smb30_values structure
  cifs: don't try to unlock pagecache page after releasing it
  cifs: remove the sockopt= mount option
  cifs: Check server capability before attempting silly rename
  cifs: Fix bug when checking error condition in cifs_rename_pending_delete()
2013-03-08 | vfs: don't BUG_ON() if following a /proc fd pseudo-symlink results in a symlink | Linus Torvalds
It's "normal" - it can happen if the file descriptor you followed was opened with O_NOFOLLOW. Reported-by: Dave Jones <davej@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-07 | Merge tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs | Linus Torvalds
Pull ecryptfs fixes from Tyler Hicks:
"Minor code cleanups and new Kconfig option to disable /dev/ecryptfs

The code cleanups fix up W=1 compiler warnings and some unnecessary checks. The new Kconfig option, defaulting to N, allows the rarely used eCryptfs kernel to userspace communication channel to be compiled out. This may be the first step in it being eventually removed."

Hmm. I'm not sure whether these should be called "fixes", and it probably should have gone in the merge window. But I'll let it slide.

* tag 'ecryptfs-3.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: allow userspace messaging to be disabled
  eCryptfs: Fix redundant error check on ecryptfs_find_daemon_by_euid()
  ecryptfs: ecryptfs_msg_ctx_alloc_to_free(): remove kfree() redundant null check
  eCryptfs: decrypt_pki_encrypted_session_key(): remove kfree() redundant null check
  eCryptfs: remove unneeded checks in virt_to_scatterlist()
  eCryptfs: Fix -Wmissing-prototypes warnings
  eCryptfs: Fix -Wunused-but-set-variable warnings
  eCryptfs: initialize payload_len in keystore.c
2013-03-07 | Btrfs: improve the delayed inode throttling | Chris Mason
The delayed inode code batches up changes to the btree in hopes of doing them in bulk. As the changes build up, processes kick off worker threads and wait for them to make progress. The current code kicks off an async work queue item for each delayed node, which creates a lot of churn. It also uses a fixed 1 HZ waiting period for the throttle, which allows us to build a lot of pending work and can slow down the commit. This changes us to watch a sequence counter as it is bumped during the operations. We kick off fewer work items and have each work item do more work. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-07 | fs: Limit sys_mount to only request filesystem modules (Part 2). | Eric W. Biederman
Add missing MODULE_ALIAS_FS("ocfs2") how did I miss that? Remove unnecessary MODULE_ALIAS_FS("devpts") devpts can not be modular. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-07 | Btrfs: fix a mismerge in btrfs_balance() | Ilya Dryomov
Raid56 merge (merge commit e942f88) had mistakenly removed a call to __cancel_balance(), which resulted in balance not cleaning up after itself after a successful finish. (Cleanup includes switching the state, removing the balance item and releasing mut_ex_op testnset lock.) Bring it back. Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-07 | CIFS: Fix missing of oplock_read value in smb30_values structure | Pavel Shilovsky
Cc: stable@vger.kernel.org Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-07 | cifs: don't try to unlock pagecache page after releasing it | Jeff Layton
We had a recent fix to fix the release of pagecache pages when cifs_writev_requeue writes fail. Unfortunately, it releases the page before trying to unlock it. At that point, the page might be gone by the time the unlock comes in. Unlock the page first before checking the value of "rc", and only then end writeback and release the pages. The page lock isn't required for any of those operations so this should be safe. Reported-by: Anton Altaparmakov <aia21@cam.ac.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-07 | cifs: remove the sockopt= mount option | Jeff Layton
...as promised for 3.9. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-07 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9 | Chris Mason
2013-03-07 | cifs: Check server capability before attempting silly rename | Sachin Prabhu
cifs_rename_pending_delete() attempts to silly rename file using CIFSSMBRenameOpenFile(). This uses the SET_FILE_INFORMATION TRANS2 command with information level set to the passthru info-level SMB_SET_FILE_RENAME_INFORMATION. We need to check to make sure that the server support passthru info-levels before attempting the silly rename or else we will fail to rename the file. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2013-03-07 | cifs: Fix bug when checking error condition in cifs_rename_pending_delete() | Sachin Prabhu
Fix check for error condition after setting attributes with CIFSSMBSetFileInfo(). Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com>