path: root/fs
Age | Commit message | Author
2012-01-04 | move fs/partitions to block/ | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: fix the rest of sget() races | Al Viro
Unfortunately, just checking MS_BORN after having grabbed ->s_umount in sget() is not enough; places that pick a superblock from a list and grab ->s_umount shared need the same check in addition to checking for ->s_root. Otherwise a three-way race between a failing mount, sget() and such a list-walker can leave us with the list-walker coming *second*, when the temporary active ref grabbed by sget() (to be dropped when sget() notices that the original mount has failed by checking MS_BORN) has led to deactivate_locked_super() from the failing ->mount() *not* doing ->kill_sb() and just releasing ->s_umount. Once sget() gets through and notices that MS_BORN had never been set, it will drop the active ref and the fs will be shut down and kicked out of all lists, but by then it's too late for something like sync_supers().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
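A minimal userspace sketch of the extra check a list-walker needs, assuming a hypothetical `struct fake_super` and a hypothetical `FAKE_MS_BORN` bit in place of the kernel's superblock and MS_BORN flag. Grabbing ->s_umount is not modelled; the sketch only shows the shape of the test applied before touching a superblock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for MS_BORN; not the kernel's value. */
#define FAKE_MS_BORN (1u << 0)

/* Hypothetical, heavily simplified superblock. */
struct fake_super {
	void *s_root;         /* non-NULL once a root dentry exists */
	unsigned int s_flags; /* FAKE_MS_BORN is set only after ->mount() succeeded */
};

/*
 * Per the commit message, checking ->s_root alone is not enough: a walker
 * that grabbed ->s_umount (not modelled here) must also see the "born" bit.
 */
static int super_is_usable(const struct fake_super *sb)
{
	return sb->s_root != NULL && (sb->s_flags & FAKE_MS_BORN) != 0;
}

/* Count the superblocks a sync_supers()-style walker would act on. */
static size_t count_usable(const struct fake_super *supers, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (super_is_usable(&supers[i]))
			count++;
	return count;
}
```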
2012-01-04 | vfs: new helper - vfs_ustat() | Al Viro
... and bury user_get_super()/statfs_by_dentry() - they are purely internal now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: live vfsmounts never have NULL ->mnt_sb | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: for usbfs, etc. internal vfsmounts ->mnt_sb->s_root == ->mnt_root | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: pipe.c is really non-modular | Al Viro
... so no exitcalls there. Not much would work if pipe(2) stopped working, after all...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: fix the stupidity with i_dentry in inode destructors | Al Viro
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: mnt_drop_write_file() | Al Viro
new helper (wrapper around mnt_drop_write()) to be paired with mnt_want_write_file().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | constify seq_file stuff | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: make do_kern_mount() static | Al Viro
the only user outside of fs/namespace.c has died
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: convert fs_supers to hlist | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | make nfs_follow_remote_path() handle ERR_PTR() passed as root_mnt | Al Viro
... rather than duplicating that in callers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: kill ->mnt_devname use in afs printks | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | btrfs, nfs, apparmor: don't pull mnt_namespace.h for no reason... | Al Viro
it's not needed anymore; we used to, back when we had to do mount_subtree() by hand, complete with put_mnt_ns() in it. No more... Apparmor didn't need it since the __d_path() fix.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: dentry_reset_mounted() doesn't use vfsmount argument | Al Viro
lose it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | unexport put_mnt_ns(), make create_mnt_ns() static outright | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: add missing parens in pnode.h macros | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: more mnt_parent cleanups | Al Viro
a) mount --move is checking that ->mnt_parent is non-NULL before checking whether that parent happens to be shared; ->mnt_parent is never NULL and it's not even a misspelled !mnt_has_parent()
b) pivot_root open-codes is_path_reachable(), poorly.
c) so does path_is_under(), while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | vfs: new internal helper: mnt_has_parent(mnt) | Al Viro
vfsmounts have ->mnt_parent pointing either to a different vfsmount or to itself; it's never NULL and the termination condition in loops traversing the tree towards the root is mnt == mnt->mnt_parent. At least one place (see the next patch) is confused about what's going on; let's add an explicit helper checking it the right way and use it in all places where we need it. Not that there had been too many, but...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
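The root-terminated walk described above can be sketched in userspace as follows. `struct fake_mount` and `is_under()` are hypothetical simplifications: the real kernel helpers work on vfsmounts and, for reachability, also compare dentries, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified mount: the root's parent pointer points at itself. */
struct fake_mount {
	struct fake_mount *mnt_parent;
};

/* Every mount except the root has a parent distinct from itself. */
static bool mnt_has_parent(const struct fake_mount *mnt)
{
	return mnt != mnt->mnt_parent;
}

/*
 * Walk from @mnt towards the root and report whether @ancestor is met on
 * the way (including @mnt itself). The loop stops when mnt == mnt->mnt_parent,
 * never on a NULL parent, since ->mnt_parent is never NULL.
 */
static bool is_under(const struct fake_mount *mnt,
		     const struct fake_mount *ancestor)
{
	for (;;) {
		if (mnt == ancestor)
			return true;
		if (!mnt_has_parent(mnt))
			return false;
		mnt = mnt->mnt_parent;
	}
}
```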
2012-01-04 | vfs: kill pointless helpers in namespace.c | Al Viro
mnt_{inc,dec}_count() is not cleaner than doing the corresponding mnt_add_count() directly and mnt_set_count() is not used at all.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | new helpers: fh_{want,drop}_write() | Al Viro
A bunch of places in nfsd do mnt_{want,drop}_write on the vfsmount of the export of a given fhandle. Switched to obvious inlined helpers...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | switch a bunch of places to mnt_want_write_file() | Al Viro
it's both faster (in the case when the file has been opened for write) and cleaner.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | trim fs/internal.h | Al Viro
some stuff in there can actually become static; some belongs to pnode.h as it's a private interface between namespace.c and pnode.c...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04 | pull manipulations of rpc_cred inside alloc_nfs_open_context() | Al Viro
No need to duplicate them in both callers; make it return ERR_PTR(-ENOMEM) on allocation failure instead of NULL and it'll be able to report rpc_lookup_cred() failures just fine. Callers are much happier that way...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
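Returning an error pointer instead of NULL lets one return value carry either an object or the reason it could not be produced. The sketch below re-creates the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention in userspace; `fake_lookup_cred()` and `fake_alloc_open_context()` are hypothetical stand-ins for rpc_lookup_cred() and alloc_nfs_open_context(), not their real signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the kernel's error-pointer convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded -errno values. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for the NFS open context. */
struct fake_open_context {
	int cred_id;
};

/* Hypothetical stand-in for a credential lookup that can fail. */
static void *fake_lookup_cred(int fail)
{
	static int cred = 42;
	return fail ? ERR_PTR(-EACCES) : &cred;
}

/*
 * The allocator does the credential handling itself and reports *why* it
 * failed, so callers neither duplicate that code nor map NULL to -ENOMEM.
 */
static struct fake_open_context *fake_alloc_open_context(int fail_cred)
{
	struct fake_open_context *ctx;
	void *cred = fake_lookup_cred(fail_cred);

	if (IS_ERR(cred))
		return ERR_PTR(PTR_ERR(cred)); /* propagate the lookup error */
	ctx = malloc(sizeof(*ctx));
	if (ctx == NULL)
		return ERR_PTR(-ENOMEM);
	ctx->cred_id = *(int *)cred;
	return ctx;
}
```

A caller then needs a single `IS_ERR()` check and can return `PTR_ERR()` unchanged.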
2011-12-30 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: disable use of dcache for readdir etc.
2011-12-30 | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs | Linus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: log all dirty inodes in xfs_fs_sync_fs
  xfs: log the inode in ->write_inode calls for kupdate
2011-12-30 | procfs: do not confuse jiffies with cputime64_t | Andreas Schwab
Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time for nohz") did not take into account that on some architectures jiffies and cputime use different units. This causes get_idle_time() to return numbers in the wrong units, making the idle time fields in /proc/stat wrong.
Instead of converting the usec value returned by get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function usecs_to_cputime64 to convert it to the correct unit of cputime64_t.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Artem S. Tashkinov" <t.artem@mailcity.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
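A small arithmetic sketch of why the unit mismatch matters. It assumes, purely for illustration, HZ=100 and a cputime64_t that counts microseconds; real architectures differ, and every name below (including `fake_usecs_to_cputime64()`, a stand-in for usecs_to_cputime64) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative assumptions only, not any real architecture's values. */
#define FAKE_HZ 100ULL                  /* jiffies per second */
#define FAKE_CPUTIME_PER_SEC 1000000ULL /* cputime units per second */

/* Old approach: microseconds converted to jiffies. */
static uint64_t usecs_to_jiffies64(uint64_t usecs)
{
	return usecs * FAKE_HZ / 1000000ULL;
}

/* Fixed approach: microseconds converted to the unit cputime64_t uses. */
static uint64_t fake_usecs_to_cputime64(uint64_t usecs)
{
	return usecs * FAKE_CPUTIME_PER_SEC / 1000000ULL;
}

/*
 * A consumer that assumes its input is cputime64_t. Handing it jiffies
 * shrinks the result by a factor of FAKE_CPUTIME_PER_SEC / FAKE_HZ.
 */
static uint64_t cputime64_to_secs(uint64_t ct)
{
	return ct / FAKE_CPUTIME_PER_SEC;
}
```

With these assumptions, 5 seconds of idle time (5,000,000 us) reads back as 5 seconds through the cputime path but as 0 seconds when passed through the jiffies conversion first.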
2011-12-29 | ceph: disable use of dcache for readdir etc. | Sage Weil
Ceph attempts to use the dcache to satisfy negative lookups and readdir when the entire directory contents are in cache. Disable this behavior until lingering bugs in this code are shaken out; we'll re-enable these hooks once things are fully stable.
Signed-off-by: Sage Weil <sage@newdream.net>
2011-12-26 | vfs: fix handling of lock allocation failure in lease-break case | Linus Torvalds
Bruce Fields notes that commit 778fc546f749 ("locks: fix tracking of inprogress lease breaks") introduced a possible error pointer dereference on failure to allocate memory. locks_conflict() will dereference the passed-in new lease lock structure that may be an error pointer.
This means an open (without O_NONBLOCK set) on a file with a lease applied (generally only done when Samba or nfsd (with v4) is running) could crash if a kmalloc() fails.
So instead of playing games with IS_ERR() all over the place, just check the allocation failure early. That makes the code more straightforward, and avoids this possible bad pointer dereference.
Based-on-patch-by: J. Bruce Fields <bfields@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-24 | Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux | Linus Torvalds
for linus: writeback reason binary tracing format fix
* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: show writeback reason with __print_symbolic
2011-12-23 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: call d_instantiate after all ops are setup
  Btrfs: fix worker lock misuse in find_worker
2011-12-23 | xfs: log all dirty inodes in xfs_fs_sync_fs | Christoph Hellwig
Since Linux 2.6.36 the writeback code has introduced various measures for livelock prevention during sync(). Unfortunately some of these are actively harmful for the XFS model, where the inode gets marked dirty for metadata from the data I/O handler.
The older_than_this checks have been more strictly enforced since "writeback: avoid livelocking WB_SYNC_ALL writeback", by only calling into __writeback_inodes_sb and thus only sampling the current cut-off time once. But on a slow enough device the previous asynchronous sync pass might not have fully completed yet, and thus XFS might mark metadata dirty only after the cut-off time for the blocking pass has already been sampled.
I have not reproduced this myself on a real system, but by introducing artificial delay into the XFS I/O completion workqueues it can be reproduced easily.
Fix this by iterating over all XFS inodes in ->sync_fs and logging all that are dirty. This might log inodes that only got redirtied after the previous pass, but given how cheap delayed logging of inodes is, it isn't a major concern for performance.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
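The timing problem can be modelled with a toy timeline: a pass that samples its cut-off once skips an inode redirtied after that sample, while a sync_fs-style pass that logs every dirty inode catches it. `struct fake_inode` and both functions are hypothetical and ignore everything XFS-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified inode: dirty state plus when it was dirtied. */
struct fake_inode {
	bool dirty;
	unsigned long dirtied_when;
};

/*
 * A blocking writeback pass that sampled its cut-off time once and skips
 * anything dirtied afterwards (the livelock-avoidance rule).
 * Returns how many inodes it cleaned.
 */
static size_t writeback_pass(struct fake_inode *inodes, size_t n,
			     unsigned long cutoff)
{
	size_t cleaned = 0;

	for (size_t i = 0; i < n; i++) {
		if (inodes[i].dirty && inodes[i].dirtied_when <= cutoff) {
			inodes[i].dirty = false;
			cleaned++;
		}
	}
	return cleaned;
}

/* The fix in spirit: ->sync_fs logs every inode still dirty, ignoring time. */
static size_t sync_fs_log_all(struct fake_inode *inodes, size_t n)
{
	size_t logged = 0;

	for (size_t i = 0; i < n; i++) {
		if (inodes[i].dirty) {
			inodes[i].dirty = false;
			logged++;
		}
	}
	return logged;
}
```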
2011-12-23 | xfs: log the inode in ->write_inode calls for kupdate | Christoph Hellwig
If the writeback code writes back an inode because it has expired, we currently use the non-blocking ->write_inode path. This means any inode that is pinned is skipped. With delayed logging and a workload that has very little log traffic otherwise, it is very likely that an inode that gets constantly written to is always pinned, and thus we keep refusing to write it. The VM writeback code at that point redirties it and doesn't try to write it again for another 30 seconds. This means that under certain scenarios time-based metadata writeback never happens.
Fix this by calling into xfs_log_inode for kupdate in addition to data integrity syncs, and thus transfer the inode to the log ASAP.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2011-12-23 | Btrfs: call d_instantiate after all ops are setup | Al Viro
This closes races where btrfs is calling d_instantiate too soon during inode creation. All of the callers of btrfs_add_nondir are updated to instantiate after the inode is fully set up in memory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-23 | Btrfs: fix worker lock misuse in find_worker | Chris Mason
Dan Carpenter noticed that we were doing a double unlock on the worker lock, and sometimes picking a worker thread without the lock held. This fixes both errors.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
2011-12-20 | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix a regression in nfs_file_llseek()
  NFSv4: Do not accept delegated opens when a delegation recall is in effect
  NFSv4: Ensure correct locking when accessing the 'lock_states' list
  NFSv4.1: Ensure that we handle _all_ SEQUENCE status bits.
  NFSv4: Don't error if we handled it in nfs4_recovery_handle_error
  SUNRPC: Ensure we always bump the backlog queue in xprt_free_slot
  SUNRPC: Fix the execution time statistics in the face of RPC restarts
2011-12-20 | nilfs2: potential integer overflow in nilfs_ioctl_clean_segments() | Haogang Chen
There is a potential integer overflow in nilfs_ioctl_clean_segments(). When a large argv[n].v_nmembs is passed from userspace, the subsequent call to vmalloc() will allocate a buffer smaller than expected, which leads to out-of-bounds access in nilfs_ioctl_move_blocks() and lfs_clean_segments().
The following check does not prevent the overflow because nsegs is also controlled by userspace and could be very large.
    if (argv[n].v_nmembs > nsegs * nilfs->ns_blocks_per_segment)
            goto out_free;
This patch clamps argv[n].v_nmembs to UINT_MAX / argv[n].v_size, and returns -EINVAL on overflow.
Signed-off-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
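The clamp works because checking `nmembs > UINT_MAX / size` before multiplying guarantees the product fits in an unsigned int. A minimal sketch, where `checked_buf_size()` is a hypothetical helper rather than the nilfs2 code; it also assumes a zero `size` must be rejected, since the division would otherwise fault.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Compute nmembs * size for a buffer allocation, refusing any pair whose
 * product would not fit in an unsigned int (the UINT_MAX / v_size clamp
 * the patch describes). Hypothetical helper, not the nilfs2 source.
 */
static int checked_buf_size(unsigned int nmembs, unsigned int size,
			    size_t *out)
{
	if (size == 0)
		return -EINVAL;           /* assumption: zero-sized members are invalid */
	if (nmembs > UINT_MAX / size)
		return -EINVAL;           /* nmembs * size would wrap around */
	*out = (size_t)nmembs * size;
	return 0;
}
```

Without the check, a 32-bit `nmembs * size` such as `0x40000000 * 8` silently wraps to 0, which is how a tiny vmalloc() buffer ends up being indexed as if it were huge.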
2011-12-20 | nilfs2: unbreak compat ioctl | Thomas Meyer
commit 828b1c50ae ("nilfs2: add compat ioctl") incidentally broke all other NILFS compat ioctls. Make them work again.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org> [3.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-12-18 | writeback: show writeback reason with __print_symbolic | Wu Fengguang
This makes the binary trace understandable by trace-cmd.
CC: Dave Chinner <david@fromorbit.com>
CC: Curt Wohlgemuth <curtw@google.com>
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-12-16 | Merge branches 'for-linus' and 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: unplug every once and a while
  Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code
  Btrfs: only set cache_generation if we setup the block group
  Btrfs: don't panic if orphan item already exists
  Btrfs: fix leaked space in truncate
  Btrfs: fix how we do delalloc reservations and how we free reservations on error
  Btrfs: deal with enospc from dirtying inodes properly
  Btrfs: fix num_workers_starting bug and other bugs in async thread
  BTRFS: Establish i_ops before calling d_instantiate
  Btrfs: add a cond_resched() into the worker loop
  Btrfs: fix ctime update of on-disk inode
  btrfs: keep orphans for subvolume deletion
  Btrfs: fix inaccurate available space on raid0 profile
  Btrfs: fix wrong disk space information of the files
  Btrfs: fix wrong i_size when truncating a file to a larger size
  Btrfs: fix btrfs_end_bio to deal with write errors to a single mirror
* 'for-linus-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: lower the dirty balance poll interval
2011-12-16 | btrfs: lower the dirty balance poll interval | Wu Fengguang
Tests show that the original large intervals can easily let the dirty limit be exceeded with 100 concurrent dd's. So adapt the interval to be as large as the next check point selected by the dirty throttling algorithm.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 | NFS: Fix a regression in nfs_file_llseek() | Trond Myklebust
After commit 06222e491e663dac939f04b125c9dc52126a75c4 (fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek) the behaviour of llseek() was changed so that it always revalidates the file size. The bug appears to be due to a logic error in the aforementioned commit: a condition that always evaluates to 'true'.
Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>=3.1]
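The message does not quote the faulty test. One common way a condition ends up always true is combining two inequality tests with `||`, as in this hypothetical sketch; the helper names are invented and the "revalidate only for size-dependent origins" policy is an assumption used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Buggy shape: no value equals both constants, so this is always true. */
static bool needs_revalidate_buggy(int whence)
{
	return whence != SEEK_SET || whence != SEEK_CUR;
}

/* Intended shape: skip revalidation for origins that ignore the file size. */
static bool needs_revalidate_fixed(int whence)
{
	return whence != SEEK_SET && whence != SEEK_CUR;
}
```

By De Morgan's law, "neither SEEK_SET nor SEEK_CUR" is `!= && !=`; writing `!= || !=` turns it into a tautology.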
2011-12-15 | Btrfs: unplug every once and a while | Chris Mason
The btrfs io submission threads can build up massive plug lists. This keeps things more reasonable so we don't hand over huge dumps of IO at once.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
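The idea of unplugging periodically can be sketched as a submission loop that flushes its pending list every N requests instead of only at the end. `struct fake_plug`, the helper functions and the batch size `FAKE_UNPLUG_EVERY` are all hypothetical; the message does not say what interval btrfs uses.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_UNPLUG_EVERY 64 /* hypothetical batch size, not btrfs's value */

/* Hypothetical plug: requests collect here until flushed to the device. */
struct fake_plug {
	size_t pending;    /* requests queued on the plug, not yet dispatched */
	size_t dispatched; /* requests handed to the device so far */
	size_t flushes;    /* number of times the plug was flushed */
};

static void fake_flush_plug(struct fake_plug *plug)
{
	plug->dispatched += plug->pending;
	plug->pending = 0;
	plug->flushes++;
}

/*
 * Submit @count requests, flushing every FAKE_UNPLUG_EVERY so the pending
 * list stays bounded, then flush whatever is left over.
 */
static void fake_submit_all(struct fake_plug *plug, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		plug->pending++;
		if (plug->pending >= FAKE_UNPLUG_EVERY)
			fake_flush_plug(plug);
	}
	if (plug->pending)
		fake_flush_plug(plug);
}
```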
2011-12-15 | Merge branch 'for-chris' of http://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-work into integration | Chris Mason
Conflicts:
  fs/btrfs/inode.c
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 | Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code | Chris Mason
btrfs_update_inode is sometimes called with a null reservation.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-15 | Btrfs: only set cache_generation if we setup the block group | Josef Bacik
A user reported a problem booting into a new kernel with the old format inodes. He was panicking in cow_file_range while writing out the inode cache. This is because if the block group is not cached we'll just skip writing out the cache; however, if it gets dirtied again in the same transaction and it has finished caching, we'd go ahead and write it out, but since we set cache_generation to the transid we think we've already truncated it and will just carry on, running into cow_file_range and blowing up. We need to make sure we only set cache_generation if we've done the truncate. The user tested this patch and verified that the panic no longer occurred. Thanks,
Reported-and-Tested-by: Klaus Bitto <klaus.bitto@gmail.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
2011-12-15 | Btrfs: don't panic if orphan item already exists | Josef Bacik
I've been hitting this BUG_ON() in btrfs_orphan_add when running xfstest 269 in a loop. This is because we will add an orphan item, do the truncate, the truncate will fail for whatever reason (*cough*ENOSPC*cough*) and then we're left with an orphan item still in the fs. Then we come back later to do another truncate and it blows up because we already have an orphan item. This is ok so just fix the BUG_ON() to only BUG() if ret is not EEXIST. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
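The relaxed assertion amounts to treating -EEXIST as success and crashing only on other errors. In this sketch `fake_insert_orphan_item()` and `fake_orphan_add()` are hypothetical stand-ins, and abort() plays the role of BUG().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical insert that reports -EEXIST when the item is already there. */
static int fake_insert_orphan_item(int *present)
{
	if (*present)
		return -EEXIST;
	*present = 1;
	return 0;
}

/*
 * An orphan item left behind by an earlier failed truncate is harmless,
 * so only errors other than -EEXIST are treated as fatal.
 */
static int fake_orphan_add(int *present)
{
	int ret = fake_insert_orphan_item(present);

	if (ret && ret != -EEXIST)
		abort(); /* stands in for BUG() */
	return 0;
}
```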
2011-12-15 | Btrfs: fix leaked space in truncate | Josef Bacik
We were occasionally leaking space when running xfstest 269. This is because if we failed to start the transaction in the truncate loop we'd just goto out, but we need to break so that the inode is removed from the orphan list and the space is properly freed. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
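The difference between `goto out` and `break` is whether the cleanup after the loop still runs. A minimal sketch with a hypothetical `struct fake_truncate`; a simulated failure at a chosen iteration stands in for failing to start a transaction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state for one truncate operation. */
struct fake_truncate {
	int steps_left;      /* loop iterations still to do */
	int fail_at;         /* iteration at which "starting a transaction" fails */
	bool on_orphan_list; /* inode still on the orphan list */
	bool space_freed;    /* reserved space released */
};

/*
 * On failure we break out of the loop instead of jumping past the
 * cleanup, so the inode still leaves the orphan list and the space
 * is released.
 */
static int fake_truncate_loop(struct fake_truncate *t)
{
	int ret = 0;

	for (int i = 0; t->steps_left > 0; i++) {
		if (i == t->fail_at) {
			ret = -1; /* e.g. could not start a transaction */
			break;    /* NOT "goto out": the cleanup below must run */
		}
		t->steps_left--;
	}

	/* cleanup that a "goto out" past this point would have skipped */
	t->on_orphan_list = false;
	t->space_freed = true;
	return ret;
}
```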
2011-12-15 | Btrfs: fix how we do delalloc reservations and how we free reservations on error | Josef Bacik
Running xfstests 269 with some tracing, my scripts kept spitting out errors about releasing bytes that we didn't actually have reserved. This took me down a huge rabbit hole and it turns out the way we deal with reserved_extents is wrong: we need to only be setting it if the reservation succeeds, otherwise the free() method will come in and unreserve space that isn't actually reserved yet, which can lead to other warnings and such. The math was all working out right in the end, but it caused all sorts of other issues in addition to making my scripts yell and scream and generally making it impossible for me to track down the original issue I was looking for.
The other problem is with our error handling in the reservation code. There are two cases that we need to deal with:
1) We raced with free. In this case free won't free anything because csum_bytes is modified before we drop the lock in our reservation path, so free rightly doesn't release any space because the reservation code may be depending on that reservation. However, if we fail, we need the reservation side to do the free at that point since that space is no longer in use. So as it stands the code was doing this fine and it worked out, except in case #2.
2) We don't race with free. Nobody comes in and changes anything, and our reservation fails. In this case we didn't reserve anything anyway and we just need to clean up csum_bytes but not free anything. So we keep track of csum_bytes before we drop the lock and if it hasn't changed we know we can just decrement csum_bytes and carry on.
Because we can race with free() (we have to drop our spin_lock to do the reservation), I'm going to serialize all reservations with the i_mutex. We already get this for free in the heavy-use paths (truncate and file write both hold the i_mutex); we just needed to add it to page_mkwrite and various ioctl/balance things. With this patch my space leak scripts no longer scream bloody murder. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
2011-12-15 | Btrfs: deal with enospc from dirtying inodes properly | Josef Bacik
Now that we're properly keeping track of delayed inode space we've been getting a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is because a bunch of people call mark_inode_dirty, which is void so we can't return ENOSPC. This needs to be fixed in a few areas:
1) file_update_time - this updates the mtime and such when writing to a file, which will call mark_inode_dirty. So copy file_update_time into btrfs so we can call btrfs_dirty_inode directly and return an error if we get one.
2) Fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't setting ->setattr for symlinks, even though we should have been. This catches one of the cases where we were getting errors in mark_inode_dirty.
3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly instead of mark_inode_dirty. This lets us return errors properly for truncate and chown/anything related to setattr.
4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and print an error if we have one. The only remaining user we can't control for this is touch_atime(), but we don't really want to keep people from walking down the tree if we don't have space to save the atime update, so just complain but don't worry about it.
With this patch xfstests 83 complains a handful of times instead of hundreds of times. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
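The split between an error-returning worker and a void hook that can only complain might look like this. All names are hypothetical (`fake_dirty_inode()` loosely mirrors btrfs_dirty_inode(), `fake_fs_dirty_inode()` mirrors btrfs_fs_dirty_inode()), and an `out_of_space` flag simulates ENOSPC.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical inode with a flag that simulates running out of space. */
struct fake_inode {
	int dirty;
	int out_of_space;
};

/* Error-returning worker. */
static int fake_dirty_inode(struct fake_inode *inode)
{
	if (inode->out_of_space)
		return -ENOSPC;
	inode->dirty = 1;
	return 0;
}

/* Paths that can propagate errors (setattr, truncate) call the worker directly. */
static int fake_setattr(struct fake_inode *inode)
{
	return fake_dirty_inode(inode);
}

/*
 * The void hook for callers that cannot handle errors (e.g. atime updates):
 * it can only report the failure, not return it.
 */
static void fake_fs_dirty_inode(struct fake_inode *inode)
{
	int ret = fake_dirty_inode(inode);

	if (ret)
		fprintf(stderr, "dirty_inode failed: %d\n", ret);
}
```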