summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2012-02-23xfs: remove log space waitqueuesChristoph Hellwig
The tic->t_wait waitqueues can never have more than a single waiter on them, so we can easily replace them with a task_struct pointer and wake_up_process. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-23xfs: cleanup xfs_log_space_wakeChristoph Hellwig
Remove the now unused opportunistic parameter, and use the the xlog_writeq_wake and xlog_reserveq_wake helpers now that we don't have to care about the opportunistic wakeups. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-23xfs: remove xfs_trans_unlocked_itemChristoph Hellwig
There is no reason to wake up log space waiters when unlocking inodes or dquots, and the commit log has no explanation for this function either. Given that we now have exact log space wakeups everywhere we can assume the reason for this function was to paper over log space races in earlier XFS versions. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-23xfs: do exact log space wakeups in xlog_ungrant_log_spaceChristoph Hellwig
The only reason that xfs_log_space_wake had to do opportunistic wakeups was that the old xfs_log_move_tail calling convention didn't allow for exact wakeups when not updating the log tail LSN. Since this issue has been fixed we can do exact wakeups now. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-23xfs: split tail_lsn assignments from log space wakeupsChristoph Hellwig
Currently xfs_log_move_tail has a tail_lsn argument that is horribly overloaded: it may contain either an actual lsn to assign to the log tail, 0 as a special case to use the last sync LSN, or 1 to indicate that no tail LSN assignment should be performed, and we should opportunisticly wake up at one task waiting for log space even if we did not move the LSN. Remove the tail lsn assigned from xfs_log_move_tail and make the two callers use xlog_assign_tail_lsn instead of the current variant of partially using the code in xfs_log_move_tail and partially opencoding it. Note that means we grow an addition lock roundtrip on the AIL lock for each bulk update or delete, which is still far less than what we had before introducing the bulk operations. If this proves to be a problem we can still add a variant of xlog_assign_tail_lsn that expects the lock to be held already. Also rename the remainder of xfs_log_move_tail to xfs_log_space_wake as that name describes its functionality much better. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-23xfs: cleanup quota check on disk blocks and inodes reservationsMitsuo Hayasaka
This patch is a cleanup of quota check on disk blocks and inodes reservations, and changes it as follows. (1) add a total_count variable to store the total number of current usages and new reservations for disk blocks and inodes, respectively. (2) make it more readable to check if the local variables softlimit and hardlimit are positive. It has been changed as follows. if (softlimit > 0ULL) -> if (softlimit) if (hardlimit > 0ULL) -> if (hardlimit) This is because they are defined as xfs_qcnt_t which is unsigned. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-21xfs: make inode quota check more generalMitsuo Hayasaka
The xfs checks quota when reserving disk blocks and inodes. In the block reservation, it checks if the total number of blocks including current usage and new reservation exceed quota. In the inode reservation, it checks using the total number of inodes including only current usage without new reservation. However, this inode quota check works well since the caller of xfs_trans_dquot() always sets the argument of the number of new inode reservation to 1 or 0 and inode is reserved one by one in current xfs. To make it more general, this patch changes it to the same way as the block quota check. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit c922bbc819324558e61402a7a76c10c550ca61bc)
2012-02-21xfs: change available ranges of softlimit and hardlimit in quota checkMitsuo Hayasaka
In general, quota allows us to use disk blocks and inodes up to each limit, that is, they are available if they don't exceed their limitations. Current xfs sets their available ranges to lower than them except disk inode quota check. So, this patch changes the ranges to not beyond them. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 20f12d8ac01917d96860f352f67eddd912df0afb)
2012-02-13XFS: xfs_trans_add_item() - don't assign in ASSERT() when compare is intendedJesper Juhl
It looks to me like the two ASSERT()s in xfs_trans_add_item() really want to do a compare (==) rather than assignment (=). This patch changes it from the latter to the former. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 05293485a0b6b1f803e8a3c0ff188c38f6969985)
2012-02-10xfs: use a normal shrinker for the dquot freelistChristoph Hellwig
Stop reusing dquots from the freelist when allocating new ones directly, and implement a shrinker that actually follows the specifications for the interface. The shrinker implementation is still highly suboptimal at this point, but we can gradually work on it. This also fixes an bug in the previous lock ordering, where we would take the hash and dqlist locks inside of the freelist lock against the normal lock ordering. This is only solvable by introducing the dispose list, and thus not when using direct reclaim of unused dquots for new allocations. As a side-effect the quota upper bound and used to free ratio values in /proc/fs/xfs/xqm are set to 0 as these values don't make any sense in the new world order. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 04da0c8196ac0b12fb6b84f4b7a51ad2fa56d869)
2012-02-03Define new macro XFS_ALL_QUOTA_ACTIVE and simply some usageChandra Seetharaman
Define new macro XFS_ALL_QUOTA_ACTIVE and simply some usage of quota macros. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-03Change xfs_sb_from_disk() interface to take a mount pointerChandra Seetharaman
Change xfs_sb_from_disk() interface to take a mount pointer instead of a superblock pointer. This is to print mount point specific error messages in future fixes. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-03Define a new function xfs_inode_dquot()Chandra Seetharaman
Define a new function xfs_inode_dquot() that takes a inode pointer and a disk quota type and returns the quota pointer for the specified quota type. This simplifies the xfs_qm_dqget() error path significantly. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-03Define a new function xfs_this_quota_on()Chandra Seetharaman
Create a new function xfs_this_quota_on() that takes a xfs_mount data structure and a disk quota type and returns true if the specified type of quota is ON in the xfs_mount data structure. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-02-02xfs: kill the unused XFS_BB_FSB_OFFSET macroAmit Sahrawat
Removing the macro, as this is no more needed in the code. Tried to find the reference when it was last used - but the usage for this seemed to have been dropped long time ago. Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-31xfs: show uuid when mount fails due to duplicate uuidMitsuo Hayasaka
When a system tries to mount a filesystem (FS) using UUID, the xfs returns -EINVAL and shows a message if a FS with the same UUID has been already mounted. It is useful to output the duplicate UUID with it. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-31xfs: pass KM_SLEEP flag to kmem_realloc() in xlog_recover_add_to_cnt_trans()Mitsuo Hayasaka
The kmem_realloc() in xfs is given KM_* memory allocation flags. And it allocates memory using kmalloc() after they are converted to gfp_mask flags. In xlog_recover_add_to_cont_trans(), 0u is passed to kmem_realloc(), instead of them. I guess it is preferred to use them, and here memory must be allocated but don't have to be done with GFP_ATOMIC. So, this patch changes it to KM_SLEEP. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-25xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()Jan Kara
Commit b52a360b forgot to call xfs_iunlock() when it detected corrupted symplink and bailed out. Fix it by jumping to 'out' instead of doing return. CC: stable@kernel.org CC: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Alex Elder <elder@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-19Merge branches 'sched-urgent-for-linus', 'perf-urgent-for-linus' and ↵Linus Torvalds
'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/accounting, proc: Fix /proc/stat interrupts sum * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracepoints/module: Fix disabling tracepoints with taint CRAP or OOT x86/kprobes: Add arch/x86/tools/insn_sanity to .gitignore x86/kprobes: Fix typo transferred from Intel manual * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, syscall: Need __ARCH_WANT_SYS_IPC for 32 bits x86, tsc: Fix SMI induced variation in quick_pit_calibrate() x86, opcode: ANDN and Group 17 in x86-opcode-map.txt x86/kconfig: Move the ZONE_DMA entry under a menu x86/UV2: Add accounting for BAU strong nacks x86/UV2: Ack BAU interrupt earlier x86/UV2: Remove stale no-resources test for UV2 BAU x86/UV2: Work around BAU bug x86/UV2: Fix BAU destination timeout initialization x86/UV2: Fix new UV2 hardware by using native UV2 broadcast mode x86: Get rid of dubious one-bit signed bitfield
2012-01-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: qnx4: don't leak ->BitMap on late failure exits qnx4: reduce the insane nesting in qnx4_checkroot() qnx4: di_fname is an array, for crying out loud... vfs: remove printk from set_nlink() wake up s_wait_unfrozen when ->freeze_fs fails
2012-01-19qnx4: don't leak ->BitMap on late failure exitsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-19qnx4: reduce the insane nesting in qnx4_checkroot()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-19qnx4: di_fname is an array, for crying out loud...Al Viro
(struct qnx4_inode_entry *)(bh->b_data + some_offset)->di_fname is not going to be NULL, TYVM... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit: (29 commits) audit: no leading space in audit_log_d_path prefix audit: treat s_id as an untrusted string audit: fix signedness bug in audit_log_execve_info() audit: comparison on interprocess fields audit: implement all object interfield comparisons audit: allow interfield comparison between gid and ogid audit: complex interfield comparison helper audit: allow interfield comparison in audit rules Kernel: Audit Support For The ARM Platform audit: do not call audit_getname on error audit: only allow tasks to set their loginuid if it is -1 audit: remove task argument to audit_set_loginuid audit: allow audit matching on inode gid audit: allow matching on obj_uid audit: remove audit_finish_fork as it can't be called audit: reject entry,always rules audit: inline audit_free to simplify the look of generic code audit: drop audit_set_macxattr as it doesn't do anything audit: inline checks for not needing to collect aux records audit: drop some potentially inadvisable likely notations ... Use evil merge to fix up grammar mistakes in Kconfig file. Bad speling and horrible grammar (and copious swearing) is to be expected, but let's keep it to commit messages and comments, rather than expose it to users in config help texts or printouts.
2012-01-17Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: cleanup xfs_file_aio_write xfs: always return with the iolock held from xfs_file_aio_write_checks xfs: remove the i_new_size field in struct xfs_inode xfs: remove the i_size field in struct xfs_inode xfs: replace i_pin_wait with a bit waitqueue xfs: replace i_flock with a sleeping bitlock xfs: make i_flags an unsigned long xfs: remove the if_ext_max field in struct xfs_ifork xfs: remove the unused dm_attrs structure xfs: cleanup xfs_iomap_eof_align_last_fsb xfs: remove xfs_itruncate_data
2012-01-17Merge branch 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
* 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: take allocation of ->tree_root into open_ctree() btrfs: let ->s_fs_info point to fs_info, not root... btrfs: consolidate failure exits in btrfs_mount() a bit btrfs: make free_fs_info() call ->kill_sb() unconditional btrfs: merge free_fs_info() calls on fill_super failures btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super() btrfs: make open_ctree() return int btrfs: sanitizing ->fs_info, part 5 btrfs: sanitizing ->fs_info, part 4 btrfs: sanitizing ->fs_info, part 3 btrfs: sanitizing ->fs_info, part 2 btrfs: sanitizing ->fs_info, part 1 btrfs: fix a deadlock in btrfs_scan_one_device() btrfs: fix mount/umount race btrfs: get ->kill_sb() of its own btrfs: preparation to fixing mount/umount race
2012-01-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits) Btrfs: use larger system chunks Btrfs: add a delalloc mutex to inodes for delalloc reservations Btrfs: space leak tracepoints Btrfs: protect orphan block rsv with spin_lock Btrfs: add allocator tracepoints Btrfs: don't call btrfs_throttle in file write Btrfs: release space on error in page_mkwrite Btrfs: fix btrfsck error 400 when truncating a compressed Btrfs: do not use btrfs_end_transaction_throttle everywhere Btrfs: add balance progress reporting Btrfs: allow for resuming restriper after it was paused Btrfs: allow for canceling restriper Btrfs: allow for pausing restriper Btrfs: add skip_balance mount option Btrfs: recover balance on mount Btrfs: save balance parameters to disk Btrfs: soft profile changing mode (aka soft convert) Btrfs: implement online profile changing Btrfs: do not reduce profile in do_chunk_alloc() Btrfs: virtual address space subset filter ... Fix up trivial conflict in fs/btrfs/ioctl.c due to the use of the new mnt_drop_write_file() helper.
2012-01-17proc: clean up and fix /proc/<pid>/mem handlingLinus Torvalds
Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very robust, and it also doesn't match the permission checking of any of the other related files. This changes it to do the permission checks at open time, and instead of tracking the process, it tracks the VM at the time of the open. That simplifies the code a lot, but does mean that if you hold the file descriptor open over an execve(), you'll continue to read from the _old_ VM. That is different from our previous behavior, but much simpler. If somebody actually finds a load where this matters, we'll need to revert this commit. I suspect that nobody will ever notice - because the process mapping addresses will also have changed as part of the execve. So you cannot actually usefully access the fd across a VM change simply because all the offsets for IO would have changed too. Reported-by: Jüri Aedla <asd@ut.ee> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-17vfs: remove printk from set_nlink()Miklos Szeredi
Don't log a message for set_nlink(0). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-17wake up s_wait_unfrozen when ->freeze_fs failsKazuya Mio
dd slept infinitely when fsfeeze failed because of EIO. To fix this problem, if ->freeze_fs fails, freeze_super() wakes up the tasks waiting for the filesystem to become unfrozen. When s_frozen isn't SB_UNFROZEN in __generic_file_aio_write(), the function sleeps until FITHAW ioctl wakes up s_wait_unfrozen. However, if ->freeze_fs fails, s_frozen is set to SB_UNFROZEN and then freeze_super() returns an error number. In this case, FITHAW ioctl returns EINVAL because s_frozen is already SB_UNFROZEN. There is no way to wake up s_wait_unfrozen, so __generic_file_aio_write() sleeps infinitely. Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-17audit: do not call audit_getname on errorEric Paris
Just a code cleanup really. We don't need to make a function call just for it to return on error. This also makes the VFS function even easier to follow and removes a conditional on a hot path. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: only allow tasks to set their loginuid if it is -1Eric Paris
At the moment we allow tasks to set their loginuid if they have CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they log in and it be impossible to ever reset. We had to make it mutable even after it was once set (with the CAP) because on update and admin might have to restart sshd. Now sshd would get his loginuid and the next user which logged in using ssh would not be able to set his loginuid. Systemd has changed how userspace works and allowed us to make the kernel work the way it should. With systemd users (even admins) are not supposed to restart services directly. The system will restart the service for them. Thus since systemd is going to loginuid==-1, sshd would get -1, and sshd would be allowed to set a new loginuid without special permissions. If an admin in this system were to manually start an sshd he is inserting himself into the system chain of trust and thus, logically, it's his loginuid that should be used! Since we have old systems I make this a Kconfig option. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17audit: remove task argument to audit_set_loginuidEric Paris
The function always deals with current. Don't expose an option pretending one can use it for something. You can't. Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-17xfs: cleanup xfs_file_aio_writeChristoph Hellwig
With all the size field updates out of the way xfs_file_aio_write can be further simplified by pushing all iolock handling into xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using the generic generic_write_sync helper for synchronous writes. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17xfs: always return with the iolock held from xfs_file_aio_write_checksChristoph Hellwig
While xfs_iunlock is fine with 0 lockflags the calling conventions are much cleaner if xfs_file_aio_write_checks never returns without the iolock held. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17xfs: remove the i_new_size field in struct xfs_inodeChristoph Hellwig
Now that we use the VFS i_size field throughout XFS there is no need for the i_new_size field any more given that the VFS i_size field gets updated in ->write_end before unlocking the page, and thus is always uptodate when writeback could see a page. Removing i_new_size also has the advantage that we will never have to trim back di_size during a failed buffered write, given that it never gets updated past i_size. Note that currently the generic direct I/O code only updates i_size after calling our end_io handler, which requires a small workaround to make sure di_size actually makes it to disk. I hope to fix this properly in the generic code. A downside is that we lose the support for parallel non-overlapping O_DIRECT appending writes that recently was added. I don't think keeping the complex and fragile i_new_size infrastructure for this is a good tradeoff - if we really care about parallel appending writers we should investigate turning the iolock into a range lock, which would also allow for parallel non-overlapping buffered writers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17xfs: remove the i_size field in struct xfs_inodeChristoph Hellwig
There is no fundamental need to keep an in-memory inode size copy in the XFS inode. We already have the on-disk value in the dinode, and the separate in-memory copy that we need for regular files only in the XFS inode. Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the VFS inode i_size field for regular files. Switch code that was directly accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where we are limited to regular files direct access of the VFS inode i_size field. This also allows dropping some fairly complicated code in the write path which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size that is getting updated inside ->write_end. Note that we do not bother resetting the VFS i_size when truncating a file that gets freed to zero as there is no point in doing so because the VFS inode is no longer in use at this point. Just relax the assert in xfs_ifree to only check the on-disk size instead. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17xfs: replace i_pin_wait with a bit waitqueueChristoph Hellwig
Replace i_pin_wait, which is only used during synchronous inode flushing with a bit waitqueue. This trades off a much smaller inode against slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit) bytes in the XFS inode. Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17xfs: replace i_flock with a sleeping bitlockChristoph Hellwig
We almost never block on i_flock, the exception is synchronous inode flushing. Instead of bloating the inode with a 16/24-byte completion that we abuse as a semaphore just implement it as a bitlock that uses a bit waitqueue for the rare sleeping path. This primarily is a tradeoff between a much smaller inode and a faster non-blocking path vs faster wakeups, and we are much better off with the former. A small downside is that we will lose lockdep checking for i_flock, but given that it's always taken inside the ilock that should be acceptable. Note that for example the inode writeback locking is implemented in a very similar way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17xfs: make i_flags an unsigned longChristoph Hellwig
To be used for bit wakeup i_flags needs to be an unsigned long or we'll run into trouble on big endian systems. Because of the 1-byte i_update field right after it this actually causes a fairly large size increase on its own (4 or 8 bytes), but that increase will be more than offset by the next two patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-17xfs: remove the if_ext_max field in struct xfs_iforkChristoph Hellwig
We spent a lot of effort to maintain this field, but it always equals to the fork size divided by the constant size of an extent. The prime use of it is to assert that the two stay in sync. Just divide the fork size by the extent size in the few places that we actually use it and remove the overhead of maintaining it. Also introduce a few helpers to consolidate the places where we actually care about the value. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2012-01-16Merge tag 'nfs-for-3.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
NFS client bugfixes and cleanups for Linux 3.3 (pull 2) * tag 'nfs-for-3.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pnfsblock: alloc short extent before submit bio pnfsblock: remove rpc_call_ops from struct parallel_io pnfsblock: move find lock page logic out of bl_write_pagelist pnfsblock: cleanup bl_mark_sectors_init pnfsblock: limit bio page count pnfsblock: don't spinlock when freeing block_dev pnfsblock: clean up _add_entry pnfsblock: set read/write tk_status to pnfs_error pnfsblock: acquire im_lock in _preload_range NFS4: fix compile warnings in nfs4proc.c nfs: check for integer overflow in decode_devicenotify_args() NFS: cleanup endian type in decode_ds_addr() NFS: add an endian notation
2012-01-16Btrfs: use larger system chunksChris Mason
system chunks by default are very small. This makes them slightly larger and also fixes the conditional checks to make sure we don't allocate a billion of them at once. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16Btrfs: add a delalloc mutex to inodes for delalloc reservationsJosef Bacik
I was using i_mutex for this, but we're getting bogus lockdep warnings by doing that and theres no real way to get rid of those, so just stop using i_mutex to protect delalloc metadata reservations and use a delalloc mutex instead. This shouldn't be contended often at all, only if you are writing and mmap writing to the file at the same time. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2012-01-16Btrfs: space leak tracepointsJosef Bacik
This in addition to a script in my btrfs-tracing tree will help track down space leaks when we're getting space left over in block groups on umount. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2012-01-16Btrfs: protect orphan block rsv with spin_lockJosef Bacik
We've been seeing warnings coming out of the orphan commit stuff forever from ceph. Turns out it's because we're racing with checking if the orphan block reserve is set, because we clear it outside of the spin_lock. So leave the normal fastpath checks where they are, but take the spin_lock and _recheck_ to make sure we haven't had an orphan block rsv added in the meantime. Then clear the root's orphan block rsv and release the lock. With this patch a user said the warnings went away and they usually showed up pretty soon after he started ceph. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2012-01-16Btrfs: add allocator tracepointsJosef Bacik
I used these tracepoints when figuring out what the cluster stuff was doing, so add them to mainline in case we need to profile this stuff again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2012-01-16Btrfs: don't call btrfs_throttle in file writeJosef Bacik
Btrfs_throttle will make us wait if there is a currently committing transaction until we can open new transactions, which is ridiculous since we don't actually start any transactions within the file write path anyway, so all this does is introduce big latencies if we have a sync/fsync heavy workload going on while somebody else is trying to do work. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16Btrfs: release space on error in page_mkwriteJosef Bacik
If updating the inode gave us an ENOSPC we were just returning in page_mkwrite, which is a problem since we make our reservation right before trying to update the inode, so fix the out label so that we actually free our reservation. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16Btrfs: fix btrfsck error 400 when truncating a compressedMiao Xie
Reproduce steps: # mkfs.btrfs /dev/sdb5 # mount /dev/sdb5 -o compress=lzo /mnt # dd if=/dev/zero of=/mnt/tmpfile bs=128K count=1 # sync # truncate -s 64K /mnt/tmpfile root 5 inode 257 errors 400 This is because of the wrong if condition, which is used to check if we should subtract the bytes of the dropped range from i_blocks/i_bytes of i-node or not. When we truncate a compressed extent, btrfs substracts the bytes of the whole extent, it's wrong. We should substract the real size that we truncate, no matter it is a compressed extent or not. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>