summaryrefslogtreecommitdiff
path: root/fs/btrfs/ioctl.c
AgeCommit message (Collapse)Author
2012-10-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...
2012-10-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...
2012-09-27switch simple cases of fget_light to fdgetAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-27switch btrfs_ioctl_clone() to fget_light()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-27switch btrfs_ioctl_snap_create_transid() to fget_light()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-21userns: Convert btrfs to use kuid/kgid where appropriateEric W. Biederman
Cc: Chris Mason <chris.mason@fusionio.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "I've split out the big send/receive update from my last pull request and now have just the fixes in my for-linus branch. The send/recv branch will wander over to linux-next shortly though. The largest patches in this pull are Josef's patches to fix DIO locking problems and his patch to fix a crash during balance. They are both well tested. The rest are smaller fixes that we've had queued. The last rc came out while I was hacking new and exciting ways to recover from a misplaced rm -rf on my dev box, so these missed rc3." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits) Btrfs: fix that repair code is spuriously executed for transid failures Btrfs: fix ordered extent leak when failing to start a transaction Btrfs: fix a dio write regression Btrfs: fix deadlock with freeze and sync V2 Btrfs: revert checksum error statistic which can cause a BUG() Btrfs: remove superblock writing after fatal error Btrfs: allow delayed refs to be merged Btrfs: fix enospc problems when deleting a subvol Btrfs: fix wrong mtime and ctime when creating snapshots Btrfs: fix race in run_clustered_refs Btrfs: don't run __tree_mod_log_free_eb on leaves Btrfs: increase the size of the free space cache Btrfs: barrier before waitqueue_active Btrfs: fix deadlock in wait_for_more_refs btrfs: fix second lock in btrfs_delete_delayed_items() Btrfs: don't allocate a seperate csums array for direct reads Btrfs: do not strdup non existent strings Btrfs: do not use missing devices when showing devname Btrfs: fix that error value is changed by mistake Btrfs: lock extents as we map them in DIO ...
2012-08-28Btrfs: fix some endian bugs handling the root timesDan Carpenter
"trans->transid" is cpu endian but we want to store the data as little endian. "item->ctime.nsec" is only 32 bits, not 64. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-09Btrfs: remove mnt_want_write call in btrfs_mksubvolAlexander Block
We got a recursive lock in mksubvol because the caller already held a lock. I think we got into this due to a merge error. Commit a874a63 removed the mnt_want_write call from btrfs_mksubvol and added a replacement call to mnt_want_write_file in btrfs_ioctl_snap_create_transid. Commit e7848683 however tried to move all calls to mnt_want_write above i_mutex. So somewhere while merging this, it got mixed up. The solution is to remove the mnt_want_write call completely from mksubvol. Reported-by: David Sterba <dave@jikos.cz> Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull second vfs pile from Al Viro: "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the deadlock reproduced by xfstests 068), symlink and hardlink restriction patches, plus assorted cleanups and fixes. Note that another fsfreeze deadlock (emergency thaw one) is *not* dealt with - the series by Fernando conflicts a lot with Jan's, breaks userland ABI (FIFREEZE semantics gets changed) and trades the deadlock for massive vfsmount leak; this is going to be handled next cycle. There probably will be another pull request, but that stuff won't be in it." Fix up trivial conflicts due to unrelated changes next to each other in drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c} * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) delousing target_core_file a bit Documentation: Correct s_umount state for freeze_fs/unfreeze_fs fs: Remove old freezing mechanism ext2: Implement freezing btrfs: Convert to new freezing mechanism nilfs2: Convert to new freezing mechanism ntfs: Convert to new freezing mechanism fuse: Convert to new freezing mechanism gfs2: Convert to new freezing mechanism ocfs2: Convert to new freezing mechanism xfs: Convert to new freezing code ext4: Convert to new freezing mechanism fs: Protect write paths by sb_start_write - sb_end_write fs: Skip atime update on frozen filesystem fs: Add freezing handling to mnt_want_write() / mnt_drop_write() fs: Improve filesystem freezing handling switch the protection of percpu_counter list to spinlock nfsd: Push mnt_want_write() outside of i_mutex btrfs: Push mnt_want_write() outside of i_mutex fat: Push mnt_want_write() outside of i_mutex ...
2012-07-30btrfs: Push mnt_want_write() outside of i_mutexJan Kara
When mnt_want_write() starts to handle freezing it will get a full lock semantics requiring proper lock ordering. So push mnt_want_write() call consistently outside of i_mutex. CC: Chris Mason <chris.mason@oracle.com> CC: linux-btrfs@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-25Merge branch 'send-v2' of git://github.com/ablock84/linux-btrfs into for-linusChris Mason
This is the kernel portion of btrfs send/receive Conflicts: fs/btrfs/Makefile fs/btrfs/backref.h fs/btrfs/ctree.c fs/btrfs/ioctl.c fs/btrfs/ioctl.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receiveAlexander Block
This patch introduces the BTRFS_IOC_SEND ioctl that is required for send. It allows btrfs-progs to implement full and incremental sends. Patches for btrfs-progs will follow. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
2012-07-25Btrfs: introduce subvol uuids and timesAlexander Block
This patch introduces uuids for subvolumes. Each subvolume has it's own uuid. In case it was snapshotted, it also contains parent_uuid. In case it was received, it also contains received_uuid. It also introduces subvolume ctime/otime/stime/rtime. The first two are comparable to the times found in inodes. otime is the origin/creation time and ctime is the change time. stime/rtime are only valid on received subvolumes. stime is the time of the subvolume when it was sent. rtime is the time of the subvolume when it was received. Additionally to the times, we have a transid for each time. They are updated at the same place as the times. btrfs receive uses stransid and rtransid to find out if a received subvolume changed in the meantime. If an older kernel mounts a filesystem with the extented fields, all fields become invalid. The next mount with a new kernel will detect this and reset the fields. Signed-off-by: Alexander Block <ablock84@googlemail.com> Reviewed-by: David Sterba <dave@jikos.cz> Reviewed-by: Arne Jansen <sensille@gmx.net> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
2012-07-25Btrfs: Check INCOMPAT flags on remount and add helper functionMitch Harder
In support of the recently added capability to remount with lzo compression, provide a helper function to check the compression INCOMPAT flags when remounting with lzo compression, and set the flags if necessary. Also, implement the new helper function when defragmenting with explicit lzo compression and when setting the default subvolume. Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25Merge branch 'qgroup' of git://git.jan-o-sch.net/btrfs-unstable into for-linusChris Mason
Conflicts: fs/btrfs/ioctl.c fs/btrfs/ioctl.h fs/btrfs/transaction.c fs/btrfs/transaction.h Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25btrfs: allow cross-subvolume file cloneDavid Sterba
Lift the EXDEV condition and allow different root trees for files being cloned, then pass source inode's root when searching for extents. Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from one filesystem are mounted separately. Signed-off-by: David Sterba <dsterba@suse.cz>
2012-07-23Btrfs: do not set subvolume flags in readonly modeLiu Bo
$ mkfs.btrfs /dev/sdb7 $ btrfstune -S1 /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs mount: block device /dev/sdb7 is write-protected, mounting read-only $ btrfs dev add /dev/sdb8 /mnt/btrfs/ Now we get a btrfs in which mnt flags has readonly but sb flags does not. So for those ioctls that only check sb flags with MS_RDONLY, it is going to be a problem. Setting subvolume flags is such an ioctl, we should use mnt_want_write_file() to check RO flags. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-07-23Btrfs: use mnt_want_write_file instead of mnt_want_writeLiu Bo
mnt_want_write_file is faster when file has been opened for write. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-07-23Btrfs: remove redundant r/o check for superblockLiu Bo
mnt_want_write() and mnt_want_write_file() will check sb->s_flags with MS_RDONLY, and we don't need to do it ourselves. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-07-23Btrfs: check write access to mount earlier while creating snapshotsLiu Bo
Move check of write access to mount into upper functions so that we can use mnt_want_write_file instead, which is faster than mnt_want_write. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-07-23btrfs: join DEV_STATS ioctls to oneDavid Sterba
Commit c11d2c236cc260b36 (Btrfs: add ioctl to get and reset the device stats) introduced two ioctls doing almost the same thing distinguished by just the ioctl number which encodes "do reset after read". I have suggested http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg16604.html to implement it via the ioctl args. This hasn't happen, and I think we should use a more clean way to pass flags and should not waste ioctl numbers. CC: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: David Sterba <dsterba@suse.cz>
2012-07-23btrfs: ignore unfragmented file checks in defrag when compression enabled - ↵Andrew Mahone
rebased Rebased on btrfs-next and retested. Inform should_defrag_range if BTRFS_DEFRAG_RANGE_COMPRESS is set. If so, skip checks for adjacent extents and extent size when deciding whether to defrag, as these can prevent an uncompressed and unfragmented file from being compressed as requested. Signed-off-by: Andrew Mahone <andrew.mahone@gmail.com>
2012-07-22btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-12Btrfs: add qgroup inheritanceArne Jansen
When creating a subvolume or snapshot, it is necessary to initialize the qgroup account with a copy of some other (tracking) qgroup. This patch adds parameters to the ioctls to pass the information from which qgroup to inherit. Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-12Btrfs: add qgroup ioctlsArne Jansen
Ioctls to control the qgroup feature like adding and removing qgroups and assigning qgroups. Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-06-16Btrfs: cast devid to unsigned long long for printk %lluChris Mason
Avoid warning in 32 bit machines Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15Btrfs: do not resize a seeding deviceLiu Bo
Seeding devices are not supposed to change any more. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-06-15Btrfs: fix defrag regressionLi Zefan
If a file has 3 small extents: | ext1 | ext2 | ext3 | Running "btrfs fi defrag" will only defrag the last two extents, if those extent mappings hasn't been read into memory from disk. This bug was introduced by commit 17ce6ef8d731af5edac8c39e806db4c7e1f6956f ("Btrfs: add a check to decide if we should defrag the range") The cause is, that commit looked into previous and next extents using lookup_extent_mapping() only. While at it, remove the code that checks the previous extent, since it's sufficient to check the next extent. Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-06-15Btrfs: use rcu to protect device->nameJosef Bacik
Al pointed out that we can just toss out the old name on a device and add a new one arbitrarily, so anybody who uses device->name in printk could possibly use free'd memory. Instead of adding locking around all of this he suggested doing it with RCU, so I've introduced a struct rcu_string that does just that and have gone through and protected all accesses to device->name that aren't under the uuid_mutex with rcu_read_lock(). This protects us and I will use it for dealing with removing the device that we used to mount the file system in a later patch. Thanks, Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-31Merge branch 'for-chris' of git://git.jan-o-sch.net/btrfs-unstable into ↵Chris Mason
for-linus Conflicts: fs/btrfs/ulist.h Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-30Btrfs: add ioctl to get and reset the device statsStefan Behrens
An ioctl interface is added to get the device statistic counters. A second ioctl is added to atomically get and reset these counters. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-30Btrfs: do not do balance in readonly modeLiu Bo
In normal cases, we would not be allowed to do balance in RO mode. However, when we're using a seeding device and adding another device to sprout, things will change: $ mkfs.btrfs /dev/sdb7 $ btrfstune -S 1 /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs -o ro $ btrfs fi bal /mnt/btrfs -----------------------> fail. $ btrfs dev add /dev/sdb8 /mnt/btrfs $ btrfs fi bal /mnt/btrfs -----------------------> works! It should not be designed as an exception, and we'd better add another check for mnt flags. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com>
2012-05-30Btrfs: NUL-terminate path buffer in DEV_INFO ioctl resultJim Meyering
A device with name of length BTRFS_DEVICE_PATH_NAME_MAX or longer would not be NUL-terminated in the DEV_INFO ioctl result buffer. Signed-off-by: Jim Meyering <meyering@redhat.com>
2012-05-30Fix minor type issuesDaniel J Blueman
Address some minor type issues identified by sparse checker. Signed-off-by: Daniel J Blueman <daniel@quora.org>
2012-05-30Btrfs: use i_version instead of our own sequenceJosef Bacik
We've been keeping around the inode sequence number in hopes that somebody would use it, but nobody uses it and people actually use i_version which serves the same purpose, so use i_version where we used the incore inode's sequence number and that way the sequence is updated properly across the board, and not just in file write. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-26Btrfs: don't set for_cow parameter for tree block functionsJan Schmidt
Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed parameter for_cow = 1. In fact, these two functions should never mark their tree modification operations as for_cow, because they can change the number of blocks referenced by a tree. Hence, we remove the extra for_cow parameter from these functions and make them pass a zero down. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-04-18Btrfs: fix btrfs_ioctl_dev_info() crash on missing deviceStefan Behrens
When a filesystem is mounted with the degraded option, it is possible that some of the devices are not there. btrfs_ioctl_dev_info() crashs in this case because the device name is a NULL pointer. This ioctl was only used for scrub. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-03-29Btrfs: update to the right index of defragmentLiu Bo
When we use autodefrag, we forget to update the index which indicates the last page we've dirty. And we'll set dirty flags on a same set of pages again and again. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29Btrfs: do not bother to defrag an extent if it is a big real extentLiu Bo
$ mkfs.btrfs /dev/sdb7 $ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag $ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null $ filefrag -v /mnt/btrfs/foobar Filesystem type is: 9123683e File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3072 10 eof /mnt/btrfs/foobar: 1 extent found Now we have a big real extent [0, 40960), but autodefrag will still defrag it. $ sync $ filefrag -v /mnt/btrfs/foobar Filesystem type is: 9123683e File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096) ext logical physical expected length flags 0 0 3082 10 eof /mnt/btrfs/foobar: 1 extent found So if we already find a big real extent, we're ok about that, just skip it. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29Btrfs: add a check to decide if we should defrag the rangeLiu Bo
If our file's layout is as follows: | hole | data1 | hole | data2 | we do not need to defrag this file, because this file has holes and cannot be merged into one extent. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29Btrfs: fix the mismatch of page->mappingLiu Bo
commit 600a45e1d5e376f679ff9ecc4ce9452710a6d27c (Btrfs: fix deadlock on page lock when doing auto-defragment) fixes the deadlock on page, but it also introduces another bug. A page may have been truncated after unlock & lock. So we need to find it again to get the right one. And since we've held i_mutex lock, inode size remains unchanged and we can drop isize overflow checks. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29Btrfs: fix race between direct io and autodefragLiu Bo
The bug is from running xfstests 209 with autodefrag. The race is as follows: t1 t2(autodefrag) direct IO invalidate pagecache dio(old data) add_inode_defrag invalidate pagecache endio direct IO invalidate pagecache run_defrag readpage(old data) set page dirty (old data) dio(new data, rewrite) invalidate pagecache (*) endio t2(autodefrag) will get old data into pagecache via readpage and set pagecache dirty. Meanwhile, invalidate pagecache(*) will fail due to dirty flags in pages. So the old data may be flushed into disk by flush thread, which will lead to data loss. And so does the case of user defragment progs. The patch fixes this race by holding i_mutex when we readpage and set page dirty. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-29Merge git://git.jan-o-sch.net/btrfs-unstable into for-linusChris Mason
Conflicts: fs/btrfs/transaction.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-03-27Btrfs: fix regression in scrub path resolvingJan Schmidt
In commit 4692cf58 we introduced new backref walking code for btrfs. This assumes we're searching live roots, which requires a transaction context. While scrubbing, however, we must not join a transaction because this could deadlock with the commit path. Additionally, what scrub really wants to do is resolving a logical address in the commit root it's currently checking. This patch adds support for logical to path resolving on commit roots and makes scrub use that. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-03-22btrfs: replace many BUG_ONs with proper error handlingJeff Mahoney
btrfs currently handles most errors with BUG_ON. This patch is a work-in- progress but aims to handle most errors other than internal logic errors and ENOMEM more gracefully. This iteration prevents most crashes but can run into lockups with the page lock on occasion when the timing "works out." Signed-off-by: Jeff Mahoney <jeffm@suse.com>
2012-03-22btrfs: Don't BUG_ON errors from btrfs_create_subvol_root()Mark Fasheh
This is called from only one place - create_subvol() which passes errors safely back out to it's caller, btrfs_mksubvol where they are handled. Additionally, btrfs_create_subvol_root() itself bug's needlessly from error return of btrfs_update_inode(). Since create_subvol() was fixed to catch errors we can bubble this one up too. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2012-03-22btrfs: drop gfp_t from lock_extentJeff Mahoney
lock_extent and unlock_extent are always called with GFP_NOFS, drop the argument and use GFP_NOFS consistently. Signed-off-by: Jeff Mahoney <jeffm@suse.com>
2012-02-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Quoth Chris: "This is later than I wanted because I got backed up running through btrfs bugs from the Oracle QA teams. But they are all bug fixes that we've queued and tested since rc1. Nothing in particular stands out, this just reflects bug fixing and QA done in parallel by all the btrfs developers. The most user visible of these is: Btrfs: clear the extent uptodate bits during parent transid failures Because that helps deal with out of date drives (say an iscsi disk that has gone away and come back). The old code wasn't always properly retrying the other mirror for this type of failure." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits) Btrfs: fix compiler warnings on 32 bit systems Btrfs: increase the global block reserve estimates Btrfs: clear the extent uptodate bits during parent transid failures Btrfs: add extra sanity checks on the path names in btrfs_mksubvol Btrfs: make sure we update latest_bdev Btrfs: improve error handling for btrfs_insert_dir_item callers Btrfs: be less strict on finding next node in clear_extent_bit Btrfs: fix a bug on overcommit stuff Btrfs: kick out redundant stuff in convert_extent_bit Btrfs: skip states when they does not contain bits to clear Btrfs: check return value of lookup_extent_mapping() correctly Btrfs: fix deadlock on page lock when doing auto-defragment Btrfs: fix return value check of extent_io_ops btrfs: honor umask when creating subvol root btrfs: silence warning in raid array setup btrfs: fix structs where bitfields and spinlock/atomic share 8B word btrfs: delalloc for page dirtied out-of-band in fixup worker Btrfs: fix memory leak in load_free_space_cache() btrfs: don't check DUP chunks twice Btrfs: fix trim 0 bytes after a device delete ...
2012-02-23Btrfs: add extra sanity checks on the path names in btrfs_mksubvolChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>