summaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)Author
2012-05-30Btrfs: use i_version instead of our own sequenceJosef Bacik
We've been keeping around the inode sequence number in hopes that somebody would use it, but nobody uses it and people actually use i_version which serves the same purpose, so use i_version where we used the incore inode's sequence number and that way the sequence is updated properly across the board, and not just in file write. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-11Btrfs: cleanup: use consistent lock namingDan Carpenter
It confuses Smatch that we use two names for the same lock. Plus the shorter name is nicer. This doesn't change how the code works, it's just a cleanup. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-05-11Btrfs: change integrity checker to support big blocksStefan Behrens
The integrity checker used to be coded for nodesize == leafsize == sectorsize == PAGE_CACHE_SIZE. This is now changed to support sizes for nodesize and leafsize which are N * PAGE_CACHE_SIZE. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-05-11Btrfs: remove the useless assignment to *entry in function tree_insert of ↵Wang Sheng-Hui
file extent_io.c In tree_insert, var *entry is used in the loop only, and is useless out of the loop. Remove the useless assignment after the loop. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
2012-05-11Btrfs: fix the comment for find_first_extent_bitWang Sheng-Hui
The return value of find_first_extent_bit is 1 or 0, no < 0. And if found something, return 0; if nothing was found, return 1. Fix the comment. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
2012-05-11Btrfs: fix btrfs_release_extent_buffer_page with the right usage of ↵Wang Sheng-Hui
num_extent_pages num_extent_pages returns the number of pages in the specific range, not the index of the last page in the eb range. btrfs_release_extent_buffer_page is called with start_idx set 0 in current codes, so it's not a problem yet. But the logic is indeed wrong. Fix it here. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
2012-05-11Btrfs: cleanup the comment for clear_state_bit in extent_io.cWang Sheng-Hui
No 'delete' arg is used for clear_state_bit. Cleanup the comment. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
2012-05-11btrfs/ctree.c: remove the unnecessary 'return -1;' at the end of bin_searchWang Sheng-Hui
The code path should not reach there. Remove it. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
2012-05-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "The big ones here are a memory leak we introduced in rc1, and a scheduling while atomic if the transid on disk doesn't match the transid we expected. This happens for corrupt blocks, or out of date disks. It also fixes up the ioctl definition for our ioctl to resolve logical inode numbers. The __u32 was a merging error and doesn't match what we ship in the progs." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: avoid sleeping in verify_parent_transid while atomic Btrfs: fix crash in scrub repair code when device is missing btrfs: Fix mismatching struct members in ioctl.h Btrfs: fix page leak when allocing extent buffers Btrfs: Add properly locking around add_root_to_dirty_list
2012-05-06Btrfs: avoid sleeping in verify_parent_transid while atomicChris Mason
verify_parent_transid needs to lock the extent range to make sure no IO is underway, and so it can safely clear the uptodate bits if our checks fail. But, a few callers are using it with spinlocks held. Most of the time, the generation numbers are going to match, and we don't want to switch to a blocking lock just for the error case. This adds an atomic flag to verify_parent_transid, and changes it to return EAGAIN if it needs to block to properly verifiy things. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04Btrfs: fix crash in scrub repair code when device is missingStefan Behrens
Fix that when scrub tries to repair an I/O or checksum error and one of the devices containing the mirror is missing, it crashes in bio_add_page because the bdev is a NULL pointer for missing devices. Reported-by: Marco L. Crociani <marco.crociani@gmail.com> Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04btrfs: Fix mismatching struct members in ioctl.hAlexander Block
Fix the size members of btrfs_ioctl_ino_path_args and btrfs_ioctl_logical_ino_args. The user space btrfs-progs utilities used __u64 and the kernel headers used __u32 before. Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04Btrfs: fix page leak when allocing extent buffersJosef Bacik
If we happen to alloc a extent buffer and then alloc a page and notice that page is already attached to an extent buffer, we will only unlock it and free our existing eb. Any pages currently attached to that eb will be properly freed, but we don't do the page_cache_release() on the page where we noticed the other extent buffer which can cause us to leak pages and I hope cause the weird issues we've been seeing in this area. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-05-04Btrfs: Add properly locking around add_root_to_dirty_listChris Mason
add_root_to_dirty_list happens once at the very beginning of the transaction, but it is still racey. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This has our collection of bug fixes. I missed the last rc because I thought our patches were making NFS crash during my xfs test runs. Turns out it was an NFS client bug fixed by someone else while I tried to bisect it. All of these fixes are small, but some are fairly high impact. The biggest are fixes for our mount -o remount handling, a deadlock due to GFP_KERNEL allocations in readdir, and a RAID10 error handling bug. This was tested against both 3.3 and Linus' master as of this morning." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (26 commits) Btrfs: reduce lock contention during extent insertion Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdir Btrfs: Fix space checking during fs resize Btrfs: fix block_rsv and space_info lock ordering Btrfs: Prevent root_list corruption Btrfs: fix repair code for RAID10 Btrfs: do not start delalloc inodes during sync Btrfs: fix that check_int_data mount option was ignored Btrfs: don't count CRC or header errors twice while scrubbing Btrfs: fix btrfs_ioctl_dev_info() crash on missing device btrfs: don't return EINTR Btrfs: double unlock bug in error handling Btrfs: always store the mirror we read the eb from fs/btrfs/volumes.c: add missing free_fs_devices btrfs: fix early abort in 'remount' Btrfs: fix max chunk size check in chunk allocator Btrfs: add missing read locks in backref.c Btrfs: don't call free_extent_buffer twice in iterate_irefs Btrfs: Make free_ipath() deal gracefully with NULL pointers Btrfs: avoid possible use-after-free in clear_extent_bit() ...
2012-04-27Btrfs: reduce lock contention during extent insertionChris Mason
We're spending huge amounts of time on lock contention during end_io processing because we unconditionally assume we are overwriting an existing extent in the file for each IO. This checks to see if we are outside i_size, and if so, it uses a less expensive readonly search of the btree to look for existing extents. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdirChris Mason
Btrfs has an optimization where it will preallocate dentries during readdir to fill in enough information to open the inode without an extra lookup. But, we're calling d_alloc, which is doing GFP_KERNEL allocations, and that leads to deadlocks because our readdir code has tree locks held. For now, disable this optimization. We'll fix the gfp mask in the next merge window. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: Fix space checking during fs resizeDaniel J Blueman
Fix out-of-space checking, addressing a warning and potential resource leak when resizing the filesystem down while allocating blocks. Signed-off-by: Daniel J Blueman <daniel@quora.org> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: fix block_rsv and space_info lock orderingStefan Behrens
may_commit_transaction() calls spin_lock(&space_info->lock); spin_lock(&delayed_rsv->lock); and update_global_block_rsv() calls spin_lock(&block_rsv->lock); spin_lock(&sinfo->lock); Lockdep complains about this at run time. Everywhere except in update_global_block_rsv(), the space_info lock is the outer lock, therefore the locking order in update_global_block_rsv() is changed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: Prevent root_list corruptionDaniel J Blueman
I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add correct locking to address this. Signed-off-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: fix repair code for RAID10Jan Schmidt
btrfs_map_block sets mirror_num, so that the repair code knows eventually which device gave us the read error. For RAID10, mirror_num must be 1 or 2. Before this fix mirror_num was incorrectly related to our stripe index. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-27Btrfs: do not start delalloc inodes during syncJosef Bacik
btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and start writing them out, but it doesn't splice the list or anything so as long as somebody is doing work on the box you could end up in this section _forever_. So just remove it, it's not needed anyway since sync will start writeback on all inodes anyway, all we need to do is wait for ordered extents and then we can commit the transaction. In my horrible torture test sync goes from taking 4 minutes to about 1.5 minutes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-18Btrfs: fix that check_int_data mount option was ignoredStefan Behrens
The bitfield member mount_opt was too small by one bit to hold the mount option that enabled to include data extents in the integrity checker. Since the same issue happened when the BTRFS_MOUNT_PANIC_ON_FATAL_ERROR option was added (git rebase silently merges so that the increase of the size of the bitfield member is lost), the bit limit was removed entirely. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-04-18Btrfs: don't count CRC or header errors twice while scrubbingStefan Behrens
Each CRC or header error was counted twice, this is now fixed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-04-18Btrfs: fix btrfs_ioctl_dev_info() crash on missing deviceStefan Behrens
When a filesystem is mounted with the degraded option, it is possible that some of the devices are not there. btrfs_ioctl_dev_info() crashs in this case because the device name is a NULL pointer. This ioctl was only used for scrub. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-04-18btrfs: don't return EINTRArne Jansen
It is basically a good thing if we are interruptible when waiting for free space, but the generality in which it is implemented currently leads to system calls being interruptible that are not documented this way. For example git can't handle interrupted unlink(), leading to corrupt repos under space pressure. Instead we raise the bar to only be interruptible by SIGKILL. Thanks to David Sterba for suggesting this. Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-04-18Btrfs: double unlock bug in error handlingDan Carpenter
The caller expects this function to return with the lock held and releases it immediately on error. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-04-18Btrfs: always store the mirror we read the eb fromJosef Bacik
A user reported a panic where we were trying to fix a bad mirror but the mirror number we were giving was 0, which is invalid. This is because we don't do the transid verification until after the read, so as far as the read code is concerned the read was a success. So instead store the mirror we read from so that if there is some failure post read we know which mirror to try next and which mirror needs to be fixed if we find a good copy of the block. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2012-04-18fs/btrfs/volumes.c: add missing free_fs_devicesJulia Lawall
Free fs_devices as done in the error-handling code just below. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
2012-04-18btrfs: fix early abort in 'remount'Sergei Trofimovich
Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Josef Bacik <josef@redhat.com> Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2012-04-18Btrfs: fix max chunk size check in chunk allocatorIlya Dryomov
Fix a bug, where in case we need to adjust stripe_size so that the length of the resulting chunk is less than or equal to max_chunk_size, DUP chunks turn out to be only half as big as they could be. Cc: Arne Jansen <sensille@gmx.net> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-04-18Btrfs: add missing read locks in backref.cJan Schmidt
iref_to_path and iterate_irefs both increment the eb's refcount to use it after releasing the path. Both depend on consistent data remaining in the extent buffer and need a read lock to protect it. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-04-18Btrfs: don't call free_extent_buffer twice in iterate_irefsJan Schmidt
Avoid calling free_extent_buffer more than once when the iterator function returns non-zero. The only code that uses this is scrub repair for corrupted nodatasum blocks. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-04-18Btrfs: Make free_ipath() deal gracefully with NULL pointersJesper Juhl
Make free_ipath() behave like most other freeing functions in the kernel and gracefully do nothing when passed a NULL pointer. Besides this making the bahaviour consistent with functions such as kfree(), vfree(), btrfs_free_path() etc etc, it also fixes a real NULL deref issue in fs/btrfs/ioctl.c::btrfs_ioctl_ino_to_path(). In that function we have this code: ... ipath = init_ipath(size, root, path); if (IS_ERR(ipath)) { ret = PTR_ERR(ipath); ipath = NULL; goto out; } ... out: btrfs_free_path(path); free_ipath(ipath); ... If we ever take the true branch of that 'if' statement we'll end up passing a NULL pointer to free_ipath() which will subsequently dereference it and we'll go "Boom" :-( This patch will avoid that. Signed-off-by: Jesper Juhl <jj@chaosbits.net>
2012-04-18Btrfs: avoid possible use-after-free in clear_extent_bit()Li Zefan
clear_extent_bit() { next_node = rb_next(&state->rb_node); ... clear_state_bit(state); <-- this may free next_node if (next_node) { state = rb_entry(next_node); ... } } clear_state_bit() calls merge_state() which may free the next node of the passing extent_state, so clear_extent_bit() may end up referencing freed memory. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-18Btrfs: retrurn void from clear_state_bitLi Zefan
Currently it returns a set of bits that were cleared, but this return value is not used at all. Moreover it doesn't seem to be useful, because we may clear the bits of a few extent_states, but only the cleared bits of last one is returned. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-18btrfs: add missing unlocks to transaction abort pathsDavid Sterba
Added in commit 49b25e0540904be0bf558b84475c69d72e4de66e ("btrfs: enhance transaction abort infrastructure") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
2012-04-18Btrfs: do not mount when we have a sectorsize unequal to PAGE_SIZELiu Bo
Our code is not ready to cope with a sectorsize that's not equal to PAGE_SIZE. It will lead to hanging-on while writing something. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
2012-04-18btrfs: don't add both copies of DUP to reada extent treeArne Jansen
Normally when there are 2 copies of a block, we add both to the reada extent tree and prefetch only the one that is easier to reach. This way we can better utilize multiple devices. In case of DUP this makes no sense as both copies reside on the same device. Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-04-18btrfs: fix race in readaArne Jansen
When inserting into the radix tree returns EEXIST, get the existing entry without giving up the spinlock in between. There was a race for both the zones trees and the extent tree. Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-04-18Btrfs: avoid setting ->d_op twiceLi Zefan
Follow those instructions, and you'll trigger a warning in the beginning of d_set_d_op(): # mkfs.btrfs /dev/loop3 # mount /dev/loop3 /mnt # btrfs sub create /mnt/sub # btrfs sub snap /mnt /mnt/snap # touch /mnt/snap/sub touch: cannot touch `tmp': Permission denied __d_alloc() set d_op to sb->s_d_op (btrfs_dentry_operations), and then simple_lookup() reset it to simple_dentry_operations, which triggered the warning. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A bunch of endianness fixes and a couple of nfsd error value fixes. Speaking of endianness stuff, I'm rather tempted to slap ccflags-y += -D__CHECK_ENDIAN__ in fs/Makefile, if not making it default for the entire tree; nfsd regressions I've caught make one hell of a pile and we'd obviously benefit from having that kind of stuff caught earlier..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: lockd: fix the endianness bug ocfs2: ->e_leaf_clusters endianness breakage ocfs2: ->rl_count endianness breakage ocfs: ->rl_used breakage on big-endian ocfs2: ->l_next_free_req breakage on big-endian btrfs: btrfs_root_readonly() broken on big-endian ext4: fix endianness breakage in ext4_split_extent_at() nfsd: fix compose_entry_fh() failure exits nfsd: fix error value on allocation failure in nfsd4_decode_test_stateid() nfsd: fix endianness breakage in TEST_STATEID handling nfsd: fix error values returned by nfsd4_lockt() when nfsd_open() fails nfsd: fix b0rken error value for setattr on read-only mount
2012-04-14Merge branch 'for-linus-min' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull the minimal btrfs branch from Chris Mason: "We have a use-after-free in there, along with errors when mount -o discard is enabled, and a BUG_ON(we should compile with UP more often)." * 'for-linus-min' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: use commit root when loading free space cache Btrfs: fix use-after-free in __btrfs_end_transaction Btrfs: check return value of bio_alloc() properly Btrfs: remove lock assert from get_restripe_target() Btrfs: fix eof while discarding extents Btrfs: fix uninit variable in repair_eb_io_failure Revert "Btrfs: increase the global block reserve estimates"
2012-04-13btrfs: btrfs_root_readonly() broken on big-endianAl Viro
->root_flags is __le64 and all accesses to it go through the helpers that do proper conversions. Except for btrfs_root_readonly(), which checks bit 0 as in host-endian... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13Btrfs: use commit root when loading free space cacheJosef Bacik
A user reported that booting his box up with btrfs root on 3.4 was way slower than on 3.3 because I removed the ideal caching code. It turns out that we don't load the free space cache if we're in a commit for deadlock reasons, but since we're reading the cache and it hasn't changed yet we are safe reading the inode and free space item from the commit root, so do that and remove all of the deadlock checks so we don't unnecessarily skip loading the free space cache. The user reported this fixed the slowness. Thanks, Tested-by: Calvin Walton <calvin.walton@kepstin.ca> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12Btrfs: fix use-after-free in __btrfs_end_transactionDave Jones
49b25e0540904be0bf558b84475c69d72e4de66e introduced a use-after-free bug that caused spurious -EIO's to be returned. Do the check before we free the transaction. Cc: David Sterba <dsterba@suse.cz> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12Btrfs: check return value of bio_alloc() properlyTsutomu Itoh
bio_alloc() has the possibility of returning NULL. So, it is necessary to check the return value. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12Btrfs: remove lock assert from get_restripe_target()Ilya Dryomov
This fixes a regression introduced by fc67c450. spin_is_locked() always returns 0 on UP kernels, which caused assert in get_restripe_target() to be fired on every call from btrfs_reduce_alloc_profile() on UP systems. Remove it completely for now, it's not clear if it's going to be needed in future. Reported-by: Bobby Powers <bobbypowers@gmail.com> Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org> Tested-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12Btrfs: fix eof while discarding extentsLiu Bo
We miscalculate the length of extents we're discarding, and it leads to an eof of device. Reported-by: Daniel Blueman <daniel@quora.org> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-04-12Btrfs: fix uninit variable in repair_eb_io_failureChris Mason
We'd have to be passing bogus extent buffers for this uninit variable to actually be used, but set it to zero just in case. Signed-off-by: Chris Mason <chris.mason@oracle.com>