summaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)Author
2013-03-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixup from Chris Mason: "Geert and James both sent this one in, sorry guys" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs/raid56: Add missing #include <linux/vmalloc.h>
2013-03-03btrfs/raid56: Add missing #include <linux/vmalloc.h>Geert Uytterhoeven
tilegx_defconfig: fs/btrfs/raid56.c: In function 'btrfs_alloc_stripe_hash_table': fs/btrfs/raid56.c:206:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration] fs/btrfs/raid56.c:206:9: warning: assignment makes pointer from integer without a cast [enabled by default] fs/btrfs/raid56.c:226:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration] Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs update from Chris Mason: "The biggest feature in the pull is the new (and still experimental) raid56 code that David Woodhouse started long ago. I'm still working on the parity logging setup that will avoid inconsistent parity after a crash, so this is only for testing right now. But, I'd really like to get it out to a broader audience to hammer out any performance issues or other problems. scrub does not yet correct errors on raid5/6 either. Josef has another pass at fsync performance. The big change here is to combine waiting for metadata with waiting for data, which is a big latency win. It is also step one toward using atomics from the hardware during a commit. Mark Fasheh has a new way to use btrfs send/receive to send only the metadata changes. SUSE is using this to make snapper more efficient at finding changes between snapshosts. Snapshot-aware defrag is also included. Otherwise we have a large number of fixes and cleanups. Eric Sandeen wins the award for removing the most lines, and I'm hoping we steal this idea from XFS over and over again." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits) btrfs: fixup/remove module.h usage as required Btrfs: delete inline extents when we find them during logging btrfs: try harder to allocate raid56 stripe cache Btrfs: cleanup to make the function btrfs_delalloc_reserve_metadata more logic Btrfs: don't call btrfs_qgroup_free if just btrfs_qgroup_reserve fails Btrfs: remove reduplicate check about root in the function btrfs_clean_quota_tree Btrfs: return ENOMEM rather than use BUG_ON when btrfs_alloc_path fails Btrfs: fix missing deleted items in btrfs_clean_quota_tree btrfs: use only inline_pages from extent buffer Btrfs: fix wrong reserved space when deleting a snapshot/subvolume Btrfs: fix wrong reserved space in qgroup during snap/subv creation Btrfs: remove unnecessary dget_parent/dput when creating the pending snapshot btrfs: remove a printk from scan_one_device Btrfs: fix NULL pointer after aborting a transaction Btrfs: fix memory leak of log roots Btrfs: copy everything if we've created an inline extent btrfs: cleanup for open-coded alignment Btrfs: do not change inode flags in rename Btrfs: use reserved space for creating a snapshot clear chunk_alloc flag on retryable failure ...
2013-03-01btrfs: fixup/remove module.h usage as requiredPaul Gortmaker
We want to avoid module.h where posible, since it in turn includes nearly all of header space. This means removing it where it is not required, and using export.h where we are only exporting symbols via EXPORT_SYMBOL and friends. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-01Btrfs: delete inline extents when we find them during loggingJosef Bacik
Apparently when we do inline extents we allow the data to overlap the last chunk of the btrfs_file_extent_item, which means that we can possibly have a btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item. This messes with us when we try to overwrite the extent when logging new extents since we expect for it to be the right size. To fix this just delete the item and try to do the insert again which will give us the proper sized btrfs_file_extent_item. This fixes a panic where map_private_extent_buffer would blow up because we're trying to write past the end of the leaf. Thanks, Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01btrfs: try harder to allocate raid56 stripe cacheDavid Sterba
The stripe hash table is large, starting with allocation order 4 and can go as high as order 7 in case lock debugging is turned on and structure padding happens. Observed mount failure: mount: page allocation failure: order:7, mode:0x200050 Pid: 8234, comm: mount Tainted: G W 3.8.0-default+ #267 Call Trace: [<ffffffff81114353>] warn_alloc_failed+0xf3/0x140 [<ffffffff811171d2>] ? __alloc_pages_direct_compact+0x92/0x250 [<ffffffff81117ac3>] __alloc_pages_nodemask+0x733/0x9d0 [<ffffffff81152878>] ? cache_alloc_refill+0x3f8/0x840 [<ffffffff811528bc>] cache_alloc_refill+0x43c/0x840 [<ffffffff811302eb>] ? is_kernel_percpu_address+0x4b/0x90 [<ffffffffa00a00ac>] ? btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs] [<ffffffff811531d7>] kmem_cache_alloc_trace+0x247/0x270 [<ffffffffa00a00ac>] btrfs_alloc_stripe_hash_table+0x5c/0x130 [btrfs] [<ffffffffa003133f>] open_ctree+0xb2f/0x1f90 [btrfs] [<ffffffff81397289>] ? string+0x49/0xe0 [<ffffffff813987b3>] ? vsnprintf+0x443/0x5d0 [<ffffffffa0007cb6>] btrfs_mount+0x526/0x600 [btrfs] [<ffffffff8115127c>] ? cache_alloc_debugcheck_after+0x4c/0x200 [<ffffffff81162b90>] mount_fs+0x20/0xe0 [<ffffffff8117db26>] vfs_kern_mount+0x76/0x120 [<ffffffff811801b6>] do_mount+0x386/0x980 [<ffffffff8112a5cb>] ? strndup_user+0x5b/0x80 [<ffffffff81180840>] sys_mount+0x90/0xe0 [<ffffffff81962e99>] system_call_fastpath+0x16/0x1b Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01Btrfs: cleanup to make the function btrfs_delalloc_reserve_metadata more logicWang Shilong
The original code is a little confusing and not clear, The right way to deal with the kernel code like this: [...] if (ret) goto out; [...] So i move the common clean_up code to the place labeled with out_fail, this will be easier to maintain. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01Btrfs: don't call btrfs_qgroup_free if just btrfs_qgroup_reserve failsWang Shilong
commit eb6b88d92c6df083dd09a8c471011e3788dfd7c6 leads into another bug. If it is just because qgroup_reserve fails, the function btrfs_qgroup_free should not be called, otherwise, it will cause the wrong quota accounting. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01Btrfs: remove reduplicate check about root in the function ↵Wang Shilong
btrfs_clean_quota_tree The check work has been done just before the function btrfs_clean_quota_tree is called, it is not necessary to check it again, remove it. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01Btrfs: return ENOMEM rather than use BUG_ON when btrfs_alloc_path failsWang Shilong
Return ENOMEM rather trigger BUG_ON, fix it. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-01Btrfs: fix missing deleted items in btrfs_clean_quota_treeWang Shilong
Steps to reproduce: i=0 ncases=100 mkfs.btrfs <disk> mount <disk> <mnt> btrfs quota enable <mnt> btrfs qgroup create 2/1 <mnt> while [ $i -le $ncases ] do btrfs qgroup create 1/$i <mnt> btrfs qgroup assign 1/$i 2/1 <mnt> i=$(($i+1)) done btrfs quota disable <mnt> umount <mnt> btrfsck <mnt> You can also use the commands: btrfs-debug-tree <disk> | grep QGROUP You will find there are still items existed.The reasons why this happens is because the original code just checks slots[0]==0 and returns. We try to fix it by deleting the leaf one by one. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-28Merge tag 'writeback-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux Pull writeback fixes from Wu Fengguang: "Two writeback fixes - fix negative (setpoint - dirty) in 32bit archs - use down_read_trylock() in writeback_inodes_sb(_nr)_if_idle()" * tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: Negative (setpoint-dirty) in bdi_position_ratio() vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and rename them
2013-02-28btrfs: use only inline_pages from extent bufferDavid Sterba
The nodesize is capped at 64k and there are enough pages preallocated in extent_buffer::inline_pages. The fallback to kmalloc never happened because even on the smallest page size considered (4k) inline_pages covered the needs. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-28Btrfs: fix wrong reserved space when deleting a snapshot/subvolumeMiao Xie
When deleting a snapshot/subvolume, we need remove root ref/backref, dir entries and update the dir inode, so we must reserve free space for those operations. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-28Btrfs: fix wrong reserved space in qgroup during snap/subv creationMiao Xie
There are two problems in the space reservation of the snapshot/ subvolume creation. - don't reserve the space for the root item insertion - the space which is reserved in the qgroup is different with the free space reservation. we need reserve free space for 7 items, but in qgroup reservation, we need reserve space only for 3 items. So we implement new metadata reservation functions for the snapshot/subvolume creation. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-28Btrfs: remove unnecessary dget_parent/dput when creating the pending snapshotMiao Xie
Since we have grabbed the parent inode at the beginning of the snapshot creation, and both sync and async snapshot creation release it after the pending snapshots are actually created, it is safe to access the parent inode directly during the snapshot creation, we needn't use dget_parent/dput to fix the parent dentry and get the dir inode. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-28btrfs: remove a printk from scan_one_deviceDavid Sterba
Dave pointed out that he saw messages from btrfs although there was no such filesystem on his computers. The automatic device scan is called on every new blockdevice if the usual distro udev rule set is used. The printk introduced in 6f60cbd3ae442c was a remainder from copying portions of code from btrfs_get_bdev_and_sb which is used under different conditions and the warning makes sense there. Reported-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-28Btrfs: fix NULL pointer after aborting a transactionLiu Bo
While doing cleanup work on an aborted transaction, we've set the global running transaction pointer to NULL _before_ waiting all other transaction handles to finish, so others'd hit NULL pointer crash when referencing the global running transaction pointer. This first sets a hint to avoid new transaction handle joining, then waits other existing handles to abort or finish so that we can safely set the above global pointer to NULL. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-28Btrfs: fix memory leak of log rootsLiu Bo
When we abort a transaction while fsyncing, we'll skip freeing log roots part of committing a transaction, which leads to memory leak. This adds a 'free log roots' in putting super when no more users hold references on log roots, so it's safe and clean. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-28Btrfs: copy everything if we've created an inline extentJosef Bacik
I noticed while looking into a tree logging bug that we aren't logging inline extents properly. Since this requires copying and it shouldn't happen too often just force us to copy everything for the inode into the tree log when we have an inline extent. With this patch we have valid data after a crash when we write an inline extent. Thanks, Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
2013-02-26btrfs: cleanup for open-coded alignmentQu Wenruo
Though most of the btrfs codes are using ALIGN macro for page alignment, there are still some codes using open-coded alignment like the following: ------ u64 mask = ((u64)root->stripesize - 1); u64 ret = (val + mask) & ~mask; ------ Or even hidden one: ------ num_bytes = (end - start + blocksize) & ~(blocksize - 1); ------ Sometimes these open-coded alignment is not so easy to understand for newbie like me. This commit changes the open-coded alignment to the ALIGN macro for a better readability. Also there is a previous patch from David Sterba with similar changes, but the patch is for 3.2 kernel and seems not merged. http://www.spinics.net/lists/linux-btrfs/msg12747.html Cc: David Sterba <dave@jikos.cz> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26Btrfs: do not change inode flags in renameLiu Bo
Before we forced to change a file's NOCOW and COMPRESS flag due to the parent directory's, but this ends up a bad idea, because it confuses end users a lot about file's NOCOW status, eg. if someone change a file to NOCOW via 'chattr' and then rename it in the current directory which is without NOCOW attribute, the file will lose the NOCOW flag silently. This diables 'change flags in rename', so from now on we'll only inherit flags from the parent directory on creation stage while in other places we can use 'chattr' to set NOCOW or COMPRESS flags. Reported-by: Marios Titas <redneb8888@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26Btrfs: use reserved space for creating a snapshotLiu Bo
While inserting dir index and updating inode for a snapshot, we'd add delayed items which consume trans->block_rsv, if we don't have any space reserved in this trans handle, we either just return or reserve space again. But before creating pending snapshots during committing transaction, we've done a release on this trans handle, so we don't have space reserved in it at this stage. What we're using is block_rsv of pending snapshots which has already reserved well enough space for both inserting dir index and updating inode, so we need to set trans handle to indicate that we have space now. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26clear chunk_alloc flag on retryable failureAlexandre Oliva
I've experienced filesystem freezes with permanent spikes in the active process count for quite a while, particularly on filesystems whose available raw space has already been fully allocated to chunks. While looking into this, I found a pretty obvious error in do_chunk_alloc: it sets space_info->chunk_alloc, but if btrfs_alloc_chunk returns an error other than ENOSPC, it returns leaving that flag set, which causes any other threads waiting for space_info->chunk_alloc to become zero to spin indefinitely. I haven't double-checked that this patch fixes the failure I've observed fully (it's not exactly trivial to trigger), but it surely is a bug and the fix is trivial, so... Please put it in :-) What I saw in that function also happens to explain why in some cases I see filesystems allocate a huge number of chunks that remain unused (leading to the scenario above, of not having more chunks to allocate). It happens for data and metadata, but not necessarily both. I'm guessing some thread sets the force_alloc flag on the corresponding space_info, and then several threads trying to get disk space end up attempting to allocate a new chunk concurrently. All of them will see the force_alloc flag and bump their local copy of force up to the level they see first, and they won't clear it even if another thread succeeds in allocating a chunk, thus clearing the force flag. Then each thread that observed the force flag will, on its turn, force the allocation of a new chunk. And any threads that come in while it does that will see the force flag still set and pick it up, and so on. This sounds like a problem to me, but... what should the correct behavior be? Clear force_flag once we copy it to a local force? Reset force to the incoming value on every loop? Set the flag to our incoming force if we have it at first, clear our local flag, and move it from the space_info when we determined that we are the thread that's going to perform the allocation? btrfs: clear chunk_alloc flag on retryable failure From: Alexandre Oliva <oliva@gnu.org> If btrfs_alloc_chunk fails with e.g. ENOMEM, we exit do_chunk_alloc without clearing chunk_alloc in space_info. As a result, any further calls to do_chunk_alloc on that filesystem will start busy-waiting for chunk_alloc to be cleared, but it never will be. This patch adjusts do_chunk_alloc so that it clears this flag in case of an error. Signed-off-by: Alexandre Oliva <oliva@gnu.org> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26Btrfs: fix backref walking race with tree deletionsJan Schmidt
When a subvolume is removed, we remove the root item from the root tree, while the tree blocks and backrefs remain for a while. When backref walking comes across one of those orphan tree blocks, it can find a backref for a no longer existing root. This is all good, we only must tolerate __resolve_indirect_ref returning an error and continue with the good refs found. Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26Btrfs: make sure NODATACOW also gets NODATASUM setJosef Bacik
A user reported hitting the BUG_ON() in btrfs_finished_ordered_io() where we had csums on a NOCOW extent. This can happen if we have NODATACOW set but not NODATASUM set, which can happen in two cases, either we mount with -o nodatacow and then write into preallocated space, or chattr +C a directory and move a file into that directory. Liu has fixed the move case in a different place, but this fixes the mount -o nodatacow case. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-26fs: encode_fh: return FILEID_INVALID if invalid fid_typeNamjae Jeon
This patch is a follow up on below patch: [PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type commit: 216b6cbdcbd86b1db0754d58886b466ae31f5a63 Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Sage Weil <sage@inktank.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-23new helper: file_inode(file)Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Assorted tiny fixes queued in trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits) DocBook: update EXPORT_SYMBOL entry to point at export.h Documentation: update top level 00-INDEX file with new additions ARM: at91/ide: remove unsused at91-ide Kconfig entry percpu_counter.h: comment code for better readability x86, efi: fix comment typo in head_32.S IB: cxgb3: delay freeing mem untill entirely done with it net: mvneta: remove unneeded version.h include time: x86: report_lost_ticks doesn't exist any more pcmcia: avoid static analysis complaint about use-after-free fs/jfs: Fix typo in comment : 'how may' -> 'how many' of: add missing documentation for of_platform_populate() btrfs: remove unnecessary cur_trans set before goto loop in join_transaction sound: soc: Fix typo in sound/codecs treewide: Fix typo in various drivers btrfs: fix comment typos Update ibmvscsi module name in Kconfig. powerpc: fix typo (utilties -> utilities) of: fix spelling mistake in comment h8300: Fix home page URL in h8300/README xtensa: Fix home page URL in Kconfig ...
2013-02-21Merge tag 'driver-core-3.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg Kroah-Hartman: "Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL Other than those patches, there's not much here, some minor fixes and updates" Fix up trivial conflicts * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits) base: memory: fix soft/hard_offline_page permissions drivercore: Fix ordering between deferred_probe and exiting initcalls backlight: fix class_find_device() arguments TTY: mark tty_get_device call with the proper const values driver-core: constify data for class_find_device() firmware: Ignore abort check when no user-helper is used firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER firmware: Make user-mode helper optional firmware: Refactoring for splitting user-mode helper code Driver core: treat unregistered bus_types as having no devices watchdog: Convert to devm_ioremap_resource() thermal: Convert to devm_ioremap_resource() spi: Convert to devm_ioremap_resource() power: Convert to devm_ioremap_resource() mtd: Convert to devm_ioremap_resource() mmc: Convert to devm_ioremap_resource() mfd: Convert to devm_ioremap_resource() media: Convert to devm_ioremap_resource() iommu: Convert to devm_ioremap_resource() drm: Convert to devm_ioremap_resource() ...
2013-02-21Btrfs: fix remount vs autodefragMiao Xie
If we remount the fs to close the auto defragment or make the fs R/O, we should stop the auto defragment. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-21Btrfs: fix wrong outstanding_extents when doing DIO writeMiao Xie
When running the 083th case of xfstests on the filesystem with "compress-force=lzo", the following WARNINGs were triggered. WARNING: at fs/btrfs/inode.c:7908 WARNING: at fs/btrfs/inode.c:7909 WARNING: at fs/btrfs/inode.c:7911 WARNING: at fs/btrfs/extent-tree.c:4510 WARNING: at fs/btrfs/extent-tree.c:4511 This problem was introduced by the patch "Btrfs: fix deadlock due to unsubmitted". In this patch, there are two bugs which caused the above problem. The 1st one is a off-by-one bug, if the DIO write return 0, it is also a short write, we need release the reserved space for it. But we didn't do it in that patch. Fix it by change "ret > 0" to "ret >= 0". The 2nd one is ->outstanding_extents was increased twice when a short write happened. As we know, ->outstanding_extents is a counter to keep track of the number of extent items we may use duo to delalloc, when we reserve the free space for a delalloc write, we assume that the write will introduce just one extent item, so we increase ->outstanding_extents by 1 at that time. And then we will increase it every time we split the write, it is done at the beginning of btrfs_get_blocks_direct(). So when a short write happens, we needn't increase ->outstanding_extents again. But this patch done. In order to fix the 2nd problem, I re-write the logic for ->outstanding_extents operation. We don't increase it at the beginning of btrfs_get_blocks_direct(), instead, we just increase it when the split actually happens. Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-20Btrfs: snapshot-aware defragLiu Bo
This comes from one of btrfs's project ideas, As we defragment files, we break any sharing from other snapshots. The balancing code will preserve the sharing, and defrag needs to grow this as well. Now we're able to fill the blank with this patch, in which we make full use of backref walking stuff. Here is the basic idea, o set the writeback ranges started by defragment with flag EXTENT_DEFRAG o at endio, after we finish updating fs tree, we use backref walking to find all parents of the ranges and re-link them with the new COWed file layout by adding corresponding backrefs. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-20Btrfs: fix max chunk size on raid5/6Chris Mason
We try to limit the size of a chunk to 10GB, which keeps the unit of work reasonable during balance and resize operations. The limit checks were taking into account the number of copies of the data we had but what they really should be doing is comparing against the logical size of the chunk we're creating. This moves the code around a little to use the count of data stripes from raid5/6. Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-20btrfs: limit fallocate extent reservation to 256MBZach Brown
Very large fallocate requests are cpu bound and result in extents with a repeating pattern of ever decreasing size: $ time fallocate -l 1T file real 0m13.039s ( an excerpt of the extents from btrfs-debug-tree: ) prealloc data disk byte 1536292564992 nr 397312 prealloc data disk byte 1536292962304 nr 196608 prealloc data disk byte 1536293158912 nr 98304 prealloc data disk byte 1536293257216 nr 49152 prealloc data disk byte 1536293306368 nr 24576 prealloc data disk byte 1536293330944 nr 12288 prealloc data disk byte 1536293343232 nr 8192 prealloc data disk byte 1536293351424 nr 4096 prealloc data disk byte 1536293355520 nr 4096 prealloc data disk byte 1536293359616 nr 4096 The excessive cpu use comes from __btrfs_prealloc_file_range() trying to allocate the entire remaining size after each extent is allocated. btrfs_reserve_extent() repeatedly cuts this requested size in half until it gets down to the size that the allocators can return. We limit the problem for now by capping each reservation at 256 meg. The small extents come from a masking bug when decreasing the requested reservation size. The high 32bits are cleared and the remaining low bits might happen to reserve a small size. Fix this by using round_down() which properly casts the mask. After these fixes huge fallocate requests are fast and result in nice large extents: $ time fallocate -l 1T file real 0m0.082s prealloc data disk byte 1112425889792 nr 268435456 prealloc data disk byte 1112694325248 nr 268435456 prealloc data disk byte 1112962760704 nr 268435456 Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-20btrfs: Init io_lock after cloning btrfs device structThomas Gleixner
__btrfs_close_devices() clones btrfs device structs with memcpy(). Some of the fields in the clone are reinitialized, but it's missing to init io_lock. In mainline this goes unnoticed, but on RT it leaves the plist pointing to the original about to be freed lock struct. Initialize io_lock after cloning, so no references to the original struct are left. Reported-and-tested-by: Mike Galbraith <efault@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-02-20Merge branch 'raid56-experimental' into for-linus-3.9Chris Mason
Signed-off-by: Chris Mason <chris.mason@fusionio.com> Conflicts: fs/btrfs/ctree.h fs/btrfs/extent-tree.c fs/btrfs/inode.c fs/btrfs/volumes.c
2013-02-20Merge branch 'master' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9 Signed-off-by: Chris Mason <chris.mason@fusionio.com> Conflicts: fs/btrfs/disk-io.c
2013-02-20Btrfs: fix missing release of qgroup reservation in commit_transaction()Miao Xie
We forget to free qgroup reservation in commit_transaction(),fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Cc: Arne Jansen <sensille@gmx.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix missing check before disabling quotaWang Shilong
The original code forget to check whether quota has been disabled firstly, and it will return 'EINVAL' and return error to users if quota has been disabled,it will be unfriendly and confusing for users to see that. So just return directly if quota has been disabled. Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Cc: Arne Jansen <sensille@gmx.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix cleaner thread not working with inode cache optionLiu Bo
Right now inode cache inode is treated as the same as space cache inode, ie. keep inode in memory till putting super. But this leads to an awkward situation. If we're going to delete a snapshot/subvolume, btrfs will not actually delete it and return free space, but will add it to dead roots list until the last inode on this snap/subvol being destroyed. Then we'll fetch deleted roots and cleanup them via cleaner thread. So here is the problem, if we enable inode cache option, each snap/subvol has a cached inode which is used to store inode allcation information. And this cache inode will be kept in memory, as the above said. So with inode cache, snap/subvol can only be added into dead roots list during freeing roots stage in umount, so that we can ONLY get space back after another remount(we cleanup dead roots on mount). But the real thing is we'll no more use the snap/subvol if we mark it deleted, so we can safely iput its cache inode when we delete snap/subvol. Another thing is that we need to change the rules of droping inode, we don't keep snap/subvol's cache inode in memory till end so that we can add snap/subvol into dead roots list in time. Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix uncompleted transactionMiao Xie
In some cases, we need commit the current transaction, but don't want to start a new one if there is no running transaction, so we introduce the function - btrfs_attach_transaction(), which can catch the current transaction, and return -ENOENT if there is no running transaction. But no running transaction doesn't mean the current transction completely, because we removed the running transaction before it completes. In some cases, it doesn't matter. But in some special cases, such as freeze fs, we hope the transaction is fully on disk, it will introduce some bugs, for example, we may feeze the fs and dump the data in the disk, if the transction doesn't complete, we would dump inconsistent data. So we need fix the above problem for those cases. We fixes this problem by introducing a function: btrfs_attach_transaction_barrier() if we hope all the transaction is fully on the disk, even they are not running, we can use this function. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix the deadlock between the transaction start/attach and commitMiao Xie
Now btrfs_commit_transaction() does this ret = btrfs_run_ordered_operations(root, 0) which async flushes all inodes on the ordered operations list, it introduced a deadlock that transaction-start task, transaction-commit task and the flush workers waited for each other. (See the following URL to get the detail http://marc.info/?l=linux-btrfs&m=136070705732646&w=2) As we know, if ->in_commit is set, it means someone is committing the current transaction, we should not try to join it if we are not JOIN or JOIN_NOLOCK, wait is the best choice for it. In this way, we can avoid the above problem. In this way, there is another benefit: there is no new transaction handle to block the transaction which is on the way of commit, once we set ->in_commit. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: fix the qgroup reserved space is released prematurelyMiao Xie
In start_transactio(), we will try to join the transaction again after the current transaction is committed, so we should not release the reserved space of the qgroup. Fix it. Cc: Arne Jansen <sensille@gmx.net> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20btrfs: define BTRFS_MAGIC as a u64 valueZach Brown
super.magic is an le64 but it's treated as an unterminated string when compared against BTRFS_MAGIC which is defined as a string. Instead define BTRFS_MAGIC as a normal hex value and use endian helpers to compare it to the super's magic. I tested this by mounting an fs made before the change and made sure that it didn't introduce sparse errors. This matches a similar cleanup that is pending in btrfs-progs. David Sterba pointed out that we should fix the kernel side as well :). Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: set/change the label of a mounted file systemjeff.liu
With this new ioctl(2) BTRFS_IOC_SET_FSLABEL, we can set/change the label of a mounted file system. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Goffredo Baroncelli <kreijack@inwind.it> Reviewed-by: David Sterba <dsterba@suse.cz> Reviewed-by: Goffredo Baroncelli <kreijack@inwind.it> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: Add a new ioctl to get the label of a mounted file systemjeff.liu
Add a new ioctl(2) BTRFS_IOC_GET_FSLABLE, so that we can get the label upon a mounted filesystem. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: Goffredo Baroncelli <kreijack@inwind.it> Cc: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: place ordered operations on a per transaction listJosef Bacik
Miao made the ordered operations stuff run async, which introduced a deadlock where we could get somebody (sync) racing in and committing the transaction while a commit was already happening. The new committer would try and flush ordered operations which would hang waiting for the commit to finish because it is done asynchronously and no longer inherits the callers trans handle. To fix this we need to make the ordered operations list a per transaction list. We can get new inodes added to the ordered operation list by truncating them and then having another process writing to them, so this makes it so that anybody trying to add an ordered operation _must_ start a transaction in order to add itself to the list, which will keep new inodes from getting added to the ordered operations list after we start committing. This should fix the deadlock and also keeps us from doing a lot more work than we need to during commit. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-02-20Btrfs: relax the block group size limit for bitmapsJosef Bacik
Dave pointed out that xfstests 273 will tell you that it failed to load the space cache for a block group when it remounts. This is because we run out of space writing out the block group cache. This is ok and is working as it should, but let's try to be a bit nicer. This happens because the block group was 100mb, but bitmap entries cover 128mb, so we were only getting extent entries for this block group, which ended up being too many to fit in the free space cache. So relax the bitmap size requirements to block groups that are at least half the size a bitmap will cover or larger, that way we can still keep the amount of space used in the free space cache low enough to be able to write it out. With this patch I no longer fail to write out the free space cache. Thanks, Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>