summaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)Author
2012-12-12Btrfs: don't allow degraded mount if too many devices are missingStefan Behrens
The current behavior is to allow mounting or remounting a filesystem writeable in degraded mode if at least one writeable device is present. The next failed write access to a missing device which is above the tolerance of the configured level of redundancy results in an read-only enforcement. Even without this, the next time barrier_all_devices() is called and more devices are missing than tolerable, the switch to read-only mode takes place. In order to behave predictably and to provide proper feedback to the user at mount time, this patch compares the number of missing devices with the number of devices that are tolerated to be missing according to the configured RAID level. If more devices are missing than tolerated, e.g. if two devices are missing in case of RAID1, only a read-only mount and remount is allowed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-12Btrfs: Fix typo in fs/btrfsMasanari Iida
Correct spelling typo in btrfs. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-12Btrfs: Remove the invalid shrink size check up from btrfs_shrink_dev()jeff.liu
Remove an invalid size check up from btrfs_shrink_dev(). The new size should not larger than the device->total_bytes as it was already verified before coming to here(i.e. new_size < old_size). Remove invalid check up for btrfs_shrink_dev(). Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-12writeback: remove nr_pages_dirtied arg from balance_dirty_pages_ratelimited_nr()Namjae Jeon
There is no reason to pass the nr_pages_dirtied argument, because nr_pages_dirtied value from the caller is unused in balance_dirty_pages_ratelimited_nr(). Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11Btrfs: make ordered extent be flushed by multi-taskMiao Xie
Though the process of the ordered extents is a bit different with the delalloc inode flush, but we can see it as a subset of the delalloc inode flush, so we also handle them by flush workers. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: make ordered operations be handled by multi-taskMiao Xie
The process of the ordered operations is similar to the delalloc inode flush, so we handle them by flush workers. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: make delalloc inodes be flushed by multi-taskMiao Xie
This patch introduce a new worker pool named "flush_workers", and if we want to force all the inode with pending delalloc to the disks, we can queue those inodes into the work queue of the worker pool, in this way, those inodes will be flushed by multi-task. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: fill the global reserve when unpinning spaceJosef Bacik
Dave gave me an image of a very full file system that would abort the transaction because it ran out of space while committing the transaction. This is because we would think there was plenty of room to create a snapshot even though the global reserve was not full. This happens because we calculate the global reserve size before we unpin any space, so after we unpin the space we allow reservations to occur even though we haven't reserved all of the space for our global reserve. Fix this by adding to the global reserve while unpinning in order to make sure we always have enough space to do our work. With this patch we no longer end up with an aborted transaction, we return ENOSPC properly to the person trying to create the snapshot. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: cleanup unused argumentsLiu Bo
'disk_key' is not used at all. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: kill unnecessary arguments in del_ptrLiu Bo
The argument 'tree_mod_log' is not necessary since all of callers enable it. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: reorder tree mod log operations in deleting a pointerLiu Bo
Since we don't use MOD_LOG_KEY_REMOVE_WHILE_MOVING to add nritems during rewinding, we should insert a MOD_LOG_KEY_REMOVE operation first. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: MOD_LOG_KEY_REMOVE_WHILE_MOVING never change node's nritemsLiu Bo
Key MOD_LOG_KEY_REMOVE_WHILE_MOVING means that we're doing memmove inside an extent buffer node, and the node's number of items remains unchanged (unless we are inserting a single pointer, but we have MOD_LOG_KEY_ADD for that). So we don't need to increase node's number of items during rewinding, otherwise we may get an node larger than leafsize and cause general protection errors later. Here is the details, - If we do memory move for inserting a single pointer, we need to add node's nritems by one, and we honor MOD_LOG_KEY_ADD for adding. - If we do memory move for deleting a single pointer, we need to decrease node's nritems by one, and we honor MOD_LOG_KEY_REMOVE for deleting. - If we do memory move for balance left/right, we need to decrease node's nritems, and we honor MOD_LOG_KEY_REMOVE for balaning. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: fix unnecessary while loop when search the free space, cacheMiao Xie
When we find a bitmap free space entry, we may check the previous extent entry covers the offset or not. But if we find this entry is also a bitmap entry, we will continue to check the previous entry of the current one by a while loop. It is unnecessary because it is impossible that the extent entry which is in front of a bitmap entry can cover the offset of the entry after that bitmap entry. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: recheck bio against block device when we map the bioJosef Bacik
Alex reported a problem where we were writing between chunks on a rbd device. The thing is we do bio_add_page using logical offsets, but the physical offset may be different. So when we map the bio now check to see if the bio is still ok with the physical offset, and if it is not split the bio up and redo the bio_add_page with the physical sector. This fixes the problem for Alex and doesn't affect performance in the normal case. Thanks, Reported-and-tested-by: Alex Elder <elder@inktank.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: improve the noflush reservationMiao Xie
In some places(such as: evicting inode), we just can not flush the reserved space of delalloc, flushing the delayed directory index and delayed inode is OK, but we don't try to flush those things and just go back when there is no enough space to be reserved. This patch fixes this problem. We defined 3 types of the flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL. If we can in the transaction, we should not flush anything, or the deadlock would happen, so use NO_FLUSH. If we flushing the reserved space of delalloc would cause deadlock, use FLUSH_LIMIT. In the other cases, FLUSH_ALL is used, and we will flush all things. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: fix wrong comment in can_overcommit()Miao Xie
The comment is not coincident with the code. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-11Btrfs: cleanup duplicated division functionsMiao Xie
div_factor{_fine} has been implemented for two times, cleanup it. And I move them into a independent file named math.h because they are common math functions. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-11-19Fix misspellings of "whether" in comments.Adam Buchbinder
"Whether" is misspelled in various comments across the tree; this fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-01Btrfs: Fix printk and variable nameMasanari Iida
Correct spelling typo in btrfs. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-10-30btrfs: unpin_extent_cache: fix the typo and unnecessary arguementsLiu Bo
- unpint->unpin - prealloc is no more used Signed-off-by: Liu Bo <liub.liubo@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-10-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This has our series of fixes for the next rc. The biggest batch is from Jan Schmidt, fixing up some problems in our subvolume quota code and fixing btrfs send/receive to work with the new extended inode refs." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: do not bug when we fail to commit the transaction Btrfs: fix memory leak when cloning root's node Btrfs: Use btrfs_update_inode_fallback when creating a snapshot Btrfs: Send: preserve ownership (uid and gid) also for symlinks. Btrfs: fix deadlock caused by the nested chunk allocation btrfs: Return EINVAL when length to trim is less than FSB Btrfs: fix memory leak in btrfs_quota_enable() Btrfs: send correct rdev and mode in btrfs-send Btrfs: extended inode refs support for send mechanism Btrfs: Fix wrong error handling code Fix a sign bug causing invalid memory access in the ino_paths ioctl. Btrfs: comment for loop in tree_mod_log_insert_move Btrfs: fix extent buffer reference for tree mod log roots Btrfs: determine level of old roots Btrfs: tree mod log's old roots could still be part of the tree Btrfs: fix a tree mod logging issue for root replacement operations Btrfs: don't put removals from push_node_left into tree mod log twice
2012-10-25Btrfs: do not bug when we fail to commit the transactionJosef Bacik
We BUG if we fail to commit the transaction when creating a snapshot, which is just obnoxious. Remove the BUG_ON(). Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-25Btrfs: fix memory leak when cloning root's nodeLiu Bo
After cloning root's node, we forgot to dec the src's ref which can lead to a memory leak. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-10-25Merge branch 'for-chris-fixed' of git://git.jan-o-sch.net/btrfs-unstableChris Mason
2012-10-25Btrfs: Use btrfs_update_inode_fallback when creating a snapshotJosef Bacik
On a really full file system I was getting ENOSPC back from btrfs_update_inode when trying to update the parent inode when creating a snapshot. Just use the fallback method so we can update the inode and not have to worry about having a delayed ref. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-25Btrfs: Send: preserve ownership (uid and gid) also for symlinks.Alex Lyakas
This patch also requires a change in the user-space part of "receive". We need to use "lchown" instead of "chown". We will do this in the following patch. Signed-off-by: Alex Lyakas <alex.btrfs@zadarastorage.com> if (S_ISREG(sctx->cur_inode_mode)) {
2012-10-25Btrfs: fix deadlock caused by the nested chunk allocationMiao Xie
Steps to reproduce: # mkfs.btrfs -m raid1 <disk1> <disk2> # btrfstune -S 1 <disk1> # mount <disk1> <mnt> # btrfs device add <disk3> <disk4> <mnt> # mount -o remount,rw <mnt> # dd if=/dev/zero of=<mnt>/tmpfile bs=1M count=1 Deadlock happened. It is because of the nested chunk allocation. When we wrote the data into the filesystem, we would allocate the data chunk because there was no data chunk in the filesystem. At the end of the data chunk allocation, we should insert the metadata of the data chunk into the extent tree, but there was no raid1 chunk, so we tried to lock the chunk allocation mutex to allocate the new chunk, but we had held the mutex, the deadlock happened. By rights, we would allocate the raid1 chunk when we added the second device because the profile of the seed filesystem is raid1 and we had two devices. But we didn't do that in fact. It is because the last step of the first device insertion didn't commit the transaction. So when we added the second device, we didn't cow the tree, and just inserted the relative metadata into the leaves which were generated by the first device insertion, and its profile was dup. So, I fix this problem by commiting the transaction at the end of the first device insertion. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-25btrfs: Return EINVAL when length to trim is less than FSBLukas Czerner
Currently if len argument in btrfs_ioctl_fitrim() is smaller than one FSB we will continue and finally return 0 bytes discarded. However if the length to discard is smaller then file system block we should really return EINVAL. Signed-off-by: Lukas Czerner <lczerner@redhat.com>
2012-10-25Btrfs: fix memory leak in btrfs_quota_enable()Tsutomu Itoh
We should free quota_root before returning from the error handling code. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-10-25Btrfs: send correct rdev and mode in btrfs-sendArne Jansen
When sending a device file, the stream was missing the mode. Also the rdev was encoded wrongly. Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-10-25Btrfs: extended inode refs support for send mechanismJan Schmidt
This adds support for the new extended inode refs to btrfs send. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-10-25Btrfs: Fix wrong error handling codeStefan Behrens
gcc says "warning: comparison of unsigned expression >= 0 is always true" because i is an unsigned long. And gcc is right this time. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-10-25Fix a sign bug causing invalid memory access in the ino_paths ioctl.Gabriel de Perthuis
To see the problem, create many hardlinks to the same file (120 should do it), then look up paths by inode with: ls -i btrfs inspect inode-resolve -v $ino /mnt/btrfs I noticed the memory layout of the fspath->val data had some irregularities (some unnecessary gaps that stop appearing about halfway), so I'm not sure there aren't any bugs left in it.
2012-10-24Btrfs: comment for loop in tree_mod_log_insert_moveJan Schmidt
Emphasis the way tree_mod_log_insert_move avoids adding MOD_LOG_KEY_REMOVE_WHILE_MOVING operations, depending on the direction of the move operation. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-10-24Btrfs: fix extent buffer reference for tree mod log rootsJan Schmidt
In get_old_root we grab a lock on the extent buffer before we obtain a reference on that buffer. That order is changed now. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-10-24Btrfs: determine level of old rootsJan Schmidt
In btrfs_find_all_roots' termination condition, we compare the level of the old buffer we got from btrfs_search_old_slot to the level of the current root node. We'd better compare it to the level of the rewinded root node. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-10-24Btrfs: tree mod log's old roots could still be part of the treeJan Schmidt
Tree mod log treated old root buffers as always empty buffers when starting the rewind operations. However, the old root may still be part of the current tree at a lower level, with still some valid entries. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-10-23Btrfs: fix a tree mod logging issue for root replacement operationsJan Schmidt
Avoid the implicit free by tree_mod_log_set_root_pointer, which is wrong in two places. Where needed, we call tree_mod_log_free_eb explicitly now. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-10-23Btrfs: don't put removals from push_node_left into tree mod log twiceJan Schmidt
Independant of the check (push_items < src_items) tree_mod_log_eb_copy did log the removal of the old data entries from the source buffer. Therefore, we must not call tree_mod_log_eb_move if the check evaluates to true, as that would log the removal twice, finally resulting in (rewinded) buffers with wrong values for header_nritems. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-10-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace compile fixes from Eric W Biederman: "This tree contains three trivial fixes. One compiler warning, one thinko fix, and one build fix" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: btrfs: Fix compilation with user namespace support enabled userns: Fix posix_acl_file_xattr_userns gid conversion userns: Properly print bluetooth socket uids
2012-10-12btrfs: Fix compilation with user namespace support enabledEric W. Biederman
When compiling with user namespace support btrfs fails like: fs/btrfs/tree-log.c: In function ‘fill_inode_item’: fs/btrfs/tree-log.c:2955:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_uid’ fs/btrfs/ctree.h:2026:1: note: expected ‘u32’ but argument is of type ‘kuid_t’ fs/btrfs/tree-log.c:2956:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_gid’ fs/btrfs/ctree.h:2027:1: note: expected ‘u32’ but argument is of type ‘kgid_t’ Fix this by using i_uid_read and i_gid_read in Cc: Chris Mason <chris.mason@fusionio.com> Cc: Josef Bacik <jbacik@fusionio.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-10-12audit: overhaul __audit_inode_child to accomodate retryingJeff Layton
In order to accomodate retrying path-based syscalls, we need to add a new "type" argument to audit_inode_child. This will tell us whether we're looking for a child entry that represents a create or a delete. If we find a parent, don't automatically assume that we need to create a new entry. Instead, use the information we have to try to find an existing entry first. Update it if one is found and create a new one if not. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12audit: reverse arguments to audit_inode_childJeff Layton
Most of the callers get called with an inode and dentry in the reverse order. The compiler then has to reshuffle the arg registers and/or stack in order to pass them on to audit_inode_child. Reverse those arguments for a micro-optimization. Reported-by: Eric Paris <eparis@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs update from Chris Mason: "This is a large pull, with the bulk of the updates coming from: - Hole punching - send/receive fixes - fsync performance - Disk format extension allowing more hardlinks inside a single directory (btrfs-progs patch required to enable the compat bit for this one) I'm cooking more unrelated RAID code, but I wanted to make sure this original batch makes it in. The largest updates here are relatively old and have been in testing for some time." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (121 commits) btrfs: init ref_index to zero in add_inode_ref Btrfs: remove repeated eb->pages check in, disk-io.c/csum_dirty_buffer Btrfs: fix page leakage Btrfs: do not warn_on when we cannot alloc a page for an extent buffer Btrfs: don't bug on enomem in readpage Btrfs: cleanup pages properly when ENOMEM in compression Btrfs: make filesystem read-only when submitting barrier fails Btrfs: detect corrupted filesystem after write I/O errors Btrfs: make compress and nodatacow mount options mutually exclusive btrfs: fix message printing Btrfs: don't bother committing delayed inode updates when fsyncing btrfs: move inline function code to header file Btrfs: remove unnecessary IS_ERR in bio_readpage_error() btrfs: remove unused function btrfs_insert_some_items() Btrfs: don't commit instead of overcommitting Btrfs: confirmation of value is added before trace_btrfs_get_extent() is called Btrfs: be smarter about dropping things from the tree log Btrfs: don't lookup csums for prealloc extents Btrfs: cache extent state when writing out dirty metadata pages Btrfs: do not hold the file extent leaf locked when adding extent item ...
2012-10-09btrfs: init ref_index to zero in add_inode_refChris Mason
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-10-09Btrfs: remove repeated eb->pages check in, disk-io.c/csum_dirty_bufferWang Sheng-Hui
In csum_dirty_buffer, we first get eb from page->private. Then we check if the page is the first page of eb. Later we check it again. Remove the repeated check here. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
2012-10-09Btrfs: fix page leakageJosef Bacik
Alloc_dummy_extent_buffer will not free the first page in the eb array if we fail to allocate a page, fix this. Thanks, Reported-by: David Sterba <dave@jikos.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09Btrfs: do not warn_on when we cannot alloc a page for an extent bufferJosef Bacik
It's just annoying and the user will have gotten a nice OOM killer message so they are already fully aware they are screwed :). Thanks, Reported-by: Jérôme Poulin <jeromepoulin@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09Btrfs: don't bug on enomem in readpageJosef Bacik
Get rid of the BUG_ON(ret == -ENOMEM) in __extent_read_full_page. Thanks, Reported-by: Jérôme Poulin <jeromepoulin@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09Btrfs: cleanup pages properly when ENOMEM in compressionJosef Bacik
We were freeing non-existent pages which was causing a panic for a user who was suffering from ENOMEM. This patch fixes the problem. Thanks, Reported-by: Jérôme Poulin <jeromepoulin@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>